Grouped Linear Regression with Covariance Analysis
Menu location: Analysis_Regression and Correlation_Grouped Linear_Covariance.
This function compares the slopes and separations of two or more simple linear regression lines.
The method examines the regression parameters for a group of xY pairs in relation to a common fitted function. This provides an analysis of variance that shows whether or not the slopes of the individual regression lines, taken as a whole, differ significantly. StatsDirect then compares all of the slopes individually. The vertical separation between the regression lines is then examined using analysis of covariance, and the corrected means are given (Armitage and Berry, 1994).
Assumptions:
- Y replicates are a random sample from a normal distribution
- deviations from the regression line (residuals) follow a normal distribution
- deviations from the regression line (residuals) have uniform variance
This is just one facet of analysis of covariance; there are additional and alternative methods. For further information, see Kleinbaum et al. (1998) and Armitage and Berry (1994). Analysis of covariance is best carried out as part of a broader regression modelling exercise by a statistician.
Technical Validation
Slopes of several regression lines are compared by analysis of variance as follows (Armitage and Berry, 1994):
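(The formulae here restate the standard decomposition in conventional notation; they are consistent with the worked example output below but are not a reproduction of StatsDirect's own display.)

$$
SS_{\text{common}} = \frac{\left(\sum_{j=1}^{k} S_{xYj}\right)^{2}}{\sum_{j=1}^{k} S_{xxj}}, \qquad
SS_{\text{between}} = \sum_{j=1}^{k} \frac{S_{xYj}^{2}}{S_{xxj}} - SS_{\text{common}}, \qquad
SS_{\text{total}} = \sum_{j=1}^{k} S_{YYj}
$$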
where SScommon is the sum of squares due to the common slope of k regression lines, SSbetween is the sum of squares due to differences between the slopes, SStotal is the total sum of squares, and the residual sum of squares (the separate residuals in the analysis of variance table) is the difference between SStotal and the sum of SScommon and SSbetween. Sxxj is the sum of squares about the mean x observation in the jth group, SxYj is the sum of products of the deviations of xY pairs from their means in the jth group, and SYYj is the sum of squares about the mean Y observation in the jth group. The common slope, between slopes and residual terms have 1, k-1 and N-2k degrees of freedom respectively (N observations in k groups), and the variance ratios are formed against the residual mean square.
Vertical separation of slopes of several regression lines is tested by analysis of covariance as follows (Armitage and Berry, 1994):
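(Again, the formulae restate the standard approach, with sums of squares and products accumulated within the groups and for all groups combined about their overall means; they are consistent with the worked example output below.)

$$
SS_{\text{within}} = S_{YY(\text{within})} - \frac{S_{xY(\text{within})}^{2}}{S_{xx(\text{within})}}, \qquad
SS_{\text{total}} = S_{YY(\text{total})} - \frac{S_{xY(\text{total})}^{2}}{S_{xx(\text{total})}}, \qquad
SS_{\text{between}} = SS_{\text{total}} - SS_{\text{within}}
$$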
where the SS are the corrected sums of squares within the groups, in total, and between the groups (within subtracted from total). The constituent sums of products and squares are partitioned between groups, within groups and in total as above. The common slope used for the corrected means and line separations is the pooled within-groups slope, SxY(within)/Sxx(within).
Data preparation
If there are equal numbers of replicate Y observations, or single Y observations, for each x then it is best to prepare and select your data using a group identifier variable. For example, with three replicates you would prepare five columns of data: group identifier, x, y1, y2, and y3. Remember to choose the "Groups by identifier" option in this case.
If there are unequal numbers of replicate Y observations for each x then you must prepare the x data in separate columns by group, and prepare the Y data in separate columns by group and x observation (i.e. Y for group 1 observation 1…, each column being r rows long, where r is the number of repeat observations). Remember to choose the "Groups by column" option in this case. This is done in the example below.
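As a rough illustration of the two layouts, consider the following sketch (hypothetical only; the group names, column names and values are invented and are not part of any StatsDirect workbook):

```python
import pandas as pd

# "Groups by identifier": equal numbers of replicates, so one group identifier
# column, one x column and one column per replicate (three replicates here).
by_identifier = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B"],
    "x":     [0.5, 1.0, 1.5, 0.5, 1.0, 1.5],
    "y1":    [1.1, 2.0, 3.2, 1.4, 2.6, 3.9],
    "y2":    [0.9, 2.2, 2.9, 1.6, 2.4, 4.1],
    "y3":    [1.0, 1.8, 3.0, 1.5, 2.7, 3.8],
})

# "Groups by column": unequal numbers of replicates, so one x column per group
# plus one Y column per group and x observation, each holding the replicates
# for that x (shorter columns are simply padded with NaN when shown together).
by_column = pd.DataFrame({
    "x_A":  pd.Series([0.5, 1.0]),
    "Y_A1": pd.Series([1.1, 0.9, 1.0]),   # replicates of Y at the first x of group A
    "Y_A2": pd.Series([2.0, 2.2]),        # replicates of Y at the second x of group A
    "x_B":  pd.Series([0.5, 1.0]),
    "Y_B1": pd.Series([1.4, 1.6, 1.5]),
    "Y_B2": pd.Series([2.6, 2.4]),
})

print(by_identifier)
print(by_column)
```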
Example
From Armitage and Berry (1994).
Test workbook (Regression worksheet: Log Dose_Std, BD 1_Std, BD 2_Std, BD 3_Std, Log Dose_I, BD 1_I, BD 2_I, BD 3_I, BD 4_I, BD 5_I, Log Dose_F, BD 1_F, BD 2_F, BD 3_F).
Three different preparations of Vitamin D are tested for their effect on bones by feeding them to rats that have an induced lack of mineral in their bones. X-ray methods are used to test the re-mineralisation of bones in response to the Vitamin D.
For the standard preparation:
Log dose of Vit D | ||
0.544 | 0.845 | 1.146 |
Bone density score | ||
0 | 1.5 | 2 |
0 | 2.5 | 2.5 |
1 | 5 | 5 |
2.75 | 6 | 4 |
2.75 | 4.25 | 5 |
1.75 | 2.75 | 4 |
2.75 | 1.5 | 2.5 |
2.25 | 3 | 3.5 |
2.25 | | 3 |
2.5 | | 2 |
| | 3 |
| | 4 |
| | 4 |
For alternative preparation I:
Log dose of Vit D | ||||
0.398 | 0.699 | 1.000 | 1.301 | 1.602 |
Bone density score | ||||
0 | 1 | 1.5 | 3 | 3.5 |
1 | 1.5 | 1 | 3 | 3.5 |
0 | 1.5 | 2 | 5.5 | 4.5 |
0 | 1 | 3.5 | 2.5 | 3.5 |
0 | 1 | 2 | 1 | 3.5 |
0.5 | 0.5 | 0 | 2 | 3 |
For alternative preparation F:
Log dose of Vit D | ||
0.398 | 0.699 | 1.000 |
Bone density score | ||
2.75 | 2.5 | 3.75 |
2 | 2.75 | 5.25 |
1.25 | 2.25 | 6 |
2 | 2.25 | 5.5 |
0 | 3.75 | 2.25 |
0.5 | | 3.5 |
To analyse these data in StatsDirect you must first enter them into 14 appropriately labelled columns in the workbook. The first column is just three rows long and contains the three log doses of vitamin D for the standard preparation. The next three columns contain the repeated measures of bone density for each of the three levels of log dose represented by the rows of the first column. This is then repeated for the other two preparations. Alternatively, open the test workbook using the file open function of the file menu.

Then select covariance from the groups section of the regression and correlation section of the analysis menu. Select the columns marked "Log Dose_Std", "Log Dose_I" and "Log Dose_F" when you are prompted for the predictor (x) variables; these contain the log dose levels (logarithms are taken because, from previous research, the relationship between bone re-mineralisation and Vitamin D is known to be log-linear). Make sure that the "use Y replicates" option is checked when you are prompted for it. Then select the outcome (Y) variables that represent the replicates. You will have to select three, five and three columns in just three selection actions, because these are the numbers of dose levels in the corresponding x variables, in the order in which you selected them.
Alternatively, these data could have been entered in just three pairs of workbook columns representing the three preparations, each pair consisting of a log dose column and a column of the mean bone density score for each dose level. By accepting the more long-winded input of replicates, StatsDirect is encouraging you to run a test of linearity on your data.
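If you want to check the arithmetic outside StatsDirect, the slope-comparison analysis of variance can be reproduced with a short NumPy sketch that follows the formulae in the Technical Validation section (an illustrative calculation, not StatsDirect's implementation; the data are those of the example above):

```python
import numpy as np

# Example data laid out as {preparation: [(log dose, [replicate bone density scores]), ...]}
groups = {
    "Std": [(0.544, [0, 0, 1, 2.75, 2.75, 1.75, 2.75, 2.25, 2.25, 2.5]),
            (0.845, [1.5, 2.5, 5, 6, 4.25, 2.75, 1.5, 3]),
            (1.146, [2, 2.5, 5, 4, 5, 4, 2.5, 3.5, 3, 2, 3, 4, 4])],
    "I":   [(0.398, [0, 1, 0, 0, 0, 0.5]),
            (0.699, [1, 1.5, 1.5, 1, 1, 0.5]),
            (1.000, [1.5, 1, 2, 3.5, 2, 0]),
            (1.301, [3, 3, 5.5, 2.5, 1, 2]),
            (1.602, [3.5, 3.5, 4.5, 3.5, 3.5, 3])],
    "F":   [(0.398, [2.75, 2, 1.25, 2, 0, 0.5]),
            (0.699, [2.5, 2.75, 2.25, 2.25, 3.75]),
            (1.000, [3.75, 5.25, 6, 5.5, 2.25, 3.5])],
}

Sxx, SxY, SYY, N = [], [], [], 0
for name, levels in groups.items():
    x = np.concatenate([[xi] * len(ys) for xi, ys in levels])
    y = np.concatenate([ys for _, ys in levels])
    Sxx.append(np.sum((x - x.mean()) ** 2))               # sum of squares about mean x
    SxY.append(np.sum((x - x.mean()) * (y - y.mean())))   # sum of products about the means
    SYY.append(np.sum((y - y.mean()) ** 2))               # sum of squares about mean Y
    N += len(y)
    print(name, "slope =", SxY[-1] / Sxx[-1])

k = len(groups)
ss_common = sum(SxY) ** 2 / sum(Sxx)                                  # 1 df
ss_between = sum(s * s / xx for s, xx in zip(SxY, Sxx)) - ss_common   # k - 1 df
ss_within = sum(SYY)                                                  # N - k df
ss_resid = ss_within - ss_common - ss_between                         # N - 2k df
ms_resid = ss_resid / (N - 2 * k)

print("common slope SS =", ss_common, "VR =", ss_common / ms_resid)
print("between slopes SS =", ss_between, "VR =", ss_between / (k - 1) / ms_resid)
print("separate residuals SS =", ss_resid, "on", N - 2 * k, "df")
print("within groups SS =", ss_within, "on", N - k, "df")
```

Up to rounding, this reproduces the individual slopes and the grouped linear regression table below.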
For this example:
Grouped linear regression
Source of variation | SSq | DF | MSq | VR | |
Common slope | 78.340457 | 1 | 78.340457 | 67.676534 | P < 0.0001 |
Between slopes | 4.507547 | 2 | 2.253774 | 1.946984 | P = 0.1501 |
Separate residuals | 83.34518 | 72 | 1.157572 | ||
Within groups | 166.193185 | 75 |
Common slope is significant
Difference between slopes is NOT significant
Slope comparisons:
slope 1 (Log Dose_Std) v slope 2 (Log Dose_I) = 2.616751 v 2.796235
Difference (95% CI) = 0.179484 (-1.576065 to 1.935032)
t = -0.203808, P = 0.8391
slope 1 (Log Dose_Std) v slope 3 (Log Dose_F) = 2.616751 v 4.914175
Difference (95% CI) = 2.297424 (-0.245568 to 4.840416)
t = -1.800962, P = 0.0759
slope 2 (Log Dose_I) v slope 3 (Log Dose_F) = 2.796235 v 4.914175
Difference (95% CI) = 2.11794 (-0.135343 to 4.371224)
t = -1.873726, P = 0.065
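These pairwise comparisons are consistent with the usual two-slope t test (a restatement rather than a quotation of StatsDirect's internals):

$$
t = \frac{b_{j} - b_{l}}{\sqrt{s^{2}\left(\frac{1}{S_{xxj}} + \frac{1}{S_{xxl}}\right)}}
$$

where $b_{j} = S_{xYj}/S_{xxj}$ is the slope in the jth group and $s^{2}$ is the separate residuals mean square from the table above (1.157572 on 72 degrees of freedom).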
Covariance analysis
Uncorrected:
Source of variation | YY | xY | xx | DF |
Between groups | 17.599283 | -3.322801 | 0.988515 | 2 |
Within | 166.193185 | 25.927266 | 8.580791 | 8 |
Total | 183.792468 | 22.604465 | 9.569306 | 10 |
Corrected:
Source of variation | SSq | DF | MSq | VR |
Between groups | 42.543829 | 2 | 21.271915 | 1.694921 |
Within | 87.852727 | 7 | 12.55039 | |
Total | 130.396557 | 9 |
P = 0.251
Corrected Y means ± SE for baseline mean predictor of 0.884372:
Y' = 2.901917 ± 2.045389
Y' = 1.533957 ± 1.590482
Y' = 3.398345 ± 2.057601
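These corrected (adjusted) means are consistent with the usual analysis of covariance adjustment (again a restatement, not StatsDirect's exact display):

$$
Y'_{j} = \bar{Y}_{j} + b\,(\bar{x}_{0} - \bar{x}_{j})
$$

where $\bar{Y}_{j}$ and $\bar{x}_{j}$ are the observed means in the jth group, $b$ is the common slope (3.021547 here) and $\bar{x}_{0}$ is the baseline mean predictor (here 0.884372, the mean of x over all observations).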
Line separations (common slope = 3.021547):
line 1 (Log Dose_Std) vs line 2 (Log Dose_I) Vertical separation = 1.367959
95% CI = -4.760348 to 7.496267
t = 0.527831, (7 df), P = 0.6139
line 1 (Log Dose_Std) vs line 3 (Log Dose_F) Vertical separation = -0.496428
95% CI = -7.354566 to 6.36171
t = -0.171164, (7 df), P = 0.8689
line 2 (Log Dose_I) vs line 3 (Log Dose_F) Vertical separation = -1.864388
95% CI = -8.042375 to 4.3136
t = -0.713594, (7 df), P = 0.4986
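Each vertical separation above is simply the difference between the corresponding corrected means, $(\bar{Y}_{j} - \bar{Y}_{l}) - b(\bar{x}_{j} - \bar{x}_{l})$, and the confidence intervals and t statistics are consistent with a standard error of the familiar analysis of covariance form $\sqrt{s^{2}\left(1/n_{j} + 1/n_{l} + (\bar{x}_{j} - \bar{x}_{l})^{2}/S_{xx(\text{within})}\right)}$, where $s^{2}$ is the corrected within-groups mean square (in this output $n_{j}$ counts the dose levels in the jth group).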
The common slope is highly significant and the overall test for a difference between the slopes is non-significant. If the assumption of linearity holds, we can conclude that these lines are reasonably parallel. Looking more closely at the individual slopes, preparation F comes close to differing significantly from the other two, but the difference is not large enough to make the overall slope comparison significantly heterogeneous.
The analysis of covariance did not show any significant vertical separation of the three regression lines.