Grouped Linear Regression with Covariance Analysis
Menu location: Analysis_Regression and Correlation_Grouped Linear_Covariance.
This function compares the slopes and separations of two or more simple linear regression lines.
The method examines the regression parameters for a group of xY pairs in relation to a common fitted function. This provides an analysis of variance that shows whether or not the slopes of the individual regression lines, taken as a whole, differ significantly. StatsDirect then compares all of the slopes individually. The vertical separation between the regression lines is then examined using analysis of covariance, and the corrected means are given (Armitage and Berry, 1994).
Assumptions:
- Y replicates are a random sample from a normal distribution
- deviations from the regression line (residuals) follow a normal distribution
- deviations from the regression line (residuals) have uniform variance
This is just one facet of analysis of covariance; there are additional and alternative methods. For further information, see Kleinbaum et al. (1998) and Armitage and Berry (1994). Analysis of covariance is best carried out as part of a broader regression modelling exercise by a statistician.
Technical Validation
Slopes of several regression lines are compared by analysis of variance as follows (Armitage and Berry, 1994):
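(The formulae here restate the standard decomposition in conventional notation; they are consistent with the worked example output below but are not a reproduction of StatsDirect's own display.)

$$
SS_{\text{common}} = \frac{\left(\sum_{j=1}^{k} S_{xYj}\right)^{2}}{\sum_{j=1}^{k} S_{xxj}}, \qquad
SS_{\text{between}} = \sum_{j=1}^{k} \frac{S_{xYj}^{2}}{S_{xxj}} - SS_{\text{common}}, \qquad
SS_{\text{total}} = \sum_{j=1}^{k} S_{YYj}
$$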
where SScommon is the sum of squares due to the common slope of k regression lines, SSbetween is the sum of squares due to differences between the slopes, SStotal is the total sum of squares, and the residual sum of squares (the separate residuals in the analysis of variance table) is the difference between SStotal and the sum of SScommon and SSbetween. Sxxj is the sum of squares about the mean x observation in the jth group, SxYj is the sum of products of the deviations of xY pairs from their means in the jth group, and SYYj is the sum of squares about the mean Y observation in the jth group. The common slope, between slopes and residual terms have 1, k-1 and N-2k degrees of freedom respectively (N observations in k groups), and the variance ratios are formed against the residual mean square.
Vertical separation of slopes of several regression lines is tested by analysis of covariance as follows (Armitage and Berry, 1994):
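(Again, the formulae restate the standard approach, with sums of squares and products accumulated within the groups and for all groups combined about their overall means; they are consistent with the worked example output below.)

$$
SS_{\text{within}} = S_{YY(\text{within})} - \frac{S_{xY(\text{within})}^{2}}{S_{xx(\text{within})}}, \qquad
SS_{\text{total}} = S_{YY(\text{total})} - \frac{S_{xY(\text{total})}^{2}}{S_{xx(\text{total})}}, \qquad
SS_{\text{between}} = SS_{\text{total}} - SS_{\text{within}}
$$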
where the SS are the corrected sums of squares within the groups, in total, and between the groups (within subtracted from total). The constituent sums of products and squares are partitioned between groups, within groups and in total as above. The common slope used for the corrected means and line separations is the pooled within-groups slope, SxY(within)/Sxx(within).
Data preparation
If there are equal numbers of replicate Y observations, or single Y observations, for each x then it is best to prepare and select your data using a group identifier variable. For example, with three replicates you would prepare five columns of data: group identifier, x, y1, y2, and y3. Remember to choose the "Groups by identifier" option in this case.
If there are unequal numbers of replicate Y observations for each x then you must prepare the x data in separate columns by group, and prepare the Y data in separate columns by group and x observation (i.e. Y for group 1 observation 1…, each column being r rows long, where r is the number of repeat observations). Remember to choose the "Groups by column" option in this case. This is done in the example below.
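As a rough illustration of the two layouts, consider the following sketch (hypothetical only; the group names, column names and values are invented and are not part of any StatsDirect workbook):

```python
import pandas as pd

# "Groups by identifier": equal numbers of replicates, so one group identifier
# column, one x column and one column per replicate (three replicates here).
by_identifier = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B"],
    "x":     [0.5, 1.0, 1.5, 0.5, 1.0, 1.5],
    "y1":    [1.1, 2.0, 3.2, 1.4, 2.6, 3.9],
    "y2":    [0.9, 2.2, 2.9, 1.6, 2.4, 4.1],
    "y3":    [1.0, 1.8, 3.0, 1.5, 2.7, 3.8],
})

# "Groups by column": unequal numbers of replicates, so one x column per group
# plus one Y column per group and x observation, each holding the replicates
# for that x (shorter columns are simply padded with NaN when shown together).
by_column = pd.DataFrame({
    "x_A":  pd.Series([0.5, 1.0]),
    "Y_A1": pd.Series([1.1, 0.9, 1.0]),   # replicates of Y at the first x of group A
    "Y_A2": pd.Series([2.0, 2.2]),        # replicates of Y at the second x of group A
    "x_B":  pd.Series([0.5, 1.0]),
    "Y_B1": pd.Series([1.4, 1.6, 1.5]),
    "Y_B2": pd.Series([2.6, 2.4]),
})

print(by_identifier)
print(by_column)
```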
Example
From Armitage and Berry (1994).
Test workbook (Regression worksheet: Log Dose_Std, BD 1_Std, BD 2_Std, BD 3_Std, Log Dose_I, BD 1_I, BD 2_I, BD 3_I, BD 4_I, BD 5_I, Log Dose_F, BD 1_F, BD 2_F, BD 3_F).
Three different preparations of Vitamin D are tested for their effect on bones by feeding them to rats that have an induced lack of mineral in their bones. X-ray methods are used to test the re-mineralisation of bones in response to the Vitamin D.
For the standard preparation:
Log dose of Vit D | ||
0.544 | 0.845 | 1.146 |
Bone density score | ||
0 | 1.5 | 2 |
0 | 2.5 | 2.5 |
1 | 5 | 5 |
2.75 | 6 | 4 |
2.75 | 4.25 | 5 |
1.75 | 2.75 | 4 |
2.75 | 1.5 | 2.5 |
2.25 | 3 | 3.5 |
2.25 | | 3 |
2.5 | | 2 |
| | 3 |
| | 4 |
| | 4 |
For alternative preparation I:
Log dose of Vit D | ||||
0.398 | 0.699 | 1.000 | 1.301 | 1.602 |
Bone density score | ||||
0 | 1 | 1.5 | 3 | 3.5 |
1 | 1.5 | 1 | 3 | 3.5 |
0 | 1.5 | 2 | 5.5 | 4.5 |
0 | 1 | 3.5 | 2.5 | 3.5 |
0 | 1 | 2 | 1 | 3.5 |
0.5 | 0.5 | 0 | 2 | 3 |
For alternative preparation F:
Log dose of Vit D | ||
0.398 | 0.699 | 1.000 |
Bone density score | ||
2.75 | 2.5 | 3.75 |
2 | 2.75 | 5.25 |
1.25 | 2.25 | 6 |
2 | 2.25 | 5.5 |
0 | 3.75 | 2.25 |
0.5 | | 3.5 |
To analyse these data in StatsDirect you must first enter them into 14 appropriately labelled columns in the workbook. The first column is just three rows long and contains the three log doses of vitamin D for the standard preparation. The next three columns contain the repeated measures of bone density for each of the three levels of log dose represented by the rows of the first column. This is then repeated for the other two preparations. Alternatively, open the test workbook using the file open function of the file menu.

Then select covariance from the groups section of the regression and correlation section of the analysis menu. Select the columns marked "Log Dose_Std", "Log Dose_I" and "Log Dose_F" when you are prompted for the predictor (x) variables; these contain the log dose levels (logarithms are taken because, from previous research, the relationship between bone re-mineralisation and Vitamin D is known to be log-linear). Make sure that the "use Y replicates" option is checked when you are prompted for it. Then select the outcome (Y) variables that represent the replicates. You will have to select three, five and three columns in just three selection actions, because these are the numbers of dose levels in the corresponding x variables, in the order in which you selected them.
Alternatively, these data could have been entered in just three pairs of workbook columns representing the three preparations, each pair consisting of a log dose column and a column of the mean bone density score for each dose level. By accepting the more long-winded input of replicates, StatsDirect is encouraging you to run a test of linearity on your data.
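If you want to check the arithmetic outside StatsDirect, the slope-comparison analysis of variance can be reproduced with a short NumPy sketch that follows the formulae in the Technical Validation section (an illustrative calculation, not StatsDirect's implementation; the data are those of the example above):

```python
import numpy as np

# Example data laid out as {preparation: [(log dose, [replicate bone density scores]), ...]}
groups = {
    "Std": [(0.544, [0, 0, 1, 2.75, 2.75, 1.75, 2.75, 2.25, 2.25, 2.5]),
            (0.845, [1.5, 2.5, 5, 6, 4.25, 2.75, 1.5, 3]),
            (1.146, [2, 2.5, 5, 4, 5, 4, 2.5, 3.5, 3, 2, 3, 4, 4])],
    "I":   [(0.398, [0, 1, 0, 0, 0, 0.5]),
            (0.699, [1, 1.5, 1.5, 1, 1, 0.5]),
            (1.000, [1.5, 1, 2, 3.5, 2, 0]),
            (1.301, [3, 3, 5.5, 2.5, 1, 2]),
            (1.602, [3.5, 3.5, 4.5, 3.5, 3.5, 3])],
    "F":   [(0.398, [2.75, 2, 1.25, 2, 0, 0.5]),
            (0.699, [2.5, 2.75, 2.25, 2.25, 3.75]),
            (1.000, [3.75, 5.25, 6, 5.5, 2.25, 3.5])],
}

Sxx, SxY, SYY, N = [], [], [], 0
for name, levels in groups.items():
    x = np.concatenate([[xi] * len(ys) for xi, ys in levels])
    y = np.concatenate([ys for _, ys in levels])
    Sxx.append(np.sum((x - x.mean()) ** 2))               # sum of squares about mean x
    SxY.append(np.sum((x - x.mean()) * (y - y.mean())))   # sum of products about the means
    SYY.append(np.sum((y - y.mean()) ** 2))               # sum of squares about mean Y
    N += len(y)
    print(name, "slope =", SxY[-1] / Sxx[-1])

k = len(groups)
ss_common = sum(SxY) ** 2 / sum(Sxx)                                  # 1 df
ss_between = sum(s * s / xx for s, xx in zip(SxY, Sxx)) - ss_common   # k - 1 df
ss_within = sum(SYY)                                                  # N - k df
ss_resid = ss_within - ss_common - ss_between                         # N - 2k df
ms_resid = ss_resid / (N - 2 * k)

print("common slope SS =", ss_common, "VR =", ss_common / ms_resid)
print("between slopes SS =", ss_between, "VR =", ss_between / (k - 1) / ms_resid)
print("separate residuals SS =", ss_resid, "on", N - 2 * k, "df")
print("within groups SS =", ss_within, "on", N - k, "df")
```

Up to rounding, this reproduces the individual slopes and the grouped linear regression table below.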
For this example:
Grouped linear regression
Source of variation | SSq | DF | MSq | VR | |
Common slope | 78.340457 | 1 | 78.340457 | 67.676534 | P < 0.0001 |
Between slopes | 4.507547 | 2 | 2.253774 | 1.946984 | P = 0.1501 |
Separate residuals | 83.34518 | 72 | 1.157572 | ||
Within groups | 166.193185 | 75 |
Common slope is significant
Difference between slopes is NOT significant
Slope comparisons:
slope 1 (Log Dose_Std) v slope 2 (Log Dose_I) = 2.616751 v 2.796235
Difference (95% CI) = 0.179484 (-1.576065 to 1.935032)
t = -0.203808, P = 0.8391
slope 1 (Log Dose_Std) v slope 3 (Log Dose_F) = 2.616751 v 4.914175
Difference (95% CI) = 2.297424 (-0.245568 to 4.840416)
t = -1.800962, P = 0.0759
slope 2 (Log Dose_I) v slope 3 (Log Dose_F) = 2.796235 v 4.914175
Difference (95% CI) = 2.11794 (-0.135343 to 4.371224)
t = -1.873726, P = 0.065
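These pairwise comparisons are consistent with the usual two-slope t test (a restatement rather than a quotation of StatsDirect's internals):

$$
t = \frac{b_{j} - b_{l}}{\sqrt{s^{2}\left(\frac{1}{S_{xxj}} + \frac{1}{S_{xxl}}\right)}}
$$

where $b_{j} = S_{xYj}/S_{xxj}$ is the slope in the jth group and $s^{2}$ is the separate residuals mean square from the table above (1.157572 on 72 degrees of freedom).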
Covariance analysis
Uncorrected:
Source of variation | YY | xY | xx | DF |
Between groups | 17.599283 | -3.322801 | 0.988515 | 2 |
Within | 166.193185 | 25.927266 | 8.580791 | 8 |
Total | 183.792468 | 22.604465 | 9.569306 | 10 |
Corrected:
Source of variation | SSq | DF | MSq | VR |
Between groups | 42.543829 | 2 | 21.271915 | 1.694921 |
Within | 87.852727 | 7 | 12.55039 | |
Total | 130.396557 | 9 |
P = 0.251
Corrected Y means ± SE for baseline mean predictor of 0.884372:
Y' = 2.901917 ± 2.045389
Y' = 1.533957 ± 1.590482
Y' = 3.398345 ± 2.057601
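These corrected (adjusted) means are consistent with the usual analysis of covariance adjustment (again a restatement, not StatsDirect's exact display):

$$
Y'_{j} = \bar{Y}_{j} + b\,(\bar{x}_{0} - \bar{x}_{j})
$$

where $\bar{Y}_{j}$ and $\bar{x}_{j}$ are the observed means in the jth group, $b$ is the common slope (3.021547 here) and $\bar{x}_{0}$ is the baseline mean predictor (here 0.884372, the mean of x over all observations).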
Line separations (common slope = 3.021547):
line 1 (Log Dose_Std) vs line 2 (Log Dose_I) Vertical separation = 1.367959
95% CI = -4.760348 to 7.496267
t = 0.527831, (7 df), P = 0.6139
line 1 (Log Dose_Std) vs line 3 (Log Dose_F) Vertical separation = -0.496428
95% CI = -7.354566 to 6.36171
t = -0.171164, (7 df), P = 0.8689
line 2 (Log Dose_I) vs line 3 (Log Dose_F) Vertical separation = -1.864388
95% CI = -8.042375 to 4.3136
t = -0.713594, (7 df), P = 0.4986
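Each vertical separation above is simply the difference between the corresponding corrected means, $(\bar{Y}_{j} - \bar{Y}_{l}) - b(\bar{x}_{j} - \bar{x}_{l})$, and the confidence intervals and t statistics are consistent with a standard error of the familiar analysis of covariance form $\sqrt{s^{2}\left(1/n_{j} + 1/n_{l} + (\bar{x}_{j} - \bar{x}_{l})^{2}/S_{xx(\text{within})}\right)}$, where $s^{2}$ is the corrected within-groups mean square (in this output $n_{j}$ counts the dose levels in the jth group).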
The common slope is highly significant and the overall test for a difference between the slopes is non-significant. If the assumption of linearity holds, we can conclude that these lines are reasonably parallel. Looking more closely at the individual slopes, preparation F comes close to differing significantly from the other two, but the difference is not large enough to make the overall slope comparison significantly heterogeneous.
The analysis of covariance did not show any significant vertical separation of the three regression lines.