Nonparametric Linear Regression
Menu location: Analysis_Nonparametric_Nonparametric Linear Regression.
This is a distribution free method for investigating a linear relationship between two variables Y (dependent, outcome) and X (predictor, independent).
The slope b of the regression (Y = bX + a) is calculated as the median of the gradients from all possible pairwise contrasts of your data. A confidence interval based upon Kendall's τ is constructed for the slope.
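To illustrate the calculation (this is a sketch in Python, not StatsDirect's own code), the median pairwise gradient can be computed directly. The function name median_slope_fit is ours, and the intercept convention a = median(Y) - b*median(X) follows Conover's description of the method.

```python
from itertools import combinations
from statistics import median

def median_slope_fit(x, y):
    """Sketch of the median pairwise-gradient (Theil-type) line fit.

    The slope b is the median of (yj - yi) / (xj - xi) over all pairs
    of observations with distinct x values; the intercept is taken as
    a = median(y) - b * median(x), following Conover's description.
    """
    slopes = [(yj - yi) / (xj - xi)
              for (xi, yi), (xj, yj) in combinations(zip(x, y), 2)
              if xj != xi]                      # pairs tied on x give no gradient
    b = median(slopes)
    a = median(y) - b * median(x)
    return b, a
```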
Nonparametric linear regression is much less sensitive to extreme observations (outliers) than simple linear regression based upon the least squares method. If your data contain extreme observations that may be erroneous, but you do not have sufficient reason to exclude them from the analysis, then nonparametric linear regression may be appropriate.
Assumptions:
- The sample is random (X can be non-random provided that Ys are independent with identical conditional distributions).
- The regression of Y on X is linear (this implies an interval measurement scale for both X and Y).
This function also provides you with an approximate two sided Kendall's rank correlation test for independence between the variables.
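Outside StatsDirect, the same rank correlation test is widely available; the minimal sketch below uses SciPy's kendalltau (our choice of library for illustration, and the function name kendall_independence_test is ours). SciPy's P value is not continuity corrected, so it can differ from the value StatsDirect reports.

```python
from scipy import stats

def kendall_independence_test(x, y):
    """Two sided Kendall rank correlation test for paired samples x and y.

    SciPy computes tau-b by default; its P value (exact for small samples
    without ties, otherwise asymptotic) is not continuity corrected, so it
    may differ slightly from StatsDirect's approximation.
    """
    tau_b, p_two_sided = stats.kendalltau(x, y)
    return tau_b, p_two_sided
```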
Technical Validation
Note that the two sided confidence interval for the slope is the inversion of the two sided Kendall's test. The approximate two sided P value for Kendall's τ or τb is given, but the exact quantile from Kendall's distribution is used to construct the confidence interval; therefore, there may be slight disagreement between the P value and the confidence interval. If there are many ties then this situation is compounded (Conover, 1999).
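For readers who want to see how such an interval can be constructed, the sketch below inverts Kendall's test using the large-sample normal approximation and ignores tie corrections for brevity. It is illustrative code (the function name approx_slope_ci is hypothetical), not StatsDirect's implementation, which uses the exact quantile of Kendall's distribution and can therefore give slightly different limits.

```python
import numpy as np
from scipy import stats

def approx_slope_ci(x, y, alpha=0.05):
    """Approximate 1 - alpha confidence interval for the median slope,
    obtained by inverting Kendall's test with the large-sample normal
    approximation (tie corrections omitted for brevity).
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    i, j = np.triu_indices(len(x), k=1)           # all pairs i < j
    keep = x[j] != x[i]                           # pairs tied on x give no gradient
    slopes = np.sort((y[j] - y[i])[keep] / (x[j] - x[i])[keep])
    N, n = len(slopes), len(x)
    # approximate upper quantile of Kendall's statistic under independence
    w = stats.norm.ppf(1 - alpha / 2) * np.sqrt(n * (n - 1) * (2 * n + 5) / 18)
    r = int(round((N - w) / 2))                   # rank of the lower limit
    s = int(round((N + w) / 2)) + 1               # rank of the upper limit
    return slopes[max(r, 1) - 1], slopes[min(s, N) - 1]
```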
Example
From Conover (1999, p. 338).
Test workbook (Nonparametric worksheet: GPA, GMAT).
The following data are the GPA and GMAT scores of 12 graduates:
GPA | GMAT |
4.0 | 710 |
4.0 | 610 |
3.9 | 640 |
3.8 | 580 |
3.7 | 545 |
3.6 | 560 |
3.5 | 610 |
3.5 | 530 |
3.5 | 560 |
3.3 | 540 |
3.2 | 570 |
3.2 | 560 |
To analyse these data in StatsDirect you must first enter them into two columns in the workbook. Alternatively, open the test workbook using the file open function of the file menu. Then select Nonparametric Linear Regression from the Nonparametric section of the analysis menu. Select the columns marked "GPA" and "GMAT" when prompted for the Y and X variables respectively.
For this example:
GPA vs. GMAT
Observations per sample = 12
Median slope (95% CI) = 0.003485 (0 to 0.0075)
Y-intercept = 1.581061
Kendall's rank correlation coefficient tau b = 0.439039
Two sided (on continuity corrected z) P = 0.0678
If you plot GPA against GMAT scores using the scatter plot function in the graphics menu, you will see that there is a reasonably straight line relationship between GPA and GMAT. Here we can infer with 95% confidence that the true population value of the slope of a linear regression line for these two variables lies between 0 and 0.0075. The regression equation is estimated as Y = 1.5811 + 0.0035X.
From the two sided Kendall's rank correlation test, we cannot reject the null hypothesis of mutual independence between the pairs of results for the twelve graduates. Note that the zero lower confidence limit is a marginal result, and we might have rejected the null hypothesis had we used a different method for testing independence.
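If you wish to cross-check this example outside StatsDirect, the sketch below fits the same median-slope line and rank correlation using SciPy (our choice for illustration). SciPy's theilslopes builds the slope interval from a normal approximation rather than the exact Kendall quantile, and its kendalltau P value is not continuity corrected, so the interval limits and P value need not agree exactly with the output above.

```python
import numpy as np
from scipy import stats

# GPA (Y) and GMAT (X) scores for the 12 graduates in the example above.
gpa  = np.array([4.0, 4.0, 3.9, 3.8, 3.7, 3.6, 3.5, 3.5, 3.5, 3.3, 3.2, 3.2])
gmat = np.array([710, 610, 640, 580, 545, 560, 610, 530, 560, 540, 570, 560])

# Median (Theil-type) slope and intercept with an approximate 95% CI.
slope, intercept, low, high = stats.theilslopes(gpa, gmat, 0.95)
print(f"GPA = {intercept:.4f} + {slope:.6f} * GMAT  "
      f"(95% CI for slope: {low:.6f} to {high:.6f})")

# Kendall's tau-b and a two sided P value for the same pairs.
tau_b, p = stats.kendalltau(gmat, gpa)
print(f"Kendall's tau-b = {tau_b:.6f}, two sided P = {p:.4f}")
```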