Conditional Logistic Regression
Menu location: Analysis_Regression and Correlation_Conditional Logistic.
This function fits and analyses conditional logistic models for binary outcome/response data with one or more predictors, where observations are not independent but are matched or grouped in some way.
Binomial distributions are used for handling the errors associated with regression models for binary/dichotomous responses (e.g. yes/no, dead/alive), much as the normal distribution is used in general linear regression. Other, less commonly used binomial models include normit/probit and complementary log-log. The logistic model is widely used and has many desirable properties (Hosmer and Lemeshow, 1989; Armitage and Berry, 1994; Altman, 1991; McCullagh and Nelder, 1989; Cox and Snell, 1989; Pregibon, 1981).
Odds = π/(1-π)
[π = proportional response, i.e. r out of n responded so π = r/n]
Logit = log odds = log(π/(1-π))
When a logistic regression model has been fitted, estimates of π are marked with a hat symbol above the Greek letter pi to denote that the proportion is estimated from the fitted regression model. Fitted proportional responses are often referred to as event probabilities (i.e. n × π hat events are expected out of n trials).
The difference between two logits is the log of the ratio of the two corresponding odds, which demonstrates one of the important uses of logistic regression models:
logit(π1) - logit(π2) = log(odds1/odds2) = log(odds ratio)
Logistic models provide important information about the relationship between response/outcome and exposure. It makes no difference to logistic models whether outcomes have been sampled prospectively or retrospectively; this is not the case with other binomial models.
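As a rough numerical check of the odds and logit definitions above, the following Python sketch confirms that the difference between two logits equals the log of the odds ratio. The proportions 0.4 and 0.2 and the function names are made up for illustration; they are not taken from the example data.

```python
import math

def odds(p):
    """Odds for a proportion p, where 0 < p < 1."""
    return p / (1.0 - p)

def logit(p):
    """Log odds for a proportion p."""
    return math.log(odds(p))

# Illustrative (made-up) response proportions in two groups.
p_exposed, p_unexposed = 0.4, 0.2

odds_ratio = odds(p_exposed) / odds(p_unexposed)
logit_difference = logit(p_exposed) - logit(p_unexposed)

# Exponentiating the difference between logits recovers the odds ratio.
print(odds_ratio, math.exp(logit_difference))
```

This is why an exponentiated coefficient from a logistic model can be read as an odds ratio.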
The conditional logistic model can cope with 1:1 or 1:m case-control matching. In the simplest case, this is an extension of McNemar's test for matched studies.
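The link to McNemar's test can be illustrated for the simplest design: 1:1 matching with a single binary exposure. In that case only the discordant pairs carry information, and the conditional maximum likelihood estimate of the odds ratio is the ratio of the two discordant counts. The sketch below uses made-up pairs and a hypothetical function name; it is not StatsDirect's implementation.

```python
def mcnemar_from_pairs(pairs):
    """Summarise 1:1 matched pairs with one binary exposure.

    pairs: list of (case_exposed, control_exposed) tuples, each value 0 or 1.
    Assumes both discordant counts are non-zero.
    """
    # b: case exposed, control unexposed; c: case unexposed, control exposed.
    b = sum(1 for case, control in pairs if case == 1 and control == 0)
    c = sum(1 for case, control in pairs if case == 0 and control == 1)
    chi_square = (b - c) ** 2 / (b + c)  # McNemar, no continuity correction
    odds_ratio = b / c                   # conditional ML estimate
    return b, c, chi_square, odds_ratio

# Made-up data: 6 + 2 discordant pairs and 7 concordant pairs.
pairs = [(1, 0)] * 6 + [(0, 1)] * 2 + [(1, 1)] * 3 + [(0, 0)] * 4
b, c, chi_square, odds_ratio = mcnemar_from_pairs(pairs)
print(b, c, chi_square, odds_ratio)
```

The concordant pairs do not affect either statistic, which is the sense in which the analysis is conditional on the matching.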
Data preparation
You must prepare your data case by case, i.e. ungrouped, with one subject/observation per row. This is unlike the unconditional logistic function, which accepts grouped or ungrouped data.
The binary outcome variable must contain only 0 (control) or 1 (case).
There must be a stratum indicator variable to denote the strata. In case-control studies with 1:1 matching this would mean a code for each pair (i.e. two rows marked stratum x, one with a case + covariates and the other with a control + covariates). For 1:m matched studies there will be 1+m rows of data for each stratum/matching-group.
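The layout rules above can be checked before analysis. The following Python sketch is one possible check, assuming each stratum should hold exactly one case and at least one control (1:1 or 1:m matching); the function name and the sample rows are hypothetical, and the checks StatsDirect itself performs are not described here.

```python
from collections import defaultdict

def check_matched_layout(rows):
    """Check ungrouped matched data given as (stratum, outcome) tuples.

    Returns a list of problem descriptions; an empty list means the
    layout satisfies the rules assumed in this sketch.
    """
    problems = []
    cases = defaultdict(int)
    controls = defaultdict(int)
    for stratum, outcome in rows:
        if outcome not in (0, 1):
            problems.append(f"stratum {stratum}: outcome {outcome!r} is not 0 or 1")
        elif outcome == 1:
            cases[stratum] += 1
        else:
            controls[stratum] += 1
    for stratum in sorted(set(cases) | set(controls)):
        if cases[stratum] != 1:
            problems.append(f"stratum {stratum}: expected 1 case, found {cases[stratum]}")
        if controls[stratum] < 1:
            problems.append(f"stratum {stratum}: no controls")
    return problems

# Made-up rows: stratum 1 is a 1:1 pair, stratum 2 is a 1:2 set.
good_rows = [(1, 1), (1, 0), (2, 1), (2, 0), (2, 0)]
# Made-up rows with two cases in one stratum, no case in another, and a bad code.
bad_rows = [(1, 1), (1, 1), (2, 0), (3, 2)]
print(check_matched_layout(good_rows))
print(check_matched_layout(bad_rows))
```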
Technical validation
The regression is fitted by maximisation of the natural logarithm of the conditional likelihood function using Newton-Raphson iteration as described by Krailo et al. (1984), Smith et al. (1981) and Howard (1972).
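To make the fitting procedure concrete, here is a minimal Python sketch of Newton-Raphson maximisation of the conditional log likelihood for 1:m matched sets with a single predictor. It is an illustration under simplifying assumptions (one covariate, one case per stratum, made-up data), not the algorithm of the cited papers as implemented in StatsDirect; all names are hypothetical.

```python
import math

def loglik_score_information(beta, strata):
    """Conditional log likelihood, its first derivative (score) and the
    negative of its second derivative (information) at beta.

    strata: list of (case_x, [control_x, ...]), one entry per matched set.
    """
    loglik = score = info = 0.0
    for case_x, control_xs in strata:
        xs = [case_x] + list(control_xs)
        weights = [math.exp(beta * x) for x in xs]
        total = sum(weights)
        mean = sum(w * x for w, x in zip(weights, xs)) / total
        mean_sq = sum(w * x * x for w, x in zip(weights, xs)) / total
        loglik += beta * case_x - math.log(total)
        score += case_x - mean           # observed minus expected covariate
        info += mean_sq - mean * mean    # weighted variance within the stratum
    return loglik, score, info

def fit_conditional_logit(strata, tol=1e-10, max_iter=50):
    """Newton-Raphson iteration starting from beta = 0."""
    beta = 0.0
    for _ in range(max_iter):
        _, score, info = loglik_score_information(beta, strata)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

# Made-up 1:2 matched sets: (case value, [control values]).
strata = [
    (3.0, [1.0, 2.0]),
    (1.5, [2.5, 0.5]),
    (2.0, [1.0, 3.5]),
    (4.0, [2.0, 1.0]),
    (1.0, [2.0, 3.0]),
]
beta_hat = fit_conditional_logit(strata)
loglik, score, info = loglik_score_information(beta_hat, strata)
print(beta_hat, 1.0 / math.sqrt(info), -2.0 * loglik)
```

The three printed values are the estimate, its approximate standard error (from the information at convergence) and the deviance. With several predictors the score becomes a vector and the information a matrix, so each step requires solving a linear system rather than a single division.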
Example
From Hosmer and Lemeshow (1989).
Test workbook (Regression worksheet: PAIRID, LBWT, RACE (b), SMOKE, HT, UI, PTD, LWT).
These are artificially matched data from a study of the risk factors associated with low birth weight in Massachusetts in 1986. The predictors studied here are black race (RACE (b)), smoking status (SMOKE), hypertension (HT), uterine irritability (UI), previous preterm delivery (PTD) and weight of the mother at her last menstrual period (LWT).
To analyse these data using StatsDirect you must first open the test workbook using the file open function of the file menu. Then select Conditional Logistic from the Regression and Correlation section of the analysis menu. Select the column marked "PAIRID" when asked for the stratum (match group) indicator. Then select "LBWT" when asked for the case-control indicator. Then select "RACE (b)", "SMOKE", "HT", "UI", "PTD", and "LWT" in one action when you are asked for predictors.
For this example:
Conditional logistic regression
Deviance (-2 log likelihood) = 51.589852
Deviance (likelihood ratio) chi-square = 26.042632 P = 0.0002
Pseudo (McFadden) R-square = 0.33546
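The three summary statistics above are linked arithmetically, which the following Python sketch illustrates using only the reported figures. It assumes the likelihood ratio test has 6 degrees of freedom (one per predictor), that the null deviance is the fitted deviance plus the likelihood ratio chi-square, and that under 1:1 matching the null deviance equals 2 × (number of pairs) × log 2. The closed-form tail probability used here holds only for an even number of degrees of freedom.

```python
import math

deviance = 51.589852        # -2 log likelihood of the fitted model
lr_chi_square = 26.042632   # likelihood ratio chi-square
df = 6                      # assumed: one degree of freedom per predictor

null_deviance = deviance + lr_chi_square
pseudo_r_square = 1.0 - deviance / null_deviance  # McFadden

def chi_square_upper_tail(x, even_df):
    """Upper tail probability of a chi-square variable with even df."""
    half = x / 2.0
    terms = sum(half ** i / math.factorial(i) for i in range(even_df // 2))
    return math.exp(-half) * terms

p_value = chi_square_upper_tail(lr_chi_square, df)

# Under 1:1 matching each pair contributes 2 * log(2) to the null deviance.
implied_pairs = null_deviance / (2.0 * math.log(2.0))

print(round(pseudo_r_square, 5), round(p_value, 4), round(implied_pairs, 2))
```

Under these assumptions the sketch agrees with the reported pseudo R-square of 0.33546 and P = 0.0002, and the null deviance corresponds to about 56 matched pairs.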
Label | Parameter estimate | Standard error | z | P |
RACE (b) | 0.582272 | 0.620708 | 0.938078 | 0.3482 |
SMOKE | 1.410799 | 0.562177 | 2.509528 | 0.0121 |
HT | 2.351335 | 1.05135 | 2.236492 | 0.0253 |
UI | 1.399261 | 0.692244 | 2.021341 | 0.0432 |
PTD | 1.807481 | 0.788952 | 2.290989 | 0.022 |
LWT | -0.018222 | 0.00913 | -1.995807 | 0.046 |
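The z and P columns in the table above can be reproduced from the estimates and standard errors, assuming they are Wald statistics (estimate divided by standard error, with a two-sided P value from the standard normal distribution). The Python sketch below works from the rounded figures shown, so small differences in the last decimal places are to be expected.

```python
from statistics import NormalDist

# (parameter estimate, standard error) copied from the table above.
fitted = {
    "RACE (b)": (0.582272, 0.620708),
    "SMOKE": (1.410799, 0.562177),
    "HT": (2.351335, 1.05135),
    "UI": (1.399261, 0.692244),
    "PTD": (1.807481, 0.788952),
    "LWT": (-0.018222, 0.00913),
}

def wald_test(estimate, standard_error):
    """Return the Wald z statistic and its two-sided P value."""
    z = estimate / standard_error
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p

wald = {label: wald_test(b, se) for label, (b, se) in fitted.items()}
for label, (z, p) in wald.items():
    print(f"{label:9s} z = {z:9.6f}  P = {p:.4f}")
```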
Label | Odds ratio | 95% confidence interval |
RACE (b) | 1.790102 | 0.53031 to 6.042622 |
SMOKE | 4.099229 | 1.361997 to 12.337527 |
HT | 10.499579 | 1.3374 to 82.429442 |
UI | 4.052205 | 1.043404 to 15.737307 |
PTD | 6.095073 | 1.298439 to 28.611218 |
LWT | 0.981943 | 0.964529 to 0.999673 |
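The odds ratios and confidence limits above follow from the parameter estimates and standard errors, assuming a Wald interval: the odds ratio is the exponential of the estimate, and the limits are the exponentials of the estimate plus or minus the 97.5% standard normal quantile times the standard error. The Python sketch below checks three of the rows; the function name is hypothetical.

```python
import math
from statistics import NormalDist

# (parameter estimate, standard error) from the first results table.
fitted = {
    "SMOKE": (1.410799, 0.562177),
    "HT": (2.351335, 1.05135),
    "LWT": (-0.018222, 0.00913),
}

def odds_ratio_with_ci(estimate, standard_error, level=0.95):
    """Return (odds ratio, lower limit, upper limit) for a Wald interval."""
    quantile = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = quantile * standard_error
    return (
        math.exp(estimate),
        math.exp(estimate - half_width),
        math.exp(estimate + half_width),
    )

intervals = {label: odds_ratio_with_ci(b, se) for label, (b, se) in fitted.items()}
for label, (ratio, lower, upper) in intervals.items():
    print(f"{label:6s} OR = {ratio:.6f}  95% CI = {lower:.6f} to {upper:.6f}")
```

For a continuous predictor such as LWT the odds ratio applies per unit increase, so an odds ratio just below 1 can still correspond to a substantial effect across the observed range of maternal weights.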
You may infer from the results above that hypertension, smoking status and previous preterm delivery are convincing predictors of low birth weight in the population studied (each has P < 0.03 and a lower 95% confidence limit for the odds ratio of about 1.3 or more).
Note that the selection of predictors for regression models such as this can be complex and is best done with the help of a statistician. Hosmer and Lemeshow (1989) give a good discussion of the example above, but with non-standard dummy variables (StatsDirect uses a standard dummy/design variable coding scheme adopted by most other statistical software). The optimal selection of predictors depends not only upon their numerical performance in the model, with or without appropriate transformations or study of interactions, but also upon their biophysical importance in the study.