Conditional Logistic Regression

Menu location: Analysis_Regression and Correlation_Conditional Logistic.

This function fits and analyses conditional logistic models for binary outcome/response data with one or more predictors, where observations are not independent but are matched or grouped in some way.

Binomial distributions are used for handling the errors associated with regression models for binary/dichotomous responses (e.g. yes/no, dead/alive) in the same way that the normal distribution is used in general linear regression. Other, less commonly used binomial models include normit/probit and complementary log-log. The logistic model is widely used and has many desirable properties (Hosmer and Lemeshow, 1989; Armitage and Berry, 1994; Altman, 1991; McCullagh and Nelder, 1989; Cox and Snell, 1989; Pregibon, 1981).

Odds = π/(1-π)

[π = proportional response, i.e. r out of n responded so π = r/n]

Logit = log odds = log(π/(1-π))
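The two definitions above can be sketched in a few lines of Python. The counts r and n are hypothetical values chosen for illustration, and the function names are not part of StatsDirect.

```python
import math

def odds(p):
    """Odds corresponding to a proportion p, where 0 < p < 1."""
    return p / (1.0 - p)

def logit(p):
    """Natural logarithm of the odds."""
    return math.log(odds(p))

def inverse_logit(x):
    """Map a logit back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical data: r = 30 responders out of n = 120 subjects
r, n = 30, 120
pi = r / n
print(pi, odds(pi), logit(pi))
```

The inverse function is included because fitted models work on the logit scale, and results are usually converted back to proportions for reporting.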

When a logistic regression model has been fitted, estimates of π are marked with a hat symbol above the Greek letter pi to denote that the proportion is estimated from the fitted regression model. Fitted proportional responses are often referred to as event probabilities (i.e. π hat × n events expected out of n trials).

The difference between two logits is the log of the odds ratio, which demonstrates one of the important uses of logistic regression models:

logit(π1) - logit(π2) = log(odds1/odds2) = log(odds ratio)

Logistic models provide important information about the relationship between response/outcome and exposure. It makes no difference to logistic models whether outcomes have been sampled prospectively or retrospectively; this is not the case with other binomial models.
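As a rough illustration of why the sampling direction does not matter, the sketch below takes a hypothetical 2 by 2 table and computes the difference between two logits in both directions. All counts are invented for the example.

```python
import math

# Hypothetical 2x2 table (counts are illustrative only)
a, b = 40, 60   # exposed:   cases, controls
c, d = 20, 80   # unexposed: cases, controls

def logit(p):
    return math.log(p / (1.0 - p))

# Prospective reading: proportion of cases within each exposure group
logit_diff_prospective = logit(a / (a + b)) - logit(c / (c + d))

# Retrospective reading: proportion exposed within cases and within controls
logit_diff_retrospective = logit(a / (a + c)) - logit(b / (b + d))

# Both differences equal the log of the cross-product odds ratio
log_odds_ratio = math.log((a * d) / (b * c))

print(logit_diff_prospective, logit_diff_retrospective, log_odds_ratio)
```

Both readings of the table give the same log odds ratio, which is the property that lets logistic models be applied to case-control data.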

The conditional logistic model can cope with 1:1 or 1:m case-control matching. In the simplest case, this is an extension of McNemar's test for matched studies.
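The link with McNemar's test can be seen in the simplest setting: 1:1 matching with a single binary exposure. There, only the discordant pairs carry information, and the conditional maximum likelihood estimate of the odds ratio is the ratio of the two discordant counts. The sketch below uses invented pairs and the uncorrected McNemar statistic; it is an illustration, not the StatsDirect routine.

```python
# Hypothetical 1:1 matched pairs: (case_exposed, control_exposed), each 0 or 1
pairs = [(1, 0)] * 15 + [(0, 1)] * 5 + [(1, 1)] * 8 + [(0, 0)] * 12

# Discordant pairs: exactly one member of the pair is exposed
case_only = sum(1 for ce, co in pairs if ce == 1 and co == 0)
control_only = sum(1 for ce, co in pairs if ce == 0 and co == 1)

# Conditional estimate of the odds ratio from discordant pairs
odds_ratio = case_only / control_only

# McNemar chi-square statistic (1 degree of freedom, no continuity correction)
mcnemar_chi2 = (case_only - control_only) ** 2 / (case_only + control_only)

print(case_only, control_only, odds_ratio, mcnemar_chi2)
```

The concordant pairs (both exposed or both unexposed) drop out of both calculations, which is the sense in which the analysis is conditional on the matching.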

Data preparation

You must prepare your data case by case, i.e. ungrouped, with one subject/observation per row. This is unlike the unconditional logistic function, which accepts grouped or ungrouped data.

The binary outcome variable must contain only 0 (control) or 1 (case).

There must be a stratum indicator variable to denote the strata. In case-control studies with 1:1 matching this would mean a code for each pair (i.e. two rows marked stratum x, one with a case + covariates and the other with a control + covariates). For 1:m matched studies there will be 1+m rows of data for each stratum/matching-group.
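The layout described above can be sketched as a small list of rows with a check applied before fitting. The column names (stratum, case, smoker), the data values and the function check_matched_layout are all hypothetical; the check assumes exactly one case per stratum, as in 1:1 or 1:m matching.

```python
from collections import defaultdict

# Hypothetical ungrouped data for 1:2 matching: one row per subject
rows = [
    {"stratum": 1, "case": 1, "smoker": 1},
    {"stratum": 1, "case": 0, "smoker": 0},
    {"stratum": 1, "case": 0, "smoker": 1},
    {"stratum": 2, "case": 1, "smoker": 0},
    {"stratum": 2, "case": 0, "smoker": 1},
    {"stratum": 2, "case": 0, "smoker": 0},
]

def check_matched_layout(rows, stratum_key="stratum", outcome_key="case"):
    """Return {stratum: (n_cases, n_controls)} or raise ValueError."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [cases, controls]
    for row in rows:
        outcome = row[outcome_key]
        if outcome not in (0, 1):
            raise ValueError(f"outcome must be 0 or 1, got {outcome!r}")
        counts[row[stratum_key]][0 if outcome == 1 else 1] += 1
    for stratum, (cases, controls) in counts.items():
        # Assumption: each matched set has one case and at least one control
        if cases != 1 or controls < 1:
            raise ValueError(f"stratum {stratum!r}: {cases} case(s), {controls} control(s)")
    return {stratum: tuple(c) for stratum, c in counts.items()}

layout = check_matched_layout(rows)
print(layout)
```

Each stratum here contributes 1 + m = 3 rows, matching the description of 1:m matched studies.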

Technical validation

The regression is fitted by maximisation of the natural logarithm of the conditional likelihood function using Newton-Raphson iteration as described by Krailo et al. (1984), Smith et al. (1981) and Howard (1972).
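To give a feel for the fitting method, the following is a minimal sketch of Newton-Raphson maximisation of a conditional log likelihood. It is not the StatsDirect implementation: it assumes one case per stratum and a single predictor, and the function name and data are hypothetical. Each stratum contributes exp(beta * x_case) divided by the sum of exp(beta * x) over all its members.

```python
import math

def fit_conditional_logistic(strata, tol=1e-10, max_iter=50):
    """strata: list of (x_case, [x_control, ...]) for a single predictor.
    Returns (beta, standard_error). Assumes the information is non-zero,
    i.e. at least one stratum has members with differing x values."""
    beta = 0.0
    info = 0.0
    for _ in range(max_iter):
        score = 0.0   # first derivative of the conditional log likelihood
        info = 0.0    # minus the second derivative
        for x_case, x_controls in strata:
            xs = [x_case] + list(x_controls)
            weights = [math.exp(beta * x) for x in xs]
            total = sum(weights)
            mean = sum(w * x for w, x in zip(weights, xs)) / total
            mean_sq = sum(w * x * x for w, x in zip(weights, xs)) / total
            score += x_case - mean
            info += mean_sq - mean * mean
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    # info was evaluated just before the final (negligible) step
    return beta, 1.0 / math.sqrt(info)

# Hypothetical 1:1 matched data with a binary predictor
strata = [(1, [0])] * 15 + [(0, [1])] * 5 + [(1, [1])] * 8 + [(0, [0])] * 12
beta, se = fit_conditional_logistic(strata)
print(beta, se, math.exp(beta))
```

For these invented pairs the estimate should agree with the log of the discordant pair ratio, log(15/5), which is a convenient check on the iteration. A multi-predictor version would replace the scalar score and information with a vector and a matrix.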

Example

From Hosmer and Lemeshow (1989).

Test workbook (Regression worksheet: PAIRID, LBWT, RACE (b), SMOKE, HT, UI, PTD, LWT).

These are artificially matched data from a study of the risk factors associated with low birth weight in Massachusetts in 1986. The predictors studied here are black race (RACE (b)), smoking status (SMOKE), hypertension (HT), uterine irritability (UI), previous preterm delivery (PTD) and weight of the mother at her last menstrual period (LWT).

To analyse these data using StatsDirect you must first open the test workbook using the file open function of the file menu. Then select Conditional Logistic from the Regression and Correlation section of the analysis menu. Select the column marked "PAIRID" when asked for the stratum (match group) indicator. Then select "LBWT" when asked for the case-control indicator. Then select "RACE (b)", "SMOKE", "HT", "UI", "PTD", and "LWT" in one action when you are asked for predictors.

For this example:

Conditional logistic regression

Deviance (-2 log likelihood) = 51.589852

Deviance (likelihood ratio) chi-square = 26.042632 P = 0.0002

Pseudo (McFadden) R-square = 0.33546
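The three summary figures above are related by simple arithmetic, sketched below. The sketch assumes that the likelihood ratio chi-square compares the fitted model with a model containing no predictors (so it has 6 degrees of freedom here) and that McFadden's R-square is the chi-square divided by the null deviance. The final line assumes 1:1 matching, where each pair contributes 2 × log(2) to the null deviance.

```python
import math

deviance = 51.589852        # reported -2 log likelihood of the fitted model
lr_chi_square = 26.042632   # reported likelihood ratio chi-square
n_predictors = 6            # assumed degrees of freedom

# Null deviance: -2 log likelihood with no predictors
null_deviance = deviance + lr_chi_square

# McFadden pseudo R-square = 1 - deviance / null deviance
mcfadden_r2 = lr_chi_square / null_deviance

# Upper tail of chi-square for even df = 2k: exp(-x/2) * sum_{i<k} (x/2)^i / i!
half = lr_chi_square / 2.0
k = n_predictors // 2
p_value = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))

# Number of matched pairs implied by the null deviance under 1:1 matching
implied_pairs = null_deviance / (2.0 * math.log(2.0))

print(null_deviance, mcfadden_r2, p_value, implied_pairs)
```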

Label	Parameter estimate	Standard error
RACE (b)	0.582272	0.620708	z = 0.938078	P = 0.3482
SMOKE	1.410799	0.562177	z = 2.509528	P = 0.0121
HT	2.351335	1.05135	z = 2.236492	P = 0.0253
UI	1.399261	0.692244	z = 2.021341	P = 0.0432
PTD	1.807481	0.788952	z = 2.290989	P = 0.022
LWT	-0.018222	0.00913	z = -1.995807	P = 0.046

Label	Odds ratio	95% confidence interval
RACE (b)	1.790102	0.53031 to 6.042622
SMOKE	4.099229	1.361997 to 12.337527
HT	10.499579	1.3374 to 82.429442
UI	4.052205	1.043404 to 15.737307
PTD	6.095073	1.298439 to 28.611218
LWT	0.981943	0.964529 to 0.999673
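The z statistics, P values, odds ratios and confidence intervals in the two tables above follow from the parameter estimates and standard errors. The sketch below assumes Wald (normal approximation) tests and intervals, which reproduce the reported figures to within rounding; the variable names are illustrative.

```python
import math

# Parameter estimates and standard errors copied from the output above
estimates = {
    "RACE (b)": (0.582272, 0.620708),
    "SMOKE": (1.410799, 0.562177),
    "HT": (2.351335, 1.05135),
    "UI": (1.399261, 0.692244),
    "PTD": (1.807481, 0.788952),
    "LWT": (-0.018222, 0.00913),
}

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution

results = {}
for label, (b, se) in estimates.items():
    z = b / se
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal P value
    odds_ratio = math.exp(b)
    lower = math.exp(b - Z_975 * se)
    upper = math.exp(b + Z_975 * se)
    results[label] = (z, p, odds_ratio, lower, upper)
    print(f"{label}\tz = {z:.4f}\tP = {p:.4f}\tOR = {odds_ratio:.4f}\t{lower:.4f} to {upper:.4f}")
```

Note that the odds ratio for LWT is per unit of weight, so an odds ratio close to 1 can still correspond to a sizeable effect over a wide range of weights.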

You may infer from the results above that hypertension, smoking status and previous preterm delivery are convincing predictors of low birth weight in the population studied.

Note that the selection of predictors for regression models such as this can be complex and is best done with the help of a statistician. Hosmer and Lemeshow (1989) give a good discussion of the example above, but with non-standard dummy variables (StatsDirect uses a standard dummy/design variable coding scheme adopted by most other statistical software). The optimal selection of predictors depends not only upon their numerical performance in the model, with or without appropriate transformations or study of interactions, but also upon their biophysical importance in the study.
