Variance, Standard Deviation and Spread
The standard deviation (SD) is the most commonly used measure of the spread of values in a distribution. SD is calculated as the square root of the variance (the average squared deviation from the mean).
Variance in a population is:
$$\sigma^2 = \frac{\sum (x - \mu)^2}{n}$$
[x is a value from the population, μ is the mean of all x, n is the number of x in the population, Σ is the summation]
Variance is usually estimated from a sample drawn from a population. The unbiased estimate of population variance calculated from a sample is:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
[x_i is the ith observation from a sample of the population, x-bar is the sample mean, n - 1 (sample size minus one) is the degrees of freedom, Σ is the summation]
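As a minimal sketch of the two variance formulas, the following Python computes both by hand and notes the standard-library functions that should agree with them. The data values are hypothetical and chosen only for illustration.

```python
import math
import statistics

# Hypothetical data, used only to illustrate the two formulas.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

n = len(values)
mean = sum(values) / n
sum_sq_dev = sum((x - mean) ** 2 for x in values)

# Treating the data as a whole population: divide by n.
population_variance = sum_sq_dev / n

# Treating the data as a sample: divide by n - 1 (degrees of freedom).
sample_variance = sum_sq_dev / (n - 1)

# SD is the square root of the variance.
sample_sd = math.sqrt(sample_variance)

# statistics.pvariance and statistics.variance implement the same two
# definitions, so they can be used as a cross-check.
print(population_variance, statistics.pvariance(values))
print(sample_variance, statistics.variance(values))
print(sample_sd, statistics.stdev(values))
```

The sample estimate is always slightly larger than the population formula applied to the same numbers, because the divisor is n - 1 rather than n.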
The spread of a distribution is also referred to as dispersion and variability. All three terms mean the extent to which values in a distribution differ from one another.
SD is the best measure of spread of an approximately normal distribution. This is not the case when there are extreme values in a distribution or when the distribution is skewed; in these situations the interquartile range or semi-interquartile range is the preferred measure of spread. Interquartile range is the difference between the 25th and 75th centiles. Semi-interquartile range is half of the difference between the 25th and 75th centiles. For any symmetrical (not skewed) distribution, half of its values will lie within one semi-interquartile range either side of the median, i.e. in the interquartile range. When distributions are approximately normal, SD is a better measure of spread because it is less susceptible to sampling fluctuation than the (semi-)interquartile range.
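The sensitivity of SD to extreme values can be illustrated with a short Python sketch. The data are hypothetical, and the helper name interquartile_range is an assumption made for this example. Quartiles can be computed under several conventions; this sketch uses the default method of statistics.quantiles, and other software may give slightly different centiles for small samples.

```python
import statistics

def interquartile_range(data):
    """Difference between the 75th and 25th centiles (hypothetical helper)."""
    q1, _median, q3 = statistics.quantiles(data, n=4)
    return q3 - q1

# Hypothetical data: the second list replaces the largest value
# with an extreme one.
clean = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 900]

for label, data in (("clean", clean), ("with outlier", with_outlier)):
    iqr = interquartile_range(data)
    print(label,
          "SD:", statistics.stdev(data),
          "IQR:", iqr,
          "semi-IQR:", iqr / 2)
```

In this example the single extreme value inflates the SD many times over, while the interquartile range is unchanged because the 25th and 75th centiles do not depend on the largest observation.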
If a variable y is a linear transformation of x (y = a + bx) then the variance of y is b² times the variance of x and the standard deviation of y is |b| times the standard deviation of x. The constant a shifts every value equally and so has no effect on spread.
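A quick numerical check of the linear transformation rule, as a sketch with hypothetical values for a, b and x. A negative b is used to show that the standard deviation scales by the absolute value of b, since a standard deviation cannot be negative.

```python
import math
import statistics

# Hypothetical constants and data, for illustration only.
a, b = 3.0, -2.0
x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
y = [a + b * xi for xi in x]

var_x, var_y = statistics.variance(x), statistics.variance(y)
sd_x, sd_y = statistics.stdev(x), statistics.stdev(y)

# Variance scales by b squared; SD scales by |b|; a drops out entirely.
print(math.isclose(var_y, b ** 2 * var_x))
print(math.isclose(sd_y, abs(b) * sd_x))
```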
The standard error of the mean is the standard deviation of the means of repeated samples of the same size drawn from the population. It is estimated from a single sample as:
$$SE_{\bar{x}} = \frac{s}{\sqrt{n}}$$
[s is the sample standard deviation, n is the sample size]
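The single-sample estimate of the standard error of the mean, the sample standard deviation divided by the square root of the sample size, can be sketched in Python as follows. The function name standard_error and the data are assumptions made for this illustration.

```python
import math
import statistics

def standard_error(sample):
    """Estimate the standard error of the mean as s / sqrt(n).

    s is the sample standard deviation (n - 1 divisor) and n is the
    sample size. Hypothetical helper written for this example.
    """
    n = len(sample)
    s = statistics.stdev(sample)
    return s / math.sqrt(n)

# Hypothetical sample, for illustration only.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
sem = standard_error(sample)
print("mean:", statistics.mean(sample), "standard error:", sem)
```

Because n appears under a square root, quadrupling the sample size only halves the standard error, assuming s stays about the same.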