 
 
 

The Mean and Standard Deviation
Lecture 2

The Mean and Standard Deviation

 

  1. Mean – the average for a data set

    1. Median does not use all information

    2. Calculate the mean by

Equation 1: $$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

    1. Notation

      1. $X_i$ is a data point, or an observation

      2. n is the total number of observations

      3. i is an index number

      4. Σ is the summation symbol

    2. The mean is a measure of central tendency; however, it is sensitive to outliers

    3. Mode – the data point that occurs most frequently

      1. If the probability distribution is symmetric, then the mean = mode = median

A symmetric distribution

      1. If the probability distribution is skewed, then the mean does not equal the mode and the mode does not equal the median

A skewed distribution

    1. Example

      1. Unordered: 10 32 5 6 7 5 4 5

      2. Ordered: 4 5 5 5 6 7 10 32

      3. The sum of the numbers is 74

      4. Statistics

        1. The mean is 74 / 8 = 9.25

        2. The mode is 5

        3. The median is (5 + 6)/2 = 5.5

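      5. The mean (9.25) is well above the median (5.5) and the mode (5) because the outlier 32 pulls it upward; thus, the distribution is positively skewed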
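The example above can be checked with a short sketch using Python's standard `statistics` module. The variable names are illustrative, and dropping the value 32 is an added step (not part of the lecture example) to show how strongly one outlier moves the mean compared with the median.

```python
import statistics

# Data from the example above
data = [10, 32, 5, 6, 7, 5, 4, 5]

mean = statistics.mean(data)      # sum / n = 74 / 8
median = statistics.median(data)  # average of the two middle values of the sorted data
mode = statistics.mode(data)      # most frequent value

# Assumption for illustration: drop the outlier (32) and recompute
trimmed = [x for x in data if x != 32]
trimmed_mean = statistics.mean(trimmed)
trimmed_median = statistics.median(trimmed)

print(f"mean={mean}, median={median}, mode={mode}")
print(f"without 32: mean={trimmed_mean}, median={trimmed_median}")
```

Without the outlier the mean falls from 9.25 to 6, while the median only moves from 5.5 to 5.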

  1. Standard Deviation – how spread out the distribution is

    1. Uses all the data points

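Equation 2: $$\hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$$

    1. $\hat{\sigma}^2$ is the variance

      1. The hat means it is estimated

      2. n – 1 is called the degrees of freedom

      3. Because the mean is itself estimated from the same data, we lose one piece of information (one degree of freedom) when we estimate the variance

      4. This is the sample variance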

    2. Population – the complete set of observations that your analysis is about

      1. It may be too costly, or the population may be too large, to collect data on the whole population

      2. Sample – a subset randomly selected from the population

      3. The population variance is:

Equation 3: $$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$$

      1. Notice – there is no hat; we have all data points and can calculate the population variance; it does not have to be estimated!

      2. It is easy to calculate the sample variance from the population variance and vice versa

Equation 4: $$\hat{\sigma}^2 = \frac{n}{n-1}\,\sigma^2$$

      1. It is rare to have data on the whole population, so the sample variance is almost always used

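    1. The population variance can also be written as:

Equation 5: $$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2 - \bar{X}^2$$

    1. Very easy to derive

Equation 6: $$\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2 - 2\bar{X}\,\frac{1}{n}\sum_{i=1}^{n} X_i + \bar{X}^2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2 - 2\bar{X}^2 + \bar{X}^2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2 - \bar{X}^2$$

    1. The trick to the derivation

      1. Σ is a linear operator

      2. $\bar{X}$ and 2 are constants and can be factored out of the sum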

    2. Calculate the variance for the sample

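| Observation $X_i$ | Deviation from the mean, $X_i - \bar{X}$ | Squared deviation, $(X_i - \bar{X})^2$ |
|---|---|---|
| 5 | 5 - 4.6 = 0.4 | 0.16 |
| 6 | 6 - 4.6 = 1.4 | 1.96 |
| 3 | 3 - 4.6 = -1.6 | 2.56 |
| 5 | 5 - 4.6 = 0.4 | 0.16 |
| 4 | 4 - 4.6 = -0.6 | 0.36 |
| Sum | | 5.2 |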

Equation 8: $$\bar{X} = \frac{5 + 6 + 3 + 5 + 4}{5} = 4.6$$

Equation 9: $$\hat{\sigma}^2 = \frac{5.2}{5 - 1} = 1.3$$

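      1. Variance has one problem: if the data are in dollars, then the units of the variance are dollars squared

      2. Take the square root of the variance to get the standard deviation (SD)

Equation 10: $$\hat{\sigma} = \sqrt{\hat{\sigma}^2} = \sqrt{5.2 / 4} = \sqrt{1.3} \approx 1.14$$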

      1. Standard deviation has the same units as the mean and data
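The variance calculations in this section can be reproduced with a minimal Python sketch. It uses the five observations from the table and computes the sample variance (dividing by n – 1), the population variance (dividing by n), the shortcut form of the population variance, and the standard deviation. The variable names are illustrative choices, not notation from the lecture.

```python
import math
import statistics

# Observations from the table above
data = [5, 6, 3, 5, 4]
n = len(data)

mean = sum(data) / n                                  # 23 / 5 = 4.6
sum_sq_dev = sum((x - mean) ** 2 for x in data)       # sum of squared deviations, about 5.2

sample_var = sum_sq_dev / (n - 1)                     # divide by the degrees of freedom
population_var = sum_sq_dev / n                       # divide by n when data are the whole population

# Shortcut form: mean of the squares minus the square of the mean
shortcut_var = sum(x * x for x in data) / n - mean ** 2

# Converting between the two variances
sample_from_population = population_var * n / (n - 1)

sample_sd = math.sqrt(sample_var)                     # same units as the data

print(f"sample variance     = {sample_var:.3f}")
print(f"population variance = {population_var:.3f}")
print(f"shortcut form       = {shortcut_var:.3f}")
print(f"sample SD           = {sample_sd:.3f}")

# The standard library gives the same answers
print(statistics.variance(data), statistics.pvariance(data), statistics.stdev(data))
```

Computing the sums by hand mirrors the table; in practice `statistics.variance`, `statistics.pvariance`, and `statistics.stdev` do the same work in one call.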

 

Probability Distributions

 

  1. Statistics has many probability distributions

    1. At least 20 distributions are in common use

    2. The most common is the Normal or Gaussian Distribution

      1. “Bell shaped curve”

      2. The mean and standard deviation can completely describe this distribution

The normal distribution

    1. Normal distribution – as the sample size increases to infinity, many of the other distributions approach the normal distribution

      • As a rule of thumb, if the sample size > 50, then the normal distribution can be a good approximation

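    2. Confidence intervals (assuming the data are approximately normal)

      1. From the last example, the mean = 4.6 and the standard deviation = 1.140

      2. About 68% of the data lies within 1 standard deviation of the mean

        1. [4.6 – 1.140(1), 4.6 + 1.140(1)] = [3.46, 5.74]

      3. About 95% of the data lies within 2 standard deviations of the mean

        1. [4.6 – 1.140(2), 4.6 + 1.140(2)] = [2.32, 6.88]

      4. About 99.7% of the data lies within 3 standard deviations of the mean

        1. [4.6 – 1.140(3), 4.6 + 1.140(3)] = [1.18, 8.02]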
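The intervals above can be sketched in Python, assuming the data are well described by a normal distribution. The mean and standard deviation are recomputed from the five observations used earlier, and `statistics.NormalDist` (Python 3.8 or later) gives the share of a normal distribution that falls within 1, 2, and 3 standard deviations of the mean. The helper names `interval` and `coverage` are illustrative.

```python
import statistics
from statistics import NormalDist

# Observations from the earlier variance example
data = [5, 6, 3, 5, 4]
mean = statistics.mean(data)   # 4.6
sd = statistics.stdev(data)    # sample standard deviation, about 1.14

# Assumption: the data follow a normal distribution with this mean and SD
dist = NormalDist(mu=mean, sigma=sd)

def interval(k):
    """Return the interval mean +/- k standard deviations."""
    return (mean - k * sd, mean + k * sd)

def coverage(k):
    """Return the fraction of the normal distribution inside mean +/- k SD."""
    low, high = interval(k)
    return dist.cdf(high) - dist.cdf(low)

for k in (1, 2, 3):
    low, high = interval(k)
    print(f"+/- {k} SD: [{low:.2f}, {high:.2f}] covers {coverage(k):.1%}")
```

With only five observations the normal model is a rough approximation, so the percentages describe the assumed distribution rather than the sample itself.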

 

Data Transformations

 

  1. If you have a positively skewed distribution, then use a transformation to make the distribution “more symmetric.”

  2. An example of a positively skewed distribution

A positively skewed distribution

  1. Use natural logarithm

    1. This function compresses the large values, which pulls in the long right tail of the distribution

| Data | Natural logarithm | Note |
|---|---|---|
| ... | ... | |
| 45 | ln 45 = 3.8067 | |
| ... | ... | |
| 50 | ln 50 = 3.912 | This is the mean |
| ... | ... | |
| 100 | ln 100 = 4.605 | An outlier |

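    1. Note – the mean of the logged data, transformed back with exp, will not equal the mean of the original data

Equation 11: $$\exp\left(\frac{1}{n}\sum_{i=1}^{n} \ln X_i\right) \neq \bar{X}$$

    1. ln and exp are inverses of each other, so exp(ln X) = X for each data point, but not for the means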

  1. The natural logarithm will not work for a negatively skewed distribution

    • The logarithm stretches the lower tail relative to the upper tail, so the left skew becomes worse

A left-skewed (negatively skewed) distribution
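A small Python sketch can illustrate the effect of the log transformation. The data below are made up for illustration (they are not the lecture's data set), and skewness is summarised with Pearson's second skewness coefficient, 3 × (mean – median) / SD, which is one simple measure among several.

```python
import math
import statistics

# Hypothetical, made-up positively skewed sample (not data from the lecture)
data = [40, 45, 48, 50, 52, 55, 100]
logged = [math.log(x) for x in data]

def pearson_skew(values):
    """Pearson's second skewness coefficient: 3 * (mean - median) / SD."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    return 3 * (mean - median) / statistics.stdev(values)

skew_raw = pearson_skew(data)
skew_log = pearson_skew(logged)

mean_raw = statistics.mean(data)
# Back-transforming the mean of the logs gives the geometric mean, not the mean
back_transformed = math.exp(statistics.mean(logged))

print(f"skew before log: {skew_raw:.3f}, after log: {skew_log:.3f}")
print(f"mean of data: {mean_raw:.2f}, exp(mean of logs): {back_transformed:.2f}")
```

For this sample the skew measure drops after the transformation, and the back-transformed mean of the logs is smaller than the ordinary mean because the outlier carries less weight on the log scale.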

 

Measurement Errors

 

  1. Measurement Errors – errors in measuring the data

    1. Within subject (or intra subject) – if you take another measurement on the same person, you get a different measurement

      1. We can measure this variability

      2. The Coefficient of Variation (CV), also called the coefficient of variability, is

Equation 12: $$CV = \frac{\hat{\sigma}}{\bar{X}} \times 100\%$$

      1. Use CV to check variability of our measurement on one person

    1. Between subject (or inter subject) – measurements differ from one subject to another in the sample

      • We cannot use the CV

    2. Example

      1. One person’s heart rate is 60 beats per minute and CV = 3%

      2. Another person’s heart rate is 80 beats per minute and CV = 10%

      3. Each person’s heart is different

      4. Each sample has intra and inter measurement errors
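As a rough sketch of the within-subject calculation, the CV can be computed from repeated measurements on the same person. The readings below are hypothetical values invented for illustration, and `coefficient_of_variation` is an illustrative helper name; it divides the sample standard deviation by the mean and expresses the result as a percentage.

```python
import statistics

def coefficient_of_variation(measurements):
    """Return the CV in percent: sample SD divided by the mean, times 100."""
    return statistics.stdev(measurements) / statistics.mean(measurements) * 100

# Hypothetical repeated heart-rate readings (beats per minute), one list per person
person_a = [58, 60, 62, 59, 61]
person_b = [72, 80, 88, 74, 86]

cv_a = coefficient_of_variation(person_a)
cv_b = coefficient_of_variation(person_b)

print(f"person A: mean={statistics.mean(person_a)}, CV={cv_a:.1f}%")
print(f"person B: mean={statistics.mean(person_b)}, CV={cv_b:.1f}%")
```

Because the CV is scaled by each person's own mean, it compares the repeatability of the measurement across people whose typical values differ.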

 
