Statistical Methods - Lecture 2

# The Mean and Standard Deviation (Lecture 2)

## The Mean and Standard Deviation

1. Mean – the average for a data set

   1. Unlike the median, which does not use all of the information in the data, the mean uses every data point

   2. Calculate the mean by

      $$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

      1. Notation

         1. $X_i$ is a data point, or an observation

         2. $n$ is the total number of observations

         3. $i$ is an index number

         4. $\Sigma$ is the summation symbol

   3. The mean is a measure of central tendency; however, it is sensitive to outliers

2. Mode – the data point that occurs most frequently

   1. If the probability distribution is symmetric (and unimodal), then mean = median = mode

   2. If the probability distribution is skewed, then the mean, median, and mode generally differ

   3. Example

      1. Unordered: 10 32 5 6 7 5 4 5

      2. Ordered: 4 5 5 5 6 7 10 32

      3. The sum of the numbers is 74

      4. Statistics

         1. The mean is 74 / 8 = 9.25

         2. The mode is 5

         3. The median is the average of the two middle (4th and 5th) ordered values: (5 + 6) / 2 = 5.5

      5. Thus, the distribution is skewed: the mean (9.25) lies well above the median (5.5) and the mode (5), so the distribution is positively (right) skewed, with the outlier 32 pulling the mean upward
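The example can be reproduced with Python's standard `statistics` module. This is a minimal sketch: `mean`, `median`, and `mode` are standard-library functions, and `data` simply holds the example's numbers.

```python
from statistics import mean, median, mode

# Example data set from the lecture (unordered)
data = [10, 32, 5, 6, 7, 5, 4, 5]

print(sorted(data))   # ordered values
print(sum(data))      # total of the observations
print(mean(data))     # sum / n
print(median(data))   # average of the two middle values, since n is even
print(mode(data))     # most frequent value

# Right (positive) skew: the outlier pulls the mean above the median and mode
print(mean(data) > median(data) > mode(data))
```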

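3. Standard Deviation – how spread out the distribution is

   1. Uses all the data points

   2. The variance, written $s^2$, is

      $$s^2 = \hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$$

      1. The hat means it is estimated

      2. $n - 1$ is called the degrees of freedom

      3. Because the mean $\bar{X}$ is itself estimated from the same data before the variance is calculated, we lose one piece of information (one degree of freedom)

      4. This is the sample variance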

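   3. Population – all of the data of interest (every member of the group being studied)

      1. Collecting data on the whole population may be too costly, or the population may be too large

   4. Sample – a subset randomly selected from the population

   5. The population variance is:

      $$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$$

      where $\bar{X}$ and $n$ are now the mean and size of the whole population

      1. Notice – there is no hat: we have all of the data points and can calculate the population variance directly; it does not have to be estimated

      2. It is easy to calculate the sample variance from the population variance and vice versa, because the two differ only in the divisor: $s^2 = \frac{n}{n-1}\sigma^2$ and $\sigma^2 = \frac{n-1}{n}s^2$

      3. It is rare to have data for the whole population, so in practice the sample variance is almost always used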

   6. The population variance can also be written as:

      $$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2 - \bar{X}^2$$

      1. Very easy to derive

      2. The trick to the derivation

         1. $\Sigma$ is a linear operator

         2. $\bar{X}$ and 2 are constants and can be factored out of the sum

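   7. Calculate the variance for the sample 5, 6, 3, 5, 4, whose mean is $\bar{X} = 23 / 5 = 4.6$:

      | Observation $X_i$ | $X_i - \bar{X}$ | $(X_i - \bar{X})^2$ |
      |---|---|---|
      | 5 | 5 – 4.6 = 0.4 | 0.16 |
      | 6 | 6 – 4.6 = 1.4 | 1.96 |
      | 3 | 3 – 4.6 = -1.6 | 2.56 |
      | 5 | 5 – 4.6 = 0.4 | 0.16 |
      | 4 | 4 – 4.6 = -0.6 | 0.36 |
      | **Sum** | | **5.2** |

      So $s^2 = 5.2 / (5 - 1) = 1.3$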

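   8. Variance has one problem: its units are the square of the data's units. If the data are in dollars (\$), the variance is in squared dollars (\$²)

   9. Take the standard deviation (SD), the square root of the variance: $s = \sqrt{s^2}$. For the sample above, $s = \sqrt{1.3} \approx 1.140$

      1. The standard deviation has the same units as the mean and the data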
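The derivation of the second form of the population variance uses exactly the two tricks listed in the outline ($\Sigma$ is linear; $\bar{X}$ and 2 are constants), together with $\sum_{i=1}^{n} X_i = n\bar{X}$:

$$
\begin{aligned}
\sigma^2 &= \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 \\
&= \frac{1}{n}\sum_{i=1}^{n}\left(X_i^2 - 2\bar{X}X_i + \bar{X}^2\right) \\
&= \frac{1}{n}\sum_{i=1}^{n}X_i^2 - 2\bar{X}\cdot\frac{1}{n}\sum_{i=1}^{n}X_i + \frac{1}{n}\cdot n\bar{X}^2 \\
&= \frac{1}{n}\sum_{i=1}^{n}X_i^2 - 2\bar{X}^2 + \bar{X}^2
= \frac{1}{n}\sum_{i=1}^{n}X_i^2 - \bar{X}^2
\end{aligned}
$$

The worked example can also be checked numerically. This is a minimal sketch using only the Python standard library; the helper names `sample_variance` and `population_variance` are illustrative (not from the lecture), and the results are cross-checked against the built-in `statistics.variance`, `statistics.pvariance`, and `statistics.stdev`.

```python
import math
import statistics

def sample_variance(xs):
    """s^2: sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)

def population_variance(xs):
    """sigma^2: the same sum of squared deviations, divided by n."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / n

data = [5, 6, 3, 5, 4]               # sample from the worked example; mean 4.6

s2 = sample_variance(data)           # 5.2 / 4
sigma2 = population_variance(data)   # 5.2 / 5
s = math.sqrt(s2)                    # standard deviation, same units as the data

# Converting between the two: they differ only in the divisor
n = len(data)
assert math.isclose(s2, n / (n - 1) * sigma2)

# Second form of the population variance: mean of squares minus squared mean
x_bar = sum(data) / n
assert math.isclose(sigma2, sum(x * x for x in data) / n - x_bar ** 2)

# Cross-check against the standard library
assert math.isclose(s2, statistics.variance(data))
assert math.isclose(sigma2, statistics.pvariance(data))
assert math.isclose(s, statistics.stdev(data))

print(round(s2, 4), round(sigma2, 4), round(s, 4))
```

The two helpers differ only in the divisor ($n - 1$ versus $n$), which is why converting between the sample and population variance is a single multiplication.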

## Probability Distributions

1. Statistics has many probability distributions

   1. At least 20 distributions are in common use

   2. The most common is the Normal, or Gaussian, distribution

      1. “Bell-shaped curve”

      2. The mean and standard deviation completely describe this distribution

      3. As the sample size increases toward infinity, many other distributions (for example, the binomial and the t distribution) approach the normal distribution

         • If the sample size is greater than 50, the normal distribution can be a good approximation
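To see the normal approximation at work, the sketch below compares exact binomial probabilities with a normal distribution that has the same mean ($np$) and standard deviation ($\sqrt{np(1-p)}$). The choice of the binomial distribution, $p = 0.5$, the sample sizes, and the 0.5 continuity correction are illustrative assumptions, not part of the lecture; `binomial_cdf`, `normal_approx_cdf`, and `max_error` are hypothetical helper names. `math.comb` and `statistics.NormalDist` are standard-library (Python 3.8+).

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))

def normal_approx_cdf(k, n, p):
    """Normal approximation to P(X <= k), with a continuity correction of 0.5."""
    approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    return approx.cdf(k + 0.5)

def max_error(n, p=0.5):
    """Largest absolute gap between exact and approximate CDFs over k = 0..n."""
    return max(abs(binomial_cdf(k, n, p) - normal_approx_cdf(k, n, p))
               for k in range(n + 1))

# The gap shrinks as the sample size grows
for n in (5, 20, 60):
    print(n, round(max_error(n), 4))
```

How quickly the error shrinks depends on $p$: binomials with $p$ far from 0.5 are skewed and need larger samples before the normal curve fits well.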

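2. Intervals around the mean (assuming the data are approximately normal)

   1. From the last example, $\bar{X} = 4.6$ and $s = 1.140$

   2. About 68% of the data lies within one standard deviation of the mean:

      1. [4.6 – 1.140(1), 4.6 + 1.140(1)] = [3.46, 5.74]

   3. About 95% of the data lies within two standard deviations of the mean:

      1. [4.6 – 1.140(2), 4.6 + 1.140(2)] = [2.32, 6.88]

   4. About 99.7% of the data lies within three standard deviations of the mean:

      1. [4.6 – 1.140(3), 4.6 + 1.140(3)] = [1.18, 8.02]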
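The intervals can be reproduced from the raw sample, and Python's `statistics.NormalDist` gives the exact share of a normal distribution lying within $k$ standard deviations of its mean. A minimal sketch; the `intervals` dictionary is an illustrative name:

```python
from statistics import NormalDist, mean, stdev

data = [5, 6, 3, 5, 4]          # sample from the variance example
x_bar = mean(data)               # sample mean
s = stdev(data)                  # sample standard deviation (divisor n - 1)
standard_normal = NormalDist()   # mean 0, standard deviation 1

intervals = {}
for k in (1, 2, 3):
    low, high = x_bar - k * s, x_bar + k * s
    # Share of a normal distribution lying within k SDs of its mean
    coverage = standard_normal.cdf(k) - standard_normal.cdf(-k)
    intervals[k] = (round(low, 2), round(high, 2), coverage)
    print(f"mean ± {k} SD: [{low:.2f}, {high:.2f}], normal coverage {coverage:.1%}")
```

For $k = 1, 2, 3$ the normal coverage is about 68.3%, 95.4%, and 99.7%. With only five observations, whether the data really follow a normal distribution is itself an assumption.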

## Data Transformations

1. If you have a positively skewed distribution, use a transformation to make the distribution “more symmetric.”

2. An example of a positively skewed distribution

   1. Use the natural logarithm

      1. This function compresses large values much more than small ones, pulling in the long right tail

      | Data | Natural logarithm | Note |
      |---|---|---|
      | … | … | |
      | 45 | ln 45 = 3.8067 | |
      | … | … | |
      | 50 | ln 50 = 3.912 | This is the mean |
      | … | … | |
      | 100 | ln 100 = 4.605 | An outlier |

   2. Note – the mean of the log-transformed data is not the log of the mean of the data

   3. ln and exp are inverses of each other, so exp undoes the transformation

3. The natural logarithm will not fix a negatively skewed distribution

   • The transformation spreads out the smaller numbers and squeezes the larger ones, which lengthens the left tail instead of shortening it
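A small numerical illustration of the log transformation, using hypothetical data (not from the lecture) with one large outlier. The `skewness` helper is an assumption: it uses the moment coefficient of skewness, $m_3 / m_2^{3/2}$, which is one of several common definitions. The sketch also shows that exponentiating the mean of the logs gives the geometric mean (`statistics.geometric_mean`, Python 3.8+), not the ordinary mean.

```python
import math
from statistics import mean, geometric_mean

# Hypothetical positively skewed data (illustrative only): one large outlier
data = [45, 47, 50, 52, 55, 100]
logs = [math.log(x) for x in data]   # natural logarithm of each value

def skewness(xs):
    """Moment coefficient of skewness, m3 / m2**1.5, using population moments."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

# The log-transformed data are less skewed than the raw data
print(round(skewness(data), 2), round(skewness(logs), 2))

# exp undoes ln, but exp(mean of logs) is the geometric mean,
# which is smaller than the arithmetic mean of the raw data
back = math.exp(mean(logs))
print(round(mean(data), 2), round(back, 2))
assert math.isclose(back, geometric_mean(data))
```

Back-transforming a mean computed on the log scale therefore answers a slightly different question than the mean of the raw data; for positive data the geometric mean is never larger than the arithmetic mean.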

## Measurement Errors

1. Measurement errors – errors in measuring the data

   1. Within subject (or intra-subject) – if you take another measurement on the same person, you get a different value

      1. We can measure this variability

      2. The Coefficient of Variability (CV), also called the coefficient of variation, is

         $$CV = \frac{s}{\bar{X}} \times 100\%$$

         1. Use the CV to check the variability of repeated measurements on one person

   2. Between subject (or inter-subject) – variation in the measurements across the different subjects in the sample

      • We cannot use the CV here

2. Example

   1. One person's heart rate is 60 beats per minute, with CV = 3%

   2. Another person's heart rate is 80 beats per minute, with CV = 10%

   3. Each person's heart is different

   4. Every sample therefore contains both within-subject (intra) and between-subject (inter) measurement errors
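The CV formula as a minimal Python sketch. The readings are hypothetical, and using the sample standard deviation (`statistics.stdev`, divisor $n - 1$) rather than the population version is an assumption; `coefficient_of_variation` is an illustrative name.

```python
from statistics import mean, stdev

def coefficient_of_variation(xs):
    """CV as a percentage: sample standard deviation divided by the mean, times 100."""
    return stdev(xs) / mean(xs) * 100

# Hypothetical repeated heart-rate readings (beats per minute) on ONE person;
# illustrative numbers, not data from the lecture
readings = [58, 61, 60, 62, 59]

cv = coefficient_of_variation(readings)
print(f"mean = {mean(readings)} bpm, CV = {cv:.1f}%")
```

Because the CV divides the standard deviation by the mean, it is unitless, which lets the within-subject variability of two people with different average heart rates be compared directly.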