Types of data
Quantitative variables – how much
Continuous – measured amounts such as business profits, sales, etc.
Discrete – counting things
Categorical variables – what type
Unordered – also called nominal
Ordered – also called ordinal
Grades – A, B, C, D, and F
Class level – 1st, 2nd, 3rd, and 4th
Responses on a survey
It is possible to convert one type of variable into another, e.g. a numeric score into a letter grade
Stem and Leaf Plots
Data should be plotted to get an idea of what it looks like
This method is old
Example: Company’s assets in $ billions
Data – 3.5, 6.9, 4.4, 4.4, 2.2, 5.3, 4.3, 4.0, 5.1, 7.1, 0.6, 5.3, 6.7
Scan data and find the smallest and largest numbers
The data is unordered (leaves appear in the order they occur in the data)
0 | 6
1 |
2 | 2
3 | 5
4 | 4 4 3 0
5 | 3 1 3
6 | 9 7
7 | 1
The data is ordered (leaves sorted within each stem)
0 | 6    Possibly an outlier
1 |
2 | 2
3 | 5
4 | 0 3 4 4
5 | 1 3 3
6 | 7 9
7 | 1
Outlier – an extreme value
Benefit? – The only plot where we still have the original data
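The ordered plot above can be reproduced with a short script. This is a minimal sketch, assuming every value has exactly one decimal place, so that the integer part is the stem and the tenths digit is the leaf. The function name stem_and_leaf is hypothetical, chosen only for this illustration.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the rows of an ordered stem-and-leaf plot.

    Assumes one decimal place: 3.5 -> stem 3, leaf 5.
    """
    leaves = defaultdict(list)
    for v in values:
        tenths = round(v * 10)            # 3.5 -> 35, avoids float surprises
        stem, leaf = divmod(tenths, 10)   # 35 -> (3, 5)
        leaves[stem].append(leaf)
    rows = []
    # Walk every stem from smallest to largest so empty stems still show up
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in sorted(leaves.get(stem, [])))
        rows.append(f"{stem} | {row}".rstrip())
    return rows

assets = [3.5, 6.9, 4.4, 4.4, 2.2, 5.3, 4.3, 4.0, 5.1, 7.1, 0.6, 5.3, 6.7]
for line in stem_and_leaf(assets):
    print(line)
```

Keeping the empty stem 1 in the output is what makes the gap between 0.6 and the rest of the data visible.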
Median – the midpoint of a data set
Take data and order it from smallest to largest
Example
Unordered: 4.5 6.3 6.1 5.5 7
Ordered: 4.5 5.5 6.1 6.3 7
The median is the value in the center, which is 6.1 in our case
The median is not sensitive to outliers
If the data has an even number of points, then take the average of the two points in the center
Example
Unordered: 3 10 8 7
Ordered: 3 7 8 10
The median is the average of 7 and 8: (7 + 8)/2 = 7.5
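Both cases, odd and even, can be checked in code. This is a minimal sketch of the procedure described above; the function name median_of is hypothetical, and Python's standard library already offers statistics.median for real use.

```python
def median_of(values):
    """Order the data, then take the center value (odd count)
    or the average of the two center values (even count)."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_of([4.5, 6.3, 6.1, 5.5, 7]))   # 6.1
print(median_of([3, 10, 8, 7]))             # 7.5
# Replacing the largest value with an extreme one leaves the median unchanged
print(median_of([4.5, 6.3, 6.1, 5.5, 700])) # 6.1
```

The last call illustrates why the median is not sensitive to outliers: only the order of the values matters, not how far the extremes lie from the center.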
Measures of variability
Range – the difference between the largest value in the sample (the maximum) and the smallest value (the minimum)
Very sensitive to outliers
Example
Unordered: 5 4 6 7 100
Ordered: 4 5 6 7 100
The range is 100 – 4 = 96
Did you notice the 100? It appears to be an outlier, because it is very large relative to the other numbers
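A short calculation shows how much a single outlier moves the range. This is a minimal sketch; the function name sample_range is hypothetical.

```python
def sample_range(values):
    """Range = maximum minus minimum."""
    return max(values) - min(values)

data = [5, 4, 6, 7, 100]
print(sample_range(data))                          # 96
# Dropping the suspected outlier shrinks the range dramatically
print(sample_range([v for v in data if v != 100])) # 3
```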
Quartiles – divide the data into four groups
0 to 25% – bottom 25% of values
25 to 50%
Median is at 50%
50 to 75%
75 to 100% – top 25% of values
Usually works well for large data sets
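The three cut points that split the data into four groups can be computed with Python's standard library. This is a minimal sketch that reuses the company assets data from the stem-and-leaf example. The choice of method="inclusive" is an assumption made here; textbooks and software use several slightly different quartile conventions, so other tools may report slightly different cut points.

```python
import statistics

assets = [3.5, 6.9, 4.4, 4.4, 2.2, 5.3, 4.3, 4.0, 5.1, 7.1, 0.6, 5.3, 6.7]

# n=4 asks for the three cut points between the four groups
q1, q2, q3 = statistics.quantiles(assets, n=4, method="inclusive")

print("25% point (first quartile):", q1)
print("50% point (median):        ", q2)
print("75% point (third quartile):", q3)
```

The middle cut point is the median, so q2 agrees with statistics.median(assets).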
Box-and-Whisker Plots – a nice way to plot quartiles
Older versions of Excel cannot do this! (a box-and-whisker chart type was added in Excel 2016)
We can have several box-and-whisker plots side by side
Some statistical programs can calculate these
Histograms – for continuous variables
Excel can do this with some difficulty
Steps
Take the data and sort it into groups; the groups are ordered from smallest to largest
Count how many data points are in each group, which is the frequency
A histogram displays the distribution of data
Excel
Find the maximum data point by using the =MAX( ) function
Find the minimum data point by using the =MIN( ) function
Specify the number of categories, k, which are also called bins
Compute the width of each category: width = (max. – min.)/k
First category: min. to min. + (width)(1)
Second category: min. + (width)(1) to min. + (width)(2)
Last category: min. + (width)(k – 1) to min. + (width)(k)
Then use the =COUNTIF( ) function to count how many data points fall within a category
This part is hard
Excel has a histogram function in Data Analysis
If you choose too many categories, then you get noise
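The binning steps can also be written out in code. This is a minimal sketch, assuming equal-width categories with width = (max – min)/k, so that the last category ends exactly at the maximum. The function name histogram_counts and the choice k = 4 are illustrative only, and the data are the company assets from the stem-and-leaf example.

```python
def histogram_counts(values, k):
    """Split [min, max] into k equal-width bins and count values per bin."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; bin width would be zero")
    width = (hi - lo) / k
    counts = [0] * k
    for v in values:
        index = int((v - lo) / width)
        index = min(index, k - 1)   # the maximum belongs to the last bin
        counts[index] += 1
    edges = [lo + width * i for i in range(k + 1)]
    return edges, counts

assets = [3.5, 6.9, 4.4, 4.4, 2.2, 5.3, 4.3, 4.0, 5.1, 7.1, 0.6, 5.3, 6.7]
edges, counts = histogram_counts(assets, k=4)
for i, count in enumerate(counts):
    # One '*' per data point gives a rough text histogram
    print(f"{edges[i]:5.2f} to {edges[i + 1]:5.2f}: {'*' * count}")
```

Rerunning with a larger k, such as k = 12, leaves most bins holding zero or one point, which is the noise that too many categories produce.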
Bar charts – categorical data
Example – Medeo collects information on visitors for 2008
Almaty has 10,031 visitors
Astana has 542 visitors
Foreign visitors number 5,321
Could convert frequency into a percentage
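Converting the frequencies into percentages is a one-line calculation per category. This is a minimal sketch that assumes the three categories listed above make up all visitors, which the example does not state explicitly.

```python
# Visitor counts from the Medeo example (2008)
visitors = {"Almaty": 10031, "Astana": 542, "Foreigners": 5321}

# Assumption: these three categories cover every visitor
total = sum(visitors.values())
percentages = {name: 100 * count / total for name, count in visitors.items()}

for name, pct in percentages.items():
    # One '#' per two percentage points gives a rough text bar chart
    print(f"{name:<11}{pct:5.1f}%  {'#' * round(pct / 2)}")
```

Percentages make the bars comparable across years even when the total number of visitors changes.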
