Individuals can be anything – people, animals, objects, etc.

Variables: any characteristics that changes between individuals - Categorical variables place an individuals into discrete groups or categories (e.g., ethnicity) - Quantitative variables can take numerical values (e.g., height)

Representing Categorical Variables

Tables:

Car ColourFrequencyRelative Frequency
Red150.179
Yellow120.143
Blue230.274
White340.405

Pie Charts or Bar Graphs:

Pie ChartBar Graph

Pie graphs show relative frequency. Bar graphs can show either frequency or relative frequency.

Representing Quantitative Variables

Quantitative variables may be either continuous or discrete. Bins are defined intervals which are equal in size, which are the quantitative equivalent to categories.

The four visualizations are dot plots, stem-and-leaf plots, histograms, and cumulative graphs.

HistogramCumulative Graph

The distribution of a quantitative variable is described in terms of:

  • Shape: is the graph unimodal or bimodal? does it skew to one direction? are there any clusters or gaps?
  • Centre: what is the (approximate) average?
  • Spread: where are the majority of data located?
  • Outliers/unusual features: are there any unusual points?

The above histogram can be described as:

Symetric and unimodal with a centre around 140cm. Heights are spread from 100 to 170cm, but little spread from where the majority are from 130 to 150cm.

Measures of Centre

The mean is sum of the values divided by the number of values

The mean is nonresistant. It can be influenced by outliers and skewness.

The median is the middle ordered value. When there an even number of values, it is the average of the middle two values.

The median is resistant. it is not influenced by outliers and skewness.

When data is symmetric, the mean is close to the median. When data is left-skewed, mean < median and vice-versa.

Measures of Position

Percentile: a value at the percentile will have of the data less than or equal to itself.

Quartile: a multiple of the percentile

  • The median is at the quartile or percentile
  • The and quartiles are the medians of the bottom and top halves

Measures of Spread

Range: max - min; easily influenced by outliers

Interquartile range (IQR): ; not influenced by outliers

Standard deviation (SD/): how far typical values are from the mean

  • Large SD: majority of data far from the mean
  • Small SD: majority of data close to the mean

Outliers

Outliers can be determined in two ways:

Fence method: an outlier is a value less than or greater than

2SD method: an outlier is a value from the mean

Transforming Data

Addition and subtraction affects centre and position but not spread.

Multiplication and division affects all measures.

Adding or removing data can have a major impact on nonresistant measures if outliers are present.

Box Plots

Normal Distribution

The normal distribution models populations. 68% fall within 1 of the mean , 95% within of , and 99.7% within 3 of , however the distribution continues to .

Z-Score

The z-score measures how many standard deviations a value falls above or below the mean. It can be used to compare different data as it is a standardized measure.

For example, a student with an IQ of 132 would have a z-score of

meaning they fall approximately 2.13 standard deviations above the mean.

Cumulative Distribution Function

The cumulative distribution function (CDF) of the normal distribution is the integral of the normal distribution, representing the percentage falling within the bounds of integration.

On the FX-CG50, the normal CDF can be accessed through the Run-Matrix or Statistics applications.

In Statistics: Dist > NORM > Ncd

  • Set Data to Var

In Run-Matrix: OPTN > STAT > DIST > NORM > Ncd

  • The NormCD() function takes the following arguments:
    • lower bound (use a large negative value to represent )
    • upper bound (use a large positive value to represent )
    • standard deviation ()
    • mean ()

When calculating CDF using a z-score, and .

According to the normal CDF, a student with an IQ of 132 would fall at the percentile:

>>> NormCD(-100000, 132, 15, 100)
0.9835513

Conversely, only 1.64% of the population would have an IQ above 132:

Normal C.D
Data    :Variable
Lower   :132
Upper   :100000
σ       :15
μ       :100
Normal C.D
p    =0.01644869

Inverse CDF

The inverse normal CDF takes a percentile and returns the value or z-score at that percentile.

In Run-Matrix: OPTN > STAT > DIST > NORM > InvN

  • InvNormCD takes the following arguments:
    • percentile
    • standard deviation ()
    • mean ()

A person at the 80th percentile would have an IQ of 112.6:

>>> InvNormCD(0.80, 15, 100)
112.6243185

or a z-score of 0.84:

>>> InvNormCD(0.80, 1, 0)
0.8416212336

Standard Normal Tables

Standard normal tables return the integral of a normal probability density function from to the z-score, which can be used to calculate the left NormCD manually.

To find the percentage at or below a z-score of 2.34:

We find the row 2.3 and column 0.03, returning 0.4904.