Inference

Significance tests use a sample statistic to assess whether the data give convincing evidence against a claim about a population parameter.

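Samples used for inference must be random and independent, and the sampling distribution must be approximately normal:

  • 10% condition: the sample size must be less than 10% of the population (this keeps observations approximately independent when sampling without replacement)
  • Large counts: at least 10 successes and 10 failures ($n\hat{p} \ge 10$ and $n(1-\hat{p}) \ge 10$)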
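The two checkable conditions above can be sketched in Python. This is a minimal sketch, assuming the population size is known; the helper `check_conditions` and the figure of 50,000 CS majors are hypothetical, for illustration only. Randomness cannot be verified in code; it depends on how the data were collected.

```python
def check_conditions(n: int, p_hat: float, population_size: int) -> dict:
    """Check the 10% condition and the large-counts condition for one sample proportion."""
    successes = n * p_hat          # n * p_hat
    failures = n * (1 - p_hat)     # n * (1 - p_hat)
    return {
        "ten_percent": n < 0.10 * population_size,
        "large_counts": successes >= 10 and failures >= 10,
    }


# Hypothetical population size (not given in the notes): 50,000 CS majors
print(check_conditions(780, 0.82, 50_000))  # {'ten_percent': True, 'large_counts': True}
```

With only 50 observations and $\hat{p} = 0.1$, for example, there are just 5 expected successes, so the large-counts check fails.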

Confidence Intervals

A confidence interval is an estimate that gives a range of plausible values for a population parameter.

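  • 95% confident means that about 95% of samples produce a statistic within about 2 standard errors (1.96 exactly) of the true parameter, so about 95% of intervals built this way capture it (see: empirical rule)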

Standard Error

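Standard deviation describes the variability of individual values within one sample. Standard error describes the variability of a statistic (such as $\hat{p}$) from sample to sample; it is the standard deviation of the sampling distribution.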
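A small simulation can make the distinction concrete. This sketch assumes, purely for illustration, that the true population proportion is 0.82 and draws many samples of size 780; the helper name `simulate_sample_proportions` is hypothetical. The spread of the 0/1 answers inside one sample is the standard deviation; the spread of $\hat{p}$ across samples is the standard error, which should match $\sqrt{p(1-p)/n}$.

```python
import random
import statistics
from math import sqrt


def simulate_sample_proportions(p: float, n: int, reps: int, seed: int = 1) -> list[float]:
    """Draw `reps` random samples of size n (true proportion p); return each sample's p_hat."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]


p, n = 0.82, 780  # assumed "true" proportion and sample size for the simulation

# Standard deviation: spread of individual 0/1 answers within ONE sample
rng = random.Random(2)
one_sample = [1 if rng.random() < p else 0 for _ in range(n)]
within_sample_sd = statistics.pstdev(one_sample)

# Standard error: spread of p_hat ACROSS many samples
p_hats = simulate_sample_proportions(p, n, reps=1000)
simulated_se = statistics.stdev(p_hats)
theoretical_se = sqrt(p * (1 - p) / n)

print(round(within_sample_sd, 3), round(simulated_se, 4), round(theoretical_se, 4))
```

The within-sample standard deviation comes out near 0.38, while the standard error is roughly 30 times smaller.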

Critical Value

  • z* is the critical value of the confidence interval (how many standard errors the interval extends above and below the estimate)
  • C% confidence level refers to the percentage of sample proportions in the middle of the sampling distribution around the true population proportion
  • $\alpha = 1 - C$ is the proportion of the distribution outside the middle C%
    • The proportion in each tail of the distribution is $\alpha/2$
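The same steps (find $\alpha$, split it into two tails, look up the cutoff) can be sketched with Python's standard-library `statistics.NormalDist`, whose `inv_cdf` plays the role of the calculator's InvNormCD with σ = 1 and μ = 0. The function name `critical_value` is a hypothetical helper.

```python
from statistics import NormalDist


def critical_value(confidence: float) -> float:
    """Return z* for a confidence level given as a decimal, e.g. 0.90."""
    alpha = 1 - confidence   # area outside the middle C%
    tail = alpha / 2         # area in each tail
    # inv_cdf(tail) is the negative cutoff (like InvNormCD(tail, 1, 0)); z* is its magnitude
    return abs(NormalDist(mu=0, sigma=1).inv_cdf(tail))


for c in (0.90, 0.95, 0.98):
    print(c, round(critical_value(c), 3))
# 0.9 1.645
# 0.95 1.96
# 0.98 2.326
```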

To find z* for a 90% confidence interval ($\alpha = 0.10$, so 0.05 in each tail):

  • InvNormCD(0.05, 1, 0) >>> -1.64485
  • z* = 1.645 (drop the negative sign; z* is a distance)

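$\hat{p} \pm ME = \hat{p} \pm z^* \cdot SE_{\hat{p}}$, with $SE_{\hat{p}} = \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$

Where:

  • $\hat{p}$ = sample proportion
  • $ME$ = margin of error
  • $SE_{\hat{p}}$ = standard error of the sample proportion
  • $n$ = sample size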

Example

A random sample of 780 CS majors showed that 82% said YES to showering daily.

Create a 98% confidence interval for the proportion of CS majors who shower daily.

Naming the procedure:

“This is a one-sample z-interval for the proportion of all CS majors who shower daily.”

InvNormCD(0.01, 1, 0) >>> -2.32635, so z* = 2.326 (98% confidence leaves $\alpha = 0.02$, so 0.01 in each tail)

Applying the formula:

$0.82 \pm 2.326\sqrt{\dfrac{0.82(0.18)}{780}} = 0.82 \pm 0.032 = (0.788,\ 0.852)$

Answer is written as:

“I am 98% confident that the proportion of all CS majors who shower daily is between 78.8% and 85.2%.”
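The arithmetic of this example can be checked with a short Python sketch of the one-sample z-interval formula. The function name `one_prop_z_interval` is hypothetical; z* is computed exactly with `statistics.NormalDist` rather than rounded to 2.326.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z_interval(p_hat: float, n: int, confidence: float) -> tuple[float, float]:
    """One-sample z-interval for a proportion: p_hat ± z* * sqrt(p_hat(1 - p_hat)/n)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # upper cutoff = z*
    se = sqrt(p_hat * (1 - p_hat) / n)                       # standard error of p_hat
    me = z_star * se                                         # margin of error
    return p_hat - me, p_hat + me


low, high = one_prop_z_interval(0.82, 780, 0.98)
print(f"({low:.3f}, {high:.3f})")  # (0.788, 0.852)
```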

Interpreting Confidence Intervals

C% confidence level: if we took all possible samples of the same size from the same population and built an interval from each one, about C% of those intervals would contain the true population proportion.
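This "C% of intervals" interpretation can be checked by simulation. The sketch below assumes, for illustration only, a true proportion of 0.82 and samples of size 780, builds a 95% interval from each simulated sample, and reports the fraction that capture 0.82. The function name `coverage_rate` is hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist


def coverage_rate(p: float, n: int, confidence: float, reps: int, seed: int = 7) -> float:
    """Fraction of simulated z-intervals that contain the true proportion p."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(reps):
        p_hat = sum(rng.random() < p for _ in range(n)) / n   # one random sample
        me = z_star * sqrt(p_hat * (1 - p_hat) / n)           # its margin of error
        if p_hat - me <= p <= p_hat + me:                     # did this interval capture p?
            hits += 1
    return hits / reps


rate = coverage_rate(p=0.82, n=780, confidence=0.95, reps=2000)
print(rate)  # close to 0.95
```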

Example

Does the above sample provide evidence that less than 84% of CS majors shower daily?

Since 84% lies inside the interval (78.8% to 85.2%), values both above and below 84% are plausible:

“We cannot conclude that less than 84% of CS majors shower daily.”

Does it provide evidence that over 70% of CS majors shower daily?

Since the entire interval is over 70%:

“We can conclude that over 70% of CS majors shower daily.”
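The decision rule used in these two examples can be sketched as a small Python function: if the whole interval lies on one side of the claimed value, the data give evidence in that direction; otherwise the claimed value is plausible. The name `compare_to_claim` is hypothetical.

```python
def compare_to_claim(low: float, high: float, claimed: float) -> str:
    """Decide what a confidence interval (low, high) says about a claimed value."""
    if high < claimed:
        return f"evidence the parameter is less than {claimed}"
    if low > claimed:
        return f"evidence the parameter is greater than {claimed}"
    return f"cannot conclude: {claimed} is a plausible value"


interval = (0.788, 0.852)
print(compare_to_claim(*interval, 0.84))  # cannot conclude: 0.84 is a plausible value
print(compare_to_claim(*interval, 0.70))  # evidence the parameter is greater than 0.7
```

The same rule applies to an interval for a difference of two proportions, using 0 as the claimed value.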

Margins of Error

Recall from Chapter 7 – The Central Limit Theorem that a larger sample size will lead to a smaller standard deviation (standard error).

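  • Bigger samples have smaller margins of error (at the same confidence level) since their sample proportions vary less; because the standard error is proportional to $1/\sqrt{n}$, quadrupling the sample size halves the margin of error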
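To see the effect numerically, this Python sketch holds $\hat{p} = 0.82$ (borrowed from the CS majors example as an illustrative value) and 95% confidence fixed while the sample size grows by factors of four.

```python
from math import sqrt
from statistics import NormalDist

z_star = NormalDist().inv_cdf(0.975)   # 95% confidence, z* ≈ 1.96
p_hat = 0.82                            # illustrative sample proportion

# Margin of error for each sample size
margins = {n: z_star * sqrt(p_hat * (1 - p_hat) / n) for n in (100, 400, 1600)}
for n, me in margins.items():
    print(n, round(me, 4))
```

Each fourfold increase in n cuts the margin of error in half.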

Example

Researchers want to find the proportion of all dogs that take a multivitamin by constructing a 95% confidence interval with a 2% margin of error.

What sample size do they need?

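Using 0.50 as a placeholder for $\hat{p}$ (it makes $\hat{p}(1-\hat{p})$ as large as possible, so the sample is big enough whatever the true proportion turns out to be):

$1.96\sqrt{\dfrac{0.50(0.50)}{n}} \le 0.02 \;\Rightarrow\; n \ge \left(\dfrac{1.96}{0.02}\right)^2 (0.50)(0.50) = 2401$

They need a sample of at least 2401 dogs.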
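The sample-size calculation can be sketched in Python by solving $z^*\sqrt{p(1-p)/n} \le ME$ for n. The function name `required_sample_size` and its `p_guess` parameter are hypothetical; `p_guess = 0.5` is the conservative placeholder used above.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(margin: float, confidence: float, p_guess: float = 0.5) -> int:
    """Smallest n with z* * sqrt(p(1 - p)/n) <= margin."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = (z_star / margin) ** 2 * p_guess * (1 - p_guess)
    return ceil(n)  # always round UP so the margin of error is not exceeded


print(required_sample_size(0.02, 0.95))  # 2401
```

With the exact z* (about 1.95996) the unrounded result is about 2400.9, which rounds up to the same 2401.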

Comparing Population Proportions

Example

Sample of 780 CS majors showed 82% shower daily.

Sample of 550 math majors showed 70% shower daily.

Construct a 95% confidence interval for the difference between these two population proportions.

Naming the procedure:

“I will construct a two-sample z-interval for the difference between the proportion of CS majors who shower daily and the proportion of math majors who shower daily.”

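Check the conditions: both samples are random and independent of each other, each sample is less than 10% of its population, and each has at least 10 successes and 10 failures.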

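Building the interval:

$(\hat{p}_1 - \hat{p}_2) \pm z^*\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$

$(0.82 - 0.70) \pm 1.96\sqrt{\dfrac{0.82(0.18)}{780} + \dfrac{0.70(0.30)}{550}} = 0.12 \pm 0.0468 = (0.0732,\ 0.1668)$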

Answer written as:

“I am 95% confident that the proportion of CS majors who shower daily is between 7.32 and 16.68 percentage points higher than the proportion of math majors who shower daily.”
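The two-sample interval can be reproduced with a Python sketch of the formula, where the standard errors of the two independent sample proportions combine under the square root. The function name `two_prop_z_interval` is hypothetical; z* is computed exactly rather than rounded to 1.96.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_z_interval(p1: float, n1: int, p2: float, n2: int,
                        confidence: float) -> tuple[float, float]:
    """Two-sample z-interval for p1 - p2."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # SE of the difference
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se


low, high = two_prop_z_interval(0.82, 780, 0.70, 550, 0.95)
print(f"({low:.4f}, {high:.4f})")  # (0.0732, 0.1668)
```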

A negative bound would mean it is plausible that a greater proportion of math majors than CS majors shower daily.

Example

Is there evidence that the proportion of CS majors that shower daily is greater than the proportion of math majors that shower daily?

“Yes. The entire interval is above 0, so I am 95% confident that the proportion of CS majors who shower daily is greater than the proportion of math majors who shower daily.”

If the lower bound were negative (so the interval contained 0):

“No. The interval contains 0, so the two proportions could be equal, and the proportion of math majors could even be greater.”