Terminology and Definitions

Inference

  • Statistics and Population parameters
  • Random Sampling
  • Properties of Estimators
  • Estimation (Point and Interval)
  • Confidence Intervals
  • The Central Limit Theorem
  • Standard Errors

Statistical Inference : Definitions

  • A population consists of an entire set of objects, observations, or scores that have something in common. For example, a population might be defined as students in a university.

  • Some populations are only hypothetical. Consider an experiment where a die is thrown 100 times and the sum of the scores was recorded.

  • The researcher might define a population as the sums that would result if this experiment was repeated an infinite number of times.

  • The population is hypothetical in the sense that it is not reasonable to repeat this experiment indefinitely.

  • The distribution of a population can be described by several parameters such as the mean and standard deviation.
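The hypothetical die-throw population can be approximated by simulation. The sketch below is only an illustration: the function name `simulate_sums`, the seed and the number of repetitions are assumptions, and a finite number of repetitions stands in for the infinite population.

```python
import random
import statistics

def simulate_sums(repetitions, throws=100, seed=0):
    """Return the sum of `throws` fair-die scores for each repetition."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    return [sum(rng.randint(1, 6) for _ in range(throws))
            for _ in range(repetitions)]

# 2000 repetitions stand in for the "infinite" hypothetical population.
sums = simulate_sums(repetitions=2000)
print("mean of sums:", statistics.mean(sums))
print("standard deviation of sums:", statistics.stdev(sums))
```

The printed mean should sit close to the theoretical population mean of 100 × 3.5 = 350, which is the kind of parameter described above.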

Statistical Inference : Sample

  • A sample is a subset of a population.
  • Suppose we are interested in some characteristic of a population (e.g. amount of time spent on the internet).
  • Since it is usually impractical to test every member of a population, a sample from the population is typically the best approach available.
  • To be properly representative of a population, a sample should be both random and sufficiently large.

Random Sampling

  • In random sampling, each item or element of the population has an equal chance of being chosen at each draw.
  • A sample is random if the method for obtaining the sample meets the criterion of randomness (each element having an equal chance at each draw).
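As a minimal sketch of drawing a simple random sample, the snippet below uses Python's standard `random` module. The population of 1000 numbered units, the sample size of 50 and the seed are all made up for illustration.

```python
import random

# Hypothetical population: 1000 units labelled 1..1000.
population = list(range(1, 1001))

rng = random.Random(42)  # seeded so the draw is reproducible

# Draw 50 units without replacement; at each draw every remaining
# unit has the same chance of being chosen.
sample = rng.sample(population, k=50)
print(sorted(sample)[:10])
```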

Biased Sampling

  • A biased sample is one in which the method used to create the sample results in samples that are systematically different from the population.

  • For instance, consider a market research project on attitudes of attendees towards an event they attended.

  • Collecting the data by publishing a questionnaire and asking people to fill it out and send it in would produce a biased sample.

  • People interested enough to spend their time and energy filling out and sending in the questionnaire are likely to have different attitudes about the event than those not taking the time to fill out the questionnaire.

Parameters

  • A parameter is a numerical quantity measuring some aspect of a population of scores.
  • The population mean \(\mu\) and population variance \(\sigma^2\) are commonly used parameters.
  • Another commonly used parameter is the population proportion \(\pi\).
  • (Remark: Greek letters are used to designate parameters.)
  • Parameters are rarely known and are usually estimated by statistics computed from samples.

Statistics

  • The most common use of the word “statistics” is for describing a wide range of techniques and procedures for analyzing, interpreting and displaying data.
  • In a second usage, a “statistic” is defined as a numerical quantity (such as the sample mean) calculated from a sample.
  • Sample mean \(\bar{x}\) and sample standard deviation \(s\) are types of statistics.
  • These statistics are used to estimate population parameters.

Estimators

  • Three important attributes of statistics as estimators are:
      • unbiasedness,
      • consistency,
      • relative efficiency.
  • A statistic is unbiased if, in the long run, its average value equals the parameter it is estimating; that is, its expected value is equal to the parameter.
  • An estimator is consistent if the estimator tends to get closer to the parameter it is estimating as the sample size increases.

  • The efficiency of a statistic is the degree to which the statistic is stable from sample to sample.
  • That is, the less subject to sampling fluctuation a statistic is, the more efficient it is.
  • Sampling fluctuation refers to the extent to which a statistic takes on different values with different samples.
  • That is, it refers to how much the statistic’s value fluctuates from sample to sample.
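Sampling fluctuation can be made concrete with a small simulation. The sketch below is an illustration under stated assumptions: the helper `sampling_variance`, the standard normal population, the sample sizes and the seed are all hypothetical choices. It compares how much the sample mean and the sample median vary from sample to sample.

```python
import random
import statistics

def sampling_variance(estimator, n, repetitions=2000, seed=1):
    """Variance of an estimator across repeated samples of size n
    drawn from a standard normal population (an assumed population)."""
    rng = random.Random(seed)
    estimates = [estimator([rng.gauss(0.0, 1.0) for _ in range(n)])
                 for _ in range(repetitions)]
    return statistics.variance(estimates)

# Relative efficiency: a smaller variance means less sampling fluctuation.
var_mean = sampling_variance(statistics.mean, n=25)
var_median = sampling_variance(statistics.median, n=25)
print("variance of sample mean:  ", var_mean)
print("variance of sample median:", var_median)

# Consistency: the fluctuation shrinks as the sample size grows.
print("variance of sample mean, n=100:",
      sampling_variance(statistics.mean, n=100))
```

For normally distributed data the sample mean fluctuates less than the sample median, so it is the more efficient of the two in this setting.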

Inferential Statistics

  • Inferential statistics are used to draw inferences about a population from a sample.
  • Consider an experiment in which 10 subjects who performed a task after 24 hours of sleep deprivation scored 12 points lower than 10 subjects who performed after a normal night’s sleep.
  • Is the difference real or could it be due to chance?
  • How much larger could the real difference be than the 12 points found in the sample?
  • These are the types of questions answered by inferential statistics.

Sampling Distribution

Small Samples and the t- distribution

The Student t- distribution

  • The \(t\) distribution is the appropriate basis for determining the standardized test statistic when the sampling distribution of the mean is normally distributed but \(\sigma\) is not known.

  • The sampling distribution can be assumed to be normal either because the population is normal or because the sample is large enough to invoke the central limit theorem.

  • The \(t\) distribution is required when the sample is small (\(n < 30\)). For larger samples, the normal approximation can be used.

  • For the critical value approach, the procedure is identical to that described for the normal distribution, except for the use of \(t\) instead of \(z\) as the test statistic.

Quantiles

The quantile function is the inverse of the cumulative distribution function. The p-quantile is the value with the property that there is probability p of getting a value less than or equal to it. The median is by definition the 50% quantile.

Theoretical quantiles are commonly used for the calculation of confidence intervals and for power calculations in connection with designing and dimensioning experiments.
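A short illustration of the quantile function, using the standard normal distribution from Python's `statistics` module. The choice of distribution and of the probabilities 0.5 and 0.975 is an assumption made for the example.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

median = z.inv_cdf(0.5)   # the 50% quantile
q975 = z.inv_cdf(0.975)   # the 97.5% quantile, used for 95% intervals

print("median:", median)
print("97.5% quantile:", round(q975, 3))

# The quantile function inverts the cumulative distribution function.
print("cdf at the 97.5% quantile:", round(z.cdf(q975), 3))
```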

Inference Procedures for Proportions

Hypothesis Testing

Hypothesis Testing For Proportions

Worked Examples

Worked Example 1

A study was made of children who were hospitalized as a result of a car accident. 280 of the children were not wearing seat belts and 98 of these were seriously injured. 130 children wore seat belts and 26 were seriously injured.

Exercises
  1. Test the hypothesis that the rate of serious injury is the same for children who wear a seat belt or not. Clearly state your null and alternative hypotheses and your conclusion. Use a significance level of 5%.
Solution
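No worked solution survives in the notes. One common approach, offered here only as a sketch, is the pooled two-proportion z-test of \(H_0: \pi_1 = \pi_2\) against \(H_1: \pi_1 \neq \pi_2\). The function name `two_proportion_z_test` is hypothetical, and the large-sample normal approximation is an assumption rather than the method confirmed by the course.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided pooled z-test for equality of two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# 98 of 280 unbelted children and 26 of 130 belted children were seriously injured.
z, p_value = two_proportion_z_test(98, 280, 26, 130)
print("z =", round(z, 3), " p-value =", round(p_value, 4))
```

For these data the statistic is about 3.08, which exceeds 1.96, so under this approach the null hypothesis of equal injury rates would be rejected at the 5% level.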

Worked Example 2

63 of 100 first time convicts serving at least 3 years re-offended, whereas 70 of 140 first time convicts serving less than 3 years re-offended.

Exercises
  1. Test the null hypothesis that the re-offending rate does not depend on the length of the first sentence at a significance level of 5%.
  2. Calculate a 95% confidence interval for the difference in the re-offending rates of these two groups.
  3. Based on this confidence interval, is there any evidence that the re-offending rates differ?
Solution
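For the confidence interval in part 2, a minimal sketch using the usual large-sample (unpooled) standard error is given below. The function name `diff_proportion_ci` is hypothetical and the normal approximation is an assumption.

```python
from math import sqrt
from statistics import NormalDist

def diff_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Large-sample confidence interval for the difference p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # unpooled standard error
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    diff = p1 - p2
    return diff - z * se, diff + z * se

# 63 of 100 (at least 3 years) versus 70 of 140 (less than 3 years) re-offended.
lower, upper = diff_proportion_ci(63, 100, 70, 140)
print("95% CI:", round(lower, 4), "to", round(upper, 4))
```

The interval comes out at roughly 0.004 to 0.256. It excludes zero only narrowly, so any evidence of a difference in re-offending rates is borderline.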

Worked Example 3

It is generally assumed that older people are more likely to vote for the Conservatives than younger people. In a survey, 180 of 400 people over 40 and 120 of 400 people under 40 stated they would vote Conservative.

Exercises
  1. Do the data support this hypothesis at a significance level of 5%?
  2. Calculate a 95% confidence interval for the difference between the proportion of people over 40 voting Conservative and the proportion of people below 40 voting Conservative.
Solution

Worked Example 4

A study was carried out to compare two treatments for the flu. A total of 500 newly diagnosed flu patients were randomly assigned to one of the two treatments.

  • Of the 280 assigned to the first treatment, 168 still had the flu 2 days after diagnosis.
  • Of the 220 assigned to the second treatment, 176 still had the flu 2 days after diagnosis.
Exercises
  1. Test the hypothesis that the 2 day recovery rate is the same for both treatment groups. Clearly state your null and alternative hypotheses and your conclusion. Use a significance level of 5%.
Solution

Testing Model Assumptions

The Dixon Q test

The Dixon Q test is used for the identification and rejection of outliers. It assumes that the data are normally distributed and, per Robert Dean and Wilfrid Dixon, and others, it should be used sparingly and never more than once in a data set. To apply a Q test for bad data, arrange the data in order of increasing values and calculate Q as defined:

\[ Q = \frac{\text{gap}}{\text{range}} \]

where gap is the absolute difference between the outlier in question and the closest number to it, and range is the difference between the largest and smallest values. If \(Q > Q_{\text{table}}\), where \(Q_{\text{table}}\) is a reference value corresponding to the sample size and confidence level, then reject the questionable point. Note that only one point may be rejected from a data set using a Q test.

The Grubbs’ Outlier Test

Statisticians have devised several ways to detect outliers. Grubbs’ test is particularly easy to follow. This method is also called the ESD method (extreme studentized deviate).

The first step is to quantify how far the outlier is from the others. Calculate the ratio Z as the difference between the outlier and the mean divided by the SD. If Z is large, the value is far from the others. Note that you calculate the mean and SD from all values, including the outlier.

Since 5% of the values in a Gaussian population are more than 1.96 standard deviations from the mean, your first thought might be to conclude that the outlier comes from a different population if Z is greater than 1.96. This approach only works if you know the population mean and SD from other data. Although this is rarely the case in experimental science, it is often the case in quality control. You know the overall mean and SD from historical data, and want to know whether the latest value matches the others. This is the basis for quality control charts.

When analyzing experimental data, you don’t know the SD of the population. Instead, you calculate the SD from the data. The presence of an outlier increases the calculated SD. Since the presence of an outlier increases both the numerator (difference between the value and the mean) and denominator (SD of all values), Z does not get very large. In fact, no matter how the data are distributed, Z cannot get larger than \((N-1)/\sqrt{N}\), where N is the number of values. For example, if N=3, Z cannot be larger than 1.155 for any set of values.
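The ratio Z and its ceiling can be sketched in a few lines. The function names `grubbs_z` and `max_possible_z` and the three-value data set are hypothetical. The bound \((N-1)/\sqrt{N}\) is the standard result that reproduces the 1.155 figure quoted for N = 3.

```python
from math import sqrt
import statistics

def grubbs_z(values):
    """Largest absolute deviation from the mean, in units of the sample SD.
    The mean and SD are computed from all values, including the suspect one."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return max(abs(v - mean) for v in values) / sd

def max_possible_z(n):
    """Upper bound on Z for a sample of size n."""
    return (n - 1) / sqrt(n)

data = [1.0, 1.0, 10.0]  # hypothetical data with one extreme value
print("Z =", round(grubbs_z(data), 3))
print("bound for N = 3:", round(max_possible_z(3), 3))
```

However extreme the third value is made, Z for three observations stays at or below the bound, which is why 1.96 cannot be used as the cut-off.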

Grubbs and others have tabulated critical values for Z which are tabulated below. The critical value increases with sample size, as expected.

If your calculated value of Z is greater than the critical value in the table, then the P value is less than 0.05. This means that there is less than a \(5\%\) chance that you’d encounter an outlier so far from the others (in either direction) by chance alone, if all the data were really sampled from a single Gaussian distribution. Note that the method only works for testing the most extreme value in the sample (if in doubt, calculate Z for all values, but only calculate a P value for Grubbs’ test from the largest value of Z).

Once you’ve identified an outlier, you may choose to exclude that value from your analyses. Or you may choose to keep the outlier, but use robust analysis techniques that do not assume that data are sampled from Gaussian populations.

If you decide to remove the outlier, you then may be tempted to run Grubbs’ test again to see if there is a second outlier in your data. If you do this, you cannot use the same table.

The Anderson–Darling test

The Anderson–Darling test

The Anderson–Darling test is a statistical test of whether there is evidence that a given sample of data did not arise from a given probability distribution. In its basic form, the test assumes that there are no parameters to be estimated in the distribution being tested, in which case the test and its set of critical values is distribution-free. However, the test is most often used in contexts where a family of distributions is being tested, in which case the parameters of that family need to be estimated and account must be taken of this in adjusting either the test-statistic or its critical values. When applied to testing if a normal distribution adequately describes a set of data, it is one of the most powerful statistical tools for detecting most departures from normality.

Differences in Means

Examples

Hypothesis Testing

The Critical Value

  • The critical value(s) for a hypothesis test is a threshold to which the value of the test statistic in a sample is compared to determine whether or not the null hypothesis is rejected.
  • We will use the initials CV for the sake of brevity.
  • The critical value for any hypothesis test depends on the significance level at which the test is carried out, and whether the test is one-sided or two-sided.
  • The critical value is determined in exactly the same way as quantiles for confidence intervals, using Murdoch Barnes Table 7.

Determining the Critical value

  • A pre-determined level of significance \(\alpha\) must be specified. Usually it is set at 5% (0.05).
  • The number of tails must be specified, either one-tailed or two-tailed, i.e. \(k\) is either 1 or 2.
  • Sample size is an issue. We must decide whether to use \(n-1\) degrees of freedom or \(\infty\) degrees of freedom, depending on the sample size in question.
  • The manner by which we compute the critical value is identical to the way we compute quantiles.

Critical value

A critical value is any value that separates the critical region (where we reject the null hypothesis) from the values of the test statistic that do not lead to a rejection of the null hypothesis.

Critical Region

Remark: The absolute value function of some value x is denoted \(|x|\). It is the magnitude of the value without consideration of whether the value is positive or negative.

  • Let TS denote the test statistic and CV denote the critical value.
  • If \(|TS| > CV\), then we reject the null hypothesis.
  • If \(|TS| \leq CV\), then we fail to reject the null hypothesis.
  • For our die-throw example, TS = 2.99 and CV = 1.96.
  • Here \(|2.99| > 1.96\), so we reject the null hypothesis that the die is fair.
  • Consider this in the context of “proof”.

The critical region (or rejection region) is the set of all values of the test statistic that cause us to reject the null hypothesis.
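The decision rule can be written out as a small sketch. It assumes the large-sample case, so the critical value is a standard normal quantile (the \(\infty\) degrees of freedom row of a t table), with \(k\) the number of tails. The function names are hypothetical, and the \(|TS| > CV\) comparison is shown for the two-tailed case only.

```python
from statistics import NormalDist

def critical_value(alpha=0.05, tails=2):
    """Large-sample critical value: the (1 - alpha/k) quantile of the
    standard normal distribution, where k is the number of tails."""
    return NormalDist().inv_cdf(1 - alpha / tails)

def reject_null(test_statistic, cv):
    """Two-tailed decision rule: reject H0 when |TS| > CV."""
    return abs(test_statistic) > cv

cv = critical_value(alpha=0.05, tails=2)
print("CV =", round(cv, 2))
print("Reject H0 for TS = 2.99?", reject_null(2.99, cv))
```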

One Way ANOVA in Experimental Design

One Way ANOVA

p-values

Inference Procedures for Small Samples

Small Samples

Two Small Samples Case

  • Previously we have looked at large samples; now we will consider small samples.
  • (For the sake of clarity, I will not use small samples that have a combined sample size of greater than 30.)
  • Additionally, we require the assumption that both samples have equal variance. This assumption can be tested with another formal hypothesis test. We will revisit this later and, in the meantime, assume that the assumption of equal variance holds.

The key differences between the large sample case and the small sample cases arise in the following steps.

  • The standard error is computed in a different way (see next slide).
  • The degrees of freedom used to compute the critical value is \((n_X-1) + (n_Y - 1)\), or equivalently \(n_X + n_Y - 2\).
  • Also - a formal test of equality of variances is required beforehand (End of Year Exam)

Hypothesis Testing

Hypothesis Test for the Mean of a Single Sample

This procedure is used to assess whether the population mean has a specified value, based on the sample mean. The hypotheses are conventionally written in a form similar to the following (here the hypothesized population mean is zero).

\[ H_0: \mu = 0 \qquad H_1: \mu \neq 0 \]

There are two hypothesis tests for the mean of a single sample, corresponding to two cases.

  1. The sample is of a normally-distributed variable for which the population standard deviation (\(\sigma\)) is known.
  2. The sample is of a normally-distributed variable where \(\sigma\) is estimated by the sample standard deviation (\(s\)).

In practice, the population standard deviation is rarely known. For this reason, we will consider the second case only in this course.

In most statistical packages, this analysis is performed in the summary statistics functions.

Independent one-sample \(t\)-test

In testing the null hypothesis that the population mean is equal to a specified value \(\mu_{0}\), one uses the statistic

\[t = \frac{\overline{x} - \mu_0}{s / \sqrt{n}}\]

where \(\overline{x}\) is the sample mean, \(s\) is the sample standard deviation and \(n\) is the sample size. The number of degrees of freedom used in this test is \(n - 1\).
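The statistic can be computed directly from its definition. In the sketch below the function name `one_sample_t` and the six data values are hypothetical, chosen only to illustrate the arithmetic.

```python
from math import sqrt
import statistics

def one_sample_t(sample, mu0):
    """Return the one-sample t statistic and its degrees of freedom."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample SD, with n - 1 in the denominator
    t = (xbar - mu0) / (s / sqrt(n))
    return t, n - 1

data = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3]  # hypothetical measurements
t, df = one_sample_t(data, mu0=5.0)
print("t =", round(t, 3), "with", df, "degrees of freedom")
```

The resulting value of \(t\) is then compared with the tabulated critical value for \(n - 1\) degrees of freedom.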

Standard Error

Two Small Samples Case: Standard Error

Computing the standard error requires a two-step calculation. From the formulae, we have the two equations below. The first term \(s_p^2\) is called the pooled variance of the combined samples.

\[\begin{aligned} s_p^2 &= \frac{s_X^2(n_X-1)+s_Y^2(n_Y-1)}{n_X+n_Y-2},\\ S.E.(\bar{X}-\bar{Y}) &= \sqrt{s_p^2\left(\frac{1}{n_X}+\frac{1}{n_Y}\right)}. \end{aligned}\]
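The two-step calculation translates directly into code. This is a minimal sketch: the function names and the input values (sample variances 4 and 6, sample sizes 10 and 12) are hypothetical.

```python
from math import sqrt

def pooled_variance(s2_x, n_x, s2_y, n_y):
    """Step 1: pooled variance of the two samples."""
    return (s2_x * (n_x - 1) + s2_y * (n_y - 1)) / (n_x + n_y - 2)

def standard_error_diff(s2_x, n_x, s2_y, n_y):
    """Step 2: standard error of the difference in sample means."""
    sp2 = pooled_variance(s2_x, n_x, s2_y, n_y)
    return sqrt(sp2 * (1 / n_x + 1 / n_y))

# Hypothetical summary statistics for two small samples.
sp2 = pooled_variance(4.0, 10, 6.0, 12)
se = standard_error_diff(4.0, 10, 6.0, 12)
print("pooled variance:", round(sp2, 3))
print("standard error:", round(se, 4))
print("degrees of freedom:", 10 + 12 - 2)
```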

Dixon Q-Test for Outliers

Use the Dixon Q-Test to determine whether or not there is an outlier value present in this data set.

\[ \{19, 36, 33, 25, 30, 28, 31, 36, 29, 37\} \]

  • Arrange the data set into ascending order.

\[19, 25, 28, 29, 30, 31, 33, 36, 36, 37\]

  • Here the potential outlier is the lowest value, i.e. 19

  • We can formally state the null and alternative hypothesis as follows

Hypotheses
  • \(H_0\): There are no outliers present in the data.

  • \(H_1\): There is one outlier present (i.e. the lowest value, 19).

  • The test statistic for this procedure is as follows:

\[ Q_{TS} = \frac{\mbox{Gap}}{\mbox{Range}} \]

  • The gap is the difference between the suspected outlier and the value nearest to it. In this case, the nearest value is 25, so the gap is \[ \mbox{Gap} = 25 - 19 = 6\]

  • The range is simply the difference between the maximum and minimum value. \[ \mbox{Range} = 37-19 =18\]

  • \[ Q_{TS} = \frac{\mbox{Gap}}{\mbox{Range}} = \frac{6}{18} =0.333 \]

  • Before we look at the critical value, we confirm the size of the data set is \(n=10\).

  • The critical value can be determined from the following table.
      • The column to choose is the significance level (here 5% or 0.05).
      • The row to use is \(n\), the number of items in the data set.


  • If the Test Statistic is greater than the Critical value, reject the null hypothesis \[ Q_{TS} > Q_{CV}\]

  • Otherwise we fail to reject the null hypothesis
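The test statistic above can be reproduced with a short sketch. The function name `dixon_q` is hypothetical, and the code simply tests whichever end of the sorted data has the larger gap, which is an assumption about how the suspect point is chosen. The critical value is not computed here and must still be read from a table.

```python
def dixon_q(values):
    """Return (suspect value, Q) for the more extreme end of the data.
    Assumes at least three values, not all equal."""
    data = sorted(values)
    data_range = data[-1] - data[0]
    gap_low = data[1] - data[0]     # gap below the rest of the data
    gap_high = data[-1] - data[-2]  # gap above the rest of the data
    if gap_low >= gap_high:
        return data[0], gap_low / data_range
    return data[-1], gap_high / data_range

suspect, q = dixon_q([19, 36, 33, 25, 30, 28, 31, 36, 29, 37])
print("suspect value:", suspect, " Q =", round(q, 3))
```

The statistic is then compared with the tabulated critical value for \(n = 10\); published Q tables commonly give 0.466 at the 95% confidence level, a figure quoted from standard tables rather than from these notes.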

Worked Example

Consider the data set:

\[0.189,\ 0.167,\ 0.187,\ 0.183,\ 0.186,\]\[ 0.182,\ 0.181,\ 0.184,\ 0.181,\ 0.177 \,\]

Solution

Step 1: Hypotheses for the Dixon Test.

\(H_0\): No outlier is present in the data.

\(H_1\): There is an outlier present in the data (you may identify it).


Step 2: Dixon Q Test Statistic

To apply a Q test for suspicious data, arrange the data in order of increasing values and calculate Q as defined:

\[ Q = \frac{\text{gap}}{\text{range}} \]

Where gap is the absolute difference between the outlier in question and the closest number to it.


Now rearrange in increasing order:

\[0.167,\ 0.177,\ 0.181,\ 0.181,\ 0.182,\]\[ 0.183,\ 0.184,\ 0.186,\ 0.187,\ 0.189 \, \]

We hypothesize that 0.167 is an outlier (based on its large gap to the next number, i.e. 0.010).


Test Statistic

Calculate the test statistic \(Q_{TS}\): \[ Q_{TS}=\frac{\text{gap}}{\text{range}} = \frac{0.177-0.167}{0.189-0.167}=0.455.\]


Critical Value

  • Choose the Critical Value based on sample size and significance level \(\alpha\).
  • In this table we work on the basis of confidence level. Let’s use 95% as our confidence level (i.e. 5% significance, \(\alpha=0.05\)).

Step 4 : Dixon Q Test: Decision Rule

  • If \(Q_{TS} > Q_{CV}\), where \(Q_{CV}\) is a critical value corresponding to the sample size and confidence level, then reject the null hypothesis.

  • If \(Q_{TS} \leq Q_{CV}\), we fail to reject the null hypothesis, i.e. there is not enough evidence.

  • At 95% confidence, \(Q_{TS} \leq Q_{CV}\), i.e. \(0.455 \leq 0.466\), taking \(Q_{CV} = 0.466\) as the tabulated value for \(n = 10\).

  • Therefore we do not have enough evidence to classify the lowest value 0.167 as an outlier.

Non-parametric Tests

Introduction

Non-parametric Statistics: Terminology

A designed experiment is an experiment conducted under controlled conditions, using factor levels intended to estimate specified factors and their interactions. A designed experiment may be contrasted with the analysis of happenstance, historical or casual data.

A designed experiment presumes that the process being studied or the simulation being performed is stable and that the important factors in the process have been recognized. While an experiment may be performed on an unstable process, the results may not be directly useful because of large experimental error. All the important factors must be included; otherwise, the factors cannot completely describe the process.

Statistical Tables

Sample Size Estimation

Sample Size Estimation For Proportions Worked Example