A population consists of an entire set of objects, observations, or scores that have something in common. For example, a population might be defined as all students in a university.
Some populations are only hypothetical. Consider an experiment in which a die is thrown 100 times and the sum of the scores is recorded.
The researcher might define a population as the sums that would result if this experiment were repeated an infinite number of times.
The population is hypothetical in the sense that it is not reasonable to repeat this experiment indefinitely.
The distribution of a population can be described by several parameters such as the mean and standard deviation.
A biased sample is one in which the method used to create the sample results in samples that are systematically different from the population.
For instance, consider a market research project on attitudes of attendees towards an event they attended.
Collecting the data by publishing a questionnaire and asking people to fill it out and send it in would produce a biased sample.
People interested enough to spend their time and energy filling out and sending in the questionnaire are likely to have different attitudes about the event than those not taking the time to fill out the questionnaire.
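The \(t\) distribution is the appropriate basis for determining the standardized test statistic when the sampling distribution of the mean is normally distributed but the population standard deviation \(\sigma\) is not known, so that the sample standard deviation \(s\) must be used in its place.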
The sampling distribution can be assumed to be normal either because the population is normal or because the sample is large enough to invoke the central limit theorem.
The \(t\) distribution is required when the sample is small (\(n < 30\)); for larger samples, a normal approximation can be used.
For the critical value approach, the procedure is identical to that described for the normal distribution, except that \(t\) is used instead of \(z\) as the test statistic.
The quantile function is the inverse of the cumulative distribution function. The p-quantile is the value with the property that there is probability p of getting a value less than or equal to it. The median is by definition the 50% quantile.
Theoretical quantiles are commonly used for the calculation of confidence intervals and for power calculations in connection with designing and dimensioning experiments.
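As a small sketch of the quantile function for the normal case, Python's standard-library `statistics.NormalDist` exposes the CDF as `cdf` and its inverse as `inv_cdf`. The standard normal distribution is an assumption chosen for illustration; any mean and standard deviation could be passed instead.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, SD 1), chosen here only for illustration.
std_normal = NormalDist(mu=0.0, sigma=1.0)

# inv_cdf is the quantile function, i.e. the inverse of the CDF.
median = std_normal.inv_cdf(0.50)   # the 50% quantile
q975 = std_normal.inv_cdf(0.975)    # upper 2.5% point, used for 95% two-sided intervals

# Applying the CDF to a quantile recovers the probability p.
roundtrip = std_normal.cdf(q975)

print(f"median = {median:.4f}, 97.5% quantile = {q975:.4f}, cdf(q) = {roundtrip:.4f}")
```

The 97.5% quantile (about 1.96) is the value that reappears in 95% confidence intervals and two-sided tests at the 5% level.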
A study was made of children who were hospitalized as a result of a car accident. 280 of the children were not wearing seat belts and 98 of these were seriously injured. 130 children wore seat belts and 26 were seriously injured.
63 of 100 first time convicts serving at least 3 years re-offended, whereas 70 of 140 first time convicts serving less than 3 years re-offended.
It is generally assumed that older people are more likely to vote for the Conservatives than younger people. In a survey, 180 of 400 people over 40 and 120 of 400 people under 40 stated they would vote Conservative.
A study was carried out to compare two treatments for the flu. A total of 500 newly diagnosed flu patients were randomly assigned to one of the two treatments.
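The examples above give counts without naming the test. Assuming they are intended for a pooled two-sample \(z\)-test of \(H_0: p_1 = p_2\), a minimal sketch looks like this. The function name `two_proportion_z` is hypothetical, and the flu example cannot be run because its counts are not given.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-sample z-test for H0: p1 = p2, with a two-sided p-value."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)          # common proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Seat-belt example: 98 of 280 unbelted vs 26 of 130 belted children seriously injured.
z, p = two_proportion_z(98, 280, 26, 130)
print(f"seat belts: z = {z:.3f}, two-sided p = {p:.4f}")

# Voting example: 180 of 400 over 40 vs 120 of 400 under 40 would vote Conservative.
z2, p2 = two_proportion_z(180, 400, 120, 400)
print(f"voting: z = {z2:.3f}, two-sided p = {p2:.6f}")
```

For a one-sided claim such as "older people are more likely to vote Conservative", the one-sided p-value is half the two-sided value when \(z\) has the hypothesized sign.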
The Dixon Q test is used for identification and rejection of outliers. It assumes normally distributed data and, according to Robert Dean and Wilfrid Dixon among others, should be used sparingly and never more than once in a data set. To apply a Q test for bad data, arrange the data in order of increasing values and calculate Q as defined:
\[ Q = \frac{\text{gap}}{\text{range}} \]
where gap is the absolute difference between the outlier in question and the closest number to it, and range is the difference between the largest and smallest values. If \(Q > Q_{\text{table}}\), where \(Q_{\text{table}}\) is a reference value corresponding to the sample size and confidence level, then reject the questionable point. Note that only one point may be rejected from a data set using a Q test.
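A minimal sketch of the Q statistic follows. It checks both ends of the sorted data and returns the larger ratio with its suspect value; testing both ends is a design choice made here, and the function name `dixon_q` is hypothetical. The critical value \(Q_{\text{table}}\) still has to come from a table.

```python
def dixon_q(data):
    """Return (Q, suspect) for whichever end of the sorted data is more extreme."""
    x = sorted(data)
    if len(x) < 3:
        raise ValueError("Dixon's Q test needs at least 3 values")
    rng = x[-1] - x[0]                       # range = largest - smallest
    if rng == 0:
        raise ValueError("all values are equal; Q is undefined")
    q_low = (x[1] - x[0]) / rng              # gap for the smallest value
    q_high = (x[-1] - x[-2]) / rng           # gap for the largest value
    return (q_low, x[0]) if q_low >= q_high else (q_high, x[-1])

q, suspect = dixon_q([19, 36, 33, 25, 30, 28, 31, 36, 29, 37])
print(f"Q = {q:.3f} for suspect value {suspect}")

q2, suspect2 = dixon_q([0.189, 0.167, 0.187, 0.183, 0.186,
                        0.182, 0.181, 0.184, 0.181, 0.177])
print(f"Q = {q2:.3f} for suspect value {suspect2}")
```

The suspect point is rejected only if the returned Q exceeds the tabulated critical value for that sample size and confidence level.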
Statisticians have devised several ways to detect outliers. Grubbs’ test is particularly easy to follow. This method is also called the ESD method (extreme studentized deviate).
The first step is to quantify how far the outlier is from the others. Calculate the ratio Z as the difference between the outlier and the mean divided by the SD. If Z is large, the value is far from the others. Note that you calculate the mean and SD from all values, including the outlier.
Since 5% of the values in a Gaussian population are more than 1.96 standard deviations from the mean, your first thought might be to conclude that the outlier comes from a different population if Z is greater than 1.96. This approach only works if you know the population mean and SD from other data. Although this is rarely the case in experimental science, it is often the case in quality control. You know the overall mean and SD from historical data, and want to know whether the latest value matches the others. This is the basis for quality control charts.
When analyzing experimental data, you don’t know the SD of the population. Instead, you calculate the SD from the data. The presence of an outlier increases the calculated SD. Since the presence of an outlier increases both the numerator (difference between the value and the mean) and the denominator (SD of all values), Z does not get very large. In fact, no matter how the data are distributed, Z cannot get larger than \((N-1)/\sqrt{N}\), where N is the number of values. For example, if N=3, Z cannot be larger than \(2/\sqrt{3} \approx 1.155\) for any set of values.
Critical values for Z, tabulated by Grubbs and others, are given in the table below. The critical value increases with sample size, as expected.
If your calculated value of Z is greater than the critical value in the table, then the P value is less than 0.05. This means that there is less than a \(5\%\) chance that you’d encounter an outlier so far from the others (in either direction) by chance alone, if all the data were really sampled from a single Gaussian distribution. Note that the method only works for testing the most extreme value in the sample. (If in doubt, calculate Z for all values, but only calculate a P value for Grubbs’ test from the largest value of Z.)
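The steps above can be sketched as follows: compute Z for every value using the mean and sample SD of all values (outlier included), keep the largest, and compare it with the bound \((N-1)/\sqrt{N}\). The function names are hypothetical, and the data set is borrowed from the Dixon example purely for illustration. Comparing Z with the tabulated critical value is left to the reader.

```python
import math
import statistics

def grubbs_z(data):
    """Largest |x - mean| / SD, using the mean and sample SD of all values."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)                 # sample SD (N - 1 denominator)
    z_values = [abs(x - mean) / sd for x in data]
    z_max = max(z_values)
    suspect = data[z_values.index(z_max)]       # first value attaining the max
    return z_max, suspect

def grubbs_upper_bound(n):
    """Largest Z attainable for any data set of size n: (n - 1) / sqrt(n)."""
    return (n - 1) / math.sqrt(n)

data = [19, 36, 33, 25, 30, 28, 31, 36, 29, 37]
z, suspect = grubbs_z(data)
print(f"Z = {z:.3f} for {suspect}; bound for N={len(data)} is "
      f"{grubbs_upper_bound(len(data)):.3f}")

# With N = 3 the bound is reached by, e.g., [0, 0, 1].
z3, _ = grubbs_z([0, 0, 1])
print(f"Z = {z3:.4f}, bound = {grubbs_upper_bound(3):.4f}")
```

Only `z` (the largest value) is used for the P value, as the text requires.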
Once you’ve identified an outlier, you may choose to exclude that value from your analyses. Or you may choose to keep the outlier, but use robust analysis techniques that do not assume that data are sampled from Gaussian populations.
If you decide to remove the outlier, you may then be tempted to run Grubbs’ test again to see if there is a second outlier in your data. If you do this, you cannot use the same table.
The Anderson–Darling test is a statistical test of whether there is evidence that a given sample of data did not arise from a given probability distribution. In its basic form, the test assumes that there are no parameters to be estimated in the distribution being tested, in which case the test and its set of critical values are distribution-free. However, the test is most often used in contexts where a family of distributions is being tested, in which case the parameters of that family need to be estimated and account must be taken of this in adjusting either the test statistic or its critical values. When applied to testing if a normal distribution adequately describes a set of data, it is one of the most powerful statistical tools for detecting most departures from normality.
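A minimal sketch of the basic form, where the distribution is fully specified, uses the standard statistic \(A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\ln F(Y_i) + \ln\left(1 - F(Y_{n+1-i})\right)\right]\) on the sorted data \(Y_1 \le \dots \le Y_n\). It does not apply the adjustment needed when parameters are estimated, nor does it look up critical values; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def anderson_darling(data, dist):
    """A^2 for a fully specified continuous distribution `dist` (needs a .cdf method).

    Assumes no parameters were estimated from `data`; cdf values of exactly
    0 or 1 would make the logarithms fail.
    """
    y = sorted(data)
    n = len(y)
    total = 0.0
    for i in range(1, n + 1):
        f_low = dist.cdf(y[i - 1])    # F(Y_i), 1-based index i
        f_high = dist.cdf(y[n - i])   # F(Y_{n+1-i})
        total += (2 * i - 1) * (math.log(f_low) + math.log(1 - f_high))
    return -n - total / n

std_normal = NormalDist()
# Values placed at evenly spaced normal quantiles: should give a small A^2.
good = [std_normal.inv_cdf((i - 0.5) / 10) for i in range(1, 11)]
# Values far in the upper tail: should give a large A^2.
bad = [5.0] * 5

print(f"A^2 (good) = {anderson_darling(good, std_normal):.3f}")
print(f"A^2 (bad)  = {anderson_darling(bad, std_normal):.1f}")
```

Larger values of \(A^2\) indicate stronger evidence against the hypothesized distribution.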
A critical value is any value that separates the critical region (where we reject the null hypothesis) from the values of the test statistic that do not lead to a rejection of the null hypothesis.
Remark: The absolute value function of some value x is denoted \(|x|\). It is the magnitude of the value without consideration of whether the value is positive or negative.
The critical region (or rejection region) is the set of all values of the test statistic that cause us to reject the null hypothesis.
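As a sketch of how a critical region is used, assume a test statistic that follows the standard normal distribution under \(H_0\). For a two-sided test, the critical region is \(|z| > z_{\alpha/2}\), which is where the absolute value is convenient; for one-sided tests, only one tail is used. The function name `reject_h0` is hypothetical, and for a \(t\) statistic the quantiles would come from the \(t\) distribution instead.

```python
from statistics import NormalDist

def reject_h0(z, alpha=0.05, tail="two-sided"):
    """True if a standard-normal test statistic z falls in the critical region."""
    std_normal = NormalDist()
    if tail == "two-sided":
        critical = std_normal.inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
        return abs(z) > critical
    if tail == "upper":
        return z > std_normal.inv_cdf(1 - alpha)      # about 1.645 for alpha = 0.05
    if tail == "lower":
        return z < std_normal.inv_cdf(alpha)          # about -1.645 for alpha = 0.05
    raise ValueError("tail must be 'two-sided', 'upper' or 'lower'")

print(reject_h0(2.5), reject_h0(1.5), reject_h0(1.8, tail="upper"))
```

The same statistic (1.8) can fall outside the two-sided critical region but inside the one-sided one, because the one-sided test puts all of \(\alpha\) in a single tail.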
The key differences between the large sample case and the small sample case arise in the following steps.
This procedure is used to assess whether the population mean has a specified value, based on the sample mean. The hypotheses are conventionally written in a form similar to the one below (here the hypothesized population mean is zero).
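There are two hypothesis tests for the mean of a single sample: one for the case where the population standard deviation is known, and one for the case where it is unknown and must be estimated from the sample.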
In practice, the population standard deviation is rarely known. For this reason, we will consider the second case only in this course.
In most statistical packages, this analysis is performed in the summary statistics functions.
In testing the null hypothesis that the population mean is equal to a specified value \(\mu_{0}\), one uses the statistic
\[t = \frac{\overline{x} - \mu_0}{s / \sqrt{n}}\]
where \(s\) is the sample standard deviation and \(n\) is the sample size. The test uses \(n - 1\) degrees of freedom.
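The formula above translates directly into a few lines of standard-library Python. This is a sketch that returns only the statistic and its degrees of freedom; the p-value needs the \(t\) distribution (SciPy's `scipy.stats.ttest_1samp` computes both). The data below are hypothetical.

```python
import math
import statistics

def one_sample_t(data, mu0):
    """t = (xbar - mu0) / (s / sqrt(n)) with n - 1 degrees of freedom."""
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)              # sample SD, n - 1 denominator
    t = (xbar - mu0) / (s / math.sqrt(n))
    return t, n - 1

# Hypothetical data: mean 3, s = sqrt(2.5), tested against mu0 = 2.
t, df = one_sample_t([1, 2, 3, 4, 5], mu0=2)
print(f"t = {t:.4f} on {df} degrees of freedom")
```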
Computing the standard error requires a two-step calculation. From the formulae, we have the two equations below. The first quantity, \(s_p^2\), is called the pooled variance of the combined samples. \[\begin{aligned} s_p^2 &= \frac{s_X^2(n_X-1)+s_Y^2(n_Y-1)}{n_X+n_Y-2},\\ \text{S.E.}(\bar{X}-\bar{Y}) &= \sqrt{s_p^2\left(\frac{1}{n_X}+\frac{1}{n_Y}\right)}. \end{aligned}\]
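The two-step calculation can be sketched as below: first the pooled variance, then the standard error of \(\bar{X}-\bar{Y}\). The function name and the two small samples are hypothetical, chosen so the arithmetic is easy to check by hand (\(s_X^2 = 1\), \(s_Y^2 = 5/3\)).

```python
import math
import statistics

def pooled_standard_error(x, y):
    """Return (pooled variance s_p^2, S.E. of xbar - ybar)."""
    n_x, n_y = len(x), len(y)
    s2_x = statistics.variance(x)           # sample variances (n - 1 denominator)
    s2_y = statistics.variance(y)
    # Step 1: pooled variance, weighting each sample by its degrees of freedom.
    s2_p = (s2_x * (n_x - 1) + s2_y * (n_y - 1)) / (n_x + n_y - 2)
    # Step 2: standard error of the difference in means.
    se = math.sqrt(s2_p * (1 / n_x + 1 / n_y))
    return s2_p, se

s2_p, se = pooled_standard_error([1, 2, 3], [4, 5, 6, 7])
print(f"pooled variance = {s2_p:.4f}, S.E. = {se:.4f}")
```

Pooling assumes the two populations share a common variance; when that assumption is doubtful, an unpooled (Welch) standard error is the usual alternative.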
Use the Dixon Q test to determine whether there is an outlier present in this data set:
\[\{19, 36, 33, 25, 30, 28, 31, 36, 29, 37\}\]
Arranged in increasing order, the data are:
\[19, 25, 28, 29, 30, 31, 33, 36, 36, 37\]
Here the potential outlier is the lowest value, i.e. 19.
We can formally state the null and alternative hypotheses as follows:
\(H_0\): There are no outliers present in the data.
\(H_1\): There is one outlier present (i.e. the lowest value, 19).
The test statistic for this procedure is as follows:
\[ Q_{TS} = \frac{\mbox{Gap}}{\mbox{Range}} \]
The gap is the difference between the suspected outlier and the next value. In this case, the next value is 25, so the gap is \[ \mbox{Gap} = 25 - 19 = 6\]
The range is simply the difference between the maximum and minimum value. \[ \mbox{Range} = 37-19 =18\]
\[ Q_{TS} = \frac{\mbox{Gap}}{\mbox{Range}} = \frac{6}{18} =0.333 \]
Before we look at the critical value, we confirm the size of the data set is \(n=10\).
The critical value can be determined from the following table: choose the column for the significance level (here 5%, or 0.05) and the row for \(n\), the number of items in the data set.
If the test statistic is greater than the critical value, reject the null hypothesis: \[ Q_{TS} > Q_{CV}\]
Otherwise, we fail to reject the null hypothesis.
Consider the data set:
\[0.189,\ 0.167,\ 0.187,\ 0.183,\ 0.186,\]\[ 0.182,\ 0.181,\ 0.184,\ 0.181,\ 0.177 \,\]
\(H_0\): There is no outlier present in the data.
\(H_1\): There is an outlier present in the data (you may identify it).
To apply a Q test for suspicious data, arrange the data in order of increasing values and calculate Q as defined:
\[ Q = \frac{\text{gap}}{\text{range}} \]
where gap is the absolute difference between the outlier in question and the closest number to it, and range is the difference between the largest and smallest values.
Now rearrange in increasing order:
\[0.167,\ 0.177,\ 0.181,\ 0.181,\ 0.182,\]\[ 0.183,\ 0.184,\ 0.186,\ 0.187,\ 0.189 \, \]
We hypothesize that 0.167 is an outlier (based on its large gap to the next number, i.e. 0.010).
If \(Q_{TS} > Q_{CV}\), where \(Q_{CV}\) is a critical value corresponding to the sample size and confidence level, then reject the null hypothesis.
If \(Q_{TS} \leq Q_{CV}\), we fail to reject the null hypothesis, i.e. there is not enough evidence.
Here \(Q_{TS} = \frac{0.177 - 0.167}{0.189 - 0.167} = \frac{0.010}{0.022} = 0.455\), and at 95% confidence \(Q_{TS} \leq Q_{CV}\).
Therefore we do not have enough evidence to classify the lowest value, 0.167, as an outlier.
A designed experiment is an experiment conducted under controlled conditions, using factor levels intended to estimate specified factors and their interactions. A designed experiment may be contrasted with the analysis of happenstance, historical, or casual data.
A designed experiment presumes that the process being studied or the simulation being performed is stable and that the important factors in the process have been recognized. While an experiment may be performed on an unstable process, the results may not be directly useful because of large experimental error. All the important factors must be included; otherwise, the factors cannot completely describe the process.