Measures of centrality give one representative number for the location of the centre of the distribution of data.
The most common measures are the sample mean and the sample median. We must distinguish between a sample mean and a population mean. The sample mean is the average of all the items in a sample. The population mean (often represented by the Greek letter \(\mu\)) is the average of all the items in a population. Because a population is usually very large, the population mean is usually an unknown constant.
The sample mean is an estimator of the population mean \(\mu\). It is a measure of location, commonly called the average, and is often denoted \(\bar{x}\), where \(x\) is the data set.
Its value depends equally on all of the data, which may include outliers. For skewed data sets it may not appear representative of the central region.
It is especially useful as a representative of the whole sample in subsequent calculations. The sample mean of a data set is defined as \[ \bar{x} = { \sum x_i \over n } \] where \(\sum x_i\) is the sum of all the elements of \(x\), and \(n\) is the sample size.
Suppose we roll a die 8 times and get the following scores: \(x = \{ 5, 2, 1, 6, 3, 5, 3, 1\}\)
What is the sample mean of the scores \(\bar{x}\)? \[ \bar{x} = {5 + 2 + 1 + 6 + 3 + 5 + 3 + 1 \over 8 } = {26 \over 8} = 3.25 \]
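The same calculation can be checked in a few lines of Python. This is only a minimal sketch: the variable names `scores` and `x_bar` are illustrative and do not come from the notes.

```python
from statistics import mean

# Scores from the eight die rolls in the example above
scores = [5, 2, 1, 6, 3, 5, 3, 1]

# Sample mean by the definition: sum of the values divided by the sample size
x_bar = sum(scores) / len(scores)

# The standard library function gives the same result
assert x_bar == mean(scores)

print(x_bar)  # 3.25
```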
With an odd number of data values, for example nine, we have:
With an even number of data values, for example 8, we have:
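As a sketch of the two cases: the median is the middle value of the sorted data when the number of values is odd, and the average of the two middle values when the number is even. The eight die scores are taken from the example above. The ninth value used for the odd case is made up for illustration, and `sample_median` is an illustrative helper name.

```python
def sample_median(data):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value
        return ordered[mid]
    # Even count: the average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

even_scores = [5, 2, 1, 6, 3, 5, 3, 1]  # the 8 die scores
odd_scores = even_scores + [4]          # hypothetical ninth roll

print(sample_median(even_scores))  # 3.0  (average of 3 and 3)
print(sample_median(odd_scores))   # 3    (fifth of nine sorted values)
```

Unlike the mean, the median here does not change if the largest score is replaced by a much larger outlier.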
The harmonic mean \(H\) of \(n\) data values is defined as \[ H = \frac{n}{ \frac{1}{x_1} + \frac{1}{x_2} + \ldots + \frac{1}{x_n} } \]
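The formula for \(H\) above is the harmonic mean: the number of values divided by the sum of their reciprocals. A minimal Python sketch, reusing the die scores from the earlier example (all values must be positive; the variable names are illustrative):

```python
import math
from statistics import harmonic_mean

scores = [5, 2, 1, 6, 3, 5, 3, 1]

# n divided by the sum of the reciprocals of the data values
n = len(scores)
h = n / sum(1 / x for x in scores)

# The standard library function agrees up to floating-point rounding
assert math.isclose(h, harmonic_mean(scores))

print(round(h, 4))  # 2.1429
```

The result (exactly 15/7) is lower than the arithmetic mean of 3.25, which is always the case for positive data that are not all equal.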
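Pearson's second coefficient of skewness compares the mean with the median, scaled by the standard deviation \(\sigma\): \[ S_k = \frac{3(\mbox{Mean} - \mbox{Median} )}{\sigma} \]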
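The skewness coefficient \(S_k\) above can be evaluated for the die scores as follows. This sketch assumes that \(\sigma\) is the population standard deviation (`pstdev`); using the sample standard deviation instead would give a slightly smaller value.

```python
from statistics import mean, median, pstdev

scores = [5, 2, 1, 6, 3, 5, 3, 1]

x_bar = mean(scores)    # 3.25
med = median(scores)    # 3.0
sigma = pstdev(scores)  # population standard deviation (an assumption here)

# Three times the gap between mean and median, in units of sigma
s_k = 3 * (x_bar - med) / sigma

print(round(s_k, 2))  # 0.42
```

A positive value indicates that the mean lies above the median, which suggests a tail to the right; a negative value suggests a tail to the left.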
What happens if you have two sets of data with two different means and two different standard deviations? How do you decide which set is more spread out? Remember that the size of a standard deviation has to be judged relative to the mean it is associated with.
The coefficient of variation (cv) is often used to compare the relative dispersion of two or more sets of data. It is formed by dividing the standard deviation by the mean and is usually expressed as a percentage (i.e. multiplied by 100). Again we distinguish between the population and the sample coefficient of variation.
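A short Python sketch of the sample coefficient of variation follows. The two data sets are made up for illustration, and `coefficient_of_variation` is an illustrative helper name. For the population version, `stdev` would be replaced by `pstdev`.

```python
from statistics import mean, stdev

def coefficient_of_variation(data):
    """Sample standard deviation as a percentage of the sample mean."""
    return stdev(data) / mean(data) * 100

# Two hypothetical data sets with the same standard deviation
set_a = [10, 12, 14, 16, 18]       # mean 14
set_b = [100, 102, 104, 106, 108]  # mean 104

cv_a = coefficient_of_variation(set_a)
cv_b = coefficient_of_variation(set_b)

print(round(cv_a, 1))  # 22.6
print(round(cv_b, 1))  # 3.0
```

Both sets have the same standard deviation, but relative to its mean the first set is far more spread out.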
Syllabus
Summarise a set of data using a table or frequency distribution, and display it graphically using a line plot, a box plot, a bar chart, histogram, stem and leaf plot, or other appropriate elementary device.
Describe the level/location of a set of data using the mean, median, mode, as appropriate.
Describe the spread/variability of a set of data using the standard deviation, range, interquartile range, as appropriate.
Explain what is meant by symmetry and skewness for the distribution of a set of data.