Measures of Centrality

Column

Measures of Centrality

Measures of centrality give one representative number for the location of the centre of the distribution of data.

The most common measures are the sample mean and the sample median We must make a distinction between a sample mean and a population mean: The sample mean is simply the average of all the items in a sample. The population mean (often represented by the Greek letter \(\mu\)) is simply the average of all the items in a population. Because a population is usually very large, the population mean is usually an unknown constant.

Median

Sample Mean

The sample mean is an estimator available for estimating the population mean \(\mu\). It is a measure of location, commonly called the average, often denoted \(\bar{x}\), where \(x\) is the data set.

Its value depends equally on all of the data which may include outliers. It may not appear representative of the central region for skewed data sets.

It is especially useful as being representative of the whole sample for use in subsequent calculations. The sample mean of a data set is defined as : \[ \bar{x} = { \sum x_i\over n}\] \(\sum x_i\) is the summation of al the elements of \(x\), and \(n\) is the sample size.

Computing the sample mean

Suppose we roll a die 8 times and get the following scores: \(x = \{ 5, 2, 1, 6, 3, 5, 3, 1\}\) \

What is the sample mean of the scores \(\bar{x}\)? \[ \bar{x} = {5 + 2 + 1 + 6 + 3 + 5 + 3 + 1 \over 8 } = {26 \over 8} = 3.25 \]

Median

  • The other commonly used measure of centrality is the median.
  • The median is the value halfway through the ordered data set, below and above which there lies an equal number of data values.
  • For an odd sized data set, the median is the middle element of the ordered data set.
  • For an even sized data set, the median is the average of the middle pair of elements of an data set.
  • It is generally a good descriptive measure of the location which works well for skewed data, or data with outliers.
  • For later, the median is the 0.5 quantile, and the second quartile Q_2.

Odd Sized Data Set

With an odd number of data values, for example nine, we have:

  • Data : {96, 48, 27, 72, 39, 70, 7, 68, 99 }
  • Ordered Data : {7, 27, 39, 48, 68, 70, 72, 96, 99}
  • Median : 68, leaving four values below and four values above

Even Sized Data Set

With an even number of data values, for example 8, we have:

  • Data : {96, 48 ,27 ,72, 39, 70, 7, 68 }
  • Ordered Data :{7, 27, 39, 48, 68, 70, 72, 96}
  • Median : Halfway between the two ‘middle’ data points - in this case halfway between 48 and 68, and so the median is 58

Other Measures

Geometric Mean

Harmonic Mean

\[ H = \frac{n}{ \frac{1}{x_1} + \frac{1}{x_2} + \ldots \frac{1}{x_n} } \]

  • Example 1 - Find the harmonic mean of \(\{7,9\}\).
  • Example 2 - Find the harmonic mean of \(\{7,9,3\}\).
Exercise

Useful Formulas

Skewness: Pearson Coefficient of Skewness

\[ S_k = \frac{3(\mbox{Mean} - \mbox{Median} )}{\sigma} \]

The Coefficient of Variation

What happens if you have two sets of data with two different means and two different standard deviations? How do you decide which set is more spread out? Remember the size of the standard deviation is relative to the mean it is associated with.

The coefficient of variation (cv) is often used to compare the relative dispersion between two or more sets of data. It is formed by dividing the standard deviation by the mean and is usually expressed as a percentage i.e. (multiplied by 100). Again we distinguish between the population and sample coefficient of variation.

Exploratory Data Analysis

Measures of Centrality

Column

Medians

Median

  • The other commonly used measure of centrality is the median.
  • The median is the value halfway through the ordered data set, below and above which there lies an equal number of data values.
  • For an odd sized data set, the median is the middle element of the ordered data set.
  • For an even sized data set, the median is the average of the middle pair of elements of an data set.
  • It is generally a good descriptive measure of the location which works well for skewed data, or data with outliers.
  • For later, the median is the 0.5 quantile, and the second quartile Q_2.

Odd Sized Data Set

With an odd number of data values, for example nine, we have:

  • Data : {96, 48, 27, 72, 39, 70, 7, 68, 99 }
  • Ordered Data : {7, 27, 39, 48, 68, 70, 72, 96, 99}
  • Median : 68, leaving four values below and four values above

Even Sized Data Set

With an even number of data values, for example 8, we have:

  • Data : {96, 48 ,27 ,72, 39, 70, 7, 68 }
  • Ordered Data :{7, 27, 39, 48, 68, 70, 72, 96}
  • Median : Halfway between the two ‘middle’ data points - in this case halfway between 48 and 68, and so the median is 58