Measures of Centrality

Measures of centrality give one representative number for the location of the centre of the distribution of data.

The most common measures are the sample mean and the sample median. We must distinguish between a sample mean and a population mean. The sample mean is simply the average of all the items in a sample. The population mean (often represented by the Greek letter \(\mu\)) is the average of all the items in a population. Because a population is usually very large, the population mean is usually an unknown constant.

Descriptive Statistics

Sample Mean

The sample mean is an estimator available for estimating the population mean \(\mu\). It is a measure of location, commonly called the average, often denoted \(\bar{x}\), where \(x\) is the data set.

Its value depends equally on every data value, including any outliers, so it may not appear representative of the central region of a skewed data set.

It is especially useful as a representative of the whole sample in subsequent calculations. The sample mean of a data set is defined as: \[ \bar{x} = \frac{\sum x_i}{n} \] where \(\sum x_i\) is the sum of all the elements of \(x\), and \(n\) is the sample size.

Computing the sample mean

Suppose we roll a die 8 times and get the following scores: \(x = \{ 5, 2, 1, 6, 3, 5, 3, 1\}\)

What is the sample mean of the scores \(\bar{x}\)? \[ \bar{x} = {5 + 2 + 1 + 6 + 3 + 5 + 3 + 1 \over 8 } = {26 \over 8} = 3.25 \]
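
The same calculation can be sketched in a few lines of Python. This is only an illustration of the formula \(\bar{x} = \sum x_i / n\); the function name `sample_mean` and the variable names are assumptions made for this example, not part of the original notes.

```python
def sample_mean(data):
    """Return the sample mean: the sum of the values divided by the sample size."""
    if len(data) == 0:
        raise ValueError("the sample mean is undefined for an empty data set")
    return sum(data) / len(data)


# The eight die scores from the worked example above.
scores = [5, 2, 1, 6, 3, 5, 3, 1]

x_bar = sample_mean(scores)
print(sum(scores), len(scores))  # 26 8
print(x_bar)                     # 3.25
```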

Median

The other commonly used measure of centrality is the median.
The median is the value halfway through the ordered data set, below and above which there lies an equal number of data values.
For an odd sized data set, the median is the middle element of the ordered data set.
For an even sized data set, the median is the average of the middle pair of elements of the ordered data set.
It is generally a good descriptive measure of location that works well for skewed data, or data with outliers.
For later reference, the median is the 0.5 quantile, also known as the second quartile \(Q_2\).

Odd Sized Data Set

With an odd number of data values, for example nine, we have:

Data : {96, 48, 27, 72, 39, 70, 7, 68, 99 }
Ordered Data : {7, 27, 39, 48, 68, 70, 72, 96, 99}
Median : 68, leaving four values below and four values above

Even Sized Data Set

With an even number of data values, for example 8, we have:

Data : {96, 48 ,27 ,72, 39, 70, 7, 68 }
Ordered Data :{7, 27, 39, 48, 68, 70, 72, 96}
Median : Halfway between the two ‘middle’ data points - in this case halfway between 48 and 68, and so the median is 58
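
The odd and even cases above can be sketched as a single Python function. This is a minimal illustration of the rule as stated (middle element for an odd sized set, average of the middle pair for an even sized set); the function name `median` and the variable names are assumptions made for this example.

```python
def median(data):
    """Return the median of a data set using the odd/even rule described above."""
    ordered = sorted(data)          # the rule applies to the ordered data
    n = len(ordered)
    if n == 0:
        raise ValueError("the median is undefined for an empty data set")
    mid = n // 2
    if n % 2 == 1:
        # Odd sized data set: the middle element.
        return ordered[mid]
    # Even sized data set: the average of the middle pair.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_data = [96, 48, 27, 72, 39, 70, 7, 68, 99]
even_data = [96, 48, 27, 72, 39, 70, 7, 68]

print(median(odd_data))   # 68
print(median(even_data))  # 58.0
```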

Other Measures

Geometric Mean
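
As a minimal sketch, assuming the usual definition of the geometric mean for positive data (the \(n\)th root of the product of the \(n\) values), it could be computed as follows. The function name `geometric_mean` and the choice of the data set \(\{7, 9\}\) are illustrative assumptions. The calculation averages logarithms rather than multiplying the values directly, which gives the same result while avoiding very large intermediate products.

```python
import math


def geometric_mean(data):
    """Return the geometric mean of positive values (nth root of their product)."""
    if len(data) == 0 or any(x <= 0 for x in data):
        raise ValueError("the geometric mean needs a non-empty set of positive values")
    # exp(mean of logs) equals the nth root of the product.
    return math.exp(sum(math.log(x) for x in data) / len(data))


values = [7, 9]
g = geometric_mean(values)
print(g)  # close to the square root of 7 * 9 = 63
```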

Harmonic Mean

\[ H = \frac{n}{ \frac{1}{x_1} + \frac{1}{x_2} + \ldots + \frac{1}{x_n} } \]

Example 1 - Find the harmonic mean of \(\{7,9\}\).
Example 2 - Find the harmonic mean of \(\{7,9,3\}\).
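
The two examples can be checked with a short Python sketch of the formula for \(H\). The function name `harmonic_mean` is an assumption made for this example, and exact fractions are used (which assumes integer inputs) so that the results are not affected by rounding.

```python
from fractions import Fraction


def harmonic_mean(data):
    """Return n divided by the sum of reciprocals, as an exact fraction."""
    if len(data) == 0 or any(x <= 0 for x in data):
        raise ValueError("the harmonic mean needs a non-empty set of positive values")
    # Fraction(1, x) assumes each x is an integer, as in the examples.
    return len(data) / sum(Fraction(1, x) for x in data)


h1 = harmonic_mean([7, 9])      # 2 / (1/7 + 1/9) = 2 / (16/63)
h2 = harmonic_mean([7, 9, 3])   # 3 / (1/7 + 1/9 + 1/3) = 3 / (37/63)

print(h1, float(h1))  # 63/8 7.875
print(h2)             # 189/37, approximately 5.108
```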

Useful Formulas

Skewness: Pearson Coefficient of Skewness

\[ S_k = \frac{3(\mbox{Mean} - \mbox{Median} )}{\sigma} \]
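
A minimal Python sketch of this coefficient is given below, using the standard library `statistics` module. One assumption is made loudly here: the sample standard deviation (`statistics.stdev`, with divisor \(n - 1\)) is used in place of \(\sigma\), since only sample data are available. The function name `pearson_skewness` is also an assumption made for this example.

```python
import statistics


def pearson_skewness(data):
    """Return 3 * (mean - median) / standard deviation for a sample."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    # Assumption: the sample standard deviation stands in for sigma.
    sd = statistics.stdev(data)
    if sd == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return 3 * (mean - median) / sd


# The eight die scores used earlier: mean 3.25, median 3.
scores = [5, 2, 1, 6, 3, 5, 3, 1]
s_k = pearson_skewness(scores)
print(s_k)  # positive, because the mean exceeds the median
```

A positive value indicates that the mean lies above the median, and a negative value that it lies below.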

The Coefficient of Variation

What happens if you have two sets of data with two different means and two different standard deviations? How do you decide which set is more spread out? Remember the size of the standard deviation is relative to the mean it is associated with.

The coefficient of variation (cv) is often used to compare the relative dispersion of two or more sets of data. It is formed by dividing the standard deviation by the mean, and is usually expressed as a percentage (i.e. multiplied by 100). Again we distinguish between the population and sample coefficient of variation.
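
As a hedged illustration, the sample coefficient of variation can be sketched in Python as \(100 \times s / \bar{x}\), where \(s\) is the sample standard deviation. The function name `coefficient_of_variation` is an assumption made for this example, and the two data sets are simply reused from earlier in these notes to show a comparison between sets with very different means.

```python
import statistics


def coefficient_of_variation(data):
    """Return the sample coefficient of variation as a percentage."""
    mean = statistics.mean(data)
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined for a zero mean")
    # Sample standard deviation (divisor n - 1) relative to the sample mean.
    return 100 * statistics.stdev(data) / mean


die_scores = [5, 2, 1, 6, 3, 5, 3, 1]
nine_values = [96, 48, 27, 72, 39, 70, 7, 68, 99]

cv_die = coefficient_of_variation(die_scores)
cv_nine = coefficient_of_variation(nine_values)

# The set with the larger percentage is more spread out relative to its mean.
print(f"die scores:  {cv_die:.1f}%")
print(f"nine values: {cv_nine:.1f}%")
```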

Exploratory Data Analysis

Syllabus

Summarise the main features of a data set.

Summarise a set of data using a table or frequency distribution, and display it graphically using a line plot, a box plot, a bar chart, histogram, stem and leaf plot, or other appropriate elementary device.
Describe the level/location of a set of data using the mean, median, mode, as appropriate.
Describe the spread/variability of a set of data using the standard deviation, range, interquartile range, as appropriate.
Explain what is meant by symmetry and skewness for the distribution of a set of data.

Section 1 Basic Summary Statistics

Computing the Median of a Data Set.

Sample Mean, Median and Skewness for Financial Data - Here - [8.55]
Descriptive Statistics - Adjustment of Data and Effect on sample statistics - Here - 3.34
The Geometric Mean - HERE

Section 2a: Descriptive Statistics

Section 2b: Descriptive Statistics for Grouped Data

Mean of Grouped Data
Determining the median from frequency data - Here - [3:47]
Variance of Grouped Data - HERE - [6:24]

Revision Notes

Revision Notes - Here
Variance and Standard Deviation - Here, Here
Coefficient of Variation - Here
Percentiles, Quartiles and the IQR - Here
