Hypothesis Testing for Proportions

Study Text: "Essential Mathematics and Statistics for Science", 2nd ed, G Currell and A A Dowman (Wiley-Blackwell)

Introduction

An experimental proportion, P, is defined by observing r specific outcomes (or ‘events’) occurring in a total of n ‘trials’:
P = r / n
For example, (r =) 11 red cars out of a total of (n =) 87 cars would represent a proportion of (P = r/n =) 11/87 = 0.126 (3 sf).
(A capital P is used here to avoid confusion with the p-value reported as a test result.)

It is important to remember that the test calculations use the raw data values of r events and n trials, and these values must be retained from any experiment.

This guide uses a case study (below) to introduce two main analytical processes for calculating and testing for differences in proportions:
1. Fisher’s Exact Test, with exact calculations based on binomial statistics.
2. Approximate calculations based on the normal distribution.

An advantage of the normal distribution approximation has been that the calculations can be performed easily without software, although Minitab (for example) can now apply either method very easily.
The normal approximation can produce inaccurate results when the number of trials, n, is low or when the proportion, P, is close to either 0.0 or 1.0, but the method remains useful for some particular calculations (e.g. Minitab uses the normal approximation when a non-zero difference between two proportions is entered for the null hypothesis).
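
The gap between the two approaches for a small number of trials can be illustrated with a minimal Python sketch. The data (r = 3, n = 10, null proportion 0.1) are hypothetical and are not part of the case study. The function names are illustrative only, and the approximation is applied here without a continuity correction, which is an assumption rather than a statement of how any particular package works.

```python
from math import comb, erfc, sqrt


def exact_upper_tail(r, n, p0):
    """Exact one-sided p-value: P(X >= r) for X ~ Binomial(n, p0)."""
    return sum(comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(r, n + 1))


def normal_upper_tail(r, n, p0):
    """One-sided p-value from the normal approximation (no continuity correction)."""
    z = (r / n - p0) / sqrt(p0 * (1 - p0) / n)
    return 0.5 * erfc(z / sqrt(2))  # upper-tail area of the standard normal


# Hypothetical small sample, chosen only to show the discrepancy
r, n, p0 = 3, 10, 0.1
p_exact = exact_upper_tail(r, n, p0)
p_normal = normal_upper_tail(r, n, p0)
print(f"exact p = {p_exact:.4f}, normal approximation p = {p_normal:.4f}")
```

For these hypothetical values the exact p-value is about 0.070, whereas the approximation gives roughly 0.018, so the two methods would lead to different conclusions at the 0.05 level.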

An introduction to tests for proportion is given in Section 14.3 (p343) of the Study Text. The relevant theory for using the normal approximation can be viewed via the link – Tests for Proportion - using Normal Distribution.

Case Study

In association with a drugs investigation, police search two houses A and B. In house A they find a store of 500 sterling £20 notes, and in house B they find a store of 100 sterling £20 notes.
They analyse the notes to identify those with significant drug contamination, and find that, in house A, 129 of the 500 notes were contaminated and, in house B, 18 of the 100 notes were contaminated.
It is estimated that the proportion of contaminated notes in normal circulation in the surrounding area is 0.09.

The occupants of house B claim that their notes came from the general circulation in the local area, but the police claim that the notes in house B showed increased contamination because they came from the same source as those in house A.

It is possible to perform two tests with the following hypotheses:

One Proportion Test

Null Hypothesis: The notes in house B came from a source population with the same proportion of contaminated notes as in the local area.
Proposed Hypothesis: The notes in house B came from a source population with a greater proportion of contaminated notes than in the local area.
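
As a rough illustration of the one proportion test, the sketch below applies both approaches to the house B data (r = 18, n = 100) against the local circulation proportion of 0.09, using a one-tailed test in the 'greater than' direction. The function names are hypothetical. Using the null-hypothesis proportion in the standard error, and omitting a continuity correction, are assumptions of this sketch, so results from Minitab or other software may differ slightly.

```python
from math import comb, erfc, sqrt


def binomial_upper_tail(r, n, p0):
    """Exact one-sided p-value: P(X >= r) for X ~ Binomial(n, p0)."""
    return sum(comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(r, n + 1))


def one_proportion_z(r, n, p0):
    """z statistic and one-sided (upper-tail) p-value from the normal approximation."""
    z = (r / n - p0) / sqrt(p0 * (1 - p0) / n)  # standard error uses p0 (assumption)
    return z, 0.5 * erfc(z / sqrt(2))


# House B: 18 contaminated notes out of 100; local circulation proportion 0.09
r_B, n_B, p_local = 18, 100, 0.09
p_exact = binomial_upper_tail(r_B, n_B, p_local)
z, p_normal = one_proportion_z(r_B, n_B, p_local)
print(f"P = {r_B / n_B:.2f}, exact p = {p_exact:.5f}")
print(f"z = {z:.3f}, normal approximation p = {p_normal:.5f}")
```

A p-value below the chosen significance level (e.g. 0.05) would lead to rejecting the null hypothesis in favour of the proposed hypothesis.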

Two Proportion Test

Null Hypothesis: The notes in houses A and B came from source populations with the same proportion of contaminated notes.
Proposed Hypothesis: The notes in houses A and B came from source populations with different proportions of contaminated notes.
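
The two proportion test can be sketched in the same way for house A (129 of 500) and house B (18 of 100), using a two-tailed test. The exact calculation below uses the hypergeometric probabilities of the 2x2 table with fixed margins, and takes the two-tailed p-value as the sum over all tables that are no more probable than the observed one; this is one common convention and is an assumption here. The normal approximation uses a pooled estimate of the proportion, also an assumption. All function names are illustrative.

```python
from math import comb, erfc, sqrt


def fisher_two_sided(r1, n1, r2, n2):
    """Two-sided Fisher exact p-value for a 2x2 table with fixed margins."""
    total_r, total_n = r1 + r2, n1 + n2
    denom = comb(total_n, total_r)

    def prob(k):
        # Hypergeometric probability that sample 1 holds k of the total_r events
        return comb(n1, k) * comb(n2, total_r - k) / denom

    k_min, k_max = max(0, total_r - n2), min(n1, total_r)
    p_obs = prob(r1)
    # Sum every possible table that is no more probable than the observed one
    return sum(p for p in map(prob, range(k_min, k_max + 1)) if p <= p_obs * (1 + 1e-9))


def two_proportion_z(r1, n1, r2, n2):
    """z statistic and two-sided p-value using a pooled proportion (assumption)."""
    pooled = (r1 + r2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (r1 / n1 - r2 / n2) / se
    return z, erfc(abs(z) / sqrt(2))  # equals 2 * upper-tail area beyond |z|


# House A: 129 of 500 contaminated; house B: 18 of 100 contaminated
p_fisher = fisher_two_sided(129, 500, 18, 100)
z, p_normal = two_proportion_z(129, 500, 18, 100)
print(f"Fisher exact p = {p_fisher:.4f}")
print(f"z = {z:.3f}, normal approximation p = {p_normal:.4f}")
```

Here a p-value above the chosen significance level would mean that the data do not provide sufficient evidence that the two houses drew their notes from sources with different proportions of contamination.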

In the above tests, assume that the notes in each sample are randomly distributed.

This on-line Study Guide has been developed by Graham Currell in association with:
University of the West of England,
"Essential Mathematics and Statistics for Science", 2nd Edition,
Graham Currell and Antony Dowman, Wiley-Blackwell, 2009