Distribution of Variables

It is common practice to take several replicate measurements of the same variable, which allows experimental variation to be taken into account in the analysis.
Before you can decide how to plan your experiment and analyse your data, you need to think about how and why your own data is likely to vary.

ACTION: You need to consider

• whether the source of variation is subject-based or measurement-based, and

• how the values of each of your different variables will be distributed.

Main Categories of Variability:

• Subject variation - the different measurements relate to different subjects. For example, when comparing the growth of plants under different soil conditions, the replicate measurements would be the heights of different plants grown in the same soil conditions. In this case, the distribution of values will be the distribution of the heights of the plants.

• Measurement variation - the different measurements are repeated measurements of the same subject. For example, the single subject may be a drug sample, on which replicate experimental analyses are performed to obtain a best estimate of the true active content. In this case, the distribution of values will be due to slight variations in the experimental measurements.

Normal Distribution:

The starting point is often to ask yourself: 'Is my data normally distributed?'

For example, if you use a spectrophotometer to make repeated (replicate) measurements of the absorbance of the same solution, you might expect to get a range of slightly different values centred symmetrically on a mean value. The greatest number of values would be close to the mean value with a decreasing probability of finding a value at greater distances from the mean.

This is the well-known 'bell-shaped' curve described by the normal distribution.
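The spectrophotometer example can be sketched as a small simulation. This is only an illustration under stated assumptions: the 'true' absorbance (0.500) and the size of the measurement noise (0.010) are hypothetical values, and the readings are generated from a normal distribution rather than taken from a real instrument.

```python
import random
import statistics

# Hypothetical values: a "true" absorbance of 0.500 with random
# measurement noise of standard deviation 0.010 (both are assumptions).
TRUE_ABSORBANCE = 0.500
NOISE_SD = 0.010

rng = random.Random(42)  # fixed seed so the run is repeatable
readings = [rng.gauss(TRUE_ABSORBANCE, NOISE_SD) for _ in range(1000)]

mean = statistics.mean(readings)
sd = statistics.stdev(readings)  # sample standard deviation

# Fraction of readings within one standard deviation of the mean;
# for a normal distribution this is expected to be about 68%.
within_1sd = sum(abs(r - mean) <= sd for r in readings) / len(readings)

print(f"mean = {mean:.4f}, sd = {sd:.4f}")
print(f"fraction within 1 sd of the mean = {within_1sd:.2f}")
```

Most of the simulated readings fall close to the mean, with fewer and fewer at greater distances, which is the pattern described above.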

For further information, see Statistical Test for Normality.

Many techniques of statistical analysis assume that the data is normally distributed, e.g. t-tests, ANOVAs.

Thankfully, most repeated experimental measurements do follow an approximately normal distribution, particularly when the random variations are small (less than about 20%) compared to the mean value.
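One way to judge whether your random variations are small compared to the mean is to express the standard deviation as a fraction of the mean. The sketch below assumes that this ratio is what the 20% guideline refers to; the readings are hypothetical values chosen for illustration, and `relative_variation` is a helper name invented here.

```python
import statistics

# Hypothetical replicate readings (illustrative values, not real data).
readings = [0.512, 0.498, 0.505, 0.491, 0.509]

def relative_variation(values):
    """Return the sample standard deviation as a fraction of the mean."""
    return statistics.stdev(values) / statistics.mean(values)

cv = relative_variation(readings)
print(f"relative variation = {cv:.1%}")
print("below the ~20% guideline" if cv < 0.20 else "above the ~20% guideline")
```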

Deciding that your data is Not Normal:

It is important for you to know if your data is not normally distributed:

• If your variations are subject-based, then you must consider the known statistical variations in your subject. For example, many biological systems have a skewed growth distribution, with a longer tail on the high-growth side of the mean. In some cases, a skewed distribution can be converted to a near-normal distribution by applying a mathematical transformation.

• If you are analysing proportions/percentages, the distribution becomes skewed for proportions near 0 or 1, and it may be necessary to use a transformation to obtain a near-normal distribution.

• Where you suspect that the data might not be normal, you can use a test for normality. These hypothesis tests cannot confirm that your data is normal; they can only indicate whether it is significantly not normal. The tests do not work well with only a few data values, because a small sample gives them little power to detect non-normality.

• If you decide that it is safest to assume that your data is not normal, then you will need to use appropriate statistical analyses, e.g. non-parametric tests.

• For student projects, you might wish to demonstrate your understanding of standard parametric tests (e.g. t-tests, ANOVAs) by using them with your non-normal data, but you must indicate that the results will be unreliable.
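As an illustration of how a test for normality behaves, the sketch below implements one possible test, the Jarque-Bera test, which is a choice made here and not necessarily the test used elsewhere in this guide. It combines the skewness and kurtosis of the sample into one statistic. The p-value relies on a large-sample approximation, so it should not be trusted for only a few data values. The data sets are simulated, and the function name `jarque_bera` is introduced only for this example.

```python
import math
import random

def jarque_bera(values):
    """Return (JB statistic, approximate p-value) for a list of numbers.

    The p-value uses the large-sample chi-square approximation with
    2 degrees of freedom, whose upper tail is exactly exp(-JB / 2).
    """
    n = len(values)
    mean = sum(values) / n
    # Central moments (population form, dividing by n).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)
    return jb, math.exp(-jb / 2.0)

rng = random.Random(1)  # fixed seed so the run is repeatable
normal_like = [rng.gauss(10.0, 1.0) for _ in range(500)]  # simulated normal data
skewed = [rng.expovariate(1.0) for _ in range(500)]       # simulated skewed data

for label, data in (("normal-like", normal_like), ("skewed", skewed)):
    jb, p = jarque_bera(data)
    verdict = "significantly not normal" if p < 0.05 else "no evidence against normality"
    print(f"{label}: JB = {jb:.2f}, p = {p:.3g} -> {verdict}")
```

Note that a large p-value only means the test found no evidence against normality; it does not prove that the data is normal.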

Transforming Data to a Near Normal Distribution:

For certain non-normal distributions, it is possible to apply a mathematical transformation that will allow the data to be treated as though it were normal.
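A minimal sketch of the idea, assuming a logarithmic transformation (one common choice for right-skewed, strictly positive data; other transformations may suit other distributions). The data is simulated from a log-normal distribution, and the helper `mean_median_gap` is a name invented here: it measures how far the mean sits from the median, in units of the standard deviation, which should be close to zero for a symmetric distribution.

```python
import math
import random
import statistics

def mean_median_gap(values):
    """Return (mean - median) / sd, a simple indicator of asymmetry."""
    return (statistics.mean(values) - statistics.median(values)) / statistics.stdev(values)

rng = random.Random(7)  # fixed seed so the run is repeatable
# Hypothetical right-skewed data, drawn from a log-normal distribution.
raw = [rng.lognormvariate(0.0, 0.6) for _ in range(2000)]
logged = [math.log(v) for v in raw]  # log transform (requires values > 0)

raw_gap = mean_median_gap(raw)
log_gap = mean_median_gap(logged)
print(f"raw data:        gap = {raw_gap:.3f}")
print(f"log-transformed: gap = {log_gap:.3f}")
```

The raw data shows a clear gap between mean and median (the long upper tail pulls the mean upwards), whereas the gap for the transformed data should be close to zero.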

See 8 Data Analysis > Data Transformation