Distribution of Variables
It is common practice to take several replicate measurements of the same
variable. This allows experimental variation to be taken into account in the
analysis.
Before you can decide how to plan your experiment and analyse your data, you
need to think about how and why your own data is likely to vary.
ACTION: You need to consider
whether the source of variation is subject-based or measurement-based, and
how the values of each of your different variables will be distributed.
Main Categories of Variability:
Subject variation - the different measurements
relate to different subjects. For example, when comparing the growth of plants
under different soil conditions, the replicate measurements would be the heights
of different plants grown in the same soil conditions. In this case, the
distribution of values will be the distribution of the heights of the plants.
Measurement variation - the different measurements are repeated measurements of the same subject. For example, the single subject may be a drug sample, on which replicate experimental analyses are performed to get a best estimate of the true active content. In this case, the distribution of values will be due to slight variations in the experimental measurements.
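As a rough illustration of the two categories, the sketch below simulates one replicate set of each kind in Python. All of the numbers (plant heights around 30 cm, a drug content of 250 mg, the spreads and the random seed) are invented for illustration, and both sources of variation are simply assumed to be normally distributed.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Subject variation: 10 different plants grown in the same soil.
# Assumed (hypothetical) mean height 30 cm, plant-to-plant SD 4 cm.
plant_heights = [random.gauss(30.0, 4.0) for _ in range(10)]

# Measurement variation: 10 repeated assays of one drug sample.
# Assumed (hypothetical) true content 250 mg, assay SD 2 mg.
assay_results = [random.gauss(250.0, 2.0) for _ in range(10)]

for label, values in [("plant heights (cm)", plant_heights),
                      ("assay results (mg)", assay_results)]:
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    print(f"{label}: mean = {mean:.1f}, sd = {sd:.1f}")
```

In both cases the replicates are summarised in the same way, but the spread describes different things: real differences between plants in the first set, and imprecision of the assay in the second.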
Normal Distribution:
The starting point is often to ask yourself - 'Is my data normally distributed?'
For example, if you use a spectrophotometer to make repeated (replicate) measurements of the absorbance of the same solution, you might expect to get a range of slightly different values centred symmetrically on a mean value. The greatest number of values would be close to the mean value with a decreasing probability of finding a value at greater distances from the mean.
This is the well-known 'bell-shaped' curve described by the normal distribution.
For further information on testing whether data is normally distributed, see Statistical Test for Normality.
Many techniques of statistical analysis assume that the data is normally distributed, e.g. t-tests, ANOVAs.
Thankfully, most replicate experimental measurements do follow a normal distribution, particularly when the random variations are small (less than about 20%) compared to the mean value.
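The spectrophotometer example can be sketched as a small simulation. The true absorbance (0.500), the size of the random error (0.010, i.e. 2% of the mean) and the number of readings are hypothetical values chosen for illustration. The sketch checks how many simulated readings fall within one and two standard deviations of the mean, which normal theory puts at about 68% and 95%.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

TRUE_ABSORBANCE = 0.500  # hypothetical true value
MEASUREMENT_SD = 0.010   # hypothetical random error (2% of the mean)

# Simulate a large number of replicate readings of the same solution.
readings = [random.gauss(TRUE_ABSORBANCE, MEASUREMENT_SD)
            for _ in range(10_000)]

mean = statistics.mean(readings)
sd = statistics.stdev(readings)


def fraction_within(values, centre, half_width):
    """Fraction of values lying within centre +/- half_width."""
    inside = sum(1 for v in values if abs(v - centre) <= half_width)
    return inside / len(values)


within_1sd = fraction_within(readings, mean, sd)
within_2sd = fraction_within(readings, mean, 2 * sd)

print(f"mean = {mean:.4f}, sd = {sd:.4f}, "
      f"relative sd = {100 * sd / mean:.1f}%")
print(f"within 1 sd: {within_1sd:.1%} (normal theory: about 68%)")
print(f"within 2 sd: {within_2sd:.1%} (normal theory: about 95%)")
```

A real experiment would have far fewer than 10,000 replicates, so the observed fractions would match the theoretical values much less closely.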
Deciding that your data is Not Normal:
It is important for you to know if your data is not normally distributed:
If your variations are subject-based, then you
must consider the known statistical variations in your subject. For example,
many biological systems have a skewed growth distribution, with a longer
tail towards the high growth side of the mean. In some cases, skewed
distributions can be 'transformed' to near normal distributions by applying a
mathematical transformation to the data values.
If you are analysing proportions/percentages,
the distribution becomes skewed for proportions near 0 or 1, and it may be
necessary to use a transformation to get a near normal distribution.
Where you suspect that the data might not be
normal, then you can use a test for normality. These hypothesis tests
will not confirm that your data is normal, but will only indicate if it is
significantly not normal. The tests do not work well with only a few
data values.
If you decide that it is safest to assume that your
data is not normal, then you will need to use appropriate statistical
analyses, e.g. non-parametric tests.
For student projects, you might wish to demonstrate your understanding of standard parametric tests (e.g. t-tests, ANOVAs) by using them with your non-normal data, but you must indicate that the results will be unreliable.
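A test for normality of the kind described above can be sketched with the Python standard library. The text does not name a particular test. The Jarque-Bera test is used here only because it is simple to write out by hand, and statistical packages offer other tests. It combines the sample skewness and kurtosis into one statistic and uses a large-sample approximation for the p-value. The three data sets are idealised quantile values, not real measurements.

```python
import math
from statistics import NormalDist


def jarque_bera(values):
    """Return (JB statistic, approximate p-value) for a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    # Central moments (population form, dividing by n).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # equals 3 for a normal distribution
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    # Under normality JB is approximately chi-square with 2 degrees of
    # freedom, whose upper-tail probability is exp(-x / 2).
    p_value = math.exp(-jb / 2)
    return jb, p_value


def evenly_spaced_probabilities(n):
    """Return n probabilities strictly between 0 and 1."""
    return [(i + 0.5) / n for i in range(n)]


# Idealised (not random) samples built from distribution quantiles.
near_normal = [NormalDist(10, 1).inv_cdf(p)
               for p in evenly_spaced_probabilities(200)]
skewed_large = [-math.log(1 - p) for p in evenly_spaced_probabilities(200)]
skewed_small = [-math.log(1 - p) for p in evenly_spaced_probabilities(8)]

for label, data in [("near normal, n=200", near_normal),
                    ("skewed, n=200", skewed_large),
                    ("skewed, n=8", skewed_small)]:
    jb, p = jarque_bera(data)
    verdict = ("significantly not normal" if p < 0.05
               else "no evidence against normality")
    print(f"{label}: JB = {jb:.2f}, p = {p:.3g} -> {verdict}")
```

The skewed sample of 8 values is not flagged even though it comes from the same skewed shape as the sample of 200, which illustrates why such tests are of limited use with only a few data values. A large p-value only means that non-normality was not detected. It does not confirm that the data is normal.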
Transforming Data to a Near Normal Distribution:
For certain non-normal distributions, it is possible to apply a mathematical transformation that will allow the data to be treated as though it were normal.
See 8 Data Analysis > Data Transformation
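As a minimal sketch of such a transformation, the example below applies a logarithmic transformation to simulated right-skewed data. The log transformation is an assumption made for illustration. It is one common choice for positive data with a long upper tail, and other transformations may suit other distributions. The simulated 'growth' values and their parameters are hypothetical.

```python
import math
import random

random.seed(7)  # fixed seed so the sketch is repeatable


def skewness(values):
    """Sample skewness: 0 for symmetric data, positive for a long upper tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed growth data (log-normal, so all values > 0).
growth = [random.lognormvariate(1.0, 0.5) for _ in range(2000)]

# The log transformation requires strictly positive values.
log_growth = [math.log(v) for v in growth]

print(f"skewness before transformation: {skewness(growth):.2f}")
print(f"skewness after log transformation: {skewness(log_growth):.2f}")
```

The skewness of the transformed values should be close to zero, so parametric tests can then be applied to the transformed values. Any results need to be interpreted on the transformed scale or converted back to the original units.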