4 Describing Data and Results in development


You have already identified (Step 1) your aims, factors and variables for measuring the system that you are investigating.


ACTION: You now need to

If you have multivariate data (i.e. more than one response variable) then you should also look at Step 8.



Set of replicate measurements:


A set of replicate data values is a number of measurements of the 'same thing', e.g. several measurements of pH from the same place in a river, or the salaries of randomly selected workers in the same factory.


The 'average' or 'middle' value is best given by:

> Mean for normally distributed data or for data that is not skewed heavily in one direction

> Median for data that may not be distributed normally, e.g. a typical distribution of salaries is skewed, with a long tail towards high values.
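As a minimal sketch of the difference, the Python snippet below compares the mean and the median of a small set of salaries with one very high value. The figures are invented purely for illustration.

```python
from statistics import mean, median

# Hypothetical annual salaries (invented for illustration); the single high
# value gives the long tail towards high values described above.
salaries = [21000, 22000, 23000, 24000, 25000, 26000, 28000, 95000]

mean_salary = mean(salaries)      # pulled upwards by the one high salary
median_salary = median(salaries)  # middle of the sorted values

print(f"mean   = {mean_salary}")
print(f"median = {median_salary}")
```

In this example the mean lies above seven of the eight salaries, whereas the median sits among the typical values.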


The 'spread' of data values is best given by:

> Standard deviation for normally (or near normally) distributed data

> Interquartile range for any type of data

> Maximum and minimum values and the range are only useful in specific cases where limits are important

> Outliers are data values that are so far from the 'middle' value that they might be errors or anomalies.
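The measures of spread listed above can be sketched in Python as follows. The pH readings are invented, and the rule used to flag outliers (more than 1.5 times the interquartile range beyond the quartiles) is an assumed common convention, not one stated in this guide. Different software packages also use slightly different definitions of the quartiles, so values may not match exactly.

```python
from statistics import quantiles, stdev

# Hypothetical replicate pH readings (invented); 9.1 is a suspect value.
ph = [7.1, 7.2, 7.2, 7.3, 7.3, 7.4, 7.4, 7.5, 9.1]

sd = stdev(ph)                    # sample standard deviation
q1, q2, q3 = quantiles(ph, n=4)   # quartiles; q2 is the median
iqr = q3 - q1                     # interquartile range
data_range = max(ph) - min(ph)    # maximum minus minimum

# Assumed convention: flag values more than 1.5 x IQR outside the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in ph if x < lower_fence or x > upper_fence]

print(f"standard deviation  = {sd:.3f}")
print(f"interquartile range = {iqr:.3f}")
print(f"range               = {data_range:.3f}")
print(f"possible outliers   = {outliers}")
```

With these readings the single suspect value inflates the standard deviation and the range much more than it affects the interquartile range.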


The boxplot is a graphical picture of the interquartile range, median, extreme values and outliers, and is an excellent way of illustrating any data set.


The 'best-estimate' value of the variable being sampled, together with its uncertainty, is best given by:

> Mean plus either standard error of the mean (standard uncertainty) or confidence interval for normal data.

> Median plus confidence interval of the median for non-normal (particularly skewed) data.
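For normal data, the mean with its standard error and a 95% confidence interval could be calculated along the following lines. This is a sketch only: the measurements are invented, and the critical value of Student's t is taken from standard tables for this particular sample size rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical replicate measurements (invented for illustration).
values = [10.2, 9.8, 10.1, 10.4, 9.9, 10.0, 10.3, 9.7, 10.1, 10.5]

n = len(values)
best_estimate = mean(values)
sem = stdev(values) / sqrt(n)   # standard error of the mean

# Two-sided 95% critical value of Student's t for n - 1 = 9 degrees of
# freedom, taken from standard tables; it must be changed if n changes.
T_CRIT_95_DF9 = 2.262

half_width = T_CRIT_95_DF9 * sem
ci_low = best_estimate - half_width
ci_high = best_estimate + half_width

print(f"mean = {best_estimate:.3f}, standard error = {sem:.3f}")
print(f"95% confidence interval: {ci_low:.3f} to {ci_high:.3f}")
```

Whichever form is reported, the result should state clearly whether the quoted uncertainty is a standard error or a confidence interval.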


The confidence interval of the median can be calculated using Minitab.
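For readers without Minitab, one distribution-free approach uses the ordered data values and the binomial distribution. The sketch below is an illustration of that approach under stated assumptions, not a description of Minitab's own procedure, which may differ in detail (for example by interpolating to reach the nominal confidence level). The function name `median_ci` and the data are hypothetical.

```python
from math import comb


def median_ci(values, confidence=0.95):
    """Return (low, high, coverage) for a distribution-free median interval.

    The interval runs from the k-th smallest to the k-th largest value,
    using the largest k whose exact coverage is at least `confidence`.
    """
    data = sorted(values)
    n = len(data)
    best = None
    cumulative = 0.0
    for k in range(1, n // 2 + 1):
        # cumulative = P(B <= k - 1) for B ~ Binomial(n, 0.5)
        cumulative += comb(n, k - 1) / 2 ** n
        coverage = 1 - 2 * cumulative
        if coverage < confidence:
            break
        best = (data[k - 1], data[n - k], coverage)
    if best is None:
        raise ValueError("sample too small for the requested confidence")
    return best


# Hypothetical skewed data (salaries in thousands, invented).
salaries = [21, 22, 23, 24, 25, 26, 28, 31, 40, 95]
low, high, coverage = median_ci(salaries)
print(f"median interval: {low} to {high} (coverage {coverage:.3f})")
```

Because only whole data values can be used as limits, the actual coverage is usually somewhat higher than the level requested.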


Distribution of replicated data values is best described by:

> Frequency column graph (histogram) of data values, which gives a direct visual representation

> Skewness measures the extent to which the data is not symmetrical about the mean value

> Kurtosis measures the extent to which the data might be peaked or flattened.

> Results of a normality test - see Testing

Values of skewness and particularly kurtosis are only useful for large data sets, and would not be routinely used.

Performing a normality test can be useful to check whether there may be a significant deviation in the data from a normal distribution.
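To make the skewness and kurtosis measures concrete, the sketch below computes them from the central moments of the data. These are the simple moment-based forms; statistics packages often apply small-sample adjustments, so their reported values may differ slightly. The function names and data are hypothetical.

```python
from statistics import fmean


def central_moment(values, order):
    """Average of (x - mean) ** order over the data."""
    m = fmean(values)
    return fmean([(x - m) ** order for x in values])


def skewness(values):
    """Moment-based skewness: zero for perfectly symmetrical data."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal distribution gives zero."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


# Hypothetical data sets (invented for illustration).
symmetric = [1, 2, 3, 4, 5]
right_skewed = [21, 22, 23, 24, 25, 26, 28, 31, 40, 95]

print(f"symmetric:    skewness = {skewness(symmetric):.3f}")
print(f"right-skewed: skewness = {skewness(right_skewed):.3f}")
print(f"symmetric:    excess kurtosis = {excess_kurtosis(symmetric):.3f}")
```

A positive skewness indicates a long tail towards high values, and a negative excess kurtosis indicates a distribution flatter than the normal distribution.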



Variation of one variable with respect to another:


Variation of a scale variable with respect to another scale variable

> X-y scatter graph with the response (dependent) variable on the y-axis


Do NOT use a line graph in Excel - it treats the x-axis values as categories, not as scale values.

Do NOT join the data values with a line. A 'best-fit' line can be used - see modelling.

Error bars can (and usually should) be shown, typically as one (sometimes two) standard deviations, and what they represent must be stated in the key.
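As an alternative to Excel, the same kind of graph could be drawn in Python. The sketch below assumes replicate responses at each x value (all data invented) and uses the third-party matplotlib library for the plot; the function name `plot_with_error_bars` and the output file name are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical replicate responses (y) at each setting of a scale variable x.
replicates = {
    1.0: [2.1, 2.3, 2.0],
    2.0: [3.9, 4.2, 4.1],
    3.0: [6.2, 5.8, 6.0],
    4.0: [8.1, 7.7, 8.2],
}

x_values = sorted(replicates)
y_means = [mean(replicates[x]) for x in x_values]
y_sds = [stdev(replicates[x]) for x in x_values]   # one standard deviation


def plot_with_error_bars(path="scatter.png"):
    """Draw an x-y scatter graph with error bars of one standard deviation."""
    import matplotlib.pyplot as plt  # third-party; imported only when plotting

    fig, ax = plt.subplots()
    # fmt="o" draws markers only, so the points are not joined by a line.
    ax.errorbar(x_values, y_means, yerr=y_sds, fmt="o", capsize=3,
                label="mean ± 1 standard deviation")
    ax.set_xlabel("x (scale variable)")
    ax.set_ylabel("y (response variable)")
    ax.legend()   # the key states what the error bars represent
    fig.savefig(path)
```

Calling `plot_with_error_bars()` writes the graph to a file; the x-axis is a true numerical scale, so unevenly spaced x values would be positioned correctly.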


Variation of a scale variable with respect to an ordinal variable

> Line graph

> X-y graph



Variation of a scale variable with respect to a nominal variable

> A variety of options is available, e.g. column graph or pie chart



Frequency data




Multifactorial data


Factor plots

Interaction plots