4 Describing Data and Results in development

 

You have already identified (step 1) your aims, factors and variables in relation to measuring the system that you are investigating.

 

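ACTION: You now need to describe your data and results, choosing from the summary statistics and graphs below those that suit each type of data you have.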

If you have multivariate data (i.e. more than one response variable) then you should also look at Step 8.

 

 

Set of replicate measurements:

 

A set of replicate data values is a number of measurements of the 'same thing', e.g. several measurements of pH taken at the same place in a river, or the salaries of randomly selected workers in the same factory.

 

The 'average' or 'middle' value is best given by:

> Mean for normally distributed data or for data that is not skewed heavily in one direction

> Median for data that may not be normally distributed, e.g. salaries, whose distribution is typically skewed, with a long tail towards high values.
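The following minimal Python sketch shows why the median suits skewed data. It uses only the standard `statistics` module to compare the two measures on a small set of hypothetical salaries, invented for illustration.

```python
from statistics import mean, median

# Hypothetical annual salaries (in thousands) for ten workers; the single
# high value gives the long right-hand tail typical of salary data.
salaries = [21, 22, 23, 24, 25, 26, 27, 28, 30, 120]

print(f"mean   = {mean(salaries):.1f}")    # 34.6, pulled up by the tail
print(f"median = {median(salaries):.1f}")  # 25.5, close to a typical worker
```

A single extreme value moves the mean by several thousand. The median is unaffected.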

 

The 'spread' of data values is best given by:

> Standard deviation for normally (or near normally) distributed data

> Interquartile range for any type of data

> Maximum and minimum values and the range are only useful in specific cases where limits are important

> Outliers are data values that are so far from the 'middle' value that they might be errors or anomalies.

Notes:

The boxplot is a graphical summary of the median, interquartile range, extreme values and outliers, and is an excellent way of illustrating any data set.
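The spread measures above, and the values a boxplot draws, can be sketched in Python with the standard `statistics` module. The function name `describe_spread` is hypothetical. The outlier rule is also an assumption, because the text does not fix one: it flags points more than 1.5 x IQR beyond the quartiles, which is the usual boxplot convention. Different packages use slightly different quartile methods, so their values may differ a little.

```python
from statistics import stdev, quantiles

def describe_spread(data):
    """Spread measures plus the quantities a boxplot displays.

    Assumption: outliers follow Tukey's 1.5 x IQR rule (common boxplot
    convention); other packages may use other rules or quartile methods.
    """
    x = sorted(data)
    q1, q2, q3 = quantiles(x, n=4, method="exclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in x if lo_fence <= v <= hi_fence]
    return {
        "sd": stdev(x),                       # sample standard deviation (n - 1)
        "q1": q1, "median": q2, "q3": q3,
        "iqr": iqr,
        "min": x[0], "max": x[-1], "range": x[-1] - x[0],
        "whiskers": (inside[0], inside[-1]),  # most extreme non-outlying values
        "outliers": [v for v in x if v < lo_fence or v > hi_fence],
    }

# Hypothetical salaries (thousands) with one very high value.
s = describe_spread([21, 22, 23, 24, 25, 26, 27, 28, 30, 120])
print(s["iqr"], s["outliers"])  # 5.75 [120]
```

The single outlier inflates the range (99) and the standard deviation. The interquartile range stays small, which is why it works for any type of data.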

 

The 'best-estimate' value of the variable being sampled, together with its uncertainty, is best given by:

> Mean, plus either the standard error of the mean (standard uncertainty) or a confidence interval, for normal data.

> Median, plus a confidence interval for the median, for non-normal (particularly skewed) data.

Notes:

The confidence interval of the median can be calculated using Minitab.
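The Python sketch below computes both best-estimate summaries using the standard library only. The function names are hypothetical. For the mean, the Student t critical value is passed in rather than computed; it can be read from tables, e.g. 2.262 for a 95% interval with n = 10. For the median, the sketch substitutes the binomial order-statistic (sign-test) method. Minitab may also report an interpolated interval, so its limits can differ slightly from these.

```python
from math import comb, sqrt
from statistics import mean, median, stdev

def mean_with_uncertainty(data, t_crit):
    """Mean, standard error of the mean, and confidence interval (normal data).

    t_crit: two-sided Student t critical value for n - 1 degrees of freedom,
    taken from tables (assumption: supplied by the caller).
    """
    n = len(data)
    m = mean(data)
    se = stdev(data) / sqrt(n)          # standard uncertainty of the mean
    return m, se, (m - t_crit * se, m + t_crit * se)

def median_with_ci(data, conf=0.95):
    """Median with a distribution-free CI from order statistics.

    [x_(k), x_(n-k+1)] contains the population median with probability
    1 - 2 P(B <= k - 1), where B ~ Binomial(n, 1/2).
    """
    x = sorted(data)
    n = len(x)

    def cdf(k):                         # P(B <= k) for B ~ Binomial(n, 1/2)
        return sum(comb(n, i) for i in range(k + 1)) / 2 ** n

    k = 0
    for j in range(1, n // 2 + 1):
        if 1 - 2 * cdf(j - 1) < conf:
            break
        k = j                           # narrowest interval still reaching conf
    if k == 0:
        raise ValueError("sample too small for the requested confidence level")
    coverage = 1 - 2 * cdf(k - 1)       # achieved confidence level
    return median(x), (x[k - 1], x[n - k]), coverage

ph = [7.2, 7.4, 7.1, 7.3, 7.5, 7.2, 7.3, 7.4, 7.0, 7.6]  # hypothetical pH replicates
print(mean_with_uncertainty(ph, t_crit=2.262))
print(median_with_ci(ph))
```

Because the median interval can only end on observed values, its achieved confidence is usually above the requested level (about 97.9% for n = 10). This is why the achieved value is returned alongside the interval.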

 

The distribution of replicate data values is best described by:

> Frequency column graph of data values gives a direct visual representation

> Skewness measures the extent to which the data is not symmetrical, i.e. one tail of the distribution is longer than the other

> Kurtosis measures the extent to which the data might be more peaked or flatter than a normal distribution.

> Results of a normality test - see Testing

Notes:
Values of skewness and particularly kurtosis are only useful for large data sets, and are not routinely used.

Performing a normality test can be useful to check whether there may be a significant deviation in the data from a normal distribution.
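The Python sketch below shows the frequency table behind a column graph, together with moment-based skewness and excess kurtosis. The function names and the fixed-width binning are assumptions. Packages such as Excel (`SKEW`, `KURT`) apply small-sample adjustments, so their values will differ slightly from these.

```python
from collections import Counter
from statistics import fmean

def frequency_table(data, bin_width):
    """Count of values in each bin, keyed by the bin's lower edge (for a column graph)."""
    counts = Counter((v // bin_width) * bin_width for v in data)
    return dict(sorted(counts.items()))

def skewness_kurtosis(data):
    """Moment-based sample skewness g1 and excess kurtosis g2.

    g1 is 0 for symmetric data; g2 is 0 for a normal distribution,
    positive for a more peaked shape and negative for a flatter one.
    """
    n = len(data)
    m = fmean(data)
    m2 = sum((v - m) ** 2 for v in data) / n
    m3 = sum((v - m) ** 3 for v in data) / n
    m4 = sum((v - m) ** 4 for v in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

# Hypothetical salaries (thousands): the long right tail gives positive skewness.
salaries = [21, 22, 23, 24, 25, 26, 27, 28, 30, 120]
print(frequency_table(salaries, 10))
print(skewness_kurtosis(salaries))
```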

 

 

Variation of one variable with respect to another:

 

Variation of a scale variable with respect to another scale variable

> X-y scatter graph with the response (dependent) variable on the y-axis

Notes:

Do NOT use Excel's 'Line' chart type: it treats the x-axis values as evenly spaced categories rather than as scale values. Use the 'Scatter' chart type instead.

Do NOT join the data values with a line. A 'best-fit' line can be used - see modelling.

Error bars should usually be shown, normally as one (sometimes two) standard deviations either side of the mean. State which is used in the key.
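When each x value has several replicate y measurements, the points and error bars for the scatter graph can be prepared as in the Python sketch below. The function name is hypothetical. For each x level, it returns the mean of the replicates and k standard deviations. These could then be passed to a plotting routine, such as matplotlib's `errorbar` as its `yerr` values.

```python
from collections import defaultdict
from statistics import mean, stdev

def points_with_error_bars(xs, ys, k=1):
    """Group replicate y values by x and return (x, mean_y, k * sd_y) per x level.

    k is the number of standard deviations (usually 1, sometimes 2) used for
    the error bar. Each x level needs at least two replicates for stdev.
    """
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    return [(x, mean(g), k * stdev(g)) for x, g in sorted(groups.items())]

# Hypothetical data: three replicate rate measurements at each temperature.
temps = [10, 10, 10, 20, 20, 20, 30, 30, 30]
rates = [1.0, 1.2, 1.1, 2.0, 2.2, 2.4, 3.1, 2.9, 3.3]
for x, y, err in points_with_error_bars(temps, rates):
    print(f"x = {x}: {y:.2f} +/- {err:.2f} (1 SD)")
```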

 

Variation of a scale variable with respect to an ordinal variable

> Line graph

> X-y graph

Notes:

 

Variation of a scale variable with respect to a nominal variable

> A variety of options is available, e.g. column graph, pie chart

Notes:

 

Frequency data

 

Cross-tabulation

 

Multifactorial data

 

Factor plots

Interaction plots