Data Analysis, Error and Uncertainty - data analysis using Excel Produced by Graham Currell, University of the West of England, Bristol, and David Read, University of Southampton, in association with:
Royal Society of Chemistry, 'Discover Maths for Chemists' website
, and

Essential Mathematics and Statistics for Science, 2nd Edition
Graham Currell and Antony Dowman, Wiley-Blackwell, 2009

This study unit uses video to demonstrate the use of Excel in the analysis of experimental data and its uncertainty. The Excel files used in the data analysis examples and videos can be downloaded here:
ExcelDataUncert01.xlsx for analyses 1 and 2, and
BeersLaw.xls for analysis 3

The study unit is divided into four main sections:

Introduction - provides an overview of the important methods of data analysis using Excel, together with links to video tutorials on basic skills and self-assessment study guide/tutorials on linear regression.

1. Analysis of replicate data - demonstrates the use of equations, functions and data analysis tools, to interpret the results of repeated measurements of a single experimental value. The data represents replicate measures of the pressure, p, of a gas.

2. Analysis of linear data - demonstrates the use of regression analysis and graphical presentation to interpret the experimental results for a linear relationship between two variables. The data uses the variation of pressure, p, against temperature, T, of an ideal gas.

3. Analysis of linear calibration data - demonstrates the analysis of spectrophotometric data, using correlation coefficients, data residuals, and a calculation of the 95% confidence interval of the measurement of concentration using the calibration line of best-fit.

Introduction

It is possible to:

• Use Excel functions to perform specific calculations
e.g. =SQRT(B4) will calculate the square root of the value in cell B4.
• Write equations directly into Excel cells,
e.g. =B5*B6/SQRT(B4) will multiply the contents of B5 and B6 and divide by the square root of B4.
• Use Data Analysis tools. These are not normally loaded when Excel is first installed, but can be added later (see video) using Add-Ins.

Additional video help on the use of Excel 2007 is available:

Important notes on calculations in Excel:

• Equations and functions are dynamic. Their results change if the source data is changed, e.g. the value of SQRT(B4) will change if the value in B4 is changed.
• The calculations in the Data Analysis tools are static. They are a single-shot calculation, whose results do NOT change if the source data changes.

Study Guide for the straight line and linear regression:

Analysis 1: Experimental uncertainty (error) in replicate measurements

It is common (and good) practice to obtain repeated (replicate) measurements of the same experimental value, e.g. measurements of gas pressure (in pascals):

 p / Pa 5246 5471 5821 5321 5537 5329

This data is analysed using:

Equations and functions to calculate - see video Replicate Measurements A

• Mean value, x, (p171) using function AVERAGE(data)
• Sample standard deviation, s, (p173) using function STDEV(data)
• Sample size, n, using function COUNT(data)
• Standard uncertainty, u, (p221) using the equation, u = s/√n
• Degrees of freedom, df, (p222) using the equation, df = n - 1
• t-value (p222) for 95% confidence using the function = TINV(0.05,df)
• 95% Confidence deviation, Cd, (p221) using the equation, Cd  = t*s/√n
• 95% Confidence interval (p221), minimum and maximum values, using CI = x ± Cd
(NB Do NOT use the function CONFIDENCE for experimental sample data)

(The page numbers given in brackets above refer to the text: Essential Mathematics and Statistics for Science, 2nd Ed, by G Currell and A A Dowman)

The Descriptive Statistics option in Data Analysis tools to - see video Replicate Measurements B

• Calculate the same values as obtained by the equations and functions above

Excel demonstration of the effect of random experimental variations - see video Replicate Measurements C

Analysis 2: Experimental uncertainty (error) in simple linear data plot

A typical set of linear data can be described by the change of the pressure, p, (in pascals) of an ideal gas as a function of the temperature, T, in degrees kelvin.

 T /K 298 328 358 388 418 448 478 508 p /Pa 5606 5890 6405 6997 7172 8160 8518 9218

Important Note:
Regression calculations to find the line of ‘best-fit’ assume that the main errors (uncertainties) occur in the y-values with negligible error in the x-values.

This data is analysed using:

X-Y plot of data including  - see video Regression Analysis A

• Trendlines projected back to the y-axis
• Forcing the trendline through the origin of the graph
• Error bars based on a specified value

Functions to calculate  - see video Regression Analysis A

• Slope of the trendline using SLOPE(y-data,x-data)
• Intercept (on the y-axis) of the trendline using INTERCEPT(y-data,x-data)
• Slope of a trendline which is forced through the origin using LINEST(y-data,x-data,FALSE)
• Standard deviation of the experimental data using STEYX(y-data,x-data)

Data Analysis tools to perform a Regression analysis to  - see video Regression Analysis A

• Calculate the same values as obtained by the functions above
• Uncertainties in the values for slope and intercept

Excel demonstration of the effect of random experimental variations, including - see video Regression Analysis B

• Error bars based on a data range

Analysis 3: Experimental uncertainty (error) in using linear calibration

A typical set of linear data can be described by the plot of the absorbance, A, of a solution as a function of its concentration, C, in mg dm-3.

 C / mg dm-3 0 20 40 60 80 100 120 A 0 0.267 0.583 0.824 1.12 1.313 1.499

A solution of unknown concentration gives 3 replicate measures of absorbance: 0.763, 0.741 and 0.749, and we use the calibration data to calculate a best-estimate for the true concentration of this solution.

This data is analysed using:

An analysis of Residuals and Corrrelation coefficients to  - see video Beers Law v1

• Identify curvature of the calibration data at high concentrations (x-values)
• Select an appropriate linear calibration range

Functions and equations to calculate  - see video Beers Law v2

• Best estimate of the unknown concentration using both a free-fit trendline and a trendline forced through the origin.

Functions and equations to calculate  - see video Beers Law v3

• Standard uncertainty in the calculated concentration, given by or • 95% Confidence Interval for the true value of  the unknown concentration

(Refer to section 13.3.3 in the text: Essential Mathematics and Statistics for Science, 2nd Ed, by G Currell and A A Dowman)