DESCRIPTION
This is an extended analysis of Microsoft's Excel spreadsheet program with respect to statistical applications using Excel 97, 2000, 2003 or 2007. The earlier Excel versions, Excel 97 and Excel 2000 are covered under Excel 2000. The current versions (Excel 2003 and Excel 2007) are covered under Excel 2003. There were essentially no changes in the 2003-version statistical functions and routines for the 2007 version. Microsoft currently supports both the 2003 and 2007 versions with upgrades.
The statistical functions and routines in Excel are only an elementary and incomplete set of statistical analysis tools. These tools come from the 1960's. They have not been basically updated since Excel version 4.0.
Excel is a very convient method of collecting and organizing data. However the statistical analysis of data using ONLY the provided statistical tools, may give misleading, incomplete or wrong results.
Also see the review of Excel 2007 in "Computational Statistics and Data Analysis 52 (2008) 4568-4606" (www.elsevier.com/locate/csda), "On the accuracy of statistical procedures in Microsoft Excel 2007", McCullough and Heiser. This set of articles has some additional faults in Excel that are not yet covered in the sections below.
What's new for Excel 2007
The primary reasons for the new Excel 2007 version was to patch up lagging Office sales and to have a version that matches the look-and-feel of Vista. Excel 2007 offers improved interfaces with other applications (such as SQL, Web Access, Office SharePoint Server 2007 and Office XML file formats). The 2007 version includes some changes (KBA 829208) made for Excel 2004, the MacIntosh version of Excel 2003.
To most users, the changes are one of appearance, rather than capabilities. The changes in graphics however are major, and requires learning a whole new set of menus and sequences.
1. All the default charts in Excel 2007 are bad.
2. All the Excel 2007 charts are loaded with chartjunk, built-in and in options. It takes effort to remove it.
3. There are new, complex sets of menus and sub-menus, making it difficult to build clean charts.
4. Charts and data links built in previous versions will probably not work in 2007. You will need to start for scratch in Excel 2007.
5. The default charts are all characteristically fuzzy and indistinct. Colors are all pastels. There is a lack of preciseness.
6. Microsoft does not support many common statistical charts and related graphics.
SCOPE
The sections and notes are frequently changed, reflecting added tests, addition of new material, correction of errors, changing of text and modifying results, incorporating valuable insights from emails.
Section 9 was updated 1/31/08 to reflect some new findings on singularity
If you have any comments, or have noted some errors or faults with Excel or my findings, please send it by email to me at either "d_heiser@att.net" or "daheiser594@gmail.com" The previous email address is no longer valid.
CONTENTS
I. Introduction (update, 11/28/07)
II. General Problems With Excel (update, 8/28/07)
III. Excel Computation Issues (update, 9/30/07)
IV. The Testing Program For Accuracy (update, 11/27/07)
V. Univariate Analysis (update, 3/18/08)
VI. Analysis of Variance (ANOVA) (update, 8/28/07)
VII. Covariance and Correlation (update, 8/28/07)
VIII. Regression: The Fitting Of A Model To Data (update, 8/28/07)
IX. Linear Regression In Excel (update, 1/31/08)
X. What-If Solution Tools (update, 8/28/07)
XI. Nonlinear Regression (update, 11/1/07)
XII. Statistical Distributions and Related Functions (update, 8/28/07)
XIII. Testing for Accuracy and Reliability of Statistical Distributions (update, 8/26/07)
XIV-1. Results of New Tests on Discrete Distributions (update, 8/28/07)
XIV-2. Results of New Tests on Continuous Density Distributions (update, 8/28/07)
XIV-3. Results of New Tests on Continuous Cumulative Distributions (update, 8/26/07)
XIV-4. Results of New Tests on Continuous Inverse Distributions (update, 8/26/07)
XV. Statistical Tests, Tests of Significance and Tests of a Hypothesis (update, 8/28/07)
XVI. Random Number Generators (update, 8/28/07)
XVII. The Data Analysis Tool Routines (update, 9/2/07)
XVIII. Graphics, Charts and Visual Displays (update 1/17/08)
XIX. Add-In Programs. Functions and Routines (unfinished)
XX. Bibleography (updated, 6/27/08)
XXI. Closing Remarks (unfinished)
NOTES (Revised 6/25/08)
Note A: Comments On Teaching/Using Excel
Note B: Excel Versions and Sources
Note C: Microsoft Knowledge Base Articles (KBA's)
Note D: Excel Help From The Internet
Note E: Guide To Excel Statistical Functions, Routines and Tools
Note H: Some Specific Lists Of Excel Faults
Note I: Improving Documentation
Note J: Ordinal, Nominal and Likert Scale Variables
Note K: New Display Modified Probability Distributions
Note M: An Actual Problem Requiring Unbiased Standard Deviations
Note N: Ranking, Quartiles, Medians and Percentiles
Note O: Averages, Standard Deviations and Pre-centering
Note Q: Data Entry For ANOVAs In The Data Analysis Tool Pac
Note S: Linear Regression Throught The Origin, Excel 2000
Note T: Singularity, Multi-colinearity, Accuracy And Other Matrix Problems
Note V: Regression Normal Probability Plot
Note W: Standardized Residuals
Note X: Support In Excel For Tests Of Significance
Note Y: Constructing A Hypothesis
Note Z: Setting Up The Excel Sheet For Calculating P Values
Note AA: Generate Diehard Test Input Files
Note AB: Diehard-II Output For RAND, Excel 2000
Note AC: Marsaglia's MWC256 RNG
Note AD: Diehard-II And -III Output For RAND, Excel 2003
Note AE: Random Number Generator VBA Routines
Note AF: The Wilkinson-Sawitski Series Of Tests On Excel 2007
Note XN: XNUMBERS, A Multi-precision Floating Point Calculus For Excel
ACKNOWLEDGEMENTS
I would also like to thank Jerry W. Lewis, for his assistance in tests on statistical distributions.
He also steered me to Smith's set of accurate probability distributions.
I thank the Journal "Computational Statistics and Data Analysis" (Elsevier) for including a short note in the
November 2006 issue on this website.
Thanks to Leonardo Volpi for information on some problems in Excel, on his XNUMBERS extended precision
add-in and the optimization and non-linear equation add-ins.
Sawilowsky's "Journal of Modern Applied Statistical Methods" has been a good source on articles that
directly relate to the problems of statistical analysis.
Also to those who sent in comments and corrections to my text.
For the 2007 version, thanks to Christopher L. Paras on the chart output graph fault in the Data Analysis Tool Pac Regression routine,
Thanks to Joseph Billo 10/2/07 on different ways to improve the Solver fit to non-linear equations.
COMMENTS AND REMARKS
This effort to test Excel started in 2001, based on McCullough's articles in "The American Statistician" in
1998, and in suceeding articles in other publications.
This URL also includes material on the free Excel add-ins from the "Foxes" for
optimization and non-linear equation fits. Also included is the extended
precision computation package (xnumbers) for Excel. These extend the
basic Excel computational capabilities.
CORRECTIONS AND CHANGES
Sections 1 through 21 are complete rewrites of the first edition. Most of the
structure, text, tables and figures are the same as the first edition. I currently have
both the Excel 2003 and 2007 versions loaded on my computer, so I can check out
comments from viewers on either version. There are differences between Excel 2003 and
Excel 2007.
CORRECTIONS AND CHANGES TO THE THIRD EDITION:
Stuart MacDonald on 6/3/07 pointed out a numerical error on page 4 of note AA
regarding XNUMBERS. The correction was made both in Note AA and on Volpi's XNUMBERS Website.
June 6, 2007.
Correction to Bibleography and Section 18 reference, Harold Henderson, 1/17/08.
Numerical errors in table 5.5, was noted by Michelle Chutka, 3/18/08. The errors have been corrected.