MICROSOFT EXCEL 2000, 2003 AND 2007 FAULTS, PROBLEMS, WORKAROUNDS AND FIXES

DESCRIPTION

This is an extended analysis of Microsoft's Excel spreadsheet program with respect to statistical applications using Excel 97, 2000, 2003 or 2007. The earlier Excel versions, Excel 97 and Excel 2000 are covered under Excel 2000. The current versions (Excel 2003 and Excel 2007) are covered under Excel 2003. There were essentially no changes in the 2003-version statistical functions and routines for the 2007 version. Microsoft currently supports both the 2003 and 2007 versions with upgrades.

The statistical functions and routines in Excel are only an elementary and incomplete set of statistical analysis tools. These tools come from the 1960's. They have not been basically updated since Excel version 4.0.

Excel is a very convient method of collecting and organizing data. However the statistical analysis of data using ONLY the provided statistical tools, may give misleading, incomplete or wrong results.

Also see the review of Excel 2007 in "Computational Statistics and Data Analysis 52 (2008) 4568-4606" (www.elsevier.com/locate/csda), "On the accuracy of statistical procedures in Microsoft Excel 2007", McCullough and Heiser. This set of articles has some additional faults in Excel that are not yet covered in the sections below. What's new for Excel 2007

The primary reasons for the new Excel 2007 version was to patch up lagging Office sales and to have a version that matches the look-and-feel of Vista. Excel 2007 offers improved interfaces with other applications (such as SQL, Web Access, Office SharePoint Server 2007 and Office XML file formats). The 2007 version includes some changes (KBA 829208) made for Excel 2004, the MacIntosh version of Excel 2003.

To most users, the changes are one of appearance, rather than capabilities. The changes in graphics however are major, and requires learning a whole new set of menus and sequences.

1. All the default charts in Excel 2007 are bad.

2. All the Excel 2007 charts are loaded with chartjunk, built-in and in options. It takes effort to remove it.

3. There are new, complex sets of menus and sub-menus, making it difficult to build clean charts.

4. Charts and data links built in previous versions will probably not work in 2007. You will need to start for scratch in Excel 2007.

5. The default charts are all characteristically fuzzy and indistinct. Colors are all pastels. There is a lack of preciseness.

6. Microsoft does not support many common statistical charts and related graphics.

SCOPE

The sections and notes are frequently changed, reflecting added tests, addition of new material, correction of errors, changing of text and modifying results, incorporating valuable insights from emails.

Section 9 was updated 1/31/08 to reflect some new findings on singularity

If you have any comments, or have noted some errors or faults with Excel or my findings, please send it by email to me at either "d_heiser@att.net" or "daheiser594@gmail.com" The previous email address is no longer valid.

CONTENTS

I. Introduction (update, 11/28/07)

II. General Problems With Excel (update, 8/28/07)

III. Excel Computation Issues (update, 9/30/07)

IV. The Testing Program For Accuracy (update, 11/27/07)

V. Univariate Analysis (update, 3/18/08)

VI. Analysis of Variance (ANOVA) (update, 8/28/07)

VII. Covariance and Correlation (update, 8/28/07)

VIII. Regression: The Fitting Of A Model To Data (update, 8/28/07)

IX. Linear Regression In Excel (update, 1/31/08)

X. What-If Solution Tools (update, 8/28/07)

XI. Nonlinear Regression (update, 11/1/07)

XII. Statistical Distributions and Related Functions (update, 8/28/07)

XIII. Testing for Accuracy and Reliability of Statistical Distributions (update, 8/26/07)

XIV-1. Results of New Tests on Discrete Distributions (update, 8/28/07)

XIV-2. Results of New Tests on Continuous Density Distributions (update, 8/28/07)

XIV-3. Results of New Tests on Continuous Cumulative Distributions (update, 8/26/07)

XIV-4. Results of New Tests on Continuous Inverse Distributions (update, 8/26/07)

XV. Statistical Tests, Tests of Significance and Tests of a Hypothesis (update, 8/28/07)

XVI. Random Number Generators (update, 8/28/07)

XVII. The Data Analysis Tool Routines (update, 9/2/07)

XVIII. Graphics, Charts and Visual Displays (update 1/17/08)

XIX. Add-In Programs. Functions and Routines (unfinished)

XX. Bibleography (updated, 6/27/08)

XXI. Closing Remarks (unfinished)

Sample Excel 2003 Files

NOTES (Revised 6/25/08)

Note A: Comments On Teaching/Using Excel

Note B: Excel Versions and Sources

Note C: Microsoft Knowledge Base Articles (KBA's)

Note D: Excel Help From The Internet

Note E: Guide To Excel Statistical Functions, Routines and Tools

Note F: Help Screen Errors

Note G: Data Input Errors

Note H: Some Specific Lists Of Excel Faults

Note I: Improving Documentation

Note J: Ordinal, Nominal and Likert Scale Variables

Note K: New Display Modified Probability Distributions

Note L: Autocorrelation

Note M: An Actual Problem Requiring Unbiased Standard Deviations

Note N: Ranking, Quartiles, Medians and Percentiles

Note O: Averages, Standard Deviations and Pre-centering

Note P: Alternate Algorithms

Note Q: Data Entry For ANOVAs In The Data Analysis Tool Pac

Note R: Linear Regression

Note S: Linear Regression Throught The Origin, Excel 2000

Note T: Singularity, Multi-colinearity, Accuracy And Other Matrix Problems

Note U: Pllynomial Regression

Note V: Regression Normal Probability Plot

Note W: Standardized Residuals

Note X: Support In Excel For Tests Of Significance

Note Y: Constructing A Hypothesis

Note Z: Setting Up The Excel Sheet For Calculating P Values

Note AA: Generate Diehard Test Input Files

Note AB: Diehard-II Output For RAND, Excel 2000

Note AC: Marsaglia's MWC256 RNG

Note AD: Diehard-II And -III Output For RAND, Excel 2003

Note AE: Random Number Generator VBA Routines

Note AF: The Wilkinson-Sawitski Series Of Tests On Excel 2007

Note XN: XNUMBERS, A Multi-precision Floating Point Calculus For Excel ACKNOWLEDGEMENTS

I would like to thank Dr. B.D. McCullough, Department of Decisions Sciences and Department of Economics, Drexel University, Philadelphia, PA. for his help in making comments and corrections to the paper. I have tried to incorporate all his comments and corrections, and the current versions are the result.

I would also like to thank Jerry W. Lewis, for his assistance in tests on statistical distributions. He also steered me to Smith's set of accurate probability distributions.

I thank the Journal "Computational Statistics and Data Analysis" (Elsevier) for including a short note in the November 2006 issue on this website.

Thanks to Leonardo Volpi for information on some problems in Excel, on his XNUMBERS extended precision add-in and the optimization and non-linear equation add-ins.

Sawilowsky's "Journal of Modern Applied Statistical Methods" has been a good source on articles that directly relate to the problems of statistical analysis.

Also to those who sent in comments and corrections to my text.

For the 2007 version, thanks to Christopher L. Paras on the chart output graph fault in the Data Analysis Tool Pac Regression routine,

Thanks to Joseph Billo 10/2/07 on different ways to improve the Solver fit to non-linear equations.

COMMENTS AND REMARKS

This effort to test Excel started in 2001, based on McCullough's articles in "The American Statistician" in 1998, and in suceeding articles in other publications.

This URL also includes material on the free Excel add-ins from the "Foxes" for optimization and non-linear equation fits. Also included is the extended precision computation package (xnumbers) for Excel. These extend the basic Excel computational capabilities.

CORRECTIONS AND CHANGES

Sections 1 through 21 are complete rewrites of the first edition. Most of the structure, text, tables and figures are the same as the first edition. I currently have both the Excel 2003 and 2007 versions loaded on my computer, so I can check out comments from viewers on either version. There are differences between Excel 2003 and Excel 2007.

CORRECTIONS AND CHANGES TO THE THIRD EDITION:

Stuart MacDonald on 6/3/07 pointed out a numerical error on page 4 of note AA regarding XNUMBERS. The correction was made both in Note AA and on Volpi's XNUMBERS Website. June 6, 2007.

Correction to Bibleography and Section 18 reference, Harold Henderson, 1/17/08.

Numerical errors in table 5.5, was noted by Michelle Chutka, 3/18/08. The errors have been corrected.