Silver Blaze Problems in Regression and Multivariate Analyses

Multivariate Data Analysis
Psychology 6140

In reviewing or critiquing research reports, some of the most important questions to consider include these:

However, the most difficult problems for the reader to judge are often those of the "Silver Blaze" variety.(1) Because authors typically try to present their results in the most coherent way, finding problems often involves going beyond the information given, reading between the lines, and asking yourself about things that are not described explicitly.
(1) In "Silver Blaze," Inspector Gregory asked Holmes how he knew the identity of the thief, to which Holmes replied, "Because of the curious incident of the dog in the night-time." Gregory protested, "But the dog did nothing in the night-time," and Holmes said, "That was the curious incident." The dog should have been barking.

As a group effort in reacting to the research papers you are each critiquing, let us see if we can make up a list of Silver Blaze problems that might occur in regression studies. Some of these are examples of what may be called lurking variables: variables that have an important effect yet are not included among the predictors considered (or presented) by the author. Such variables may be omitted because their existence is unknown, because their influence is thought to be negligible, or simply because data on them are unavailable or difficult to obtain. See Joiner (1981; Amer. Statistician, 227-233) for examples of lurking variables. I've started the list off with a few examples of things you might look for.

1. Underlying associations with outside variables.
A classic example of spurious correlation is the astounding correlation of 0.998 cited by Yule and Kendall (1950; Introduction to the Theory of Statistics) between the number of people in the U.K. classified as "notified mental defectives" and the number of "wireless licenses issued". Both variables, however, are yearly figures from 1925 to 1937. The spurious correlation arises because both happened to be increasing over time: radios were becoming common household items in the U.K., while recognition of mental illness and facilities for its treatment were also increasing.
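
The trend mechanism can be illustrated with a small simulation. This is only a sketch. The two series below are invented and are not Yule and Kendall's figures. They are generated independently of each other and share nothing except an upward drift over the years 1925-1937. Correlating the levels gives a near-perfect r. Correlating the year-to-year changes does not; taking first differences is one simple way of removing a common trend.

```python
import numpy as np

def corr(x, y):
    """Pearson correlation of two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])

rng = np.random.default_rng(1937)
years = np.arange(1925, 1938)            # 13 yearly observations, 1925-1937
t = years - years[0]

# Hypothetical series, generated independently of each other;
# the only thing they share is an upward trend over time.
licenses = 1000 + 250 * t + rng.normal(0, 40, t.size)
defectives = 8 + 0.7 * t + rng.normal(0, 0.15, t.size)

raw_r = corr(licenses, defectives)                     # levels: trend dominates
diff_r = corr(np.diff(licenses), np.diff(defectives))  # changes: trend removed
print(f"correlation of levels       = {raw_r:.3f}")
print(f"correlation of differences  = {diff_r:.3f}")
```

Differencing is used here for brevity. Regressing each series on year and correlating the residuals would make the same point.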

A more recent example: the Places Rated Almanac (Boyer & Savageau, 1985) contains nine composite variables related to climate, housing costs, health care, arts and cultural facilities, etc., for 329 metropolitan areas in the US. Several analyses of these data pointed out a rather high correlation between the arts and health measures. This, however, is due to the correlation of each of these measures with population: larger metropolitan areas tend to score higher on both.
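
One way to check for this kind of confounding is a partial correlation. You regress each measure on the suspected common cause and then correlate the residuals. The sketch below uses invented data for 329 hypothetical areas. In these data, the arts and health scores each depend on log population but not on each other. The variable names and coefficients are assumptions made for illustration; they are not values from the Almanac.

```python
import numpy as np

def residuals(y, x):
    """Residuals from the least-squares regression of y on x (with intercept)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def partial_corr(a, b, z):
    """Correlation of a and b after removing the linear effect of z from each."""
    return float(np.corrcoef(residuals(a, z), residuals(b, z))[0, 1])

rng = np.random.default_rng(329)
n = 329                                     # number of hypothetical areas
log_pop = rng.normal(12.5, 1.0, n)          # hypothetical log population
# Hypothetical scores: each depends on population, neither depends on the other.
arts = 2.0 * log_pop + rng.normal(0, 1.0, n)
health = 1.5 * log_pop + rng.normal(0, 1.0, n)

zero_order = float(np.corrcoef(arts, health)[0, 1])
partial = partial_corr(arts, health, log_pop)
print(f"r(arts, health)              = {zero_order:.2f}")
print(f"r(arts, health | population) = {partial:.2f}")
```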

2. Unmeasured variables and influential observations
The data on fuel consumption in the US showed moderately good prediction of per capita fuel consumption from gasoline tax, proportion of licensed drivers, and per capita income. It was thought that expressing the variables in per capita terms eliminated the effects of varying state population. Influence plots, however, pointed to a few states (Wyoming, South Dakota) as greatly underpredicted, influential observations. Some thought led to the suggestion that population density might be important. When tried, this variable led to a better one-predictor model than the best model built from the other variables! Had the influential outliers been deleted, this conclusion would not have been reached.
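
The sequence of events can be mimicked with simulated data. This is a hypothetical sketch: all numbers are invented and are not the actual fuel data. Per capita fuel use depends strongly on log population density, which is left out of the first model. Two units are given very low density, in the spirit of Wyoming and South Dakota. Their large positive residuals flag them as underpredicted, and Cook's distance provides a standard influence measure for them. Density by itself then fits better than the three-predictor model.

```python
import numpy as np

def fit(X, y):
    """OLS with intercept. Returns residuals, R^2 and leverages (hat diagonal)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    hat = np.einsum("ij,jk,ik->i", X1, np.linalg.pinv(X1.T @ X1), X1)
    return resid, float(r2), hat

def cooks_distance(resid, hat, p):
    """Cook's D_i = e_i^2 / (p s^2) * h_i / (1 - h_i)^2, with p fitted parameters."""
    s2 = resid @ resid / (len(resid) - p)
    return resid ** 2 / (p * s2) * hat / (1 - hat) ** 2

rng = np.random.default_rng(48)
n = 48                                      # hypothetical "states"
tax = rng.uniform(5, 10, n)
drivers = rng.uniform(0.45, 0.70, n)
income = rng.uniform(3, 5, n)
log_density = rng.normal(4, 1, n)
log_density[:2] = -2.0                      # two very sparsely populated units
# Invented model: sparse areas use more fuel per capita (negative density effect).
fuel = (900 - 20 * tax + 300 * drivers - 30 * income
        - 60 * log_density + rng.normal(0, 20, n))

others = np.column_stack([tax, drivers, income])
resid, r2_others, hat = fit(others, fuel)   # density omitted
cooks = cooks_distance(resid, hat, p=others.shape[1] + 1)
worst = np.argsort(resid)[::-1][:2]         # the two most underpredicted units
print("most underpredicted units:", worst, "Cook's D:", cooks[worst].round(2))

_, r2_density, _ = fit(log_density, fuel)   # density as the single predictor
print(f"R^2 tax+drivers+income = {r2_others:.2f}; R^2 density alone = {r2_density:.2f}")
```
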
3. Improper randomization or experimental control
Draper & Smith (1966) give data from an experiment on the effect of three variables (solar radiation, soil moisture, and temperature) on the amount of vitamin B2 in turnip greens. Relatively careful analysis and graphical display showed nothing unusual. However, an index plot of the response against the order in which the data are listed in the textbook shows a straight line that fits better than the three predictors do! (See the accompanying analysis file.) Joiner (1988) suggests that either the vitamin content of the turnips or the chemical reagent used to measure vitamin B2 may have decayed over time.
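
An index plot amounts to regressing the response on the case number. The sketch below uses invented data and a hypothetical sample size. It builds in a steady decline with listing order, the kind a decaying reagent might produce. It then compares the R^2 of listing order alone with that of the three measured predictors.

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an OLS fit of y on the columns of X plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return float(1 - resid @ resid / np.sum((y - y.mean()) ** 2))

rng = np.random.default_rng(27)
n = 27                                      # hypothetical sample size
order = np.arange(1, n + 1)                 # position of each case in the listing
radiation = rng.uniform(100, 250, n)
moisture = rng.uniform(2, 50, n)
temperature = rng.uniform(60, 80, n)
# Invented response: a steady decline with listing order (e.g. a decaying reagent),
# plus small effects of two measured predictors and noise.
vitamin = 110 - 1.5 * order + 0.02 * radiation - 0.05 * moisture + rng.normal(0, 3, n)

r2_predictors = r_squared(np.column_stack([radiation, moisture, temperature]), vitamin)
r2_order = r_squared(order, vitamin)        # the "index plot" as a regression
print(f"R^2 three predictors = {r2_predictors:.2f}; R^2 order alone = {r2_order:.2f}")
```
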
4. The New England Blackout
A curious reporter found that there was an unexpectedly high number of births on the Monday and Tuesday exactly nine months after the famous New England blackout in 1965. He wrote an article suggesting the obvious causal inference. What's wrong with his reasoning?
5. The dangers of over-fitting
In the heyday of mathematical modelling, someone is reported to have said, "Give me three parameters and I can fit an elephant. Give me four, and I can make it wag its tail." Bob Agnew has a short piece on "Fitting Sickness" with a real-world illustration. Another nice illustration concerns fitting polynomial models to Galileo's experiments on inclined planes. A related form of overfitting occurs when purely random predictors are added to a data set and stepwise fitting is used.
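
The stepwise case is easy to reproduce. In this sketch, the response and all 50 candidate predictors are pure random noise. A simple forward selection adds, at each step, whichever candidate raises R^2 the most. Forward selection is one form of stepwise fitting, and the helper names here are hypothetical. Ordinary R^2 can only rise as predictors are added to a nested model, so the selected model looks better and better even though there is nothing to find. Adjusted R^2, which penalizes the number of predictors, is printed alongside for comparison.

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an OLS fit of y on the columns of X plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return float(1 - resid @ resid / np.sum((y - y.mean()) ** 2))

def forward_select(X, y, steps):
    """Greedy forward selection: each step adds the column that most increases R^2."""
    chosen, path = [], []
    for _ in range(steps):
        candidates = [j for j in range(X.shape[1]) if j not in chosen]
        best = max(candidates, key=lambda j: r_squared(X[:, chosen + [j]], y))
        chosen.append(best)
        path.append(r_squared(X[:, chosen], y))
    return chosen, path

rng = np.random.default_rng(0)
n, k = 40, 50
y = rng.normal(size=n)                      # response: pure noise
X = rng.normal(size=(n, k))                 # 50 candidate predictors, also pure noise
chosen, path = forward_select(X, y, steps=10)

n_pred = np.arange(1, 11)                   # number of predictors at each step
adj = 1 - (1 - np.array(path)) * (n - 1) / (n - n_pred - 1)
for step, (r2, a) in enumerate(zip(path, adj), start=1):
    print(f"{step:2d} predictors: R^2 = {r2:.2f}, adjusted R^2 = {a:.2f}")
```
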

Reviewer's Checklist

Try to suggest additional items or questions to consider under the following headings; feel free to add other topics.

Questions for Reviewers

The following questions are suggested in Leon Gleser's article, "Some Notes on Refereeing", The American Statistician, 1986, 40(4), 310-312.

Other pointers: