After appearing on the wires, this story was picked up and published by any number of news outlets. It's based on a study published in the December issue of Psychological Science by Leif Nelson and Joseph Simmons: Moniker Maladies: When Names Sabotage Success. There's a lot here for a sensationalist to love. The authors' basic claim is that a person's initials directly affect their performance on a variety of physical and cognitive tasks. They attempt to establish this claim with five experiments in three different domains:
- Do initials affect performance in major league baseball players?
- Do initials affect school grades? (Specifically, are students whose names begin with A more likely to get A's, B initials more likely to get B's, etc.) And does this translate into enrollment in lower-ranked institutions?
- Do initials affect performance on a laboratory cognitive task?
Batter up!
I'm hoping to have the time to address all aspects of the paper in the coming days (ok, weeks), but I'm going to start, as is only appropriate, by discussing Study 1: Baseball Performance. As mentioned above, the general hypothesis is that initials affect performance among Major League baseball players. The authors chose to operationalize this hypothesis in a very specific manner: Do players whose first and/or last initial is "K" have a higher career strikeout rate than those who don't have "K" names?
I know, that just sounds stupid. The theory is this: people generally either have an affinity for or lack an aversion to objects / behaviors / events / etc. that involve their initials. (This is an active area of research, and there have been a number of studies that suggest such connections. None of these studies are exactly home runs, though, if you will.) And "K" is the formal scorecard notation in baseball for a strikeout. So, if your initials contain a "K", you're less averse to striking out than if not.
The procedure the authors used to test this hypothesis is quite simple and, since the data is publicly available (baseball-databank.org), can easily be replicated at home. They calculated the career strikeout rate for every player with at least 100 plate appearances during the years 1913 through 2006, and they compared the mean rates between those players with and those without "K" initials using a two-sample t-test.
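For anyone replicating at home, the procedure above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' code: the season-level tuple layout `(player_id, first_name, last_name, pa, so)` and all function names are hypothetical, plate appearances are assumed to be precomputed for each season, and the p-value uses a normal approximation that is only reasonable at the thousands of degrees of freedom in the real data.

```python
import math
from collections import defaultdict
from statistics import NormalDist, mean, variance


def career_rates(seasons, min_pa=100):
    """Collapse season rows into one career strikeout rate (SO / PA) per player.

    `seasons` holds hypothetical tuples: (player_id, first_name, last_name, pa, so).
    """
    totals = defaultdict(lambda: [0, 0])  # player_id -> [plate appearances, strikeouts]
    names = {}
    for pid, first, last, pa, so in seasons:
        totals[pid][0] += pa
        totals[pid][1] += so
        names[pid] = (first, last)
    return {
        pid: (names[pid], so / pa)
        for pid, (pa, so) in totals.items()
        if pa >= min_pa  # the paper's 100-plate-appearance cutoff
    }


def has_initial(name, letter):
    """True if the first or last name begins with `letter`."""
    first, last = name
    letter = letter.upper()
    return first[:1].upper() == letter or last[:1].upper() == letter


def pooled_t_test(x, y):
    """Student's two-sample t-test with pooled variance.

    Returns (t, df, p); p is a two-sided normal approximation, which is only
    reasonable when df is large, as it is with thousands of players.
    """
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, p


# Made-up example rows, not real players.
seasons = [
    ("p1", "Ken", "Adams", 300, 60),
    ("p1", "Ken", "Adams", 200, 30),
    ("p2", "Bob", "Smith", 250, 40),
    ("p3", "Al", "Kane", 400, 80),
    ("p4", "Joe", "Brown", 500, 70),
    ("p5", "Sam", "Lee", 50, 20),  # dropped: fewer than 100 PA
]
rates = career_rates(seasons)
k_rates = [r for name, r in rates.values() if has_initial(name, "K")]
other_rates = [r for name, r in rates.values() if not has_initial(name, "K")]
t, df, p = pooled_t_test(k_rates, other_rates)
```

With the real databank you would build `seasons` from the batting tables; the toy rows only show the shape of the computation.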
The results were fairly striking: "K"-initialed players had a career strikeout rate of 18.8% of plate appearances, compared with 17.2% for non-"K" players. This difference was statistically significant (t = 3.08, df = 6395, p = .002).
The analysis and its discontents (e.g., me)
While this analysis has the advantage of simplicity, there are several potential pitfalls. First, the t-test assumes that the strikeout rate has a normal (bell-shaped) distribution. As the figure below demonstrates (non-"K" names on top, "K" names on the bottom), this assumption is clearly violated: the strikeout rate is highly skewed to the right, perhaps due to the inclusion of pitchers and of journeyman minor leaguers with few career at-bats in the majors (100 plate appearances is not a high bar). This problem could be greatly mitigated by using a nonparametric statistical test (the Mann-Whitney U test, say) or by the simple expedient of transforming the data. (A square-root transformation appears to do nicely.)
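Both remedies mentioned above are easy to try. The sketch below, using only the Python standard library, computes a moment-based skewness (to check whether a square-root transform tames the right tail) and a Mann-Whitney U test with midranks for ties and a tie-corrected normal approximation. The function names and the toy rates are illustrative assumptions; a package such as SciPy offers a vetted Mann-Whitney implementation for real work.

```python
import math
from collections import Counter
from statistics import NormalDist, mean, pstdev


def skewness(xs):
    """Moment-based sample skewness; positive values mean a long right tail."""
    m, s = mean(xs), pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)


def midranks(values):
    """Rank values from 1..n, giving tied values the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Mann-Whitney U test, tie-corrected normal approximation, no continuity correction.

    Returns (U for sample x, z, approximate two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    values = list(x) + list(y)
    ranks = midranks(values)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    tie_term = sum(t ** 3 - t for t in Counter(values).values())
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    z = (u1 - n1 * n2 / 2) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, z, p


# Made-up, right-skewed career strikeout rates.
raw = [0.05, 0.10, 0.12, 0.15, 0.20, 0.60]
rooted = [math.sqrt(r) for r in raw]  # square-root transform shrinks the long right tail
```

Because the Mann-Whitney test works only with ranks, the skewness of the raw rates does not distort it; the square-root route instead keeps the familiar t-test but applies it to less lopsided data.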
Another difficulty is that not all observations of career strikeout rate are created equal. We have much more reliable estimates of career strikeout rate for players with thousands or tens of thousands of plate appearances than for those with a couple hundred. It would be a simple matter to differentially weight each player's contribution to the analysis by his number of plate appearances or by the inverse of the variance of his estimated strikeout rate. On the other hand, it's not clear exactly what the implications of such weighting would be, since strikeout rate is itself substantially correlated with number of plate appearances. (Naturally, players with low strikeout rates tend to have long careers.) What is clear is that the authors were either unaware of these thorny questions or chose to sweep them under the rug.
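To see how much the choice of weights can matter, here is a small sketch comparing three ways of averaging career strikeout rates: unweighted (what the paper's t-test effectively does), weighted by plate appearances, and weighted by the inverse of the binomial variance p(1 - p)/PA. The `(so + 0.5) / (pa + 1)` adjustment inside the variance is my own assumption, added so a player with zero strikeouts doesn't receive an infinite weight; the data are made up.

```python
from statistics import mean


def unweighted_rate(players):
    """Plain average of career rates: every player counts equally."""
    return mean(so / pa for pa, so in players)


def pa_weighted_rate(players):
    """Weight each career rate by plate appearances; this equals total SO / total PA."""
    return sum(so for pa, so in players) / sum(pa for pa, so in players)


def inverse_variance_rate(players):
    """Weight each career rate by 1 / Var(rate), with Var approximated as p(1 - p) / PA.

    The (so + 0.5) / (pa + 1) adjustment used inside the variance is an assumption,
    added so that players with zero strikeouts do not get an infinite weight.
    """
    num = den = 0.0
    for pa, so in players:
        rate = so / pa
        p_adj = (so + 0.5) / (pa + 1)
        weight = pa / (p_adj * (1 - p_adj))  # = 1 / (p_adj * (1 - p_adj) / pa)
        num += weight * rate
        den += weight
    return num / den


# Made-up (plate appearances, strikeouts) pairs: a short, high-strikeout career
# and a long, low-strikeout one.
players = [(100, 30), (1000, 150)]
```

With these two toy careers the unweighted mean is 0.225, while both weighted versions sit much closer to the long career's 0.15. That is exactly the tension described above: weighting favors long careers, and long careers are the low-strikeout ones.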
There are other questions that arise in the analysis (should each plate appearance be treated as a binary strikeout/no-strikeout outcome and analyzed via logistic regression? should strikeout rate have been collapsed across years, or would it be better to model years separately in a repeated-measures model? etc., etc.). But the truth is that none of these issues matters all that much compared to the major, inexcusable failing of this study: lack of thoroughness and consistency.
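As an illustration of the logistic-regression option, here is a minimal sketch. With a single binary predictor ("K" initial or not), the maximum-likelihood logistic fit has a closed form: the intercept is the log odds of a strikeout for non-"K" players and the slope is the log odds ratio, with the usual Wald standard error. The function name and counts are hypothetical, and this version treats every plate appearance as independent, which ignores clustering within players; a fuller model would account for that.

```python
import math
from statistics import NormalDist


def logistic_k_initial(so_k, pa_k, so_other, pa_other):
    """Logistic regression of strikeout (yes/no) on a single "K initial" indicator.

    Each plate appearance is one binary observation. With one binary predictor the
    maximum-likelihood fit has a closed form: the intercept is the log odds of a
    strikeout for non-"K" players and the slope is the log odds ratio.
    Caveat: plate appearances are treated as independent, ignoring clustering
    within players.
    """
    a, b = so_k, pa_k - so_k              # "K" players: strikeouts, other outcomes
    c, d = so_other, pa_other - so_other  # everyone else
    intercept = math.log(c / d)
    slope = math.log(a / b) - intercept
    se_slope = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Wald standard error
    z = slope / se_slope
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return intercept, slope, se_slope, p


# Made-up pooled counts, purely for illustration.
intercept, slope, se, p = logistic_k_initial(so_k=190, pa_k=1000, so_other=170, pa_other=1000)
odds_ratio = math.exp(slope)
```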
Lack of thoroughness and consistency
Let's say, just for the sake of argument, that we believe the authors' theory. Under this theory, there are several other hypotheses that would be at least as plausible, and just as testable, as the hypothesis that "K" initials influence strikeout rate. It's actually rather peculiar that the authors didn't think to perform and report these additional, simple analyses. But that's why I'm here! For the sake of simplicity and comparability, we'll use the same (suboptimal) analysis used in the paper.
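Running every letter/outcome pair through the same machinery takes only a loop. The sketch below assumes hypothetical player records holding career totals (`pa`, `so`, `h`, `bb`) and computes each rate per plate appearance, which is my assumption about the denominator; it applies the same pooled t-test to each hypothesis so the results are directly comparable.

```python
import math
from statistics import mean, variance

# (initial letter, outcome column) pairs mirroring the hypotheses tested here.
HYPOTHESES = [
    ("K", "so"),  # "K" initials -> more strikeouts
    ("H", "h"),   # "H" initials -> more hits
    ("B", "bb"),  # "B" (as in "BB") initials -> more walks
    ("S", "so"),  # "S" (as in "strikeout") initials -> more strikeouts
    ("W", "bb"),  # "W" (as in "walk") initials -> more walks
]


def has_initial(player, letter):
    """True if the player's first or last name begins with `letter`."""
    return player["first"][:1].upper() == letter or player["last"][:1].upper() == letter


def pooled_t(x, y):
    """Student's pooled-variance t statistic and its degrees of freedom."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    return (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny)), df


def run_hypotheses(players, hypotheses=HYPOTHESES):
    """Return {(letter, stat): (mean_match, mean_other, t, df)} for each hypothesis."""
    results = {}
    for letter, stat in hypotheses:
        match = [p[stat] / p["pa"] for p in players if has_initial(p, letter)]
        other = [p[stat] / p["pa"] for p in players if not has_initial(p, letter)]
        if len(match) < 2 or len(other) < 2:
            results[(letter, stat)] = None  # too few players to estimate variances
            continue
        t, df = pooled_t(match, other)
        results[(letter, stat)] = (mean(match), mean(other), t, df)
    return results


# Made-up career totals, not real players.
players = [
    {"first": "Kevin", "last": "Hall", "pa": 400, "so": 80, "h": 90, "bb": 30},
    {"first": "Bill", "last": "Kent", "pa": 500, "so": 70, "h": 120, "bb": 45},
    {"first": "Sam", "last": "Walsh", "pa": 300, "so": 66, "h": 70, "bb": 20},
    {"first": "Hank", "last": "Brown", "pa": 600, "so": 90, "h": 150, "bb": 60},
    {"first": "Walt", "last": "Smith", "pa": 350, "so": 50, "h": 80, "bb": 35},
    {"first": "Joe", "last": "Doe", "pa": 450, "so": 60, "h": 110, "bb": 40},
]
results = run_hypotheses(players)
```

Defining the hypotheses as data makes it hard to quietly skip the inconvenient ones: every pair in the list gets tested and reported.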
Hypothesis: Batters with "H" initials have higher hit rates than those without.
Data: "H" initials: 20.9% hit rate. Non-"H": 21.2% hit rate.
Result: False (t = 0.43, df = 6203, p = .67). Effect in wrong direction.
Hypothesis: Batters with "B" initials have higher walk rates than those without. (The scorecard notation for a walk is "BB".)
Data: "B" initials: 7.4% walk rate. Non-"B": 7.4% walk rate.
Result: False (t = -1.03, df = 6203, p = .31). (Note: this negative result holds despite the heroic contributions of Barry Bonds to the field of walking.)
Hypothesis: Batters with "S" initials have higher strikeout rates than those without. (After all, while "K" is the official scorecard notation, doesn't "strikeout" actually begin with an "s"?)
Data: "S" initials: 16.4% strikeout rate. Non-"S": 16.9% strikeout rate.
Result: False (t = 0.05, df = 6203, p = .96).
Hypothesis: Batters with "W" initials have higher walk rates. (Again, "BB" is the scorecard notation, but "walk" begins with "w".)
Data: "W" initials: 7.2% walk rate. Non-"W": 7.4% walk rate.
Result: False (t = 0.11, df = 6203, p = .91). Effect in wrong direction.
So, the question is this: why on Earth would "K" batters strike out more often when none of the other hypotheses above hold up? And why wouldn't the authors have thought to test them?
Addendum
I've received (through backchannels) an interesting comment suggesting that pitchers, rather than batters, are primarily responsible for walks. I'm not sure whether I agree with this, but it's certainly the case that, in each plate appearance, both pitcher and batter are intimately involved in the outcome, whether it be a walk, a strikeout or a hit. Accordingly, here are the analyses for pitchers, corresponding to those above for hitters. I included all pitchers with at least 100 innings pitched; rates are calculated per inning pitched.
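Computing per-inning rates has one baseball-specific wrinkle: box-score innings such as "100.2" mean 100 and two-thirds innings, not 100.2. The sketch below assumes hypothetical season rows that carry outs recorded (three per inning), which sidesteps the notation, and includes a small converter for data stored the box-score way. Row layout and function names are assumptions for illustration.

```python
from collections import defaultdict


def innings_from_display(ip):
    """Convert box-score notation to innings: "100.2" means 100 and two-thirds innings."""
    whole, _, thirds = str(ip).partition(".")
    return int(whole) + int(thirds or 0) / 3


def career_pitching_rates(seasons, min_ip=100):
    """Aggregate hypothetical season rows into per-inning career rates.

    `seasons` holds (player_id, outs_recorded, strikeouts, hits, walks).
    """
    totals = defaultdict(lambda: [0, 0, 0, 0])  # outs, SO, H, BB
    for pid, outs, so, h, bb in seasons:
        for i, value in enumerate((outs, so, h, bb)):
            totals[pid][i] += value
    rates = {}
    for pid, (outs, so, h, bb) in totals.items():
        ip = outs / 3
        if ip >= min_ip:  # the 100-inning cutoff used here
            rates[pid] = {"so_per_ip": so / ip, "h_per_ip": h / ip, "bb_per_ip": bb / ip}
    return rates


# Made-up rows: pitcher "a" totals 110 innings; "b" has only 80 and is dropped.
seasons = [("a", 180, 60, 70, 25), ("a", 150, 50, 60, 20), ("b", 240, 90, 80, 30)]
rates = career_pitching_rates(seasons)
```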
Hypothesis: Pitchers with "K" initials have higher strikeout rates than those without.
Data: "K" initials: 0.55 strikeouts/ip. Non-"K": 0.55 strikeouts/ip.
Result: False (t = 0.11, df = 3339, p = .91). Effect in wrong direction.
Hypothesis: Pitchers with "H" initials have higher hit rates than those without.
Data: "H" initials: 1.02 hits/ip. Non-"H": 1.02 hits/ip.
Result: False (t = -0.28, df = 431, p = .78; Satterthwaite correction).
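The "Satterthwaite correction" noted for this comparison refers to Welch's version of the t-test, which drops the assumption that both groups share one variance and estimates the degrees of freedom with the Welch-Satterthwaite formula. A minimal standard-library sketch (the function name is mine):

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Welch's two-sample t statistic with Welch-Satterthwaite degrees of freedom.

    Unlike the pooled test, it does not assume both groups share one variance.
    """
    vx = variance(x) / len(x)  # squared standard error of mean(x)
    vy = variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df
```

The Welch-Satterthwaite df always lies between min(n1, n2) - 1 and n1 + n2 - 2, which is why the corrected comparisons report far fewer degrees of freedom than the 3,339 of the pooled tests.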
Hypothesis: Pitchers with "B" initials have higher walk rates than those without.
Data: "B" initials: 0.40 walks/ip. Non-"B": 0.40 walks/ip.
Result: False (t = 0.53, df = 3339, p = .60). Effect in wrong direction.
Hypothesis: Pitchers with "S" initials have higher strikeout rates than those without.
Data: "S" initials: 0.55 strikeouts/ip. Non-"S": 0.55 strikeouts/ip.
Result: False (t = -0.61, df = 3339, p = .54).
Hypothesis: Pitchers with "W" initials have higher walk rates than those without.
Data: "W" initials: 0.42 walks/ip. Non-"W": 0.40 walks/ip.
Result: True (t = -2.95, df = 333, p = .004; Satterthwaite correction).
So, interestingly, there is one significant result: pitchers with first or last names beginning with the letter "W" have more walks per inning pitched than others. But, really, this is the exception that proves the rule.
In the case of batters, it was those whose initials coincided with the somewhat arbitrary scorecard notation for strikeout ("K") who had higher strikeout rates than others. Those whose initials coincided with the word "strikeout" ("S") had no significant elevation in strikeout rate.
On the other hand, in the case of pitchers, those whose initials coincided with the scorecard notation ("B") did not have a higher walk rate than the non-"B"s, whereas those whose initials coincided with the word "walk" ("W") did have a higher walk rate than others. This arbitrary pattern of results and total lack of consistency is wholly consistent with chance findings.
Comments:
First, the t-test assumes that the strikeout rate has a normal (bell-shaped) distribution... this assumption is clearly violated: the strikeout rate is highly skewed to the right... This problem could be greatly mitigated by use of a nonparametric statistical test (the Mann-Whitney U test, say), or by the simple expedient of transforming the data. (A square-root transformation appears to do nicely.)
If the aim is to test for a shift in the mean, you could also stick with the t statistic but obtain its distribution via resampling, for example by performing a randomization test. (Or you could resample a simpler statistic, such as the difference in means, of course.)
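A randomization test along the lines suggested above can be sketched with the standard library alone. This version uses the simpler statistic, the difference in means: it shuffles the group labels many times and counts how often the shuffled difference is at least as extreme as the observed one. Function name, defaults, and data are illustrative assumptions; swapping in the t statistic would only change the line that computes `diff`.

```python
import random
from statistics import mean


def permutation_test(x, y, n_perm=10_000, seed=0):
    """Two-sided randomization test for a difference in group means.

    Shuffles the group labels `n_perm` times and counts how often the shuffled
    difference in means is at least as large in magnitude as the observed one.
    """
    rng = random.Random(seed)
    observed = mean(x) - mean(y)
    pooled = list(x) + list(y)
    nx = len(x)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = mean(pooled[:nx]) - mean(pooled[nx:])
        if abs(diff) >= abs(observed) - 1e-12:  # small tolerance for floating-point ties
            extreme += 1
    # Counting the observed labelling as one of the permutations keeps p above zero.
    return observed, (extreme + 1) / (n_perm + 1)


# Made-up career strikeout rates for two small groups.
k_group = [0.30, 0.32, 0.31, 0.33, 0.29]
other_group = [0.10, 0.12, 0.11, 0.13, 0.09]
diff, p_value = permutation_test(k_group, other_group, n_perm=2_000)
```

Because the reference distribution comes from reshuffling the data themselves, no normality assumption is needed, which speaks directly to the skewness concern in the post.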
-
It occurred to me when I first read of this (on Pharyngula, I think) to wonder whether any tests other than the ones reported were considered. For example, with the academic performance, whether they considered if A was different from B, C, D, etc., whether C was different from D, and so on.
You later point out that a variety of other tests consistent with this particular form of nominative determinism could have been performed in the baseball situation, and so the same question arises - were those considered at all?
If so, we have to wonder about the familywise Type I error rates (and the underlying bias in not presenting them), or even whether hypothesis tests are really an appropriate way to approach something like this at all.
You note a significant result at the end. Of course, if you do enough tests where in reality there's nothing really going on, sooner or later you're going to get some significant ones!
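The point about running enough tests can be put in numbers. If m tests are independent and every null hypothesis is true, the chance of at least one p < alpha is 1 - (1 - alpha)^m. The sketch below computes that and checks it by simulation, using the fact that p-values are uniform on [0, 1] under a true null. Treating the post's ten initial-letter comparisons as independent is an assumption, since they reuse many of the same players.

```python
import random


def familywise_error_rate(m, alpha=0.05):
    """Probability of at least one p < alpha among m independent tests whose nulls are all true."""
    return 1 - (1 - alpha) ** m


def simulate_familywise_error_rate(m, alpha=0.05, n_sim=100_000, seed=0):
    """Monte Carlo check: under a true null hypothesis each p-value is uniform on [0, 1]."""
    rng = random.Random(seed)
    false_alarms = sum(
        any(rng.random() < alpha for _ in range(m)) for _ in range(n_sim)
    )
    return false_alarms / n_sim


# The post reports ten initial-letter comparisons: the paper's "K" test plus nine more.
exact = familywise_error_rate(10)
simulated = simulate_familywise_error_rate(10, n_sim=20_000)
```

For ten tests at alpha = .05, `exact` comes out to about 0.40: even if initials did nothing at all, there would be roughly a 40% chance of turning up at least one "significant" result.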
Another interesting post. Thanks.
Thanks for another thoughtful comment, Efrique. Good point about resampling tests. I actually mentioned that as a possibility in an early draft, but redacted it in what was probably a misguided attempt to limit the technical / jargony aspects of the discussion. I try to keep things statistically accurate and reasonably thorough, but also accessible to readers with no statistical background. I fear I usually fail on both counts of this compromise, but hopefully I'll improve in time.
I agree with you about the possibility of unreported significance tests, and I was hoping readers would draw the inference that there might be unreported comparisons. I don't want to outright accuse the authors of cherry picking, but all the hallmarks are there, and even more so in the studies of grades and in the laboratory study than in the baseball analysis.
I'm hoping to have a nice, thorough discussion here about multiple comparisons one of these days. This paper didn't seem ideal for the topic, but I'm confident I'll come across one that is, and sooner rather than later.