This post, by Kathryn Rusch, originally appeared on her site on 3/7/12.
The quote in my title comes from Mark Twain’s autobiography. Twain said:
“Figures often beguile me, particularly when I have the arranging of them myself; in which case the remark attributed to Disraeli would often apply with justice and force: ‘There are three kinds of lies: lies, damned lies, and statistics.’”
The problem with Twain’s attribution, however, is that no scholar can find anything in Disraeli’s papers that even resembles it. (Yes, scholars have that kind of time on their hands.) The website twainquotes.com cites an 1895 article by Leonard H. Courtney in which the quote first appeared—or so everyone thinks.
I find it hilarious that the source of this quote about statistics is almost impossible to track down. I also find it funny that Twain’s preface to the quote has gotten lost in the pithiness of “lies, damned lies, and statistics.”
“Figures often beguile me,” he wrote, “particularly when I have the arranging of them myself.”
And thus Mark Twain, who died in 1910, poked at the heart of modern publishing. We all love statistics – or figures, as he called them – but they prove nothing. In fact, this year, statistical analysis is harder than ever.
You’d think it would be easier. We have computers, after all. We have incredible processing speeds and more information at our fingertips than ever before. We can “crunch” the numbers quickly and easily.
The problem is in which numbers we crunch.
Let’s take, for example, the number of e-book sales versus the number of print book sales. We’re seeing a lot of statistics about the percentage of e-books in the marketplace. And those statistics come from reputable organizations.
I felt uncomfortable about those statistics at the end of 2011, and I feel even more uncomfortable about them now. These statistics purport to examine all books sold, and I know that’s not true. I also know that there are equations that supposedly take a statistical sample and extrapolate from it to information not yet gathered (or information that’s impossible to gather). And even though I know the mathematical model is accepted, I’m still uncomfortable.
You see the mathematical model in polling all the time. Pollsters contact 1,000 or 10,000 or 100,000 sufficiently diverse people, poll them, and then treat their answers as a statistical sample that supposedly represents the entire population. Medical studies use the same technique. They gather information from 50 or 500 or 5,000 people, gauge their reactions to, say, a medication over a period of time, and then use those reactions as the basis for their conclusions.
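To see why pollsters care so much about the size of the sample, here is a minimal Python sketch of the textbook margin of error for a yes/no question. It assumes a simple random sample and the usual normal approximation; the function name `margin_of_error` is my own, and real pollsters also weight their samples, which this ignores.

```python
import math

def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion from a simple random sample.

    Uses the normal approximation z * sqrt(p * (1 - p) / n). p = 0.5 is the
    worst case (the widest margin). Assumes a truly random sample; it does not
    account for weighting, or for a sample that systematically misses people.
    """
    return z * math.sqrt(p * (1 - p) / n)

for n in (1_000, 10_000, 100_000):
    print(f"n = {n:>7,}: +/- {margin_of_error(n):.1%}")
```

Under those assumptions, 1,000 people give a margin of roughly 3.1 percentage points, 10,000 give about 1.0, and 100,000 give about 0.3. Note what the formula cannot do: it only measures random error. If the sample leaves out part of the market entirely, a bigger sample just gives you a more precise wrong answer.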
People who watch medical studies, for example, generally ignore the ones with fewer than 100 participants and really believe the ones with tens of thousands of participants. And if those tens of thousands were studied over years, then the study is considered even more accurate than one that follows someone’s reaction to a treatment or a medication over a few hours.
See why Mark Twain insisted that he liked figures if he arranged them himself? Or to put it in 2012 language: he liked statistics if he manipulated the information himself.
One of the first things I learned as a journalist, back in high school of all places, was how to look for statistical manipulation. “Four out of five dentists surveyed” might mean that five dentists were surveyed, and four of them (the ones who worked for the company) liked the product. Or it might mean that four out of five dentists in a survey that contacted 10,000 dentists (none of whom worked for the company) liked the product.
Both statements would be true. Four out of five dentists liked the product. But only one of them would give a consumer information worth having.
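Here is a rough way to put numbers on the dentist problem. This Python sketch uses the Wilson score interval, one standard way to estimate the plausible range of a proportion. It assumes, hypothetically, that all 10,000 dentists in the larger survey responded (so 8,000 liked the product), and that every respondent was independent. It cannot detect the other problem in the small survey, dentists who work for the company.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a proportion.

    Assumes the respondents are an independent random sample. It says nothing
    about bias, such as respondents who work for the manufacturer.
    """
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return center - half_width, center + half_width

# Both hypothetical surveys report "four out of five dentists" (80%).
for liked, surveyed in ((4, 5), (8_000, 10_000)):
    low, high = wilson_interval(liked, surveyed)
    print(f"{liked:,} of {surveyed:,}: plausible range {low:.1%} to {high:.1%}")
```

Both surveys say "80 percent," but the five-dentist survey is consistent with anything from roughly 38 percent to 96 percent approval, while the 10,000-dentist survey pins it down to roughly 79 to 81 percent.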
Over the past year, it has become clear to me that e-book sales are rising. Anyone who watches numbers knows that. Every day there’s a new tablet hitting the market, or some new version of an e-reader. Just this week, Apple unveiled its third iPad, which it is calling simply “the new iPad.” At the same time, Google announced Google Play, which it claims will rival iTunes. We’ll see.