* * * * * * * * *

* * * * *

My Rules of Information

  1. Go where it is
  2. The answer depends on the question
  3. Research is a multi-stage process
  4. Ask a Librarian
  5. Information is meaningless until queried by human intelligence
  6. Information can be true and still wrong


Joel Best. Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists. University of California Press, 2001. ISBN 0-520-21978-3. $19.95. Reviewed by Marylaine Block

One of the important functions served by information professionals is skepticism; we usually don't accept statistics blindly, even when they're generated by authoritative sources. We tend to offer them with caveats -- "as of 1998..." or "according to this poll on this date, but on the other hand, according to this other poll..."

Possibly this is because we wouldn't want to swear to a lie detector that the statistics we ourselves collect are absolutely accurate -- did we really record every reference question exactly when it was asked, or did we say, an hour later after the rush died down, "Hmm, I think I answered about 8 reference questions and 12 directional questions"? But it's also because we have better crap detectors. We see daily the evidence that information is only as good as the question that generates it. We know that dishonest questions designed to yield specific results do so, and that questions that have never been asked leave gaping craters in the information landscape.

Unfortunately, our skepticism is a rare commodity, which is why a book like this is valuable. Joel Best says people are awed by numbers, regarding them as hard fact, as solid, scientific, unchallengeable evidence; such awe somehow turns off our automatic crap detectors and keeps us from asking things like "Sez who?" and "How do they know that?" Combined with our basic inability to differentiate between millions and billions, this awe makes us dupes for anyone with an agenda, including our own naive, willing-to-believe selves.

Social problems, Best says, are not hard, pre-existing realities but social constructions; he shows how social statistics evolved in the 19th century as people tried to create public awareness of what they believed to be social problems. To answer the question "How big a problem is it?" they had to collect and promulgate numbers.

Best examines the ways bad statistics come about: from guessing, defining, measurement decisions, and sampling decisions. Often the first available statistics are guesses, extrapolated from anecdotal evidence; Best quotes Mitch Snyder, the activist for the homeless, who said his estimate of 2-3 million homeless people was based on getting on the phone and talking to a lot of people. Such guesses are problematic both because activists tend to guess high and because, once reported, the numbers take on a life of their own. Best says:

People lose track of the estimate's original source, but they assume the number must be correct because it appears everywhere -- in news reports, politicians' speeches, articles in scholarly journals and law reviews, and so on. Over time, as people repeat the number, they may begin to change its meaning, embellish the statistic. . . After a newsmagazine story reported "researchers suggest that up to 200,000 people exhibit a stalker's traits," other news reports picked up the "suggested" figure and confidently repeated that there were 200,000 people being stalked.

Why are so many of our political discussions marked by duelling numbers? In some cases it is because each side has used a different definition of the problem: How long does somebody have to be without permanent shelter to qualify as homeless? The longer the time period, the smaller the number that defines the problem, and vice versa. In other cases, differing numbers may reflect differences in the way the questions have been posed. In yet other cases, the sample may be skewed -- too few people may have been included to generalize from, or the sample may be large but unrepresentative (the infamous discredited study of internet pornography was based on a huge but unrepresentative file of downloaded images, drawn from only 17 out of 32 Usenet groups that offered downloadable images).
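The point about definitions is easy to see with a toy calculation. The sketch below is mine, not Best's: the list of days and the three cutoffs are invented purely for illustration and come from no survey. It counts the same ten hypothetical people under three different definitions of "homeless."

```python
# Hypothetical data: days each of ten people spent without permanent shelter.
# These figures are invented for illustration only.
days_without_shelter = [1, 3, 7, 14, 30, 45, 90, 180, 365, 400]


def count_homeless(days, minimum_days):
    """Count people who meet a definition of at least `minimum_days` unsheltered."""
    return sum(1 for d in days if d >= minimum_days)


# The same people, counted under three assumed definitions.
for minimum_days in (1, 30, 180):
    print(minimum_days, count_homeless(days_without_shelter, minimum_days))
# prints:
# 1 10
# 30 6
# 180 3
```

Nothing about the people changed between the three lines of output; only the definition did. Two advocates quoting "10" and "3" could both be counting honestly.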

But even if the initial numbers are impeccable, Best says, there are numerous ways in which they may be mangled, including changing or competing definitions (combining numbers based on different definitions of homicide over time or from different jurisdictions, for instance), incomplete records (a jurisdiction that chooses not to submit hate crime statistics may be taken as a jurisdiction that has no hate crimes), and transformations (as when the estimate of 150,000 young women suffering from anorexia became 150,000 young women DYING every year from anorexia).

But since we are not good at math, Best says, innocent confusion causes much of the garbling of statistics. Figures from the report Workforce 2000, for instance, showed that, because of the increasing participation of women, blacks, Hispanics and Asians in the American workforce, white men, then 47.9% of the American workforce, would be only 44.8% in the coming century. But the report authors chose to report not in terms of percentage of the total workforce -- a concept most of us grasp -- but in terms of "net additions," a concept even the writers of the report's executive summary misunderstood. Thus they, and the media, misreported the figures, claiming that white males would become only 15% of new entrants to the workforce. This mutant statistic was further confused by other people, including the official who testified in a congressional hearing that "By the year 2000, nearly 65% of the total workforce will be women." [Say what? No warning beeps from your crap detectors, anybody?]
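To see why "net additions" is so easy to misread, it helps to work the arithmetic. The sketch below is my own illustration, not a calculation from Best or from Workforce 2000: the two percentages are the ones quoted above, but the workforce sizes (1,000 growing to 1,100) are assumed values chosen only to make the sums easy, and the function name is hypothetical.

```python
def net_addition_share(old_total, old_share, new_total, new_share):
    """Return a group's share of the net growth in a workforce.

    Net additions = (group size after) - (group size before),
    divided by the growth in the whole workforce.
    """
    old_group = old_total * old_share
    new_group = new_total * new_share
    return (new_group - old_group) / (new_total - old_total)


# Shares quoted in the review: 47.9% before, 44.8% after.
# Workforce sizes are ASSUMED for illustration (10% growth).
share = net_addition_share(1000, 0.479, 1100, 0.448)
print(f"{share:.1%}")  # prints 13.8%
```

Under these assumed totals, a group can remain nearly 45% of the whole workforce while accounting for under 14% of its net growth. The small number describes the change, not the workforce, which is exactly the distinction that got lost in the retelling.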

Possibly the most useful part of the book is Best's section on how to question numbers when we encounter them. We must think, Best says, about the sources of the numbers and what stake those sources have in them; we must ask what definitions were used, what method was used, and what competing statistics are available. The more dramatic the numbers, and the more significant the actions we are asked to take because of them, the greater the need to examine those figures closely.

This is not the only book available on this topic; I'd also recommend John Allen Paulos' book, A Mathematician Reads the Newspaper, for multiple examples of numbers misreported by an innumerate press. But Best's book does an outstanding job of making complex statistical issues intelligible TO the innumerate. And it's a clear warning to any of us who think our job is done when we hand our users an article that has the statistics they requested in it.

* * * * *

NOTE: On a related subject, do not miss Trudy Lieberman's article, "Seven Deadly Sins," in the September-October 2001 issue of Columbia Journalism Review, which explains why you should never trust medical news in newspapers, TV news or popular magazines.

* * * * *


Reporters are faced with the daily choice of painstakingly researching stories or writing whatever people tell them. Both approaches pay the same.

Scott Adams. The Dilbert Principle.

* * * * *

