Statistics Hacks: Tips & Tools for Measuring the World and Beating the Odds

Graphs are powerful tools to represent quantities, relationships, and the results of research studies. But in the wrong hands, they can be made to deceive. Choose your destiny, young Luke (or, if you are under the age of 25, "young Anakin"), and avoid the dark side.

There was a time when only scientists, engineers, and mathematicians ever saw a graph. With the advent of more and more news outlets aimed at the general public, visual representations of numeric information have become more and more common. Just think of yesterday's issue of USA Today: it contained at least a dozen graphs.

In business conferences, graphs are used frequently to communicate information and demonstrate success (or failure). If the creator of a graph isn't careful, though, choices that might seem arbitrary will affect the interpretation of the information. Without changing the data, you can change the meaning.

So, if you want to avoid manipulating your audience when you create a graph, or if you just want to be able to spot a misleading chart (whether the deception is intentional or not), then use this hack to help you create and interpret graphs effectively.

Choosing the Honest Graph

To understand correct and incorrect graphing options, we first have to cover some graphing basics. There are various pieces to a graph, and the manipulation of those pieces can lead or mislead.

Typical graphs have two axes, because they describe two different variables. Axes are the lines along the bottom, called the X-axis, and along the side, called the Y-axis.

You can remember that the vertical axis is called the Y-axis because the cute little letter Y is reaching its cute little hands up, vertically, toward the sky. Get it? (Welcome to the creative world of statistics education.)

The sort of graph that is appropriate (and nondeceptive) for showing the variables you have measured depends on the level of measurement of your variables [Hack #7]. You can choose from three common types of graphs, and only one will be the right one for your variables:

Bar chart

In Figure 2-8, the X-axis represents categories or groups, such as males and females. The Y-axis is continuous: the taller the bars, the higher the value on variable Y.

Figure 2-8. Bar chart

Histogram

In Figure 2-9, the X-axis represents continuous values. A histogram is often used when the X-axis represents common categories that reflect an underlying continuous variable, such as months of the year or some other distinctive set of groupings that can be placed in a meaningful order. These look like bar charts, except that the bars are pushed together with no spaces between them.

Figure 2-9. Histogram

Line chart

In Figure 2-10, both the X- and Y-axes represent continuous variables; in this example, they're time and value. The higher the line at any point, the greater the quantity as represented by the Y-axis.

Figure 2-10. Line chart

To pick the right kind of graph (i.e., the one with the format that is the least deceptive and the most intuitive), identify the type of X variable you are using (notice that Y is continuous in all of these formats):

  • If X represents different categories and Y is continuous, use a bar chart.

  • If X can be conceived of as categories, but there is also some meaningful order among them and Y is continuous, use a histogram.

  • If X and Y are both continuous, use a line chart.
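The three rules above amount to a simple lookup. Here is a minimal sketch in Python, assuming Y is continuous as it is in all three formats; the function name and the labels "categorical", "ordered", and "continuous" are invented for illustration and do not come from any library.

```python
def choose_chart(x_kind: str) -> str:
    """Suggest a chart type for a continuous Y, given the kind of X variable.

    The x_kind labels are hypothetical names for the three cases in the
    list above: 'categorical', 'ordered', or 'continuous'.
    """
    charts = {
        "categorical": "bar chart",   # X is a set of unordered groups
        "ordered": "histogram",       # X is categories with a meaningful order
        "continuous": "line chart",   # X is continuous, like Y
    }
    if x_kind not in charts:
        raise ValueError(f"unknown kind of X variable: {x_kind!r}")
    return charts[x_kind]


print(choose_chart("categorical"))  # bar chart
print(choose_chart("ordered"))      # histogram
print(choose_chart("continuous"))   # line chart
```

Deciding which label fits your X variable is still a judgment call about its level of measurement; the lookup only records the rule once you have decided.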

Graphic Violence

A common error in graphing, whether intentional or not, has to do with setting the scale for the Y-axis. Here's why this is a problem and how you can avoid it.

Graphs with two variables invite comparisons across categories or time or across different values of one variable. Pictures are worth a thousand words, as they say, and a graph can be very persuasive evidence. Anytime lines or bars are used to compare values, the comparison is accurate only when the height of the line or the length of the bar is judged against some standard minimum value. That minimum value is often zero. If the graph is not calibrated to some reasonable base value, small differences look huge.

Compare the two graphs shown in Figure 2-11, for example. Both convey exactly the same data, and yet your interpretation of each might be wildly different. The histogram in the top left reflects performance of the U.S. stock market over the last five days. Notice a rather frightening-looking drop on day 5. No doubt, earth-shaking news hit near the end of day 4. You might also notice that the Y-axis (the Dow Jones Index) does not begin at zero; it begins at 9,900, a value that is low enough to contain the tops of all five bars, but that is otherwise not meaningful.

Figure 2-11. The power of the Y-axis

Look more closely at the second histogram in Figure 2-11, on the bottom right. Both charts present the same data, but the second graph uses zero as the starting point. The interpretation of the data as presented in this graph shows very little fluctuation across the last five days, and the frightening drop at day 5 is barely a hiccup.

Which display is the correct one? Both reflect a drop of 2.8 percent in stock market value from day 4 to day 5. It really depends on the intent of the graph constructor and the intended audience. When counts or money are involved, the most meaningful and fairest starting point is usually zero. Many newspapers provide daily stock information in the format shown in the first histogram. They believe their readers are interested in small changes, so they set a Y-axis starting value that is as high as possible but still low enough to show every data point.

After all, to an avid investor who changes her portfolio often and buys and sells frequently, a drop of 2.8 percent is serious business. A graph designed to make small changes look serious might be the most valid for that reader. If an investor is one of those "in it for the long haul" types, a relatively small change is meaningless, however.

To get the most meaning out of graphs like these, always check the bottom value on the Y-axis. This way, you can get a sense of the real differences between the values as you move from bar to bar along the X-axis. If you are making graphs like these, think about the most honest way to present the information. You want to inform, not deceive (probably).
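To see how much the bottom value on the Y-axis matters, you can work out how far a bar appears to shrink for a given starting point. The sketch below is only an illustration: the day 4 value of 10,300 is a made-up number (the actual index values are not given here), chosen so that the 2.8 percent drop and the 9,900 baseline discussed above both apply. The function name is also invented.

```python
def apparent_drop(before: float, after: float, baseline: float = 0.0) -> float:
    """Fraction by which a bar appears to shrink when the Y-axis starts at baseline.

    A bar's drawn height is (value - baseline), so the visual drop is the
    change in value divided by the drawn height of the earlier bar.
    """
    if baseline >= min(before, after):
        raise ValueError("baseline must lie below both values")
    return (before - after) / (before - baseline)


# Hypothetical index values: day 4 is assumed; day 5 is 2.8 percent lower.
day4 = 10_300.0
day5 = day4 * (1 - 0.028)

from_zero = apparent_drop(day4, day5, baseline=0.0)
from_9900 = apparent_drop(day4, day5, baseline=9_900.0)

print(f"Y-axis starts at 0:     bar shrinks by {from_zero:.1%}")   # 2.8%
print(f"Y-axis starts at 9,900: bar shrinks by {from_9900:.1%}")   # 72.1%
```

With these assumed numbers, the same 2.8 percent change looks like a bar losing almost three quarters of its height once the axis starts at 9,900. A different day 4 value would give a different visual drop, but the pattern holds whenever the baseline sits close to the data.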

See Also

  • The book that first pointed out to the general public how charts can deceive, especially in advertising, was How to Lie With Statistics. Huff, D. (1954). New York: Norton and Company.
