Data – Donald Marron

Spaghetti, Pies, and Clutterplots: Visualizing Data

Jonathan Schwabish has just published a wonderful guide to visualizing economic data. If you produce charts, you really ought to study it.

Here’s one example, transforming a default Excel “spaghetti” chart into something more tasty:

Getting the Most from Excel Charts

Over at The Why Axis, Jon Schwabish has a great show-and-tell on improving economic visuals in Excel. He starts with a chart of job openings data from the Bureau of Labor Statistics that was clearly made using Excel’s default settings (click charts to enlarge):

Jon demonstrates many improvements. I particularly like this one:

To my eye, this chart is way better than the original, although it shares one big flaw: the horizontal spacing of the dots doesn’t match the timing. Jon then corrects that and adds more information (I might have stopped with just getting the spacing right):

Check out his original post for an easy guide to the how and why of these changes and a copy of his Excel file.

Turning Data Into Art

The art shows in Miami last weekend included several artists who turn data into art.

Norwood Viviano’s installation Cities: Departure and Deviation brings three-dimensional tangibility to the population history of 24 American cities. Here are the first dozen of his blown-glass pieces, Atlanta through Los Angeles:


Chicago, short and squat, is fifth from the left.

In Words and Years, Toril Johannessen brings a certain whimsy to data tracking word use in leading academic and popular publications. Here, for example, she documents the triumph of “Hope” over “Reality” in political science:

Purple America – The Best Election Maps

For all the talk of red states and blue states, much of America is really purple.

That simple observation has inspired some great alternatives to the standard red and blue maps depicting electoral outcomes.

Princeton’s Robert Vanderbei, for example, has created an animation that makes three improvements on the standard red/blue map: he maps counties not just states; he uses shades of purple to reflect the mix of Democratic and Republican votes; and he uses green for third parties.
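For readers who want to try the purple idea themselves, here is a minimal sketch in Python. It is not Vanderbei's actual code. It assumes the simplest possible rule: a county's red, green, and blue channels are proportional to its Republican, third-party, and Democratic vote shares. The function name and the gray fallback are my own inventions for illustration.

```python
def purple_shade(dem_votes, rep_votes, other_votes=0):
    """Return an (R, G, B) tuple, 0-255 per channel, from raw vote counts.

    Assumed rule (illustrative only): red tracks the Republican share,
    blue the Democratic share, and green the third-party share.
    """
    total = dem_votes + rep_votes + other_votes
    if total == 0:
        # Hypothetical fallback: neutral gray for a county with no votes.
        return (128, 128, 128)
    red = round(255 * rep_votes / total)
    green = round(255 * other_votes / total)
    blue = round(255 * dem_votes / total)
    return (red, green, blue)


# An evenly split county comes out purple; a landslide comes out pure red or blue.
print(purple_shade(5000, 5000))   # (128, 0, 128)
print(purple_shade(0, 100))       # (255, 0, 0)
```

A county with a strong third-party showing picks up a green tint under this rule, which is how the Ross Perot years would stand out.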

Here’s his animation for the 1960 to 2008 elections; keep an eye out for Ross Perot. (Vanderbei also has a static version of the 2012 results.)

Michigan’s Mark Newman also adopts the purple view, with another wrinkle. Traditional maps emphasize geographic area, not the location of electoral votes (or population). Using some fancy math, he resizes and reshapes states to reflect their relative electoral import. The result resembles a smooshed butterfly, with blue areas (mostly cities) amid a red web:

Niall Ferguson’s Mistake Makes the Case for Metadata

Harvard historian Niall Ferguson goofed on Bloomberg TV yesterday. Arguing that the 2009 stimulus had little effect, he said:

The point I made in the piece [his controversial cover story in Newsweek] was that the stimulus had a very short-term effect, which is very clear if you look, for example, at the federal employment numbers. There’s a huge spike in early 2010, and then it falls back down. (This is slightly edited from the transcription by Invictus at The Big Picture.)

That spike did happen. But as every economic data jockey knows, it doesn’t reflect the stimulus; it’s temporary hiring of Census workers.

Ferguson ought to know that. He’s trying to position himself as an important economic commentator and that should require basic familiarity with key data.

But Ferguson is just the tip of the iceberg. For every prominent pundit, there are thousands of other people—students, business analysts, congressional staffers, and interested citizens—who use these data and sometimes make the same mistakes. I’m sure I do as well—it’s hard to know every relevant anomaly in the data. As I said in one of my first blog posts back in 2009:

Data rarely speak for themselves. There’s almost always some folklore, known to initiates, about how data should and should not be used. As the web transforms the availability and use of data, it’s essential that the folklore be democratized as much as the raw data themselves.

How would that democratization work? One approach would be to create metadata for key economic data series. Just as your camera attaches time, date, GPS coordinates, and who knows what else to each digital photograph you take, so could each economic data point be accompanied by a field identifying any special issues and providing a link for users who want more information.

When Niall Ferguson calls up a chart of federal employment statistics at his favorite data provider, such metadata would allow them to display something like this:

Clicking on or hovering over the “2” would then reveal text: “Federal employment boosted by temporary Census hiring; for more information see link.” And the stimulus mistake would be avoided.
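To make the idea concrete, here is a rough sketch of what such metadata might look like in Python. Everything in it is hypothetical: the class names, the field names, and the numbers are placeholders I made up for illustration, not anything BLS or FRED actually publishes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Flag:
    """A piece of data folklore attached to an observation (hypothetical schema)."""
    note: str
    url: Optional[str] = None  # link for users who want more information


@dataclass
class DataPoint:
    period: str
    value: float
    flags: List[Flag] = field(default_factory=list)


def hover_text(point: DataPoint) -> str:
    """Text a chart could reveal on hover; empty if the point has no flags."""
    return " ".join(flag.note for flag in point.flags)


# Placeholder values only, NOT actual federal employment figures.
series = [
    DataPoint("2010-01", 100.0),
    DataPoint("2010-05", 120.0, [Flag(
        "Federal employment boosted by temporary Census hiring.")]),
    DataPoint("2010-09", 101.0),
]

for point in series:
    marker = "*" if point.flags else " "
    print(f"{point.period} {point.value:7.1f} {marker} {hover_text(point)}")
```

Storing the flag on each data point, and letting the chart decide how many markers to draw, keeps the folklore with the data rather than with any one data provider's display.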

I am, of course, skimming over a host of practical challenges. How do you decide which anomalies should be included in the metadata? When should a chart show a single flag for a metadata issue, even though the underlying data carry that flag on each affected data point?

And, perhaps most important, who should do this? It would be great if the statistical agencies could do it, so the information could filter out through the entire data-using community. But their budgets are already tight. Failing that, perhaps the fine folks at FRED could do it; they’ve certainly revolutionized access to the raw data. Or even Google, which already does something similar to highlight news stories on its stock price charts, but would need to create the underlying database of metadata.

Here’s hoping that someone will do it. Democratizing data folklore would reduce needless confusion about economic facts so we can focus on real economic challenges. And it just might remind me what happened to federal employment in early 2009.

2,000 Years of Economic History in One Chart … and Another

Michael Cembalest, head of investment strategy at JP Morgan, is famous for his beautiful, insightful charts. His latest (courtesy of Paul Kedrosky) illustrates two millennia of world economic history:

According to the chart, India (orange) and China (red) together comprised more than two-thirds of the globe’s economic activity back in year 1 (well, not so much the globe, but the chosen countries). By 1950, their share had fallen to only one-eighth, thanks to the growth of the United States (green), Western Europe (shades of blue), Russia (gray), and Japan (yellow). Since then, China has been gaining share.

Not surprisingly, the chart has already attracted attention in the blogosphere. Over at the Atlantic, Derek Thompson slices and dices the data to see how much of the pattern reflects the ebbs and flows of population vs. productivity.

At the Economist, meanwhile, K.N.C. channels Edward Tufte, expressing appropriate alarm about the compressed x-axis. The first millennium gets as much real estate as the 1990s. K.N.C. then offers another approach:

Given data limitations, this chart also compresses the x-axis, but using bars and variable-width gaps makes it much clearer that there are jumps between years. The focus on a limited number of countries also makes it clear that the chart omits countries that account for 30-40% of world GDP. In Cembalest’s chart, in contrast, one wonders what happened to South America, the Nordic countries, Canada, Africa ex Egypt, etc. His listed countries appear to sum to 100% of world GDP, but large swathes of the world are unaccounted for.

Are Two Economic Clocks Better Than One?

A man with one clock always knows the time. A man with two clocks is never sure.

This week brings the two heavyweights of economic statistics. On Thursday morning we got the latest read on economic growth, and on Friday we learn how the job market fared in May.

Government statisticians and outside commenters usually emphasize a particular headline number in these reports. For the economy as a whole, it’s the annual growth rate of gross domestic product (GDP), which logged in at a mediocre 1.9 percent in the first quarter. For jobs, it’s the number of nonfarm payroll jobs created in the past month (115,000 in April, but that will be revised on Friday morning).

In each case, the government also reports a second measure of essentially the same thing. Jobs day aficionados are familiar with this. The payroll figure comes from a survey of employers, but the Bureau of Labor Statistics also reports results from a survey of people. That provides the other famous job metric, the unemployment rate, and a second count of how many people have a job. The concept isn’t exactly the same as the payroll measure–it includes a broader array of jobs, for example, but doesn’t reflect people holding multiple jobs–but it’s sufficiently similar that it can be an interesting check on the more-quoted payroll figure.

The downside of this extra information, however, is that it can foster confusion. In April, for example, payrolls increased by 115,000, but the household measure of employment fell by 169,000. Did jobs grow or decline in April?

Another, less well-known example happens with the GDP data. The Bureau of Economic Analysis calculates this figure two different ways: by adding up production to get GDP and by adding up incomes to get gross domestic income (GDI). In principle, these should be identical. In practice, they differ because of measurement challenges. As Brad Plumer notes in a piece channeling Wharton economist Justin Wolfers, the two measures tell somewhat different stories about recent economic growth. In Q1, for example, GDI expanded at a respectable 2.7 percent, much faster than the 1.9 percent recorded for GDP. Is the economy doing OK or barely plodding along?

Such confusion is the curse of having two clocks. We can’t be sure which measure to believe. Experts offer good reasons to prefer the payroll figure (e.g., it’s based on a much larger survey) and GDP (e.g., income measurement is difficult for various technical reasons, including capital gains). But there are counterviews as well; for example, at least one paper finds that GDI does a better job of capturing swings in the business cycle.

Despite this confusion, two clocks are better than one. They remind us of the fundamental uncertainty in economic measurement. That uncertainty is often overlooked in the rush to analyze the latest economic data, but it is real. There are limits to what we know about the state of the economy.

In addition, a weighted average of two readings may well provide a better reading than either one alone. If one clock says 11:40 and another says 11:50, for example, you’d probably do well to guess that it’s 11:45. Unless, of course, you have reason to believe that one clock is better than the other.

The same may well be true for GDP and GDI – the truth is likely in the middle. (This is less true with the jobs data; because of the larger sample, I weight the payroll measure much more heavily than the household measure, at least for monthly changes.)
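The arithmetic is simple enough to sketch in a few lines of Python. The function name is mine, and the 0.8 weight on payrolls in the last example is purely an assumption to show how unequal weights work; it is not an estimate of the right weight.

```python
def blend(reading_a, reading_b, weight_a=0.5):
    """Weighted average of two readings; weight_a is the weight on reading_a."""
    return weight_a * reading_a + (1 - weight_a) * reading_b


# Two clocks, equal weights: 11:40 and 11:50, in minutes after midnight.
minutes = blend(11 * 60 + 40, 11 * 60 + 50)
hour, minute = divmod(int(minutes), 60)
print(f"{hour}:{minute:02d}")                 # 11:45

# GDP (1.9 percent) and GDI (2.7 percent), equal weights.
print(round(blend(1.9, 2.7), 2))              # 2.3

# April jobs: payrolls +115,000 vs. household -169,000 (in thousands),
# with an assumed, illustrative 0.8 weight on the payroll survey.
print(round(blend(115, -169, weight_a=0.8), 1))
```

With equal weights the blended first-quarter growth estimate lands halfway between the two official readings; tilt the weight toward whichever clock you trust more.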

P.S. For more on GDP vs. GDI, see Dean Baker and Binyamin Appelbaum.

Is the United States the Best Place for Women?

Women in the United States have the best quality of life of any developed nation, according to the Better Life Index recently released by the Organization for Economic Cooperation and Development.

The index combines eleven different measures of wellbeing, including health, education, income, and life satisfaction. Australia gets top honors for overall wellbeing, but U.S. women top the rankings (women in red, men in black; you’ll probably have to click to read this chart):

Any such index rests on many assumptions and value judgements, of course. So kudos to the OECD for providing a completely interactive version of the index. If you don’t like the way they combine the eleven factors, you can roll your own index and see what happens.

Among other things, that allows you to drill down on each of the individual factors the OECD considers. The “life satisfaction” element reveals that the United States is an outlier in another way: the disparity between women’s life satisfaction and men’s:

Korea has the largest gap, with women reporting much higher satisfaction than men. The United States has the second largest gap, with women noticeably more satisfied than men. (But do note that by this metric alone, the United States is not the best place for women–several countries are higher.)

If you like data, interactive graphics, and international comparisons, you’ll probably enjoy putting the OECD’s Better Life Index through its paces.

Child Mortality and Development, the Video

Courtesy of Bill Gates, here’s Hans Rosling talking child mortality and development.

(Gates emphasizes foreign aid in his description, but that seems secondary compared to development generally.)

Hans Rosling Breaks Down the Impact of Foreign Aid from bgC3 on Vimeo.

Better Than Feared, But Still Mediocre

America’s job market has been down so long, today’s mediocre report looked like up.

The headline figures — payrolls up 117,000, unemployment rate down a tick to 9.1% — were better than most forecasters anticipated. That’s a relief.

And many details moved in the right direction as well. Revisions to May and June added another 56,000 jobs, the U-6 measure of underemployment ticked down to 16.1%, and hourly earnings were up 0.4%.

But we still need much stronger job growth if we are ever going to get America back to work. Both unemployment and underemployment remain stubbornly high:

(The U-6 measure includes the officially unemployed, marginally attached workers, and those who are working part-time but want full-time work.)