Niall Ferguson’s Mistake Makes the Case for Metadata

Harvard historian Niall Ferguson goofed on Bloomberg TV yesterday. Arguing that the 2009 stimulus had little effect, he said:

The point I made in the piece [his controversial cover story in Newsweek] was that the stimulus had a very short-term effect, which is very clear if you look, for example, at the federal employment numbers. There’s a huge spike in early 2010, and then it falls back down. (This is slightly edited from the transcription by Invictus at The Big Picture.)

That spike did happen. But as every economic data jockey knows, it doesn’t reflect the stimulus; it’s temporary hiring of Census workers.

Ferguson ought to know that. He’s trying to position himself as an important economic commentator and that should require basic familiarity with key data.

But Ferguson is just the tip of the iceberg. For every prominent pundit, there are thousands of other people—students, business analysts, congressional staffers, and interested citizens—who use these data and sometimes make the same mistakes. I’m sure I do as well—it’s hard to know every relevant anomaly in the data. As I said in one of my first blog posts back in 2009:

Data rarely speak for themselves. There’s almost always some folklore, known to initiates, about how data should and should not be used. As the web transforms the availability and use of data, it’s essential that the folklore be democratized as much as the raw data themselves.

How would that democratization work? One approach would be to create metadata for key economic data series. Just as your camera attaches time, date, GPS coordinates, and who knows what else to each digital photograph you take, so could each economic data point be accompanied by a field identifying any special issues and providing a link for users who want more information.

When Niall Ferguson calls up a chart of federal employment statistics at his favorite data provider, such metadata would allow them to display something like this:

 

Clicking on or hovering over the “2” would then reveal text: “Federal employment boosted by temporary Census hiring; for more information see link.” And the stimulus mistake would be avoided.
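
As a rough illustration, here is a minimal Python sketch of what such metadata might look like. The class names, field names, values, and URL are all hypothetical; nothing here reflects how BLS, FRED, or any other provider actually stores its data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Annotation:
    """A piece of data folklore attached to an observation (hypothetical schema)."""
    note: str
    url: str


@dataclass(frozen=True)
class Observation:
    period: str                             # e.g. "2010-05"
    value: float                            # placeholder values, not real BLS figures
    annotation: Optional[Annotation] = None


def chart_flags(series):
    """Number each distinct annotation in order of first appearance.

    Returns a dict mapping flag number -> Annotation, which a chart
    could use to draw numbered markers with hover text.
    """
    numbers = {}
    for obs in series:
        if obs.annotation is not None and obs.annotation not in numbers:
            numbers[obs.annotation] = len(numbers) + 1
    return {number: annotation for annotation, number in numbers.items()}


census = Annotation(
    note="Federal employment boosted by temporary Census hiring; "
         "for more information see link.",
    url="https://example.org/census-hiring",  # placeholder link
)

# Illustrative index values only; the shape of the spike is what matters.
series = [
    Observation("2010-03", 100.0),
    Observation("2010-04", 103.0, census),
    Observation("2010-05", 110.0, census),
    Observation("2010-06", 101.0),
]

flags = chart_flags(series)
for number, annotation in flags.items():
    print(number, annotation.note, annotation.url)
```

Keeping the note on each observation, rather than only on the chart, means the caveat travels with the data when someone downloads the series.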

I am, of course, skimming over a host of practical challenges. How do you decide which anomalies should be included in the metadata? When should a chart show a single flag for a metadata issue, even though the underlying data carry that flag on every affected data point?
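
One possible answer to the single-flag question, sketched in Python under the assumption that each data point carries either a note or nothing: collapse each run of consecutive points sharing the same note into one chart flag. The function name and the sample data are hypothetical.

```python
from itertools import groupby


def collapse_flags(points):
    """Collapse runs of consecutive points with the same note into one flag.

    points: list of (period, note_or_None) tuples in time order.
    Returns a list of (first_period, last_period, note), one per run.
    """
    runs = []
    for note, group in groupby(points, key=lambda point: point[1]):
        if note is None:
            continue  # unflagged stretch, nothing to show
        periods = [period for period, _ in group]
        runs.append((periods[0], periods[-1], note))
    return runs


# Hypothetical monthly series with a three-month flagged stretch.
points = [
    ("2010-03", None),
    ("2010-04", "Census hiring"),
    ("2010-05", "Census hiring"),
    ("2010-06", "Census hiring"),
    ("2010-07", None),
]

print(collapse_flags(points))
```

A chart could then place one marker per run, while the per-point notes stay in the underlying data for anyone who downloads it.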

And, perhaps most important, who should do this? It would be great if the statistical agencies could do it, so the information could filter out through the entire data-using community. But their budgets are already tight. Failing that, perhaps the fine folks at FRED could do it; they’ve certainly revolutionized access to the raw data. Or even Google, which already does something similar to highlight news stories on its stock price charts, but would need to create the underlying database of metadata.

Here’s hoping that someone will do it. Democratizing data folklore would reduce needless confusion about economic facts so we can focus on real economic challenges. And it just might remind me what happened to federal employment in early 2009.

7 thoughts on “Niall Ferguson’s Mistake Makes the Case for Metadata”

  1. What a great post, thank you!

    I’ve been trying to make similar arguments and I really fear that data gets misused and misinterpreted by perfectly innocent people who just don’t have the time to figure out all the catches and caveats hidden in the numbers.

    But there are two big problems I see:

    1. How do we identify trends, breaks, or “abnormalities” in the data that actually need explanation?

    Example: Census hiring might seem obvious enough, but is a 5% jump in a stock’s price enough to justify looking for a specific event that may have caused such a jump?

    What if the market as a whole was up 5% too? Should Google therefore look at residual price changes (i.e. after correcting for overall market movements with a CAPM)?

    2. To attach specific events to time series as a candidate explanation for some kind of apparent shift, break, trend, or abnormality requires great care.

    It’s easy to mislead people by pointing to one cause when that cause only explains 20% of the observed change in the data.

    So should these kinds of attached explanations report how much of the observed change in the data might be due to the explanation provided?

    How?

    Example: Census hiring is an easy enough explanation for the 2010 spike in Federal employment, but I might doubt that anyone, Google in particular, could possibly decide which of the thousands of events that affect stock prices at different times actually explains a significant portion of a price change.

    And how much of a trend, break, or abnormality should one factor explain before it shows up in the chart, for example? 10%? 100%? How can we tell?

    It’s all very problematic, unfortunately.

    PS: I used the stock price thing with Google only as an example. I think the underlying principles apply to other data too though.

    1. Good points. One screen would be whatever information the statistics publishers choose to identify. BLS, for example, chose to identify the Census effect specifically in its data releases, but as text, which gets lost when people access the data series. So that would make the cut. And you’d want to make it easy to get BLS’s estimate of the number of Census workers to see how that compares to the anomaly. As you say, there will be some cases where the additional info explains only a part of the move.

      As Dan notes, a Wikipedia approach might work if enough folks devote their energy to figure out what factors ought to be identified. But as Gordon’s comment suggests, you might also get some disagreement, not all fully informed.

  2. In a word, “Datapedia”. What you probably want is to create a wikipedia of data to allow those with the information to share it and decide as a community what merits notation. And to make sure it gets funded, include sports data…

  3. Interesting post and a worthwhile objective.

    If I may make a tangential point: I realize the post was not just about laypeople making significant analytical errors out of ignorance of key additional information (or, more generally, of how the data in question should and should not be used), but also about experts sometimes making such mistakes. Presumably, though, experts do so substantially less often. So, with the democratization of data, I wouldn’t discourage laypeople from exploring, performing their own analyses, and developing hypotheses that contradict conventional wisdom or expert consensus. But I would encourage them to do so with proper humility and a common-sense appreciation of their greater likelihood of making analytical errors compared with experts, particularly when they reach “conclusions” that (just coincidentally, of course) happen to support some policy preference or ideology to which the armchair analyst or his audience of fellow laypeople subscribes.

    For example (and not to pick on conservatives; I’ve seen the same on the left), I have seen some folks over at RedState.com make something of a hobby of presenting “analyses” that “disprove” global warming. These posts are readily and generally unquestioningly concurred with and praised by the crowd in the ideologically hyperpartisan echo chamber.

    Similarly, several years ago on RedState.com I challenged the gospel there that “tax cuts increase revenues”, as supposedly (per their argument) clearly proven by increases in revenues in the few years following the Kennedy, Reagan and W. Bush tax cuts. Of course, the folks seeing those revenue increases as proof obviously didn’t even know the first thing about correlation, let alone establishing causation, since they didn’t bother to consider that revenues generally increase over any few years regardless of whether taxes were increased, decreased, or not changed at all (because the economy is more often growing than in recession), even in real terms (although inflation was another factor some neglected), not to mention other factors it didn’t even occur to them to consider. When I pointed out these flaws in the “analysis” on which they based their passionately-held premise, I was met with much scorn and charged with being an “idiot”, a “troll”, and basically someone who couldn’t see, or was pretending not to see, the obvious.

    My point is that we laypeople (in my case, vis a vis either of the above fields) should (1) seek expert consensus, if it may exist, as a check on our analyses and conclusions, and (2) not only seek the views of experts who may be biased in the “other” direction (due to partisanship or whatever), but also seek the views of experts who, if their conclusions were to reflect some bias or agenda, would support one’s conclusion. #2 may seem on the surface like a waste of time, but sometimes it is the best piece of research one can do, because it establishes a boundary of how far experts whose incentive or bias, if any, would be toward supporting one’s conclusion, actually go in supporting that conclusion. It would not be uncommon for one to find that even those experts don’t go nearly as far as that layperson, or even that they flat out disagree with the layperson’s conclusion.

    Back to my “tax cuts” example. I compiled a collection of quotes of mostly conservative, very prominent economists opining on the claim that tax cuts generally have a net positive impact on revenues. I tried to find prominent conservatives who made that assertion. All I could find were rejections of that premise by all the prominent conservative economists whose opinions I could find, including the top economists of the George W. Bush Administration, and the most highly regarded economists among conservatives. Here’s my compilation, posted originally in 2007 http://swordscrossed.org/node/1671

    I presented this information at RedState.com, and a small percentage of those who responded were influenced by it, but, as evidence of the extent to which some people just want to believe what they want to believe, most there responded with sharp attacks along the lines of “You’re just appealing to authority. Why should I believe economists, who are wrong so often anyway, when I can see with my own eyes that tax cuts have always led to increases in revenue??” Clearly some people just can’t be helped. But it would be good for political discourse and good for the nation if the needle could be moved in the direction of people appreciating the weight of expert consensus, and of considering the views both of experts inclined to support a particular conclusion and of those inclined to reject it.

    1. One of the frustrations in this line of work is how often people reason from their beliefs to their “facts” rather than from facts to their beliefs. This is particularly true when dealing with counterfactuals, e.g., what would revenue have been otherwise. People can do before-after comparisons, but as you note, that isn’t enough if you’ve got a trend. In the taxes example, a before-after comparison gives a result (over enough years and barring sharp economic downturns) consistent with the predetermined belief that tax cuts boost revenue.

      1. Donald,

        Yeah, to paraphrase H.L. Mencken, people use such “fact-finding” the way a drunk uses a lamppost: more for support than for illumination. By the way, I often recommend http://www.amazon.com/True-Enough-Learning-Post-Fact-Society/dp/0470050101

        And it doesn’t help that there are so many experts who are hyperpartisan (in terms of ideological and/or party agenda). Hyperpartisan experts help fuel this “more for support” habit of many Americans today. And although their offerings have some value (such as establishing boundaries of how far the experts on that “side” go on something, as I discuss in my comment above, and also in laying out the strongest case in some direction), they also perform a great disservice insofar as they let their biases and agenda-driven insincerity shape the views they offer the public on analytical matters in their areas of expertise. When comparing policy options, we laypeople need to get a sense from experts of what the trade-offs are so we can then apply our own values and priorities in reaching a policy preference. When hyperpartisan experts instead offer up analyses that they have skewed toward some bias or knowingly to serve their policy advocacy objective, they are instead arrogantly substituting their own values and priorities and presenting bogus analysis that they represent to the public as objective analysis showing trade-offs associated with policy alternatives, seeking to deny the public the opportunity to consider the actual trade-offs (the trade-offs the expert would be communicating if he were making a good faith effort at objectivity and honesty), presumably because the expert in question thinks the public might not apply the “right” values and priorities, and, in turn, reach the “right” or “best” policy preferences.

        That’s one reason I have particular respect and appreciation for your role in America’s fiscal policy discourse. I see you as an expert who makes a good faith effort at objectivity, as well as being honest and open, regardless of whatever your personal policy preferences may be.
