Wolfram Alpha, Unemployment, and the Future of Data

I’ve received a number of helpful responses to my post about the strengths and weaknesses of Google’s efforts to transform data on the web. Reader DD, for example, reminded me that I ought to run the same test on Wolfram Alpha, which I briefly mentioned in my post on Google’s antitrust troubles.

Wolfram Alpha is devoting enormous resources to the problem of data and computation on the web. As described in a fascinating article in Technology Review, Wolfram’s vision is to curate all the world’s data. Not just find and link to it, but have a human think about how best to report it and how to connect it to relevant calculation and visualization techniques. In short:

[Wolfram] Alpha was meant to compute answers rather than list web pages. It would consist of three elements, honed by hand …: a constantly expanding collection of data sets, an elaborate calculator, and a natural-language interface for queries.

That is certainly a grand vision. Let’s see how it does when I run the same test I used for Google: “unemployment rate United States”.
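As an aside, the same natural-language query can also be sent from a script. The sketch below is only an illustration, and it assumes Wolfram Alpha’s v2 query API and a valid developer AppID (the endpoint, parameters, and the YOUR_APP_ID placeholder are my assumptions, not anything discussed in the article):

    import requests
    import xml.etree.ElementTree as ET

    APP_ID = "YOUR_APP_ID"  # assumption: a Wolfram Alpha developer AppID
    QUERY = "unemployment rate United States"

    # Ask Alpha to compute an answer and return plain-text pods rather than web pages.
    resp = requests.get(
        "https://api.wolframalpha.com/v2/query",
        params={"appid": APP_ID, "input": QUERY, "format": "plaintext"},
        timeout=30,
    )
    resp.raise_for_status()

    # The response is XML: each <pod> holds one piece of the curated answer.
    root = ET.fromstring(resp.content)
    for pod in root.findall(".//pod"):
        for plaintext in pod.findall(".//plaintext"):
            if plaintext.text:
                print(f"{pod.get('title')}: {plaintext.text}")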

Continue reading “Wolfram Alpha, Unemployment, and the Future of Data”

Google, Unemployment, and the Future of Data

Google may eventually solve the problem of finding data on the web. Too bad its first effort reports the wrong numbers for unemployment.

Since leaving public service, I have occasionally pondered whether to start a company or organization to transform the way that data are made available on the web. The data are out there, but they remain a nuisance to find, a nuisance to manipulate, and a nuisance to display. I cringe every time I have to download CSV files, import them into Excel, manipulate the data (in a good sense), make a chart, and fix the dumb formatting choices that Excel makes. All those steps should be much, much easier.
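To make the complaint concrete, here is roughly what that download-manipulate-chart routine looks like when scripted. This is only a sketch: “unemployment.csv” and its column names are stand-ins for whatever file a statistical agency actually provides.

    import pandas as pd
    import matplotlib.pyplot as plt

    # Stand-in for the downloaded CSV; the file name and columns are assumptions.
    data = pd.read_csv("unemployment.csv", parse_dates=["date"])

    # The "manipulate the data" step: keep what matters and put it in order.
    data = data[["date", "unemployment_rate"]].sort_values("date")

    # The charting step, plus the formatting fixes a spreadsheet default would need.
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.plot(data["date"], data["unemployment_rate"])
    ax.set_title("U.S. unemployment rate")
    ax.set_ylabel("Percent")
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)
    fig.tight_layout()
    plt.show()

Even scripted, that is four or five separate chores for one chart, which is exactly the point.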

There are good solutions to many of these problems if you have a research assistant or are ready to spend $20,000 on an annual subscription. With ongoing technology advances, however, there ought to be a much cheaper (perhaps even free) way of doing this on the net. With some good programming, some servers, and careful design (both graphic and human factors), it should be possible to disintermediate research assistants and democratize the ability to access and analyze data. At least, that’s my vision.

Many organizations have attacked various pieces of this problem, and a few have even made some headway (FRED deserves special mention in economics). But when you think about it, this is really a problem that Google ought to solve. It has the servers, the software expertise, and the business model to make this work at large scale. And with its launch of a search service for public data, it has already signaled its interest in this problem.
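To show why FRED earns that special mention, here is a sketch of pulling the national unemployment rate from it. UNRATE is FRED’s series ID for the civilian unemployment rate; the CSV download URL is my assumption about FRED’s interface, not something documented here.

    import pandas as pd

    # Assumption: FRED's CSV download endpoint for a single series.
    url = "https://fred.stlouisfed.org/graph/fredgraph.csv?id=UNRATE"
    series = pd.read_csv(url, index_col=0, parse_dates=True).iloc[:, 0]

    print(series.tail())         # the most recent monthly observations
    print(series.diff().tail())  # month-over-month changes, not just levels

One series ID and a couple of lines of code: that is the standard a company with Google’s resources should be aiming at.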

As a major data consumer, I wish Google every success in this effort. However, I’d also like to use its initial effort, now almost three months old, as a case study in what not to do.

Google’s first offering of economics data is the unemployment rate for the United States (also available for the individual states and various localities). Search for “unemployment rate united states” and Google will give you the following graph:

[Chart: Google’s graph of the U.S. unemployment rate]

Your first reaction should be that this is great. With absolutely no muss and no fuss, you have an excellent (albeit sobering) chart of the unemployment rate since 1990. I would add myriad extensions to this – e.g., make it easier to look at shorter time periods, allow users to look at the change in the unemployment rate, rather than the level, etc. – but the basic concept is outstanding.

Unfortunately, there is one major problem:  That’s the wrong unemployment rate.

Click over to the Bureau of Labor Statistics, open a newspaper (remember them?), or stay right here on my blog – all of them will tell you that the unemployment rate in June was 9.5%, not 9.7%.

Continue reading “Google, Unemployment, and the Future of Data”

Better GDP Data

Every five years, the fine people at the Bureau of Economic Analysis update the way that they measure the U.S. economy.  Yesterday, the BEA released a helpful document that outlines some of the upcoming improvements.  Among the things that caught my eye:

  • BEA will employ plain English, rather than bureaucratese, to describe the three vintages of GDP estimates, which are reported one, two, and three months after the end of each quarter. Those vintages are currently known as the Advance Estimate, the Preliminary Estimate, and the Final Estimate. The latter two names always struck me as nonsensical: “Preliminary” sounds like it should come before “Advance,” and “Final” estimates aren’t really final. Hence the new names: the Advance Estimate, the Second Estimate, and the Third Estimate. A definite improvement.

Continue reading “Better GDP Data”