Google’s Public Data: Much Improved

Google recently released some major improvements in its public data efforts. If you click on over to Public Data, you will find a much broader range of data sets including economic information from the OECD and World Bank, key economic statistics for the United States, and some education statistics for California. Google has also included more tools for visualizing these data, from standard line charts to the evolving bubble charts that have made Hans Rosling such a hit at TED.

As an example, I made a Flash chart of state unemployment rates from 1990 to the present. Puerto Rico (which counts as a state for these purposes), Michigan, Nevada, and Rhode Island currently have the highest unemployment rates, so I thought it would be interesting to see how they stacked up against the other states over the past twenty years.

WordPress doesn’t allow me to embed Flash, but if you click on the image above and then click play, you will see the evolution of state unemployment rates over time. (Spoiler alert: All those colored bars move sharply upward toward the end of the “movie”.)

Long-time readers may recall my series of posts criticizing Google for directing its users to unemployment data that have not been seasonally adjusted. Happily, Google now lets users choose either seasonally adjusted or unadjusted data. Two cheers for Google.

Why only two cheers rather than three? Because Google still directs unsuspecting users to unadjusted data, without the ability to switch to seasonally adjusted data, if they do a Google search on “unemployment rate United States”. That’s a big deal, particularly for February 2010, when the official unemployment rate was 9.7% but the unadjusted figure reported by Google was 10.4%.

Clearly, the two parts of Public Data need to integrate a bit more.

Google and Me, Part II

My existential crisis is over. As of last Thursday, Google is again including this blog in its search results. So, welcome to all the new readers who’ve come here after Googling information on the Eggo shortage and the debate about whether kids should get one H1N1 shot or two.

This is probably of interest only to other bloggers, but for the record: When I first started this blog, it took about six weeks for it to appear regularly in Google search results. After several months, the blog inexplicably (to me, at least) disappeared from Google’s results. As in *really* disappeared; as one friend pointed out, you couldn’t even find it if you searched for “Donald Marron blog”. About eight weeks elapsed before it reappeared regularly in the first few pages of Google’s results.

My eight-week exile provided a nice natural experiment for evaluating Google’s importance. Not surprisingly, Google drives a good amount of traffic; readership is larger when Google knows about the blog. The more interesting impact, though, is a version of the Long Tail: with Google’s help, more posts find readers on any given day.


Insight on Google and Unemployment

In a series of posts (here, here, and here), I have expressed concern that Google directs its users to what I think is the “wrong” measure of unemployment. For example, if you search for “unemployment rate United States” today, it will tell you that the U.S. unemployment rate in August was 9.6%, when the actual figure is 9.7%.

This discrepancy arises because Google directs users to data that haven’t been adjusted for seasonal variations. Almost all discussions of the national economy, however, use data that have been seasonally adjusted. Why? Because seasonally adjusted data (usually) make it easier to figure out what’s actually happening in the economy. The unemployment rate always spikes up in January, for example, because retailers lay off their Christmas help. But that doesn’t mean that we should get concerned about the economy every January. Instead, we should ask how the January increase in the unemployment rate compares to a typical year. That’s what seasonal adjustment does.
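For readers who like to see the mechanics, here is a minimal Python sketch of the idea. It is a deliberately crude stand-in, not the procedure behind the official statistics, which is considerably more elaborate. The monthly pattern, the yearly rates, and the function names are all hypothetical, made up purely for illustration. The sketch estimates how far each calendar month typically sits from its own year's average, then subtracts that typical deviation from the raw figure.

```python
from statistics import mean

# Hypothetical seasonal pattern in percentage points (Jan..Dec); it sums to zero.
# January is high, mimicking post-Christmas retail layoffs. NOT real data.
SEASONAL = [0.6, 0.4, 0.2, -0.2, -0.3, 0.1, 0.1, -0.1, -0.3, -0.3, -0.2, 0.0]

# Hypothetical underlying unemployment rate for each year (percent). NOT real data.
UNDERLYING = {2006: 4.6, 2007: 4.6, 2008: 5.8}

# Raw (unadjusted) monthly rates: underlying rate plus the seasonal swing.
raw = {year: [rate + s for s in SEASONAL] for year, rate in UNDERLYING.items()}


def seasonal_factors(raw_by_year):
    """Average deviation of each calendar month from its own year's mean."""
    deviations = [[] for _ in range(12)]
    for rates in raw_by_year.values():
        year_mean = mean(rates)
        for month, rate in enumerate(rates):
            deviations[month].append(rate - year_mean)
    return [mean(d) for d in deviations]


def seasonally_adjust(raw_by_year):
    """Subtract each month's typical deviation from the raw rate."""
    factors = seasonal_factors(raw_by_year)
    return {
        year: [round(rate - f, 2) for rate, f in zip(rates, factors)]
        for year, rates in raw_by_year.items()
    }


adjusted = seasonally_adjust(raw)

# Compare the December 2007 -> January 2008 change in both series.
raw_jump = raw[2008][0] - raw[2007][11]
adjusted_jump = adjusted[2008][0] - adjusted[2007][11]
print(f"raw change: {raw_jump:.1f} points, adjusted change: {adjusted_jump:.1f} points")
```

In this toy data the raw series overstates the December-to-January change, because part of the jump is just the usual January bump. The adjusted series keeps only the part that differs from a typical year, which is the part worth worrying about.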

My concern about Google’s approach is that many (if not most) data users know nothing about seasonal adjustment. They simply want to know what the unemployment rate is and how it has changed over time. Directing those users to the non-seasonally-adjusted data thus seems like a form of search malpractice.

I’ve wondered why Google has chosen this approach, and thus was thrilled when reader Jonathan Biggar provided the answer in a recent comment. Jonathan writes:

Google and Me

A strange thing happened last week: Google misplaced my blog.

I’ve run all the usual diagnostics, and I can confirm that Google still knows that my blog exists. But it no longer appears in any of the searches – e.g., “natural gas price”, “unemployment”, “budget deficit”, or “brooke boemio” – that used to help new readers find posts on my site.

Things are so bad, in fact, that my blog doesn’t even come up when you search for “donald marron”. I feel an existential crisis coming on.

I presume this is just the result of some obscure algorithm tweak and that, over time, my posts will reappear in the ranks of the Google-worthy. But it’s fun to imagine that Google is mad at me for my posts criticizing the way it reports unemployment data.

I just checked and, no surprise, Google is still reporting the wrong data. If you search for “unemployment rate”, Google will tell you that the U.S. unemployment rate was 9.6% in August, when in fact it was 9.7%. Why the difference? Because Google is reporting an obscure measure of unemployment, not the one used by 99% of the world.

Google Is Still Wrong About Unemployment

Everyone who follows the U.S. economy closely knows that the unemployment rate was 9.4% in July, down 0.1 percentage point from June.

Everyone, that is, except Google.

If you ask Google (by searching for “unemployment rate United States”), it will tell you the unemployment rate in July was 9.7%.

What’s going on? Well, it turns out that Google is directing users to the wrong data series. As I discussed last month, almost everyone who talks about unemployment is using (whether they know it or not) data that have been adjusted to remove known seasonal patterns in hiring and layoffs (e.g., many school teachers become unemployed in June and reemployed in August or September). Adjusting for such seasonal patterns is standard protocol because it makes it easier for data users to extract signals from the noisy movements in data over time.

For unknown reasons, Google has chosen not to direct users to these data. Instead, Google reports data that haven’t been seasonally adjusted and thus do not match what most of the world is using.

This is troubling, since I have high hopes for Google’s vision of bringing the power of search to data sets. The ability of users to find and access data lags far behind their ability to find and access text. I am hopeful that Google will solve part of this problem.

But data search is not about mindlessly pointing users to data series. You need to make sure that users get directed to the right data series. So far, Google is failing on that front, at least with unemployment data.

P.S. As I discussed in a follow-up post last month, Wolfram Alpha has an even more ambitious vision for making data (and computation) available through search. I like many of the things Alpha is trying to do, but they are lagging behind Google in several ways. For example, as I write this, they haven’t yet updated the unemployment data to reflect the new July figures. (Click here for Alpha results.)

Bing isn’t trying yet.

Bing Bounces Onto Yahoo

Yesterday’s deal between Microsoft and Yahoo is a big boost for Bing. Microsoft’s new engine will power search on Yahoo, raising its visibility and, perhaps, eating into Google’s market leadership.

If the stock market is any guide, Microsoft is getting the better of the deal. As TechCrunch notes, Yahoo’s stock fell 12% on the day, lopping almost $3 billion off its market cap.


Microsoft, on the other hand, was up about 1.4%, boosting its market cap by about $3 billion.

The real question, of course, is how the deal will affect Google. GOOG was down about 0.8% (around $1 billion in market cap), a bit more than the decline in the Dow or the Nasdaq. That suggests that Google investors respect the MSFT-YHOO deal, but aren’t running scared just yet.
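As a rough sanity check on those figures, a percentage move and the dollar change it produced together imply a company's approximate size. The short Python sketch below does that division using only the rounded numbers quoted above ("almost $3 billion", "about 1.4%", and so on), so the results are ballpark values at best. The function name and the dictionary layout are hypothetical, chosen just for this illustration.

```python
def implied_market_cap(change_in_billions, percent_move):
    """Market cap (in $ billions) implied by a dollar change and a percent move."""
    return change_in_billions / (percent_move / 100)


# (approximate change in market cap in $ billions, approximate percent move),
# taken from the rounded figures in the text above.
moves = {
    "Yahoo": (3.0, 12.0),
    "Microsoft": (3.0, 1.4),
    "Google": (1.0, 0.8),
}

for company, (change, pct) in moves.items():
    cap = implied_market_cap(change, pct)
    print(f"{company}: roughly ${cap:.0f} billion")
```

The point of the exercise is scale: a similar dollar swing is a large share of Yahoo's value but only a sliver of Microsoft's.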

The logic of the deal seems impeccable. Yahoo is an also-ran in the search space, while Microsoft’s Bing is an exciting new entrant. Just how far Yahoo has trailed in search was driven home for me when I reviewed my posts about the search market (here is a list). Google gets the most attention in those posts, of course, but I also discussed competitors Bing, Wolfram Alpha, and Cuil. But it never occurred to me to mention Yahoo. That oversight is vindicated by today’s deal.

Personally, I am looking forward to having Bing on the Yahoo home page. I’ve spent far too much effort avoiding Yahoo’s search engine (e.g., by uninstalling the annoying Yahoo toolbar that various services foist on you when you get new software). Perhaps now I will have reason to let Yahoo take up a bit more valuable screen space.

Disclosure: I don’t own stock in any of these companies.

Google and Antitrust

The August Wired has a nice article about the increased antitrust scrutiny that Google is facing. (Updated July 28, 2009: I would usually insert a link to the article, but I couldn’t find one online; sorry, but I am working from the dead-tree-and-ink version that the postman dropped off.)

Early on, the article notes some ironies of the current situation:

More than 15 years ago, federal regulators began making Microsoft the symbol of anticompetitive behavior in the tech industry. Now, a newly activist DOJ may try to do the same thing to Google.

It is an ironic position for the search giant to find itself in. [CEO Eric] Schmidt not only campaigned enthusiastically for the very Obama administration that appointed [DOJ antitrust chief Christine] Varney, but also was one of the most devoted opponents of Microsoft in the mid-’90s, eagerly helping the government build its case against the software firm.

A few weeks ago, I described some of the arguments that Google might use to defend itself. The Wired article elaborates on one of these: it’s fine for a company to be a monopoly if, as John Houseman used to say, they earn it. It then points to the other issues that may raise concerns:

