Zanran: Google for Data?

Zanran is a new search engine, now in beta testing, that focuses on charts and tables. As its website says:

Zanran helps you to find ‘semi-structured’ data on the web. This is the numerical data that people have presented as graphs and tables and charts. For example, the data could be a graph in a PDF report, or a table in an Excel spreadsheet, or a barchart shown as an image in an HTML page. This huge amount of information can be difficult to find using conventional search engines, which are focused primarily on finding text rather than graphs, tables and bar charts.

Put more simply: Zanran is Google for data.

This is a stellar idea. The web holds phenomenal amounts of data, much of it buried inside documents where it is hard to find. Zanran offers a fast way to find and scan documents that may contain relevant material. Particularly helpful is the ability to hover your cursor over each result to see the chart Zanran thinks you are interested in before you click through to the document.

Zanran is clearly in beta, however, and has some major challenges ahead. Perhaps most important are determining which results should rank high and identifying recent data. If you type “united states GDP” into Zanran, for example, the top results are rather idiosyncratic and there’s nothing on the first few pages that directs you to the latest data from the Bureau of Economic Analysis. Google, in contrast, has the BEA as its third result. And its first result is a graphical display of GDP data via Google’s Public Data project. Too bad, though, it goes up only to 2009. For some reason, both Google and Zanran think the CIA is the best place to get U.S. GDP data. It is a good source for international comparisons, but it falls out of date.

Here’s wishing Zanran good luck in strengthening its search results as it competes with Google, Wolfram Alpha, and others in data search.

Google Is Still Wrong About Unemployment

Everyone who follows the U.S. economy closely knows that the unemployment rate was 9.4% in July, down 0.1 percentage point from June.

Everyone, that is, except Google.

If you ask Google (by searching for “unemployment rate United States”), it will tell you the unemployment rate in July was 9.7%.

What’s going on? Well, it turns out that Google is directing users to the wrong data series. As I discussed last month, almost everyone who talks about unemployment is using (whether they know it or not) data that have been adjusted to remove known seasonal patterns in hiring and layoffs (e.g., many school teachers become unemployed in June and reemployed in August or September). Adjusting for such seasonal patterns is standard protocol because it makes it easier for data users to extract signals from the noisy movements in data over time.
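To make the idea of seasonal adjustment concrete, here is a minimal Python sketch of a crude additive adjustment, assuming a monthly series made up of whole years. It estimates each calendar month's typical deviation from the overall average and subtracts that deviation from every observation in that month. The function name `seasonally_adjust` and the toy numbers are hypothetical illustrations, not real unemployment data; the procedures that statistical agencies such as BLS actually use are far more elaborate.

```python
from statistics import mean


def seasonally_adjust(values: list[float], period: int = 12) -> list[float]:
    """Crude additive seasonal adjustment (hypothetical helper).

    Assumes the series contains only whole cycles (e.g. whole years of
    monthly data) and that the seasonal pattern is the same every year.
    """
    if len(values) % period != 0:
        raise ValueError("series must contain whole cycles")
    overall = mean(values)
    # Seasonal factor: how far each calendar month sits, on average,
    # above or below the overall mean of the series.
    factors = [mean(values[m::period]) - overall for m in range(period)]
    # Remove each month's typical seasonal swing from its observations.
    return [v - factors[i % period] for i, v in enumerate(values)]


# Two hypothetical years: a rate of 9.0 except for a 9.5 bump every June and July.
raw = ([9.0] * 5 + [9.5, 9.5] + [9.0] * 5) * 2
adjusted = seasonally_adjust(raw)
print([round(a, 2) for a in adjusted])
```

In this toy series every adjusted value comes out at about 9.08, because the summer bump is entirely seasonal and the adjustment strips it away. Real series also contain trends and irregular movements, which is one reason agencies rely on more sophisticated methods than a fixed monthly average.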

For unknown reasons, Google has chosen not to direct users to these data. Instead, Google reports data that haven’t been seasonally adjusted and thus do not match what most of the world is using.

This is troubling, since I have high hopes for Google’s vision of bringing the power of search to data sets. The ability of users to find and access data lags far behind their ability to find and access text. I am hopeful that Google will solve part of this problem.

But data search is not about mindlessly pointing users to data series. You need to make sure that users get directed to the right data series. So far, Google is failing on that front, at least with unemployment data.

P.S. As I discussed in a follow-up post last month, Wolfram Alpha has an even more ambitious vision for making data, and computation, available through search. I like many of the things Alpha is trying to do, but it lags behind Google in several ways. For example, as I write this, Alpha hasn’t yet updated its unemployment data to reflect the new July numbers.

Bing isn’t trying yet.

Google and Antitrust

The August Wired has a nice article about the increased antitrust scrutiny that Google is facing. (Updated July 28, 2009: I would usually insert a link to the article, but I couldn’t find one online; sorry, but I am working from the dead-tree-and-ink version that the postman dropped off.)

Early on, the article notes some ironies of the current situation:

More than 15 years ago, federal regulators began making Microsoft the symbol of anticompetitive behavior in the tech industry. Now, a newly activist DOJ may try to do the same thing to Google.

It is an ironic position for the search giant to find itself in. [CEO Eric] Schmidt not only campaigned enthusiastically for the very Obama administration that appointed [DOJ antitrust chief Christine] Varney, but also was one of the most devoted opponents of Microsoft in the mid-’90s, eagerly helping the government build its case against the software firm.

A few weeks ago, I described some of the arguments that Google might use to defend itself. The Wired article elaborates on one of these: it’s fine for a company to be a monopoly if, as John Houseman used to say, they earn it. It then points to the other issues that may raise concerns:


Wolfram Alpha, Unemployment, and the Future of Data

I’ve received a number of helpful responses to my post about the strengths and weaknesses of Google’s efforts to transform data on the web. Reader DD, for example, reminded me that I ought to run the same test on Wolfram Alpha, which I briefly mentioned in my post on Google’s antitrust troubles.

Wolfram Alpha is devoting enormous resources to the problem of data and computation on the web. As described in a fascinating article in Technology Review, Wolfram’s vision is to curate all the world’s data. Not just find and link to it, but have a human think about how best to report it and how to connect it to relevant calculation and visualization techniques. In short:

[Wolfram] Alpha was meant to compute answers rather than list web pages. It would consist of three elements, honed by hand …: a constantly expanding collection of data sets, an elaborate calculator, and a natural-language interface for queries.

That is certainly a grand vision. Let’s see how it does on the same test I used for Google, the query “unemployment rate United States”:


Google’s Defense


As Jeff Horwitz notes in the Washington Post this morning (“Google Says It’s Actually Quite Small”, previously posted on Slate), the search giant will likely face close scrutiny from the Obama administration. Indeed, Google is already the subject of at least three separate antitrust reviews.

How will Google try to defend itself?

As Horwitz reports, Google will undoubtedly employ two classic defenses:

Defense 1. Being a monopolist isn’t illegal. If firms achieve market dominance through “superior skill, foresight, and industry” (as Judge Learned Hand put it decades ago), that’s fine under our system. We want to reward firms that gain market share by being innovative and delivering value to customers.
