Netflix Boosts Prize Economics

By at least one metric – the number of people who have mentioned it to me – my brief post about Netflix appears to be my most popular one so far.

The post linked to a remarkable slide deck about the corporate culture that Netflix has embraced in its quest for excellence. Most memorable line: “adequate performance gets a generous severance package.” If you haven’t seen it, I encourage you to click on over. It’s worth your time.

Yesterday’s award of the first Netflix prize highlights another strength of Netflix’s culture: it clearly does not suffer from “not-invented-here” syndrome. Indeed, quite the reverse. A few years ago, Netflix realized that it had reached its limit in trying to improve the accuracy of its movie recommendation system. Even though users may rate dozens (or more) of movies, it turns out to be difficult to predict what other movies they will like.

So Netflix decided to outsource this problem in an ingenious way: it offered a $1 million prize to any person or team that could improve the recommendation algorithm by at least 10%. Stated that way, the problem sounds easy. But it took nearly three years before the winner – a team led by AT&T Research engineers – took home the prize.
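For the technically curious, the 10% was measured in root mean squared error (RMSE): teams had to predict held-out ratings, and their RMSE had to beat that of Netflix’s own Cinematch algorithm by at least 10%. Here is a minimal Python sketch of that scoring rule; the baseline figure is an approximation used purely for illustration, not an official number.

```python
import numpy as np

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return np.sqrt(np.mean((predicted - actual) ** 2))

# Toy demonstration on four made-up 1-to-5-star ratings:
print(rmse([3.8, 3.1, 4.6, 2.2], [4, 3, 5, 2]))   # -> 0.25

# Cinematch scored roughly 0.95 RMSE on the hidden "quiz" ratings
# (approximate figure, for illustration only), so a qualifying entry
# had to do at least 10% better:
baseline = 0.95
target = 0.90 * baseline
print(f"Qualifying RMSE: {target:.4f} or better")   # -> 0.8550
```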

As recounted in Netflix’s press release, this marathon ended in a race to the wire:

“We had a bona fide race right to the very end,” said [CEO Reed] Hastings. “Teams that had previously battled it out independently joined forces to surpass the 10 percent barrier. New submissions arrived fast and furious in the closing hours and the competition had more twists and turns than ‘The Crying Game,’ ‘The Usual Suspects’ and all the ‘Bourne’ movies wrapped into one.”

Netflix said “BellKor’s Pragmatic Chaos” edged out a team called “The Ensemble,” another collaboration of former competitors, with the winning submission coming just 24 minutes before the conclusion of the nearly three-year-long contest. The competition was so close and the submissions so sophisticated that it took a team of external and internal judges several weeks to validate the winner after the contest closed on July 26.

Happily, the resulting algorithm won’t be exclusive to Netflix:

The contest’s rules require the winning team to publish its methods so that businesses in many fields can benefit from the work done. The winning submission and the previously hidden ratings used to score the contest will be published at the University of California Irvine Machine Learning Repository. The team licensed its work to Netflix and is free to license it to other companies.

On the first day of my microeconomics class, I told my students that economics is all about incentives. As an example, I used the famous longitude prize, which inspired the invention of the marine chronometer (a clock precise enough to let navigators determine longitude at sea). Next time around, I will mention the Netflix prize as well.

P.S. Not one to rest on its successes, Netflix has already announced plans for a second Netflix prize. This one aims to find a better way to recommend movies to people based on demographic data (e.g., where they live) rather than movie ratings.

Insight on Google and Unemployment

In a series of posts (here, here, and here), I have expressed concern that Google directs its users to what I think is the “wrong” measure of unemployment. For example, if you search for “unemployment rate United States” today, it will tell you that the U.S. unemployment rate in August was 9.6%, when the actual figure is 9.7%.

This discrepancy arises because Google directs users to data that haven’t been adjusted for seasonal variations. Almost all discussions of the national economy, however, use data that have been seasonally adjusted. Why? Because seasonally adjusted data (usually) make it easier to figure out what’s actually happening in the economy. The unemployment rate always spikes up in January, for example, because retailers lay off their Christmas help. But that doesn’t mean we should worry about the economy every January. Instead, we should ask how the January increase in the unemployment rate compares to that in a typical year. That’s what seasonal adjustment does.
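To make that concrete, here is a deliberately naive sketch (in Python, with made-up numbers) of what seasonal adjustment does: it estimates each calendar month’s typical deviation and nets it out, so only unusual movements remain. The BLS actually uses a far more sophisticated procedure (X-12-ARIMA); this captures only the core intuition.

```python
import pandas as pd

# Hypothetical, not-seasonally-adjusted monthly unemployment rates for two
# years; January is deliberately elevated to mimic post-Christmas layoffs.
nsa = pd.Series(
    [5.9, 5.4, 5.2, 5.0, 5.1, 5.5, 5.6, 5.3, 5.0, 4.9, 4.8, 5.0,
     6.1, 5.6, 5.4, 5.2, 5.3, 5.7, 5.8, 5.5, 5.2, 5.1, 5.0, 5.2],
    index=pd.date_range("2007-01-01", periods=24, freq="MS"),
)

# Each calendar month's typical deviation from the overall average...
month_avg = nsa.groupby(nsa.index.month).transform("mean")
seasonal_effect = month_avg - nsa.mean()

# ...is netted out, so a routine January spike disappears while an
# unusually large one would still show up.
sa = nsa - seasonal_effect
print(sa.round(2))
```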

My concern about Google’s approach is that many (if not most) data users know nothing about seasonal adjustment. They simply want to know what the unemployment rate is and how it has changed over time. Directing those users to the non-seasonally-adjusted data thus seems like a form of search malpractice.

I’ve wondered why Google has chosen this approach, and thus was thrilled when reader Jonathan Biggar provided the answer in a recent comment. Jonathan writes:

Continue reading “Insight on Google and Unemployment”

Voyaging Through U.S. Jobs

In honor of Labor Day, you may want to check out Job Voyager by Flare. It provides a graphical history of the rise and fall of different types of jobs in the United States from 1850 to 2000.

Here’s what you get for “Farmer”:

[Chart: Job Voyager results for “Farmer”]

Back in 1850, farmers accounted for more than 40% of reported jobs. Today, less than 1%.

If you click around, you will find that the decline in farmers has been offset by growth in a host of other occupations, including clerical work, retail, and nursing.

And economists? Well, we grew rapidly until 1990, and then tailed off. Perhaps the would-be economists ran off to Wall Street instead?

[Chart: Job Voyager results for “Economist”]

P.S. The Job Voyager charts were inspired by the famous Name Voyager charts that let you track the popularity of first names.

Craigslist’s Business Model

The magazine Wired regularly publishes some of the most interesting articles about economics and the modern world. Last month, for example, it ran a great article about the antitrust threats looming over Google. The month before, it covered the economics of Somali pirates, which I never found time to write about. And the month before that, it discussed how Google, not eBay, is really the master of auctions.

This month, Wired provides an in-depth look at craigslist.

For those who don’t already know, craigslist is the place to post classified ads on the web. According to Wired, it is the world’s “most popular dating site,” “the most popular job-search site,” and “the nation’s largest apartment-hunting site.” Not to mention the myriad other things you can buy, sell, trade, give, receive, etc. on the site.

How did craigslist achieve this dominance? The article doesn’t provide a crisp answer, but it does offer some gems about how the company operates. You should read the article for the full effect, but here is a sample:

On innovation:

Think of any Web feature that has become popular in the past 10 years: Chances are craigslist has considered it and rejected it. If you try to build a third-party application designed to make craigslist work better, the management will almost certainly throw up technical roadblocks to shut you down.

Continue reading “Craigslist’s Business Model”

Google Is Still Wrong About Unemployment

Everyone who follows the U.S. economy closely knows that the unemployment rate was 9.4% in July, down 0.1 percentage point from June.

Everyone, that is, except Google.

If you ask Google (by searching for “unemployment rate United States”), it will tell you the unemployment rate in July was 9.7%.

What’s going on? Well, it turns out that Google is directing users to the wrong data series. As I discussed last month, almost everyone who talks about unemployment is using (whether they know it or not) data that have been adjusted to remove known seasonal patterns in hiring and layoffs (e.g., many school teachers become unemployed in June and reemployed in August or September). Adjusting for such seasonal patterns is standard practice because it makes it easier for data users to extract signals from the noisy movements in data over time.

For unknown reasons, Google has chosen not to direct users to these data. Instead, Google reports data that haven’t been seasonally adjusted and thus do not match what most of the world is using.
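For readers who want to see the two series side by side, both are published by the BLS and mirrored on the St. Louis Fed’s FRED service: UNRATE is the seasonally adjusted rate, and UNRATENSA is the unadjusted one. Here is a quick sketch using the pandas-datareader package (one option among several; any FRED client or a plain CSV download would work just as well):

```python
import pandas_datareader.data as web

# UNRATE    = civilian unemployment rate, seasonally adjusted
# UNRATENSA = the same rate, not seasonally adjusted (the series Google
#             appears to be using)
series = web.DataReader(["UNRATE", "UNRATENSA"], "fred", start="2008-01-01")

# The two columns diverge month by month, even though they describe
# the same underlying survey.
print(series.tail(6))
```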

This is troubling, since I have high hopes for Google’s vision of bringing the power of search to data sets. The ability of users to find and access data lags far behind their ability to find and access text. I am hopeful that Google will solve part of this problem.

But data search is not about mindlessly pointing users to data series. You need to make sure that users get directed to the right data series. So far, Google is failing on that front, at least with unemployment data.

P.S. As I discussed in a follow-up post last month, Wolfram Alpha has an even more ambitious vision for making data – and computation – available through search. I like many of the things Alpha is trying to do, but it lags behind Google in several ways. For example, as I write this, it hasn’t yet updated its unemployment data to reflect the new July figures. (Click here for the Alpha results.)

Bing isn’t trying yet.

Follow-up: Defense, Mortgage Modifications, and Yahoo/Microsoft

This morning’s headlines include some important follow-ups to recent posts:

Bing Bounces Onto Yahoo

Yesterday’s deal between Microsoft and Yahoo is a big boost for Bing. Microsoft’s new engine will power search on Yahoo, raising Bing’s visibility and, perhaps, eating into Google’s market leadership.

If the stock market is any guide, Microsoft is getting the better of the deal. As TechCrunch notes, Yahoo’s stock fell 12% on the day, lopping almost $3 billion off its market cap:

[Chart: Yahoo share price decline after the deal]

Microsoft, on the other hand, was up about 1.4%, boosting its market cap by about $3 billion.

The real question, of course, is how the deal will affect Google. GOOG was down about 0.8% (around $1 billion in market cap), a bit more than the decline in the Dow or the Nasdaq. That suggests that Google investors respect the MSFT-YHOO deal, but aren’t running scared just yet.
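For anyone who wants to check the back-of-the-envelope arithmetic, the percentage moves and dollar figures quoted above imply pre-deal market capitalizations along these lines (a rough sketch using only the approximations in this post, not precise closing values):

```python
# Implied market cap (in billions) = dollar change / percentage move,
# using only the rough figures quoted above.
moves = {
    "YHOO": {"pct_move": -0.12,  "cap_change_bn": -3.0},
    "MSFT": {"pct_move":  0.014, "cap_change_bn":  3.0},
    "GOOG": {"pct_move": -0.008, "cap_change_bn": -1.0},
}

for ticker, m in moves.items():
    implied_cap = m["cap_change_bn"] / m["pct_move"]
    print(f"{ticker}: implied market cap of roughly ${implied_cap:.0f} billion")
```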

The logic of the deal seems impeccable. Yahoo is an also-ran in the search space, while Microsoft’s Bing is an exciting new entrant. Just how far Yahoo has trailed in search was driven home for me when I reviewed my posts about the search market (here is a list). Google gets the most attention in those posts, of course, but I also discussed competitors Bing, Wolfram Alpha, and Cuil. But it never occurred to me to mention Yahoo. The deal suggests that omission was warranted.

Personally, I am looking forward to having Bing on the Yahoo home page. I’ve spent far too much effort avoiding Yahoo’s search engine (e.g., by uninstalling the annoying Yahoo toolbar that various services foist on you when you install new software). Perhaps now I will have reason to let Yahoo take up a bit more valuable screen space.

Disclosure: I don’t own stock in any of these companies.

Google and Antitrust

The August Wired has a nice article about the increased antitrust scrutiny that Google is facing. (Updated July 28, 2009: I would usually insert a link to the article, but I couldn’t find one online; sorry, but I am working from the dead-tree-and-ink version that the postman dropped off.)

Early on, the article notes some ironies of the current situation:

More than 15 years ago, federal regulators began making Microsoft the symbol of anticompetitive behavior in the tech industry. Now, a newly activist DOJ may try to do the same thing to Google.

It is an ironic position for the search giant to find itself in. [CEO Eric] Schmidt not only campaigned enthusiastically for the very Obama administration that appointed [DOJ antitrust chief Christine] Varney, but also was one of the most devoted opponents of Microsoft in the mid-’90s, eagerly helping the government build its case against the software firm.

A few weeks ago, I described some of the arguments that Google might use to defend itself. The Wired article elaborates on one of these: it’s fine for a company to be a monopoly if, as John Houseman used to say, they earn it. It then points to the other issues that may raise concerns:

Continue reading “Google and Antitrust”

Wolfram Alpha, Unemployment, and the Future of Data

I’ve received a number of helpful responses to my post about the strengths and weaknesses of Google’s efforts to transform data on the web. Reader DD, for example, reminded me that I ought to run the same test on Wolfram Alpha, which I briefly mentioned in my post on Google’s antitrust troubles.

Wolfram Alpha is devoting enormous resources to the problem of data and computation on the web. As described in a fascinating article in Technology Review, Wolfram’s vision is to curate all the world’s data. Not just find and link to it, but have a human think about how best to report it and how to connect it to relevant calculation and visualization techniques. In short:

[Wolfram] Alpha was meant to compute answers rather than list web pages. It would consist of three elements, honed by hand …: a constantly expanding collection of data sets, an elaborate calculator, and a natural-language interface for queries.

That is certainly a grand vision. Let’s see how it does when I run the same test – “unemployment rate United States” – that I used for Google:

Continue reading “Wolfram Alpha, Unemployment, and the Future of Data”

Google, Unemployment, and the Future of Data

Google may eventually solve the problem of finding data on the web. Too bad its first effort reports the wrong numbers for unemployment.

Since leaving public service, I have occasionally pondered whether to start a company or organization to transform the way that data are made available on the web. The data are out there, but they remain a nuisance to find, a nuisance to manipulate, and a nuisance to display. I cringe every time I have to download CSV files, import them into Excel, manipulate the data (in a good sense), make a chart, and fix the dumb formatting choices that Excel makes. All those steps should be much, much easier.
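To give a flavor of how short that loop could be, here is a sketch of the download-manipulate-chart workflow in Python with pandas and matplotlib. The file name and column names are hypothetical placeholders; the point is that the whole exercise fits in a handful of lines.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical CSV with columns "date" and "unemployment_rate";
# substitute any real download URL or local file.
df = pd.read_csv("unemployment.csv", parse_dates=["date"], index_col="date")

# Manipulate: the 12-month change in the rate, in percentage points.
df["change_12m"] = df["unemployment_rate"].diff(12)

# Chart it, with sensible defaults instead of Excel's.
df["change_12m"].plot(title="12-month change in the unemployment rate")
plt.ylabel("Percentage points")
plt.tight_layout()
plt.show()
```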

There are good solutions to many of these problems if you have a research assistant or are ready to spend $20,000 on an annual subscription. With ongoing technology advances, however, there ought to be a much cheaper (perhaps even free) way of doing this on the net. With some good programming, some servers, and careful design (both graphic and human factors), it should be possible to disintermediate research assistants and democratize the ability to access and analyze data. At least, that’s my vision.

Many organizations have attacked various pieces of this problem, and a few have even made some headway (FRED deserves special mention in economics). But when you think about it, this is really a problem that Google ought to solve. It has the servers, software expertise, and business model to make this work at large scale. And with its launch of a search service for public data it has already signaled its interest in this problem.

As a major data consumer, I wish Google every success in this effort. However, I’d also like to use their initial effort, now almost three months old, as a case study in what not to do.

Google’s first offering of economics data is the unemployment rate for the United States (also available for the individual states and various localities). Search for “unemployment rate united states” and Google will give you the following graph:

[Chart: Google’s graph of the U.S. unemployment rate]

Your first reaction should be that this is great. With absolutely no muss and no fuss, you have an excellent (albeit sobering) chart of the unemployment rate since 1990. I would add myriad extensions to this – e.g., make it easier to look at shorter time periods, allow users to look at the change in the unemployment rate, rather than the level, etc. – but the basic concept is outstanding.

Unfortunately, there is one major problem: that’s the wrong unemployment rate.

Click over to the Bureau of Labor Statistics, open a newspaper (remember them?), or stay right here on my blog – all of them will tell you that the unemployment rate in June was 9.5%, not 9.7%.

Continue reading “Google, Unemployment, and the Future of Data”