A few months ago, I was working on n-grams in the Brown Corpus, and I was wondering about the speed of my Haskell application. And that seemed so important. And then some people from Harvard, MIT, the American Heritage Dictionary, the Encyclopedia Britannica, and Google got together and did this: 5 million books. 2 billion n-grams. And you can play with them at Google Labs. And you can download the datasets. Looking for data? This is a cookie jar the size of the moon.
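For the curious: the kind of n-gram extraction I was doing looks roughly like this in Haskell. This is just an illustrative sketch, not my actual Brown Corpus code:

```haskell
import Data.List (tails)

-- All contiguous n-grams of a token list: take the first n tokens
-- of every suffix, then drop the too-short tails at the end.
ngrams :: Int -> [a] -> [[a]]
ngrams n xs = filter ((== n) . length) (map (take n) (tails xs))

main :: IO ()
main = print (ngrams 2 (words "the quick brown fox"))
-- prints [["the","quick"],["quick","brown"],["brown","fox"]]
```

Simple enough — the fun (and the speed worries) start when you run it over a whole corpus instead of four words.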

And I, for one, welcome our new n-gram overlords.

Thanks for the link, KrisDS!