Brown Corpus

You are what you is

And a suffix array is a suffix array. But what is a suffix array? Good question! I’m still in the process of trying to understand this weird creature from some top secret government sorting programme, but I will try to give a comprehensive description of my findings. I probably got it wrong somewhere, so please...
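For readers who, like me, want something concrete to poke at, here is a minimal sketch. It assumes the textbook definition: the suffix array of a text lists the starting positions of all its suffixes, sorted alphabetically by suffix. The names `suffixArray` and `occurrences` are hypothetical. The naive sort is for illustration only and is not necessarily the method the post arrives at.

```haskell
import Data.List (isPrefixOf, sortBy, tails)
import Data.Ord (comparing)

-- | Naive suffix array (hypothetical helper): the starting positions of
-- all non-empty suffixes, ordered by the lexicographic order of the
-- suffixes themselves. Worst case O(n^2 log n), so small inputs only.
suffixArray :: Ord a => [a] -> [Int]
suffixArray xs = map fst (sortBy (comparing snd) suffixes)
  where
    -- Pair every starting index with the suffix that begins there;
    -- 'init' drops the empty suffix that 'tails' produces last.
    suffixes = zip [0 ..] (init (tails xs))

-- | All positions where the pattern occurs, found by checking each
-- sorted suffix for the pattern as a prefix. A real implementation
-- would binary-search, because matching suffixes form one contiguous
-- block in the array.
occurrences :: Ord a => [a] -> [a] -> [Int]
occurrences pat xs = [i | i <- suffixArray xs, pat `isPrefixOf` drop i xs]
```

For "banana" this gives [5,3,1,0,4,2]: "a" (5), "ana" (3), "anana" (1), "banana" (0), "na" (4), "nana" (2). Because all suffixes that share a prefix sit next to each other in this order, a pattern search can binary-search the array instead of scanning the whole text.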

Replace replace

The replace function from my earlier post appears to work well, and it can be used in function composition. Its only downside is speed: to replace seven characters, you have to make seven runs through the list, and on the Brown Corpus that takes 12 seconds. So I looked...
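One way to avoid a separate run per character is to look each character up in a replacement table during a single pass. The sketch below assumes the replacements can be written as (character, replacement string) pairs. `replaceAll` is a hypothetical name, not the function from the earlier post, and the post's own solution may take a different route.

```haskell
import qualified Data.Map as Map

-- | Hypothetical single-pass replace: every character is looked up once
-- in a table, instead of traversing the text once per character that
-- needs replacing.
replaceAll :: [(Char, String)] -> String -> String
replaceAll table = concatMap substitute
  where
    -- Build the lookup table once per partial application.
    tableMap = Map.fromList table
    -- Characters without an entry in the table are kept unchanged.
    substitute c = Map.findWithDefault [c] c tableMap
```

`replaceAll [('.', ""), (',', ""), ('\n', " ")] "a,b.\nc"` returns "ab c". Because the table comes first, the function still slots into composition, e.g. `words . replaceAll table`.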

Fortune-telling

The second chapter in “Natural Language Processing for the Working Programmer” (nlpwp.org) deals with bigrams, n-grams and collocations. So what are these weird things? Bigrams are pairs of words that follow each other in a sentence. Chomsky’s sentence “Colorless green ideas sleep furiously” can be split up into the following list of bigrams: [["Colorless","green"],["green","ideas"],["ideas","sleep"],["sleep","furiously"]] N-grams...
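In Haskell, bigrams fall out of zipping a list with its own tail, and n-grams generalise this to sliding windows of length n. Here is a minimal sketch with the hypothetical names `bigrams` and `ngrams`. It returns lists of lists so the output has the same shape as the example above, though NLPWP's own definitions may differ.

```haskell
import Data.List (tails)

-- | Bigrams: every pair of adjacent elements.
bigrams :: [a] -> [[a]]
bigrams xs = zipWith (\a b -> [a, b]) xs (drop 1 xs)

-- | N-grams: every run of n adjacent elements. 'tails' yields each
-- starting point; once a window is shorter than n, all later ones are
-- too, so 'takeWhile' can stop there.
ngrams :: Int -> [a] -> [[a]]
ngrams n xs
  | n <= 0 = []
  | otherwise = takeWhile ((== n) . length) (map (take n) (tails xs))
```

`bigrams (words "Colorless green ideas sleep furiously")` reproduces the list above, and `ngrams 2` gives the same result as `bigrams`.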

Broken promises

The cute little program “brown” worked, but it had some problems. When I used it on the complete Brown Corpus, it caused a stack space overflow because of lazy evaluation. In human language: it crashed. It also did everything at once, whereas I want to be able to pass commands and arguments to the application. The frequency...
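A frequent cause of this kind of overflow is a lazy fold that piles up unevaluated `(+ 1)` thunks in the counts. Here is a minimal sketch of a strict word-frequency count, assuming the counts live in a `Data.Map` from the containers package. `frequencies` is a hypothetical name, and this is one common remedy rather than necessarily the fix the post settles on.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as Map

-- | Word frequencies (hypothetical helper), built with a strict left fold
-- over a strict Map: foldl' forces the accumulated map at every step, and
-- Data.Map.Strict evaluates each count as it is stored, so no chain of
-- unevaluated additions can build up on large inputs.
frequencies :: [String] -> Map.Map String Int
frequencies = foldl' bump Map.empty
  where
    bump counts w = Map.insertWith (+) w 1 counts
```

With the lazy `foldl` and the lazy `Data.Map`, the same code can defer all the additions until the counts are printed, which is where the stack runs out.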

First post!

The big test… In order to do something with Haskell and linguistics, I figured that I had to get my fundamentals right. Just to get going, I tried to write an application that would open a file, use its contents for some easy computations, and send the interesting results to the screen/Terminal. I based my little program...
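Here is a minimal sketch of such a program, using the hypothetical names `summarize` and `report`. The "easy computations" are assumed to be line, word and character counts, which may not be what the original program computed.

```haskell
-- | Some easy computations on a text: numbers of lines, words and
-- characters (an assumed choice of statistics).
summarize :: String -> (Int, Int, Int)
summarize text = (length (lines text), length (words text), length text)

-- | Hypothetical driver: open the file, compute the statistics and
-- send the results to the terminal.
report :: FilePath -> IO ()
report path = do
  contents <- readFile path
  let (ls, ws, cs) = summarize contents
  putStrLn (path ++ ": " ++ show ls ++ " lines, " ++ show ws
            ++ " words, " ++ show cs ++ " characters")
```

To run it from the Terminal, add `import System.Environment (getArgs)` at the top and `main = getArgs >>= mapM_ report`, then pass one or more file names as arguments.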