Jigsaw Falling into Place

Two weeks ago, on March 31, I defended my dissertation in Brussels. The committee consisted of Louise McNally, Chris Barker, Luk Draye, Rick Nouwen, Guido Vanden Wyngaerd and my supervisors Dany Jaspers and Hans Smessaert. The book is available on the LOT website as a free Open Access fulltext PDF,...

Talk

I gave a talk on the internal structure of adjectival lexical fields in Ghent on January 26.

Latest entries

For the heck of it

The last few days, I’ve been working on a quest for the most frequent n-gram in a corpus. This means that the frequency of every single n-gram in the corpus has to be found and then compared. Linear search proves far too slow for this feat because you have to iterate through the entire list...

It’s the economy, stupid

Playing around with suffix arrays and binary search functions was fun, but when we used them in combination with the Brown Corpus, they turned out to be far too slow to find the frequency of a word. In the previous post, I gave two possible reasons: building a suffix array of one million words takes...

What’s the frequency, Kenneth?

The last few days have been quite hectic on this blog, with a lot of code being posted and lots of new things. Today, I’d like to start with a small recap of what has been happening here the last few weeks. The blog started off with the Brown Corpus: open it, return some statistics,...

The first, the last, my frequency

Yesterday we wrote a binary search function based on the work of Harm Brouwer and Daniël de Kok at nlpwp.org. Today we will write a binary search function that finds the frequency of a substring. It’s also based on the work at nlpwp.org. Okay… Imagine that we have a sorted array containing some characters: 0...

Magic revisited

Linear versus binary: In the previous post we learnt how to build a suffix array. This is: a sorted list of all the n-grams of a string or corpus. Nice… But what is it good for? In my first posts, we used linear search to find (the frequency of) a specific element in a list....

Hopeful Monsters

In the previous post, I became friends with the suffix array. In this post, we will play the part of Dr. Frankenstein and build our very own suffix array in Haskell. Indeed: we build our own friends! All the code is based on the website nlpwp.org, but I changed the names of the functions and will...

You are what you is

And a suffix array is a suffix array. But what is a suffix array? Good question! I’m still in the process of trying to understand this weird creature from some top secret government sorting programme, but I will try to give a comprehensive description of my findings. I probably got it wrong somewhere, so please...

Replace replace

The replace function from my earlier post appears to work well and it can be used in function composition. The only downside is its speed. If you want to replace seven characters, you have to make seven runs through the list and this takes time with the Brown Corpus – 12 seconds. So I looked...

Find and replace

The last few days, I’ve been working on a function to replace elements in a list. Apparently, the standard Haskell library does not include such a function. Data.List.Utils does, but I wanted to do it the hard way and build my own. For the time being, I came up with the function below. I still...

Fortune-telling

The second chapter in “Natural Language Processing for the Working Programmer” (nlpwp.org) deals with bigrams, n-grams and collocations. So what are these weird things? Bigrams are pairs of words that follow each other in a sentence. Chomsky’s sentence “Colorless green ideas sleep furiously” can be split up in the following list of bigrams: [["Colorless","green"],["green","ideas"],["ideas","sleep"],["sleep","furiously"]] N-grams...

Broken promises

The cute little program “brown” worked, but it had some problems: When I used it on the complete Brown Corpus, it created a stack space overflow because of lazy evaluation. In human language: it crashed. It did everything at once. I want to be able to pass commands and arguments to the application. The frequency...

First post!

The big test… In order to do something with Haskell and linguistics, I figured that I had to get my fundamentals right. Just to get going, I tried to write an application to open a file, use its contents for some easy computations, and send the interesting results to the screen/Terminal. I based my little program...