First paper

I published my first paper! Roelandt, K., “(The) most in Flemish Dutch: Definiteness and Specificity”, in Margot Colinet, Sophia Katrenko, and Rasmus K. Rendsvig, editors, Pristine Perspectives on Logic, Language, and Computation. ESSLLI 2012 and ESSLLI 2013 Student Sessions. Selected Papers, volume 8607 of Lecture Notes in Computer Science, pages...
SLE Poster

I presented a poster at the annual meeting of the Societas Linguistica Europaea (SLE) in Poznań.
Be good.

The Gmail offer seemed perfectly reasonable at the time. Get lots of space for free, and we’ll show you some ads. We promise that the ads will be relevant and we won’t use banners or pop-ups. Just a sponsored link. You’ll get loads of space and a flexible interface. It’s...
INSEMP poster

I presented a poster at Investigating Semantics: Empirical and Philosophical Approaches (INSEMP) in Bochum on October 13th, 2013.
Everything in its right place

A website has a life of its own. As you add bits, pages and subdomains, it changes into something you hadn’t really envisioned. After a while, you find remnants of projects that are no longer relevant, and even as a webmaster it is hard to keep track of it all. Cleaning and...
Latest entries

Thumbs down version

It works. Over the last few weeks, I have been finishing the first version of the posTagger, and I’m thrilled to announce that it works. The posTagger currently analyses a training file and builds a list of tokens with their most frequent tag, and then tags the tokens in the patching file. Unknown tokens are tagged as nouns...

Backtrack to the horizon, you Road Runner!

Do the backtrack. Whenever you explore the Canadian wild, you should sing to scare off bears, wolves, cougars and squirrels. You should also backtrack now and again, just to make sure that you are where you think you are. Brill’s tagger is not exactly a prime example of Canadian scenery and I do not sing while...

The beft n-grams. Ever.

A few months ago, I was working on n-grams in the Brown Corpus and wondering about the speed of my Haskell application. And that seemed so important. Then some people from Harvard, MIT, the American Heritage Dictionary, the Encyclopedia Britannica and Google got together and did this. 5 million books. 2 billion...
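For readers who have never counted n-grams in Haskell, here is a minimal sketch of what such a counter can look like. It is not the code of the application mentioned above. `ngrams` and `ngramFreqs` are hypothetical names, and the input is assumed to be an already tokenized list of words. Load the code in GHCi to try it out.

```haskell
import qualified Data.Map.Strict as M
import Data.List (tails)

-- All contiguous n-grams of a list, e.g. word bigrams for n = 2.
-- Hypothetical helper, not the code of the original application.
ngrams :: Int -> [a] -> [[a]]
ngrams n xs
  | n <= 0    = []
  | otherwise = filter ((== n) . length) (map (take n) (tails xs))

-- Count how often every n-gram occurs in a tokenized text.
ngramFreqs :: Ord a => Int -> [a] -> M.Map [a] Int
ngramFreqs n = M.fromListWith (+) . map (\g -> (g, 1)) . ngrams n
```

For example, `ngramFreqs 2 (words "the dog saw the dog")` counts the bigram `["the","dog"]` twice. The strict `Data.Map` keeps the counts evaluated rather than letting them pile up as thunks, which matters on a corpus the size of Brown.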

Va, vis, deviens

Instead of writing code like crazy, I’m reading documentation on Brill’s tagger. The first interesting text is Eric Brill’s dissertation A Corpus-Based Approach to Language Learning, where he describes a method for transformation-based tagging. It seems that a few of my questions are being answered in his thesis, so I will work my way through it...
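The following makes the idea of a transformation concrete. It is a minimal sketch of one rule template of the kind Brill describes: change tag a to tag b when the preceding token is tagged z. The `Rule` type and `applyRule` are hypothetical names for illustration. They are not code from the dissertation or from www.nlpwp.org. The tags follow the uppercase style used elsewhere on this blog.

```haskell
type Token = String
type Tag = String

-- Hypothetical rule type for one template:
-- "change tag fromTag to toTag when the previous token is tagged prevTag".
data Rule = PrevTagRule
  { fromTag :: Tag
  , toTag   :: Tag
  , prevTag :: Tag
  } deriving Show

-- Apply a rule to a tagged sentence. The condition is checked against
-- the tags as they were before this rule was applied.
applyRule :: Rule -> [(Token, Tag)] -> [(Token, Tag)]
applyRule rule tagged = zipWith rewrite previousTags tagged
  where
    -- The first token has no predecessor, hence Nothing.
    previousTags = Nothing : map (Just . snd) tagged
    rewrite prev (token, tag)
      | tag == fromTag rule && prev == Just (prevTag rule) = (token, toTag rule)
      | otherwise                                          = (token, tag)
```

With the rule `PrevTagRule "NN" "VB" "TO"`, the sequence to/TO race/NN becomes to/TO race/VB. Checking conditions against the original tags is only one possible choice. A tagger could also let each rewrite affect the context of the next token.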

When God Created the Coffee Break

I haven’t been posting a lot lately, but I have been working on the POS tagger. Trust me :-). Over the last few weeks, I have focused on rewriting the tagger built by Daniël de Kok and Harm Brouwer (www.nlpwp.org, based on an article by Eric Brill). I tried to understand the functions and capture them...

Travel is Dangerous

I haven’t posted anything for a while, but I have been programming and rewriting parts of the posTagger from www.nlpwp.org. The tagger now works with three different files (training, testing, evaluation), it is trained on the frequency of tags, and it adds the most common tag (“NN”) to tokens without a tag. The next step is the implementation of transformational rules,...

The weight

A short post again. I wrote functions that add the default “NN” tag to the model if a token does not have a tag; more precisely, when its tag value is Nothing instead of Just a tag. I also copied some functions that should run the model against an untagged test file. The distinction between...
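The distinction between Nothing and Just maps directly onto a pattern match. This is a minimal sketch. It assumes that tokens and tags are plain Strings and that a model lookup returns a `Maybe Tag`. `withDefaultTag` and `tagWith` are hypothetical names, not the functions from the post.

```haskell
import qualified Data.Map.Strict as M

type Token = String
type Tag = String
type TokenTag = (Token, Tag)

-- Tag given to tokens without a tag (the post uses "NN").
defaultTag :: Tag
defaultTag = "NN"

-- Keep a tag that is present (Just); fall back to the default for Nothing.
withDefaultTag :: (Token, Maybe Tag) -> TokenTag
withDefaultTag (token, Just tag) = (token, tag)
withDefaultTag (token, Nothing)  = (token, defaultTag)

-- Tag tokens with a frequency model; tokens missing from the model
-- produce Nothing from M.lookup and therefore receive the default tag.
tagWith :: M.Map Token Tag -> [Token] -> [TokenTag]
tagWith model = map (\token -> withDefaultTag (token, M.lookup token model))
```

`Data.Maybe.fromMaybe defaultTag` would do the same job in one line. The explicit pattern match simply makes both cases visible.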

LittleByLittle

I wrote the first functions for the POS tagger; all of them are based on the work at www.nlpwp.org. The main difference lies in the types: Harm Brouwer and Daniël de Kok create a new data type, while I just use type synonyms. That way, the TokenTag type is basically a tuple containing two Strings. It...
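This is a minimal sketch of the type-synonym approach. It assumes the training data writes every token in word/tag format. `toTokenTag` and `parseLine` are hypothetical helpers. The commented-out line only illustrates what a dedicated data type could look like and is not copied from nlpwp.

```haskell
-- Type synonyms: a TokenTag is just an ordinary pair of Strings.
type Token = String
type Tag = String
type TokenTag = (Token, Tag)

-- A dedicated data type would look like this instead (not used here):
-- data TokenTag = TokenTag Token Tag deriving (Eq, Show)

-- Split "word/tag" at the last slash, so that words containing a slash
-- (such as "1/2/cd") keep it. Input without a slash is assumed not to occur:
-- it would yield an empty word.
toTokenTag :: String -> TokenTag
toTokenTag s = (reverse (drop 1 revWord), reverse revTag)
  where
    (revTag, revWord) = break (== '/') (reverse s)

-- Parse one line of whitespace-separated word/tag tokens.
parseLine :: String -> [TokenTag]
parseLine = map toTokenTag . words
```

Because a TokenTag is just a pair, the standard tuple functions work on it directly: `fst` gives the word and `snd` the tag. The price is that the compiler cannot tell a TokenTag apart from any other pair of Strings.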

The Plan

My imitation of the POS tagger at www.nlpwp.org was finished today and everything seems to work. Seems. The problem is that the nlpwp tagger was written for teaching purposes and is not meant to be a working application. The success of frequency-based tagging is evaluated, but the performance of the transformation-based tagging remains unclear,...

The Dragon Reborn

It’s been a while since I wrote a post for this weblog, and there are some (good) reasons: I’ve been very busy, both at work and at home. I relapsed into an old habit: reading The Wheel of Time. Parents, beware! Do not let your children read this stuff! I took a stroll into a...

West Indian Revelation

In this post we will test the efficiency of the POS tagger and then improve it. Power by numbers. In order to develop the tagger, the people at www.nlpwp.org split the Brown corpus into two parts: one for training the tagger (brown-pos-train.txt) and one for testing it (brown-pos-test.txt). How do we test the tagger? We...
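One straightforward way to score a tagger on a held-out part of the corpus works in three steps. First, strip the tags. Then tag the bare tokens again. Finally, count how many predicted tags match the original ones. This sketch does exactly that. It assumes the test data has already been parsed into (token, tag) pairs and models the tagger as a plain function from token to tag. `accuracy` and `evaluate` are hypothetical names.

```haskell
type Token = String
type Tag = String
type TokenTag = (Token, Tag)

-- Fraction of positions where the predicted tag equals the gold tag.
-- Both lists are assumed to hold the same tokens in the same order.
accuracy :: [TokenTag] -> [TokenTag] -> Double
accuracy gold predicted
  | null gold = 0
  | otherwise = fromIntegral correct / fromIntegral (length gold)
  where
    correct = length (filter id (zipWith sameTag gold predicted))
    sameTag (_, goldTag) (_, predictedTag) = goldTag == predictedTag

-- Strip the gold tags, re-tag the bare tokens and score the result.
evaluate :: (Token -> Tag) -> [TokenTag] -> Double
evaluate tagger gold = accuracy gold [ (token, tagger token) | (token, _) <- gold ]
```

For example, a tagger that labels everything as NN scores 0.5 on dog/NN runs/VBZ cat/NN the/AT.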

Taggart

In this post, we will present the last two functions for our tagger, wrap the functions in some IO paper and go for a test drive. Finishing touches. The first function is simply a composition of tokenMostFreqTag and tokenTagFreqs, to make our life a bit easier:
trainFreqTagger :: [TokenTag] -> M.Map Token Tag
trainFreqTagger =...
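Here is a minimal sketch of the two building blocks, to show the composition in action. Only a few parts come from the post: the names `tokenTagFreqs`, `tokenMostFreqTag` and `trainFreqTagger`, and the type of `trainFreqTagger`. The bodies are assumptions and are not the code from www.nlpwp.org. So is the type of `tokenTagFreqs`, which maps each token to a map of tag counts.

```haskell
import qualified Data.Map.Strict as M
import Data.List (foldl')

type Token = String
type Tag = String
type TokenTag = (Token, Tag)

-- For every token, count how often it occurs with each tag.
-- (Assumed representation: token -> (tag -> count).)
tokenTagFreqs :: [TokenTag] -> M.Map Token (M.Map Tag Int)
tokenTagFreqs = foldl' count M.empty
  where
    count freqs (token, tag) =
      M.insertWith (M.unionWith (+)) token (M.singleton tag 1) freqs

-- Keep only the most frequent tag of every token.
tokenMostFreqTag :: M.Map Token (M.Map Tag Int) -> M.Map Token Tag
tokenMostFreqTag = M.map (fst . M.foldlWithKey' pick ("", 0))
  where
    -- Only a strictly higher count replaces the current best tag.
    pick best@(_, bestFreq) tag freq
      | freq > bestFreq = (tag, freq)
      | otherwise       = best

-- The composition from the post: count first, then pick the winners.
trainFreqTagger :: [TokenTag] -> M.Map Token Tag
trainFreqTagger = tokenMostFreqTag . tokenTagFreqs
```

Ties go to the alphabetically first tag. This happens because `foldlWithKey'` visits the tags in ascending order and only a strictly higher count replaces the current best.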