Jigsaw Falling into Place

Two weeks ago, on March 31, I defended my dissertation in Brussels. The committee consisted of Louise McNally, Chris Barker, Luk Draye, Rick Nouwen, Guido Vanden Wyngaerd and my supervisors Dany Jaspers and Hans Smessaert. The book is available on the LOT website as a free Open Access full-text PDF,...
Talk

I gave a talk on the internal structure of adjectival lexical fields in Ghent on January 26.
Latest entries

LittleByLittle

I wrote the first functions for the POS-tagger, all of them based on the work at www.nlpwp.org. The types are the main difference here: Harm Brouwer and Daniël de Kok create a new data type, while I just use type synonyms. That way, the TokenTag type is basically a tuple containing two Strings. It...
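
A minimal sketch of what the type-synonym approach could look like. Only the name TokenTag and the fact that it is a pair of Strings come from the post; the synonyms Token and Tag are spelled out here for readability, and the helper toTokenTag and the "word/TAG" input format are assumptions made for illustration.

```haskell
-- Type synonyms: new names for existing types, not new types.
type Token = String
type Tag = String
type TokenTag = (Token, Tag)

-- Hypothetical helper (not from the post): split a "word/TAG" string
-- at its last slash. Assumes the input contains at least one slash.
toTokenTag :: String -> TokenTag
toTokenTag s =
  let (revTag, revRest) = break (== '/') (reverse s)
  in (reverse (drop 1 revRest), reverse revTag)
```

Because TokenTag is only a synonym, ordinary tuple functions such as fst and snd work on it directly. The price is that the compiler will not stop a Tag from being passed where a Token is expected, which is what a dedicated data type would buy.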

The Plan

My imitation of the POS-tagger at www.nlpwp.org was finished today and everything seems to work. Seems. The problem is that the nlpwp tagger was written for the sake of instruction and is not supposed to be a working application. The success of frequency-based tagging is evaluated, but the performance of the transformation-based tagging remains unclear,...

The Dragon Reborn

It’s been a while since I wrote a post for this weblog, and there are some (good) reasons: I’ve been very busy, both at work and at home. I relapsed into an old habit: reading The Wheel of Time. Parents, beware! Do not let your children read this stuff! I took a stroll into a...

West Indian Revelation

In this post we will test the efficiency of the POS tagger and improve its efficiency. Power by numbers: In order to develop the tagger, the people at www.nlpwp.org cut the Brown corpus into two parts, one for training the tagger (brown-pos-train.txt) and one for testing it (brown-pos-test.txt). How do we test the tagger? We...

Taggart

In this post, we will present the last two functions for our tagger, wrap the functions in some IO paper, and go for a test drive. Finishing touches: The first function is simply a composition of tokenMostFreqTag and tokenTagFreqs to make our life a bit easier: trainFreqTagger :: [TokenTag] -> M.Map Token Tag trainFreqTagger =...

I am here to accumulate you

In the last post, we were building a POS (part of speech) tagger based on the work at www.nlpwp.org. We were already able to distil a Map from a corpus containing its tokens, their tags and the frequencies. Next, we need a function to fold through the Map and produce the tag with the highest...
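
The folding step described here can be sketched as follows. This is an illustration under assumptions, not the code from the post: the nested shape of the frequency map (token to tag to count) and the names mostFreqTag and mostFrequentTags are hypothetical.

```haskell
import qualified Data.Map as M

type Token = String
type Tag = String

-- Fold over one token's tag counts, keeping the best (tag, count) seen so far.
-- Returns Nothing for an empty map.
mostFreqTag :: M.Map Tag Int -> Maybe Tag
mostFreqTag = fmap fst . M.foldrWithKey pick Nothing
  where
    pick tag count Nothing = Just (tag, count)
    pick tag count best@(Just (_, bestCount))
      | count > bestCount = Just (tag, count)
      | otherwise = best

-- Apply the fold to every token; tokens without any tag counts are dropped.
mostFrequentTags :: M.Map Token (M.Map Tag Int) -> M.Map Token Tag
mostFrequentTags = M.mapMaybe mostFreqTag
```

Returning a Maybe keeps the fold total: an empty inner map simply yields no tag instead of crashing.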

Cliffhanger

The seventh chapter in nlpwp.org concerns part of speech tagging. What’s that? Well, imagine you have a corpus and you want to add tags to its words (or ‘tokens’) explaining their function. “The” is an article, “president” is a noun, “goes” is a verb etc. You could do it all by hand, but you could...

The crux of the biscuit

Classification and tagging: In the previous post, I announced that I was going to work on classification. Unfortunately, the chapter in question on nlpwp.org is still very much under construction and only contains some introductory concepts on classification. Instead, I leaped to the chapter concerning part of speech tagging and worked my way through the...

Distractions

Belgium has been the hottest place in Europe during the last few days, and I have to admit that I was distracted by the sun. Another distraction from www.nlpwp.org was the book “The Haskell Road to Logic, Maths and Programming” that’s been sitting on my bookshelf for quite a while. Only recently did I conquer...

Full circle

It’s been a long trip, but here we are. Back at the beginning. The first, the last, my frequency. The first few weeks of this blog have basically been a laboratory and playground. Although it is not very hard to find the frequency of an n-gram in a corpus, we took some detours following the...

Where’s the Map?

In the previous posts, the frequency of n-grams in a given corpus was calculated using suffix arrays. Although this works well, I wondered if there was a more accessible way to find a string’s frequency. The Map type is being used a lot, so I rewrote some previous code using a Map: import qualified Data.Map...
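
For readers who want the general idea, here is a rough sketch of counting n-grams with a Map. It is not the code from the post; the function names ngrams, frequencies and ngramFrequency are assumptions chosen for this example.

```haskell
import qualified Data.Map as M
import Data.List (tails)

-- All contiguous n-grams of a list; empty when n is not positive.
ngrams :: Int -> [a] -> [[a]]
ngrams n xs
  | n <= 0 = []
  | otherwise = [gram | t <- tails xs, let gram = take n t, length gram == n]

-- Count how often each item occurs.
frequencies :: Ord a => [a] -> M.Map a Int
frequencies = M.fromListWith (+) . map (\x -> (x, 1))

-- Frequency of one n-gram in a tokenised corpus (0 if it never occurs).
ngramFrequency :: [String] -> [String] -> Int
ngramFrequency corpus gram =
  M.findWithDefault 0 gram (frequencies (ngrams (length gram) corpus))
```

Building the Map once and querying it many times is where this pays off; for a single lookup, a plain filter over the n-grams would do just as well.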

… and we’re back!

I haven’t been taking any trains the last few days, which means that I haven’t been working on my train project (this blog). But now I’m using the fiendish Belgian railway service again, and from tomorrow on you will be able to read frequent posts once more (no pun intended). The first one will concern...