In this post we will:

  • Test the efficiency of the POS tagger
  • Improve its efficiency

Power by numbers

In order to develop the tagger, the people at www.nlpwp.org cut the Brown corpus into two parts, one for training the tagger (brown-pos-train.txt) and one for testing it (brown-pos-test.txt). How do we test the tagger? We take the tokens from the test file, run the model on them, and check the predicted tags against the gold tags in that file. Enter the evalTagger function:

evalTagger :: (Token -> Maybe Tag) -> [TokenTag] -> [Int]
evalTagger tagFun = L.foldl' eval [0, 0, 0] where
   eval [number, correct, unknown] (TokenTag token correctTag) = case tagFun token of
      Just tag -> if tag == correctTag 
                  then [number+1, correct+1, unknown] 
                  else [number+1, correct, unknown] 
      Nothing -> [number+1, correct, unknown+1]

For every token we check, we add 1 to ‘number’. Is the predicted tag correct? Then ‘correct’ gets a 1up. If the model doesn’t know the token at all, ‘unknown’ gets a 1up. I also added a pretty-printer to get some readable results instead of a list containing three numbers:

prettyPrint :: [Int] -> IO ()
prettyPrint [evaluated, correct, unknown] = do 
   putStrLn ("Total tagged correctly: " ++ (show $ (correct `myDiv` evaluated) * 100) ++ " %")
   putStrLn ("Unknown: " ++ (show $ (unknown `myDiv` evaluated) * 100) ++ " %")
   putStrLn ("Total tagged known: " ++ (show $ (correct `myDiv` (evaluated - unknown)) * 100) ++ " %")
myDiv :: Int -> Int -> Float
myDiv n1 n2 = (fromIntegral n1) / (fromIntegral n2)
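To see these pieces fit together, here is a minimal, self-contained toy run of my own (an assumption on my part: Token and Tag as plain String aliases, and a Data.Map lookup standing in for the trained model):

```haskell
import qualified Data.List as L
import qualified Data.Map as M

type Token = String
type Tag   = String
data TokenTag = TokenTag Token Tag

-- Same fold as evalTagger above: [evaluated, correct, unknown]
evalTagger :: (Token -> Maybe Tag) -> [TokenTag] -> [Int]
evalTagger tagFun = L.foldl' eval [0, 0, 0] where
   eval [number, correct, unknown] (TokenTag token correctTag) =
      case tagFun token of
         Just tag | tag == correctTag -> [number + 1, correct + 1, unknown]
                  | otherwise         -> [number + 1, correct,     unknown]
         Nothing                      -> [number + 1, correct,     unknown + 1]

main :: IO ()
main = do
   -- Toy "model": only two known tokens.
   let model = M.fromList [("the", "AT"), ("dog", "NN")]
       gold  = [TokenTag "the" "AT", TokenTag "dog" "NN", TokenTag "barks" "VBZ"]
   print (evalTagger (`M.lookup` model) gold)  -- prints [3,2,1]
```

Three tokens evaluated, two tagged correctly, and "barks" is unknown to the toy model.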

We wrap it all in some IO paper…

test :: [String] -> IO ()
test [trainName, testName] = do
   fileHandle <- openFile trainName ReadMode
   contents <- hGetContents fileHandle
   fileHandle2 <- openFile testName ReadMode
   testContents <- hGetContents fileHandle2
   let model = trainFreqTagger $ map toTokenTag $ words contents
   let stats = evalTagger (freqTagWord model) $ map toTokenTag $ words testContents
   let result = zip ["evaluated", "correct", "unknown"] stats
   putStrLn ("***Frequency***\n" ++ show result)
   prettyPrint stats
   hClose fileHandle
   hClose fileHandle2
test _ =
   putStrLn "Error: the test command requires two arguments: test trainName testName"

... and give it a try.

> ./morse test brown-pos-train.txt brown-pos-test.txt
***Frequency***
[("evaluated",272901),("correct",239230),("unknown",11536)]
Total tagged correctly: 87.66183 %
Unknown: 4.2271743 %
Total tagged known: 91.531006 %

Improvement

About 4 % of the tokens in the test file are unknown to the model. How can we get this number down? We could attach a default tag to each unknown token. An article? A preposition? Hmm. Those are closed (or, as I call them, ‘conservative’) word classes: they rarely admit new words. More open, ‘progressive’ classes are nouns, verbs and adjectives, and of these, nouns are probably the most progressive of all. So an unknown token is most likely a noun.

Let's add a function that attaches a tag to every unknown token:

baseTagger :: (Token -> Maybe Tag) -> Tag -> Token -> Maybe Tag 
baseTagger function baseTag token =
   case function token of
      Just tag -> Just tag
      Nothing  -> Just baseTag
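A quick sanity check with a toy inner tagger of my own (again assuming Token and Tag are String aliases): when the wrapped tagger answers, baseTagger passes the answer through; when it returns Nothing, the base tag steps in.

```haskell
type Token = String
type Tag   = String

-- Back off to a default tag when the wrapped tagger has no answer.
baseTagger :: (Token -> Maybe Tag) -> Tag -> Token -> Maybe Tag
baseTagger function baseTag token =
   case function token of
      Just tag -> Just tag
      Nothing  -> Just baseTag

-- Hypothetical inner tagger that only knows one word.
toyTag :: Token -> Maybe Tag
toyTag "the" = Just "AT"
toyTag _     = Nothing

main :: IO ()
main = do
   print (baseTagger toyTag "NN" "the")      -- Just "AT"
   print (baseTagger toyTag "NN" "flurble")  -- Just "NN"
```

Note that baseTagger never returns Nothing, which is exactly why ‘unknown’ drops to zero in the results below.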

We then calculate the test results with the "NN" tag linked to every unknown token and prettyprint the results:

let baseStats = evalTagger (baseTagger (freqTagWord model) "NN") $ map toTokenTag $ ↵ 
words testContents
let baseResult = zip ["evaluated", "correct", "unknown"] baseStats
putStrLn ("***Frequency and basetag (NN)***\n" ++ show baseResult)
prettyPrint baseStats

And this eventually yields the following result:

> ./morse test brown-pos-train.txt brown-pos-test.txt
***Frequency***
[("evaluated",272901),("correct",239230),("unknown",11536)]
Total tagged correctly: 87.66183 %
Unknown: 4.2271743 %
Total tagged known: 91.531006 %
***Frequency and basetag (NN)***
[("evaluated",272901),("correct",239230),("unknown",11536)]
Total tagged correctly: 88.5266 %
Unknown: 0.0 %
Total tagged known: 88.5266 %
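A back-of-the-envelope check of my own (not from the original run): overall accuracy rises from 87.66183 % to 88.5266 %, so the NN fallback must have hit roughly a fifth of the 11536 unknown tokens.

```haskell
main :: IO ()
main = do
   let evaluated = 272901 :: Double
       unknown   = 11536  :: Double
       -- Extra tokens tagged correctly thanks to the NN fallback:
       gained    = (88.5266 - 87.66183) / 100 * evaluated
   -- Share of unknown tokens that were really nouns: about 20 %.
   print (gained / unknown * 100)
```

This also explains why "Total tagged known" drops from 91.5 % to 88.5 %: every token now counts as known, including the roughly 80 % of former unknowns that are not nouns.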

In the next post, we will improve the tagger using transformational rules, based on an article by Eric Brill.

And the title of the post? I was listening to Taj Mahal and was out of inspiration...