Part-of-speech Tag Stripper

3 February, 2009

As part of a Natural Language Processing course I’m taking, we are developing a simple part-of-speech tagger using the Penn Treebank. Our part-of-speech tagger will use a training and a testing corpus. To make life easier, we can start with an already-tagged testing corpus to compare our results with the “truth”.

I wrote a small command-line app that will remove the part-of-speech tags from the testing corpus so you can begin working on the part-of-speech tagger and not have to worry about this orthogonal task.

As part of a new initiative to release more of my code, I have attached the source for this app [7 KB]. It is a .zip file, but WordPress won’t allow zip uploads, so you’ll have to rename the download, replacing the .doc extension with .zip. It is a Visual Studio 2005 solution. The code is standard C++ and runs through about 1.1 MB of the truth file in under a second.

Its usage is: PosTagStripper <testingfile> <outputfile>
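For a rough idea of what the stripping step involves, here is a minimal sketch in standard C++. It is not the attached code. It assumes the testing corpus stores each token as word/TAG separated by whitespace (for example, The/DT dog/NN), and the names stripToken, stripLine and stripStream are made up for illustration.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Strip the tag from one token, assuming the "word/TAG" convention.
// The split is at the LAST '/', so a word that itself contains a slash
// keeps everything that comes before its tag.
std::string stripToken(const std::string& token) {
    const std::string::size_type slash = token.rfind('/');
    if (slash == std::string::npos || slash == 0) {
        return token;  // no tag found; leave the token untouched
    }
    return token.substr(0, slash);
}

// Strip the tags from every whitespace-separated token on a line,
// joining the remaining words with single spaces.
std::string stripLine(const std::string& line) {
    std::istringstream in(line);
    std::string token;
    std::string result;
    while (in >> token) {
        if (!result.empty()) {
            result += ' ';
        }
        result += stripToken(token);
    }
    return result;
}

// Copy a tagged stream to an untagged one, line by line.
void stripStream(std::istream& in, std::ostream& out) {
    std::string line;
    while (std::getline(in, line)) {
        out << stripLine(line) << '\n';
    }
}
```

A command-line wrapper would open the two file names given as arguments with std::ifstream and std::ofstream and pass them to stripStream. Note that this sketch collapses runs of whitespace within a line to single spaces; if your corpus relies on exact spacing, that part would need to change.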

Let me know if you have any problems when running the code.


You are currently reading Part-of-speech Tag Stripper at JAWS.
