23 March, 2010 § 2 Comments
Yesterday I gave a talk in my CSE 891: Language and Interaction class on research that combines natural language processing with eye tracking.
The paper I covered was written by Gerry Altmann and Yuki Kamide and is titled “Now You See It, Now You Don’t: Mediating the Mapping between Language and the Visual World”. It was published in the book “The Interface of Language, Vision, and Action: Eye Movements and the Visual World.”
Please take 15 minutes and watch the presentation I gave. Any feedback is greatly appreciated!
Update (3/30/2010): I now have the video on YouTube and have embedded it below.
5 March, 2010 § 4 Comments
This semester I’m taking two courses at MSU on my way to a Master’s in Computer Science. As a study aid, I’ve taken up watching other schools’ lectures online, specifically lectures by the authors of our textbooks.
For CSE 891: Language and Interaction, a course that goes in depth on specific aspects of Natural Language Processing, I’ve found online lectures by James Martin at the University of Colorado at Boulder. Martin co-authored Speech and Language Processing with Dan Jurafsky of Stanford.
- CSCI 5832 Lecture 3 (2010-01-19)
- CSCI 5832 Lecture 4 (2010-01-21)
- CSCI 5832 Lecture 5 (2010-01-26)
- CSCI 5832 Lecture 6 (2010-01-28)
- CSCI 5832 Lecture 7 (2010-02-02)
- CSCI 5832 Lecture 8 (2010-02-04)
- CSCI 5832 Lecture 9 (2010-02-09)
- CSCI 5832 Lecture 10 (2010-02-16)
- CSCI 5832 Lecture 11 (2010-02-18)
- CSCI 5832 Lecture 12 (2010-02-23)
- CSCI 5832 Lecture 13 (2010-02-25)
- CSCI 5832 Lecture 14 (2010-03-02)
- CSCI 5832 Lecture 15 (2010-03-04)
- More lectures can be seen at the RSS feed for the course
For CSE 820: Advanced Computer Architecture, I’ve found online lectures by David Patterson at the University of California, Berkeley. Patterson co-authored Computer Architecture: A Quantitative Approach with John L. Hennessy, who is currently the President of Stanford University.
20 April, 2009 § Leave a Comment
I’m currently working on negation tagging to understand the sentiment behind movie reviews. To capture part of that sentiment, I mark the words that follow a negative word (e.g., not, isn’t, wasn’t) as negated. Can you answer the following:
At a college English lecture a professor said to the class, “Two negative words like ‘not bad’ can have a positive sentiment, but no two positive words can carry a negative sentiment”. A student in the back proved the professor wrong with two words.
What two words do you think he said?
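For anyone curious what negation tagging looks like in practice, here is a minimal Python sketch. It is not the code from my project: the trigger-word list, the NOT_ prefix, and the rule that a negation lasts until the next punctuation mark are all assumptions made for illustration.

```python
# Assumed, non-exhaustive trigger list; the post only names "not", "isn't", "wasn't".
NEGATION_WORDS = {"not", "no", "never", "isn't", "wasn't", "don't", "didn't", "can't"}

# Assumption: the scope of a negation ends at the next punctuation token.
CLAUSE_END = {".", ",", ";", ":", "!", "?"}


def tag_negation(tokens):
    """Prefix NOT_ to every token between a negation word and the next punctuation."""
    tagged = []
    negating = False
    for token in tokens:
        # Normalize curly apostrophes so "isn’t" matches "isn't".
        normalized = token.lower().replace("\u2019", "'")
        if token in CLAUSE_END:
            negating = False
            tagged.append(token)
        elif normalized in NEGATION_WORDS:
            negating = True
            tagged.append(token)
        elif negating:
            tagged.append("NOT_" + token)
        else:
            tagged.append(token)
    return tagged
```

With this scheme, "good" and "NOT_good" become different features, so a classifier can learn opposite sentiments for them.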
11 March, 2009 § Leave a Comment
As part of my Natural Language Processing course, I developed a simple part-of-speech tagger using the Penn Treebank. The software reads in a training corpus of labeled data and then applies what it has learned to a testing corpus of unlabeled data. Each unlabeled sentence is run through a hidden Markov model, which calculates the most likely sequence of tags for that sentence.
What is a hidden Markov model?
In the most general sense, it is a statistical model of a sequence in which each observed step is produced by a hidden state, and each hidden state depends only on the one before it. Decoding the model means calculating the best path through those hidden states. For example, in this application the sentence that the program reads in is known and visible. What is not known is the part of speech of each word, such as whether it is a verb, a pronoun, and so on.
The tagger finds the best path with the Viterbi algorithm, a dynamic programming algorithm that reduces the O(N^T) cost of scoring every possible tag sequence to just O(N^2 * T), where N is the number of possible tags in the corpus and T is the number of words in the sentence.
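To make the idea concrete, here is a minimal Python sketch of training by counting followed by Viterbi decoding. It is not the code in the download below: the `<s>` start marker, the add-one smoothing, and the function names are assumptions chosen to keep the example short.

```python
import math
from collections import defaultdict


def train(tagged_sentences):
    """Count tag-to-tag transitions and tag-to-word emissions from labeled data."""
    transitions = defaultdict(lambda: defaultdict(int))
    emissions = defaultdict(lambda: defaultdict(int))
    for sentence in tagged_sentences:
        previous = "<s>"  # assumed sentence-start marker
        for word, tag in sentence:
            transitions[previous][tag] += 1
            emissions[tag][word.lower()] += 1
            previous = tag
    return transitions, emissions


def log_prob(counts, key, outcomes):
    """Add-one smoothed log probability (a simplifying assumption)."""
    return math.log((counts.get(key, 0) + 1) / (sum(counts.values()) + outcomes))


def viterbi(words, transitions, emissions):
    """Return the most likely tag sequence for a list of words."""
    if not words:
        return []
    tags = sorted(emissions)
    vocabulary = {word for counts in emissions.values() for word in counts}
    n_tags, n_words = len(tags), len(vocabulary) + 1  # +1 leaves room for unseen words

    # best[i][tag] = (log probability of the best path ending in tag at word i, previous tag)
    best = [{}]
    for tag in tags:
        score = (log_prob(transitions["<s>"], tag, n_tags)
                 + log_prob(emissions[tag], words[0].lower(), n_words))
        best[0][tag] = (score, None)

    for i in range(1, len(words)):  # T words
        best.append({})
        for tag in tags:  # N current tags
            emit = log_prob(emissions[tag], words[i].lower(), n_words)
            best[i][tag] = max(  # N previous tags
                (best[i - 1][prev][0] + log_prob(transitions[prev], tag, n_tags) + emit, prev)
                for prev in tags
            )

    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return list(reversed(path))
```

The three nested loops (words, current tag, previous tag) are where the O(N^2 * T) running time comes from. Working in log probabilities avoids numeric underflow on long sentences.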
Care to take a look at the source code? I’ve uploaded the entire source code for the project, along with two other applications that I created to help with development. One of them was previously released, and the other compares the expected output with the actual output. The download is a .zip file with a .doc extension, since WordPress won’t allow zip uploads; rename it, replacing .doc with .zip. It is a Visual Studio 2005 solution. Inside the zip file you will find a readme that shows how to use the program.
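As a rough illustration of what comparing expected and actual output involves, here is a short Python sketch. It is not the comparison tool in the download: it assumes each line holds whitespace-separated word/TAG tokens, and the function name is made up for this example.

```python
def compare_tags(expected_line, actual_line):
    """Return (accuracy, mismatches) for two lines of word/TAG tokens.

    Assumes every token contains a "/" separating the word from its tag.
    """
    expected = [token.rsplit("/", 1) for token in expected_line.split()]
    actual = [token.rsplit("/", 1) for token in actual_line.split()]
    if len(expected) != len(actual):
        raise ValueError("lines contain different numbers of tokens")
    # Each mismatch records: position, word, expected tag, actual tag.
    mismatches = [
        (index, exp[0], exp[1], act[1])
        for index, (exp, act) in enumerate(zip(expected, actual))
        if exp[1] != act[1]
    ]
    accuracy = 1 - len(mismatches) / len(expected) if expected else 1.0
    return accuracy, mismatches
```

Listing the mismatches, rather than only the accuracy, makes it easier to see which tags the tagger confuses most often.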
3 February, 2009 § 1 Comment
As part of a Natural Language Processing course I’m taking, we are developing a simple part-of-speech tagger using the Penn Treebank. Our part-of-speech tagger will use a training and a testing corpus. To make life easier, we can start with an already-tagged testing corpus to compare our results with the “truth”.
I wrote a small command-line app that will remove the part-of-speech tags from the testing corpus so you can begin working on the part-of-speech tagger and not have to worry about this orthogonal task.
As part of my new initiative to release more of my code, I have attached the code for this tool [7 KB]. The download is a .zip file with a .doc extension, since WordPress won’t allow zip uploads; rename it, replacing .doc with .zip. It is a Visual Studio 2005 solution. The code is standard C++ and runs through about 1.1 MB of the truth file in under a second.
Its usage is: PosTagStripper <testingfile> <outputfile>
Let me know if you have any problems when running the code.
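If you only want the idea and not the download, tag stripping can be sketched in a few lines of Python. This is not the C++ code from the attachment: it assumes the corpus stores whitespace-separated word/TAG tokens, and both function names are hypothetical.

```python
def strip_tags(line):
    """Remove the /TAG suffix from every whitespace-separated token on a line."""
    # rsplit on the last "/" so words that contain a slash keep it.
    return " ".join(token.rsplit("/", 1)[0] for token in line.split())


def strip_file(testing_path, output_path):
    """Write an untagged copy of the testing corpus, one line per input line."""
    with open(testing_path, encoding="utf-8") as source, \
            open(output_path, "w", encoding="utf-8") as target:
        for line in source:
            target.write(strip_tags(line) + "\n")
```

Calling `strip_file("testing.txt", "untagged.txt")` would mirror the command-line usage above, with both file names being placeholders.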