Archive for November, 2008

A Bit About My Project

November 26, 2008

While I still have the motivation, I thought I would write a bit about my individual research project, which I started about three and a half weeks ago.

The project is in the field of Natural Language Processing (NLP), and what I am aiming to do is get some useful results using the tools already provided by my university.

I am going to use their existing NLP techniques for grammatical and semantic analysis and other tools such as WebBootCat for corpus compilation in the hope of improving web corpus compilation techniques. The main tools I’m using are WMatrix, JMatrix and WebBootCat as already mentioned.

So that’s a rough outline of what I’m trying to achieve. Now, what have I been doing for the past month?

Since I got the project I have mainly been reading about NLP and its key terms, so that I understand what my supervisors are talking about, and so that I can use the tools and understand their output.

Sample output from WMatrix using part-of-speech (POS) tagging, arranged by frequency (columns: word, POS tag, frequency):

 

Jocks                NP2          2561
county               NN1          2558
singers              NN2          2473
is                   VBZ          2434
to                   TO           2192
pa                   NN1          2061
for                  IF           1523
language             NN1          1377
that                 CST          1362
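If you want to play with a list like this outside the tool, here is a minimal sketch in Python. It assumes the output has been saved as plain text with three whitespace-separated columns (word, POS tag, frequency), exactly as in the sample above. The function names `parse_frequency_list` and `totals_by_tag` are my own for illustration and are not part of WMatrix.

```python
from collections import defaultdict

# The sample above, copied verbatim. A real export may be laid out
# differently; the three-column layout is an assumption.
SAMPLE = """\
Jocks                NP2          2561
county               NN1          2558
singers              NN2          2473
is                   VBZ          2434
to                   TO           2192
pa                   NN1          2061
for                  IF           1523
language             NN1          1377
that                 CST          1362
"""


def parse_frequency_list(text):
    """Turn 'word tag count' lines into (word, tag, count) tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines and anything that is not 'word tag number'.
        if len(parts) != 3 or not parts[2].isdigit():
            continue
        word, tag, count = parts
        rows.append((word, tag, int(count)))
    return rows


def totals_by_tag(rows):
    """Add up the frequencies of all words that share a POS tag."""
    totals = defaultdict(int)
    for _word, tag, count in rows:
        totals[tag] += count
    return dict(totals)


rows = parse_frequency_list(SAMPLE)
totals = totals_by_tag(rows)
```

Grouping by tag like this gives a quick feel for how much of the listed text is, say, singular common nouns (NN1) compared with other tags.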


So now that I have played around with the tools, I’m approaching the stage where I will start to think about my project goals.

 

What kind of results can be achieved?

Some examples could be:

  • Was it written by a child or an adult?
  • Is someone pretending to be a child or an adult?
  • Was it written by a man or a woman? (see: Gender Analyzer)
  • The level of English, e.g. that of a high school pupil or a professor.
  • The technical quality of an article.
  • This could be particularly useful for search engines when a particular type of result is wanted, e.g. technical papers, blogs, articles, etc.

 
 

So, as you can see, the possibilities are exciting and plentiful.

Is this thing on?

November 24, 2008

I’ll be blogging about my projects and other stuff (if I can muster anything worth saying!) so watch this space.  

Until then, check out reddit.com