While I still have the motivation I thought I would write a bit about my individual research project that I started about three and a half weeks ago.
The project is in the field of Natural Language Processing (NLP), and my aim is to get some useful results using the tools already provided by my university.
I am going to use their existing NLP techniques for grammatical and semantic analysis and other tools such as WebBootCat for corpus compilation in the hope of improving web corpus compilation techniques. The main tools I’m using are WMatrix, JMatrix and WebBootCat as already mentioned.
So that’s a rough outline of what I’m trying to achieve. Now, what have I been doing for the past month?
Since I got the project I have mainly been reading about NLP and its key terms, so that I understand what my supervisors are talking about, and so that I can use the tools and understand their output.
Sample output from WMatrix using POS tagging and arranged by frequency:
Word      Tag   Frequency
Jocks     NP2   2561
county    NN1   2558
singers   NN2   2473
is        VBZ   2434
to        TO    2192
pa        NN1   2061
for       IF    1523
language  NN1   1377
that      CST   1362
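To get a feel for the numbers, the sample above can be read as repeating word, tag, frequency triples. Here is a minimal Python sketch that parses them and totals the frequencies per POS tag. This is not part of WMatrix: the function names `parse_triples` and `frequency_by_tag` are my own, and I am assuming the output really is a flat, whitespace-separated list of triples.

```python
from collections import Counter

# Sample copied from the WMatrix output above.
# Assumption: the data is a flat sequence of "word TAG frequency" triples.
SAMPLE = (
    "Jocks NP2 2561 county NN1 2558 singers NN2 2473 "
    "is VBZ 2434 to TO 2192 pa NN1 2061 "
    "for IF 1523 language NN1 1377 that CST 1362"
)


def parse_triples(text):
    """Split whitespace-separated text into (word, tag, frequency) tuples."""
    tokens = text.split()
    if len(tokens) % 3 != 0:
        raise ValueError("expected a multiple of three tokens")
    return [
        (tokens[i], tokens[i + 1], int(tokens[i + 2]))
        for i in range(0, len(tokens), 3)
    ]


def frequency_by_tag(rows):
    """Sum the frequencies of all words that share the same POS tag."""
    totals = Counter()
    for _word, tag, count in rows:
        totals[tag] += count
    return totals


rows = parse_triples(SAMPLE)
totals = frequency_by_tag(rows)

# Print the tags from most to least frequent.
for tag, count in totals.most_common():
    print(tag, count)
```

Grouping by tag like this is only one way to slice the list, but it gives a quick picture of which word classes dominate a text.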
So now that I have played around with the tools, I’m approaching the stage where I will start to think about my project goals.
What kind of results can be achieved?
Some examples could be:
- Was it written by a child or an adult?
- Is someone pretending to be a child or an adult?
- Was it written by a man or a woman? (see: Gender Analyzer)
- The level of English, e.g. that of a high school pupil or a professor.
- The technical quality of an article.
- This could be particularly useful for search engines when particular types of results are wanted, e.g. technical papers, blogs, articles, etc.
So as you can see, the possibilities are exciting and plentiful.