26 May, 2007

Present Tense English Parser : Part I

As part of an ongoing project that I have been designing, building and testing in one way or another over the past decade and a half, I have arrived at the parser phase.  Well, let me correct that statement: I have tinkered with creating parsers before, but thanks to the expressive nature of the Python language, I was finally ready to make a serious attempt at writing a command-based parser for present tense English.  I'm not going to make a massive post about this, though I am going to post the test results.


Note that all the tests pass.  What a passing result actually means is this: the parser's job as of version 0.5.4 is to break the sentence(s) apart properly into their components by identifying verbs, conjunctions, prepositions, articles, pronouns and punctuation.
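To give a rough idea of what "identification" means here, below is a minimal sketch of a lookup-based word classifier.  It is not the actual parser code: the word lists, the LEXICON table and the classify and tag_tokens names are all made up for illustration, and the lists are tiny compared to the real ones.  It assumes the input is already space separated, punctuation included, as in the test phrases.

```python
# Hypothetical, tiny word lists for illustration only.
# The order of this table fixes the order of the reported categories.
LEXICON = [
    ("verb", {"get", "kill", "paint", "hammer"}),
    ("conjunction", {"and", "then"}),
    ("preposition", {"with", "from", "into", "to"}),
    ("article", {"the", "a", "an"}),
    ("pronoun", {"it", "you"}),
    ("punctuation", {",", ".", "!"}),
]


def classify(token):
    """Return every category the token could belong to (possibly none)."""
    word = token.lower()
    return [name for name, words in LEXICON if word in words]


def tag_tokens(sentence):
    """Split on whitespace and pair each token with its candidate categories."""
    return [(token, classify(token)) for token in sentence.split()]


if __name__ == "__main__":
    for token, categories in tag_tokens("kill elf , get gold"):
        print(token, categories)
```

A token that comes back with an empty list (such as "elf" here) is simply unknown to these tables and is left for a later step to resolve as a noun or adjective.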


Creating Parser Instance:                                                                                  : Passed


Loading Configuration for Instance:                                                                 : Passed


Testing for version: 0.5.0 


  paint the gold bucket black                                                                             : Passed

  get the big , heavy hammer and kill Bob with it !                                           : Passed

  get hammer and squirrel from Bob and then hammer squirrel into the wall .  : Passed

  get the gold gold                                                                                             : Passed

  kill elf and get gold                                                                                         : Passed

  paint the bucket gold                                                                                       : Passed

  paint the gold bucket black !                                                                           : Passed

  get gold                                                                                                            : Passed

  kill elf , get gold                                                                                              : Passed

  get the large gold brick .                                                                                  : Passed

  paint the bucket gold .                                                                                     : Passed

  get the large , gold brick .                                                                                : Passed


Testing for version: 0.5.1 


  get rock , pliers , hammer and squirrel and hammer squirrel into the wall .     : Passed


Testing for version: 0.5.2 


  kill the trite little elf with my sword , then wipe the blood off of it !                : Passed

  destroy the cantankerous creature before you eat your dessert                        : Passed

  kill the trite little elf with my sword , then wipe the blood off of my sword !  : Passed

  kill the trite little elf with my sword .                                                               : Passed


Testing for version: 0.5.3 


  hammer the hammer into the big hammer                                                       : Passed

  hammer the hammer into the hammer                                                             : Passed


Testing for version: 0.5.4 


  kill the trite little elf with my lavacious sword , then wipe the blood off it !   : Passed

  go to the store and buy a new cellphone                                                        : Passed

  slit Fred's throat and capture the warm , red blood in a cup !                         : Passed

  play with my toys and listen to my music .                                                    : Passed

  play with my toys and listen to music .                                                          : Passed


As can be seen, the possible inputs for the parser vary from simple to complex, and from grammatically perfect to questionable fragments.  The parser is first and foremost meant for use in an interactive command environment, hence the present-tense-only requirement.  This greatly reduces the demands on the parser, but even so, it can be seen from the above that the system can differentiate key words that can be used in both noun and adjective forms (the two uses of "gold" in "get the gold gold", for example).  The system also handles post-adjective usage, where the adjective follows the noun, as in "paint the bucket gold".
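For the curious, here is one possible heuristic for telling the noun from the adjectives in phrases like the ones above.  This is only a sketch under my own assumptions and not necessarily how the parser does it: the ADJECTIVES and NOUNS sets and the split_noun_phrase function are hypothetical, and the article is assumed to have been stripped already.  The idea is to prefer a word that can only be a noun as the head, and to fall back to the last noun-capable word when every candidate is ambiguous.

```python
# Hypothetical word lists; "gold" is deliberately in both.
ADJECTIVES = {"gold", "black", "large", "big", "heavy"}
NOUNS = {"gold", "bucket", "brick", "hammer"}


def split_noun_phrase(words):
    """Split a noun phrase into (pre_adjectives, noun, post_adjectives).

    Head selection, in order of preference:
      1. the last word that is a noun and cannot be an adjective;
      2. otherwise, the last word that can be a noun at all.
    """
    noun_only = [i for i, w in enumerate(words)
                 if w in NOUNS and w not in ADJECTIVES]
    noun_capable = [i for i, w in enumerate(words) if w in NOUNS]
    if noun_only:
        head = noun_only[-1]
    elif noun_capable:
        head = noun_capable[-1]
    else:
        raise ValueError("no noun found in phrase: %r" % (words,))
    return words[:head], words[head], words[head + 1:]


if __name__ == "__main__":
    for phrase in ("gold bucket black", "gold gold", "bucket gold"):
        print(phrase, "->", split_noun_phrase(phrase.split()))
```

With these lists, "gold bucket black" yields "bucket" as the noun with "gold" before it and "black" after it, "gold gold" treats the first "gold" as the adjective, and "bucket gold" treats "gold" as a post adjective.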


Most notably, the system currently recognises over 9,000 verbs (regular and irregular), 50 prepositions, and a whopping 46,000+ adjectives.  I am hereby putting out a call for test case phrases.  I am satisfied enough with the stage one parse process that I am moving on to the second parse stage: the creation and ordering of individual statements (as dictated by their prepositions), in preparation for the third and final stage, in which the parser sends the results from stage two to the action engine.  Both of those stages will be the subjects of new posts.