Text “Data Cleaning”
Pre-process text as follows:
Morphological Analysis (Stemming)
Inflectional, derivational, or crude IR (suffix-stripping) methods
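A crude IR-style stemmer strips suffixes by rule, with no dictionary and no knowledge of irregular forms. The sketch below shows that idea. The suffix list and the `min_stem` guard are hypothetical choices for illustration; this is not the Porter stemmer or any other published algorithm.

```python
# Hypothetical suffix list, checked in order (longer suffixes first).
SUFFIXES = ("ational", "ization", "fulness", "ing", "ed", "ly", "es", "s")


def crude_stem(word: str, min_stem: int = 3) -> str:
    """Strip the first matching suffix, keeping at least `min_stem` characters."""
    w = word.lower()
    for suffix in SUFFIXES:
        # The length guard stops short words like "sing" being cut to "s".
        if w.endswith(suffix) and len(w) - len(suffix) >= min_stem:
            return w[: -len(suffix)]
    return w


print([crude_stem(w) for w in ["walking", "cats", "sing", "quickly"]])
# → ['walk', 'cat', 'sing', 'quick']
```

Rules like these are fast but crude: they leave irregular forms such as "saw" untouched and can over-strip, which is the trade-off against inflectional or derivational analysis that uses a lexicon.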
Part-of-Speech Tagging
I/Pro saw/VP Pathfinder/PN on/P Mars/PN ...
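The word/TAG notation above can be produced by even a trivial tagger. The sketch below is a lookup-only tagger, assuming a tiny hypothetical lexicon that reuses the tag labels from the example (Pro, VP, P, PN) plus two assumed tags, Det and N, for "a telescope". Unknown capitalized words default to PN and other unknown words to N. Real taggers use context (e.g. statistical sequence models); this one ignores it.

```python
# Toy lexicon (hypothetical). "Det" and "N" are assumed tags, not from the original.
LEXICON = {
    "i": "Pro",
    "saw": "VP",
    "on": "P",
    "with": "P",
    "a": "Det",
}


def tag(tokens):
    """Tag each token by lexicon lookup; unknown words get a capitalization-based default."""
    tagged = []
    for tok in tokens:
        t = LEXICON.get(tok.lower())
        if t is None:
            # Assumed heuristic: capitalized unknown words are proper nouns.
            t = "PN" if tok[:1].isupper() else "N"
        tagged.append((tok, t))
    return tagged


def format_tagged(tagged):
    """Render (word, tag) pairs in word/TAG notation."""
    return " ".join(f"{w}/{t}" for w, t in tagged)


print(format_tagged(tag("I saw Pathfinder on Mars with a telescope".split())))
# → I/Pro saw/VP Pathfinder/PN on/P Mars/PN with/P a/Det telescope/N
```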
Phrase Boundary Identification
[Subj I] [VP saw] [DO Pathfinder] [PP on Mars] [PP with a telescope].
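Phrase boundaries like the bracketing above can be recovered from tagged tokens with a few hand-written rules. The sketch below assumes the tags from the previous line plus hypothetical Det and N tags, and uses illustrative rules: a preposition opens a PP that absorbs the following nominal tokens, a nominal run before the verb is the subject, and the first nominal run after it is the direct object. It is a toy rule-based chunker, not a general parser.

```python
NOMINAL = {"Pro", "PN", "N", "Det"}  # assumed set of tags that can form a noun phrase


def chunk(tagged):
    """Group (word, tag) pairs into labeled phrases using simple hypothetical rules."""
    chunks = []
    seen_verb = False
    i = 0
    while i < len(tagged):
        word, t = tagged[i]
        if t == "P":
            # A preposition opens a PP that absorbs the following nominal tokens.
            words = [word]
            i += 1
            while i < len(tagged) and tagged[i][1] in NOMINAL:
                words.append(tagged[i][0])
                i += 1
            chunks.append(("PP", words))
        elif t == "VP":
            seen_verb = True
            chunks.append(("VP", [word]))
            i += 1
        elif t in NOMINAL:
            words = []
            while i < len(tagged) and tagged[i][1] in NOMINAL:
                words.append(tagged[i][0])
                i += 1
            # Before the verb: subject. First noun phrase after it: direct object.
            if not seen_verb:
                label = "Subj"
            elif not any(lbl == "DO" for lbl, _ in chunks):
                label = "DO"
            else:
                label = "NP"
            chunks.append((label, words))
        else:
            # Any other tag becomes a one-word chunk labeled with its tag.
            chunks.append((t, [word]))
            i += 1
    return chunks


def format_chunks(chunks):
    """Render chunks in [Label words] bracket notation, ending the sentence with a period."""
    return " ".join(f"[{label} {' '.join(words)}]" for label, words in chunks) + "."


tagged = [("I", "Pro"), ("saw", "VP"), ("Pathfinder", "PN"), ("on", "P"),
          ("Mars", "PN"), ("with", "P"), ("a", "Det"), ("telescope", "N")]
print(format_chunks(chunk(tagged)))
# → [Subj I] [VP saw] [DO Pathfinder] [PP on Mars] [PP with a telescope].
```

Rules of this kind only mark where phrases begin and end; they do not decide which word each PP attaches to.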