This package contains an automatically parsed version of the Princeton WordNet Gloss Corpus, packaged as an easy-to-use XML file.
The original data are available here: http://wordnet.princeton.edu/glosstag.shtml

DISCLAIMER:
This XML is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

NOTE:
- Within example sentences, only the synset terms were disambiguated.
- Text enclosed within tags has been escaped using XML entities. To retrieve the actual human-readable characters, we strongly recommend unescaping it (e.g. in Java, you can use the StringEscapeUtils class provided by the Apache commons-lang API).

==============================================================================================================================================
FORMAT OF THE XML FILE
==============================================================================================================================================

The XML contains a list of "synset" tags, each with its "id" (i.e. the part of speech and the WordNet offset) as an attribute.
Each synset contains a list of "examples" and "definitions" tags.
Each example and definition, in turn, contains the plain (tokenized) text and a list of "token" tags.
Each token has two or four attributes, depending on whether or not it carries a sense annotation:
- "pos" : the part of speech of the token;
- "lemma" : the lemma of the token;
- "senseID" : the WordNet sense key of the token (only if it is disambiguated);
- "senseType" : the sense annotation type (only if it is disambiguated).

There are two different sense annotation types:
- man : manually inserted sense tag
- auto : automatically generated sense tag

The following listing shows the XML representation:

    force out the air
    force out the air
    force out the splinter
    force out the splinter
    emit or cause to move with force of effort
    emit or cause to move with force of effort
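As a rough illustration of the format described above, the following Python sketch reads tokens and unescapes their text. It relies only on the "synset" and "token" tag names and the "id", "pos", "lemma", "senseID" and "senseType" attributes named in this README. The root element name, the exact nesting, the choice to store the surface form as the token's text, and every attribute value in the sample are assumptions made for the example, not taken from the distributed file.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical sample. The root element name, the nesting, and all values
# (synset id, POS labels, sense key) are placeholders, not real corpus data.
SAMPLE = """\
<synsets>
  <synset id="v00000000">
    <definitions>
      <definition>
        <token pos="VB" lemma="emit" senseID="emit%0:00:00::" senseType="man">emit</token>
        <token pos="CC" lemma="or">or</token>
        <token pos="PUNCT" lemma="&amp;quot;">&amp;quot;</token>
      </definition>
    </definitions>
  </synset>
</synsets>
"""


def read_tokens(xml_text):
    """Return one dict per token, with the token text unescaped."""
    root = ET.fromstring(xml_text)
    rows = []
    for synset in root.iter("synset"):
        # iter() walks all descendants, so the exact nesting does not matter.
        for token in synset.iter("token"):
            rows.append({
                "synset": synset.get("id"),
                # The XML parser already decodes one level of entities;
                # html.unescape handles entities that are still escaped.
                "surface": html.unescape(token.text or ""),
                "pos": token.get("pos"),
                "lemma": token.get("lemma"),
                # Both are None when the token carries no sense annotation.
                "senseID": token.get("senseID"),
                "senseType": token.get("senseType"),
            })
    return rows


if __name__ == "__main__":
    for row in read_tokens(SAMPLE):
        print(row)
```

Tokens without a sense annotation come back with "senseID" and "senseType" set to None, which mirrors the two-or-four attribute rule. If the text in the real file is escaped only once, the html.unescape call is redundant but normally harmless.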