Linguisti.cc

A collection of linguistic tools using web & JSON

Breakdown

This is a simple tool that parses text and returns it broken down into tokens (words and punctuation), each tagged with its part of speech in English grammar.

The classification key for the breakdown result set:
Key   Element of Grammar    Example(s)
CC    Coord Conjuncn        and,but,or
CD    Cardinal number       one,two
DT    Determiner            the,some
EX    Existential there     there
FW    Foreign Word          mon dieu
IN    Preposition           of,in,by
JJ    Adjective             big
JJR   Adj., comparative     bigger
JJS   Adj., superlative     biggest
LS    List item marker      1,One
MD    Modal                 can,should
NN    Noun, sing. or mass   dog
NNP   Proper noun, sing.    Edinburgh
NNPS  Proper noun, plural   Smiths
NNS   Noun, plural          dogs
POS   Possessive ending     's
PDT   Predeterminer         all, both
PP$   Possessive pronoun    my,one's
PRP   Personal pronoun      I,you,she
RB    Adverb                quickly
RBR   Adverb, comparative   faster
RBS   Adverb, superlative   fastest
RP    Particle              up,off
SYM   Symbol                +,%,&
TO    'to'                  to
UH    Interjection          oh, oops
VB    verb, base form       eat
VBD   verb, past tense      ate
VBG   verb, gerund          eating
VBN   verb, past part       eaten
VBP   Verb, present         eat
VBZ   Verb, present         eats
WDT   Wh-determiner         which,that
WP    Wh pronoun            who,what
WP$   Possessive-Wh         whose
WRB   Wh-adverb             how,where
,     Comma                 ,
.     Sent-final punct      . ! ?
:     Mid-sent punct.       : ;
$     Dollar sign           $
#     Pound sign            #
"     quote                 "
(     Left paren            (
)     Right paren           )

The breakdown URL accepts a text string (parameter name 'text') submitted through either HTTP POST or GET. The result is a JSON-encoded two-dimensional array in which each element pairs one token of the input text (a word or punctuation mark) with its grammatical classification (see table).

The breakdown tool is located at http://linguisti.cc/breakdown
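
A GET request to the tool might be sketched in Python as below. This is only an illustration: the helper names build_get_url and breakdown are hypothetical, and it assumes the response body is UTF-8 encoded JSON. Percent-encoding the text avoids sending raw spaces and punctuation in the query string.

```python
import json
import urllib.parse
import urllib.request

# Endpoint as given in the text above.
BREAKDOWN_URL = "http://linguisti.cc/breakdown"


def build_get_url(text: str) -> str:
    """Return the breakdown URL with `text` percent-encoded as the 'text' parameter."""
    query = urllib.parse.urlencode({"text": text}, quote_via=urllib.parse.quote)
    return BREAKDOWN_URL + "?" + query


def breakdown(text: str, timeout: float = 10.0) -> list:
    """Hypothetical helper: fetch the tagged tokens via GET (makes a network call).

    Assumes the service replies with UTF-8 encoded JSON.
    """
    with urllib.request.urlopen(build_get_url(text), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only builds and prints the URL; call breakdown(...) to contact the server.
    print(build_get_url("I know the kings of England"))
```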

Examples

http://linguisti.cc/breakdown?text=I am the very model of a modern Major-General, I've information vegetable, animal, and mineral, I know the kings of England, and I quote the fights historical, from Marathon to Waterloo, in order categorical.
Submitting the preceding text to the breakdown tool returns the following JSON-encoded two-dimensional array:
[["I","NN"],["am","VBP"],["the","DT"],["very","RB"],["model","NN"],["of","IN"],["a","DT"],["modern","JJ"],["Major-General","NN"],[",",","],["I've","NN"],["information","NN"],["vegetable","NN"],[",",","],["animal","NN"],[",",","],["and","CC"],["mineral","NN"],[",",","],["I","NN"],["know","VB"],["the","DT"],["kings","NNS"],["of","IN"],["England","NNP"],[",",","],["and","CC"],["I","NN"],["quote","VB"],["the","DT"],["fights","NNS"],["historical","JJ"],[",",","],["from","IN"],["Marathon","NNP"],["to","TO"],["Waterloo","NN"],[",",","],["in","IN"],["order","NN"],["categorical","JJ"],[".","."]]
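
Once decoded, the result is just a list of [token, tag] pairs, so it is easy to work with. The sketch below uses a shortened excerpt of the response above as sample data and shows one possible way to count tags and pick out the nouns; the variable names are illustrative only.

```python
import json
from collections import Counter

# First ten pairs of the response shown above, used here as sample data.
sample = (
    '[["I","NN"],["am","VBP"],["the","DT"],["very","RB"],["model","NN"],'
    '["of","IN"],["a","DT"],["modern","JJ"],["Major-General","NN"],[",",","]]'
)

pairs = json.loads(sample)

# How often each classification key occurs.
tag_counts = Counter(tag for _token, tag in pairs)

# Every token whose key starts with "NN" (NN, NNS, NNP, NNPS in the table).
nouns = [token for token, tag in pairs if tag.startswith("NN")]

print(tag_counts.most_common(2))  # [('NN', 3), ('DT', 2)]
print(nouns)                      # ['I', 'model', 'Major-General']
```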

You may also use POST to submit larger blocks of text.
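
A POST submission could look something like the following sketch. It assumes the server accepts a standard form-encoded body (application/x-www-form-urlencoded) carrying the 'text' parameter, which the description above does not spell out, and that the reply is UTF-8 encoded JSON; build_post_request and breakdown_post are hypothetical names.

```python
import json
import urllib.parse
import urllib.request

# Endpoint as given in the text above.
BREAKDOWN_URL = "http://linguisti.cc/breakdown"


def build_post_request(text: str) -> urllib.request.Request:
    """Build a POST request with `text` form-encoded in the body.

    Assumption: the server reads 'text' from a form-encoded request body.
    """
    body = urllib.parse.urlencode({"text": text}).encode("ascii")
    return urllib.request.Request(
        BREAKDOWN_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def breakdown_post(text: str, timeout: float = 10.0) -> list:
    """Hypothetical helper: submit `text` via POST and decode the JSON reply."""
    with urllib.request.urlopen(build_post_request(text), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Because the text travels in the request body rather than the URL, this form avoids URL length limits for larger inputs.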



Credits

All of the hard work of this tool is performed by the glorious jspos (Javascript Part of Speech Tagger) library (over at Google Code), which is itself a port of other work. All I did was give it a web interface, mostly because clients couldn't be expected to download the entire lexicon, which makes hosting it on the web a natural fit. Much appreciation to Percy Wegmann for developing this tool in JavaScript and making it free to use. The keys table is lifted directly from the project's README.

Use

These tools are free to use. They are for fun and exploratory use, provided as-is without any warranty. They may be taken offline at any time. Let me know if you are using them, or are planning on hammering the servers.