English Models
PRE: tools that need to be run prior to the model.
DATA: the dataset used to build the model.
EVAL: evaluation on the dataset that the model is trained on.
BM: the standard benchmark evaluation for the task.
tokseg: any tokenizer with sentence segmentation.
inmixed: any dataset in the mixed corpus.
Tokenization
| Model ID |
|---|
Morphological Analysis
| Model ID | PRE |
|---|---|
| | tokseg |
Part-of-Speech Tagging
| Model ID | PRE | DATA | EVAL | BM |
|---|---|---|---|---|
| | tokseg | Mixed | 97.80% | 97.72% |
EVAL: accuracy.
BM: accuracy on the Wall Street Journal portion of the Penn Treebank using the standard split (trn: 0-18; dev: 19-21; tst: 22-24).
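As a rough illustration of the benchmark setup described above, the sketch below maps a Wall Street Journal section number to its split and computes token-level accuracy. The function names `pos_split` and `accuracy` are hypothetical and not part of any released tool; the section ranges are taken from the BM note, and accuracy is assumed to be the fraction of tokens whose predicted tag matches the gold tag.

```python
from typing import Sequence


def pos_split(section: int) -> str:
    """Map a WSJ section number to its split in the POS tagging benchmark.

    Ranges follow the note above: trn 0-18, dev 19-21, tst 22-24.
    """
    if 0 <= section <= 18:
        return "trn"
    if 19 <= section <= 21:
        return "dev"
    if 22 <= section <= 24:
        return "tst"
    raise ValueError(f"WSJ section out of range: {section}")


def accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    correct = sum(g == p for g, p in zip(gold, pred))
    return correct / len(gold)


# Three of the four tags match, so the accuracy is 0.75.
gold_tags = ["DT", "NN", "VBZ", "JJ"]
pred_tags = ["DT", "NN", "VBD", "JJ"]
print(pos_split(22), accuracy(gold_tags, pred_tags))
```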
Named Entity Recognition
| Model ID | PRE | DATA | EVAL | BM |
|---|---|---|---|---|
| | tokseg | OntoNotes | 88.75% | 92.74% |
EVAL: F1-score.
BM: F1-score on the English dataset distributed by the CoNLL 2003 shared task.
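For readers unfamiliar with the metric, here is a minimal sketch of an entity-level F1-score. It assumes the usual exact-match convention, where a predicted entity counts as correct only if both its span and its type match a gold entity. The `(start, end, label)` representation and the function name `entity_f1` are assumptions made for illustration, not the format used by the models above.

```python
from typing import Set, Tuple

Entity = Tuple[int, int, str]  # (start token, end token, entity type); hypothetical format


def entity_f1(gold: Set[Entity], pred: Set[Entity]) -> float:
    """Exact-match F1: harmonic mean of entity-level precision and recall."""
    true_positives = len(gold & pred)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


gold_entities = {(0, 2, "PER"), (5, 6, "LOC"), (8, 9, "ORG")}
# The second prediction has the right span but the wrong type, so it does not count.
pred_entities = {(0, 2, "PER"), (5, 6, "ORG")}

# precision = 1/2, recall = 1/3, so F1 is approximately 0.4
print(round(entity_f1(gold_entities, pred_entities), 4))
```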
Dependency Parsing
| Model ID | PRE | DATA | EVAL | BM |
|---|---|---|---|---|
| | tokseg | Mixed | 92.26/91.03 | 96.08/95.02 |
EVAL: UAS (unlabeled attachment score) / LAS (labeled attachment score).
BM: UAS/LAS on the Wall Street Journal portion of the Penn Treebank using the standard split (trn: 2-21; dev: 22, 24; tst: 23) and the Stanford typed dependencies.
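The two attachment scores can be sketched as follows. UAS counts tokens whose predicted head is correct; LAS additionally requires the dependency label to match. The `(head, label)` token representation and the function name `attachment_scores` are assumptions for illustration. This sketch scores every token, whereas some evaluation setups exclude punctuation, which the notes above do not specify.

```python
from typing import List, Tuple

Arc = Tuple[int, str]  # (head index, dependency label); 0 denotes the root


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) over all tokens of one or more sentences."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    # UAS: the predicted head matches the gold head.
    head_correct = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS: both the head and the label match.
    arc_correct = sum(g == p for g, p in zip(gold, pred))
    return head_correct / len(gold), arc_correct / len(gold)


gold_arcs = [(2, "nsubj"), (0, "root"), (2, "dobj"), (3, "prep")]
# Token 3 has the right head but the wrong label; token 4 has the wrong head.
pred_arcs = [(2, "nsubj"), (0, "root"), (2, "iobj"), (2, "prep")]

uas, las = attachment_scores(gold_arcs, pred_arcs)
print(f"{uas:.2%}/{las:.2%}")
```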
Semantic Dependency Parsing
| Model ID | PRE | DATA | EVAL | BM |
|---|---|---|---|---|
| | tokseg | Mixed | ? | 90.68/85.34 |
EVAL: labeled F1-score.
BM: average labeled F1-scores on the in-domain / out-of-domain test sets distributed by the SemEval 2015 shared task.