
LATTICE's result in the CoNLL 2017 shared task (syntactic parsing in more than 50 languages!)


Multilingual syntactic parsing

Multilingual Parsing from Raw Text to Universal Dependencies

Congratulations to the LATTICE team, which finished 5th in the CoNLL 2017 shared task. The task was to provide automatic syntactic analyses for more than 80 corpora, representing some fifty languages, in the so-called Universal Dependencies format.

Below are the results as presented on the task's website, followed by a brief description.

This is the main ranking of the top five systems by their macro-averaged LAS F1 score. The ± range corresponds to a 95% confidence interval computed with bootstrap resampling. [OK] means all 81 test sets were parsed; otherwise the number in square brackets is the number of test sets with non-zero LAS.

1. Stanford (Stanford) 76.30 ± 0.12 [OK]
2. C2L2 (Ithaca) 75.00 ± 0.12 [OK]
3. IMS (Stuttgart) 74.42 ± 0.13 [OK]
4. HIT-SCIR (Harbin) 72.11 ± 0.14 [OK]
5. LATTICE (Paris) 70.93 ± 0.13 [OK]
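The ranking above combines per-treebank LAS scores into a macro-average and attaches a bootstrap confidence interval. A minimal sketch of how such an interval can be computed, assuming the resampling unit is the whole treebank (the official evaluation script may resample differently):

```python
import random
import statistics

def macro_las(scores):
    """Macro-average: the plain mean of per-treebank LAS scores."""
    return statistics.mean(scores)

def bootstrap_ci(scores, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the macro-average: resample the
    treebank scores with replacement, recompute the mean each time,
    and take the alpha/2 and 1 - alpha/2 quantiles."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        sample = [rng.choice(scores) for _ in scores]
        means.append(macro_las(sample))
    means.sort()
    lo = means[int(alpha / 2 * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi
```

With 81 test sets, resampling 10,000 times gives a stable interval; the ±0.12-0.14 ranges in the table reflect how much the macro-average moves under such resampling.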

All the results can be found on the shared task website.

Description of the task

Ten years ago, two CoNLL shared tasks were a major milestone for parsing research in general and dependency parsing in particular. For the first time, dependency treebanks in more than ten languages were available for learning parsers; many of them were used in follow-up work; evaluating parsers on multiple languages became a standard; and multiple state-of-the-art, open-source parsers became available, facilitating production of dependency structures to be used in downstream applications. While the 2006 and 2007 tasks were extremely important in setting the scene for the following years, there were also limitations that complicated application of their results: 1. gold-standard tokenization and tags in the test data moved the tasks away from real-world scenarios, and 2. incompatible annotation schemes made cross-linguistic comparison impossible. CoNLL 2017 will pick up the threads of the pioneering tasks and address these two issues.

The focus of the 2017 task is learning syntactic dependency parsers that can work in a real-world setting, starting from raw text, and that can work over many typologically different languages, even surprise languages for which there is little or no training data, by exploiting a common syntactic annotation standard. This task has been made possible by the Universal Dependencies initiative (UD), which has developed treebanks for 40+ languages with cross-linguistically consistent annotation and recoverability of the original raw texts. For the Shared Task, the Universal Dependencies version 2 (UD v2) annotation scheme will be used.
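UD treebanks are distributed in the tab-separated CoNLL-U format, with ten columns per token. As an illustration of what the LAS metric measures on such data, here is a minimal sketch that reads a CoNLL-U fragment and scores a predicted analysis against the gold one (the helper names `parse_conllu` and `las` are hypothetical, not the official evaluation script):

```python
# The ten CoNLL-U columns, in order.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]

def parse_conllu(block):
    """Parse one sentence in CoNLL-U format into a list of token dicts,
    skipping comment and blank lines."""
    tokens = []
    for line in block.strip().splitlines():
        if not line or line.startswith("#"):
            continue
        tokens.append(dict(zip(FIELDS, line.split("\t"))))
    return tokens

def las(gold, pred):
    """Labeled Attachment Score: percentage of tokens whose head AND
    dependency label both match the gold analysis."""
    correct = sum(g["head"] == p["head"] and g["deprel"] == p["deprel"]
                  for g, p in zip(gold, pred))
    return 100.0 * correct / len(gold)
```

For example, a three-token sentence where the prediction attaches one token to the wrong head scores a LAS of 2/3, i.e. about 66.7.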