Lattice Seminar, 17 September 2024: Promise Dodzi Kpoglu

As part of its seminar series, Lattice is hosting Promise Dodzi Kpoglu on 17 September 2024. The title of the seminar is "Optimizing cognacy for automatic historical linguistics tasks: The critical role of segmentation".

Room 510 (Lattice, 1 rue Maurice Arnoux, 92120 Montrouge)

Online: https://www.gotomeet.me/visio-lattice

Time: 10:30 to 12:00.

Speaker: Promise Dodzi Kpoglu (CNRS/Université Lyon 2)

Title: Optimizing cognacy for automatic historical linguistics tasks: The critical role of segmentation

Abstract:

When performing automatic recognition of cognates in a language family, one task is widely recognized as crucial to performance: segmentation into phonemes. This presentation focuses instead on high-level text segmentation and its impact on algorithmic performance in computational historical linguistics, specifically with the aim of detecting cognate words among the Dogon languages (Mali). Starting from a dataset containing words from various Dogon languages, I adopt a flat segmentation mechanism (as opposed to hierarchical segmentation) and concentrate on “surface” segmentation. I will show that the degree of segmentation significantly influences the results.
First, I will provide an overview of the Dogon languages and review previous computational attempts to determine their phylogenetic relationships through cognacy. I will then introduce the dataset, describe the text processing it has undergone, and explain the segmentation task and the methodologies used. Finally, I will present the results of the cognate detection tasks, illustrating how different levels of segmentation correlate with different phylogenetic classifications generated from the cognate sets. Contrary to the notion that high-level segmentation can be handled with relatively “shallow” linguistic processing, I argue that, at least for automatic comparative historical linguistics tasks involving languages with specific morphological profiles, “deep” linguistic processing is essential at this level.
