Workshop: Crossing borders: Three talks on Text Analysis and Digital Humanities

by Thierry POIBEAU

The LATTICE lab is organizing a conference entitled "Crossing borders: Three talks on Text Analysis and Digital Humanities" on Friday 23 June 2017, from 9:00 to 12:30.

Conference organized by the LATTICE laboratory

Friday 23 June 2017, 9:00 - 12:30

École normale supérieure
Salle Jean Jaurès
29 rue d’Ulm
75005 Paris

Free admission, no registration required. Please bring an identity document to enter the premises.

All talks will be given in English.


(see the abstracts and bios of the speakers below)

9:00 - 9:30: Welcome & participant reception

9:30 - 9:40: Introduction

9:40 - 10:25: Melissa Terras, UCL (University College London)
Linking Crowdsourced Transcription to Automated Handwriting Recognition: Lessons from Transcribe Bentham

10:25 - 10:50: Coffee break

10:50 - 11:35: Caroline Sporleder, University of Göttingen
Computational Linguistics and Digital Humanities: Chances and Challenges

11:35 - 12:20: Elena González-Blanco, UNED (Madrid)
From counting syllables to linked data. Interoperability and digital standardization as a new model to analyze European poetry: POSTDATA

12:20 - 12:30: Wrap up


Melissa Terras, UCL (University College London)

For nearly seven years, the Transcribe Bentham project has been generating high-quality crowdsourced transcripts of the writings of the philosopher and jurist Jeremy Bentham (1748-1832), held at University College London and, latterly, at the British Library. Volunteers have now transcribed nearly 6 million words, and little did we know at the outset that the project would provide an ideal, quality-controlled dataset to serve as "ground truth" for the development of Handwritten Text Recognition (HTR) technology. This paper will look at the past, present and future of automated handwriting analysis for documents, showing how our research on the EU Framework 7 Transcriptorium project, and now the H2020 READ project, is working towards a service that improves the searching and analysis of digitised manuscript collections across Europe, reusing the data created by crowdsourced volunteer labour for machine learning purposes.

Melissa Terras is Director of the UCL Centre for Digital Humanities, Professor of Digital Humanities in UCL’s Department of Information Studies, and Vice Dean of Research in UCL’s Faculty of Arts and Humanities. Her publications include "Image to Interpretation: Intelligent Systems to Aid Historians in the Reading of the Vindolanda Texts" (Oxford University Press, 2006) and "Digital Images for the Information Professional" (Ashgate, 2008), and she has co-edited various volumes, such as "Digital Humanities in Practice" (Facet, 2012) and "Defining Digital Humanities: A Reader" (Ashgate, 2013). She currently serves on the Board of Curators of the University of Oxford Libraries and on the Board of the National Library of Scotland, and she is a Fellow of the Chartered Institute of Library and Information Professionals and a Fellow of the British Computer Society. Her research focuses on the use of computational techniques to enable research in the arts and humanities that would otherwise be impossible. She is on Twitter as @melissaterras.

Caroline Sporleder, University of Göttingen

Digital Humanities (DH) is a field that has grown immensely in recent years. It is also a very diverse field, covering (in its broadest definition) everything from corpus linguistics, computational philology and quantitative history to computational archaeology.
Because the field is rooted in corpus linguistics and computational philology, and because data in the Humanities and Social Sciences are often (but not always) textual, digital text representation, processing, and mining are a major focus. Computational linguistics (CL) has a lot to contribute here, both at the lower end of the scale (e.g., tools for OCR error correction and preprocessing) and at the higher end (e.g., sophisticated text mining tools). CL can also benefit from evaluating its algorithms and tools on data from the Humanities, since these data are often difficult to process, e.g. due to non-standard language and spelling, missing sentence boundaries, noisy input, and domains that differ from those typically considered in CL. Hence, CL for DH requires the development of very robust methods that work well on noisy data and do not require large amounts of training data. In this talk, I will address some of the chances and challenges that arise when applying computational linguistic methods to data from the Humanities and Social Sciences.

Caroline Sporleder is a Professor of Digital Humanities and the Executive Director of the Göttingen Centre for Digital Humanities at the University of Göttingen, Germany. She led the project "Asymmetrical Encounters: Digital Humanities Approaches to Reference Cultures in Europe, 1815-1992", funded under the EU HERA programme. She has served on various committees and is the current President of the ACL (Association for Computational Linguistics) Special Interest Group on Language Technologies for the Socio-Economic Sciences and Humanities (SIGHUM). Her publications lie at the crossroads of Digital Humanities and Natural Language Processing.

Elena González-Blanco, UNED (Madrid)

The need for standardization has become increasingly important in many fields as a common way of understanding and exchanging information. Scientific disciplines established formal protocols and languages early on, and these were quickly adopted and adapted to their particular problems. The humanities and cultural disciplines, however, have followed an independent path in which creativity and tradition play an essential role. Literature, and especially poetry, is a clear reflection of this idiosyncrasy. From the philological point of view, there is no uniform academic approach to analyzing, classifying or studying the different poetic manifestations, and the divergence of theories is even greater when comparing poetry schools from different languages and periods. The POSTDATA project was created to bridge the digital gap between traditional cultural assets and the growing world of data. It focuses on poetry analysis, classification and publication, applying Digital Humanities methods of academic analysis, such as XML-TEI encoding, in pursuit of standardization. Interoperability problems between the different poetry collections are solved by using semantic web technologies to link and publish literary datasets in a structured way in the linked data cloud. Making poetry available online as machine-readable linked data has three advantages: first, the academic community will have an accessible digital platform for working with poetic corpora and contributing their own texts to enrich them; second, this way of encoding and standardizing poetic information will guarantee the preservation of poems published only in old books or even transmitted orally, as the texts will be digitized and stored as XML files; third, the datasets and corpora will be openly available for the community to use for other purposes, such as education, cultural diffusion or entertainment.

Elena González-Blanco is a faculty member of the Department of Spanish Literature and Literary Theory at the Universidad Nacional de Educación a Distancia (UNED), Spain's open university, in Madrid. Her main research and teaching areas are Comparative Medieval Literature, Metrics and Poetry, and Digital Humanities. She has published 45 papers in academic journals and book chapters, and she has presented papers and talks at many international conferences. She is the Director of the Digital Humanities Lab at UNED, LINHD (Laboratorio de Innovación en Humanidades Digitales). She is a very active member of the Digital Humanities community: since 2013 she has been a member of the Executive Committee of EADH and of the Executive Committee of the ADHO SIG GO::DH, and she is the Secretary of the Spanish Association for Digital Humanities, HDH (Humanidades Digitales Hispánicas, Sociedad Internacional).