Terms of republication

You can republish this article for free on your website, blog, etc.

CorTexT, a platform for researchers in the humanities and social sciences

Published in July 2024

In response to the diversification and multiplicity of increasingly complex data sources, the CorTexT platform offers innovative tools and methods for exploring and analysing texts for those working in the humanities and social sciences. Developed in the LISIS laboratory, it is based on three key principles: expertise, sharing and openness.

“An interface between researchers and analysis methods”. This was how Lionel Villard, a Lecturer at ESIEE Paris and Researcher at LISIS, and Philippe Breucker, an Engineer at INRAE, presented the CorTexT platform, of which they are Director and Technical Director respectively. CorTexT was initiated in 2008 as a project supported successively by IFRIS, the SITES LabEx and RISIS2 (H2020). It was initially hosted by the INRAE Science in Society research unit (UR SenS) until 2015, when the LISIS laboratory took over. The platform is based on two observations: on the one hand, the massification of data and the heterogeneity of its distribution sources, driven by the development of information and communication technologies; on the other, the inadequacy of traditional calculation and computer-processing methods in the humanities and social sciences for meeting these new challenges.

Simplified processing of complex elements

CorTexT has been designed as a digital laboratory offering high-performance tools and services for researchers, grouped together within a flagship application: CorTexT Manager. The application produces a range of analyses drawing on natural language processing, social network analysis, statistics and, more recently, the geographical dimension of the data.

Given the complexity of the information to be processed, CorTexT Manager is designed to make things easier for users. After registering, users simply submit a textual corpus. Its content (speeches, names, quotations, places, dates, etc.) is analysed by the platform's algorithms, alongside extensive textual archives composed of articles, quotations or patents from the international, national and even regional press, whether specialist or general, as well as social media. The platform then produces a distributional, relational and geo-coded analysis that highlights the links between different concepts, people or organisations, providing an overview of a particular study area. This makes it possible, for example, to compare how the use of hydroxychloroquine was addressed in the press and in academic circles during the Covid-19 pandemic, in order to study the relationship between science and public debate.
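
To give a rough idea of what a "relational" analysis of a corpus can look like, here is a minimal sketch in Python that counts how often pairs of terms appear in the same document. It is not CorTexT Manager's code or API: the function name `cooccurrence_counts`, the three sample documents and the list of terms are all invented for illustration, and simple substring matching stands in for proper term extraction.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents, terms):
    """Count how often each pair of terms appears in the same document.

    Hypothetical helper for illustration only; matching is a plain
    case-insensitive substring test, not real term extraction.
    """
    counts = Counter()
    for text in documents:
        lowered = text.lower()
        # Terms present in this document, sorted so each pair has one canonical order.
        present = sorted(t for t in terms if t.lower() in lowered)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


# Invented toy corpus (not real data).
docs = [
    "Press article on hydroxychloroquine and the Covid-19 pandemic",
    "Academic study of hydroxychloroquine in a clinical trial",
    "Public debate on Covid-19 and the clinical trial",
]
terms = ["hydroxychloroquine", "covid-19", "clinical trial"]

edges = cooccurrence_counts(docs, terms)
for (a, b), weight in sorted(edges.items()):
    print(f"{a} -- {b}: {weight}")
```

Each counted pair can be read as a weighted edge in a network of concepts, which is the kind of structure a relational analysis makes visible.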

By, for and with researchers

Lecturers at Université Gustave Eiffel, Sciences Po and EM Lyon, doctoral students and researchers, experts from ANR and ANSES, members of public research bodies... CorTexT users come from a wide range of backgrounds. Whether for research linked to a publication, support in running a course or strategic analysis to help political decision-making, CorTexT Manager now has more than 9,000 unique users in 120 countries.

The platform's structure and code are developed through a partnership-based approach in which CorTexT experts and users work together on new methods. These methods meet an immediate and specific need and are designed to enhance the range of services available to the scientific community as a whole. This desire to pool resources is reflected in the openness of the source code, which allows users to evaluate, reproduce and even improve the proposed methods.

Glossary

Natural Language Processing (NLP): a multidisciplinary field involving linguistics, computer science and artificial intelligence, which aims to create tools for processing text and speech (including signed speech) for different applications. NLP combines advances in computational linguistics (rule-based language models) and statistical, machine- and deep-learning methods. Natural Language Processing is one of the major fields of application of artificial intelligence.

Source: https://www.inshs.cnrs.fr/fr/traitement-automatique-de-la-langue

Code identity card

Code access: https://gitlab.com/cortext/
Citations: https://docs.cortext.net/how-to-cite-cortext-manager/
Contact: lionel.villard@esiee.fr
URL: www.cortext.net
References: www.cortext.net/publications/
Key words: Humanities and Social Sciences, corpus, socio-semantic analysis