Catalan Parliamentary Plenary Session Transcriptions from 2015 to 2022. The ParlaMintCAT Corpus

Marilina Pisani, Rodolfo Zevallos, Nuria Bel

Procesamiento del Lenguaje Natural (2023)

Abstract
Parliamentary speeches are of interest to several research areas because they are publicly available transcriptions, produced under controlled and regulated procedures, that come with reliable sociodemographic data about the speakers, such as gender and age. Moreover, the speeches cover a wide range of topics and domains, and they are public domain data, not subject to copyright restrictions. The ParlaMint project (ParlaMint: Towards Comparable Parliamentary Corpora) is developing a comparable and uniformly annotated multilingual corpus with data from 33 parliaments in Europe. This paper describes the construction of the ParlaMintCAT corpus, for which the transcriptions of the Catalan Parliament General Assembly sessions from 2015 to 2022 have been compiled, processed, and annotated.
Key words
parliamentary corpora, ParlaMint, linguistic annotation, metadata, Catalan