An improved Bulgarian natural language processing pipeline

Melania Berbatova, Filip Ivanov

Annual of Sofia University St. Kliment Ohridski. Faculty of Mathematics and Informatics (2023)

Abstract
In this paper, we present a pipeline for processing Bulgarian language data. The pipeline consists of the following steps: tokenization, sentence splitting, part-of-speech tagging, dependency parsing, named entity recognition, lemmatization, and word sense disambiguation. The first two components are based on rules and lists of words specific to the Bulgarian language, while the remaining components use machine learning algorithms trained on Universal Dependencies data and pretrained word vectors. The pipeline is implemented in the Python library spaCy (https://spacy.io/) and achieves significant results on all the included subtasks. The pipeline is open source and available on GitHub (https://github.com/melaniab/spacy-pipeline-bg/) for use by researchers and developers in a variety of natural language processing and text analysis tasks.
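
To illustrate what "rules and lists of words" can mean for the first two steps, here is a minimal sketch in plain Python of a tokenizer and sentence splitter driven by an abbreviation list. It is not the authors' implementation and does not use spaCy; the abbreviation list, the function names `tokenize` and `split_sentences`, and the splitting rules are all assumptions made for illustration.

```python
# Hypothetical abbreviation list; the pipeline's actual word lists are not reproduced here.
ABBREVIATIONS = {"г.", "гр.", "ул.", "проф.", "д-р", "напр."}

# Tokens that close a sentence in this simplified sketch.
SENTENCE_END = {".", "!", "?"}


def tokenize(text):
    """Split on whitespace, then peel punctuation off each chunk.

    A chunk that matches a known abbreviation (case-insensitively)
    keeps its trailing period, so "гр." stays a single token.
    """
    tokens = []
    for chunk in text.split():
        # Emit leading punctuation (for example an opening quote) as separate tokens.
        while chunk and not chunk[0].isalnum():
            tokens.append(chunk[0])
            chunk = chunk[1:]
        # Peel trailing punctuation unless what remains is a listed abbreviation.
        trailing = []
        while chunk and chunk.lower() not in ABBREVIATIONS and not chunk[-1].isalnum():
            trailing.append(chunk[-1])
            chunk = chunk[:-1]
        if chunk:
            tokens.append(chunk)
        tokens.extend(reversed(trailing))
    return tokens


def split_sentences(tokens):
    """Group tokens into sentences, closing a sentence at ".", "!" or "?".

    Abbreviations never trigger a split because their period is
    part of the token rather than a token of its own.
    """
    sentences, current = [], []
    for token in tokens:
        current.append(token)
        if token in SENTENCE_END:
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


text = "Проф. Иванов живее в гр. София. Той преподава информатика."
for sentence in split_sentences(tokenize(text)):
    print(sentence)
```

Keeping the abbreviation check inside the tokenizer means the sentence splitter only has to look at standalone punctuation tokens, which is one simple way to stop "Проф." or "гр." from ending a sentence early. A production pipeline would need a much larger word list and further rules (numbers, initials, quotations).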