PyCantonese: Cantonese Linguistics and NLP in Python.

Jackson Lee, Litong Chen, Charles Lam, Chaak Ming Lau, Tsz-Him Tsui

International Conference on Language Resources and Evaluation (LREC), 2022

Abstract
This paper introduces PyCantonese, an open-source Python library for Cantonese linguistics and natural language processing. It first describes the library's design, implementation, corpus data format, and key included datasets, then gives an overview of the currently implemented functionality: stop words, handling of Jyutping romanization, word segmentation, part-of-speech tagging, and parsing of Cantonese text.
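
To make "handling Jyutping romanization" concrete, here is a minimal standard-library sketch of splitting a Jyutping string into syllables of (onset, nucleus, coda, tone). It is an illustration only and not PyCantonese's own implementation: the function name `split_jyutping` and the regular expression are assumptions made for this example, based on the standard Jyutping inventory of onsets, nuclei, codas, and tones 1 to 6.

```python
import re

# Hypothetical sketch, not PyCantonese's code: one Jyutping syllable is an
# optional onset, then either nucleus + optional coda or a syllabic nasal
# (m, ng), then a tone digit from 1 to 6.
SYLLABLE = re.compile(
    r"(?P<onset>ng|gw|kw|[bpmfdtnlgkhwzcsj])?"
    r"(?:(?P<nucleus>aa|oe|eo|yu|[aeiou])(?P<coda>ng|[iumnptk])?"
    r"|(?P<syllabic>ng|m))"
    r"(?P<tone>[1-6])"
)


def split_jyutping(text):
    """Split a run of Jyutping into (onset, nucleus, coda, tone) tuples.

    A syllabic nasal is reported in the nucleus slot (a choice made for
    this sketch). Raises ValueError if any part of the input cannot be
    read as a syllable.
    """
    syllables = []
    pos = 0
    while pos < len(text):
        match = SYLLABLE.match(text, pos)
        if match is None:
            raise ValueError(f"cannot parse Jyutping at position {pos}: {text!r}")
        nucleus = match.group("nucleus") or match.group("syllabic")
        syllables.append(
            (
                match.group("onset") or "",
                nucleus,
                match.group("coda") or "",
                match.group("tone"),
            )
        )
        pos = match.end()
    return syllables


if __name__ == "__main__":
    # "gwong2dung1waa2" is the Jyutping for 廣東話 (Cantonese).
    for syllable in split_jyutping("gwong2dung1waa2"):
        print(syllable)
```

Because every syllable must end in a tone digit, the matcher never consumes an empty string, and unparseable input is rejected instead of being skipped silently.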
Key words
Cantonese, Jyutping, word segmentation, part-of-speech tagging, stop words