ViHealthBERT: Pre-trained Language Models for Vietnamese in Health Text Mining.

International Conference on Language Resources and Evaluation (LREC), 2022

Abstract
Pre-trained language models have become crucial to achieving competitive results across many Natural Language Processing (NLP) problems. The number of monolingual pre-trained models for low-resource languages has increased significantly, but most of them target the general domain, and strong baseline language models for specific domains remain scarce. We introduce ViHealthBERT, the first domain-specific pre-trained language model for Vietnamese healthcare. Our model achieves strong results, outperforming general-domain language models on all health-related datasets. Moreover, we present Vietnamese healthcare-domain datasets for two tasks: Acronym Disambiguation (AD) and Frequently Asked Questions (FAQ) Summarization. We release ViHealthBERT to facilitate future research and downstream applications for domain-specific Vietnamese NLP. Our dataset and code are available at https://github.com/demdecuong/vihealthbert.
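As a rough illustration of how such a released checkpoint is typically consumed downstream, the sketch below loads the model with Hugging Face Transformers and extracts contextual embeddings. The checkpoint name "demdecuong/vihealthbert-base-word" is an assumption inferred from the authors' GitHub account, and the example Vietnamese sentence is hypothetical; consult the repository for the published identifier and preprocessing requirements.

    # Minimal sketch: loading ViHealthBERT via Hugging Face Transformers.
    # Checkpoint name is an assumption; see the authors' repository.
    import torch
    from transformers import AutoModel, AutoTokenizer

    name = "demdecuong/vihealthbert-base-word"  # assumed identifier
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name)

    # Vietnamese health-domain sentence; word-segmented input (syllables
    # joined by underscores) is the usual convention for Vietnamese
    # word-level models.
    text = "Benh_nhan bi sot cao va ho keo_dai ."

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # last_hidden_state holds one contextual vector per token, usable as
    # features for downstream tasks such as NER or FAQ summarization.
    print(outputs.last_hidden_state.shape)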
Keywords
Low-resource language, language model, healthcare, acronym disambiguation, summarization