Indexing Portuguese NLP Resources with PT-Pump-Up
CoRR(2024)
摘要
The recent advances in natural language processing (NLP) are linked to
training processes that require vast amounts of corpora. Access to this data is
commonly not a trivial process due to resource dispersion and the need to
maintain these infrastructures online and up-to-date. New developments in NLP
are often compromised due to the scarcity of data or lack of a shared
repository that works as an entry point to the community. This is especially
true in low and mid-resource languages, such as Portuguese, which lack data and
proper resource management infrastructures. In this work, we propose
PT-Pump-Up, a set of tools that aim to reduce resource dispersion and improve
the accessibility to Portuguese NLP resources. Our proposal is divided into
four software components: a) a web platform to list the available resources; b)
a client-side Python package to simplify the loading of Portuguese NLP
resources; c) an administrative Python package to manage the platform and d) a
public GitHub repository to foster future collaboration and contributions. All
four components are accessible using: https://linktr.ee/pt_pump_up
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要