Authorship Attribution in Bangla Literature (AABL) via Transfer Learning using ULMFiT
ACM Transactions on Asian and Low-Resource Language Information Processing (2024)
Abstract
Authorship attribution is the task of building a characterization of a text
that captures its author's writing style in order to identify
the original author of that text. With increased anonymity on the
internet, this task has become increasingly crucial in various security and
plagiarism detection fields. Despite significant advancements in other
languages such as English, Spanish, and Chinese, Bangla lacks comprehensive
research in this field due to its complex linguistic features and sentence
structure. Moreover, existing systems do not scale as the number of
authors increases, and their performance drops when there are few samples per
author. In this paper, we propose the use of the Averaged Stochastic Gradient
Descent Weight-Dropped Long Short-Term Memory (AWD-LSTM) architecture and an
effective transfer learning approach that addresses the problems of complex
linguistic feature extraction and scalability for authorship attribution in
Bangla Literature (AABL). We analyze the effect of different tokenization schemes,
namely word, sub-word, and character-level tokenization, and demonstrate the
effectiveness of each in the proposed model. Moreover, we
introduce the publicly available Bangla Authorship Attribution Dataset of 16
authors (BAAD16) containing 17,966 sample texts and 13.4+ million words to
solve the standard dataset scarcity problem and release six variations of
pre-trained language models for use in any Bangla NLP downstream task. For
evaluation, we used our developed BAAD16 dataset as well as other publicly
available datasets. Empirically, our proposed model outperformed
state-of-the-art models and achieved 99.8
Furthermore, we showed that the proposed system scales much better even with an
increasing number of authors, and performance remains steady despite few
training samples.
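
To make the three tokenization granularities mentioned in the abstract concrete, the following is a minimal sketch, not the authors' pipeline. The function names, the toy vocabulary, and the greedy longest-match rule for sub-words are all assumptions made for illustration; a real system would more likely use a trained sub-word model. A Latin-script sample is used so that code points and visible characters coincide.

```python
def word_tokens(text):
    """Word level: split on whitespace."""
    return text.split()


def char_tokens(text):
    """Character level: one token per Unicode code point, whitespace dropped."""
    return [ch for ch in text if not ch.isspace()]


def subword_tokens(text, vocab):
    """Sub-word level (illustrative): greedy longest-match against a toy
    vocabulary, falling back to single characters for unknown pieces."""
    tokens = []
    for word in text.split():
        i = 0
        while i < len(word):
            # Try the longest remaining piece first; a single character
            # always matches, so the inner loop always breaks.
            for j in range(len(word), i, -1):
                if word[i:j] in vocab or j == i + 1:
                    tokens.append(word[i:j])
                    i = j
                    break
    return tokens


# Hypothetical vocabulary, chosen only for this example.
TOY_VOCAB = {"writ", "ing", "style", "s"}
sample = "writing styles"

print(word_tokens(sample))
print(subword_tokens(sample, TOY_VOCAB))
print(char_tokens(sample))
```

The same text yields progressively longer token sequences and smaller vocabularies as the granularity gets finer, which is the trade-off the tokenization comparison explores.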
Keywords
authorship attribution, Bangla literature, AABL, transfer learning, ULMFiT