Empowering Large Language Models for Textual Data Augmentation
CoRR (2024)
Abstract
With their ability to understand and execute natural language
instructions, large language models (LLMs) can potentially act as a powerful
tool for textual data augmentation. However, the quality of augmented data
depends heavily on the augmentation instructions provided, and
effectiveness can fluctuate across downstream tasks. While manually
crafting and selecting instructions offers some improvement, this approach
faces scalability and consistency issues in practice because of the diversity of
downstream tasks. In this work, we address these limitations by proposing a new
solution that automatically generates a large pool of augmentation
instructions and selects the most suitable task-informed instructions, thereby
empowering LLMs to create high-quality augmented data for different downstream
tasks. Empirically, the proposed approach consistently generates augmented data
of better quality than non-LLM and LLM-based data augmentation
methods, leading to the best performance on 26 few-shot learning tasks drawn
from a wide range of application domains.
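
The abstract outlines a two-step recipe: generate a pool of candidate augmentation instructions with an LLM, then rank them by task-informed feedback. The sketch below illustrates that idea only; it is not the authors' implementation. The `llm` callable, the prompt wording, and the `evaluate` scoring hook are assumptions introduced here for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical LLM interface: takes a prompt, returns a completion string.
# A stand-in for any chat-completion API, not the paper's actual interface.
LLM = Callable[[str], str]

def generate_instruction_pool(llm: LLM, n: int = 50) -> List[str]:
    """Ask the LLM to propose diverse candidate augmentation instructions."""
    prompt = (
        "Write one concise instruction telling an assistant how to rewrite "
        "a text example into a useful augmented training sample. "
        "Vary the strategy (paraphrase, style change, entity swap, etc.)."
    )
    return [llm(prompt) for _ in range(n)]

def augment(llm: LLM, instruction: str, text: str) -> str:
    """Apply one augmentation instruction to a single training example."""
    return llm(f"{instruction}\n\nText: {text}\nAugmented text:")

def select_instructions(
    llm: LLM,
    pool: List[str],
    train: List[Tuple[str, str]],  # (text, label) few-shot examples
    evaluate: Callable[[List[Tuple[str, str]]], float],  # downstream dev score
    top_k: int = 5,
) -> List[str]:
    """Rank instructions by the downstream score of the data they produce."""
    scored = []
    for inst in pool:
        augmented = [(augment(llm, inst, x), y) for x, y in train]
        scored.append((evaluate(train + augmented), inst))
    scored.sort(reverse=True)  # highest-scoring instructions first
    return [inst for _, inst in scored[:top_k]]
```

In this sketch, `evaluate` would typically train a small classifier on the combined original and augmented examples and report validation accuracy; the point mirrored here is that instructions are selected by measured task performance rather than chosen by hand.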