M4GT-Bench: Evaluation Benchmark for Black-Box Machine-Generated Text Detection
CoRR (2024)
Abstract
The advent of Large Language Models (LLMs) has brought an unprecedented surge
in machine-generated text (MGT) across diverse channels. This raises legitimate
concerns about its potential misuse and societal implications. The need to
identify and differentiate such content from genuine human-generated text is
critical in combating disinformation, preserving the integrity of education and
scientific fields, and maintaining trust in communication. In this work, we
address this problem by introducing M4GT-Bench, a new multilingual,
multi-domain, and multi-generator benchmark for MGT detection. It is
collected for three task formulations: (1) monolingual and multilingual
binary MGT detection; (2) multi-way detection, which identifies the particular
model that generated the text; and (3) human-machine mixed text detection, where
the word boundary separating MGT from human-written content must be determined. Human
evaluation on Task 2 shows performance below random guessing, demonstrating
how difficult it is to distinguish individual LLMs. Promising results always occur
when training and test data are drawn from the same domains or generators.