
DeltaShield: Information Theory for Human-Trafficking Detection

ACM Transactions on Knowledge Discovery from Data (2023)

Abstract
Given a million escort advertisements, how can we spot near-duplicates? Such micro-clusters of ads are usually signals of human trafficking (HT). How can we summarize them to convince law enforcement to act? Spotting micro-clusters of near-duplicate documents is also useful in several other settings, including spam-bot detection in Twitter ads, plagiarism detection, and more. We present InfoShield, which makes the following contributions: it is practical, being scalable and effective on real data; parameter-free and principled, requiring no user-defined parameters; interpretable, selecting a document to serve as the cluster representative, highlighting all the common phrases, and automatically detecting “slots” (i.e., phrases that differ in every document); and generalizable, beating or matching domain-specific methods in Twitter bot detection and HT detection, respectively, while also being language independent. Interpretability is particularly important in the anti-HT domain, where law enforcement must visually inspect ads. Our experiments on real data show that InfoShield correctly identifies Twitter bots with an F1 score above 90% and detects HT ads with 84% precision. Moreover, it is scalable, requiring about 8 hours for 4 million documents on a stock laptop. Our incremental version, DeltaShield, allows fast, incremental updates with a minor loss of accuracy.
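To make the interpretable output described above more concrete (a representative document, shared phrases, and “slots”), the following is a minimal sketch under simplifying assumptions. It is not InfoShield's information-theoretic procedure: it swaps in plain pairwise token alignment with Python's standard `difflib.SequenceMatcher`, uses whitespace tokenization, and picks the representative with a hypothetical average-similarity heuristic. The names `tokenize`, `pick_representative`, and `build_template` are illustrative only.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def tokenize(doc: str) -> List[str]:
    """Lowercased whitespace tokenization (a simplifying assumption)."""
    return doc.lower().split()


def pick_representative(docs: List[List[str]]) -> int:
    """Hypothetical heuristic: the document most similar, on average, to the others.

    InfoShield selects its representative on information-theoretic grounds;
    that criterion is not reproduced here.
    """
    def avg_sim(i: int) -> float:
        others = [j for j in range(len(docs)) if j != i]
        if not others:
            return 1.0
        return sum(SequenceMatcher(None, docs[i], docs[j]).ratio() for j in others) / len(others)

    return max(range(len(docs)), key=avg_sim)


def build_template(raw_docs: List[str]) -> Tuple[int, str]:
    """Return (representative index, template with [SLOT] markers).

    A token of the representative is kept as a common phrase only if it is
    aligned with a matching token in every other document; each run of
    unmatched tokens collapses into one [SLOT]. Insertions that exist only
    in non-representative documents are not marked (a known simplification).
    """
    docs = [tokenize(d) for d in raw_docs]
    rep = pick_representative(docs)
    ref = docs[rep]
    constant = [True] * len(ref)
    for j, other in enumerate(docs):
        if j == rep:
            continue
        matched = [False] * len(ref)
        matcher = SequenceMatcher(None, ref, other, autojunk=False)
        for block in matcher.get_matching_blocks():
            for k in range(block.a, block.a + block.size):
                matched[k] = True
        constant = [c and m for c, m in zip(constant, matched)]
    out: List[str] = []
    for tok, keep in zip(ref, constant):
        if keep:
            out.append(tok)
        elif not out or out[-1] != "[SLOT]":
            out.append("[SLOT]")
    return rep, " ".join(out)
```

For three made-up ads that differ only in a name and a phone-number suffix, `build_template` returns the first ad as representative and the template `call [SLOT] now at 555 [SLOT] new in town`. Pairwise alignment keeps the sketch short, but its cost grows with every document compared against the representative, so it illustrates the output format rather than the scalability claimed above.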
Key words
human trafficking, information theory