Leveraging fusion of sequence tagging models for toxic spans detection


The rise of blogging and microblogging platforms has enabled abusers to spread negativity and threats more widely than ever. Negative and hateful comments deter users from sharing their opinions freely on social media platforms. Such comments often break people's confidence and cause extensive damage to their mental health. Hence, identifying toxic content and taking appropriate measures against it is crucial to preserving a safe environment on social media. Numerous state-of-the-art approaches classify a whole piece of content as toxic or non-toxic, but they do not identify the precise toxic portion within it. Detecting the toxic portions is essential because it allows moderators to exclude only the abusive parts rather than the entire content. This paper describes our proposed approach to detecting the toxic portions of text efficiently and accurately. We explore an ensemble of sequence labeling models to identify the toxic spans: the word embedding-based Spark NLP NER (named entity recognition) deep learning model, a spaCy NER model with custom toxic tags, and an ALBERT NER model. NER-based models are designed to capture the contextual attributes of phrases and spans that are essential for named entity recognition. Because toxic span detection likewise requires capturing phrasal context, the similarity between the two tasks motivates us to exploit these NER models. We determine the final toxic spans using a prevalence-based fusion of the predictions generated by these models. The fusion strategy lets us combine the diverse ways in which these models perceive phrasal context. Experimental results on the SemEval-2021 toxic spans detection dataset show that our model accurately captures the toxic fragments and achieves results competitive with other state-of-the-art methods.
Toxic span, Named entity recognition (NER), Spark NLP, Custom spaCy, ALBERT, Fusion technique
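
The prevalence-based fusion mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not define the exact rule, so this example assumes that each model outputs a set of toxic character offsets (the format used in the SemEval-2021 toxic spans task) and that an offset is kept when at least `min_votes` models agree on it. The function name `fuse_spans`, the `min_votes` threshold, and the sample offsets are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Iterable, List, Set


def fuse_spans(predictions: Iterable[Set[int]], min_votes: int = 2) -> List[int]:
    """Keep each character offset that at least `min_votes` models mark as toxic.

    Assumption: "prevalence" is read here as simple vote counting per offset;
    the paper's actual fusion rule may differ.
    """
    votes: Counter = Counter()
    for offsets in predictions:
        # Wrapping in set() guarantees each model votes at most once per offset.
        votes.update(set(offsets))
    return sorted(offset for offset, count in votes.items() if count >= min_votes)


# Hypothetical outputs of the three taggers for a single comment.
spark_nlp_offsets = {5, 6, 7, 8, 9}
spacy_offsets = {5, 6, 7, 8, 9, 20, 21}
albert_offsets = {6, 7, 8, 9, 10}

fused = fuse_spans([spark_nlp_offsets, spacy_offsets, albert_offsets])
print(fused)
```

With `min_votes=2` and three models this amounts to majority voting: offsets flagged by only one model are dropped, while offsets that at least two models agree on survive.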
