Towards Comprehensive Detection of Chinese Harmful Memes: Dataset and Detector

Abstract

Harmful memes have proliferated on the Chinese Internet, yet research on detecting them lags significantly behind because reliable datasets and effective detectors are lacking. To address this gap, we study the comprehensive detection of Chinese harmful memes. We introduce ToxiCN MM, the first Chinese harmful meme dataset, which consists of 12,000 samples with fine-grained annotations for meme types. Additionally, we propose a baseline detector, Multimodal Harmful Knowledge Enhancement (MHKE), which incorporates contextual information about meme content to improve the model's understanding of Chinese memes. In the evaluation phase, we conduct extensive quantitative experiments and qualitative analyses on multiple baselines, including LLMs and our MHKE. Experimental results indicate that detecting Chinese harmful memes is challenging for existing models and demonstrate the effectiveness of MHKE.

ToxiCN MM

ToxiCN MM contains 12,000 diverse samples collected from Chinese social platforms. In addition to the basic binary labels (i.e., harmful or non-harmful), we provide fine-grained annotations for harmful memes at two levels of granularity: harmful type and modality combination feature.
For the harmful type, we cover both targeted harmful memes and memes that exhibit potential toxicity without a specific target, namely general offense, sexual innuendo, and dispirited culture. Based on consensus in social psychology, these are identified as the most common types of harmful memes on Chinese platforms, and their harm to individuals and society has been widely discussed.
For the modality combination feature, we examine how harmful memes convey toxicity through textual and visual elements, either in combination or independently. The categories are text-image fusion, harmful text, and harmful image.
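
The label hierarchy described above can be sketched as a small data structure. This is only an illustration: the class and field names (`HarmfulType`, `ModalityCombination`, `MemeAnnotation`) are hypothetical and do not reflect the released file format, and the check that non-harmful memes carry no fine-grained labels is an assumption drawn from the description above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class HarmfulType(Enum):
    # The four harmful types named in the dataset description.
    TARGETED = "targeted harmful"
    GENERAL_OFFENSE = "general offense"
    SEXUAL_INNUENDO = "sexual innuendo"
    DISPIRITED_CULTURE = "dispirited culture"


class ModalityCombination(Enum):
    # How toxicity is conveyed across text and image.
    TEXT_IMAGE_FUSION = "text-image fusion"
    HARMFUL_TEXT = "harmful text"
    HARMFUL_IMAGE = "harmful image"


@dataclass(frozen=True)
class MemeAnnotation:
    """Hypothetical container for one sample's labels."""
    harmful: bool
    harmful_type: Optional[HarmfulType] = None
    modality: Optional[ModalityCombination] = None

    def __post_init__(self) -> None:
        has_both = self.harmful_type is not None and self.modality is not None
        has_neither = self.harmful_type is None and self.modality is None
        # Assumption: fine-grained labels exist only for harmful memes.
        if self.harmful and not has_both:
            raise ValueError("harmful memes need a harmful type and a modality label")
        if not self.harmful and not has_neither:
            raise ValueError("non-harmful memes carry no fine-grained labels")


example = MemeAnnotation(
    harmful=True,
    harmful_type=HarmfulType.DISPIRITED_CULTURE,
    modality=ModalityCombination.TEXT_IMAGE_FUSION,
)
```

Using enumerations keeps the two annotation levels separate and makes invalid label combinations fail early when loading data.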

MHKE Detector

As a baseline detector, we present Multimodal Harmful Knowledge Enhancement (MHKE), which introduces contextual information about the meme content. We use a large language model (LLM) to capture the context of both the text and the image of a meme, leveraging the extensive knowledge it acquired during pre-training. This information is then integrated into a trainable detector as enhanced captions to improve its understanding of the meme.
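
The enhanced-caption idea can be illustrated with a minimal sketch. It is not the MHKE implementation: the prompts, the `[SEP]` separator, and the function names are all hypothetical, and the LLM is passed in as a plain callable so that no particular model or API is assumed.

```python
from typing import Callable

# Hypothetical prompts; the wording used by the authors is not given here.
TEXT_PROMPT = "Explain the meaning and background of this meme text: {text}"
IMAGE_PROMPT = "Describe the content and background of this meme image."

TextLLM = Callable[[str], str]         # prompt -> generated context
VisionLLM = Callable[[str, str], str]  # (prompt, image path) -> generated context


def build_enhanced_caption(
    meme_text: str,
    image_path: str,
    text_llm: TextLLM,
    vision_llm: VisionLLM,
    sep: str = " [SEP] ",
) -> str:
    """Join the meme text with LLM-generated context for its text and image.

    The returned string stands in for the "enhanced caption" that a
    trainable detector would consume alongside the image itself.
    """
    text_context = text_llm(TEXT_PROMPT.format(text=meme_text)).strip()
    image_context = vision_llm(IMAGE_PROMPT, image_path).strip()
    return sep.join([meme_text, text_context, image_context])


# Stub models so the sketch runs without any external service.
def fake_text_llm(prompt: str) -> str:
    return "context for the text"


def fake_vision_llm(prompt: str, image_path: str) -> str:
    return "context for the image"


caption = build_enhanced_caption("meme text", "meme.jpg", fake_text_llm, fake_vision_llm)
```

Injecting the models as callables keeps the caption-building step independent of the detector, so the generated context can be cached once per meme and reused across training runs.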

Author Statement

We, the authors of ToxiCN MM, hereby declare that we take full responsibility for any infringement of rights that may arise from the use of this dataset. Our study aims to facilitate the comprehensive detection of Chinese harmful memes and raise researchers' attention to non-English memes. We believe the benefits of our proposed resources outweigh the associated risks. We strictly follow the data use agreements of each public online social platform. It is important to note that all data has been anonymized and does not include any personal information. The opinions and findings contained in the samples of our presented dataset should not be interpreted as representing the views expressed or implied by the authors.

BibTeX

@article{lu2024towards,
  title={Towards Comprehensive Detection of Chinese Harmful Memes},
  author={Lu, Junyu and Xu, Bo and Zhang, Xiaokun and Wang, Hongbo and Zhu, Haohao and Zhang, Dongyu and Yang, Liang and Lin, Hongfei},
  journal={arXiv preprint arXiv:2410.02378},
  year={2024}
}