Title: | TOCP: A Dataset for Chinese Profanity Processing |
Authors: | Hsu Yang; Chuan-Jie Lin |
Issue Date: | May-2020 |
Publisher: | European Language Resources Association (ELRA) |
Journal Volume: | Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying |
Start page/Pages: | 6–12 |
Abstract: | This paper introduced TOCP, a larger dataset of Chinese profanity. This dataset contains natural sentences collected from social media sites, the profane expressions appearing in the sentences, and their rephrasing suggestions which preserve their meanings in a less offensive way. We proposed several baseline systems using neural network models to test this benchmark. We trained embedding models on a profanity-related dataset and proposed several profanity-related features. Our baseline systems achieved an F1-score of 86.37% in profanity detection and an accuracy of 77.32% in profanity rephrasing. |
URI: | http://scholars.ntou.edu.tw/handle/123456789/17875 |
Appears in Collections: | Department of Computer Science and Engineering (資訊工程學系) |