Cyberbullying identification and classification using transformer models

Wachiraporn Tapaopong

Files

TH_Wachiraporn_T_2023.pdf (3.91 MB)

Issued Date

2023

Copyright Date

2023

Resource Type

Master Thesis

Language

eng

File Type

application/pdf

No. of Pages/File Size

xiii, 91 leaves : ill.

Access Rights

open access

Rights

This work is copyrighted by Mahidol University and is reserved for educational use only. The source must be cited. Modifying the content and using it for commercial purposes are prohibited.

Rights Holder(s)

Mahidol University

Bibliographic Citation

Thematic Paper (M.Sc. (Information Technology Management))--Mahidol University, 2023

Suggested Citation

Wachiraporn Tapaopong. Cyberbullying identification and classification using transformer models. Thematic Paper (M.Sc. (Information Technology Management))--Mahidol University, 2023. Retrieved from: https://repository.li.mahidol.ac.th/handle/123456789/115340

Title

Cyberbullying identification and classification using transformer models

Alternative Title(s)

การระบุและการจำแนกการกลั่นแกล้งทางอินเทอร์เน็ตโดยใช้แบบจำลอง Transformer

Author(s)

Wachiraporn Tapaopong

Advisor(s)

Taweesak Samanchuen
Prush Sangangam

Abstract

The widespread use of social media platforms has made cyberbullying increasingly prevalent, causing physical, emotional, and mental harm to victims. This study applies Natural Language Processing techniques to improve the identification and analysis of cyberbullying in harmful social media messages. We applied transfer learning and fine-tuning to five Transformer models (BERT, RoBERTa, ALBERT, DistilBERT, and ConvBERT) and compared how effectively each predicts the cyberbullying category. Two experimental settings were used to evaluate automated cyberbullying detection. The first uses six categories: religion, age, ethnicity, gender, not cyberbullying, and other types of cyberbullying. The second excludes "other types of cyberbullying." DistilBERT achieved the best performance for multi-class text classification while requiring the least training time. The research underscores the value of automated cyberbullying detection, using Natural Language Processing to reduce the negative consequences of cyberbullying. Overall, the study contributes to the development of effective and efficient approaches for identifying and classifying cyberbullying, in support of prevention and intervention.
Implication of thematic paper: This research matters for addressing cyberbullying, which is becoming more prevalent as social media use spreads. Using Transformer models for cyberbullying identification and classification has the potential to significantly improve the accuracy and efficiency of automated detection systems. By utilizing deep learning, these models can learn to recognize and categorize cyberbullying behaviors in a more nuanced and accurate way than traditional machine learning approaches. This could help social media platforms identify and address instances of cyberbullying more effectively, ultimately leading to a safer and more supportive online environment for all users. The use of Transformer models for cyberbullying detection could also have broader implications for natural language processing and machine learning, as it demonstrates the potential of these models to analyze and understand complex, human-generated text.
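The two evaluation settings described in the abstract (six categories, and five categories without "other types of cyberbullying") can be illustrated with a minimal sketch of label preparation. This is not the thesis code: the label strings, the function names `build_label_map` and `prepare`, and the placeholder messages are all assumptions made for illustration, since the abstract names the categories but not the dataset's actual label values.

```python
# Hypothetical label strings; the abstract lists the categories only by name.
SIX_CLASS = [
    "religion",
    "age",
    "ethnicity",
    "gender",
    "not_cyberbullying",
    "other_cyberbullying",
]

# Second setting: the same categories without "other types of cyberbullying".
FIVE_CLASS = [name for name in SIX_CLASS if name != "other_cyberbullying"]


def build_label_map(classes):
    """Map each class name to an integer id, in list order."""
    return {name: idx for idx, name in enumerate(classes)}


def prepare(examples, classes):
    """Drop examples whose label is not in `classes` and encode the rest.

    `examples` is a list of (text, label_name) pairs; the result is a list
    of (text, label_id) pairs suitable for a multi-class classifier.
    """
    label2id = build_label_map(classes)
    return [
        (text, label2id[label])
        for text, label in examples
        if label in label2id
    ]


if __name__ == "__main__":
    # Placeholder messages, invented for this sketch (not real data).
    toy = [
        ("placeholder message 1", "age"),
        ("placeholder message 2", "other_cyberbullying"),
        ("placeholder message 3", "not_cyberbullying"),
    ]
    print(len(prepare(toy, SIX_CLASS)), "examples in the six-class setting")
    print(len(prepare(toy, FIVE_CLASS)), "examples in the five-class setting")
```

In a fine-tuning setup, the encoded pairs would typically be tokenized and passed to a sequence-classification head whose number of output labels equals `len(classes)`, so the five-class setting changes both the training data and the size of the output layer.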

Degree Name

Master of Science

Degree Level

Master's degree

Degree Department

Faculty of Engineering

Degree Discipline

Information Technology Management

Degree Grantor(s)

Mahidol University

Keyword(s)

Cyberbullying -- Prevention -- Data processing.
Natural language processing (Computer science)
Machine learning -- Testing.
Social media -- Religious aspects.
M.Sc. (2023)
Information Technology Management (Mahidol University 2023)

URI

https://repository.li.mahidol.ac.th/handle/123456789/115340

Collections

Thesis and Thematic paper
