MULTI-LABEL CLASSIFICATION OF INDONESIAN ONLINE TOXICITY USING BERT AND ROBERTA

YOGA SAGAMA

Basic Information

386 times
23.04.3843
006.31
Scientific Work - Undergraduate Thesis (Skripsi, S1) - Reference

Detecting online toxicity in Indonesian digital interactions is challenging because of the complexity and nuance of the language. This study evaluates BERT- and RoBERTa-based language models, specifically IndoBERTweet, IndoBERT, and Indonesian RoBERTa, for identifying toxic content in Bahasa Indonesia. The methodology comprises data collection, dataset pre-processing, data annotation, and model fine-tuning for multi-label classification. Model performance is assessed using macro-averaged precision, recall, and F1-score. IndoBERTweet, fine-tuned with the best-performing hyperparameters (a learning rate of 5e-5, a batch size of 32, and three epochs), outperforms the other models with a precision of 0.85, a recall of 0.94, and an F1-score of 0.89. These results indicate that, of the three models evaluated, IndoBERTweet is the most effective at detecting and classifying online toxicity in Bahasa Indonesia. The study supports efforts to foster a safer and healthier online environment for Indonesian users and provides a foundation for future research on additional models, hyperparameter optimization, and techniques for improving toxicity detection and classification in Indonesian.
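The abstract reports macro-averaged precision, recall, and F1-score for a multi-label task. As a minimal sketch of how such scores are commonly computed, the snippet below calculates each metric per label and then takes the unweighted mean across labels. The function name `macro_scores`, the three-label toy data, and the convention of scoring 0.0 when a denominator is zero are all assumptions made for illustration; none of them come from the thesis, and the thesis's actual label set and evaluation code are not given in this record.

```python
from typing import List, Tuple


def macro_scores(
    y_true: List[List[int]], y_pred: List[List[int]]
) -> Tuple[float, float, float]:
    """Return macro-averaged (precision, recall, F1) for binary multi-label data.

    Each row is one sample; each column is one label (1 = label applies).
    Metrics are computed per label, then averaged with equal weight per label.
    """
    n_labels = len(y_true[0])
    precisions, recalls, f1s = [], [], []
    for j in range(n_labels):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        # Assumed convention: a metric with a zero denominator counts as 0.0.
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = (
            2 * precision * recall / (precision + recall)
            if (precision + recall)
            else 0.0
        )
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    return (
        sum(precisions) / n_labels,
        sum(recalls) / n_labels,
        sum(f1s) / n_labels,
    )


# Hypothetical toy data: 4 samples, 3 illustrative labels (not the thesis data).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 1], [0, 1, 1], [1, 0, 0], [0, 0, 1]]

precision, recall, f1 = macro_scores(y_true, y_pred)
print(f"macro precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Because every label contributes equally to a macro average, a rare label that the model handles poorly lowers the score as much as a frequent one would, which is one reason this averaging is often chosen for imbalanced toxicity categories.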

Subject

Machine Learning
MACHINE ENGINEERING, CLASSIFICATION

Catalog

MULTI-LABEL CLASSIFICATION OF INDONESIAN ONLINE TOXICITY USING BERT AND ROBERTA
 
 
Indonesia

Circulation

Rp. 0
Rp. 0
No

Author

YOGA SAGAMA
Individual
Andry Alamsyah
 

Publisher

Universitas Telkom, S1 Manajemen (Manajemen Bisnis Telekomunikasi & Informatika)
Bandung
2023

Collection

Competency

  • SM855036 - SKRIPSI
