Sentiment Analysis on Indonesian-Sundanese Code-Mixed Data

HAJAROT NAJIHA

Basic Information

71 times
23.04.2524
006.35
Scientific Work - Undergraduate Thesis (S1) - Reference

In this work, we conduct sentiment analysis on Indonesian-Sundanese code-mixed tweets. Sundanese is one of Indonesia’s regional languages, with over 42 million speakers. We use a pre-trained language model, IndoBERT, for the sentiment analysis task. Our evaluation shows a best accuracy of 81%. An error analysis finds that most mislabeled tweets contain many words associated with other sentiment labels. Other likely causes are ambiguous sentences, words that do not appear in the training data, and abbreviated words.

Subjects

NATURAL LANGUAGE PROCESSING
DATA ANALYSIS

Catalog

Sentiment Analysis on Indonesian-Sundanese Code-Mixed Data
 
 
Indonesia

Circulation

Rp. 0
Rp. 0
No

Author

HAJAROT NAJIHA
Individual
Ade Romadhony
 

Publisher

Universitas Telkom, S1 Informatika (undergraduate Informatics program)
Bandung
2023

Collection

Competency

  • CII4G3 - NATURAL LANGUAGE PROCESSING
