POS Tagger Improvisation using HMM with the Addition of Foreign Word Labels on Telkom University News

WINKIE SETYONO

Basic Information

Views

222 times

Catalog No.

22.04.2283

Classification

006.35

Catalog type

Scientific Work - Undergraduate Thesis (S1) - Reference

Abstract

News is a daily source of information for the public. A news article carries a great deal of information and is built from sentence structures. Each language has its own sentence structure, and Indonesian is no exception. Nowadays, however, many media outlets mix Indonesian with foreign languages, producing sentence structures that differ from those of Bahasa Indonesia. Part-of-Speech (POS) tagging is needed to determine the class of each word in a sentence, and a POS tagger learns to do this from a corpus of the language. The structure of the language influences the tagging results, and words that do not appear in the corpus can reduce the tagger's accuracy. To handle the new sentence structures, a POS tagger needs a larger corpus to learn from, but the current corpus does not yet cover them. We set out to improve the research results by adding data to the corpus, using sentences from online media whose structure differs from that of the Indonesian-language corpus. We added about 242 sentences with 7,043 tokens to the corpus, focused on Foreign Word (FW) tags, which total 3,819 tags. After several tests and scenarios, the POS tagger reaches an accuracy of 94.7% using the Hidden Markov Model (HMM) method, with an F1-score of 78% for the FW tag.
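
To make the method named in the abstract concrete, the following is a minimal sketch of a bigram HMM tagger with Viterbi decoding. It is not the thesis implementation. The four-sentence toy corpus, the tag labels (NN, VB, FW), the function names, and the add-one smoothing used for unseen words are all assumptions chosen for illustration; the abstract does not say how the actual tagger handles out-of-corpus words.

```python
import math
from collections import Counter, defaultdict

def train_hmm(tagged_sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    transitions = defaultdict(Counter)  # previous tag -> counts of next tag
    emissions = defaultdict(Counter)    # tag -> counts of emitted word
    vocab = set()
    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sentence:
            word = word.lower()
            transitions[prev][tag] += 1
            emissions[tag][word] += 1
            vocab.add(word)
            prev = tag
    return transitions, emissions, vocab

def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability (an assumed smoothing choice)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))

def viterbi(words, transitions, emissions, vocab):
    """Return the most probable tag sequence for `words`."""
    if not words:
        return []
    tags = sorted(emissions)
    n_words = len(vocab) + 1  # one extra slot for any unseen word
    words = [w.lower() for w in words]
    # best[i][t] = (log score of best path ending in tag t at position i, backpointer)
    best = [{t: (log_prob(transitions["<s>"], t, len(tags))
                 + log_prob(emissions[t], words[0], n_words), None)
             for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            score, prev = max(
                (best[i - 1][p][0] + log_prob(transitions[p], t, len(tags)), p)
                for p in tags
            )
            column[t] = (score + log_prob(emissions[t], words[i], n_words), prev)
        best.append(column)
    # Follow backpointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]

# Hypothetical toy corpus: Indonesian sentences ending in a noun or a foreign word.
toy_corpus = [
    [("mahasiswa", "NN"), ("mengikuti", "VB"), ("workshop", "FW")],
    [("dosen", "NN"), ("membuka", "VB"), ("webinar", "FW")],
    [("mahasiswa", "NN"), ("membuka", "VB"), ("acara", "NN")],
    [("dosen", "NN"), ("mengikuti", "VB"), ("acara", "NN")],
]
model = train_hmm(toy_corpus)
print(viterbi(["dosen", "mengikuti", "workshop"], *model))  # ['NN', 'VB', 'FW']
```

In this sketch, adding sentences that contain FW-tagged tokens raises both the emission counts for the FW tag and the transition counts into it, which is the mechanism by which an enlarged corpus can help the tagger on mixed-language text.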