ABSTRAKSI: On the internet, information keeps growing in volume and variety, so it becomes poorly managed and organized. This makes it hard to access, and a good deal of important information goes to waste. One example is information in the form of Indonesian-language news articles. Through text mining with a categorization method, a collection of articles can be organized by grouping them according to the topic of their content, so that users can easily find the information they want.
RIPPER is a categorization method based on context, where context is the presence or absence of a word together with other words. This final project implements RIPPER on a collection of Indonesian-language news articles drawn from several web news sources and stored offline. To analyze the performance of RIPPER, the author compares it with Naive Bayes, a classification method that does not use context, on clean and noisy data.
Based on the overall test results, RIPPER performs worse than Naive Bayes and is slower at building its model, on both clean and noisy data. This indicates that, with the amount of data used, a classification method that takes context into account is less suitable for classifying documents than one that does not. In addition, because it examines the context within each document's content, a context-sensitive classification method takes longer to build its rules than a context-insensitive one.
Keywords: text mining, categorization method, context-sensitive, context-insensitive, RIPPER, Naive Bayes
ABSTRACT: On the internet, the amount and variety of information increase continuously. As a result, information becomes unclassified and unorganized, so important information is overlooked and users have difficulty accessing what they want. An obvious example is information in the form of Indonesian-language news articles. Through text mining with a categorization method, a collection of articles can be organized by classifying each article according to the topic of its content, so that users can easily access the information they want.
RIPPER is a categorization method based on context, where context is the presence or absence of one word in relation to another. This final project implements RIPPER on a collection of Indonesian-language news articles taken from several web news sources and stored offline. To analyze the performance of RIPPER, the author compared it with Naïve Bayes, a classification method that does not use context, on both clean and noisy data.
Overall, the test results show that RIPPER performs worse than Naïve Bayes and is slower to build its model, on both clean and noisy data. This suggests that, for the amount of data used, a context-based classification method is less effective at classifying documents than one that ignores context. Moreover, because a context-sensitive method examines the context within each document's content, it takes longer to build its rules than a context-insensitive method.
Keywords: text mining, categorization method, context-sensitive, context-insensitive, RIPPER, Naïve Bayes
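To make the notion of context concrete, the following minimal Python sketch contrasts the two kinds of classifier. It is an illustration under stated assumptions, not the implementation used in this work: the `Rule` class, the function names, and the example rules and words are all hypothetical, and the rules are written by hand rather than learned by RIPPER. A RIPPER-style rule is modeled as a conjunction of word presence and absence tests applied in order, while the Naive Bayes score simply sums independent per-word contributions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical RIPPER-style rule: a conjunction of word tests."""
    label: str
    present: frozenset = frozenset()  # words that must all occur
    absent: frozenset = frozenset()   # words that must not occur

    def matches(self, words):
        # The rule fires only if every required word is in the document
        # and none of the forbidden words is.
        return self.present <= words and not (self.absent & words)


def classify_with_rules(text, rules, default):
    """Return the label of the first matching rule, else the default class."""
    words = set(text.lower().split())
    for rule in rules:
        if rule.matches(words):
            return rule.label
    return default


def naive_bayes_log_score(text, log_prior, word_log_prob, unseen_log_prob):
    """Context-insensitive score: each word contributes independently."""
    return log_prior + sum(
        word_log_prob.get(word, unseen_log_prob)
        for word in text.lower().split()
    )


# Hand-written example rules (assumptions, not learned from data).
rules = [
    # "bank" signals economy unless "darah" (blood) is also present.
    Rule("economy", present=frozenset({"bank"}), absent=frozenset({"darah"})),
    Rule("sports", present=frozenset({"gol", "liga"})),
]

print(classify_with_rules("Bank sentral menaikkan suku bunga", rules, "other"))
print(classify_with_rules("Stok bank darah menipis", rules, "other"))
```

In the second call the word "bank" is present but the rule does not fire, because "darah" also occurs. A Naive Bayes score cannot express this kind of dependence directly, since the contribution of "bank" is the same whatever other words appear alongside it.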