PENANGANAN IMBALANCED DATASET PADA KLASIFIKASI KOMENTAR TWITTER TERHADAP PROGRAM KAMPUS MERDEKA

Magnolia, Cindy (2023) PENANGANAN IMBALANCED DATASET PADA KLASIFIKASI KOMENTAR TWITTER TERHADAP PROGRAM KAMPUS MERDEKA. Other thesis, Universitas Amikom Purwokerto.

Abstract

An imbalanced dataset is a common problem in classification. Class imbalance lowers the accuracy of model predictions, as happened in the classification of comments on the Kampus Merdeka program. This research focuses on handling the imbalanced dataset to improve the performance of classifying comments from Twitter. The balancing methods used are Near Miss, SMOTE, ADASYN, and Random Combination Sampling. Performance was evaluated using the Support Vector Machine (SVM) algorithm with training-to-testing data splits of 70:30, 80:20, and 90:10. In the tests carried out, the best results were obtained with the 90:10 split. This is plausible because the model has more data to learn from during training. The accuracy, F1-Score, and ROC-AUC results show similar patterns. The highest result was obtained with the ADASYN method, with an F1-Score of 0.9, while the lowest was obtained with the Near Miss method, with an F1-Score of 0.68. It can be concluded that the dataset balancing methods were implemented according to procedure and produced a satisfactory increase in model performance.
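The abstract does not define Random Combination Sampling. The sketch below assumes it means randomly undersampling the majority class and randomly oversampling the minority class until the classes are the same size. The function name `random_combination_sample`, the choice of the mean class size as the target, and the toy tweets are all hypothetical and are not taken from the thesis. In practice, a library such as imbalanced-learn offers ready-made Near Miss, SMOTE, and ADASYN samplers, and resampling would normally be applied to the training split only.

```python
import random
from collections import Counter


def random_combination_sample(samples, labels, seed=0):
    """Balance classes by combining random under- and oversampling.

    Assumption: every class is resampled to the mean class size.
    Larger classes are undersampled without replacement; smaller
    classes keep all items and gain random duplicates.
    """
    rng = random.Random(seed)

    # Group samples by class label.
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    target = round(len(samples) / len(by_class))

    out_x, out_y = [], []
    for y, items in by_class.items():
        if len(items) >= target:
            # Majority side: drop random items down to the target size.
            chosen = rng.sample(items, target)
        else:
            # Minority side: duplicate random items up to the target size.
            extra = [rng.choice(items) for _ in range(target - len(items))]
            chosen = items + extra
        out_x.extend(chosen)
        out_y.extend([y] * target)
    return out_x, out_y


# Made-up toy data with a 90:10 class imbalance (not from the thesis).
tweets = [f"positive tweet {i}" for i in range(90)]
tweets += [f"negative tweet {i}" for i in range(10)]
labels = ["positive"] * 90 + ["negative"] * 10

bal_x, bal_y = random_combination_sample(tweets, labels)
counts = Counter(bal_y)
print(counts)
```

With this toy input, both classes end up with 50 samples. Near Miss, SMOTE, and ADASYN differ from this sketch in how they pick or synthesize samples, so they are not reproduced here.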
Item Type: Thesis (Other)
Additional Information: Supervisors: Bagus Adhi Kusuma, S.T., M.Eng. and Ade Nurhopipah, S.Si., M.Cs.
Uncontrolled Keywords: Imbalanced Dataset, Kampus Merdeka, Support Vector Machine, Twitter, Text Classification
Subjects: T Technology > T Technology (General)
Divisions: Fakultas Ilmu Komputer > Informatika
Depositing User: UPT Perpustakaan Pusat Universitas Amikom Purwokerto
Date Deposited: 24 Jun 2023 03:00
Last Modified: 24 Jun 2023 03:00
URI: https://eprints.amikompurwokerto.ac.id/id/eprint/1618
