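Magnolia, Cindy (2023) PENANGANAN IMBALANCED DATASET PADA KLASIFIKASI KOMENTAR TWITTER TERHADAP PROGRAM KAMPUS MERDEKA [Handling Imbalanced Datasets in the Classification of Twitter Comments on the Kampus Merdeka Program]. Other thesis, Universitas Amikom Purwokerto.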
Text
COVER.pdf
Download (256kB)
Text
DAFTAR ISI.pdf
Download (164kB)
Text
ABSTRAK.pdf
Download (33kB)
Image
BAB I.pdf
Restricted to Registered users only
Download (163kB)
Image
BAB II.pdf
Restricted to Registered users only
Download (733kB)
Image
BAB III.pdf
Restricted to Registered users only
Download (189kB)
Image
BAB IV.pdf
Restricted to Registered users only
Download (496kB)
Image
BAB V.pdf
Restricted to Registered users only
Download (34kB)
Image
DAFTAR PUSTAKA.pdf
Restricted to Registered users only
Download (408kB)
Image
LAMPIRAN.pdf
Restricted to Registered users only
Download (1MB)
Abstract
Imbalanced datasets are a common problem in classification. Class imbalance reduces the accuracy of model predictions, as happened in the classification of comments on the Kampus Merdeka program. This research focuses on handling the imbalanced dataset to improve the performance of a classifier for comments collected from Twitter. The balancing methods used are Near Miss, SMOTE, ADASYN, and Random Combination Sampling. Performance was evaluated using the Support Vector Machine (SVM) algorithm with training-to-testing splits of 70:30, 80:20, and 90:10. In the tests carried out, the best results were obtained with the 90:10 split. This is expected, since a larger training share gives the model more data to learn from. Accuracy, F1-score, and ROC-AUC gave similar results. The highest score was obtained with the ADASYN method, with an F1-score of 0.9, while the lowest was obtained with the Near Miss method, with an F1-score of 0.68. It can be concluded that the dataset balancing methods were implemented according to procedure and produced a satisfactory improvement in model performance.
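As a rough illustration of the oversampling idea behind SMOTE and of the F1-score reported in the abstract, the sketch below uses only the Python standard library. It is not the thesis's implementation: the function names `smote_like_oversample` and `f1_score`, the toy 2-D points, and the choice of `k=3` are all assumptions made for this example, and the full record does not state which library or parameters the author used.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (simplified SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of the chosen sample, excluding itself.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data invented for illustration: 8 majority points vs 3 minority points.
majority = [(float(i), float(i % 3)) for i in range(8)]
minority = [(10.0, 10.0), (11.0, 10.5), (10.5, 11.5)]

# Generate enough synthetic minority points to match the majority count.
new_points = smote_like_oversample(minority, len(majority) - len(minority))
balanced_minority = minority + new_points
```

In this sketch every synthetic point lies on a line segment between two real minority points, which is the core of SMOTE. ADASYN follows the same interpolation idea but generates more points near minority samples that are harder to classify, whereas Near Miss instead removes majority samples.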
Item Type: | Thesis (Other) |
---|---|
Additional Information: | Supervisors: Bagus Adhi Kusuma, S.T., M.Eng. and Ade Nurhopipah, S.Si., M.Cs. |
Uncontrolled Keywords: | Imbalanced Dataset, Kampus Merdeka, Support Vector Machine, Twitter, Text Classification |
Subjects: | T Technology > T Technology (General) |
Divisions: | Fakultas Ilmu Komputer > Informatika |
Depositing User: | UPT Perpustakaan Pusat Universitas Amikom Purwokerto |
Date Deposited: | 24 Jun 2023 03:00 |
Last Modified: | 24 Jun 2023 03:00 |
URI: | https://eprints.amikompurwokerto.ac.id/id/eprint/1618 |