BALINESE AUTOMATIC TEXT SUMMARIZATION USING GENETIC ALGORITHM
Abstract
A summary conveys the main ideas of a text, but producing one manually requires reading the entire content. In this study, text summarization is performed automatically by applying a genetic algorithm to optimize the weights of five sentence features: positive keywords, negative keywords, the similarity between a sentence and the title, the similarity between sentences, and cosine similarity. The document collection in this study consists of Balinese text stories. The summarization technique is extractive: it eliminates unnecessary sentences without changing the structure of the original sentences that remain. The score of a sentence is generated by multiplying each feature value of the sentence by the weight of that feature. The summary is then produced by ranking the sentences by score. At the training stage, the best weight combination is chosen based on the average fitness value. The proposed method is evaluated on 50 Balinese text stories used as test data. The test results show that the fitness value of the feature weights is affected by the crossover and mutation rates of the genetic algorithm, and that accuracy is also influenced by the compression parameters used.
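The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: the weighted feature values are summed into a single score, the compression parameter is modeled as a `keep_ratio` (the fraction of sentences kept), fitness is the share of reference-summary sentences recovered, and the selection, one-point crossover, and mutation operators are generic choices. All function and parameter names are hypothetical.

```python
import random


def score_sentences(features, weights):
    """Score each sentence as the weighted sum of its feature values (assumed form)."""
    return [sum(f * w for f, w in zip(row, weights)) for row in features]


def extract_summary(features, weights, keep_ratio=0.5):
    """Keep the top-scoring sentences and return their indices in document order."""
    scores = score_sentences(features, weights)
    n_keep = max(1, int(len(features) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


def average_fitness(weights, docs, keep_ratio=0.5):
    """Assumed fitness: mean share of reference sentences recovered per document."""
    total = 0.0
    for features, reference in docs:
        picked = set(extract_summary(features, weights, keep_ratio))
        total += len(picked & reference) / len(reference)
    return total / len(docs)


def evolve_weights(docs, n_features=5, pop_size=20, generations=30,
                   crossover_rate=0.8, mutation_rate=0.1, keep_ratio=0.5, seed=0):
    """Search for feature weights in [0, 1] with a simple generational GA."""
    rng = random.Random(seed)

    def fit(w):
        return average_fitness(w, docs, keep_ratio)

    population = [[rng.random() for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fit, reverse=True)
        # Elitism: carry the two best individuals over unchanged.
        next_gen = [population[0][:], population[1][:]]
        while len(next_gen) < pop_size:
            # Parents are drawn from the better half of the population.
            p1, p2 = rng.sample(population[:pop_size // 2], 2)
            child = p1[:]
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_features)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            # Mutation: each gene is replaced by a fresh random weight with
            # probability mutation_rate.
            child = [rng.random() if rng.random() < mutation_rate else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=fit)


if __name__ == "__main__":
    # Toy document: 4 sentences x 5 feature values (made-up numbers), with a
    # reference summary consisting of sentences 0 and 2.
    toy_features = [
        [1.0, 0.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0, 0.0],
        [0.5, 0.5, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0, 0.0],
    ]
    toy_docs = [(toy_features, {0, 2})]
    best = evolve_weights(toy_docs, pop_size=6, generations=5)
    print(extract_summary(toy_features, best))
```

In this sketch the crossover and mutation rates control how the weight population changes from one generation to the next, and `keep_ratio` controls how many sentences survive extraction. These are the two kinds of parameters the abstract reports as affecting fitness and accuracy.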