






Published by:
Lembaga Penelitian Pengabdian Masyarakat Universitas Nusa Mandiri
This work is distributed under the Creative Commons Attribution-NonCommercial 4.0 International License.
Neural network-based Information Retrieval (IR), particularly with Transformer models, has gained prominence in information search technology. However, applications of this technology to Indonesian, a low-resource language, remain limited. This study compares the performance of an LSTM model and IndoBERT on IR tasks in Indonesian. The dataset consists of 5,000 query–document pairs collected by scraping three Indonesian news portals: CNN Indonesia, Kompas, and Detik. Evaluation was performed using mean average precision (MAP), mean reciprocal rank (MRR), Precision@5, and Recall@5. The results show that IndoBERT outperforms LSTM on all metrics, with a MAP of 0.82 and an MRR of 0.84, while LSTM reached only a MAP of 0.63 and an MRR of 0.65. These findings confirm that Transformer models such as IndoBERT capture semantic relevance between queries and documents more effectively, even with limited datasets.
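The ranking metrics named in the abstract can be sketched as follows. This is a minimal illustrative implementation of MAP, MRR, Precision@5, and Recall@5 over binary relevance judgments, not the authors' actual evaluation pipeline; the function names and toy data are assumptions for demonstration only.

```python
# Illustrative ranking metrics: each query is represented by its
# ranked list of 0/1 relevance judgments (1 = relevant document).

def average_precision(rels):
    """Average precision for one ranked list of 0/1 labels."""
    hits, score = 0, 0.0
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            score += hits / rank  # precision at each relevant hit
    return score / hits if hits else 0.0

def reciprocal_rank(rels):
    """1/rank of the first relevant result, or 0 if none."""
    for rank, rel in enumerate(rels, start=1):
        if rel:
            return 1.0 / rank
    return 0.0

def precision_at_k(rels, k=5):
    """Fraction of the top-k results that are relevant."""
    return sum(rels[:k]) / k

def recall_at_k(rels, k=5, total_relevant=None):
    """Fraction of all relevant documents found in the top k."""
    total = total_relevant if total_relevant is not None else sum(rels)
    return sum(rels[:k]) / total if total else 0.0

# Toy ranked results for two queries (invented data).
runs = [[1, 0, 1, 0, 0], [0, 1, 0, 0, 1]]
map_score = sum(average_precision(r) for r in runs) / len(runs)
mrr_score = sum(reciprocal_rank(r) for r in runs) / len(runs)
```

Averaging `average_precision` and `reciprocal_rank` over all queries yields the MAP and MRR figures reported in the study's comparison of the two models.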
Copyright (c) 2025 Nendi Sunendar Sunendar, Irwansyah Saputra
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
An author who publishes in the Pilar Nusa Mandiri: Journal of Computing and Information System agrees to the following terms: