






Published by:
Lembaga Penelitian Pengabdian Masyarakat Universitas Nusa Mandiri
This work is distributed under the Creative Commons Attribution-NonCommercial 4.0 International License.
Neural network-based Information Retrieval (IR), particularly with Transformer models, has gained prominence in information search technology. However, applications of this technology to Indonesian, a low-resource language, remain limited. This study compares the performance of an LSTM model and IndoBERT on IR tasks in Indonesian. The dataset consists of 5,000 query–document pairs collected by scraping three Indonesian news portals: CNN Indonesia, Kompas, and Detik. Evaluation was performed using mean average precision (MAP), mean reciprocal rank (MRR), Precision@5, and Recall@5. The results show that IndoBERT outperforms LSTM on all metrics, with a MAP of 0.82 and an MRR of 0.84, while LSTM reached only a MAP of 0.63 and an MRR of 0.65. These findings confirm that Transformer models such as IndoBERT capture semantic relevance between queries and documents more effectively, even with limited datasets.
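The ranking metrics named in the abstract can be sketched as follows. This is a minimal illustrative implementation of MAP, MRR, Precision@5, and Recall@5 over binary relevance judgments, not the authors' actual evaluation pipeline; the function names and toy data are assumptions for demonstration only.

```python
# Illustrative ranking metrics: each query is represented by its
# ranked list of 0/1 relevance judgments (1 = relevant document).

def average_precision(rels):
    """Average precision for one ranked list of 0/1 labels."""
    hits, score = 0, 0.0
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            score += hits / rank  # precision at each relevant hit
    return score / hits if hits else 0.0

def reciprocal_rank(rels):
    """1/rank of the first relevant result, or 0 if none."""
    for rank, rel in enumerate(rels, start=1):
        if rel:
            return 1.0 / rank
    return 0.0

def precision_at_k(rels, k=5):
    """Fraction of the top-k results that are relevant."""
    return sum(rels[:k]) / k

def recall_at_k(rels, k=5, total_relevant=None):
    """Fraction of all relevant documents found in the top k."""
    total = total_relevant if total_relevant is not None else sum(rels)
    return sum(rels[:k]) / total if total else 0.0

# Toy ranked results for two queries (invented data).
runs = [[1, 0, 1, 0, 0], [0, 1, 0, 0, 1]]
map_score = sum(average_precision(r) for r in runs) / len(runs)
mrr_score = sum(reciprocal_rank(r) for r in runs) / len(runs)
```

Averaging `average_precision` and `reciprocal_rank` over all queries yields the MAP and MRR figures reported in the study's comparison of the two models.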
Copyright (c) 2025 Nendi Sunendar Sunendar, Irwansyah Saputra
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
An author who publishes in the Pilar Nusa Mandiri: Journal of Computing and Information System agrees to the following terms: