PERFORMANCE EVALUATION OF RECENT YOLO VERSIONS FOR CLASSROOM STUDENT BEHAVIOR DETECTION
DOI: https://doi.org/10.33480/jitk.v11i4.7773

Keywords: Accuracy–Speed Trade-off, Classroom Behavior Detection, Computer Vision, Smart Classroom, YOLO

Abstract
The increasing adoption of smart classroom systems underscores the need for automated, objective, and real-time monitoring of student behavior to support effective teaching and learning. Computer vision–based object detection, particularly the You Only Look Once (YOLO) family, has shown strong potential for this task. However, existing studies predominantly evaluate YOLO models in isolation or across different frameworks, resulting in biased comparisons. To address this gap, this study presents a controlled intra-family comparative evaluation of four recent YOLO generations (YOLOv8, YOLOv10, YOLOv11, and YOLOv12) across three weight variants (nano, small, and medium), yielding 12 model configurations. All experiments were conducted under a uniform training pipeline and computing environment using an NVIDIA T4 GPU to ensure fair benchmarking. Model performance was assessed using Precision, Recall, F1-Score, mean Average Precision (mAP), inference speed (FPS), and computational complexity. The results reveal a consistent trade-off between detection accuracy and inference speed: YOLOv12m achieves the highest detection accuracy but the lowest FPS due to increased architectural complexity, while YOLOv10n offers the fastest inference at the cost of reduced reliability for subtle behaviors. Within the scope of the evaluated dataset and controlled classroom setting, YOLOv8s and YOLOv11s demonstrate the most balanced accuracy–speed performance, making them suitable candidates for real-time classroom monitoring under similar conditions. This study provides practical insights for researchers and developers by offering an objective benchmark and model-selection guidance tailored to smart classroom applications, while accounting for dataset and environmental constraints.
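The evaluation metrics named above can be illustrated with a minimal sketch. The detection counts and latency below are hypothetical values chosen for illustration, not results from the paper:

```python
# Illustrative computation of the metrics used in the evaluation.
# tp/fp/fn counts and latency are hypothetical, not from the paper.
tp, fp, fn = 90, 10, 20  # true positives, false positives, false negatives

precision = tp / (tp + fp)                          # fraction of detections that are correct
recall = tp / (tp + fn)                             # fraction of true behaviors that are detected
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

# Inference speed: frames per second from per-frame latency.
latency_ms = 12.5                                   # hypothetical per-frame latency
fps = 1000.0 / latency_ms

print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f} FPS={fps:.1f}")
```

mAP extends this idea by averaging precision over recall levels and object classes, which is why it is reported alongside the per-class counts-based metrics.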
Downloads
References
[1] H. Chen and J. Guan, “Teacher–Student Behavior Recognition in Classroom Teaching Based on Improved YOLOv4 and Internet of Things Technology,” Electronics (Switzerland), vol. 11, no. 23, Dec. 2022, doi: 10.3390/electronics11233998.
[2] S. J. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, 4th ed. Noida, India: Pearson India Education Services Pvt. Ltd., 2022.
[3] R. Szeliski, Computer Vision: Algorithms and Applications, 2nd ed., 2021. [Online]. Available: https://szeliski.org/Book
[4] N. Tran, H. Nguyen, H. Luong, M. Nguyen, K. Luong, and H. Tran, “Recognition of student behavior through actions in the classroom,” IAENG Int. J. Comput. Sci., vol. 50, no. 3, pp. 1031–1041, 2023. [Online]. Available: https://www.iaeng.org/IJCS/issues_v50/issue_3/IJCS_50_3_26.pdf.
[5] P. D. Nguyen et al., “A new dataset and systematic evaluation of deep learning models for student activity recognition from classroom videos,” in 2022 International Conference on Multimedia Analysis and Pattern Recognition, MAPR 2022 Proceedings, Institute of Electrical and Electronics Engineers Inc., 2022. doi: 10.1109/MAPR56351.2022.9924673.
[6] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA, USA: MIT Press, 2016.
[7] A. Deshpande and K. Warhade, “SADY: Student Activity Detection Using YOLO-Based Deep Learning Approach,” Int. J. Adv. Sci. Eng. Inf. Technol., vol. 13, no. 4, 2023, doi: 10.18517/ijaseit.13.4.18393.
[8] H. Das, H. K. Hira, M. Uddin, A. K. Roy, and A. Mahmud, “A Hybrid YOLO-Based Approach for Fine-Grained Detection of Classroom Student Behaviors,” in 2024 27th International Conference on Computer and Information Technology, ICCIT 2024 Proceedings, Institute of Electrical and Electronics Engineers Inc., 2024, pp. 2928–2933. doi: 10.1109/ICCIT64611.2024.11022537.
[9] W. Cao, P. Lu, and W. Cao, “Multimodal Gesture Recognition with Spatio-Temporal Features Fusion Based on YOLOv5 and MediaPipe,” Int. J. Pattern Recognit. Artif. Intell., vol. 38, no. 8, Jun. 2024, doi: 10.1142/S0218001424550073.
[10] Q. Jia and J. He, “Student Behavior Recognition in Classroom Based on Deep Learning,” Applied Sciences (Switzerland), vol. 14, no. 17, Sep. 2024, doi: 10.3390/app14177981.
[11] S. Yuan, X. Kong, and S. Zhang, “Research on Enhanced YOLOv8 Gesture Recognition Method for Complex Environments,” in Proceeding of the WRC Symposium on Advanced Robotics and Automation, WRC SARA, Institute of Electrical and Electronics Engineers Inc., 2024, pp. 141–146. doi: 10.1109/WRCSARA64167.2024.10685785.
[12] L. Han, X. Ma, M. Dai, and L. Bai, “A WAD-YOLOv8-based method for classroom student behavior detection,” Sci. Rep., vol. 15, no. 1, Dec. 2025, doi: 10.1038/s41598-025-87661-w.
[13] L. Zhou, X. Liu, X. Guan, and Y. Cheng, “CSSA-YOLO: Cross-Scale Spatiotemporal Attention Network for Fine-Grained Behavior Recognition in Classroom Environments,” Sensors, vol. 25, no. 10, May 2025, doi: 10.3390/s25103132.
[14] J. Širmenis, “Research on Techniques for Automatic Recognition and Tracking of Basketball Shots from Video,” Master’s thesis, Kaunas University of Technology, 2025.
[15] B. Qin, H. Hu, and S. Du, “ACM-YOLOv10: Research on Classroom Learning Behavior Recognition Algorithm Based on Improved YOLOv10,” IEEE Access, vol. 13, pp. 144863–144877, 2025, doi: 10.1109/ACCESS.2025.3599686.
[16] M. Rashid, J. Wang, S. Ahmed, and F. Ahmed, “Survey on DL-Based Object Detection and Pose Estimation for Human-Robot Collaboration Manufacturing,” Jun. 2025. [Online]. Available: https://ssrn.com/abstract=5286783
[17] N. Jegham, C. Y. Koh, M. Abdelatti, and A. Hendawi, “YOLO Evolution: A Comprehensive Benchmark and Architectural Review of YOLOv12, YOLO11, and Their Previous Versions,” Mar. 2025, [Online]. Available: http://arxiv.org/abs/2411.00201
[18] Z. Sun and V. Y. Mariano, “SiT-YOLOv9: An Efficient Algorithm for Learning Behavior Detection in the Home Environment,” Journal of Computational and Cognitive Engineering, vol. 4, no. 2, pp. 173–185, May 2025, doi: 10.47852/bonviewJCCE42023949.
[19] C.-Y. Wang, I.-H. Yeh, and H.-Y. M. Liao, “YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information,” in Computer Vision – ECCV 2024, A. Leonardis, E. Ricci, S. Roth, O. Russakovsky, T. Sattler, and G. Varol, Eds., Lecture Notes in Computer Science, vol. 15089. Cham, Switzerland: Springer, 2025, doi: 10.1007/978-3-031-72751-1_1.
[20] E. Kim, “YOLOv8 Nano vs YOLOv8 Large,” Medium, Aug. 2023. [Online]. Available: https://medium.com/@elvenkim1/yolov8-nano-vs-yolov8-large-4f21324baa38. [Accessed: Oct. 29, 2025].
[21] M. Yaseen, “What is YOLOv9: An In-Depth Exploration of the Internal Features of the Next-Generation Object Detector,” arXiv preprint arXiv:2409.07813, 2024, doi: 10.48550/arXiv.2409.07813.
[22] A. Imran, M. S. Hulikal, and H. A. A. Gardi, “Real-time American Sign Language Detection Using Yolo-v9,” arXiv preprint arXiv:2407.17950, 2024, doi: 10.48550/arXiv.2407.17950.
[23] J. Terven and D. Cordova-Esparza, “A Comprehensive Review of YOLO Architectures in Computer Vision: From YOLOv1 to YOLOv8 and YOLO-NAS,” Feb. 2024, doi: 10.3390/make5040083.
[24] V. H. Le, “Selected hand gesture recognition model based on cross-evaluation of deep learning from large RGB image datasets,” Multimed. Tools Appl., vol. 84, no. 32, pp. 40009–40058, Sep. 2025, doi: 10.1007/s11042-025-20743-z.
License
Copyright (c) 2026 Mahendra Adiastoro, Febry Putra Rochim, Syahroni Hidayat

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

















