VECTOR SPACE MODEL METHOD FOR WEB SCRAPING ON FREELANCE WEBSITES

  • Andi Nurkholis (1*) Universitas Teknokrat Indonesia
  • Yusra Fernando (2) Universitas Teknokrat Indonesia
  • Faris Arkans Ans (3) Universitas Teknokrat Indonesia

  • (*) Corresponding Author
Keywords: data visualization, freelance job, vector space model, web scraping

Abstract

In the era of digitalization, the internet is central to nearly every community activity, including employment. Many platforms now list job vacancies, especially for freelancers, and users usually have to open several websites to find vacancies that suit them. Web scraping offers a solution to this problem. Based on prior research, the BeautifulSoup and Selenium libraries are used to collect the data. To search the data, the vector space model method is used to measure the similarity between the query and each document. In the search evaluation, the average recall is near-perfect at 98%, while the average precision is 56%. Precision is lower because the search uses three parameters, which makes it more likely that an irrelevant document is retrieved when it contains a word from the user's query even though the context does not match. The Streamlit framework in Python is used to display the processing results and to guide users through web scraping, data processing, and data search. This study aims to implement web scraping to retrieve data from the freelance websites Freelance, Project, and Sribulancer. By applying the vector space model method, users can search data from several websites without opening each freelance website one by one. With data visualization delivered as a Streamlit web application, the web scraping results can also be presented in a more useful form that saves the user's time.
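
The search and evaluation steps described in the abstract can be illustrated with a minimal sketch. The abstract does not state the term-weighting scheme or the similarity measure, so this example assumes TF-IDF weights (with the IDF variant log(N/df) + 1) and cosine similarity; the function names, the whitespace tokenizer, and the three sample job titles are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def tfidf_vectors(docs_tokens):
    """Build one sparse TF-IDF vector (a dict) per document, plus the IDF table."""
    n_docs = len(docs_tokens)
    doc_freq = Counter(term for tokens in docs_tokens for term in set(tokens))
    # Assumed IDF variant: log(N / df) + 1, so no term gets a zero weight.
    idf = {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}
    vectors = [
        {term: count * idf[term] for term, count in Counter(tokens).items()}
        for tokens in docs_tokens
    ]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def search(query, docs):
    """Return (score, doc_index) pairs with score > 0, best match first."""
    vectors, idf = tfidf_vectors([tokenize(d) for d in docs])
    # Query terms that never occur in the collection are ignored.
    q_counts = Counter(t for t in tokenize(query) if t in idf)
    q_vec = {t: c * idf[t] for t, c in q_counts.items()}
    scored = [(cosine(q_vec, vec), i) for i, vec in enumerate(vectors)]
    return sorted(((s, i) for s, i in scored if s > 0), reverse=True)


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical scraped job titles; only document 1 is truly relevant.
jobs = [
    "logo design for coffee shop",
    "web design and development for online shop",
    "translate article from english to indonesian",
]
results = search("web design", jobs)
retrieved = [index for _, index in results]
precision, recall = precision_recall(retrieved, relevant=[1])
print(retrieved, precision, recall)
```

In this toy collection the query "web design" also retrieves the logo design job because it shares the word "design", so recall is 1.0 while precision drops to 0.5. This is the same kind of effect the abstract gives as the reason for the gap between recall and precision.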

Published
2023-08-02
How to Cite
Nurkholis, A., Fernando, Y., & Ans, F. (2023). METODE VECTOR SPACE MODEL UNTUK WEB SCRAPING PADA WEBSITE FREELANCE. INTI Nusa Mandiri, 18(1), 52 - 58. https://doi.org/10.33480/inti.v18i1.4266