Database-Specific Keyword Frequency Analysis in Merged Web Log Data: A Preprocessing Method

Authors

  • Wan Hussain Wan Ishak, Universiti Utara Malaysia
  • Nurul Farhana Ismail
  • Fadhilah Mat Yamin
  • Abdullah Husin

Keywords:

Data Preprocessing, Web Log Analysis, Electronic Resources Module, Keyword Extraction, Database Usage Patterns

Abstract

This study examines web log data from the Electronic Resources module of the Perpustakaan Sultanah Bahiyah (PSB) website at Universiti Utara Malaysia (UUM). The module is the gateway through which the UUM academic community reaches a large repository of scholarly information. To handle the size and complexity of the web log data, the research applies a preprocessing method that restructures the raw data, cleans outliers, and identifies user sessions, laying the foundation for the analysis. The study then identifies search keywords embedded in the log file through a systematic process that transforms the data into a structured format. The subsequent extraction of databases and keywords shows that the IEEE and Serial Solution databases feature prominently. The analysis of 19,146 keywords associated with 11 databases offers insights into user behavior, preferences, and the overall effectiveness of the Electronic Resources module. Identifying frequent keywords not only provides analytical insight but can also speed up users' searches, reducing cognitive load and making research more efficient. This research contributes to optimizing user experience and refining digital library services in line with the evolving needs of the academic community.
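
As a rough illustration of the keyword-by-database counting described above, the following Python sketch tallies normalized search keywords per database from log URLs. It is not the authors' implementation: the record layout, the example URLs, and the query parameter names `db` and `q` are all assumptions made for illustration, since the abstract does not describe the actual PSB log format.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse, parse_qs

# Hypothetical log records: (user_id, requested URL). The real PSB log layout
# is not given in the abstract, so these fields and URLs are assumptions.
SAMPLE_LOG = [
    ("u1", "http://example.org/eresources?db=IEEE&q=neural+network"),
    ("u2", "http://example.org/eresources?db=IEEE&q=Neural+Network"),
    ("u1", "http://example.org/eresources?db=SerialSolution&q=data+mining"),
    ("u3", "http://example.org/eresources?db=IEEE&q=data+mining"),
    ("u3", "http://example.org/static/logo.png"),  # no search keyword: skipped
]


def keyword_frequency_by_database(records, db_param="db", kw_param="q"):
    """Count normalized search keywords per database.

    db_param and kw_param are assumed query parameter names.
    """
    counts = defaultdict(Counter)
    for _user, url in records:
        query = parse_qs(urlparse(url).query)
        # Drop requests that carry no database or keyword (basic cleaning).
        if db_param not in query or kw_param not in query:
            continue
        database = query[db_param][0]
        # Normalize case and whitespace so variants count as one keyword.
        keyword = " ".join(query[kw_param][0].lower().split())
        counts[database][keyword] += 1
    return counts


freq = keyword_frequency_by_database(SAMPLE_LOG)
for database, counter in freq.items():
    print(database, counter.most_common(2))
```

Normalizing case and whitespace before counting is a design choice in this sketch; without it, variants such as "Neural Network" and "neural network" would be counted as separate keywords.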

Published

2024-02-29

How to Cite

Wan Ishak, W. H., Nurul Farhana Ismail, Fadhilah Mat Yamin, & Husin, A. (2024). Database-Specific Keyword Frequency Analysis in Merged Web Log Data: A Preprocessing Method. Data Science Insights, 2(1). Retrieved from https://citedness.com/index.php/jdsi/article/view/16