1. Data Collection and Preprocessing in Web Usage Mining: Implementation and Analysis
- Author
- Mohammed Ali Mohammed, Rula A. Hamid, and Reem Razzaq AbdulHussein
- Subjects
- web usage mining, access log file, data collection, data preprocessing, Technology
- Abstract
Data collection and data preprocessing are crucial stages in web usage mining, mainly because of the unstructured, diverse, and noisy nature of log data. During data collection, log file datasets are loaded and merged. Effective and comprehensive data preprocessing plays a vital role in ensuring the efficiency and scalability of algorithms used in the pattern discovery phase of web usage mining. This work aims to address these phases by introducing two innovative approaches. The first approach focuses on determining the device used for accessing the web, distinguishing between computers and mobile devices. The second approach aims to determine user sessions and complete paths by utilizing the referrer URL. The entire preprocessing pipeline has been implemented using the C# programming language, and the source code is available on GitHub at the following link: https://github.com/Mohammed91/Web-Usage-Mining.
- Published
- 2024
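
The two approaches named in the abstract, device detection and referrer-based session and path reconstruction, can be illustrated with a minimal sketch. It is written in Python for brevity and is not the authors' C# implementation. The Combined Log Format input, the user-agent keyword list, the 30-minute timeout, and every name below (`parse_line`, `classify_device`, `build_sessions`, `SAMPLE_LINES`) are assumptions made for this illustration only.

```python
import re
from datetime import datetime, timedelta
from urllib.parse import urlsplit

# Assumed input: Apache/Nginx "Combined Log Format" lines.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical keyword list; a real classifier would need a fuller one.
MOBILE_TOKENS = ("mobile", "android", "iphone", "ipad")
# Assumed inactivity timeout; a common heuristic, not taken from the paper.
SESSION_TIMEOUT = timedelta(minutes=30)


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    return entry


def classify_device(user_agent):
    """Label a user-agent string as 'mobile' or 'computer' by keyword."""
    agent = user_agent.lower()
    return "mobile" if any(token in agent for token in MOBILE_TOKENS) else "computer"


def build_sessions(entries):
    """Group requests into sessions using IP, user agent, and referrer.

    A request joins the most recent session of the same IP and user agent
    when its referrer path was already visited in that session and the
    timeout has not expired; otherwise it starts a new session.
    """
    sessions = []
    for entry in sorted(entries, key=lambda e: e["time"]):
        referrer_path = urlsplit(entry["referrer"]).path
        target = None
        for session in reversed(sessions):  # check the newest sessions first
            last_request = session["requests"][-1]
            if (session["ip"] == entry["ip"]
                    and session["agent"] == entry["agent"]
                    and entry["time"] - last_request["time"] <= SESSION_TIMEOUT
                    and referrer_path in session["pages"]):
                target = session
                break
        if target is None:
            target = {
                "ip": entry["ip"],
                "agent": entry["agent"],
                "device": classify_device(entry["agent"]),
                "requests": [],
                "pages": set(),
            }
            sessions.append(target)
        target["requests"].append(entry)
        target["pages"].add(entry["url"])
    return sessions


# Invented sample data, for illustration only.
SAMPLE_LINES = [
    '10.0.0.1 - - [10/Oct/2024:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 '
    '"-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
    '10.0.0.1 - - [10/Oct/2024:13:56:02 +0000] "GET /products.html HTTP/1.1" 200 512 '
    '"http://example.com/index.html" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
    '10.0.0.1 - - [10/Oct/2024:13:57:10 +0000] "GET /index.html HTTP/1.1" 200 2326 '
    '"-" "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) Mobile/15E148"',
]

if __name__ == "__main__":
    parsed = [e for e in map(parse_line, SAMPLE_LINES) if e is not None]
    for number, session in enumerate(build_sessions(parsed), start=1):
        path = " -> ".join(request["url"] for request in session["requests"])
        print(number, session["ip"], session["device"], path)
```

Matching on the referrer path rather than on the timestamp alone lets a request be attached to the session that actually contains the page it was reached from, which is one way the referrer can help recover a complete navigation path. The sketch ignores query strings and cached pages, both of which a production pipeline would have to handle.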