1. A novel multi-threaded web crawling model
- Author
- Jiang, Weijie.
- Subjects
- Computer Science - Databases
- Abstract
- This paper proposes a novel web crawling model suited to large-scale web data acquisition. The model first partitions the web data to be crawled into several subsets, each assigned to its own thread task. Within each thread task, crawling jobs run concurrently, and the crawled data are placed in a buffer queue to await parsing. The parsing stage is likewise split across several threads. Repeated crawler tests on the resulting model indicate that it performs significantly better than single-threaded approaches.
- Published
- 2024
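
The pipeline described in the abstract (partition the workload, crawl each subset in its own thread, hand pages to parser threads through a buffer queue) can be illustrated with a minimal Python sketch. This is not the author's implementation: `fake_fetch`, `parse`, `crawl`, and the thread counts are hypothetical names and values chosen for illustration, the fetch step is a stand-in that performs no network I/O, and each crawler thread processes its subset sequentially for simplicity.

```python
import queue
import threading


def fake_fetch(url):
    # Hypothetical stand-in for an HTTP request; returns a fake page body.
    return f"<title>{url}</title>"


def parse(page):
    # Hypothetical parser: extract the text between the <title> tags.
    return page[len("<title>"):-len("</title>")]


def crawl(urls, fetch=fake_fetch, n_crawlers=3, n_parsers=2):
    # Partition the URLs into one subset per crawler thread (round-robin).
    subsets = [urls[i::n_crawlers] for i in range(n_crawlers)]
    buffer = queue.Queue()  # crawled pages awaiting parsing
    results = []
    lock = threading.Lock()  # guards the shared results list

    def crawler(subset):
        for url in subset:
            buffer.put(fetch(url))

    def parser():
        while True:
            page = buffer.get()
            if page is None:  # sentinel: no more pages will arrive
                break
            item = parse(page)
            with lock:
                results.append(item)

    crawlers = [threading.Thread(target=crawler, args=(s,)) for s in subsets]
    parsers = [threading.Thread(target=parser) for _ in range(n_parsers)]
    for t in crawlers + parsers:
        t.start()
    for t in crawlers:
        t.join()
    # All pages are queued; send one sentinel per parser thread.
    for _ in parsers:
        buffer.put(None)
    for t in parsers:
        t.join()
    return results
```

Calling `crawl(list_of_urls)` returns the parsed items in no guaranteed order, since parser threads finish independently. Swapping `fake_fetch` for a real HTTP call would be the natural next step, under the assumption that the fetch function is thread-safe.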