1. An Approach for Crawling Dynamic WebPages Based on Script Language Analysis
- Author
- Zhang Yifei, Wang Da-ling, Feng Shi, Zhang Yao, and Leng Fangling
- Subjects
Ajax, Information retrieval, Computer science, Static web page, World Wide Web, Scripting language, Server, Web page, Web application, The Internet, Web crawler
- Abstract
Traditional Web crawlers start from the URLs of one or more initial Web pages, continuously extract new URLs from them, and then access the data of those pages. AJAX, one of the core technologies of Web 2.0, greatly improves the response efficiency of Web applications and provides a good user experience, and has therefore been widely adopted. However, because AJAX techniques break the architecture of traditional Web pages, which is based on static pages, traditional Web crawlers cannot cope with dynamic partial refresh and asynchronous loading. In this paper, we propose an efficient approach for crawling the information in dynamic pages by analyzing the script language, and we use a path repository and judge the page refresh state to improve the accuracy and efficiency of the algorithm. Experimental evaluation shows the efficiency and effectiveness of our approach.
- Published
- 2012