Optimizing content freshness of relations extracted from the web using keyword search
- Source :
- SIGMOD Conference
- Publication Year :
- 2010
- Publisher :
- ACM, 2010.
Abstract
- An increasing number of applications operate on data obtained from the Web. These applications typically maintain local copies of the web data to avoid network latency in data accesses. As the data on the Web evolves, it is critical that the local copy be kept up-to-date. Data freshness is one of the most important data quality issues, and has been extensively studied for various applications including web crawling. However, web crawling is focused on obtaining as many raw web pages as possible. Our applications, on the other hand, are interested in specific content from specific data sources. Knowing the content or the semantics of the data enables us to differentiate data items based on their importance and volatility, which are key factors that impact the design of the data synchronization strategy. In this work, we formulate the concept of content freshness, and present a novel approach that maintains content freshness with the least amount of web communication. Specifically, we assume data is accessible through a general keyword search interface, and we form keyword queries based on their selectivity, as well as their contribution to content freshness of the local copy. Experiments show the effectiveness of our approach compared with several naive methods for keeping data fresh.
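The idea of choosing keyword queries by selectivity and by contribution to freshness can be illustrated with a small sketch. This is not the paper's algorithm, which the abstract does not specify. It is a simple greedy heuristic under stated assumptions: each candidate keyword is assumed to retrieve a known set of local item ids, the size of that set stands in for the communication cost of the query, and each item carries a hypothetical staleness weight (importance times likelihood of having changed). The names `CandidateQuery`, `pick_queries`, `weight`, and `budget` are all illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateQuery:
    """A hypothetical keyword query and the local item ids it is assumed to retrieve."""
    keyword: str
    matching_items: frozenset


def pick_queries(candidates, weight, budget):
    """Greedily pick keywords with the best freshness gain per retrieved result.

    weight: dict mapping item id -> assumed staleness weight (higher = more
            worth refreshing). Items missing from the dict count as 0.
    budget: maximum total number of results we are willing to download.
    """
    covered = set()   # items already refreshed by a chosen query
    chosen = []       # keywords picked so far, in order
    spent = 0         # results downloaded so far
    remaining = list(candidates)

    def gain(q):
        # Freshness gain counts only items not already refreshed.
        return sum(weight.get(i, 0.0) for i in q.matching_items - covered)

    while True:
        # Keep only non-empty queries whose result size still fits the budget.
        affordable = [
            q for q in remaining
            if q.matching_items and spent + len(q.matching_items) <= budget
        ]
        if not affordable:
            break
        # More selective queries (fewer results) are cheaper per unit of gain.
        best = max(affordable, key=lambda q: gain(q) / len(q.matching_items))
        if gain(best) <= 0:
            break
        chosen.append(best.keyword)
        covered |= best.matching_items
        spent += len(best.matching_items)
        remaining.remove(best)
    return chosen


# Illustrative data only; the keywords and weights are made up.
example_candidates = [
    CandidateQuery("sigmod", frozenset({1, 2, 3, 4})),
    CandidateQuery("freshness", frozenset({1, 2})),
    CandidateQuery("crawl", frozenset({5})),
]
example_weight = {1: 0.9, 2: 0.8, 3: 0.1, 4: 0.0, 5: 0.5}
print(pick_queries(example_candidates, example_weight, budget=3))
# prints ['freshness', 'crawl']
```

With a budget of three results, the broad keyword is never affordable, and the two narrow keywords together refresh the items with the highest assumed staleness. A real system would have to estimate both the result sets and the weights rather than know them in advance.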
Details
- Database :
- OpenAIRE
- Journal :
- Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
- Accession number :
- edsair.doi...........dd73d6f097bec8eda1327f0170105a6f
- Full Text :
- https://doi.org/10.1145/1807167.1807256