DETECTING SPATIAL PATTERNS OF NATURAL HAZARDS FROM THE WIKIPEDIA KNOWLEDGE BASE
- Source :
- ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol II-4/W2, Pp 87-93 (2015)
- Publication Year :
- 2015
- Publisher :
- Copernicus Publications, 2015.
Abstract
- The Wikipedia database is a data source of immense richness and variety. It contains thousands of geotagged articles, including, for example, almost real-time updates on current and historic natural hazards. These articles include user-contributed information about the location of natural hazards, the extent of the disasters, and many details relating to response, impact, and recovery. In this research, a computational framework is proposed to detect spatial patterns of natural hazards from the Wikipedia database by combining topic modeling methods with spatial analysis techniques. The computation is performed on the Neon Cluster, a high-performance computing cluster at the University of Iowa. This work uses wildfires as the exemplar hazard, but the framework is easily generalizable to other types of hazards, such as hurricanes or flooding. A Latent Dirichlet Allocation (LDA) model is first trained on the entire English Wikipedia dump, transforming the database dump into a 500-dimensional topic model. Over 230,000 geotagged articles, spatially covering the contiguous United States, are then extracted from the Wikipedia database. The geotagged articles are converted into the LDA topic space based on the topic model, with each article represented as a weighted multidimensional topic vector. By treating each article's topic vector as an observed point in geographic space, a probability surface is calculated for each of the topics. Wikipedia articles about wildfires are also extracted from the Wikipedia database, forming a wildfire corpus that provides the basis for the topic vector analysis. The spatial distribution of wildfire outbreaks in the US is estimated by calculating the weighted sum of the topic probability surfaces using a map algebra approach, and is mapped using GIS. To evaluate the approach, the estimate is compared to wildfire hazard potential maps created by the USDA Forest Service.
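The last steps described in the abstract (one probability surface per topic, then a weighted sum of those surfaces) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the probability surfaces are computed, so the Gaussian kernel, the bandwidth, the toy coordinates, and every function and variable name below are assumptions chosen for illustration only.

```python
import numpy as np

def topic_surface(xy, topic_weight, grid_x, grid_y, bandwidth):
    """Weighted Gaussian kernel surface for one topic, normalized to sum to 1.

    Assumption: a Gaussian kernel stands in for whatever surface
    estimator the paper actually uses.
    """
    gx, gy = np.meshgrid(grid_x, grid_y)  # each of shape (ny, nx)
    # Squared distance from every grid cell to every article location.
    d2 = (gx[..., None] - xy[:, 0]) ** 2 + (gy[..., None] - xy[:, 1]) ** 2
    kernel = np.exp(-d2 / (2.0 * bandwidth ** 2))  # (ny, nx, n_articles)
    # Weight each article's kernel by its share of this topic.
    surface = kernel @ topic_weight
    total = surface.sum()
    return surface / total if total > 0 else surface

def hazard_surface(surfaces, corpus_topic_weights):
    """Map-algebra style weighted sum of per-topic surfaces.

    surfaces: (n_topics, ny, nx); corpus_topic_weights: (n_topics,)
    """
    return np.tensordot(corpus_topic_weights, surfaces, axes=1)

# Toy data (hypothetical): three geotagged articles, two topics.
xy = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])      # article locations
theta = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]])   # article-by-topic weights
grid_x = np.linspace(0.0, 4.0, 5)
grid_y = np.linspace(0.0, 4.0, 5)

surfaces = np.stack([
    topic_surface(xy, theta[:, k], grid_x, grid_y, bandwidth=1.0)
    for k in range(theta.shape[1])
])

# Hypothetical topic weights of a hazard (e.g. wildfire) corpus.
wildfire_weights = np.array([0.7, 0.3])
hazard = hazard_surface(surfaces, wildfire_weights)
```

In this toy setup the hazard corpus loads mostly on the first topic, so the combined surface is higher near the articles dominated by that topic than near the article dominated by the second one. In the workflow the abstract describes, the topic vectors would come from a 500-topic LDA model rather than being typed in by hand.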
- Subjects :
- Hazard (logic)
Volunteered geographic information
Topic model
lcsh:Applied optics. Photonics
Database dump
Information retrieval
Computer science
Map algebra
lcsh:T
lcsh:TA1501-1820
Latent Dirichlet allocation
lcsh:Technology
Knowledge base
lcsh:TA1-2040
Natural hazard
Data mining
lcsh:Engineering (General). Civil engineering (General)
Details
- Language :
- English
- ISSN :
- 2194-9050 and 2194-9042
- Database :
- OpenAIRE
- Journal :
- ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences
- Accession number :
- edsair.doi.dedup.....37b0dbe4d643c05dc36ccbefa045c7fa