1. Combining data from multiple sources for urban travel mode choice modelling
- Author
- Grzenda, Maciej; Luckner, Marcin; Zawieska, Jakub; and Wrona, Przemysław
- Subjects
- Computer Science - Computers and Society, Computer Science - Machine Learning
- Abstract
- Demand for sustainable mobility is particularly high in urban areas. Hence, there is a growing need to predict when people will decide to use different travel modes, with an emphasis on environmentally friendly ones. As travel mode choice (TMC) is influenced by multiple factors, machine learning methods are increasingly used to predict travel mode choices given respondent and journey features. Typically, travel diaries provide the core relevant data. However, other features, such as attributes of mode alternatives including, but not limited to, travel times and, in the case of public transport (PT), walking distances, have a major impact on whether a person decides to use a travel mode of interest. Hence, in this work, we propose an architecture of a software platform that performs data fusion, combining data documenting journeys with features calculated to summarise the transport options available for these journeys, the built environment, and environmental factors, such as weather conditions, that may influence travel mode decisions. Furthermore, we propose various novel features, many of which we show to be among the most important for TMC prediction. We propose how stream processing engines and other Big Data systems can be used to calculate them. The data processed by the platform is used to develop machine learning models predicting travel mode choices. To validate the platform, we propose ablation studies investigating the importance of individual feature subsets calculated by it and their impact on the TMC models built with them. In our experiments, we combine survey data, GPS traces, weather and pollution time series, transport model data, and spatial data of the built environment. The accuracy of TMC models built with the additional features grows by up to 18.2% compared to the use of core survey data only.
- Comment
- 34 pages, 5 figures
- Published
- 2024