1. Are BERT embeddings able to infer travel patterns from Twitter efficiently using a unigram approach?
- Author
- Rosaldo J. F. Rossetti, Tania Fontes, and Francisco Murcos
- Subjects
- Data source, Working hours, Crowdsensing, Information retrieval, business.industry, Computer science, Order (business), Social media, Public opinion, business
- Abstract
- Public opinion is nowadays a valuable data source for many sectors. In this study, we analysed the transportation sector using messages extracted from Twitter. In contrast to the traditional survey methods used in the transportation sector, which are costly and inefficient, social media are a popular source of crowdsensed data. This work used BERT embeddings, an unsupervised pre-trained model released in 2018, to classify travel-related terms using tweets collected from three distinct cities: New York, London, and Melbourne. To understand whether a simple model can perform well, we used unigrams. A list of 24 travel-related words was used to classify the messages. Popular words are train, walk, car, station, street, and avenue. Between 3% and 5% of all messages are classified as traffic-related, while during typical working hours the value is around 5-6%. High model performance was obtained, with precision and accuracy higher than 0.80 and 0.90, respectively. The results are consistent across all three cities assessed.
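The unigram step described in the abstract can be illustrated with a minimal sketch. It only shows how messages might be matched against a list of single travel-related words and how the share of matching messages could be computed. It does not reproduce the BERT embedding step or the authors' pipeline. The word list contains only the six words named in the abstract (the full 24-word list is not given here), and the function names and sample messages are hypothetical.

```python
import re

# Assumption: only the six words named in the abstract; the study's
# full list of 24 travel-related words is not reproduced here.
TRAVEL_WORDS = {"train", "walk", "car", "station", "street", "avenue"}


def unigrams(text):
    """Lowercase a message and split it into single-word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def matched_travel_words(text):
    """Return the travel-related unigrams found in a message, sorted."""
    return sorted(set(unigrams(text)) & TRAVEL_WORDS)


def travel_share(messages):
    """Fraction of messages containing at least one travel-related unigram."""
    if not messages:
        return 0.0
    hits = sum(1 for m in messages if matched_travel_words(m))
    return hits / len(messages)


# Made-up example messages, not data from the study.
sample = [
    "Stuck at the station, the train is late again",
    "Great coffee this morning",
    "Nice walk down the avenue",
    "Watching a movie tonight",
]

print(matched_travel_words(sample[0]))
print(travel_share(sample))
```

A plain word match like this cannot tell whether "train" refers to transport or to exercise. That ambiguity is presumably where contextual embeddings such as BERT help, though the details are not given in this record.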
- Published
- 2021