51. Fuzzy Matching for Symptom Detection in Tweets: Application to Covid-19 During the First Wave of the Pandemic in France
- Author
Stéphane Schück, Anita Burgun, P. Foulquié, Xiaoyi Chen, Carole Faviez, Sophie Quennelle, Sandrine Katsahian, Nathalie Texier, Adel Mebarki
Affiliations: Health data- and model- driven Knowledge Acquisition (HeKA); Inria de Paris; Institut National de Recherche en Informatique et en Automatique (Inria); Centre de Recherche des Cordeliers (CRC (UMR_S_1138 / U1138)); École Pratique des Hautes Études (EPHE); Université Paris sciences et lettres (PSL); Institut National de la Santé et de la Recherche Médicale (INSERM); Sorbonne Université (SU); Université Paris Cité (UPCité); Université de Paris (UP); Kap Code; Hôpital Européen Georges Pompidou [APHP] (HEGP); Assistance publique - Hôpitaux de Paris (AP-HP); Hôpitaux Universitaires Paris Ouest - Hôpitaux Universitaires Île de France Ouest (HUPO)
- Subjects
2019-20 coronavirus outbreak, Coronavirus disease 2019 (COVID-19), Computer science, Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), Fuzzy matching, [INFO] Computer Science [cs], Approximate string matching, Social media, [SDV.SPEE] Life Sciences [q-bio]/Santé publique et épidémiologie, Pandemic, Symptoms, Artificial intelligence, Covid-19, Natural language processing, Content analysis
- Abstract
Exhaustive automatic detection of symptoms in social media posts is made difficult by colloquial expressions, misspellings and inflected word forms. Detecting self-reported symptoms is of major importance for emerging diseases such as Covid-19. In this study, we aimed to (1) develop an algorithm based on fuzzy matching to detect symptoms in tweets, (2) establish a comprehensive list of Covid-19-related symptoms and (3) evaluate fuzzy matching for Covid-19-related symptom detection in French tweets. The Covid-19-related symptom list was built by aggregating several data sources. French Covid-19-related tweets were automatically extracted using a dedicated data broker during the first wave of the pandemic in France. The fuzzy matching parameters were fine-tuned using all symptoms from MedDRA and then evaluated on a subset of 5000 Covid-19-related tweets in French for the detection of symptoms from our Covid-19-related list. Compared with the baseline detection, fuzzy matching added 42% more correct matches, with a precision of 81%.
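The abstract does not specify the similarity measure or the tuned parameters, so the following is only a minimal sketch of what fuzzy symptom matching could look like. It assumes Python's standard `difflib.SequenceMatcher` as a stand-in similarity score, a hypothetical threshold of 0.85, and a hypothetical three-term symptom list; none of these come from the study itself.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and strip diacritics so 'Fièvre' and 'fievre' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fuzzy_symptoms(tweet, symptoms, threshold=0.85):
    """Return (symptom, matched_text, score) for the best window per symptom.

    The threshold is a hypothetical value, not the one tuned in the study.
    Tokenization is whitespace-only; punctuation handling is left out.
    """
    tokens = normalize(tweet).split()
    hits = []
    for symptom in symptoms:
        target = normalize(symptom)
        n = len(target.split())  # compare against windows of the same word count
        best = None
        for i in range(len(tokens) - n + 1):
            window = " ".join(tokens[i:i + n])
            score = SequenceMatcher(None, window, target).ratio()
            if score >= threshold and (best is None or score > best[2]):
                best = (symptom, window, score)
        if best is not None:
            hits.append(best)
    return hits


# Hypothetical example: a tweet with a missing accent and a misspelling.
example_hits = fuzzy_symptoms(
    "J'ai de la fievre et une tou seche depuis hier",
    ["fièvre", "toux sèche", "perte de goût"],
)
for symptom, matched, score in example_hits:
    print(symptom, "<-", matched, round(score, 2))
```

In this sketch an exact-match lookup would miss both "fievre" (missing accent) and "tou seche" (misspelling), which is the kind of gap fuzzy matching is meant to close. Lowering the threshold would recover more variants at the cost of more false positives, which is the trade-off the reported precision figure reflects.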
- Published
- 2021