1. SuperMat: construction of a linked annotated dataset from superconductors-related publications
- Author
Pedro Baptista de Castro, Yoshihiko Takano, Kensei Terashima, Luca Foppiano, Miren Esparza Echevarria, Sae Dieb, Masashi Ishii, Akira Suzuki, Yan Meng, Azusa Uzuki, Suguru Iwasaki, Laurent Romary
Affiliations: National Institute for Materials Science (NIMS); Automatic Language Modelling and ANAlysis & Computational Humanities (ALMAnaCH); Inria de Paris; Institut National de Recherche en Informatique et en Automatique (Inria)
- Subjects
Exploit, Computer science, media_common.quotation_subject, text and data mining, Materials informatics, FOS: Physical sciences, data structure, Ontology (information science), superconductors, 01 natural sciences, materials informatics, Domain (software engineering), Superconductivity (cond-mat.supr-con), Databases, 03 medical and health sciences, Annotation, 0103 physical sciences, dataset, Quality (business), ontology, 010306 general physics, 030304 developmental biology, media_common, 0303 health sciences, Information retrieval, Condensed Matter - Superconductivity, General Medicine, Linked data, Data structure, [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing, [PHYS.COND.CM-S]Physics [physics]/Condensed Matter [cond-mat]/Superconductivity [cond-mat.supr-con], tdm, machine learning, annotation, annotation guidelines
- Abstract
A growing number of papers are published in the area of superconducting materials science. However, novel text and data mining (TDM) processes are still needed to efficiently access and exploit this accumulated knowledge, paving the way towards data-driven materials design. Herein, we present SuperMat (Superconductor Materials), an annotated corpus of linked data derived from scientific publications on superconductors, which comprises 142 articles, 16052 entities, and 1398 links that are characterised into six categories: the names, classes, and properties of materials; links to their respective superconducting critical temperature (Tc); and parametric conditions such as applied pressure or measurement methods. The construction of SuperMat resulted from a fruitful collaboration between computer scientists and material scientists, and its high quality is ensured through validation by domain experts. The quality of the annotation guidelines was ensured by satisfactory Inter Annotator Agreement (IAA) between the annotators and the domain experts. SuperMat includes the dataset, annotation guidelines, and annotation support tools that use automatic suggestions to help minimise human errors.
Luca Foppiano, Sae Dieb, Akira Suzuki, Pedro Baptista de Castro, Suguru Iwasaki, et al. SuperMat: Construction of a linked annotated dataset from superconductors-related publications. 2021. ⟨hal-03101177v3⟩
- Published
- 2021