ChemSchematicResolver: A Toolkit to Decode 2D Chemical Diagrams with Labels and R-Groups into Annotated Chemical Named Entities.

Authors :
Beard EJ
Cole JM
Source :
Journal of chemical information and modeling [J Chem Inf Model] 2020 Apr 27; Vol. 60 (4), pp. 2059-2072. Date of Electronic Publication: 2020 Apr 07.
Publication Year :
2020

Abstract

The number of journal articles in the scientific domain has grown to the point where it has become impossible for researchers to capitalize on all findings in their relevant discipline. Information is stored in these articles in a number of ways, including figures that describe important results. In organic chemistry, these figures often present chemical schematic diagrams that graphically define the structures of carbon-based compounds. These diagrams are intuitive for an expert to comprehend, but they are not designed for machines. This work presents ChemSchematicResolver, a software tool that can be used to identify chemical schematic diagrams within the figure of a document, resolve any R-group substituents within them, and convert the resulting diagrams to a machine-readable format in a high-throughput, autonomous fashion. The tool includes a new algorithm that is used to identify relevant diagrams and a mechanism that combines these data with contextual information from the rest of the document for the creation of highly relational databases. It supports a variety of general R-group structures, which is the first time such support has been available in any open-source chemical schematic diagram extraction tool. It is presented alongside a self-generated evaluation set, on which precision, the most important assessment metric, reached 83-100% in all assessed areas. The ChemSchematicResolver tool is released under the MIT license and is available to download from www.chemschematicresolver.org.
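
To make the idea of resolving R-group substituents concrete, the following is a minimal sketch in plain Python. It is not the ChemSchematicResolver algorithm or API: the function name `resolve_r_groups`, the `[R1]` placeholder syntax, and the example labels and substituents are all assumptions chosen for illustration. It only shows the kind of output such a step might produce, namely one machine-readable structure string (SMILES) per compound label.

```python
import re


def resolve_r_groups(template, substituents):
    """Replace each [Rn] placeholder in a SMILES-like template string.

    Hypothetical helper for illustration only; it performs plain text
    substitution and does not validate the resulting chemistry.
    """
    def substitute(match):
        name = match.group(1)
        if name not in substituents:
            raise KeyError(f"no substituent given for {name}")
        return substituents[name]

    return re.sub(r"\[(R\d*)\]", substitute, template)


# Assumed example data: a benzene ring with one variable position, and a
# label table of the kind that might accompany a schematic diagram.
template = "c1ccccc1[R1]"
label_table = {
    "1a": {"R1": "C"},   # methyl substituent
    "1b": {"R1": "OC"},  # methoxy substituent
}

# One resolved structure string per compound label.
records = {
    label: resolve_r_groups(template, groups)
    for label, groups in label_table.items()
}
print(records)
```

Plain string substitution only works here because the placeholder sits at the end of the template; a placeholder in the middle of a chain would generally need the substituent wrapped in parentheses, and a real tool would operate on the extracted molecular graph rather than on text.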

Details

Language :
English
ISSN :
1549-960X
Volume :
60
Issue :
4
Database :
MEDLINE
Journal :
Journal of chemical information and modeling
Publication Type :
Academic Journal
Accession number :
32212690
Full Text :
https://doi.org/10.1021/acs.jcim.0c00042