Descriptor collision and confusion: Toward the design of descriptors to mask chemical structures
- Source :
- Journal of Computer-Aided Molecular Design. 19:625-635
- Publication Year :
- 2005
- Publisher :
- Springer Science and Business Media LLC, 2005.
Abstract
- We examined "descriptor collision" for several chemical fingerprint systems (MDL 320, Daylight, SMDL) and for a 2D-based descriptor set. For large databases (ChemNavigator and WOMBAT), the smallest collision rate remained around 5%. We then systematically increased the descriptor collision rate (here termed "descriptor confusion") in order to design a set of "descriptors to mask chemical structures" (DMCS). If effective, a DMCS system would not allow third parties to reverse engineer the original chemical structures used to derive the DMCS set. Using SMDL keys, the confusion rate was increased to 45.6% by eliminating those keys that have a low frequency of occurrence in WOMBAT structures. We applied an automated PLS engine, WB-PLS [Olah et al., J. Comput. Aided Mol. Des., 18 (2004) 437], to 1277 series of structures from 948 targets in WOMBAT in order to validate the biological relevance of the SMDL descriptors as a potential DMCS set. The "reduced set" of SMDL descriptors showed a small loss of modeling power (around 20%) compared to the initial descriptor set, while the collision rate increased significantly. These results indicate that the development of an effective DMCS is possible. If well documented, DMCS systems would encourage private sector data release (e.g., related to water solubility) and directly benefit public sector science.
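The two operations the abstract describes, measuring a collision rate and raising it by discarding rarely set keys, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a "collision" means two or more structures sharing an identical key set, and that a key is "low frequency" when it is set in less than a chosen fraction of structures. The function names `collision_rate` and `drop_rare_keys`, the `min_fraction` threshold, and the toy fingerprints are all hypothetical.

```python
from collections import Counter


def collision_rate(fingerprints):
    """Fraction of structures whose fingerprint is identical to that of
    at least one other structure (assumed definition of 'collision')."""
    if not fingerprints:
        return 0.0
    counts = Counter(fingerprints)
    collided = sum(n for n in counts.values() if n > 1)
    return collided / len(fingerprints)


def drop_rare_keys(fingerprints, min_fraction):
    """Remove keys that are set in fewer than min_fraction of structures
    (hypothetical stand-in for eliminating low-frequency keys)."""
    n = len(fingerprints)
    key_counts = Counter(k for fp in fingerprints for k in fp)
    kept = {k for k, c in key_counts.items() if c / n >= min_fraction}
    return [frozenset(fp & kept) for fp in fingerprints]


# Toy data: each structure is represented by the set of key indices it sets.
toy = [
    frozenset({1, 2, 3}),
    frozenset({1, 2, 4}),
    frozenset({1, 2}),
    frozenset({1, 5}),
]

before = collision_rate(toy)                    # all four fingerprints differ
reduced = drop_rare_keys(toy, min_fraction=0.5)  # keys 3, 4, 5 are dropped
after = collision_rate(reduced)                 # three structures now coincide

print(before, after)
```

In this toy case, removing the rare keys makes three of the four structures indistinguishable, which is the intended "confusion" effect; the abstract's reported cost is a loss of modeling power in the reduced descriptor set.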
- Subjects :
- Models, Molecular
Reverse engineering
Quantitative structure–activity relationship
Databases, Factual
Computer science
Chemistry, Pharmaceutical
Quantitative Structure-Activity Relationship
Set (abstract data type)
Drug Discovery
medicine
Computer Simulation
Relevance (information retrieval)
Physical and Theoretical Chemistry
Confusion
Molecular Structure
Series (mathematics)
Fingerprint (computing)
Pattern recognition
Collision
Computer Science Applications
Models, Chemical
Artificial intelligence
Data mining
business
computer
Details
- ISSN :
- 1573-4951 and 0920-654X
- Volume :
- 19
- Database :
- OpenAIRE
- Journal :
- Journal of Computer-Aided Molecular Design
- Accession number :
- edsair.doi.dedup.....756407b5a97e46569061fed7b0233de7
- Full Text :
- https://doi.org/10.1007/s10822-005-9020-4