Extension of Similarity Functions and their Application to Chemical Informatics Problems
- Publication Year :
- 2018
Abstract
- Similarity is the most pervasive concept in chemoinformatics, and it provides direction for many of the problems that arise in that field. Similarity functions are mathematical tools for quantifying the similarity of one molecule with respect to another molecule. In this work, we developed a method for quantifying the similarity of one molecule with respect to a set of molecules. This method requires a similarity function that is symmetric and positive definite. If the similarity function meets two additional mild requirements, namely that it is bounded between zero and unity and equals unity when evaluated on two identical molecules, then we say that the similarity function is extendable. In this case, the similarity of a molecule with respect to a set containing one molecule reduces to the original similarity function evaluated on those two molecules. We additionally stated and proved several properties of the extension of similarity functions. We then applied the extension of similarity functions to two problems in chemoinformatics. First, we used the extension of similarity functions as the basis for machine learning models for the prediction of various molecular endpoints. These machine learning models were compared to the kNN machine learning model. For each endpoint predicted, the model based on the extension of similarity functions was shown to be either comparable to or better than the kNN model. Second, we used the extension of similarity functions as the basis for defining the domain of applicability of a machine learning model. We applied this definition to a kNN model and showed that the extension of similarity functions can be used to order predictions for the rational selection of molecules for further testing.
We showed how doing so can increase the overall usefulness of a machine learning model. Finally, we stated several mathematical questions related to the extension of similarity functions which, if answered, could aid in the training of machine learning models based on the extension of similarity functions.
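The four conditions the abstract places on an extendable similarity function (symmetric, positive definite, bounded between zero and unity, and unity on identical molecules) can be checked numerically on a sample of molecules. The sketch below is illustrative only and is not taken from the dissertation: it assumes molecules are represented as binary fingerprints and uses the Tanimoto coefficient as an example similarity function, and the helper names (`tanimoto`, `gram_matrix`, `check_extendable`) are hypothetical. It does not reproduce the extension itself, whose formula the abstract does not state. Passing the eigenvalue check on a sample is evidence of positive definiteness, not a proof.

```python
import numpy as np

def tanimoto(a, b):
    """Tanimoto similarity between two binary fingerprints (0/1 sequences)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Assumed convention: two empty fingerprints count as identical.
        return 1.0
    return np.logical_and(a, b).sum() / union

def gram_matrix(sim, items):
    """Matrix of pairwise similarities sim(items[i], items[j])."""
    n = len(items)
    g = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            g[i, j] = sim(items[i], items[j])
    return g

def check_extendable(sim, items, tol=1e-9):
    """Check the four conditions from the abstract on a finite sample."""
    g = gram_matrix(sim, items)
    # Eigenvalues of the symmetrized matrix; all must be positive.
    eigenvalues = np.linalg.eigvalsh((g + g.T) / 2)
    return {
        "symmetric": bool(np.allclose(g, g.T, atol=tol)),
        "positive_definite_on_sample": bool(eigenvalues.min() > tol),
        "bounded_0_1": bool((g >= -tol).all() and (g <= 1 + tol).all()),
        "unit_on_identical": bool(np.allclose(np.diag(g), 1.0, atol=tol)),
    }

# Made-up example fingerprints, not data from the dissertation.
fingerprints = [
    [1, 1, 0, 0, 1, 0],
    [1, 0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1],
]
report = check_extendable(tanimoto, fingerprints)
print(report)
```

A similarity function that fails any of these checks on a representative sample would not qualify as extendable in the sense described above.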
Details
- Language :
- English
- Database :
- OpenDissertations
- Publication Type :
- Dissertation/Thesis
- Accession number :
- ddu.oai.etd.ohiolink.edu.osu1542299336598615