Author: "Sushko, I." / Journal: Journal of Chemical Information and Modeling - Searchworks@Jio Institute Digital Library Search Results

Your search for "Sushko, I." returned 7 results in total.

1. Applicability domains for classification problems: Benchmarking of distance to models for Ames mutagenicity set

Author: Iurii Sushko, Sergii Novotarskyi, Robert Körner, Anil Kumar Pandey, Artem Cherkasov, Jiazhong Li, Paola Gramatica, Katja Hansen, Timon Schroeter, Klaus-Robert Müller, Lili Xi, Huanxiang Liu, Xiaojun Yao, Tomas Öberg, Farhad Hormozdiari, Phuong Dao, Cenk Sahinalp, Roberto Todeschini, Pavel G. Polishchuk, A. Artemenko, Victor E. Kuz'min, Todd M. Martin, Douglas M. Young, Denis Fourches, Eugene N. Muratov, Alexander Tropsha, Igor I. Baskin, Dragos Horvath, Gilles Marcou, Christophe Muller, A. Varnek, Volodymyr V. Prokopenko, and Igor V. Tetko; Chimie de la matière complexe (CMC), Université de Strasbourg (UNISTRA)-Institut de Chimie du CNRS (INC)-Centre National de la Recherche Scientifique (CNRS)
Subjects: Quantitative structure–activity relationship, General Chemical Engineering, Library and Information Sciences, 01 natural sciences, Standard deviation, Set (abstract data type), 03 medical and health sciences, CHIM/01 - CHIMICA ANALITICA, Similarity (network science), 030304 developmental biology, Mathematics, 0303 health sciences, Principal Component Analysis, QSAR, Mutagenicity Tests, mutagenicity, General Chemistry, Classification, 0104 chemical sciences, Computer Science Applications, Ames test, Data set, 010404 medicinal & biomolecular chemistry, Benchmarking, Test set, Metric (mathematics), Data mining, Algorithm, [CHIM.CHEM]Chemical Sciences/Cheminformatics, Applicability domain
Abstract: The estimation of accuracy and applicability of QSAR and QSPR models for biological and physicochemical properties represents a critical problem. The developed parameter of "distance to model" (DM) is defined as a metric of similarity between the training and test set compounds that have been subjected to QSAR/QSPR modeling. In our previous work, we demonstrated the utility and optimal performance of DM metrics that have been based on the standard deviation within an ensemble of QSAR models. The current study applies such analysis to 30 QSAR models for the Ames mutagenicity data set that were previously reported within the 2009 QSAR challenge. We demonstrate that the DMs based on an ensemble (consensus) model provide systematically better performance than other DMs. The presented approach identifies 30–60% of compounds having an accuracy of prediction similar to the interlaboratory accuracy of the Ames test, which is estimated to be 90%. Thus, the in silico predictions can be used to halve the cost of experimental measurements by providing a similar prediction accuracy. The developed model has been made publicly available at http://ochem.eu/models/1.
Published: 2010
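The ensemble-based distance to model described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' OCHEM implementation: it assumes each model in the ensemble outputs a probability that the compound is Ames-positive, takes the population standard deviation of those outputs as the DM, uses the ensemble average for the consensus call, and keeps the chosen fraction of compounds with the smallest DM as the "reliable" subset. The function names (`distance_to_model`, `consensus_class`, `most_reliable`) and the example values are hypothetical.

```python
from statistics import mean, pstdev


def distance_to_model(ensemble_predictions: list[float]) -> float:
    """DM of one compound: spread (population std) of the ensemble's outputs."""
    return pstdev(ensemble_predictions)


def consensus_class(ensemble_predictions: list[float], threshold: float = 0.5) -> int:
    """Consensus call from the ensemble average: 1 = Ames-positive, 0 = negative."""
    return int(mean(ensemble_predictions) >= threshold)


def most_reliable(compounds: dict[str, list[float]], fraction: float) -> list[str]:
    """Ids of the `fraction` of compounds with the smallest DM, most reliable first."""
    ranked = sorted(compounds, key=lambda cid: distance_to_model(compounds[cid]))
    return ranked[: round(len(ranked) * fraction)]


# Hypothetical ensemble outputs (probability of being Ames-positive) per compound.
ensemble = {
    "cpd_a": [0.90, 0.95, 0.88, 0.92],  # models agree: small DM
    "cpd_b": [0.10, 0.85, 0.40, 0.70],  # models disagree: large DM
    "cpd_c": [0.05, 0.08, 0.02, 0.06],  # models agree: small DM
}
for cid in most_reliable(ensemble, fraction=0.6):
    print(cid, consensus_class(ensemble[cid]))
```

Compounds on which the ensemble members disagree (here `cpd_b`) are ranked last and fall outside the reliable subset, which is the intuition behind using ensemble spread as a DM.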

2. Critical assessment of QSAR models of environmental toxicity against Tetrahymena pyriformis: focusing on applicability domain and overfitting by variable selection

Author: Igor V. Tetko, Iurii Sushko, Anil Kumar Pandey, Hao Zhu, Alexander Tropsha, Ester Papa, Tomas Öberg, Roberto Todeschini, Denis Fourches, and Alexandre Varnek; Chimie de la matière complexe (CMC), Université de Strasbourg (UNISTRA)-Institut de Chimie du CNRS (INC)-Centre National de la Recherche Scientifique (CNRS)
Subjects: Quantitative structure–activity relationship, Databases, Factual, QSAR, validation, variable selection, General Chemical Engineering, Normal Distribution, Feature selection, Library and Information Sciences, Overfitting, 01 natural sciences, Models, Biological, Standard deviation, 03 medical and health sciences, CHIM/01 - CHIMICA ANALITICA, Predictive Value of Tests, Statistics, Toxicity Tests, Leverage (statistics), Animals, Computer Simulation, 030304 developmental biology, Statistical hypothesis testing, Mathematics, 0303 health sciences, Models, Statistical, Tetrahymena pyriformis, Reproducibility of Results, General Chemistry, 0104 chemical sciences, Computer Science Applications, 010404 medicinal & biomolecular chemistry, Test set, Environmental Pollutants, Biological system, [CHIM.CHEM]Chemical Sciences/Cheminformatics, Applicability domain
Abstract: The estimation of the accuracy of predictions is a critical problem in QSAR modeling. The "distance to model" can be defined as a metric that defines the similarity between the training set molecules and the test set compound for the given property in the context of a specific model. It could be expressed in many different ways, e.g., using Tanimoto coefficient, leverage, correlation in space of models, etc. In this paper we have used mixtures of Gaussian distributions as well as statistical tests to evaluate six types of distances to models with respect to their ability to discriminate compounds with small and large prediction errors. The analysis was performed for twelve QSAR models of aqueous toxicity against T. pyriformis obtained with different machine-learning methods and various types of descriptors. The distances to model based on standard deviation of predicted toxicity calculated from the ensemble of models afforded the best results. This distance also successfully discriminated molecules with small and large prediction errors for a mechanism-based model developed using log P and the Maximum Acceptor Superdelocalizability descriptors. Thus, the distance to model metric could also be used to augment mechanistic QSAR models by estimating their prediction errors. Moreover, the accuracy of prediction is mainly determined by the training set data distribution in the chemistry and activity spaces but not by QSAR approaches used to develop the models. We have shown that incorrect validation of a model may result in the wrong estimation of its performance and suggested how this problem could be circumvented. The toxicity of 3182 and 48774 molecules from the EPA High Production Volume (HPV) Challenge Program and EINECS (European chemical Substances Information System), respectively, was predicted, and the accuracy of prediction was estimated. The developed models are available online at the http://www.qspr.org site. © 2008 American Chemical Society.
Published: 2008
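The abstract lists the Tanimoto coefficient as one way to express a distance to model. One common formulation, assumed here rather than taken from the paper, is one minus the highest Tanimoto similarity between the query compound and any training-set compound, with binary fingerprints stored as sets of on-bit indices. The function names and fingerprints below are hypothetical; note that the paper found the ensemble standard deviation, not this measure, to perform best.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient of two binary fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0  # two empty fingerprints are treated as identical
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def tanimoto_dm(query: set[int], training_set: list[set[int]]) -> float:
    """Distance to model: 1 minus the similarity to the nearest training molecule."""
    return 1.0 - max(tanimoto(query, fp) for fp in training_set)


# Hypothetical fingerprints (indices of set bits) for three training molecules.
training = [{1, 2, 3, 4}, {2, 3, 5}, {7, 8}]
print(tanimoto_dm({1, 2, 3}, training))  # close to the first training molecule
print(tanimoto_dm({9, 10}, training))    # shares no bits with any training molecule
```

A DM near 0 means the query resembles some training compound; a DM of 1 means it shares no fingerprint bits with any of them and its prediction should be treated with more caution.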

3. Development of dimethyl sulfoxide solubility models using 163,000 molecules: using a domain applicability metric to select more reliable predictions

Author: Tetko IV, Novotarskyi S, Sushko I, Ivanov V, Petrenko AE, Dieden R, Lebon F, and Mathieu B
Subjects: Linear Models, Neural Networks, Computer, Reproducibility of Results, Solubility, Support Vector Machine, Artificial Intelligence, Databases, Pharmaceutical, Dimethyl Sulfoxide chemistry, Informatics methods
Abstract: The dimethyl sulfoxide (DMSO) solubility data from Enamine and two UCB Pharma compound collections were analyzed using 8 different machine learning methods and 12 descriptor sets. The analyzed data sets were highly imbalanced, with only 1.7–5.8% nonsoluble compounds. The prediction performance of the methods was compared by the enrichment of the libraries in soluble molecules within the 10% most reliable predictions. The highest accuracies were achieved with a C4.5 classification decision tree, random forest, and associative neural networks. The performance of the methods was estimated on the individual data sets and on their combinations. On average, the developed models provided a 2-fold decrease in the number of nonsoluble compounds among all compounds predicted as soluble in DMSO; when only the 10% most reliable predictions were considered, a 4–9-fold enrichment was observed. The structural features that determine whether compounds are soluble or nonsoluble in DMSO were also identified. The best models developed with the publicly available Enamine data set are freely available online at http://ochem.eu/article/33409.
Published: 2013
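The reliability-based selection behind the reported 2-fold and 4–9-fold figures can be illustrated with a small sketch. It assumes each prediction carries a reliability score (higher means more trustworthy, for instance a smaller distance to model) and defines the fold decrease as the library-wide fraction of nonsoluble compounds divided by the fraction of nonsoluble compounds among those predicted soluble within the most reliable predictions. This ratio is an assumption about how such an enrichment can be computed; the article's exact definition may differ, and `Prediction` and `fold_decrease` are hypothetical names.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    reliability: float       # higher = more reliable (e.g. smaller distance to model)
    predicted_soluble: bool  # model output
    soluble: bool            # experimental label


def nonsoluble_rate(preds: list[Prediction]) -> float:
    """Fraction of compounds in `preds` that are experimentally nonsoluble."""
    return sum(not p.soluble for p in preds) / len(preds)


def fold_decrease(preds: list[Prediction], top_fraction: float = 0.10) -> float:
    """Library nonsoluble rate divided by the nonsoluble rate among compounds
    predicted soluble within the `top_fraction` most reliable predictions."""
    base = nonsoluble_rate(preds)  # rate obtained by picking compounds at random
    ranked = sorted(preds, key=lambda p: p.reliability, reverse=True)
    top = ranked[: max(1, round(len(ranked) * top_fraction))]
    selected = [p for p in top if p.predicted_soluble]
    if not selected:
        return float("nan")  # nothing in the top set was predicted soluble
    rate = nonsoluble_rate(selected)
    return base / rate if rate > 0 else float("inf")


# Hypothetical library of 20 compounds; 4 are nonsoluble (indices 3, 12, 15, 18).
library = [Prediction(1.0 - i / 20, True, i not in {3, 12, 15, 18}) for i in range(20)]
print(fold_decrease(library, top_fraction=0.5))
```

Restricting attention to the most reliable predictions trades coverage for purity, which is the same trade-off the abstract reports when moving from all predictions to the top 10%.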

4. ToxAlerts: a Web server of structural alerts for toxic chemicals and compounds with potential adverse reactions

Author: Sushko I, Salmina E, Potemkin VA, Poda G, and Tetko IV
Subjects: Drug Evaluation, Preclinical, Humans, Quantitative Structure-Activity Relationship, Databases, Chemical, Drug-Related Side Effects and Adverse Reactions, Internet, Pharmaceutical Preparations chemistry
Abstract: The article presents a Web-based platform for collecting and storing toxicological structural alerts from the literature and for virtual screening of chemical libraries to flag potentially toxic chemicals and compounds that can cause adverse side effects. An alert is uniquely identified by a SMARTS template, a toxicological endpoint, and the publication in which the alert was described. Additionally, the system can store complementary information such as the alert's name, comments, and mechanism of action, as well as other data. Most importantly, the platform can easily be used for fast virtual screening of large chemical data sets, focused libraries, or newly designed compounds against the toxicological alerts, providing a detailed profile of the chemicals grouped by structural alerts and endpoints. Such a profile can support decisions on whether a compound should be tested experimentally, validated with available QSAR models, or eliminated from consideration altogether. Alert-based screening can also make more complex QSAR models easier to interpret. The system is publicly accessible and tightly integrated with the Online Chemical Modeling Environment (OCHEM, http://ochem.eu). It is open and expandable: any registered OCHEM user can introduce new alerts, browse and edit alerts introduced by other users, and virtually screen their data sets against all or selected alerts. Data sets that have been screened against the alerts can then be used in OCHEM for other typical tasks, such as exporting in a wide variety of formats, developing QSAR models, or additional filtering by other criteria. The database already contains almost 600 structural alerts for endpoints such as mutagenicity, carcinogenicity, and skin sensitization, as well as for compounds that undergo metabolic activation and compounds that form reactive metabolites and can thus cause adverse reactions. The ToxAlerts platform is accessible on the Web at http://ochem.eu/alerts, and it is constantly growing.
Published: 2012
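The alert record and the endpoint-grouped screening profile described above can be modeled roughly as follows. In this sketch an `Alert` is identified by its SMARTS template, endpoint, and publication, while the name is complementary data excluded from equality. Python's standard library has no SMARTS engine, so the matcher is passed in as a function; `toy_matcher` is a stand-in substring test used only to make the example run, and a real screen would use a cheminformatics toolkit (for example, RDKit's `Chem.MolFromSmarts` together with `Mol.HasSubstructMatch`). The class, the functions, and the publication labels are hypothetical and do not reflect the ToxAlerts or OCHEM code.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Alert:
    smarts: str       # SMARTS template
    endpoint: str     # toxicological endpoint, e.g. "mutagenicity"
    publication: str  # source in which the alert was described
    name: str = field(default="", compare=False)  # complementary info, not part of identity


def screen(smiles_list: list[str], alerts: list[Alert],
           matches: Callable[[str, str], bool]) -> dict[str, dict[str, list[str]]]:
    """Profile each compound as {endpoint: [names of the alerts that fired]}."""
    profile = {}
    for smiles in smiles_list:
        hits = defaultdict(list)
        for alert in alerts:
            if matches(smiles, alert.smarts):
                hits[alert.endpoint].append(alert.name or alert.smarts)
        profile[smiles] = dict(hits)
    return profile


def toy_matcher(smiles: str, smarts: str) -> bool:
    # Placeholder only: a plain substring test, NOT SMARTS substructure matching.
    return smarts in smiles


alerts = [
    Alert("[N+](=O)[O-]", "mutagenicity", "hypothetical ref A", "nitro group"),
    Alert("N=N", "carcinogenicity", "hypothetical ref B", "azo group"),
]
print(screen(["c1ccccc1[N+](=O)[O-]", "CCO"], alerts, toy_matcher))
```

Keeping `name` out of the equality check mirrors the abstract's statement that an alert is uniquely identified by its template, endpoint, and publication, so two records that differ only in name compare equal.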

5. A comparison of different QSAR approaches to modeling CYP450 1A2 inhibition

Author: Novotarskyi S, Sushko I, Körner R, Pandey AK, and Tetko IV
Subjects: Enzyme Inhibitors chemistry, Humans, Molecular Conformation, Artificial Intelligence, Cytochrome P-450 CYP1A2 Inhibitors, Enzyme Inhibitors pharmacology, Quantitative Structure-Activity Relationship
Abstract: Predicting the CYP450 inhibition activity of small molecules is an important task because of the high risk of drug-drug interactions. CYP1A2 is an important member of the CYP450 superfamily and accounts for 15% of the total CYP450 content of the human liver. This article compares 80 in silico QSAR models that were created by following the same procedure with different combinations of descriptors and machine learning methods. The training and test sets consist of 3745 and 3741 inhibitors and noninhibitors, respectively, from the PubChem BioAssay database. A heterogeneous external test set of 160 inhibitors was collected from the literature. The descriptor sets studied include E-state, Dragon, and ISIDA SMF descriptors. The machine learning methods include Associative Neural Networks (ASNN), k-Nearest Neighbors (kNN), Random Tree (RT), C4.5 Tree (J48), and Support Vector Machines (SVM). The influence of descriptor selection on model accuracy was studied, and the benefits of the "bagging" modeling approach were shown. The applicability domain approach was successfully applied, and ways of increasing model accuracy by using applicability domain measures were demonstrated. Fragment-based model interpretation was also performed. The most accurate models in this study correctly classified 83% and 68% of the instances in the internal and external test sets, respectively. The applicability domain approach increased the prediction accuracy to 90% for 78% of the internal test set and for 17% of the external test set. The most accurate models are available online at http://ochem.eu/models/Q5747.
Published: 2011
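The abstract credits both bagging and applicability-domain (AD) filtering with improving accuracy. The sketch below combines the two ideas under stated assumptions: a bagged ensemble of 1-nearest-neighbor classifiers (kNN is one of the listed methods, but 1-NN with Euclidean distance is a choice made here), the share of ensemble votes for the winning class as the AD measure, and a report of accuracy and coverage for compounds above an agreement threshold. The function names, the 25-model ensemble size, and the toy descriptors are hypothetical; this is not the procedure used in the article.

```python
import math
import random
from collections import Counter

Sample = tuple[tuple[float, ...], str]  # (descriptor vector, class label)


def nearest_label(x: tuple[float, ...], train: list[Sample]) -> str:
    """1-NN: label of the training compound closest to x (Euclidean distance)."""
    return min(train, key=lambda item: math.dist(x, item[0]))[1]


def bagged_predict(x: tuple[float, ...], train: list[Sample],
                   n_models: int = 25, seed: int = 0) -> tuple[str, float]:
    """Majority vote of 1-NN models, each fit on a bootstrap resample of `train`.

    The fixed seed makes every call draw the same bootstrap sets, so all
    queries are scored by the same bagged ensemble. Returns (label, agreement),
    where agreement is the vote share of the winning label.
    """
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_models):
        bootstrap = [rng.choice(train) for _ in train]
        votes[nearest_label(x, bootstrap)] += 1
    label, count = votes.most_common(1)[0]
    return label, count / n_models


def accuracy_and_coverage(results: list[tuple[float, bool]],
                          min_agreement: float) -> tuple[float, float]:
    """results: (agreement, correct) pairs. Score only compounds inside the AD."""
    kept = [correct for agreement, correct in results if agreement >= min_agreement]
    coverage = len(kept) / len(results)
    accuracy = sum(kept) / len(kept) if kept else float("nan")
    return accuracy, coverage


# Hypothetical 2-D descriptors for four training compounds.
train = [((0.0, 0.0), "non"), ((0.1, 0.2), "non"), ((1.0, 1.0), "inh"), ((0.9, 1.1), "inh")]
print(bagged_predict((0.05, 0.05), train))
print(accuracy_and_coverage([(0.9, True), (0.8, False), (0.4, True), (1.0, True)], 0.8))
```

Raising `min_agreement` typically raises accuracy on the retained compounds while lowering coverage, the same trade-off behind the reported 90% accuracy on 78% of the internal test set.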

6. Applicability domains for classification problems: Benchmarking of distance to models for Ames mutagenicity set

Author: Sushko I, Novotarskyi S, Körner R, Pandey AK, Cherkasov A, Li J, Gramatica P, Hansen K, Schroeter T, Müller KR, Xi L, Liu H, Yao X, Öberg T, Hormozdiari F, Dao P, Sahinalp C, Todeschini R, Polishchuk P, Artemenko A, Kuz'min V, Martin TM, Young DM, Fourches D, Muratov E, Tropsha A, Baskin I, Horvath D, Marcou G, Muller C, Varnek A, Prokopenko VV, and Tetko IV
Subjects: Mutagenicity Tests standards, Principal Component Analysis, Benchmarking methods, Classification methods, Mutagenicity Tests methods, Quantitative Structure-Activity Relationship
Abstract: Identical to result 1.
Published: 2010

7. Critical assessment of QSAR models of environmental toxicity against Tetrahymena pyriformis: focusing on applicability domain and overfitting by variable selection

Author: Tetko IV, Sushko I, Pandey AK, Zhu H, Tropsha A, Papa E, Oberg T, Todeschini R, Fourches D, and Varnek A
Subjects: Animals, Computer Simulation, Databases, Factual, Models, Statistical, Normal Distribution, Predictive Value of Tests, Reproducibility of Results, Environmental Pollutants chemistry, Environmental Pollutants toxicity, Models, Biological, Quantitative Structure-Activity Relationship, Tetrahymena pyriformis drug effects, Toxicity Tests standards
Abstract: Identical to result 2.
Published: 2008