QSPR studies on normal boiling points and molar refractivities of organic compounds by correlation-ranking-based PCR and PC-ANN analyses of new topological indices
- Author
Ghavami, Raouf, Najafi, Amir, and Hemmateenejad, Bahram
- Subjects
Transition temperature -- Research, Principal components analysis -- Research, Eigenvalues -- Research, Chemistry, Research
- Abstract
The new topological indices (Sh indices), based on the distance sum and connectivity of a molecular graph and previously developed by our team, were extended to predict two physicochemical properties, normal boiling point (NBP) and molar refractivity (MR), for a large set of organic compounds comprising alkanes, alkenes, ethers, amines, alcohols, alkylbenzenes, and alkyl halides. The molecular descriptors were derived directly from the two-dimensional molecular structures of the compounds using graph theory. Linear and nonlinear models were built using principal component regression (PCR) and a principal component-artificial neural network (PC-ANN) trained with the back-propagation algorithm, respectively. Eigenvalue-ranking and correlation-ranking procedures were used to rank the principal components (PCs) and enter them into the models. Principal component analysis of the Sh data matrix showed that six and seven PCs explained 97.49% and 99.22%, respectively, of the variance in the Sh indices. PCR analysis of the NBP and MR data showed that the Sh indices could explain about 97.52% and 99.52% of the variation, while the PC-ANN models explained more than 99.00% and 99.82%, respectively. The predictive ability of the models was evaluated on an external test set, giving root-mean-square errors (RMSE) for NBP and MR lower than 9.69 K and 0.660 cm³ mol⁻¹ for the linear model and 6.17 K and 0.416 cm³ mol⁻¹ for the nonlinear model.
Key words: topological Sh indices, eigenvalue ranking, correlation ranking, normal boiling point, molar refractivity, principal component analysis, principal component regression (PCR), principal component-artificial neural network (PC-ANN).
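The abstract contrasts two ways of choosing which principal components enter a PCR model: eigenvalue ranking (largest-variance PCs first) and correlation ranking (PCs most correlated with the property first). The Python/NumPy sketch that follows illustrates that difference on synthetic data only; it is not the authors' code and uses none of their data. Several details are assumptions: the function names (`autoscale`, `cumulative_explained_variance`, `fit_pcr`, `predict_pcr`, `rmse`) are hypothetical, autoscaling of the descriptors is assumed, correlation ranking is taken to mean ordering PCs by the absolute Pearson correlation of their scores with the property, and the nonlinear PC-ANN step is not shown.

```python
import numpy as np


def autoscale(X, mu=None, sd=None):
    """Center and scale descriptor columns (assumed preprocessing, not confirmed by the abstract)."""
    if mu is None:
        mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    return (X - mu) / sd, mu, sd


def cumulative_explained_variance(X):
    """Fraction of total descriptor variance captured by the first k PCs, k = 1..p."""
    Z, _, _ = autoscale(X)
    s = np.linalg.svd(Z, compute_uv=False)  # singular values, in descending order
    eig = s**2                              # proportional to the PC eigenvalues
    return np.cumsum(eig) / eig.sum()


def fit_pcr(X, y, n_pcs, ranking="correlation"):
    """Principal component regression with PCs entered by eigenvalue or correlation rank."""
    Z, mu, sd = autoscale(X)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    T = Z @ Vt.T                            # PC scores of the training molecules
    if ranking == "eigenvalue":
        order = np.arange(T.shape[1])       # SVD already sorts PCs by decreasing eigenvalue
    elif ranking == "correlation":
        r = [abs(np.corrcoef(T[:, k], y)[0, 1]) for k in range(T.shape[1])]
        order = np.argsort(-np.nan_to_num(r))  # PCs most correlated with y come first
    else:
        raise ValueError(f"unknown ranking: {ranking}")
    keep = order[:n_pcs]
    A = np.column_stack([np.ones(len(y)), T[:, keep]])  # intercept + selected scores
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return {"mu": mu, "sd": sd, "V": Vt[keep].T, "coef": coef}


def predict_pcr(model, X):
    """Project new molecules onto the retained PCs and apply the fitted regression."""
    Z, _, _ = autoscale(X, model["mu"], model["sd"])
    return model["coef"][0] + (Z @ model["V"]) @ model["coef"][1:]


def rmse(y_true, y_pred):
    """Root-mean-square error, the statistic quoted for the external test set."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Synthetic toy data (NOT Sh indices): four correlated descriptors driven by f1,
# one descriptor driven by f2, and a response that depends only on f2.
rng = np.random.default_rng(0)
n = 200
f1, f2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([f1 + 0.1 * rng.normal(size=n) for _ in range(4)] + [f2])
y = 3.0 * f2 + 0.1 * rng.normal(size=n)

train, test = slice(0, 150), slice(150, n)  # last 50 rows act as an external test set
results = {}
for ranking in ("eigenvalue", "correlation"):
    model = fit_pcr(X[train], y[train], n_pcs=1, ranking=ranking)
    results[ranking] = rmse(y[test], predict_pcr(model, X[test]))
print(results)
print(cumulative_explained_variance(X))
```

On this toy data the PC that carries the response explains much less descriptor variance than the first PC, so with one PC allowed, eigenvalue ranking picks the wrong component and correlation ranking gives a far lower test-set RMSE. `cumulative_explained_variance` returns the kind of quantity behind statements such as "six PCs explained 97.49% of the variance".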
The new topological indices (Sh indices), based on the distance sum and connectivity of a molecular graph and previously developed by our team, were extended to predict two physicochemical properties, normal boiling point (NBP) and molar refractivity (MR), of a large set of organic compounds made up of alkanes, alkenes, amines, alkylbenzenes, and alkyl halides. The sets of molecular descriptors were derived directly from the two-dimensional molecular structure of the compounds using graph theory. Linear and nonlinear models were developed using principal component regression (PCR) and a principal component-artificial neural network (PC-ANN) with back-propagation learning, respectively. Eigenvalue-ranking and correlation-ranking methods were used to rank the principal components, which were then entered into the models. Principal component analysis of the Sh data matrix showed that six and seven principal components (PCs), respectively, can explain 97.49% and 99.22% of the variance in the Sh indices. PCR analysis of the NBP and MR data showed that the Sh indices can explain about 97.52% and 99.52% of the variation, whereas the PC-ANN models explain 99.00% and 99.82% of the variation, respectively. The predictive ability of the models for the NBP and MR of the molecules was evaluated using an external set; the respective root-mean-square errors were lower than 9.69 K and 0.660 cm³ mol⁻¹ for the linear model and 6.17 K and 0.416 cm³ mol⁻¹ for the nonlinear model.
Keywords: topological Sh indices, eigenvalue ranking, correlation ranking, normal boiling point (NBP), molar refractivity (MR), principal component analysis, principal component regression (PCR), principal component-artificial neural network (PC-ANN). [Translated by the Editors]
Introduction
One of the most fundamental ideas of chemistry is that the physicochemical properties of substances used in chemistry and chemical engineering, for instance normal boiling point and molar refractivity, are determined [...]
- Published
2009