24 results on '"Chemical compounds"'
Search Results
2. Degree-Based Topological Invariants of Metal-Organic Networks
- Author
-
Gang Hong, Zhen Gu, Muhammad Javaid, Hafiz Muhammad Awais, and Muhammad Kamran Siddiqui
- Subjects
Topological indices, chemical compounds, metals-organic networks, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Metal-organic networks (MONs) are a family of chemical compounds consisting of clusters or metal ions and organic ligands. These are studied as one-, two- or three-dimensional structures of porous materials and subclasses of coordination polymers. MONs are mostly used in catalysis for the separation and purification of gases and as conducting solids or super-capacitors. In some situations, these networks are found to be stable in the process of removal or solvent of the guest molecules and could be restored with some other chemical compounds. The physical stability and mechanical properties of these networks have become a topic of great interest due to the aforesaid characteristics. Topological indices (TIs) are numeric quantities that are used to forecast the natural relationships among the physico-chemical characteristics of the chemical compounds in their fundamental network. In studies of MONs, TIs play an essential role in theoretical and environmental chemistry and pharmacology. In this paper, we compute various recently developed degree-based TIs for two different metal-organic networks with an increasing number of layers consisting of both metal and organic ligand vertices. A comparison among the different versions of the computed TIs with the help of their numerical values and graphs is also included.
- Published
- 2020
- Full Text
- View/download PDF
3. Tree++: Truncated Tree Based Graph Kernels.
- Author
-
Ye, Wei, Wang, Zhen, Redberg, Rachel, and Singh, Ambuj
- Subjects
*TREE graphs, *NATURAL language processing
- Abstract
Graph-structured data arise ubiquitously in many application domains. A fundamental problem is to quantify their similarities. Graph kernels are often used for this purpose, which decompose graphs into substructures and compare these substructures. However, most of the existing graph kernels do not have the property of scale-adaptivity, i.e., they cannot compare graphs at multiple levels of granularities. Many real-world graphs such as molecules exhibit structure at varying levels of granularities. To tackle this problem, we propose a new graph kernel called Tree++ in this paper. At the heart of Tree++ is a graph kernel called the path-pattern graph kernel. The path-pattern graph kernel first builds a truncated BFS tree rooted at each vertex and then uses paths from the root to every vertex in the truncated BFS tree as features to represent graphs. The path-pattern graph kernel can only capture graph similarity at fine granularities. In order to capture graph similarity at coarse granularities, we incorporate a new concept called super path into it. The super path contains truncated BFS trees rooted at the vertices in a path. Our evaluation on a variety of real-world graphs demonstrates that Tree++ achieves the best classification accuracy compared with previous graph kernels. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
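Note on result 3: the path-pattern idea in the abstract (a truncated BFS tree rooted at each vertex, with root-to-vertex paths used as features) can be illustrated with a minimal Python sketch. The function names, the adjacency-dict input, the use of vertex labels along each path, and the dot-product comparison are all assumptions made for illustration; this is not the authors' implementation and it omits the super-path extension.

```python
from collections import Counter, deque


def truncated_bfs_paths(adj, labels, root, depth):
    """Label sequences of root-to-vertex paths in a BFS tree cut off at `depth`."""
    parent = {root: None}
    level = {root: 0}
    queue = deque([root])
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        if level[v] == depth:
            continue  # truncate: do not expand vertices at the depth limit
        for w in sorted(adj[v]):  # sorted only to make traversal deterministic
            if w not in parent:
                parent[w] = v
                level[w] = level[v] + 1
                queue.append(w)
    paths = []
    for v in order:
        path = []
        while v is not None:  # walk back up to the root
            path.append(labels[v])
            v = parent[v]
        paths.append(tuple(reversed(path)))
    return paths


def path_features(adj, labels, depth):
    """Count label paths over the truncated BFS trees of all vertices."""
    feats = Counter()
    for root in adj:
        feats.update(truncated_bfs_paths(adj, labels, root, depth))
    return feats


def path_kernel(f1, f2):
    """Assumed comparison: dot product of the two path-count vectors."""
    return sum(count * f2[path] for path, count in f1.items())
```

For a three-vertex chain labelled C, O, C with `depth=1`, `path_features` counts the paths `('C',)`, `('C', 'O')`, `('O',)` and `('O', 'C')`.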
4. Using Research Literature to Generate Datasets of Implicit Feedback for Recommending Scientific Items
- Author
-
Marcia Barros, Andre Moitinho, and Francisco M. Couto
- Subjects
Recommender systems, collaborative filtering, scientific literature, dataset, astronomy, chemical compounds, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
In an age of information overload, we are faced with seemingly endless options from which a small number of choices must be made. For applications such as search engines and online stores, Recommender Systems have long become the key tool for assisting users in their choices. Interestingly, the use of Recommender Systems for recommending scientific items remains a rarity. One difficulty is that the development of such systems depends on the availability of adequate datasets of users' feedback. While there are several datasets available with the ratings of the users for books, music, or films, there is a lack of similar datasets for scientific fields, such as Astronomy and Life and Health Sciences. To address this issue, we propose a methodology that explores scientific literature for generating utility matrices of implicit feedback. The proposed methodology consists in identifying a list of items, finding research articles related to them, extracting the authors from each article, and finally creating a dataset where users are unique authors from the collected articles, and the rating values are the number of articles a unique author wrote about an item. Considering that literature is available for every scientific field, the methodology is in principle applicable to Recommender Systems in any scientific field. The methodology, which we call LIBRETTI (LIterature Based RecommEndaTion of scienTific Items), was assessed in two distinct study cases, Astronomy and Chemistry. Several evaluation metrics for the datasets generated with LIBRETTI were compared to those derived from other available datasets using the same set of recommender algorithms. The results were found to be similar, which provides a solid indication that LIBRETTI is a promising approach for generating datasets of implicit feedback for recommending scientific items.
- Published
- 2019
- Full Text
- View/download PDF
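Note on result 4: the dataset construction described in the abstract (users are unique authors, and the rating is the number of articles an author wrote about an item) can be sketched in a few lines of Python. The function name and the input format, a list of `(item, authors)` pairs with one pair per article-item link, are assumptions for illustration and are not taken from LIBRETTI itself.

```python
from collections import Counter


def build_utility_matrix(articles):
    """Map (author, item) to the number of articles that author wrote about the item.

    `articles` is an iterable of (item, authors) pairs, one per article-item link.
    """
    ratings = Counter()
    for item, authors in articles:
        for author in set(authors):  # count each author once per article
            ratings[(author, item)] += 1
    return ratings
```

With two articles about one item sharing an author, that author's implicit rating for the item is 2; authors with no article about an item simply have no entry, which is the usual sparse form of an implicit-feedback matrix.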
5. Answering Top-$k$ Graph Similarity Queries in Graph Databases.
- Author
-
Zhu, Yuanyuan, Qin, Lu, Yu, Jeffrey Xu, and Cheng, Hong
- Subjects
*PRUNING, *RELAXATION techniques, *ELECTRICITY pricing, *CONSTRUCTION costs, *DATABASES
- Abstract
Searching similar graphs in graph databases for a query graph has attracted extensive attention recently. Existing works on graph similarity queries are threshold based approaches which return graphs with distances to the query smaller than a given threshold. However, in many applications the number of answer graphs for the same threshold can vary significantly for different queries. In this paper, we study the problem of finding top-$k$ most similar graphs for a query under the distance measure based on maximum common subgraph (MCS). Since computing MCS is NP-hard, we devise a novel framework to prune unqualified graphs based on the lower bounds of graph distance, and accordingly derive four lower bounds with different tightness and computational cost for pruning. To further reduce the number of MCS computations, we also propose an improved framework based on both lower and upper bounds, and derive three new upper bounds. To support efficient pruning, we design three indexes with different tradeoffs between pruning power and construction cost. To accelerate the index construction, we explore bound relaxation techniques, based on which approximate indexes can be efficiently built. We conducted extensive performance studies on real-life graph datasets to validate the effectiveness and efficiency of our approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
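Note on result 5: the lower-bound pruning idea in the abstract can be illustrated with a generic filter-and-verify loop. In this sketch, `distance` stands in for the expensive MCS-based distance and `lower_bound` for any cheap bound that never exceeds it; both are hypothetical callables supplied by the caller. The paper's specific bounds, its upper-bound framework, and its indexes are not reproduced here.

```python
import heapq


def top_k_similar(query, database, k, distance, lower_bound):
    """Return the k closest items as (distance, item), skipping items the bound rules out."""
    # Visit candidates in ascending order of their cheap lower bound.
    bounded = sorted(
        ((lower_bound(query, g), i, g) for i, g in enumerate(database)),
        key=lambda t: t[:2],
    )
    heap = []  # max-heap via negated distances: (-distance, index, item)
    for lb, i, g in bounded:
        if len(heap) == k and lb >= -heap[0][0]:
            break  # all remaining candidates have an even larger lower bound
        d = distance(query, g)  # the expensive call, made only when needed
        if len(heap) < k:
            heapq.heappush(heap, (-d, i, g))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, i, g))
    return sorted(((-nd, g) for nd, _, g in heap), key=lambda t: t[0])
```

Sorting by lower bound first is a design choice that allows the loop to stop early: once the bound of the next candidate reaches the current k-th best distance, no later candidate can improve the answer.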
6. Resource Cut, a New Bounding Procedure to Algorithms for Enumerating Tree-Like Chemical Graphs.
- Author
-
Nishiyama, Yuhei, Shurbevski, Aleksandar, Nagamochi, Hiroshi, and Akutsu, Tatsuya
- Abstract
Enumerating chemical compounds with given structural properties plays an important role in structure elucidation, with applications such as drug design. We focus on the problem of enumerating tree-like chemical graphs specified by upper and lower bounds on feature vectors, where chemical graphs represent compounds, and a feature vector characterizes frequencies of finite paths in a graph. Building on the branch-and-bound algorithm proposed in earlier work, we propose a new bounding procedure, called Resource Cut, to speed up the enumeration process. Tree-like chemical graphs are modeled as vertex-colored trees, colors representing chemical elements. The algorithm is based on a scheme of generating each unique colored tree with a specified number $n$ of vertices. A colored tree is constructed by repeatedly appending vertices. Given a set $\mathcal {R}$ of $n$ colored vertices, we found that the algorithm often constructs trees that cannot be extended to a unique representation of a colored tree no matter how the remaining unused colored vertices in the set $\mathcal {R}$ are appended. We derive a mathematical condition to detect and discard such trees. Experimental results show that Resource Cut significantly reduces the search space. We have been able to obtain exact numbers of chemical graphs with up to 17 vertices excluding hydrogen atoms. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
7. $K$-Ary Tree Hashing for Fast Graph Classification.
- Author
-
Wu, Wei, Li, Bin, Chen, Ling, Zhu, Xingquan, and Zhang, Chengqi
- Subjects
*HASHING, *ELECTRONIC file management, *INFORMATION resources management, *DATA modeling, *SEARCH algorithms, *DATA analysis
- Abstract
Existing graph classification usually relies on an exhaustive enumeration of substructure patterns, where the number of substructures expands exponentially w.r.t. the size of the graph set. Recently, the Weisfeiler-Lehman (WL) graph kernel has achieved the best performance in terms of both accuracy and efficiency among state-of-the-art methods. However, it is still time-consuming, especially for large-scale graph classification tasks. In this paper, we present a $K$-Ary Tree based Hashing (KATH) algorithm, which is able to obtain competitive accuracy with a very fast runtime. The main idea of KATH is to construct a traversal table to quickly approximate the subtree patterns in WL using $K$-ary trees. Based on the traversal table, KATH employs a recursive indexing process that performs only $r$ times of matrix indexing to generate all $(r-1)$-ary trees, where the leaf node labels of a tree can uniquely specify the pattern. After that, the MinHash scheme is used to fingerprint the acquired subtree patterns for a graph. Our experimental results on both real world and synthetic data sets show that KATH runs significantly faster than state-of-the-art methods while achieving competitive or better accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
8. Enumerating Substituted Benzene Isomers of Tree-Like Chemical Graphs.
- Author
-
Li, Jinghui, Nagamochi, Hiroshi, and Akutsu, Tatsuya
- Abstract
Enumeration of chemical structures is useful for drug design, which is one of the main targets of computational biology and bioinformatics. A chemical graph $G$
possibly with multiple edges if we contract each benzene ring into a single virtual atom of valence 6. All tree-like chemical graphs with a given tree representation $T$ are called the substituted benzene isomers of $T$. Our algorithm first counts the number $f$
- Published
- 2018
- Full Text
- View/download PDF
9. Predicting the Absorption Potential of Chemical Compounds Through a Deep Learning Approach.
- Author
-
Shin, Moonshik, Jang, Donjin, Nam, Hojung, Lee, Kwang Hyung, and Lee, Doheon
- Abstract
The human colorectal carcinoma cell line (Caco-2) is a commonly used in-vitro test that predicts the absorption potential of orally administered drugs. In-silico prediction methods, based on the Caco-2 assay data, may increase the effectiveness of the high-throughput screening of new drug candidates. However, previously developed in-silico models that predict the Caco-2 cellular permeability of chemical compounds use handcrafted features that may be dataset-specific and induce over-fitting problems. A Deep Neural Network (DNN) generates high-level features based on non-linear transformations of raw features, which provides high discriminant power and, therefore, creates a good generalized model. We present a DNN-based binary Caco-2 permeability classifier. Our model was constructed based on 663 chemical compounds with in-vitro Caco-2 apparent permeability data. Two hundred nine molecular descriptors are used for generating the high-level features during DNN model generation. Dropout regularization is applied to address the over-fitting problem, and a non-linear activation function, the Rectified Linear Unit (ReLU), is adopted to reduce the vanishing gradient problem. The results demonstrate that the high-level features generated by the DNN are more robust than handcrafted features for predicting the cellular permeability of structurally diverse chemical compounds in Caco-2 cell lines. [ABSTRACT FROM PUBLISHER]
- Published
- 2018
- Full Text
- View/download PDF
10. SAW Sensor’s Frequency Shift Characterization for Odor Recognition and Concentration Estimation.
- Author
-
Hotel, Olivier, Poli, Jean-Philippe, Mer-Calfati, Christine, Scorsone, Emmanuel, and Saada, Samuel
- Abstract
In this paper, we propose an approach to determine the time constants and the amplitudes of the mass loading effect and of the viscoelastic contribution of a SAW sensor’s frequency shift. This approach consists in optimizing a function of these parameters, which is independent of the concentration profile. We experimentally establish in laboratory conditions ($T$ = 22 °C), on a data set composed of seven different gases, that these features are suitable for the identification of chemical compounds. In particular, we obtain a higher classification rate than with the traditional amplitudes of the signals during the steady state, and we show that the classification success rate can be increased by using both of them in conjunction with a feature subset selection heuristic. We also propose a method based on deconvolution and kernel regression to estimate the temporal concentration profile. [ABSTRACT FROM PUBLISHER]
- Published
- 2017
- Full Text
- View/download PDF
11. Multiple Structure-View Learning for Graph Classification.
- Author
-
Wu, Jia, Pan, Shirui, Zhu, Xingquan, Zhang, Chengqi, and Yu, Philip S.
- Subjects
*GRAPH theory, *FEATURE extraction, *MACHINE learning
- Abstract
Many applications involve objects containing structure and rich content information, each describing different feature aspects of the object. Graph learning and classification is a common tool for handling such objects. To date, existing graph classification has been limited to the single-graph setting with each object being represented as one graph from a single structure-view. This inherently limits its use to the classification of complicated objects containing complex structures and uncertain labels. In this paper, we advance graph classification to handle multigraph learning for complicated objects from multiple structure views, where each object is represented as a bag containing several graphs and the label is only available for each graph bag but not individual graphs inside the bag. To learn such graph classification models, we propose a multistructure-view bag constrained learning (MSVBL) algorithm, which aims to explore substructure features across multiple structure views for learning. By enabling joint regularization across multiple structure views and enforcing labeling constraints at the bag and graph levels, MSVBL is able to discover the most effective substructure features across all structure views. Experiments and comparisons on real-world data sets validate and demonstrate the superior performance of MSVBL in representing complicated objects as multigraph for classification, e.g., MSVBL outperforms the state-of-the-art multiview graph classification and multiview multi-instance learning approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
12. Convolutional neural networks for Raman spectral analysis of chemical mixtures
- Author
-
M. Hamed Mozaffari and Li-Lin Tay
- Subjects
data models, 1D convolutional neural networks, spectroscopy, convolutional neural networks, Raman mixture analysis, deep learning, mean square error methods, deep learning in chemometrics, compounds, chemical compounds, handheld Raman spectrum analyzer, multi-label classification
- Abstract
In the spectroscopy domain, one-dimensional Convolutional Neural Networks (1-D CNNs) assist researchers in recognizing one pure chemical compound and distinguishing it from unknown substances. The novelty of this approach is that a trained CNN operates automatically with almost no pre- or post-processing of data. However, the application of 1-D CNNs has typically been restricted to a binary classification of pure chemical substances. This study highlights a new approach in spectral recognition and quantification of components in chemical mixtures. Two 1-D CNN models, RaMixNet I and II, have been developed for this purpose as two multi-label classifiers. Depending on data availability, there is no limit to the number of compounds that RaMixNet models can recognize in an unknown mixture. We trained RaMixNet models using generated Raman spectra utilizing a novel data augmentation technique that adds random noise and different baselines to each spectrum as well as random wavenumber shifts for Raman peaks. The experimental results over hundreds of generated synthetic test mixtures revealed that the classification accuracy of RaMixNet I and II is 100%; at the same time, the RaMixNet II model could reach an average mean square error of 0.06 and an R2 score of 0.76 for the quantification of each component. In a comparison study, RaMixNet models could distinguish components of six actual chemical mixtures better than well-established distance-based techniques in the literature. 2021 5th SLAAI International Conference on Artificial Intelligence (SLAAI-ICAI), December 6-7, 2021, Colombo, Sri Lanka
- Published
- 2021
13. Joint Structure Feature Exploration and Regularization for Multi-Task Graph Classification.
- Author
-
Pan, Shirui, Wu, Jia, Zhu, Xingquan, Zhang, Chengqi, and Yu, Philip S.
- Subjects
*GRAPH theory, *LEARNING, *MACHINE learning, *ALGORITHMS, *MOLECULES, *CHEMICALS
- Abstract
Graph classification aims to learn models to classify structure data. To date, all existing graph classification methods are designed to target one single learning task and require a large number of labeled samples for learning good classification models. In reality, each real-world task may only have a limited number of labeled samples, yet multiple similar learning tasks can provide useful knowledge to benefit all tasks as a whole. In this paper, we formulate a new multi-task graph classification (MTG) problem, where multiple graph classification tasks are jointly regularized to find discriminative subgraphs shared by all tasks for learning. The niche of MTG stems from the fact that with a limited number of training samples, subgraph features selected for one single graph classification task tend to overfit the training data. By using additional tasks as evaluation sets, MTG can jointly regularize multiple tasks to explore high quality subgraph features for graph classification. To achieve this goal, we formulate an objective function which combines multiple graph classification tasks to evaluate the informativeness score of a subgraph feature. An iterative subgraph feature exploration and multi-task learning process is further proposed to incrementally select subgraph features for graph classification. Experiments on real-world multi-task graph classification datasets demonstrate significant performance gain. [ABSTRACT FROM PUBLISHER]
- Published
- 2016
- Full Text
- View/download PDF
14. Degree-Based Topological Invariants of Metal-Organic Networks
- Author
-
Muhammad Kamran Siddiqui, Gang Hong, Zhen Gu, Muhammad Javaid, and Hafiz Muhammad Awais
- Subjects
Pure mathematics, General Computer Science, Degree (graph theory), Topological indices, General Engineering, chemical compounds, Metal, metals-organic networks, Topological invariants, General Materials Science, lcsh:Electrical engineering. Electronics. Nuclear engineering, Electrical and Electronic Engineering, lcsh:TK1-9971, Mathematics
- Abstract
Metal-organic networks (MONs) are a family of chemical compounds consisting of clusters or metal ions and organic ligands. These are studied as one-, two- or three-dimensional structures of porous materials and subclasses of coordination polymers. MONs are mostly used in catalysis for the separation and purification of gases and as conducting solids or super-capacitors. In some situations, these networks are found to be stable in the process of removal or solvent of the guest molecules and could be restored with some other chemical compounds. The physical stability and mechanical properties of these networks have become a topic of great interest due to the aforesaid characteristics. Topological indices (TIs) are numeric quantities that are used to forecast the natural relationships among the physico-chemical characteristics of the chemical compounds in their fundamental network. In studies of MONs, TIs play an essential role in theoretical and environmental chemistry and pharmacology. In this paper, we compute various recently developed degree-based TIs for two different metal-organic networks with an increasing number of layers consisting of both metal and organic ligand vertices. A comparison among the different versions of the computed TIs with the help of their numerical values and graphs is also included.
- Published
- 2020
15. Efficient Answering of Why-Not Questions in Similar Graph Matching.
- Author
-
Islam, Md. Saiful, Liu, Chengfei, and Li, Jianxin
- Subjects
*GRAPH theory, *MATCHING theory, *DATABASES, *APPLICATION software, *MISSING data (Statistics), *APPROXIMATE solutions (Logic)
- Abstract
Answering why-not questions in databases promises wide application prospects in many areas and has thereby attracted recent attention in the database research community. This paper addresses the problem of answering these so-called why-not questions in similar graph matching for graph databases. Given a set of answer graphs of an initial query graph $q$
such that the missing graphs are included in the new answer set of $q^*$. We present an approximate solution to address the above as the optimal solution is NP-hard to compute. In our approach, we first compute the bounded search space and the distance to be minimized for $q^*$. Then, we present a two-phase algorithm to find the new query $q^*$. In the first phase, we generate a set of candidate edges to be added/deleted into/from the initial query $q$ within the bounded search space and in the second phase, we select a subset of candidate edges generated in the first phase to minimize the distance for $q^*$. We also demonstrate the effectiveness and efficiency of our approach by conducting extensive experiments on two real datasets. [ABSTRACT FROM PUBLISHER]
- Published
- 2015
- Full Text
- View/download PDF
16. Magnetic Anisotropy in Bicomponent Self-Assembled Ni and Ni-Pd Nanowires Studied by Magnetic Resonance Spectroscopy.
- Author
-
Bayev, Vadim, Streltsov, Eugene, Milosavljevic, Momir, Malashchonak, Mikalai, Maximenko, Alexey, Koltunowicz, Tomasz N., Zukowski, Pawel, and Kierczynski, Konrad
- Subjects
*MAGNETIC anisotropy, *MOLECULAR self-assembly, *NANOWIRES, *NUCLEAR magnetic resonance spectroscopy, *ALUMINUM oxide
- Abstract
Self-ordered arrays of Ni, Ni(50)Pd(50), and Ni(78)Pd(22) nanowires were synthesized by simultaneous electrochemical deposition of Ni and Pd components into porous templates of anodic aluminum oxide using alternating current. This paper focuses on the interplay between the structure and chemical composition of Ni and Ni–Pd bicomponent nanowire arrays and the peculiarities of their magnetic anisotropy. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
17. Research of the efficiency of using isotope exchange in carbon dioxide in the process of obtaining highly enriched 13C in a gas centrifuge cascade.
- Author
-
Viktor, Sovach and Alexey, Orlov
- Abstract
The mechanism of the isotope exchange reaction proceeding in carbon dioxide in an isotope exchange reactor (IER) is considered. Possible schemes of this reaction are presented. Changes in the molecular spectrum of carbon dioxide and the distribution of the 13C isotope at the inlet and at the outlet of the reactor are shown. The dependence of the efficiency of a gas centrifuge (GC) cascade for obtaining highly enriched 13C on the IER position for different values of isotope exchange degree is studied. Techniques for calculating a GC cascade for the separation of polyisotopic chemical compounds are used. It is shown that the functional dependence of the cascade efficiency on the place of IER installation is a unimodular convex line which has a maximum. The optimal locations of one or two IERs in a GC cascade, and the values of isotope exchange in them at which more than 99% enrichment in 13C can be achieved, are determined. [ABSTRACT FROM PUBLISHER]
- Published
- 2012
- Full Text
- View/download PDF
18. Analysis of Agarwood oil (Aquilaria Malaccensis) based on GC-MS data.
- Author
-
Ali, Nor Azah Mohd, Ismail, Nurlaila, and Taib, Mohd Nasir
- Abstract
Agarwood oil has been widely used, especially in fragrance, incense, prayers and traditional medicine. In the Middle East, the market demand for Agarwood oil is very high. Agarwood oil is traded as high grade or low grade, corresponding to expensive and cheap prices, respectively. Currently, the grading of Agarwood oil, specifically Aquilaria Malaccensis, depends on its physical appearance such as color and odour. This paper presents the analysis of Aquilaria Malaccensis based on GC-MS data. The work involves statistical techniques such as boxplot and PCA. The analysis was done on 64 chemical compounds in 7 samples of agarwood oil obtained by Forest Research Institute Malaysia (FRIM). It was done via MATLAB ver. R2010a. The results show that the distribution of chemical compounds in Agarwood oil is not normal and that five components are identified from the 64 variables of the Agarwood oil samples, as obtained by boxplot and PCA, respectively. [ABSTRACT FROM PUBLISHER]
- Published
- 2012
- Full Text
- View/download PDF
19. High-speed hardware algorithm for continuous mode time-of-flight mass spectrometry.
- Author
-
Held, Roman, Goette, Josef, Jacomet, Marcel, Tanner, Christian, Gonin, Marc, and Tanner, Martin
- Abstract
We describe a real-time, high-speed data-acquisition and data-processing system for continuous mode time-of-flight mass spectrometers. To achieve data acquisition rates of 2 × 1.5 giga samples per second (GS/s), needed for the considered class of mass spectrometers, we implement the system as an FPGA-based hardware algorithm. We must solve two challenging problems: First, the high-speed acquisition produces an enormous amount of data that we handle by on-the-fly data compression/uncompression to circumvent the memory-bandwidth restrictions. Second, the need for continuous acquisition of mass spectra and event-triggering calls for powerful hardware algorithms that make it possible to measure long signals composed of ultra-short signal pulses due to single aerosol particles or nanoparticles. [ABSTRACT FROM PUBLISHER]
- Published
- 2012
- Full Text
- View/download PDF
20. Using Research Literature to Generate Datasets of Implicit Feedback for Recommending Scientific Items
- Author
-
André Moitinho, M. Barros, and Francisco M. Couto
- Subjects
0301 basic medicine, Research literature, General Computer Science, Computer science, Literature based, Scientific literature, Scientific field, Recommender system, 01 natural sciences, Set (abstract data type), 03 medical and health sciences, Search engine, Recommender systems, dataset, General Materials Science, Electrical and Electronic Engineering, Information retrieval, scientific literature, General Engineering, chemical compounds, 0104 chemical sciences, astronomy, 010404 medicinal & biomolecular chemistry, 030104 developmental biology, collaborative filtering, Key (cryptography), lcsh:Electrical engineering. Electronics. Nuclear engineering, lcsh:TK1-9971
- Abstract
In an age of information overload, we are faced with seemingly endless options from which a small number of choices must be made. For applications such as search engines and online stores, Recommender Systems have long become the key tool for assisting users in their choices. Interestingly, the use of Recommender Systems for recommending scientific items remains a rarity. One difficulty is that the development of such systems depends on the availability of adequate datasets of users' feedback. While there are several datasets available with the ratings of the users for books, music, or films, there is a lack of similar datasets for scientific fields, such as Astronomy and Life and Health Sciences. To address this issue, we propose a methodology that explores scientific literature for generating utility matrices of implicit feedback. The proposed methodology consists in identifying a list of items, finding research articles related to them, extracting the authors from each article, and finally creating a dataset where users are unique authors from the collected articles, and the rating values are the number of articles a unique author wrote about an item. Considering that literature is available for every scientific field, the methodology is in principle applicable to Recommender Systems in any scientific field. The methodology, which we call LIBRETTI (LIterature Based RecommEndaTion of scienTific Items), was assessed in two distinct study cases, Astronomy and Chemistry. Several evaluation metrics for the datasets generated with LIBRETTI were compared to those derived from other available datasets using the same set of recommender algorithms. The results were found to be similar, which provides a solid indication that LIBRETTI is a promising approach for generating datasets of implicit feedback for recommending scientific items.
- Published
- 2019
21. Frequent Substructure-Based Approaches for Classifying Chemical Compounds.
- Author
-
Deshpande, Mukund, Kuramochi, Michihiro, Wale, Nikil, and Karypis, George
- Subjects
*COMPUTATIONAL biology, *ALGORITHMS, *PHARMACEUTICAL industry, *PHARMACEUTICAL research, *SYSTEMS design, *PHARMACOLOGY
- Abstract
Computational techniques that build models to correctly assign chemical compounds to various classes of interest have many applications in pharmaceutical research and are used extensively at various phases during the drug development process. These techniques are used to solve a number of classification problems such as predicting whether or not a chemical compound has the desired biological activity, is toxic or nontoxic, and filtering out drug-like compounds from large compound libraries. This paper presents a substructure-based classification algorithm that decouples the substructure discovery process from the classification model construction and uses frequent subgraph discovery algorithms to find all topological and geometric substructures present in the data set. The advantage of this approach is that during classification model construction, all relevant substructures are available allowing the classifier to intelligently select the most discriminating ones. The computational scalability is ensured by the use of highly efficient frequent subgraph discovery algorithms coupled with aggressive feature selection. Experimental evaluation on eight different classification problems shows that our approach is computationally scalable and, on average, outperforms existing schemes by 7 percent to 35 percent. [ABSTRACT FROM AUTHOR]
- Published
- 2005
- Full Text
- View/download PDF
22. Deriving Quantitative Structure-Activity Relationship Models Using Genetic Programming for Drug Discovery
- Author
-
Neophytou, K., Nicolaou, Christos A. [0000-0002-1466-6992], Pattichis, Constantinos S. [0000-0003-1271-8151], and Schizas, Christos N. [0000-0001-6548-4980]
- Subjects
QSAR analysis ,Selwood dataset ,Quantitative structure–activity relationship ,Heuristic search algorithms ,Scale (ratio) ,Molecular graphics ,Computer science ,Quantitative Structure-Activity Relationship ,Genetic programming ,Learning algorithms ,Computer programming ,Machine learning ,Descriptors ,Field (computer science) ,Molecular descriptors ,Chemical compounds ,QSAR modeling ,Constant (computer programming) ,Chemical activities ,Molecular descriptor ,Arsenic compounds ,Chemotherapy ,Heuristic algorithms ,Sulfur compounds ,Benchmark dataset ,QSAR ,Health care ,Drug dosage ,Genetic algorithms ,Heuristic programming ,Chlorine compounds ,Human experts ,Drug delivery ,Benchmark (computing) ,Drug discoveries ,Data sets ,Artificial intelligence ,Evolutionary techniques ,Forecasting ,Applicability domain - Abstract
Genetic Programming is a heuristic search algorithm inspired by evolutionary techniques that has been shown to produce satisfactory solutions to problems related to several scientific domains [1]. Presented here is a methodology for the creation of Quantitative Structure-Activity Relationship (QSAR) models for the prediction of chemical activity, using Genetic Programming. QSAR analysis is crucial for drug discovery since good QSAR models enable human experts to select compounds with increased chances of being active for further investigations. Our technique has been tested using the Selwood data set, a benchmark dataset for the QSAR field [2]. The results indicate that the QSAR models created are accurate, reliable and simple and can thus be used to identify molecular descriptors correlated with measured activity and for the prediction of the activity of untested molecules. The QSAR models we generated predict the activity of untested molecules with an error ranging between 0.46 and 0.8 on the scale [-1,1]. These results compare favourably with results cited in the literature for the same dataset [3], [4]. Our models are constructed using any combination of the arithmetic operators {+, -, /, *}, the descriptors available and constant values. ©2008 IEEE.
- Published
- 2007
23. Efficiently Indexing Large Sparse Graphs for Similarity Search.
- Author
-
Wang, Guoren, Wang, Bin, Yang, Xiaochun, and Yu, Ge
- Subjects
- *
PROTEIN-protein interactions , *GRAPHIC methods , *GRAPH theory , *NATURAL language processing , *ISOMORPHISM (Mathematics) , *DIRECTED acyclic graphs , *SEARCH algorithms - Abstract
The graph structure is a very important means to model schemaless data with complicated structures, such as protein-protein interaction networks, chemical compounds, knowledge query inferring systems, and road networks. This paper focuses on the index structure for similarity search on a set of large sparse graphs and proposes an efficient indexing mechanism by introducing the Q-Gram idea. By decomposing graphs into small grams (organized by κ-Adjacent Tree patterns) and pairing up those κ-Adjacent Tree patterns, a lower bound on their edit distance can be calculated for candidate filtering. Furthermore, we have developed a series of techniques for inverted index construction and online query processing. By building the candidate set for the query graph before the exact edit distance calculation, the number of graphs that need to proceed to exact matching can be greatly reduced. Extensive experiments on real and synthetic data sets have been conducted to show the effectiveness and efficiency of the proposed indexing mechanism. [ABSTRACT FROM AUTHOR]
- Published
- 2012
24. Manipulating the Steady State of Metabolic Pathways.
- Author
-
Song, Bin, Buyuktahtakin, I. Esra, Ranka, Sanjay, and Kahveci, Tamer
- Abstract
Metabolic pathways show the complex interactions among enzymes that transform chemical compounds. The state of a metabolic pathway can be expressed as a vector, which denotes the yield of the compounds or the flux in that pathway at a given time. The steady state is a state that remains unchanged over time. Altering the state of the metabolism is very important for many applications such as biomedicine, biofuels, food industry, and cosmetics. The goal of the enzymatic target identification problem is to identify the set of enzymes whose knockouts lead the metabolism to a state that is close to a given goal state. Given that the size of the search space is exponential in the number of enzymes, the target identification problem is very computationally intensive. We develop efficient algorithms to solve the enzymatic target identification problem in this paper. Unlike existing algorithms, our method works for a broad set of metabolic network models. We measure the effect of the knockouts of a set of enzymes as a function of the deviation of the steady state of the pathway after their knockouts from the goal state. We develop two algorithms to find the enzyme set with minimal deviation from the goal state. The first one is a traversal approach that explores possible solutions in a systematic way using a branch and bound method. The second one uses genetic algorithms to derive good solutions from a set of alternative solutions iteratively. Unlike the former, the latter can run on very large pathways. Our experiments show that our algorithms' results agree with in vitro results reported in the literature for a number of applications. They also show that the traversal method is a good approximation of the exhaustive search algorithm and it is up to 11 times faster than the exhaustive one. This algorithm runs efficiently for pathways with up to 30 enzymes. For large pathways, our genetic algorithm can find good solutions in less than 10 minutes. [ABSTRACT FROM PUBLISHER]
- Published
- 2011