398 results on '"Literature-based discovery"'
Search Results
2. AHAM: Adapt, Help, Ask, Model Harvesting LLMs for Literature Mining
- Author
-
Koloski, Boshko, Lavrač, Nada, Cestnik, Bojan, Pollak, Senja, Škrlj, Blaž, Kastrin, Andrej, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Miliou, Ioanna, editor, Piatkowski, Nico, editor, and Papapetrou, Panagiotis, editor
- Published
- 2024
3. Dyport: dynamic importance-based biomedical hypothesis generation benchmarking technique
- Author
-
Ilya Tyagin and Ilya Safro
- Subjects
Hypothesis Generation, Literature-based Discovery, Link Prediction, Benchmarking, Natural Language Processing, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
Background: Automated hypothesis generation (HG) focuses on uncovering hidden connections within the extensive information that is publicly available. This domain has become increasingly popular, thanks to modern machine learning algorithms. However, the automated evaluation of HG systems is still an open problem, especially on a larger scale. Results: This paper presents a novel benchmarking framework Dyport for evaluating biomedical hypothesis generation systems. Utilizing curated datasets, our approach tests these systems under realistic conditions, enhancing the relevance of our evaluations. We integrate knowledge from the curated databases into a dynamic graph, accompanied by a method to quantify discovery importance. This not only assesses hypotheses accuracy but also their potential impact in biomedical research which significantly extends traditional link prediction benchmarks. Applicability of our benchmarking process is demonstrated on several link prediction systems applied on biomedical semantic knowledge graphs. Being flexible, our benchmarking system is designed for broad application in hypothesis generation quality verification, aiming to expand the scope of scientific discovery within the biomedical research community. Conclusions: Dyport is an open-source benchmarking framework designed for biomedical hypothesis generation systems evaluation, which takes into account knowledge dynamics, semantics and impact. All code and datasets are available at: https://github.com/IlyaTyagin/Dyport.
- Published
- 2024
4. Dyport: dynamic importance-based biomedical hypothesis generation benchmarking technique.
- Author
-
Tyagin, Ilya and Safro, Ilya
- Subjects
*KNOWLEDGE graphs, *BENCHMARKING (Management), *NATURAL language processing, *HYPOTHESIS, *SCIENTIFIC discoveries, *SEMANTICS
- Abstract
Background: Automated hypothesis generation (HG) focuses on uncovering hidden connections within the extensive information that is publicly available. This domain has become increasingly popular, thanks to modern machine learning algorithms. However, the automated evaluation of HG systems is still an open problem, especially on a larger scale. Results: This paper presents a novel benchmarking framework Dyport for evaluating biomedical hypothesis generation systems. Utilizing curated datasets, our approach tests these systems under realistic conditions, enhancing the relevance of our evaluations. We integrate knowledge from the curated databases into a dynamic graph, accompanied by a method to quantify discovery importance. This not only assesses hypotheses accuracy but also their potential impact in biomedical research which significantly extends traditional link prediction benchmarks. Applicability of our benchmarking process is demonstrated on several link prediction systems applied on biomedical semantic knowledge graphs. Being flexible, our benchmarking system is designed for broad application in hypothesis generation quality verification, aiming to expand the scope of scientific discovery within the biomedical research community. Conclusions: Dyport is an open-source benchmarking framework designed for biomedical hypothesis generation systems evaluation, which takes into account knowledge dynamics, semantics and impact. All code and datasets are available at: https://github.com/IlyaTyagin/Dyport. [ABSTRACT FROM AUTHOR]
- Published
- 2024
5. What is a related work? A typology of relationships in research literature
- Author
-
Doroudi, Shayan
- Subjects
Literature search, Literature-based discovery, Information retrieval, Relevance, Analogies, Abstraction, Structure-mapping theory, Artificial Intelligence and Image Processing, History and Philosophy of Specific Fields, Philosophy
- Abstract
An important part of research is situating one's work in a body of existing literature, thereby connecting to existing ideas. Despite this, the various kinds of relationships that might exist among academic literature do not appear to have been formally studied. Here I present a graphical representation of academic work in terms of entities and relations, drawing on structure-mapping theory (used in the study of analogies). I then use this representation to present a typology of operations that could relate two pieces of academic work. I illustrate the various types of relationships with examples from medicine, physics, psychology, history and philosophy of science, machine learning, education, and neuroscience. The resulting typology not only gives insights into the relationships that might exist between static publications, but also the rich process whereby an ongoing research project evolves through interactions with the research literature.
- Published
- 2023
6. Serial KinderMiner (SKiM) discovers and annotates biomedical knowledge using co-occurrence and transformer models
- Author
-
Robert J. Millikin, Kalpana Raja, John Steill, Cannon Lock, Xuancheng Tu, Ian Ross, Lam C. Tsoi, Finn Kuusisto, Zijian Ni, Miron Livny, Brian Bockelman, James Thomson, and Ron Stewart
- Subjects
Literature-based discovery, Knowledge graph, Biomedical text mining, Relation extraction, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
Background: The PubMed archive contains more than 34 million articles; consequently, it is becoming increasingly difficult for a biomedical researcher to keep up-to-date with different knowledge domains. Computationally efficient and interpretable tools are needed to help researchers find and understand associations between biomedical concepts. The goal of literature-based discovery (LBD) is to connect concepts in isolated literature domains that would normally go undiscovered. This usually takes the form of an A–B–C relationship, where A and C terms are linked through a B term intermediate. Here we describe Serial KinderMiner (SKiM), an LBD algorithm for finding statistically significant links between an A term and one or more C terms through some B term intermediate(s). The development of SKiM is motivated by the observation that there are only a few LBD tools that provide a functional web interface, and that the available tools are limited in one or more of the following ways: (1) they identify a relationship but not the type of relationship, (2) they do not allow the user to provide their own lists of B or C terms, hindering flexibility, (3) they do not allow for querying thousands of C terms (which is crucial if, for instance, the user wants to query connections between a disease and the thousands of available drugs), or (4) they are specific for a particular biomedical domain (such as cancer). We provide an open-source tool and web interface that improves on all of these issues. Results: We demonstrate SKiM’s ability to discover useful A–B–C linkages in three control experiments: classic LBD discoveries, drug repurposing, and finding associations related to cancer. Furthermore, we supplement SKiM with a knowledge graph built with transformer machine-learning models to aid in interpreting the relationships between terms found by SKiM.
Finally, we provide a simple and intuitive open-source web interface (https://skim.morgridge.org) with comprehensive lists of drugs, diseases, phenotypes, and symptoms so that anyone can easily perform SKiM searches. Conclusions: SKiM is a simple algorithm that can perform LBD searches to discover relationships between arbitrary user-defined concepts. SKiM is generalized for any domain, can perform searches with many thousands of C term concepts, and moves beyond the simple identification of an existence of a relationship; many relationships are given relationship type labels from our knowledge graph.
- Published
- 2023
7. Serial KinderMiner (SKiM) discovers and annotates biomedical knowledge using co-occurrence and transformer models.
- Author
-
Millikin, Robert J., Raja, Kalpana, Steill, John, Lock, Cannon, Tu, Xuancheng, Ross, Ian, Tsoi, Lam C., Kuusisto, Finn, Ni, Zijian, Livny, Miron, Bockelman, Brian, Thomson, James, and Stewart, Ron
- Subjects
*KNOWLEDGE graphs, *DRUG repositioning, *RESEARCH personnel, *BIOMEDICAL organizations
- Abstract
Background: The PubMed archive contains more than 34 million articles; consequently, it is becoming increasingly difficult for a biomedical researcher to keep up-to-date with different knowledge domains. Computationally efficient and interpretable tools are needed to help researchers find and understand associations between biomedical concepts. The goal of literature-based discovery (LBD) is to connect concepts in isolated literature domains that would normally go undiscovered. This usually takes the form of an A–B–C relationship, where A and C terms are linked through a B term intermediate. Here we describe Serial KinderMiner (SKiM), an LBD algorithm for finding statistically significant links between an A term and one or more C terms through some B term intermediate(s). The development of SKiM is motivated by the observation that there are only a few LBD tools that provide a functional web interface, and that the available tools are limited in one or more of the following ways: (1) they identify a relationship but not the type of relationship, (2) they do not allow the user to provide their own lists of B or C terms, hindering flexibility, (3) they do not allow for querying thousands of C terms (which is crucial if, for instance, the user wants to query connections between a disease and the thousands of available drugs), or (4) they are specific for a particular biomedical domain (such as cancer). We provide an open-source tool and web interface that improves on all of these issues. Results: We demonstrate SKiM's ability to discover useful A–B–C linkages in three control experiments: classic LBD discoveries, drug repurposing, and finding associations related to cancer. Furthermore, we supplement SKiM with a knowledge graph built with transformer machine-learning models to aid in interpreting the relationships between terms found by SKiM. 
Finally, we provide a simple and intuitive open-source web interface (https://skim.morgridge.org) with comprehensive lists of drugs, diseases, phenotypes, and symptoms so that anyone can easily perform SKiM searches. Conclusions: SKiM is a simple algorithm that can perform LBD searches to discover relationships between arbitrary user-defined concepts. SKiM is generalized for any domain, can perform searches with many thousands of C term concepts, and moves beyond the simple identification of an existence of a relationship; many relationships are given relationship type labels from our knowledge graph. [ABSTRACT FROM AUTHOR]
- Published
- 2023
8. Improving automated literature-based discovery with neural networks : neural biomedical named entity recognition, link prediction and discovery
- Author
-
Crichton, Gamal Kashaka Omari and Korhonen, Anna
- Subjects
Literature-based Discovery, LBD, Neural networks, Named Entity Recognition, NER, Multi-task Learning, LION LBD, knowledge discovery, Natural Language Processing, NLP, Machine Learning, Deep Learning, Biomedical NLP, Biomedical Knowledge Discovery, Link Prediction, Language Technology Laboratory
- Abstract
Literature-based Discovery (LBD) uses information from explicit statements in literature to generate new or unstated knowledge. Automated LBD can thus facilitate hypothesis testing and generation from large collections of publications to support and accelerate scientific research, which is adversely affected by publication explosion and knowledge fragmentation. Existing methods, however, use methodologies which are inadequate for capturing the complex information available in scientific literature and are prone to proposing spurious discoveries or an abundance of low-quality ones. To be capable of solving these problems, automated LBD needs to accurately glean the extensive information present in literature, cope with the dynamic nature of scientific knowledge and place high-quality proposals at the top of ranked outputs. Recent advances in Natural Language Processing (NLP) allow for deep textual analysis to obtain a wide coverage of information present in text and can adapt easily to recognising new biomedical entities and terms. Similarly, recent advances in graph processing have made it possible to do in-depth analysis on information represented as graphs, such as published biomedical connections, to facilitate high-quality knowledge discovery. Both of these advances utilise neural networks extensively. This work used neural networks in a bid to advance automated LBD in three ways: 1) improving biomedical Named Entity Recognition (NER) to extract entities from unstructured text by using multi-task learning across multiple biomedical datasets; 2) improving knowledge discovery from realistic, random- and time-sliced biomedical graphs using link prediction and 3) improving the ranking of published discoveries on open- and closed- LBD instances by scoring the strength of connection paths using neural models. 
Excitingly, the latter approaches outperformed those used by the state-of-the-art LION LBD system, indicating that their integration into it would provide better support to cancer researchers using it. The results from this work show that it is feasible to use neural networks to improve LBD in different ways. They also demonstrate that neural networks are versatile enough to be applied to improve traditional as well as non-traditional LBD. The principal implication of these findings is that neural biomedical knowledge discovery, especially LBD, is presently useful in addition to being a potentially rich field for further study.
- Published
- 2019
9. Using literature-based discovery to develop hypotheses for the moderating effect of massively multiplayer online games [version 1; peer review: 1 not approved]
- Author
-
Ananya Sinha Choudhury, Wendy Hui, and John Lau
- Subjects
Research Article, Articles, Literature-based discovery, massively multiplayer online games, flow, addiction
- Abstract
Background: Empirical studies have shown that the relationship between psychological flow state and game addiction tends to be weaker in massively multiplayer online (MMO) games compared with non-MMO games. However, a theoretical explanation for the moderating effect of MMO games is lacking in the literature. This paper uses interview data and a method for generating hypotheses, literature-based discovery (LBD), to identify potential moderating factors and develop theories about this relationship. Methods: The proposed method involved text mining 2,829 abstracts to generate a keyword list of potential underlying moderating factors. Interview data from three domain experts confirmed the usefulness of LBD. Instead of arriving at game addiction primarily through flow, the interview data revealed that different cognitive pathways may lead to game addiction in MMO games. Results: Specifically, the identified keywords led to three explanations for the observed moderating effect: (1) social interaction in MMOGs may prevent the progression from flow to game addiction or induce positive peer influence; (2) game performance typically measured using a score- or point-based system in non-MMO games offers an extrinsic motivation that is more in line with flow theory; and (3) intrinsic motivation and escapism may be more important drivers of MMO game addiction. This paper summarizes the domain experts’ views on the usefulness of LBD in theory development. Conclusions: This paper uses literature-based discovery (LBD) to demonstrate how the pathways to game addiction in MMO games differ from non-MMO games. LBD is a method for generating hypotheses seldom used in the social science literature.
- Published
- 2023
10. Editorial: Emerging areas in literature-based discovery
- Author
-
Yakub Sebastian and Neil R. Smalheiser
- Subjects
literature-based discovery, fundamental issues, knowledge gaps, usability, data visualization, network analysis, Bibliography. Library science. Information resources
- Published
- 2023
11. Prioritization of COVID-19-Related Literature via Unsupervised Keyphrase Extraction and Document Representation Learning
- Author
-
Škrlj, Blaž, Jukič, Marko, Eržen, Nika, Pollak, Senja, Lavrač, Nada, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Soares, Carlos, editor, and Torgo, Luis, editor
- Published
- 2021
12. Semantic text classification for cancer text mining
- Author
-
Baker, Simon and Korhonen, Anna
- Subjects
616.99, Cancer, Text Mining, Machine Learning, Classification, Literature-based Discovery, Hallmarks of Cancer, Deep Learning, Artificial Intelligence
- Abstract
Cancer researchers and oncologists benefit greatly from text mining major knowledge sources in biomedicine such as PubMed. Fundamentally, text mining depends on accurate text classification. In conventional natural language processing (NLP), this requires experts to annotate scientific text, which is costly and time consuming, resulting in small labelled datasets. This leads to extensive feature engineering and handcrafting in order to fully utilise small labelled datasets, which is again time consuming, and not portable between tasks and domains. In this work, we explore emerging neural network methods to reduce the burden of feature engineering while outperforming the accuracy of conventional pipeline NLP techniques. We focus specifically on the cancer domain in terms of applications, where we introduce two NLP classification tasks and datasets: the first task is that of semantic text classification according to the Hallmarks of Cancer (HoC), which enables text mining of scientific literature assisted by a taxonomy that explains the processes by which cancer starts and spreads in the body. The second task is that of the exposure routes of chemicals into the body that may lead to exposure to carcinogens. We present several novel contributions. We introduce two new semantic classification tasks (the hallmarks, and exposure routes) at both sentence and document levels along with accompanying datasets, and implement and investigate a conventional pipeline NLP classification approach for both tasks, performing both intrinsic and extrinsic evaluation. We propose a new approach to classification using multilevel embeddings and apply this approach to several tasks; we subsequently apply deep learning methods to the task of hallmark classification and evaluate its outcome. Utilising our text classification methods, we develop two novel text mining tools targeting real-world cancer researchers.
The first tool is a cancer hallmark text mining tool that identifies association between a search query and cancer hallmarks; the second tool is a new literature-based discovery (LBD) system designed for the cancer domain. We evaluate both tools with end users (cancer researchers) and find they demonstrate good accuracy and promising potential for cancer research.
- Published
- 2018
13. Metaverse Shape of Your Life for Future: A bibliometric snapshot
- Author
-
Muhammet Damar
- Subjects
metaverse, bibliometrics, virtual world, literature minings, literature-based discovery, Technology
- Abstract
The metaverse was first introduced in 1992. Many people regard "metaverse" as a new word, but the concept is not new. However, Zuckerberg’s press release drew widespread attention to the metaverse. This study presents a bibliometric evaluation of metaverse technology, which has been discussed in the literature since the nineties. A field study is carried out specifically for the metaverse, which is a new and trendy subject. In this way, descriptive information is presented on journals, institutions, prominent researchers, and countries in the field, along with additional evaluation of the prominent topics in the field and of heavily cited researchers. Our study, which extracted the data of all documents between 1990 and 2021 from the Web of Science database, found that the literature has historically contained few studies on the metaverse, whose popularity has peaked in recent months. In addition, the subject is treated intensively alongside virtual reality and augmented reality technologies, and the education sector and digital marketing fields show interest in it. The metaverse will probably enter many areas of our lives in the next 15-20 years and shape our lives by taking advantage of the opportunities of developing technology.
- Published
- 2021
14. Who Is Who in Literature-Based Discovery: Preliminary Analysis
- Author
-
Kastrin, Andrej, Hristovski, Dimitar, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Lu, Wei, editor, and Zhu, Kenny Q., editor
- Published
- 2020
15. A Network Approach for Mapping and Classifying Shared Terminologies Between Disparate Literatures in the Social Sciences
- Author
-
Mejia, Cristian, Kajikawa, Yuya, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Lu, Wei, editor, and Zhu, Kenny Q., editor
- Published
- 2020
16. Towards Creating a New Triple Store for Literature-Based Discovery
- Author
-
Koroleva, Anna, Anisimova, Maria, Gil, Manuel, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Lu, Wei, editor, and Zhu, Kenny Q., editor
- Published
- 2020
17. Research in Progress: Explaining the Moderating Effect of Massively Multiplayer Online (MMO) Games on the Relationship Between Flow and Game Addiction Using Literature-Based Discovery (LBD)
- Author
-
Hui, Wendy, Lau, Wai Kwong, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Lu, Wei, editor, and Zhu, Kenny Q., editor
- Published
- 2020
18. Connecting the Dots: Hypotheses Generation by Leveraging Semantic Shifts
- Author
-
Thilakaratne, Menasha, Falkner, Katrina, Atapattu, Thushari, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Lauw, Hady W., editor, Wong, Raymond Chi-Wing, editor, Ntoulas, Alexandros, editor, Lim, Ee-Peng, editor, Ng, See-Kiong, editor, and Pan, Sinno Jialin, editor
- Published
- 2020
19. Using Link Prediction Methods to Examine Networks of Co-occurring MeSH Terms in Zika and CRISPR Research
- Author
-
Li, Meng-Hao, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Sundqvist, Anneli, editor, Berget, Gerd, editor, Nolin, Jan, editor, and Skjerdingstad, Kjell Ivar, editor
- Published
- 2020
20. Transducer Cascades for Biological Literature-Based Discovery.
- Author
-
Maurel, Denis, Chéry, Sandy, Bidoit, Nicole, Chatalic, Philippe, Filali, Aziza, Froidevaux, Christine, and Poupon, Anne
- Subjects
*NATURAL language processing, *G protein coupled receptors, *MAMMAL genomes, *TRANSDUCERS, *DATABASE design, *RELATIONAL databases, *INFORMATION resources
- Abstract
G protein-coupled receptors (GPCRs) control the response of cells to many signals, and as such, are involved in most cellular processes. As membrane receptors, they are accessible at the surface of the cell. GPCRs are also the largest family of membrane receptors, with more than 800 representatives in mammal genomes. For this reason, they are ideal targets for drugs. Although about one third of approved drugs target GPCRs, only about 16% of GPCRs are targeted by drugs. One of the difficulties comes from the lack of knowledge on the intra-cellular events triggered by these molecules. In the last two decades, scientists have started mapping the signaling networks triggered by GPCRs. However, it soon appeared that the system is very complex, which led to the publication of more than 320,000 scientific papers. Clearly, a human cannot take into account such massive sources of information. These papers represent a mine of information about both ontological knowledge and experimental results related to GPCRs, which have to be exploited in order to build signaling networks. The ABLISS project aims at the automatic building of GPCRs networks using automated deductive reasoning, allowing to integrate all available data. Therefore, we processed the automatic extraction of network information from the literature using Natural Language Processing (NLP). We mainly focused on the experimental results about GPCRs reported in the scientific papers, as so far there is no source gathering all these experimental results. We designed a relational database in order to make them available to the scientific community later. After introducing the more general objectives of the ABLISS project, we describe the formalism in detail. We then explain the NLP program using the finite state methods (Unitex graph cascades) we implemented and discuss the extracted facts obtained. Finally, we present the design of the relational database that stores the facts extracted from the selected papers. 
[ABSTRACT FROM AUTHOR]
- Published
- 2022
21. A Scalable Embedding Based Neural Network Method for Discovering Knowledge From Biomedical Literature.
- Author
-
Sang, Shengtian, Liu, Xiaoxia, Chen, Xiaoyu, and Zhao, Di
- Abstract
Nowadays, the amount of biomedical literature is growing at an explosive speed, and much useful knowledge is yet undiscovered in the literature. Classical information retrieval techniques allow access to explicit information from a given collection of information, but are not able to recognize implicit connections. Literature-based discovery (LBD) is characterized by uncovering hidden associations in non-interacting literature. It could significantly support scientific research by identifying new connections between biomedical entities. However, most of the existing approaches to LBD are not scalable and may not be sufficient to detect complex associations in non-directly-connected literature. In this article, we present a model which incorporates a biomedical knowledge graph, graph embedding, and deep learning methods for literature-based discovery. First, the relations between biomedical entities are extracted from biomedical abstracts and then a knowledge graph is constructed by using these obtained relations. Second, graph embedding techniques are applied to convert the entities and relations in the knowledge graph into a low-dimensional vector space. Third, a bidirectional Long Short-Term Memory (BLSTM) network is trained based on the entity associations represented by the pre-trained graph embeddings. Finally, the learned model is used for open and closed literature-based discovery tasks. The experimental results show that our method could not only effectively discover hidden associations between entities, but also reveal the corresponding mechanism of interactions. It suggests that incorporating knowledge graph and deep learning methods is an effective way of capturing the underlying complex associations between entities hidden in the literature. [ABSTRACT FROM AUTHOR]
- Published
- 2022
22. Computational Literature-based Discovery for Natural Products Research: Current State and Future Prospects
- Author
-
Andreas Lardos, Ahmad Aghaebrahimian, Anna Koroleva, Julia Sidorova, Evelyn Wolfram, Maria Anisimova, and Manuel Gil
- Subjects
literature-based discovery, natural products, text mining, knowledge graph, natural language processing, swanson, Computer applications to medicine. Medical informatics, R858-859.7
- Abstract
Literature-based discovery (LBD) mines existing literature in order to generate new hypotheses by finding links between previously disconnected pieces of knowledge. Although automated LBD systems are becoming widespread and indispensable in a wide variety of knowledge domains, little has been done to introduce LBD to the field of natural products research. Despite growing knowledge in the natural product domain, most of the accumulated information is found in detached data pools. LBD can facilitate better contextualization and exploitation of this wealth of data, for example by formulating new hypotheses for natural product research, especially in the context of drug discovery and development. Moreover, automated LBD systems promise to accelerate the currently tedious and expensive process of lead identification, optimization, and development. Focusing on natural product research, we briefly reflect the development of automated LBD and summarize its methods and principal data sources. In a thorough review of published use cases of LBD in the biomedical domain, we highlight the immense potential of this data mining approach for natural product research, especially in context with drug discovery or repurposing, mode of action, as well as drug or substance interactions. Most of the 91 natural product-related discoveries in our sample of reported use cases of LBD were addressed at a computer science audience. Therefore, it is the wider goal of this review to introduce automated LBD to researchers who work with natural products and to facilitate the dialogue between this community and the developers of automated LBD systems.
- Published
- 2022
23. Literature-based discovery approaches for evidence-based healthcare: a systematic review.
- Author
-
Cheerkoot-Jalim, Sudha and Khedo, Kavi Kumar
- Abstract
Purpose: Literature-Based Discovery (LBD) is a text mining technique used to generate novel hypotheses from vast amounts of literature sources, by identifying links between concepts from disparate sources. One of the main areas where it has been predominantly applied is the healthcare domain, whereby promising results, in the form of novel hypotheses, have been reported. The purpose of this work was to conduct a systematic literature review of recent publications on LBD in the healthcare domain in order to assess the trends in the approaches used and to identify issues and challenges for such systems. Methods: The review was conducted following the principles of the Kitchenham method. The selected studies have been scrutinized and the derived findings have been reported following the PRISMA guidelines. Results: The review results reveal useful information regarding the application areas, the data sources considered, the approaches used, the performance in terms of accuracy and reliability and future research challenges. The results of this review will be beneficial to LBD researchers and other stakeholders in the healthcare domain, by providing them with useful insights on the approaches to adopt, data sources to consider, evaluation model to use and challenges to reflect on. Conclusion: The synthesis of the results of this work has shed light on recent issues and challenges that drive new LBD models and provides avenues for their application in other diverse areas in the healthcare domain. To the best of our knowledge, no such recent review has been conducted. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
24. Storytelling with Signal Injection: Focusing Stories with Domain Knowledge
- Author
-
Rigsby, J. T., Barbará, Daniel, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Weikum, Gerhard, Series Editor, and Perner, Petra, editor
- Published
- 2018
- Full Text
- View/download PDF
25. A bibliometric analysis for global research trends on ectomycorrhizae over the past thirty years
- Author
-
Jiang, Xu and Yanbin, Liu
- Published
- 2018
- Full Text
- View/download PDF
26. Exploration of Shared Themes Between Food Security and Internet of Things Research Through Literature-Based Discovery
- Author
-
Cristian Mejia and Yuya Kajikawa
- Subjects
literature-based discovery ,citation networks ,text mining ,food security ,poverty alleviation ,SDGs ,Bibliography. Library science. Information resources - Abstract
This paper applied a literature-based discovery methodology utilizing citation networks and text mining in order to extract and represent shared terminologies found in disjoint academic literature on food security and the Internet of Things. The topic of food security includes research on improvements in nutrition, sustainable agriculture, and a plurality of other social challenges, while the Internet of Things refers to a collection of technologies from which solutions can be drawn. Academic articles on both topics were classified into subclusters, and their text contents were compared against each other to find shared terms. These terms formed a network from which clusters of related keywords could be identified, potentially easing the exploration of common themes. Thirteen transversal themes, including blockchain, healthcare, and air quality, were found. This method can be applied by policymakers and other stakeholders to understand how a given technology could contribute to solving a pressing social issue.
- Published
- 2021
- Full Text
- View/download PDF
27. PubMed-Scale Chemical Concept Embeddings Reconstruct Physical Protein Interaction Networks
- Author
-
Blaž Škrlj, Enja Kokalj, and Nada Lavrač
- Subjects
literature-based discovery ,knowledge graphs ,PubMed ,data-mining ,machine-learning ,representation learning ,Bibliography. Library science. Information resources - Abstract
PubMed is the largest resource of curated biomedical knowledge to date, entailing more than 25 million documents. Large quantities of novel literature prevent a single expert from keeping track of all potentially relevant papers, resulting in knowledge gaps. In this article, we present CHEMMESHNET, a newly developed PubMed-based network comprising more than 10,000,000 associations, constructed from expert-curated MeSH annotations of chemicals based on all currently available PubMed articles. By learning latent representations of concepts in the obtained network, we demonstrate in a proof of concept study that purely literature-based representations are sufficient for the reconstruction of a large part of the currently known network of physical, empirically determined protein–protein interactions. We demonstrate that simple linear embeddings of node pairs, when coupled with a neural network–based classifier, reliably reconstruct the existing collection of empirically confirmed protein–protein interactions. Furthermore, we demonstrate how pairs of learned representations can be used to prioritize potentially interesting novel interactions based on the common chemical context. Highly ranked interactions are qualitatively inspected in terms of potential complex formation at the structural level and represent potentially interesting new knowledge. We demonstrate that two protein–protein interactions, prioritized by structure-based approaches, also emerge as probable with regard to the trained machine-learning model.
- Published
- 2021
- Full Text
- View/download PDF
28. Building a Knowledge Graph Representing Causal Associations Between Risk Factors and Incidence of Breast Cancer.
- Author
-
DAOWD, Ali, BARRETT, Michael, ABIDI, Samina, and ABIDI, Syed Sibte Raza
- Abstract
This paper explores the use of semantic- and evidence-based biomedical knowledge to build the RiskExplorer knowledge graph that outlines causal associations between risk factors and chronic disease or cancers. The intent of this work is to offer an interactive knowledge synthesis platform to empower health-information-seeking individuals to learn about and mitigate modifiable risk factors. Our approach analyzes biomedical text (from PubMed abstracts), Semantic Medline database, evidence-based semantic associations, literature-based discovery, and graph database to discover associations between risk factors and breast cancer. Our methodological framework involves (a) identifying relevant literature on specified chronic diseases or cancers, (b) extracting semantic associations via a knowledge mining tool, (c) building a rich semantic graph by transforming semantic associations to nodes and edges, (d) applying frequency-based methods and using semantic edge properties to traverse the graph and identify meaningful multi-node NCD risk paths. Generated multi-node risk paths consist of a source node (representing the source risk factor), one or more intermediate nodes (representing biomedical phenotypes), a target node (representing a chronic disease or cancer), and edges between nodes representing meaningful semantic associations. The results demonstrate that our methodology is capable of generating biomedically valid knowledge related to causal risk and protective factors related to breast cancer. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
29. A Knowledge Graph of Mechanistic Associations Between COVID-19, Diabetes Mellitus and Kidney Diseases.
- Author
-
BARRETT, Michael, DAOWD, Ali, ABIDI, Syed Sibte Raza, and ABIDI, Samina
- Abstract
This paper proposes an automated knowledge synthesis and discovery framework to analyze published literature to identify and represent underlying mechanistic associations that aggravate chronic conditions due to COVID-19. We present a literature-based discovery approach that integrates text mining, knowledge graphs and ontologies to discover semantic associations between COVID-19 and chronic disease concepts that were represented as a complex disease knowledge network that can be queried to extract plausible mechanisms by which COVID-19 may be exacerbated by underlying chronic conditions. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
30. Relation path feature embedding based convolutional neural network method for drug discovery
- Author
-
Di Zhao, Jian Wang, Shengtian Sang, Hongfei Lin, Jiabin Wen, and Chunmei Yang
- Subjects
Literature-based discovery ,Drug discovery ,Knowledge graph ,Path ranking algorithm ,Convolutional neural network ,Computer applications to medicine. Medical informatics ,R858-859.7 - Abstract
Abstract Background Drug development is an expensive and time-consuming process. Literature-based discovery has played a critical role in drug development and may be a supplementary method to help scientists speed up the discovery of drugs. Methods Here, we propose a relation path features embedding based convolutional neural network model with an attention mechanism for drug discovery from literature, which we denote as PACNN. First, we use predications from biomedical abstracts to construct a biomedical knowledge graph, and then apply a path ranking algorithm to extract drug-disease relation path features on the biomedical knowledge graph. After that, we use these drug-disease relation features to train a convolutional neural network model which is combined with the attention mechanism. Finally, we employ the trained models to mine drugs for treating diseases. Results The experiment shows that the proposed model achieved promising results compared to several random walk algorithms. Conclusions In this paper, we propose a relation path features embedding based convolutional neural network with an attention mechanism for discovering potential drugs from literature. Our method could be an auxiliary method for drug discovery, which can speed up the discovery of new drugs for incurable diseases.
- Published
- 2019
- Full Text
- View/download PDF
31. Transducer Cascades for Biological Literature-Based Discovery
- Author
-
Denis Maurel, Sandy Chéry, Nicole Bidoit, Philippe Chatalic, Aziza Filali, Christine Froidevaux, and Anne Poupon
- Subjects
literature-based discovery ,Finite State Methods ,transducer cascades ,Unitex ,database design ,automated deductive reasoning ,Information technology ,T58.5-58.64 - Abstract
G protein-coupled receptors (GPCRs) control the response of cells to many signals, and as such, are involved in most cellular processes. As membrane receptors, they are accessible at the surface of the cell. GPCRs are also the largest family of membrane receptors, with more than 800 representatives in mammal genomes. For this reason, they are ideal targets for drugs. Although about one third of approved drugs target GPCRs, only about 16% of GPCRs are targeted by drugs. One of the difficulties comes from the lack of knowledge on the intra-cellular events triggered by these molecules. In the last two decades, scientists have started mapping the signaling networks triggered by GPCRs. However, it soon appeared that the system is very complex, which led to the publication of more than 320,000 scientific papers. Clearly, a human cannot take into account such massive sources of information. These papers represent a mine of information about both ontological knowledge and experimental results related to GPCRs, which have to be exploited in order to build signaling networks. The ABLISS project aims at the automatic building of GPCR networks using automated deductive reasoning, allowing all available data to be integrated. Therefore, we carried out the automatic extraction of network information from the literature using Natural Language Processing (NLP). We mainly focused on the experimental results about GPCRs reported in the scientific papers, as so far there is no source gathering all these experimental results. We designed a relational database in order to make them available to the scientific community later. After introducing the more general objectives of the ABLISS project, we describe the formalism in detail. We then explain the NLP program using the finite state methods (Unitex graph cascades) we implemented and discuss the extracted facts obtained. Finally, we present the design of the relational database that stores the facts extracted from the selected papers.
- Published
- 2022
- Full Text
- View/download PDF
32. Investigating the role of interleukin-1 beta and glutamate in inflammatory bowel disease and epilepsy using discovery browsing
- Author
-
Thomas C. Rindflesch, Catherine L. Blake, Michael J. Cairelli, Marcelo Fiszman, Caroline J. Zeiss, and Halil Kilicoglu
- Subjects
Literature-based discovery ,Discovery browsing ,Epilepsy ,Inflammatory bowel disease ,Interleukin-1 beta ,Glutamate ,Computer applications to medicine. Medical informatics ,R858-859.7 - Abstract
Abstract Background Structured electronic health records are a rich resource for identifying novel correlations, such as co-morbidities and adverse drug reactions. For drug development and better understanding of biomedical phenomena, such correlations need to be supported by viable hypotheses about the mechanisms involved, which can then form the basis of experimental investigations. Methods In this study, we demonstrate the use of discovery browsing, a literature-based discovery method, to generate plausible hypotheses elucidating correlations identified from structured clinical data. The method is supported by the Semantic MEDLINE web application, which pinpoints interesting concepts and relevant MEDLINE citations, which are used to build a coherent hypothesis. Results Discovery browsing revealed a plausible explanation for the correlation between epilepsy and inflammatory bowel disease that was found in an earlier population study. The generated hypothesis involves interleukin-1 beta (IL-1 beta) and glutamate, and suggests that the influence of IL-1 beta on glutamate levels is involved in the etiology of both epilepsy and inflammatory bowel disease. Conclusions The approach presented in this paper can supplement population-based correlation studies by enabling the scientist to identify literature that may justify the novel patterns identified in such studies and can underpin basic biomedical research that can lead to improved treatments and better healthcare outcomes.
- Published
- 2018
- Full Text
- View/download PDF
33. Scientometric analysis and knowledge mapping of literature-based discovery (1986–2020).
- Author
-
Kastrin, Andrej and Hristovski, Dimitar
- Abstract
Literature-based discovery (LBD) aims to discover valuable latent relationships between disparate sets of literatures. This paper presents the first inclusive scientometric overview of LBD research. We utilize a comprehensive scientometric approach incorporating CiteSpace to systematically analyze the literature on LBD from the last four decades (1986–2020). After manual cleaning, we have retrieved a total of 409 documents from six bibliographic databases and two preprint servers. The 35-year history of LBD could be partitioned into three phases according to the published papers per year: incubation (1986–2003), developing (2004–2008), and mature phase (2009–2020). The annual production of publications follows Price's law. The co-authorship network exhibits many subnetworks, indicating that LBD research is composed of many small and medium-sized groups with little collaboration among them. Science mapping reveals that mainstream research in LBD has shifted from baseline co-occurrence approaches to semantic-based methods at the beginning of the new millennium. In the last decade, we can observe the leaning of LBD towards modern network science ideas. In an applied sense, LBD is increasingly used in predicting adverse drug reactions and drug repurposing. Besides theoretical considerations, the researchers have put a lot of effort into the development of Web-based LBD applications. Nowadays, LBD is becoming increasingly interdisciplinary and involves methods from information science, scientometrics, and machine learning. Unfortunately, LBD is mainly limited to the biomedical domain. The cascading citation expansion announces deep learning and explainable artificial intelligence as emerging topics in LBD. The results indicate that LBD is still growing and evolving. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
34. Finding conflicting statements in the biomedical literature
- Author
-
Sarafraz, Farzaneh and Nenadic, Goran
- Subjects
006.312 ,Text Mining ,Natural Language Processing ,Information Extraction ,Biomedical Text Mining ,Bioinformatics ,Negation ,Contradiction ,Contrast ,Literature-based discovery ,Molecular Event Extraction - Abstract
The main archive of life sciences literature currently contains more than 18,000,000 references, and it is virtually impossible for any human to stay up-to-date with this large number of papers, even in a specific sub-domain. Not every fact that is reported in the literature is novel and distinct. Scientists report repeat experiments, or refer to previous findings. Given the large number of publications, it is not surprising that information on certain topics is repeated over a number of publications. From consensus to contradiction, there are all shades of agreement between the claimed facts in the literature, and considering the volume of the corpus, conflicting findings are not unlikely. Finding such claims is particularly interesting for scientists, as they can present opportunities for knowledge consolidation and future investigations. In this thesis we present a method to extract and contextualise statements about molecular events as expressed in the biomedical literature, and to find those that potentially conflict with each other. The approach uses a system that detects event negations and speculation, and combines those with contextual features (e.g. type of event, species, and anatomical location) to build a representational model for establishing relations between different biological events, including relations concerning conflicts. In the detection of negations and speculations, rich lexical, syntactic, and semantic features have been exploited, including the syntactic command relation. Different parts of the proposed method have been evaluated in the context of the BioNLP'09 challenge. The average F-measures for event negation and speculation detection were 63% (with precision of 88%) and 48% (with precision of 64%) respectively.
An analysis of a set of 50 extracted event pairs identified as potentially conflicting revealed that 32 of them showed some degree of conflict (64%); 10 event pairs (20%) needed a more complex biological interpretation to decide whether there was a conflict. We also provide an open source integrated text mining framework for extracting events and their context on a large-scale basis using a pipeline of tools that are available or have been developed as part of this research, along with 72,314 potentially conflicting molecular event pairs that have been generated by mining the entire body of accessible biomedical literature. We conclude that, whilst automated conflict mining would need more comprehensive context extraction, it is feasible to provide a support environment for biologists to browse potential conflicting statements and facilitate data and knowledge consolidation.
- Published
- 2012
35. Literature-based discovery: a classical approach of information science
- Author
-
Wang Lin, Lin Wei, and Zhao Miaomiao
- Subjects
literature-based discovery ,knowledge discovery ,arrowsmith ,swanson ,d.r ,Information technology ,T58.5-58.64 - Abstract
The implicit links between documents are often of great significance to scientific discovery. This paper briefly reviews the background, the basic ideas, and the related research tool, the Arrowsmith system, of the literature-based discovery method founded by Professor Don R. Swanson of the University of Chicago. It represents a classical research approach in information science. It is believed that this method has opened up a broader research field for information retrieval and knowledge discovery, provided new research ideas for medical informatics, and injected new vitality into information science.
- Published
- 2022
- Full Text
- View/download PDF
36. Bisociative Literature-Based Discovery: Lessons Learned and New Word Embedding Approach.
- Author
-
Lavrač, Nada, Martinc, Matej, Pollak, Senja, Pompe Novak, Maruša, and Cestnik, Bojan
- Subjects
*LEARNING by discovery , *NEW words , *SCIENTIFIC literature , *TECHNICAL literature - Abstract
The field of bisociative literature-based discovery aims at mining scientific literature to reveal as yet uncovered connections between different fields of specialization. This paper outlines several outlier-based literature mining approaches to bridging term detection and the lessons learned from selected biomedical literature-based discovery applications. The paper also addresses new prospects in bisociative literature-based discovery, proposing an advanced embeddings-based technology for cross-domain literature mining. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
37. A Systematic Review on Literature-based Discovery: General Overview, Methodology, & Statistical Analysis.
- Author
-
THILAKARATNE, MENASHA, FALKNER, KATRINA, and ATAPATTU, THUSHARI
- Subjects
*SCIENTIFIC literature , *STATISTICS , *RESEARCH & development , *KNOWLEDGE acquisition (Expert systems) , *TEXT mining , *WORKFLOW - Abstract
The vast nature of scientific publications brings out the importance of Literature-Based Discovery (LBD) research that is highly beneficial to accelerate knowledge acquisition and the research development process. LBD is a knowledge discovery workflow that automatically detects significant, implicit knowledge associations hidden in fragmented knowledge areas by analysing existing scientific literature. Therefore, the LBD output not only assists in formulating scientifically sensible, novel research hypotheses but also encourages the development of cross-disciplinary research. In this systematic review, we provide an in-depth analysis of the computational techniques used in the LBD process using a novel, up-to-date, and detailed classification. Moreover, we also summarise the key milestones of the discipline through a timeline of topics. To provide a general overview of the discipline, the review outlines LBD validation checks, major LBD tools, application areas, domains, and generalisability of LBD methodologies. We also outline the insights gathered through our statistical analysis that capture the trends in LBD literature. To conclude, we discuss the prevailing research deficiencies in the discipline by highlighting the challenges and opportunities of future LBD research. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
38. A systematic review on literature-based discovery workflow
- Author
-
Menasha Thilakaratne, Katrina Falkner, and Thushari Atapattu
- Subjects
Literature-Based Discovery ,Literature Mining ,Knowledge Discovery ,Systematic Review ,Electronic computers. Computer science ,QA75.5-76.95 - Abstract
As scientific publication rates increase, knowledge acquisition and the research development process have become more complex and time-consuming. Literature-Based Discovery (LBD), supporting automated knowledge discovery, helps facilitate this process by eliciting novel knowledge by analysing existing scientific literature. This systematic review provides a comprehensive overview of the LBD workflow by answering nine research questions related to the major components of the LBD workflow (i.e., input, process, output, and evaluation). With regard to the input component, we discuss the data types and data sources used in the literature. The process component presents filtering techniques, ranking/thresholding techniques, domains, generalisability levels, and resources. Subsequently, the output component focuses on the visualisation techniques used in the LBD discipline. As for the evaluation component, we outline the evaluation techniques, their generalisability, and the quantitative measures used to validate results. To conclude, we summarise the findings of the review for each component by highlighting the possible future research directions.
- Published
- 2019
- Full Text
- View/download PDF
39. SemaTyP: a knowledge graph based literature mining method for drug discovery
- Author
-
Shengtian Sang, Zhihao Yang, Lei Wang, Xiaoxia Liu, Hongfei Lin, and Jian Wang
- Subjects
Literature-based discovery ,Knowledge graph ,Drug discovery ,Literature mining ,Computer applications to medicine. Medical informatics ,R858-859.7 ,Biology (General) ,QH301-705.5 - Abstract
Abstract Background Drug discovery is the process through which potential new medicines are identified. High-throughput screening and computer-aided drug discovery/design are currently the two main drug discovery methods, and they have successfully discovered a series of drugs. However, development of new drugs is still an extremely time-consuming and expensive process. Biomedical literature contains important clues for the identification of potential treatments. It could support experts in biomedicine on their way towards new discoveries. Methods Here, we propose a biomedical knowledge graph-based drug discovery method called SemaTyP, which discovers candidate drugs for diseases by mining published biomedical literature. We first construct a biomedical knowledge graph with the relations extracted from biomedical abstracts. Then a logistic regression model is trained by learning the semantic types of paths of known drug therapies existing in the biomedical knowledge graph. Finally, the learned model is used to discover drug therapies for new diseases. Results The experimental results show that our method could not only effectively discover new drug therapies for new diseases, but could also provide the potential mechanism of action of the candidate drugs. Conclusions In this paper we propose a novel knowledge graph based literature mining method for drug discovery. It could be a supplementary method for current drug discovery methods.
- Published
- 2018
- Full Text
- View/download PDF
40. Neural networks for link prediction in realistic biomedical graphs: a multi-dimensional evaluation of graph embedding-based approaches
- Author
-
Gamal Crichton, Yufan Guo, Sampo Pyysalo, and Anna Korhonen
- Subjects
Link prediction ,Neural networks ,Data mining ,Literature-based discovery ,Drug-target interaction ,Computer applications to medicine. Medical informatics ,R858-859.7 ,Biology (General) ,QH301-705.5 - Abstract
Abstract Background Link prediction in biomedical graphs has several important applications including predicting Drug-Target Interactions (DTI), Protein-Protein Interaction (PPI) prediction and Literature-Based Discovery (LBD). It can be done using a classifier to output the probability of link formation between nodes. Recently several works have used neural networks to create node representations which allow rich inputs to neural classifiers. Preliminary works were done on this and report promising results. However, they did not use realistic settings like time-slicing, evaluate performances with comprehensive metrics or explain when or why neural network methods outperform. We investigated how inputs from four node representation algorithms affect performance of a neural link predictor on random- and time-sliced biomedical graphs of real-world sizes (∼ 6 million edges) containing information relevant to DTI, PPI and LBD. We compared the performance of the neural link predictor to those of established baselines and report performance across five metrics. Results In random- and time-sliced experiments when the neural network methods were able to learn good node representations and there was a negligible number of disconnected nodes, those approaches outperformed the baselines. In the smallest graph (∼ 15,000 edges) and in larger graphs with approximately 14% disconnected nodes, baselines such as Common Neighbours proved a justifiable choice for link prediction. At low recall levels (∼ 0.3) the approaches were mostly equal, but at higher recall levels across all nodes and average performance at individual nodes, neural network approaches were superior. Analysis showed that neural network methods performed well on links between nodes with no previous common neighbours; potentially the most interesting links. Additionally, while neural network methods benefit from large amounts of data, they require considerable amounts of computational resources to utilise them.
Conclusions Our results indicate that when there is enough data for the neural network methods to use and there is a negligible number of disconnected nodes, those approaches outperform the baselines. At low recall levels the approaches are mostly equal, but at higher recall levels and in average performance at individual nodes, neural network approaches are superior. Performance at nodes without common neighbours, which indicate more unexpected and perhaps more useful links, accounts for this.
- Published
- 2018
- Full Text
- View/download PDF
41. Rediscovering Don Swanson: The Past, Present and Future of Literature-based Discovery
- Author
-
Smalheiser Neil R.
- Subjects
literature-based discovery ,biography ,text mining ,knowledge discovery in databases ,implicit information ,information science ,Information technology ,T58.5-58.64 ,Electronic computers. Computer science ,QA75.5-76.95 - Abstract
The late Don R. Swanson was well appreciated during his lifetime as Dean of the Graduate Library School at University of Chicago, as winner of the American Society for Information Science Award of Merit for 2000, and as author of many seminal articles. In this informal essay, I will give my personal perspective on Don’s contributions to science, and outline some current and future directions in literature-based discovery that are rooted in concepts that he developed.
- Published
- 2017
- Full Text
- View/download PDF
42. Empowering Bridging Term Discovery for Cross-Domain Literature Mining in the TextFlows Platform
- Author
-
Perovšek, Matic, Juršič, Matjaž, Cestnik, Bojan, Lavrač, Nada, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, and Holzinger, Andreas, editor
- Published
- 2016
- Full Text
- View/download PDF
43. Extending the boundaries of cancer therapeutic complexity with literature text mining.
- Author
-
Niezni, Danna, Taub-Tabib, Hillel, Harris, Yuval, Sason, Hagit, Amrusi, Yakir, Meron-Azagury, Dana, Avrashami, Maytal, Launer-Wachs, Shaked, Borchardt, Jon, Kusold, M., Tiktinsky, Aryeh, Hope, Tom, Goldberg, Yoav, and Shamay, Yosi
- Abstract
Drug combination therapy is a main pillar of cancer therapy. As the number of possible drug candidates for combinations grows, the development of optimal high complexity combination therapies (involving 4 or more drugs per treatment) such as RCHOP-I and FOLFIRINOX becomes increasingly challenging due to combinatorial explosion. In this paper, we propose a text mining (TM) based tool and workflow for rapid generation of high complexity combination treatments (HCCT) in order to extend the boundaries of complexity in cancer treatments. Our primary objectives were: (1) Characterize the existing limitations in combination therapy; (2) Develop and introduce the Plan Builder (PB) to utilize existing literature for drug combination effectively; (3) Evaluate PB's potential in accelerating the development of HCCT plans. Our results demonstrate that researchers and experts using PB are able to create HCCT plans at much greater speed and quality compared to conventional methods. By releasing PB, we hope to enable more researchers to engage with HCCT planning and demonstrate its clinical efficacy. • In cancer therapy, complex drug combination is crucial and might be the only option. • The current complexity space was studied; the current clinical limit is six drugs. • A novel text data mining tool was developed to generate complex treatment plans. • The tool's performance was validated and compared to the standard of care. • A comparison of the tool's plans and human-designed plans shows it can outperform humans. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
44. Hypothesis Discovery Exploiting Closed Chains of Relations
- Author
-
Seki, Kazuhiro, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Hameurlain, Abdelkader, editor, Küng, Josef, editor, and Wagner, Roland, editor
- Published
- 2015
- Full Text
- View/download PDF
45. Predicting Future Links Between Disjoint Research Areas Using Heterogeneous Bibliographic Information Network
- Author
-
Sebastian, Yakub, Siew, Eu-Gene, Orimaye, Sylvester Olubolu, Goebel, Randy, Series editor, Tanaka, Yuzuru, Series editor, Wahlster, Wolfgang, Series editor, Cao, Tru, editor, Lim, Ee-Peng, editor, Zhou, Zhi-Hua, editor, Ho, Tu-Bao, editor, Cheung, David, editor, and Motoda, Hiroshi, editor
- Published
- 2015
- Full Text
- View/download PDF
46. Link Prediction on the Semantic MEDLINE Network: An Approach to Literature-Based Discovery
- Author
-
Kastrin, Andrej, Rindflesch, Thomas C., Hristovski, Dimitar, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Kobsa, Alfred, Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Goebel, Randy, Series editor, Tanaka, Yuzuru, Series editor, Wahlster, Wolfgang, Series editor, Siekmann, Jörg, Series editor, Džeroski, Sašo, editor, Panov, Panče, editor, Kocev, Dragi, editor, and Todorovski, Ljupčo, editor
- Published
- 2014
- Full Text
- View/download PDF
47. Mining Biomedical Literature and Ontologies for Drug Repositioning Discovery
- Author
-
Wei, Chih-Ping, Chen, Kuei-An, Chen, Lien-Chin, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Kobsa, Alfred, editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Weikum, Gerhard, editor, Goebel, Randy, editor, Tanaka, Yuzuru, editor, Wahlster, Wolfgang, editor, Siekmann, Jörg, editor, Tseng, Vincent S., editor, Ho, Tu Bao, editor, Zhou, Zhi-Hua, editor, Chen, Arbee L. P., editor, and Kao, Hung-Yu, editor
- Published
- 2014
- Full Text
- View/download PDF
48. Impact factor correlations with Scimago Journal Rank, Source Normalized Impact per Paper, Eigenfactor Score, and the CiteScore in Radiology, Nuclear Medicine & Medical Imaging journals.
- Author
-
Villaseñor-Almaraz, Moises, Islas-Serrano, Juan, Murata, Chiharu, and Roldan-Valadez, Ernesto
- Abstract
Introduction: In the last decade, several journals' editors decided to publish alternative bibliometric indices parallel to the impact factor (IF): Scimago Journal Rank (SJR), Source Normalized Impact per Paper (SNIP), Eigenfactor Score (ES) and CiteScore™ (CiteScore); however, there is scarce information about the correlations among them. In this study, we aimed to evaluate the associations between these bibliometrics in the Radiology, Nuclear Medicine & Medical Imaging category of the Web of Knowledge. We hypothesized the IF did not show the best correlation with other metrics. Methods: Retrospective study. We used bibliometrics recorded from the 2017 publicly available versions of the Journal Citation Reports (JCR), SJR (www.scimagojr.com), SNIP (www.journalindicators.com), and CiteScore (www.scopus.com); we also included the Total Cites. We measured the correlations using the Spearman correlation coefficients (R_S) for all combinations of the bivariate pair, performed pairwise comparisons of the R_S values, and calculated the coefficients of determination. We also tested the statistical significance of the difference in r coefficients between groups. All analyses were conducted with the JMP Pro software. Results: The strongest bivariate correlations were ES↔Total Cites (R_S = 0.968, p < 0.001, R² = 0.937) and CiteScore↔SJR (R_S = 0.911, p < 0.001, R² = 0.829). Of 105 possible pairwise comparisons, 38 had a p value > 0.050, which would suggest interchangeability among bivariate correlations. Conclusions: Our findings support our hypothesis that the IF does not show the best correlation with other metrics. Radiologists, interventional radiologists, or nuclear medicine doctors should have a clear understanding of the associations among the journal's bibliometrics for their decision-making during the manuscript submission phase. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
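The abstract of result 48 relies on Spearman rank correlation (R_S) and its square as a coefficient of determination. As a minimal sketch of that calculation, assuming made-up metric values rather than the study's data (the authors used JMP Pro, not this code), the coefficient can be computed as the Pearson correlation of average ranks:

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values of two journal metrics for five journals.
metric_a = [1.2, 3.4, 2.2, 5.0, 4.1]
metric_b = [0.5, 1.9, 1.1, 2.0, 2.6]

r_s = spearman(metric_a, metric_b)
r_squared = r_s ** 2
print(f"R_S = {r_s:.3f}, R^2 = {r_squared:.3f}")  # R_S = 0.900, R^2 = 0.810
```

Squaring the reported R_S = 0.968 in the same way gives approximately 0.937, which matches the R² stated in the abstract.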
49. Associating biological context with protein-protein interactions through text mining at PubMed scale.
- Author
-
Sosa, Daniel N., Hintzen, Rogier, Xiong, Betty, de Giorgio, Alex, Fauqueur, Julien, Davies, Mark, Lever, Jake, and Altman, Russ B.
- Abstract
Inferring knowledge from known relationships between drugs, proteins, genes, and diseases has great potential for clinical impact, such as predicting which existing drugs could be repurposed to treat rare diseases. Incorporating key biological context such as cell type or tissue of action into representations of extracted biomedical knowledge is essential for principled pharmacological discovery. Existing global, literature-derived knowledge graphs of interactions between drugs, proteins, genes, and diseases lack this essential information. In this study, we frame the task of associating biological context with protein-protein interactions extracted from text as a classification task using syntactic, semantic, and novel meta-discourse features. We introduce the Insider corpora, which are automatically generated PubMed-scale corpora for training classifiers for the context association task. These corpora are created by searching for precise syntactic cues of cell type and tissue relevancy to extracted regulatory relations. We report F1 scores of 0.955 and 0.862 for identifying relevant cell types and tissues, respectively, for our identified relations. By classifying with this framework, we demonstrate that the problem of context association can be addressed using intuitive, interpretable features. We demonstrate the potential of this approach to enrich text-derived knowledge bases with biological detail by incorporating cell type context into a protein-protein network for dengue fever. • We present a new feature-based classifier for text-mined context-PPI association. • Context relevancy to main findings leads to interpretable classification. • We develop precise datasets based on linguistic cues for the task. • Models trained with cell type contexts can be transferred to tissue contexts. • We demonstrate our approach to augment a knowledge base of dengue PPI. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
50. Graph embedding-based link prediction for literature-based discovery in Alzheimer's Disease.
- Author
-
Pu, Yiyuan, Beck, Daniel, and Verspoor, Karin
- Abstract
We explore the framing of literature-based discovery (LBD) as link prediction and graph embedding learning, with Alzheimer's Disease (AD) as our focus disease context. The key link prediction setting of prediction window length is specifically examined in the context of a time-sliced evaluation methodology. We propose a four-stage approach to explore literature-based discovery for Alzheimer's Disease, creating and analyzing a knowledge graph tailored to the AD context, and predicting and evaluating new knowledge based on time-sliced link prediction. The first stage is to collect an AD-specific corpus. The second stage involves constructing an AD knowledge graph with identified AD-specific concepts and relations from the corpus. In the third stage, 20 pairs of training and testing datasets are constructed with the time-slicing methodology. Finally, we infer new knowledge with graph embedding-based link prediction methods. We compare different link prediction methods in this context. The impact of limiting prediction evaluation of LBD models in the context of short-term and longer-term knowledge evolution for Alzheimer's Disease is assessed. We constructed an AD corpus of over 16k papers published in 1977–2021, and automatically annotated it with concepts and relations covering 11 AD-specific semantic entity types. The knowledge graph of Alzheimer's Disease derived from this resource consisted of ~11k nodes and ~394k edges, among which 34% were genotype-phenotype relationships, 57% were genotype-genotype relationships, and 9% were phenotype-phenotype relationships. A Structural Deep Network Embedding (SDNE) model consistently showed the best performance in terms of returning the most confident set of link predictions as time progresses over 20 years. A huge improvement in model performance was observed when changing the link prediction evaluation setting to consider a more distant future, reflecting the time required for knowledge accumulation. Neural network graph-embedding link prediction methods show promise for the literature-based discovery context, although the prediction setting is extremely challenging, with graph densities of less than 1%. Varying prediction window length on the time-sliced evaluation methodology leads to hugely different results and interpretations of LBD studies. Our approach can be generalized to enable knowledge discovery for other diseases. Code, AD ontology, and data are available at https://github.com/READ-BioMed/readbiomed-lbd. • We adapted Literature-based Discovery to the context of Alzheimer's Disease. • The first large-scale resource for disease link prediction with time-slicing is released. • A large-scale study of employing graph embedding-based methods is presented. • Interpretation impacts of varying prediction window lengths in time-slicing are explored. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
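The time-sliced evaluation described in the abstract of result 50 can be illustrated with a small sketch. This is not the authors' code (theirs is at the repository linked in the abstract); the function names, the toy edge list, and the rule that test links must connect nodes already present in the training graph are all assumptions made for illustration. Edges dated up to a cutoff year form the training graph, and previously unseen edges that first appear within the prediction window form the test set:

```python
def time_slice(edges, cutoff_year, window):
    """Split (u, v, year) edges into a training set and a test set.

    Training: edges dated up to and including cutoff_year.
    Test: edges first seen in (cutoff_year, cutoff_year + window] that are
    absent from training and whose endpoints both occur in the training
    graph (an assumption of this sketch).
    """
    train = {frozenset((u, v)) for u, v, year in edges if year <= cutoff_year}
    train_nodes = set().union(*train) if train else set()
    future = {
        frozenset((u, v))
        for u, v, year in edges
        if cutoff_year < year <= cutoff_year + window
    }
    test = {edge for edge in future - train if edge <= train_nodes}
    return train, test


def density(num_nodes, num_edges):
    """Density of an undirected simple graph: realized over possible edges."""
    return 2 * num_edges / (num_nodes * (num_nodes - 1))


# Hypothetical concept pairs with the year each relation was first reported.
edges = [
    ("A", "B", 2000), ("B", "C", 2003), ("C", "D", 2004),
    ("A", "C", 2006), ("C", "E", 2012), ("A", "D", 2013),
]

train, short_test = time_slice(edges, cutoff_year=2005, window=5)
_, long_test = time_slice(edges, cutoff_year=2005, window=10)

print(len(train), len(short_test), len(long_test))  # 3 1 2
print(f"{density(11_000, 394_000):.4f}")  # 0.0065
```

In this toy example the longer window doubles the number of test links, which mirrors the abstract's point that window length changes what is evaluated. The density line uses the abstract's approximate figures (about 11k nodes and 394k edges) and gives roughly 0.65%, in line with the stated densities of under 1%.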