Quantifying risk associated with clinical trial termination: A text mining approach
- Source :
- Information Processing & Management. 56:516-525
- Publication Year :
- 2019
- Publisher :
- Elsevier BV, 2019.
Abstract
- Clinical trials that terminate prematurely without reaching conclusions raise financial, ethical, and scientific concerns. Scientific studies in all disciplines are initiated with extensive planning and deliberation, often by a team of highly trained scientists. To assure that the quality, integrity, and feasibility of funded research projects meet the required standards, research-funding agencies such as the National Institutes of Health and the National Science Foundation pass proposed research plans through a rigorous peer review process before making funding decisions. Yet some study proposals pass all the scrutiny of the scientific peer review process, and the proposed investigations are still terminated before yielding results. This study demonstrates an algorithm that quantifies the risk of a study being terminated, based on an analysis of patterns in the language used to describe the study prior to its implementation. To quantify the risk of termination, we use data from the ClinicalTrials.gov repository, from which we extracted structured data that flagged study characteristics, and unstructured text data that described the study goals, objectives, and methods in a standard narrative form. We propose an algorithm to extract, from this unstructured text data, the distinctive words that are most frequently used to describe trials that were completed successfully vs. those that were terminated. Binary variables indicating the presence of these distinctive words in trial proposals are used as input to a random forest, along with standard structured data fields. In this paper, we demonstrate that this combined modeling approach yields robust predictive probabilities in terms of both sensitivity (0.56) and specificity (0.71), relative to a model that utilizes the structured data alone (sensitivity = 0.03, specificity = 0.97). These predictive probabilities can be applied to make judgements about a trial's feasibility using information that is available before any funding is granted.
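The abstract does not give the algorithm's details, so the following is only a minimal illustrative sketch of the general idea, not the authors' method. It assumes that "distinctive" words can be ranked by the difference in document frequency between completed and terminated trial descriptions, turns their presence into binary indicators (the kind of input a random forest could take alongside structured fields), and computes sensitivity and specificity from predicted labels. All function names, the scoring rule, and the toy texts are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Return the set of lowercase alphabetic tokens in a description."""
    return set(re.findall(r"[a-z]+", text.lower()))


def doc_freq(docs):
    """Fraction of documents in which each word appears at least once."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return {word: n / len(docs) for word, n in counts.items()}


def distinctive_words(completed, terminated, top_k):
    """Rank words by absolute difference in document frequency (assumed rule)."""
    freq_c, freq_t = doc_freq(completed), doc_freq(terminated)
    vocab = set(freq_c) | set(freq_t)
    score = {w: freq_t.get(w, 0.0) - freq_c.get(w, 0.0) for w in vocab}
    # Ties are broken alphabetically so the ranking is deterministic.
    ranked = sorted(vocab, key=lambda w: (-abs(score[w]), w))
    return ranked[:top_k]


def binary_features(text, words):
    """One 0/1 indicator per distinctive word: is it present in the text?"""
    tokens = tokenize(text)
    return [1 if w in tokens else 0 for w in words]


def sensitivity_specificity(y_true, y_pred):
    """Label 1 = terminated, 0 = completed; assumes both classes occur."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Toy, made-up descriptions (not data from the study).
completed = [
    "randomized placebo controlled study of drug",
    "randomized study of vaccine",
]
terminated = [
    "pilot study of rare disease recruitment",
    "pilot recruitment feasibility study",
]
words = distinctive_words(completed, terminated, top_k=3)
features = binary_features("A pilot randomized trial", words)
```

In a fuller pipeline, the indicator vectors would be concatenated with the structured fields and passed to a random forest classifier; that step is omitted here to keep the sketch dependency-free.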
- Subjects :
- Scrutiny
Computer science
media_common.quotation_subject
Foundation (evidence)
Library and Information Sciences
Management Science and Operations Research
Deliberation
Data science
Computer Science Applications
Random forest
Clinical trial
03 medical and health sciences
0302 clinical medicine
Media Technology
Narrative
Quality (business)
030212 general & internal medicine
030217 neurology & neurosurgery
Information Systems
media_common
Details
- ISSN :
- 03064573
- Volume :
- 56
- Database :
- OpenAIRE
- Journal :
- Information Processing & Management
- Accession number :
- edsair.doi...........35dedbd6774d0e858fa62bddfeb812be