3,891 results
Search Results
2. A Graph-Based Topic Modeling Approach to Detection of Irrelevant Citations.
- Author
-
Pham, Phu, Le, Hieu, Tam, Nguyen Thanh, and Tran, Quang-Dieu
- Subjects
NATURAL language processing ,DEEP learning ,MACHINE learning ,INFORMATION retrieval - Abstract
In recent years, academic paper influence analysis has been widely studied for its potential applications in scientometrics and information retrieval. By identifying the academic influence of papers, authors, etc., we can directly help researchers find relevant work: recommended candidate papers that are not only highly relevant to their desired research topics but also highly attended to by the research community within those topics. The rapid development of academic networks such as Google Scholar, ResearchGate, and CiteSeerX has significantly boosted the number of new papers published annually and has strengthened borderless cooperation between researchers interested in the same topics. However, these academic networks still lack the capability to point researchers to the most influential papers, and they largely ignore irrelevant or only partially relevant papers that do not match a researcher's current topics of interest. Moreover, the distribution of topics within academic papers varies widely, and the main topics are difficult to extract, so researchers struggle to find appropriate, high-quality reference resources. To overcome these limitations, in this paper we propose a novel approach to paper influence analysis based on content and citation relationships within the bibliographic network. To effectively extract topic-based relevance from papers, we integrate graph-based citation relationship analysis with topic modeling to automatically learn the distributions of keyword-labeled topics in an unsupervised fashion; we name this approach TopCite. We then use the constructed graph-based paper–topic structure to identify relevance levels between papers.
With the identified relevance levels between papers, we can improve the accuracy of other bibliographic network mining tasks, such as paper similarity measurement and recommendation. Extensive experiments on the real-world AMiner bibliographic dataset demonstrate the effectiveness of our proposed ideas. [ABSTRACT FROM AUTHOR]
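The abstract does not give TopCite's actual scoring, but the underlying idea (a citation is suspect when the citing and cited papers have dissimilar topic distributions) can be sketched loosely. Everything below is invented for illustration: the function names, the toy topic vectors, and the use of plain cosine similarity as the relevance measure.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length topic-distribution vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def citation_relevance(topics, citations):
    """Score each citation edge (citing, cited) by topical similarity;
    low scores flag potentially irrelevant citations."""
    return {(a, b): cosine(topics[a], topics[b]) for a, b in citations}

topics = {
    "p1": [0.9, 0.1, 0.0],   # mostly topic 0
    "p2": [0.8, 0.2, 0.0],   # similar mix: likely a relevant citation
    "p3": [0.0, 0.1, 0.9],   # different topic: suspect citation
}
scores = citation_relevance(topics, [("p1", "p2"), ("p1", "p3")])
```

In the system described, the topic vectors would come from the unsupervised topic model rather than being hand-set as here.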
- Published
- 2023
- Full Text
- View/download PDF
3. A Review of Machine Learning Algorithms for Text Classification
- Author
-
Li, Ruiguang, Liu, Ming, Xu, Dawei, Gao, Jiaqi, Wu, Fudong, Zhu, Liehuang, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Lu, Wei, editor, Zhang, Yuqing, editor, Wen, Weiping, editor, Yan, Hanbing, editor, and Li, Chao, editor
- Published
- 2022
- Full Text
- View/download PDF
4. Heurísticas para Data Augmentation en NLP: Aplicación a Revisiones de Artículos Científicos/Heuristics for Data Augmentation in NLP: Application to scientific paper reviews
- Author
-
Acosta, Rubén Sánchez, Villegas, Claudio Meneses, and Norambuena, Brian Keith
- Published
- 2019
- Full Text
- View/download PDF
5. A Machine Learning Model to Predict Citation Counts of Scientific Papers in Otology Field.
- Author
-
Alohali, Yousef A., Fayed, Mahmoud S., Mesallam, Tamer, Abdelsamad, Yassin, Almuhawas, Fida, and Hagr, Abdulrahman
- Subjects
DECISION trees ,SERIAL publications ,NATURAL language processing ,BIBLIOMETRICS ,MACHINE learning ,REGRESSION analysis ,RANDOM forest algorithms ,CITATION analysis ,DESCRIPTIVE statistics ,PREDICTION models ,ARTIFICIAL neural networks ,MEDICAL research ,MEDICAL specialties & specialists ,ALGORITHMS - Abstract
One of the most widely used measures of scientific impact is the citation count. However, because of its heavy-tailed distribution, the citation count is fundamentally difficult to predict, though prediction accuracy can be improved. This study investigated the factors and paper sections that influence the citation count of a scientific paper in the otology field. It proposes a solution that uses machine learning and natural language processing to process English text and output a predicted citation count. Several algorithms are implemented in this solution, including linear regression, boosted decision trees, decision forests, and neural networks. Neural network regression revealed that a paper's abstract has the greatest influence on the citation count of otological articles. The solution was developed in visual programming, with Microsoft Azure Machine Learning at the back end and Programming Without Coding Technology at the front end. We recommend using machine learning models to improve the abstracts of research articles in order to attract more citations. [ABSTRACT FROM AUTHOR]
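The article's models run inside Azure ML, but the regression idea it describes can be shown with a far simpler stand-in: an ordinary-least-squares line fit to one toy feature (abstract word count). The data and the single-feature choice below are invented purely to illustrate regression-based citation prediction.

```python
def fit_line(xs, ys):
    """Closed-form ordinary least squares fit of y = a + b*x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

# toy data: abstract word count -> citation count
abstract_len = [120, 180, 240, 300]
citations    = [4, 8, 12, 16]
a, b = fit_line(abstract_len, citations)
pred = a + b * 260  # predicted citations for a 260-word abstract
```

A real pipeline would use many text-derived features and a nonlinear model, as the abstract indicates.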
- Published
- 2022
- Full Text
- View/download PDF
6. Editorial: Digital Linguistic Biomarkers: Beyond Paper and Pencil Tests
- Author
-
Gloria Gagliardi, Dimitrios Kokkinakis, and Jon Andoni Duñabeitia
- Subjects
linguistic-based diagnosis ,natural language processing ,clinical linguistics ,computational linguistics ,speech processing and recognition ,machine learning ,Psychology ,BF1-990 - Published
- 2021
- Full Text
- View/download PDF
7. PaperNet: A Dataset and Benchmark for Fine-Grained Paper Classification.
- Author
-
Yue, Tan, Li, Yong, Shi, Xuzhao, Qin, Jiedong, Fan, Zijiao, and Hu, Zonghai
- Subjects
NATURAL language processing ,COMPUTER vision ,VISUAL fields ,CLASSIFICATION - Abstract
Document classification is an important area in Natural Language Processing (NLP). Because a huge number of scientific papers are published at an accelerating rate, intelligent paper classification, especially fine-grained classification, is beneficial for researchers. However, a public scientific-paper dataset for fine-grained classification is still lacking, so existing document classification methods have not been put to the test. To fill this vacancy, we designed and collected the PaperNet-Dataset, which consists of multi-modal data (texts and figures). PaperNet 1.0 contains hierarchical categories of papers in the fields of computer vision (CV) and NLP: 2 coarse-grained and 20 fine-grained (7 in CV and 13 in NLP). We ran current mainstream models on the PaperNet-Dataset, along with a multi-modal method that we propose. Interestingly, none of these methods reaches an accuracy of 80% in fine-grained classification, showing plenty of room for improvement. We hope that PaperNet-Dataset will inspire more work in this challenging area. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
8. Factors affecting value co-creation through artificial intelligence in tourism: a general literature review
- Author
-
Solakis, Konstantinos, Katsoni, Vicky, Mahmoud, Ali B., and Grigoriou, Nicholas
- Published
- 2024
- Full Text
- View/download PDF
9. Automatic extraction of significant terms from the title and abstract of scientific papers using the machine learning algorithm: A multiple module approach.
- Author
-
Mukherjee, Bhaskar and Majhi, Debasis
- Subjects
- *MACHINE learning , *NATURAL language processing , *TERMS & phrases , *KEYWORDS - Abstract
Keyword extraction is the task of identifying the terms or phrases most representative of a source document. Although automatic extraction of keywords from titles is an old technique, it was mainly applied to single web documents; our approach differs from previous work on keyword extraction in several respects. For non-experts of a scientific field, understanding its research trends is difficult. The purpose of this study is to develop an automatic method that gives non-experts an overview of a scientific field by capturing its research trends. This empirical study performs significant-term extraction using Natural Language Processing (NLP) tools. Our dataset comprised more than 15,000 titles saved in a .csv file, and scripts written in Python compared how far the significant terms of the scientific-title corpus are similar or different to the terms available in the abstracts of the same articles. A lightweight unsupervised keyword extractor, Yet Another Keyword Extractor (YAKE), was used to extract the terms. Based on our analysis, we conclude that non-experts can use these algorithms in other subject fields to automatically extract significant words and understand trends. Our algorithm could help reduce the labour-intensive manual indexing process. [ABSTRACT FROM AUTHOR]
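YAKE itself is a third-party package; purely to illustrate the title-versus-abstract comparison the study describes, a crude frequency-based term ranker plus a Jaccard overlap can be sketched with the standard library. The stopword list, scoring rule, and data below are invented and much simpler than YAKE's statistical features.

```python
import re
from collections import Counter

STOP = {"a", "an", "the", "of", "for", "in", "on", "and", "to", "using", "with"}

def terms(text):
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP]

def top_terms(corpus, k=3):
    """Rank terms by raw frequency across a list of titles
    (a crude stand-in for YAKE's unsupervised scoring)."""
    counts = Counter(t for doc in corpus for t in terms(doc))
    return [w for w, _ in counts.most_common(k)]

def jaccard(a, b):
    """Set overlap, used here to compare title vocabulary vs. abstract vocabulary."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

titles = ["Machine learning for citation analysis",
          "Citation analysis with machine learning"]
abstracts = ["We study citation analysis using machine learning models."]
overlap = jaccard(top_terms(titles, 5), terms(abstracts[0]))
```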
- Published
- 2023
- Full Text
- View/download PDF
10. Clinical named entity recognition and relation extraction using natural language processing of medical free text: A systematic review.
- Author
-
Fraile Navarro D, Ijaz K, Rezazadegan D, Rahimi-Ardabili H, Dras M, Coiera E, and Berkovsky S
- Subjects
- Humans, Language, Information Storage and Retrieval, PubMed, Natural Language Processing, Machine Learning
- Abstract
Background: Natural Language Processing (NLP) applications have developed over the past years in various fields, including application to clinical free text for named entity recognition and relation extraction. However, development has been so rapid in the last few years that no current overview of the field exists, and it is unclear how these models and tools have been translated into clinical practice. We aim to synthesize and review these developments., Methods: We reviewed literature from 2010 to date, searching PubMed, Scopus, the Association for Computational Linguistics (ACL), and Association for Computing Machinery (ACM) libraries for studies of NLP systems performing general-purpose (i.e., not disease- or treatment-specific) information extraction and relation extraction tasks on unstructured clinical text (e.g., discharge summaries)., Results: We included 94 studies in the review, 30 of them published in the last three years. Machine learning methods were used in 68 studies, rule-based methods in 5 studies, and both in 22 studies. 63 studies focused on Named Entity Recognition, 13 on Relation Extraction, and 18 performed both. The most frequently extracted entities were "problem", "test" and "treatment". 72 studies used public datasets and 22 studies used proprietary datasets alone. Only 14 studies clearly defined a clinical or information task to be addressed by the system, and just three reported its use outside the experimental setting. Only 7 studies shared a pre-trained model and only 8 an available software tool., Discussion: Machine learning-based methods have dominated the NLP field on information extraction tasks. More recently, Transformer-based language models are taking the lead and showing the strongest performance. However, these developments are mostly based on a few datasets and generic annotations, with very few real-world use cases.
This raises questions about the generalizability of the findings and their translation into practice, and highlights the need for robust clinical evaluation., Competing Interests: Declaration of Competing Interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper., (Copyright © 2023 Elsevier B.V. All rights reserved.)
- Published
- 2023
- Full Text
- View/download PDF
11. Machine Learning-based Analysis of Publications Funded by the National Institutes of Health's Initial COVID-19 Pandemic Response.
- Author
-
Chandrabhatla, Anirudha S, Narahari, Adishesh K, Horgan, Taylor M, Patel, Paranjay D, Sturek, Jeffrey M, Davis, Claire L, Jackson, Patrick E H, and Bell, Taison D
- Subjects
COVID-19 pandemic ,MEDICAL subject headings ,HEART failure ,NATURAL language processing ,DATABASES - Abstract
Background The National Institutes of Health (NIH) mobilized more than $4 billion in extramural funding for the COVID-19 pandemic. Assessing the research output from this effort is crucial to understanding how the scientific community leveraged federal funding and responded to this public health crisis. Methods NIH-funded COVID-19 grants awarded between January 2020 and December 2021 were identified from NIH Research Portfolio Online Reporting Tools Expenditures and Results using the "COVID-19 Response" filter. PubMed identifications of publications under these grants were collected and the NIH iCite tool was used to determine citation counts and focus (eg, clinical, animal). iCite and the NIH's LitCOVID database were used to identify publications directly related to COVID-19. Publication titles and Medical Subject Heading terms were used as inputs to a machine learning–based model built to identify common topics/themes within the publications. Results and Conclusions We evaluated 2401 grants that resulted in 14 654 publications. The majority of these papers were published in peer-reviewed journals, though 483 were published to preprint servers. In total, 2764 (19%) papers were directly related to COVID-19 and generated 252 029 citations. These papers were mostly clinically focused (62%), followed by cell/molecular (32%), and animal focused (6%). Roughly 60% of preprint publications were cell/molecular-focused, compared with 26% of nonpreprint publications. The machine learning–based model identified the top 3 research topics to be clinical trials and outcomes research (8.5% of papers), coronavirus-related heart and lung damage (7.3%), and COVID-19 transmission/epidemiology (7.2%). This study provides key insights regarding how researchers leveraged federal funding to study the COVID-19 pandemic during its initial phase. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
12. Predicting stock market using natural language processing
- Author
-
Puh, Karlo and Bagić Babac, Marina
- Published
- 2023
- Full Text
- View/download PDF
13. Editorial: Digital Linguistic Biomarkers: Beyond Paper and Pencil Tests
- Author
-
Gagliardi, Gloria, Kokkinakis, Dimitrios, and Duñabeitia, Jon Andoni
- Subjects
linguistic-based diagnosis ,computational linguistics ,Editorial ,machine learning ,speech processing and recognition ,Psychology ,linguistic biomarkers ,computer-aided diagnosis ,natural language processing ,clinical linguistics - Published
- 2021
14. GenCo: A Generative Learning Model for Heterogeneous Text Classification Based on Collaborative Partial Classifications.
- Author
-
Ekolle, Zie Eya and Kohno, Ryuji
- Subjects
CHATBOTS ,NATURAL language processing ,TEXT messages ,SPAM email ,CONFERENCE papers - Abstract
The use of generative learning models in natural language processing (NLP) has significantly contributed to the advancement of natural language applications such as sentiment analysis, topic modeling, text classification, chatbots, and spam filtering. With a large amount of text generated each day from sources such as web pages, blogs, emails, social media, and articles, one of the most common tasks in NLP is the classification of a text corpus, which is important in many institutions for planning, decision-making, and creating project archives. Many algorithms exist to automate text classification, but the most intriguing are those that also learn the task automatically. In this study, we present a new model that infers and learns from data using probabilistic logic, and apply it to text classification. This model, called GenCo, is a multi-input single-output (MISO) learning model that uses a collaboration of partial classifications to generate the desired output. It provides a heterogeneity measure to explain its classification results and reduces the curse of dimensionality in text classification. Experiments with the model were carried out on the Twitter US Airline dataset, the Conference Paper dataset, and the SMS Spam dataset, outperforming baseline models with 98.40%, 89.90%, and 99.26% accuracy, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
15. Automatically classifying the evidence type of drug-drug interaction research papers as a step toward computer supported evidence curation
- Author
-
Hoang, Linh, Boyce, Richard D, Bosch, Nigel, Stottlemyer, Britney, Brochhausen, Mathias, and Schneider, Jodi
- Subjects
Machine Learning ,Databases, Factual ,Computers ,Publications ,Data Mining ,Humans ,Drug Interactions ,Pilot Projects ,Articles ,Natural Language Processing - Abstract
A longstanding issue with knowledge bases that discuss drug-drug interactions (DDIs) is that they are inconsistent with one another. Computerized support might help experts be more objective in assessing DDI evidence. A requirement for such systems is accurate automatic classification of evidence types. In this pilot study, we developed a hierarchical classifier to classify clinical DDI studies into formally defined evidence types. The area under the ROC curve for sub-classifiers in the ensemble ranged from 0.78 to 0.87. The entire system achieved an F1 of 0.83 and 0.63 on two held-out datasets, the latter consisting of drugs completely novel to those the system was trained on. The results suggest that it is feasible to accurately automate the classification of a subset of DDI evidence types and that the hierarchical approach shows promise. Future work will test more advanced feature engineering techniques while expanding the system to classify a more complex set of evidence types.
- Published
- 2021
16. A Systematic Literature Review for New Technologies in IT Audit.
- Author
-
Tanrıverdi, Nur Sena and Taşkın, Nazım
- Subjects
INFORMATION technology ,MACHINE learning ,AUDITING ,ARTIFICIAL intelligence ,DATA mining ,NATURAL language processing - Abstract
Copyright of Acta Infologica is the property of Acta Infologica and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2023
- Full Text
- View/download PDF
17. Scientific papers citation analysis using textual features and SMOTE resampling techniques
- Author
-
Muhammad Umer, Malik Muhammad Saad Missen, Saima Sadiq, Zahid Aslam, Muhammad Abubakar Siddique, Michele Nappi, and Zahid Hameed
- Subjects
Feature engineering ,Computer science ,Citation sentiment analysis ,Sentiment analysis ,TF-IDF ,Artificial intelligence ,Citation analysis ,Signal Processing ,Classifier (linguistics) ,Pattern recognition (psychology) ,Machine learning ,Feature (machine learning) ,Computer Vision and Pattern Recognition ,Citation ,Software ,Natural language processing ,SMOTE - Abstract
Ascertaining the impact of research is significant for the research community and academia across all disciplines. The only prevalent measure associated with quantifying research quality is the citation count. Although citations play a significant role in academic research, they can be biased or made only to discuss the weaknesses and shortcomings of the cited work. Considering the sentiment of citations and recognizing patterns in text can aid in understanding the opinion of the peer research community and help quantify the quality of research articles. Efficient feature representation combined with machine learning classifiers has yielded significant improvements in text classification; however, the effectiveness of such combinations has not been analyzed for citation sentiment analysis. This study investigates pattern recognition using machine learning models combined with frequency-based and prediction-based feature representation techniques, with and without the Synthetic Minority Oversampling Technique (SMOTE), on a publicly available citation sentiment dataset. The sentiment of citation instances is classified as positive, negative, or neutral. Results indicate that the Extra Trees classifier combined with Term Frequency-Inverse Document Frequency achieved 98.26% accuracy on the SMOTE-balanced dataset.
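SMOTE's core step, interpolating between a minority-class sample and one of its near neighbours, can be sketched as follows. This is a simplified, hypothetical implementation on 2-D points; real pipelines would use `imblearn`'s `SMOTE` on feature vectors.

```python
import random

def smote(minority, n_new, k=1, seed=0):
    """Generate synthetic minority samples: pick a sample, find a near
    neighbour by Euclidean distance, and interpolate at a random point
    between them (SMOTE's core idea, simplified)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x among the other minority points
        neigh = sorted((p for p in minority if p is not x),
                       key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))[:k]
        nn = rng.choice(neigh)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic

minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1)]
new_points = smote(minority, n_new=4)
```

Every synthetic point lies on a segment between two existing minority points, which is what balances the classes without duplicating samples verbatim.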
- Published
- 2021
18. Re-evaluating GPT-4’s bar exam performance
- Author
-
Martínez, Eric
- Published
- 2024
- Full Text
- View/download PDF
19. Triaging Medical Referrals Based on Clinical Prioritisation Criteria Using Machine Learning Techniques.
- Author
-
Wee CK, Zhou X, Sun R, Gururajan R, Tao X, Li Y, and Wee N
- Subjects
- Australia, Referral and Consultation, Triage, Machine Learning, Natural Language Processing
- Abstract
Triaging of medical referrals can be completed using various machine learning techniques, but models trained on historical datasets may lose relevance because the clinical criteria for triaging are regularly updated and changed. This paper proposes the use of machine learning techniques coupled with the clinical prioritisation criteria (CPC) of Queensland (QLD), Australia, to deliver better triaging of referrals in accordance with CPC updates. The unique feature of the proposed model is its non-reliance on past datasets for model training. Medical Natural Language Processing (NLP) was applied in the proposed approach to process the referrals, which are unstructured free text. The proposed multiclass classification approach achieved a Micro F1 score of 0.98. The proposed approach can help in processing the two million referrals that the QLD health service receives annually, thereby delivering better and more efficient health services.
- Published
- 2022
- Full Text
- View/download PDF
20. Transfer Learning for Radio Frequency Machine Learning: A Taxonomy and Survey.
- Author
-
Wong LJ and Michaels AJ
- Subjects
- Radio Waves, Surveys and Questionnaires, Technology, Machine Learning, Natural Language Processing
- Abstract
Transfer learning is a pervasive technology in computer vision and natural language processing fields, yielding exponential performance improvements by leveraging prior knowledge gained from data with different distributions. However, while recent works seek to mature machine learning and deep learning techniques in applications related to wireless communications, a field loosely termed radio frequency machine learning, few have demonstrated the use of transfer learning techniques for yielding performance gains, improved generalization, or to address concerns of training data costs. With modifications to existing transfer learning taxonomies constructed to support transfer learning in other modalities, this paper presents a tailored taxonomy for radio frequency applications, yielding a consistent framework that can be used to compare and contrast existing and future works. This work offers such a taxonomy, discusses the small body of existing works in transfer learning for radio frequency machine learning, and outlines directions where future research is needed to mature the field.
- Published
- 2022
- Full Text
- View/download PDF
21. System for Semi-Automated Literature Review Based on Machine Learning.
- Author
-
Bacinger, Filip, Boticki, Ivica, and Mlinaric, Danijel
- Subjects
MACHINE learning ,LITERATURE reviews ,NATURAL language processing ,USER interfaces - Abstract
This paper presents the design and implementation of a system for semi-automating the literature review process based on machine learning. By using machine learning algorithms, the system determines whether scientific papers belong to the topic that is being explored as part of the review process. The system's user interface allows the process of creating a literature review to be managed through a series of steps: selecting data sources, building queries and topic searches, displaying the scientific papers found, selecting papers that belong to the set of desired papers, running machine learning algorithms for learning and automated classification, and displaying and exporting the final set of papers. Manual literature reviews are compared with automated reviews, and similarities and differences between the two approaches in terms of duration, accuracy, and ease of use are discussed. This study concludes that the best results in terms of sensitivity and accuracy for the automated literature review process are achieved by using a combined machine learning model, which uses multiple unweighted machine learning models. Cross-testing the models on two alternative datasets revealed an overlap in the machine learning hyperparameters. The stable sensitivity and accuracy in the tests indicate the potential for generalized use in automated literature review. [ABSTRACT FROM AUTHOR]
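The "combined machine learning model, which uses multiple unweighted machine learning models" described above suggests a plain majority vote over base classifiers. A minimal sketch follows; the base classifiers, the paper fields, and the keyword rules are all invented for illustration, not taken from the system itself.

```python
from collections import Counter

def combined_predict(models, paper):
    """Unweighted majority vote over several base classifiers.
    Each model is any callable returning True (on-topic) or False."""
    votes = Counter(m(paper) for m in models)
    return votes.most_common(1)[0][0]

# toy base classifiers keyed on different signals (illustrative only)
by_title    = lambda p: "machine learning" in p["title"].lower()
by_abstract = lambda p: "classification" in p["abstract"].lower()
by_keywords = lambda p: bool(set(p["keywords"]) & {"nlp", "ml"})

paper = {"title": "A survey of machine learning",
         "abstract": "We review classification methods.",
         "keywords": ["survey"]}
decision = combined_predict([by_title, by_abstract, by_keywords], paper)
```

An unweighted vote like this is robust to one weak base model, which matches the study's finding that the combined model beat the individual ones.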
- Published
- 2022
- Full Text
- View/download PDF
22. PPSGen: Learning-Based Presentation Slides Generation for Academic Papers
- Author
-
Yue Hu and Xiaojun Wan
- Subjects
Computer science ,Machine learning ,Computer Science Applications ,Task (project management) ,Presentation ,Computational Theory and Mathematics ,Artificial intelligence ,Natural language processing ,Information Systems - Abstract
In this paper, we investigate the very challenging task of automatically generating presentation slides for academic papers. The generated slides can be used as drafts to help presenters prepare their formal slides more quickly. A novel system called PPSGen is proposed to address this task. It first employs a regression method to learn the importance scores of the sentences in an academic paper, and then exploits the integer linear programming (ILP) method to generate well-structured slides by selecting and aligning key phrases and sentences. Evaluation results on a test set of 200 paper-slide pairs collected from the web demonstrate that our proposed PPSGen system can generate slides of better quality. A user study further shows that PPSGen has a few evident advantages over baseline methods.
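PPSGen's selection step is an ILP; a greedy score-per-word heuristic is a rough stand-in (explicitly not the paper's method) that still conveys the idea of picking high-importance sentences under a slide word budget. The sentences and scores below are invented.

```python
def select_sentences(scored, budget):
    """Greedy stand-in for PPSGen's ILP step: pick sentences by
    importance-per-word until the slide word budget is filled."""
    chosen = []
    used = 0
    for sent, score in sorted(scored,
                              key=lambda s: s[1] / len(s[0].split()),
                              reverse=True):
        n = len(sent.split())
        if used + n <= budget:
            chosen.append(sent)
            used += n
    return chosen

scored = [("Deep models improve slide generation.", 0.9),
          ("We ran many experiments over several years on assorted data.", 0.4),
          ("ILP aligns key phrases.", 0.8)]
slide = select_sentences(scored, budget=12)
```

Unlike an ILP, the greedy pass cannot trade a dense short sentence for two complementary ones, which is exactly the gap the exact formulation closes.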
- Published
- 2015
23. Advances in Cybersecurity and Reliability.
- Author
-
Alazab, Moutaz and Alazab, Ammar
- Subjects
DEEP learning ,INTERNET security ,NATURAL language processing ,MACHINE learning ,ARTIFICIAL intelligence ,ADVANCED Encryption Standard - Abstract
This document is a collection of research papers on various topics related to cybersecurity. The papers cover a range of subjects, including mapping vulnerabilities to defense strategies, countermeasures for cybersecurity challenges in higher education, identifying malware packers, the role of blockchain technology in manufacturing, text-to-image synthesis, predicting cybersecurity attacks on IoT, enhancing data security in BYOD environments, encryption schemes for IoT systems, usable security, and an analysis of the ChatGPT language model. Each paper presents its findings and proposes solutions to address specific cybersecurity issues. The document aims to raise awareness and improve mitigation efforts against cyber threats, emphasizing the importance of collaboration between businesses and law enforcement agencies. [Extracted from the article]
- Published
- 2024
- Full Text
- View/download PDF
24. Syntax-based transfer learning for the task of biomedical relation extraction.
- Author
-
Legrand J, Toussaint Y, Raïssi C, and Coulet A
- Subjects
- Machine Learning, Natural Language Processing
- Abstract
Background: Transfer learning aims at enhancing machine learning performance on a problem by reusing labeled data originally designed for a related but distinct problem. In particular, domain adaptation consists, for a specific task, in reusing training data developed for the same task but a distinct domain. This is particularly relevant to applications of deep learning in Natural Language Processing, because they usually require large annotated corpora that may not exist for the targeted domain but do exist for related domains., Results: In this paper, we experiment with transfer learning for the task of relation extraction from biomedical texts, using the TreeLSTM model. We empirically show the impact of TreeLSTM alone and with domain adaptation by obtaining better performances than the state of the art on two biomedical relation extraction tasks and equal performances for two others, for which little annotated data are available. Furthermore, we propose an analysis of the role that syntactic features may play in transfer learning for relation extraction., Conclusion: Given the difficulty of manually annotating corpora in the biomedical domain, the proposed transfer learning method offers a promising alternative for achieving good relation extraction performance in domains with scarce resources. Our analysis also illustrates the importance of syntax in transfer learning, underlining the value, in this domain, of approaches that embed syntactic features., (© 2021. The Author(s).)
- Published
- 2021
- Full Text
- View/download PDF
25. A two level learning model for authorship authentication.
- Author
-
Taha A, Khalil HM, and El-Shishtawy T
- Subjects
- Authorship, Machine Learning standards, Natural Language Processing
- Abstract
Nowadays, forensic authorship authentication plays a vital role in identifying unknown authors, whose numbers have grown with the world's rapidly rising internet use. This paper presents a two-level learning technique for authorship authentication. The learning technique is supplied with linguistic knowledge, statistical features, and vocabulary features to enhance its efficiency, rather than relying on learning alone. The linguistic knowledge is represented through lexical-analysis features such as part of speech. In this study, a two-level classifier is presented to capture the best predictive performance for identifying authorship. The first classifier is based on vocabulary features that detect the frequency with which each author uses certain words. This classifier's results are fed to the second, which is based on a learning technique using lexical, statistical, and linguistic features. All three feature sets describe an author's writing style in numerical form. Through this work, many new features are proposed for identifying an author's writing style. Although the proposed methodology is tested on Arabic writings, it is general and can be applied to any language. Across the machine learning models used, the experiments show that the trained two-level classifier achieves an accuracy ranging from 94% to 96.16%., Competing Interests: The authors have declared that no competing interests exist.
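The two-level flow described above (vocabulary-frequency scores feeding a second, feature-based decision) might be sketched as below. The level-2 adjustment, the single stylistic feature, and all data are invented; the paper's actual feature sets are far richer.

```python
from collections import Counter

def vocab_profile(texts):
    """Level 1 training: per-author relative word frequencies."""
    counts = Counter(w for t in texts for w in t.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def level1_scores(profiles, text):
    """Score a disputed text against each author's vocabulary profile."""
    words = text.lower().split()
    return {a: sum(p.get(w, 0.0) for w in words) / len(words)
            for a, p in profiles.items()}

def level2_decide(scores, style_match):
    """Level 2 stand-in: nudge the vocabulary scores with one stylistic
    feature (a hypothetical per-author style-match value in [0, 1])
    and pick the best-scoring author."""
    adjusted = {a: s + 0.5 * style_match.get(a, 0.0) for a, s in scores.items()}
    return max(adjusted, key=adjusted.get)

profiles = {"A": vocab_profile(["the sea was calm the sea was bright"]),
            "B": vocab_profile(["stock prices rose as markets opened"])}
scores = level1_scores(profiles, "the sea was grey")
author = level2_decide(scores, {"A": 0.9, "B": 0.2})
```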
- Published
- 2021
- Full Text
- View/download PDF
26. A Review of Recent Work in Transfer Learning and Domain Adaptation for Natural Language Processing of Electronic Health Records.
- Author
-
Laparra E, Mascio A, Velupillai S, and Miller T
- Subjects
- Datasets as Topic, Language, Electronic Health Records, Machine Learning, Natural Language Processing
- Abstract
Objectives: We survey recent work in biomedical NLP on building more adaptable or generalizable models, with a focus on work dealing with electronic health record (EHR) texts, to better understand recent trends in this area and identify opportunities for future research., Methods: We searched PubMed, the Institute of Electrical and Electronics Engineers (IEEE), the Association for Computational Linguistics (ACL) anthology, the Association for the Advancement of Artificial Intelligence (AAAI) proceedings, and Google Scholar for the years 2018-2020. We reviewed abstracts to identify the most relevant and impactful work, and manually extracted data points from each of these papers to characterize the types of methods and tasks that were studied, in which clinical domains, and current state-of-the-art results., Results: The ubiquity of pre-trained transformers in clinical NLP research has contributed to an increase in domain adaptation and generalization-focused work that uses these models as the key component. Most recently, work has started to train biomedical transformers and to extend the fine-tuning process with additional domain adaptation techniques. We also highlight recent research in cross-lingual adaptation, as a special case of adaptation., Conclusions: While pre-trained transformer models have led to some large performance improvements, general domain pre-training does not always transfer adequately to the clinical domain due to its highly specialized language. There is also much work to be done in showing that the gains obtained by pre-trained transformers are beneficial in real world use cases. The amount of work in domain adaptation and transfer learning is limited by dataset availability and creating datasets for new domains is challenging. 
The growing body of research in languages other than English is encouraging, and more collaboration between researchers across the language divide would likely accelerate progress in non-English clinical NLP., Competing Interests: Disclosure The authors report no conflicts of interest in this work., (IMIA and Thieme. This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/).)
- Published
- 2021
- Full Text
- View/download PDF
27. Processamento de linguagem natural e machine learning na categorização de artigos científicos: um estudo em torno do “patrimônio cultural” [Natural language processing and machine learning in the categorization of scientific articles: a study on “cultural heritage”].
- Author
-
Fernanda de Jesus, Ananda, Lígia Triques, Maria, Santarem Segundo, José Eduardo, and Cristina de Albuquerque, Ana
- Abstract
Copyright of Revista Ibero-Americana de Ciência da Informação is the property of Revista Ibero-Americana de Ciencia da Informacao and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2023
- Full Text
- View/download PDF
28. Machine learning in medicine: a practical introduction to natural language processing.
- Author
-
Harrison CJ and Sidey-Gibbons CJ
- Subjects
- Algorithms, Humans, Neural Networks, Computer, Support Vector Machine, Machine Learning, Natural Language Processing
- Abstract
Background: Unstructured text, including medical records, patient feedback, and social media comments, can be a rich source of data for clinical research. Natural language processing (NLP) describes a set of techniques used to convert passages of written text into interpretable datasets that can be analysed by statistical and machine learning (ML) models. The purpose of this paper is to provide a practical introduction to contemporary techniques for the analysis of text-data, using freely-available software., Methods: We performed three NLP experiments using publicly-available data obtained from medicine review websites. First, we conducted lexicon-based sentiment analysis on open-text patient reviews of four drugs: Levothyroxine, Viagra, Oseltamivir and Apixaban. Next, we used unsupervised ML (latent Dirichlet allocation, LDA) to identify similar drugs in the dataset, based solely on their reviews. Finally, we developed three supervised ML algorithms to predict whether a drug review was associated with a positive or negative rating. These algorithms were: a regularised logistic regression, a support vector machine (SVM), and an artificial neural network (ANN). We compared the performance of these algorithms in terms of classification accuracy, area under the receiver operating characteristic curve (AUC), sensitivity and specificity., Results: Levothyroxine and Viagra were reviewed with a higher proportion of positive sentiments than Oseltamivir and Apixaban. One of the three LDA clusters clearly represented drugs used to treat mental health problems. A common theme suggested by this cluster was drugs taking weeks or months to work. Another cluster clearly represented drugs used as contraceptives. 
Supervised machine learning algorithms predicted positive or negative drug ratings with classification accuracies ranging from 0.664, 95% CI [0.608, 0.716] for the regularised regression to 0.720, 95% CI [0.664, 0.776] for the SVM., Conclusions: In this paper, we present a conceptual overview of common techniques used to analyse large volumes of text, and provide reproducible code that can be readily applied to other research studies using open-source software., (© 2021. The Author(s).)
- Published
- 2021
- Full Text
- View/download PDF
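The first experiment in the paper above is lexicon-based sentiment analysis of drug reviews. As a minimal sketch of how such scoring works, the snippet below sums per-word lexicon scores over a review; the tiny lexicon and example reviews are invented for illustration, whereas real studies use published lexicons (e.g. AFINN or Bing) with thousands of scored terms.

```python
# Minimal sketch of lexicon-based sentiment scoring. The lexicon and
# reviews are illustrative only, not taken from the paper's data.
import re

LEXICON = {"good": 1, "great": 2, "effective": 1,
           "bad": -1, "awful": -2, "nausea": -1}

def sentiment_score(review):
    """Sum the lexicon scores of all words in the review."""
    tokens = re.findall(r"[a-z]+", review.lower())
    return sum(LEXICON.get(tok, 0) for tok in tokens)

def label(review):
    score = sentiment_score(review)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(label("Great drug, very effective"))     # positive
print(label("Awful side effects and nausea"))  # negative
```

Aggregating such labels per drug gives the proportion of positive sentiments that the study compares across Levothyroxine, Viagra, Oseltamivir and Apixaban.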
29. Synthetic data for annotation and extraction of family history information from clinical text.
- Author
-
Brekke PH, Rama T, Pilán I, Nytrø Ø, and Øvrelid L
- Subjects
- Humans, Language, Machine Learning, Natural Language Processing
- Abstract
Background: The limited availability of clinical texts for Natural Language Processing purposes is hindering the progress of the field. This article investigates the use of synthetic data for the annotation and automated extraction of family history information from Norwegian clinical text. We make use of incrementally developed synthetic clinical text describing patients' family history relating to cases of cardiac disease and present a general methodology which integrates the synthetically produced clinical statements and annotation guideline development. The resulting synthetic corpus contains 477 sentences and 6030 tokens. In this work we experimentally assess the validity and applicability of the annotated synthetic corpus using machine learning techniques and furthermore evaluate the system trained on synthetic text on a corpus of real clinical text, consisting of de-identified records for patients with genetic heart disease., Results: For entity recognition, an SVM trained on synthetic data had class-weighted precision, recall and F1-scores of 0.83, 0.81 and 0.82, respectively. For relation extraction, precision, recall and F1-scores were 0.74, 0.75 and 0.74., Conclusions: A system for extraction of family history information developed on synthetic data generalizes well to real, clinical notes with a small loss of accuracy. The methodology outlined in this paper may be useful in other situations where limited availability of clinical text hinders NLP tasks. Both the annotation guidelines and the annotated synthetic corpus are made freely available and as such constitute the first publicly available resource of Norwegian clinical text., (© 2021. The Author(s).)
- Published
- 2021
- Full Text
- View/download PDF
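The entity-recognition results above are reported as class-weighted scores. As a sketch of what that averaging means, the snippet below weights each class's metric by its support (number of instances); the entity classes and counts are invented for illustration.

```python
# Sketch of support-weighted ("class weighted") averaging of per-class
# metrics. The classes and counts below are illustrative, not the paper's.
def weighted_average(metrics, support):
    """Average per-class metric values, weighted by class support."""
    total = sum(support.values())
    return sum(metrics[c] * support[c] for c in metrics) / total

f1_per_class = {"FamilyMember": 0.90, "Condition": 0.70}
support = {"FamilyMember": 300, "Condition": 100}
print(round(weighted_average(f1_per_class, support), 3))  # 0.85
```

Weighting by support keeps a frequent, well-recognized class from being drowned out by a rare one, which matters when entity types are unevenly distributed, as in clinical corpora.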
30. Artificial intelligence (AI) and its implications for market knowledge in B2B marketing
- Author
-
Paschen, Jeannette, Kietzmann, Jan, and Kietzmann, Tim Christian
- Published
- 2019
- Full Text
- View/download PDF
31. G2Basy: A framework to improve the RNN language model and ease overfitting problem.
- Author
-
Yuwen L, Chen S, and Yuan X
- Subjects
- Software standards, Machine Learning, Natural Language Processing
- Abstract
Recurrent neural networks are efficient ways of training language models, and various RNN networks have been proposed to improve performance. However, as network scales increase, the overfitting problem becomes more urgent. In this paper, we propose a framework, G2Basy, to speed up the training process and ease the overfitting problem. Instead of using predefined hyperparameters, we devise a gradient increasing and decreasing technique that changes the training batch size and input dropout simultaneously by a user-defined step size. Together with a pretrained word embedding initialization procedure and the introduction of different optimizers at different learning rates, our framework speeds up the training process dramatically and improves performance compared with a benchmark model of the same scale. For the word embedding initialization, we propose the concept of "artificial features" to describe the characteristics of the obtained word embeddings. We experiment on two of the most frequently used corpora, the Penn Treebank and WikiText-2 datasets, outperform the benchmark results on both, and show potential for further improvement. Furthermore, our framework shows better results with the larger and more complicated WikiText-2 corpus than with the Penn Treebank. Compared with other state-of-the-art results, we achieve comparable results with network scales hundreds of times smaller and within fewer training epochs., Competing Interests: The authors have declared that no competing interests exist.
- Published
- 2021
- Full Text
- View/download PDF
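The G2Basy abstract describes growing the batch size while shrinking input dropout by a user-defined step. A hedged sketch of such a schedule is below; all starting values, step sizes, and caps are invented (the paper leaves them user-defined), and the real framework applies this inside an RNN training loop rather than as a standalone function.

```python
# Hedged sketch of a joint batch-size / input-dropout schedule in the
# spirit of the technique above. All numeric defaults are assumptions.
def g2basy_schedule(epoch, base_batch=32, batch_step=16,
                    base_dropout=0.5, dropout_step=0.05,
                    max_batch=256, min_dropout=0.1):
    """Return (batch_size, input_dropout) for the given epoch."""
    batch = min(base_batch + epoch * batch_step, max_batch)
    dropout = max(base_dropout - epoch * dropout_step, min_dropout)
    return batch, dropout

for epoch in (0, 4, 20):
    print(epoch, g2basy_schedule(epoch))
```

Larger batches late in training smooth gradient noise, while reducing input dropout restores the full signal once the model is regularized, which is one plausible reading of why the combination eases overfitting.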
32. Interactive Dual Attention Network for Text Sentiment Classification.
- Author
-
Zhu Y, Zheng W, and Tang H
- Subjects
- Algorithms, Linguistics, Semantics, Machine Learning, Natural Language Processing
- Abstract
Text sentiment classification is an essential research field of natural language processing. Recently, numerous deep learning-based methods for sentiment classification have been proposed and achieved better performances compared with conventional machine learning methods. However, most of the proposed methods ignore the interactive relationship between contextual semantics and sentimental tendency while modeling their text representation. In this paper, we propose a novel Interactive Dual Attention Network (IDAN) model that aims to interactively learn the representation between contextual semantics and sentimental tendency information. Firstly, we design an algorithm that utilizes linguistic resources to obtain sentimental tendency information from text and then extract word embeddings from the BERT (Bidirectional Encoder Representations from Transformers) pretraining model as the embedding layer of IDAN. Next, we use two Bidirectional LSTM (BiLSTM) networks to learn the long-range dependencies of contextual semantics and sentimental tendency information, respectively. Finally, two types of attention mechanisms are implemented in IDAN. One is multihead attention, which is the next layer of BiLSTM and is used to learn the interactive relationship between contextual semantics and sentimental tendency information. The other is global attention that aims to make the model focus on the important parts of the sequence and generate the final representation for classification. These two attention mechanisms enable IDAN to interactively learn the relationship between semantics and sentimental tendency information and improve the classification performance. A large number of experiments on four benchmark datasets show that our IDAN model is superior to competitive methods. 
Moreover, both the result analysis and the attention weight visualization further demonstrate the effectiveness of our proposed method., Competing Interests: The authors declare that there are no conflicts of interest regarding the publication of this paper., (Copyright © 2020 Yinglin Zhu et al.)
- Published
- 2020
- Full Text
- View/download PDF
33. Investigating machine learning and natural language processing techniques applied for detecting eating disorders: a systematic literature review.
- Author
-
Merhbene, Ghofrane, Puttick, Alexandre, and Kurpicz-Briki, Mascha
- Subjects
NATURAL language processing ,EATING disorders ,MACHINE learning ,MENTAL illness ,BINGE-eating disorder - Abstract
Recent developments in the fields of natural language processing (NLP) and machine learning (ML) have shown significant improvements in automatic text processing. At the same time, the expression of human language plays a central role in the detection of mental health problems. Whereas spoken language is implicitly assessed during interviews with patients, written language can also provide interesting insights to clinical professionals. Existing work in the field often investigates mental health problems such as depression or anxiety. However, there is also work investigating how the diagnostics of eating disorders can benefit from these novel technologies. In this paper, we present a systematic overview of the latest research in this field. Our investigation encompasses four key areas: (a) an analysis of the metadata from published papers, (b) an examination of the sizes and specific topics of the datasets employed, (c) a review of the application of machine learning techniques in detecting eating disorders from text, and finally (d) an evaluation of the models used, focusing on their performance, limitations, and the potential risks associated with current methodologies. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
34. Social Reminiscence in Older Adults' Everyday Conversations: Automated Detection Using Natural Language Processing and Machine Learning.
- Author
-
Ferrario A, Demiray B, Yordanova K, Luo M, and Martin M
- Subjects
- Aged, Algorithms, Communication, Humans, Machine Learning standards, Memory, Long-Term physiology, Natural Language Processing
- Abstract
Background: Reminiscence is the act of thinking or talking about personal experiences that occurred in the past. It is a central task of old age that is essential for healthy aging, and it serves multiple functions, such as decision-making and introspection, transmitting life lessons, and bonding with others. The study of social reminiscence behavior in everyday life can be used to generate data and detect reminiscence from general conversations., Objective: The aims of this original paper are to (1) preprocess coded transcripts of conversations in German of older adults with natural language processing (NLP), and (2) implement and evaluate learning strategies using different NLP features and machine learning algorithms to detect reminiscence in a corpus of transcripts., Methods: The methods in this study comprise (1) collecting and coding of transcripts of older adults' conversations in German, (2) preprocessing transcripts to generate NLP features (bag-of-words models, part-of-speech tags, pretrained German word embeddings), and (3) training machine learning models to detect reminiscence using random forests, support vector machines, and adaptive and extreme gradient boosting algorithms. The data set comprises 2214 transcripts, including 109 transcripts with reminiscence. Due to class imbalance in the data, we introduced three learning strategies: (1) class-weighted learning, (2) a meta-classifier consisting of a voting ensemble, and (3) data augmentation with the Synthetic Minority Oversampling Technique (SMOTE) algorithm. For each learning strategy, we performed cross-validation on a random sample of the training data set of transcripts. 
We computed the area under the curve (AUC), the average precision (AP), precision, recall, as well as F1 score and specificity measures on the test data, for all combinations of NLP features, algorithms, and learning strategies., Results: Class-weighted support vector machines on bag-of-words features outperformed all other classifiers (AUC=0.91, AP=0.56, precision=0.5, recall=0.45, F1=0.48, specificity=0.98), followed by support vector machines on SMOTE-augmented data and word embeddings features (AUC=0.89, AP=0.54, precision=0.35, recall=0.59, F1=0.44, specificity=0.94). For the meta-classifier strategy, adaptive and extreme gradient boosting algorithms trained on word embeddings and bag-of-words outperformed all other classifiers and NLP features; however, the performance of the meta-classifier learning strategy was lower compared to other strategies, with highly imbalanced precision-recall trade-offs., Conclusions: This study provides evidence of the applicability of NLP and machine learning pipelines for the automated detection of reminiscence in older adults' everyday conversations in German. The methods and findings of this study could be relevant for designing unobtrusive computer systems for the real-time detection of social reminiscence in the everyday life of older adults and classifying their functions. With further improvements, these systems could be deployed in health interventions aimed at improving older adults' well-being by promoting self-reflection and suggesting coping strategies to be used in the case of dysfunctional reminiscence cases, which can undermine physical and mental health., (©Andrea Ferrario, Burcu Demiray, Kristina Yordanova, Minxia Luo, Mike Martin. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 15.09.2020.)
- Published
- 2020
- Full Text
- View/download PDF
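The study above handles class imbalance (109 reminiscence transcripts out of 2214) partly through class-weighted learning. As a sketch, the snippet below computes the common "balanced" weighting heuristic, weight_c = n_samples / (n_classes * n_c); the formula is the usual scikit-learn convention, stated here as an assumption rather than taken from the paper, while the counts match the data set described in the abstract.

```python
# Sketch of "balanced" class weights for imbalanced data:
# weight_c = n_samples / (n_classes * n_c). Convention assumed, not
# confirmed by the paper; counts are from the abstract above.
def balanced_weights(counts):
    """Map each class to a weight inversely proportional to its frequency."""
    n = sum(counts.values())
    k = len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}

weights = balanced_weights({"reminiscence": 109, "other": 2105})
print({cls: round(w, 2) for cls, w in weights.items()})
```

The minority class ends up weighted roughly 19 times more heavily than the majority class, which is how a class-weighted SVM is pushed to attend to rare reminiscence transcripts.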
35. Bioinspired Artificial Intelligence Applications 2023.
- Author
-
Wei, Haoran, Tao, Fei, Huang, Zhenghua, and Long, Yanhua
- Subjects
ARTIFICIAL intelligence ,DEEP learning ,REINFORCEMENT learning ,MACHINE learning ,DEEP reinforcement learning ,NATURAL language processing - Abstract
This document discusses the rapid development of Artificial Intelligence (AI) and its bioinspired applications. It highlights the benefits of bioinspired AI, such as increased accuracy in image and speech processing, reduced cost and energy usage through edge devices, and enhanced bio-signal quality. However, it also acknowledges the challenges posed by improper AI utilization, such as the generation of fake news and security issues. The document calls for research papers on bioinspired AI applications to explore its potential and address these challenges. It includes examples of research papers that utilize deep reinforcement learning for robot task sequencing, propose a real-time multi-surveillance pedestrian target detection model, develop an intelligent breast mass classification approach, and introduce a bio-inspired object detection algorithm for remote sensing images. The document concludes by emphasizing the importance of biomimetic artificial intelligence in various fields and promoting further research in this area. [Extracted from the article]
- Published
- 2024
- Full Text
- View/download PDF
36. Abstract Sentence Classification for Scientific Papers Based on Transductive SVM
- Author
-
Bingquan Liu, Ming Liu, Yuanchao Liu, and Feng Wu
- Subjects
Structured support vector machine, Machine translation, Computer science, Machine learning, Task (project management), Support vector machine, Information extraction, Pharmacology (medical), Artificial intelligence, Natural language processing, Sentence - Abstract
Presently, sentence-level research is significant in fields like natural language processing, information retrieval, and machine translation. In this paper we present a practical task on sentence classification. The main purpose of this work is to classify the abstract sentences of scientific papers, in a corpus we built ourselves, into four categories (the background, the goal, the method, and the result), which differ from each other in common usage, so that with these results we can pursue further research such as frequent pattern mining, information extraction, and building a corpus for a writing-assistant system for scientific papers. The main classification method is the Support Vector Machine, acknowledged as among the best machine learning methods for common text classification tasks. A semi-supervised method, the Transductive Support Vector Machine, is also introduced into this four-class classification task to improve accuracy. The experiments are conducted on our corpus of abstract sentences from scientific papers. The accuracy of the classifier finally reaches 75.86% with the semi-supervised method.
- Published
- 2013
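The Transductive SVM in the paper above exploits unlabeled sentences during training. As a stdlib-only stand-in, the sketch below shows self-training, a simpler semi-supervised scheme in the same spirit but not the paper's method: fit on labeled points, pseudo-label the most confident unlabeled point, and refit. A nearest-centroid classifier replaces the SVM, and the 1-D feature scores are invented.

```python
# Self-training with a nearest-centroid classifier: a simplified
# semi-supervised stand-in for the Transductive SVM (not the paper's
# actual method). All data points below are invented.
def centroids(labeled):
    """Compute the mean feature value per class from (x, y) pairs."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(cents, x):
    """Assign x to the class with the nearest centroid."""
    return min(cents, key=lambda y: abs(cents[y] - x))

def self_train(labeled, unlabeled, rounds=3):
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        cents = centroids(labeled)
        # pseudo-label the unlabeled point closest to any centroid
        best = min(pool, key=lambda x: min(abs(cents[y] - x) for y in cents))
        labeled.append((best, predict(cents, best)))
        pool.remove(best)
    return centroids(labeled)

cents = self_train([(0.0, "background"), (1.0, "result")], [0.1, 0.9, 0.55])
print(predict(cents, 0.2), predict(cents, 0.8))
```

A proper TSVM instead optimizes the margin over labeled and unlabeled points jointly, but both approaches share the idea that unlabeled abstract sentences shape the decision boundary.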
37. A Scoping Literature Review of Natural Language Processing Application to Safety Occurrence Reports.
- Author
-
Ricketts, Jon, Barry, David, Guo, Weisi, and Pelham, Jonathan
- Subjects
NATURAL language processing ,LITERATURE reviews ,MACHINE learning ,LANGUAGE models ,DEEP learning - Abstract
Safety occurrence reports can contain valuable information on how incidents occur, revealing knowledge that can assist safety practitioners. This paper presents and discusses a literature review exploring how Natural Language Processing (NLP) has been applied to occurrence reports within safety-critical industries, informing further research on the topic and highlighting common challenges. Some of the uses of NLP include the ability for occurrence reports to be automatically classified against categories, and entities such as causes and consequences to be extracted from the text as well as the semantic searching of occurrence databases. The review revealed that machine learning models form the dominant method when applying NLP, although rule-based algorithms still provide a viable option for some entity extraction tasks. Recent advances in deep learning models such as Bidirectional Transformers for Language Understanding are now achieving a high accuracy while eliminating the need to substantially pre-process text. The construction of safety-themed datasets would be of benefit for the application of NLP to occurrence reporting, as this would allow the fine-tuning of current language models to safety tasks. An interesting approach is the use of topic modelling, which represents a shift away from the prescriptive classification taxonomies, splitting data into "topics". Where many papers focus on the computational accuracy of models, they would also benefit from real-world trials to further inform usefulness. It is anticipated that NLP will soon become a mainstream tool used by safety practitioners to efficiently process and gain knowledge from safety-related text. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
38. Automation of systematic reviews of biomedical literature: a scoping review of studies indexed in PubMed.
- Author
-
Tóth, Barbara, Berek, László, Gulácsi, László, Péntek, Márta, and Zrubka, Zsombor
- Subjects
AUTOMATION ,NATURAL language processing ,DATA extraction - Abstract
Background: The demand for high-quality systematic literature reviews (SRs) for evidence-based medical decision-making is growing. SRs are costly and require the scarce resource of highly skilled reviewers. Automation technology has been proposed to save workload and expedite the SR workflow. We aimed to provide a comprehensive overview of SR automation studies indexed in PubMed, focusing on the applicability of these technologies in real world practice. Methods: In November 2022, we extracted, combined, and ran an integrated PubMed search for SRs on SR automation. Full-text English peer-reviewed articles were included if they reported studies on SR automation methods (SSAM), or automated SRs (ASR). Bibliographic analyses and knowledge-discovery studies were excluded. Record screening was performed by single reviewers, and the selection of full text papers was performed in duplicate. We summarized the publication details, automated review stages, automation goals, applied tools, data sources, methods, results, and Google Scholar citations of SR automation studies. Results: From 5321 records screened by title and abstract, we included 123 full text articles, of which 108 were SSAM and 15 ASR. Automation was applied for search (19/123, 15.4%), record screening (89/123, 72.4%), full-text selection (6/123, 4.9%), data extraction (13/123, 10.6%), risk of bias assessment (9/123, 7.3%), evidence synthesis (2/123, 1.6%), assessment of evidence quality (2/123, 1.6%), and reporting (2/123, 1.6%). Multiple SR stages were automated by 11 (8.9%) studies. The performance of automated record screening varied largely across SR topics. In published ASR, we found examples of automated search, record screening, full-text selection, and data extraction. In some ASRs, automation fully complemented manual reviews to increase sensitivity rather than to save workload. Reporting of automation details was often incomplete in ASRs. 
Conclusions: Automation techniques are being developed for all SR stages, but with limited real-world adoption. Most SR automation tools target single SR stages, with modest time savings for the entire SR process and varying sensitivity and specificity across studies. Therefore, the real-world benefits of SR automation remain uncertain. Standardizing the terminology, reporting, and metrics of study reports could enhance the adoption of SR automation techniques in real-world practice. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
39. Matching patients to clinical trials using semantically enriched document representation.
- Author
-
Hassanzadeh H, Karimi S, and Nguyen A
- Subjects
- Humans, Language, Neural Networks, Computer, Semantics, Machine Learning, Natural Language Processing
- Abstract
Recruiting eligible patients for clinical trials is crucial for reliably answering specific questions about medical interventions and evaluation. However, clinical trial recruitment is a bottleneck in clinical research and drug development. Our goal is to provide an approach towards automating this manual and time-consuming patient recruitment task using natural language processing and machine learning techniques. Specifically, our approach extracts key information from series of narrative clinical documents in patient's records and collates helpful evidence to make decisions on eligibility of patients according to certain inclusion and exclusion criteria. Challenges in applying narrative clinical documents such as differences in reporting styles and sub-languages are addressed by enriching them with knowledge from domain ontologies in the form of semantic vector representations. We show that a machine learning model based on Multi-Layer Perceptron (MLP) is more effective for the task than five other neural networks and four conventional machine learning models. Our approach achieves overall micro-F1-Score of 84% for 13 different eligibility criteria. Our experiments also indicate that semantically enriched documents are more effective than using original documents for cohort selection. Our system provides an end-to-end machine learning-based solution that achieves comparable results with the state-of-the-art which relies on hand-crafted rules or data-centric engineered features., Competing Interests: Declaration of Competing Interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper., (Copyright © 2020 Elsevier Inc. All rights reserved.)
- Published
- 2020
- Full Text
- View/download PDF
40. Machine Learning Driven Mental Stress Detection on Reddit Posts Using Natural Language Processing
- Author
-
Inamdar, Shaunak, Chapekar, Rishikesh, Gite, Shilpa, and Pradhan, Biswajeet
- Published
- 2023
- Full Text
- View/download PDF
41. Improving clinical named entity recognition in Chinese using the graphical and phonetic feature.
- Author
-
Wang Y, Ananiadou S, and Tsujii J
- Subjects
- Data Curation, Electronic Health Records, Humans, Semantics, Language, Machine Learning, Natural Language Processing, Phonetics
- Abstract
Background: Clinical Named Entity Recognition aims to find the names of diseases, body parts and other related terms in a given text. Because the Chinese language is quite different from English, a machine cannot simply obtain graphical and phonetic information from Chinese characters, so the method for Chinese should differ from that for English. Chinese characters present abundant information through their graphical features, and recent research on Chinese word embeddings tries to use graphical information as subwords. This paper uses both graphical and phonetic features to improve Chinese Clinical Named Entity Recognition, based on the presence of phono-semantic characters., Methods: This paper proposed three different embedding models and tested them on the annotated data. The data were divided into two sections to explore the effect of the proportion of phono-semantic characters., Results: The model using the primary radical and pinyin can improve Clinical Named Entity Recognition in Chinese, achieving an F-measure of 0.712. A higher proportion of phono-semantic characters does not give a better result., Conclusions: The paper proves that combining graphical and phonetic features can improve Clinical Named Entity Recognition in Chinese.
- Published
- 2019
- Full Text
- View/download PDF
42. Latent Dirichlet Allocation in predicting clinical trial terminations.
- Author
-
Geletta S, Follett L, and Laugerman M
- Subjects
- Clinical Trials as Topic, Databases, Factual, Humans, Narration, Early Termination of Clinical Trials, Machine Learning, Models, Theoretical, Natural Language Processing
- Abstract
Background: This study used natural language processing (NLP) and machine learning (ML) techniques to identify reliable patterns from within research narrative documents to distinguish studies that complete successfully from the ones that terminate. Recent research findings have reported that at least 10% of all studies that are funded by major research funding agencies terminate without yielding useful results. Since it is well known that scientific studies that receive funding from major funding agencies are carefully planned and rigorously vetted through the peer-review process, it was somewhat daunting to us that study terminations are this prevalent. Moreover, our review of the literature about study terminations suggested that the reasons for study terminations are not well understood. We therefore aimed to address that knowledge gap by seeking to identify the factors that contribute to study failures., Method: We used data from the ClinicalTrials.gov repository, from which we extracted both structured data (study characteristics) and unstructured data (the narrative description of the studies). We applied natural language processing techniques to the unstructured data to quantify the risk of termination by identifying distinctive topics that are more frequently associated with trials that are terminated and trials that are completed. We used the Latent Dirichlet Allocation (LDA) technique to derive 25 "topics" with corresponding sets of probabilities, which we then used to predict study termination by utilizing random forest modeling. We fit two distinct models: one using only structured data as predictors, and another with both structured data and the 25 text topics derived from the unstructured data., Results: In this paper, we demonstrate the interpretive and predictive value of LDA as it relates to predicting clinical trial failure.
The results also demonstrate that the combined modeling approach yields robust predictive probabilities in terms of both sensitivity and specificity, relative to a model that utilizes the structured data alone., Conclusions: Our study demonstrated that the use of topic modeling with LDA significantly raises the utility of unstructured data in better predicting the completion vs. termination of studies. This study sets the direction for future research to evaluate the viability of the designs of health studies.
- Published
- 2019
- Full Text
- View/download PDF
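The combined model in the study above feeds each trial's structured attributes plus its 25 LDA topic probabilities into a random forest. The sketch below shows only the feature-assembly step, concatenating the two sources into one predictor vector; the field names and values are invented for illustration, and the actual LDA fitting and forest training are omitted.

```python
# Sketch of combined-model feature assembly: structured trial attributes
# concatenated with 25 LDA topic probabilities. Field names and numbers
# below are invented, not from ClinicalTrials.gov.
def make_features(structured, topic_probs):
    """Build one predictor vector from structured fields + topic mixture."""
    assert len(topic_probs) == 25, "one probability per LDA topic"
    assert abs(sum(topic_probs) - 1.0) < 1e-6, "topic mixture sums to 1"
    return [structured["enrollment"], structured["n_sites"],
            structured["phase"]] + topic_probs

trial = {"enrollment": 120, "n_sites": 4, "phase": 2}
probs = [0.52, 0.12] + [0.36 / 23] * 23  # dominated by one hypothetical topic
x = make_features(trial, probs)
print(len(x))  # 28
```

Because each topic probability is a well-scaled feature in [0, 1], vectors like this can be passed straight to a random forest alongside the structured columns.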
43. Special Issue on Recent Advances in Machine Learning and Computational Intelligence.
- Author
-
Wu, Yue, Zhang, Xinglong, and Jia, Pengfei
- Subjects
MACHINE learning ,REINFORCEMENT learning ,NATURAL language processing ,OPTIMIZATION algorithms ,COMPUTER vision ,COMPUTATIONAL intelligence ,DEEP learning - Abstract
In reviewing this Special Issue, various topics have been addressed, predominantly machine learning techniques and heuristic search algorithms. Machine learning and computational intelligence are currently high-profile research areas attracting the attention of many researchers. In the first paper, L. Zhao and H. Jin improved the traditional vector-weighted optimization algorithm (INFO) and designed a promising optimization algorithm (IDEINFO) [[8]]. [Extracted from the article]
- Published
- 2023
- Full Text
- View/download PDF
44. Perceiving Conflict of Interest Experts Recommendation System Based on a Machine Learning Approach.
- Author
-
Im, Yunjeong, Song, Gyuwon, and Cho, Minsang
- Subjects
RECOMMENDER systems ,MACHINE learning ,CONFLICT of interests ,STANDARD deviations ,FIELD research - Abstract
Academic societies and funding bodies that conduct peer reviews need to select the best reviewers in each field to ensure publication quality. Conventional approaches for reviewer selection focus on evaluating expertise based on research relevance by subject or discipline. An improved perceiving conflict of interest (CoI) reviewer recommendation process that combines the five expertise indices and graph analysis techniques is proposed in this paper. This approach collects metadata from the academic database and extracts candidates based on research field similarities utilizing text mining; then, the candidate scores are calculated and ranked through a professionalism index-based analysis. The highly connected subgraphs (HCS) algorithm is used to cluster similar researchers based on their association or intimacy in the researcher network. The proposed method is evaluated using root mean square error (RMSE) indicators for matching the field of publication and research fields of the recommended experts, using keywords of papers published in Korean journals over the past five years. The results show that the system configures a group of Top-K reviewers with an RMSE of 0.76. The proposed method can be applied to academic society and national research management systems to realize fair and efficient screening and management. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
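The entry above ranks recommended experts and evaluates the match against ground truth with RMSE. As a minimal sketch of that evaluation step (the scores and score scale here are hypothetical, not taken from the paper):

```python
import math

def rmse(predicted, actual):
    """Root mean square error between predicted relevance scores and
    ground-truth field-match scores for a list of recommended experts."""
    assert len(predicted) == len(actual) and predicted
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual))
                     / len(predicted))

# Hypothetical Top-K reviewer scores vs. binary ground-truth field matches
pred = [0.9, 0.8, 0.6, 0.4]
gold = [1.0, 1.0, 0.0, 1.0]
print(round(rmse(pred, gold), 3))  # → 0.439
```

A lower RMSE means the ranked candidate scores track the true field matches more closely, which is how the paper's reported 0.76 should be read.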
45. A Novel Memory-Scheduling Strategy for Large Convolutional Neural Network on Memory-Limited Devices.
- Author
-
Li S, Shen X, Dou Y, Ni S, Xu J, Yang K, Wang Q, and Niu X
- Subjects
- Deep Learning, Algorithms, Machine Learning, Memory physiology, Natural Language Processing, Neural Networks, Computer
- Abstract
Recently, machine learning, and especially deep learning, has become a core technique widely used in fields such as natural language processing, speech recognition, and object recognition. At the same time, more and more applications are moving to wearable and mobile devices. However, traditional deep learning methods such as the convolutional neural network (CNN) and its variants consume a lot of memory, which makes these powerful methods difficult to apply on memory-limited mobile platforms. To solve this problem, we present a novel memory-management strategy called mmCNN. With its help, a trained large CNN can easily be deployed on a platform of any memory size, such as a GPU, an FPGA, or a memory-limited mobile device. In our experiments, we run a feed-forward CNN process within extremely small memory budgets (as low as 5 MB) on a GPU platform. The results show that our method saves more than 98% of memory compared to a traditional CNN algorithm, and more than 90% compared to the state-of-the-art related work "vDNNs" (virtualized deep neural networks). This work improves the computing scalability of lightweight applications and breaks the memory bottleneck of using deep learning methods on memory-limited devices.
- Published
- 2019
- Full Text
- View/download PDF
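The core observation behind memory scheduling for feed-forward CNNs is that only the current layer's input and output activations need to be resident at once. This sketch estimates that peak working set for a hypothetical layer list; it is an illustration of the general idea, not the paper's mmCNN algorithm:

```python
def peak_memory_bytes(layer_sizes, dtype_bytes=4):
    """Upper bound on activation memory for a feed-forward pass when only
    the input and output of the layer being computed are kept resident.

    layer_sizes: activation element counts per layer, input first.
    """
    return max((a + b) * dtype_bytes
               for a, b in zip(layer_sizes, layer_sizes[1:]))

# Hypothetical activation sizes for a small VGG-like front end (float32)
sizes = [3 * 224 * 224, 64 * 112 * 112, 128 * 56 * 56, 256 * 28 * 28]
print(peak_memory_bytes(sizes) / 2**20)  # → 4.59375 (MB)
```

A real scheduler must also stage weights and handle layers whose combined input/output exceeds the budget (e.g., by tiling), which is where the paper's contribution lies.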
46. Developing a portable natural language processing based phenotyping system.
- Author
-
Sharma H, Mao C, Zhang Y, Vatani H, Yao L, Zhong Y, Rasmussen L, Jiang G, Pathak J, and Luo Y
- Subjects
- Electronic Health Records, Humans, Obesity, Pilot Projects, Information Storage and Retrieval methods, Machine Learning, Natural Language Processing
- Abstract
Background: This paper presents a portable phenotyping system capable of integrating both rule-based and statistical machine learning approaches. Methods: Our system utilizes UMLS to extract clinically relevant features from unstructured text, and facilitates portability across institutions and data systems by incorporating OHDSI's OMOP Common Data Model (CDM) to standardize the necessary data elements. The system can also store the key components of rule-based systems (e.g., regular expression matches) in OMOP CDM format, enabling the reuse, adaptation, and extension of many existing rule-based clinical NLP systems. As a pilot study, we experimented with our system on the corpus from i2b2's Obesity Challenge. Results: Our system facilitates portable phenotyping of obesity and its 15 comorbidities from unstructured patient discharge summaries, achieving a performance that often ranked among the top 10 of the challenge participants. Conclusion: This standardization enables consistent downstream application of numerous rule-based and machine learning classification techniques across disparate datasets that may originate from different institutions and data systems.
- Published
- 2019
- Full Text
- View/download PDF
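The abstract above mentions storing regular-expression matches as standardized records so rule-based components stay portable. A minimal sketch of that pattern, with hypothetical rules and field names (the real system uses UMLS concepts and OMOP CDM tables, not shown here):

```python
import re

# Hypothetical rule set: each named concept maps to a regex over note text.
RULES = {
    "obesity": re.compile(r"\b(obese|obesity|BMI\s*(?:>|over)\s*30)\b", re.I),
    "diabetes": re.compile(r"\bdiabet(?:es|ic)\b", re.I),
}

def extract(note):
    """Return each regex hit as a standardized record (concept, text, offset),
    loosely in the spirit of storing rule outputs as CDM rows."""
    return [{"concept": name, "match": m.group(0), "offset": m.start()}
            for name, rx in RULES.items() for m in rx.finditer(note)]

summary = "Patient is obese with type 2 diabetes; BMI over 30 noted."
for row in extract(summary):
    print(row)
```

Because the output is plain records rather than tool-specific state, another institution can load the same rules and rerun them against its own notes.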
47. Combination of conditional random field with a rule based method in the extraction of PICO elements.
- Author
-
Chabou S and Iglewski M
- Subjects
- Humans, Data Mining, Machine Learning, Medical Informatics Applications, Models, Statistical, Natural Language Processing
- Abstract
Background: Extracting primary care information in terms of Patient/Problem, Intervention, Comparison, and Outcome, known as PICO elements, is difficult as the volume of medical information expands and the health semantics are hard to capture from unstructured text. Combining machine learning methods (MLMs) with rule-based methods (RBMs) could facilitate and improve PICO extraction. This paper studies PICO element extraction methods; the goal is to combine MLMs with RBMs to extract PICO elements from medical papers and so facilitate answering clinical questions formulated with the PICO framework. Methods: First, we analyze the aspects of the MLM model that influence the quality of PICO element extraction. Second, we combine the MLM approach with RBMs to improve the PICO element retrieval process. Our experiments use a corpus of 1000 abstracts. Results: We obtain an F-score of 80% for the P element, 64% for the I element, and 92% for the O element. Given that P and I elements represent only 6.5% and 5.8% of the sentences in the training corpus, respectively, the results are competitive with previously published ones. Conclusions: Our study shows that PICO element extraction is very challenging. MLMs tend to have an acceptable precision rate but a low recall rate when the corpus is not representative. The RBMs back up the MLMs to increase recall, so the combination of the two methods gives better results.
- Published
- 2018
- Full Text
- View/download PDF
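The combination described above, where a statistical model supplies precision and rules recover recall, can be sketched as a simple fallback. The cue words and labels here are invented for illustration; the paper uses a conditional random field, which is stubbed out as an externally supplied `ml_label`:

```python
# Hypothetical keyword rules per PICO element (P = Patient, I = Intervention,
# O = Outcome); a real RBM would be far richer than this.
RULES = {
    "P": ("patients", "participants", "subjects"),
    "I": ("intervention", "treated with", "received"),
    "O": ("outcome", "mortality", "improvement"),
}

def hybrid_label(sentence, ml_label):
    """Trust the statistical model's label when it fires (precision);
    otherwise fall back to keyword rules (recall)."""
    if ml_label != "NONE":
        return ml_label
    low = sentence.lower()
    for label, cues in RULES.items():
        if any(cue in low for cue in cues):
            return label
    return "NONE"

print(hybrid_label("120 patients were enrolled.", "NONE"))  # → P
```

This ordering preserves the model's precision while letting rules catch sentences the model missed, which matches the recall improvement the abstract reports.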
48. Researcher Network Visualization Using Matrix Researcher2vec.
- Author
-
Hirata, Enna, Yamashita, Takahiro, and Ozawa, Seiichi
- Subjects
NATURAL language processing ,DATA visualization ,MACHINE learning ,WEB services ,MATRICES (Mathematics) - Abstract
In this study, we introduce a system called Matrix Researcher2vec (MResearcher2vec), which generates researcher embedding vectors from papers and research projects in the researchmap and KAKENHI databases. The system includes data on 276,841 researchers, 6,161,592 papers, and research projects. Using natural language processing techniques, the MResearcher2vec model extracts researcher vectors from the papers and research project summaries of KAKENHI grant recipients. The similarity between researchers is then computed to visualize inter-researcher relationships. The machine learning results have been integrated into a web service, providing a novel approach to academic relationship mining. It can be applied to matching research content with researchers when evaluating industry-government-academia collaboration and joint research, and it contributes in four aspects: (1) exchanges between researchers, (2) creation of opportunities for researchers and companies to connect, (3) further promotion of interdisciplinary research, and (4) reduction of lost opportunities for research institutions to acquire talent. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
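The similarity computation between researcher embedding vectors described above is typically cosine similarity. A self-contained sketch with made-up 4-dimensional vectors (real embeddings would have hundreds of dimensions, and the paper does not specify the metric, so this is an assumption):

```python
import math

def cosine(u, v):
    """Cosine similarity between two researcher embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Hypothetical embeddings for two researchers with overlapping topics
r1 = [0.2, 0.8, 0.1, 0.5]
r2 = [0.3, 0.7, 0.0, 0.6]
print(round(cosine(r1, r2), 3))  # → 0.979
```

Pairwise scores like this are what a visualization layer would threshold or weight to draw edges in the researcher network.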
49. Identifying Key Issues in Integration of Autonomous Ships in Container Ports: A Machine-Learning-Based Systematic Literature Review.
- Author
-
Hirata, Enna and Hansen, Annette Skovsted
- Subjects
LITERATURE reviews ,CONTAINER terminals ,CONTAINER ships ,SHIPPING containers ,INTERNET security laws ,THIRD-party logistics ,AUTONOMOUS underwater vehicles - Abstract
Background: Autonomous ships have the potential to increase operational efficiency and reduce carbon footprints through technology and innovation. However, there is no comprehensive literature review covering the different types of papers related to autonomous ships, especially with regard to their integration with ports. This paper takes a systematic review approach to extract and summarize the main topics related to autonomous ships in the fields of container shipping and port management. Methods: A machine learning method is used to extract the main topics from more than 2000 journal publications indexed in WoS and Scopus. Results: The findings highlight key issues related to technology, cybersecurity, data governance, regulations, and legal frameworks, providing a different perspective from manual human reviews of papers. Conclusions: Our results support several recommendations. First, from a technological perspective, it is advised to increase support for the research and development of autonomous underwater vehicles and unmanned aerial vehicles, establish safety standards, mandate testing of wave model evaluation systems, and promote international standardization. Second, from a cyber-physical systems perspective, efforts should be made to strengthen logistics and supply chains for autonomous ships, establish data governance protocols, enforce strict control over IoT device data, and strengthen cybersecurity measures. Third, from an environmental perspective, measures should be implemented to address the environmental impact of autonomous ships. This can be achieved by promoting international agreements from a global societal standpoint and clarifying the legal framework regarding liability in the event of accidents. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
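The review above surfaces dominant topics across a corpus of publications by machine learning. As a toy stand-in for that step, here is naive frequency-based term surfacing over tiny made-up abstracts; the paper's actual topic model is not specified here, and real pipelines use proper topic models rather than raw counts:

```python
from collections import Counter
import re

STOP = {"the", "of", "and", "to", "for", "in", "a", "is", "on", "with"}

def top_terms(abstracts, k=3):
    """Most frequent non-stopword terms across a corpus of abstracts,
    as a crude proxy for the dominant topics."""
    counts = Counter(
        w for text in abstracts
        for w in re.findall(r"[a-z]+", text.lower())
        if w not in STOP)
    return [w for w, _ in counts.most_common(k)]

docs = ["Autonomous ships and port cybersecurity.",
        "Cybersecurity regulations for autonomous ships.",
        "Data governance in container ports."]
print(top_terms(docs))
```

Even this crude counting recovers the themes the abstract names (autonomy, cybersecurity), which is the intuition behind scaling the idea up with a real topic model over 2000+ papers.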
50. Perspectives of Machine Learning and Natural Language Processing on Characterizing Positive Energy Districts.
- Author
-
Han, Mengjie, Canli, Ilkim, Shah, Juveria, Zhang, Xingxing, Dino, Ipek Gursel, and Kalkan, Sinan
- Subjects
MACHINE learning ,NATURAL language processing ,CARBON emissions ,STAKEHOLDER analysis ,ENERGY consumption ,ARTIFICIAL intelligence - Abstract
The concept of a Positive Energy District (PED) has become a vital component of the efforts to accelerate the transition to zero carbon emissions and climate-neutral living environments. Research is shifting its focus from energy-efficient single buildings to districts, where the aim is to achieve a positive energy balance across a given time period. Various innovation projects, programs, and activities have produced abundant insights into how to implement and operate PEDs. However, there is still no agreed way of determining what constitutes a PED for the purpose of identifying and evaluating its various elements. This paper thus sets out to create a process for characterizing PEDs. First, nineteen different elements of a PED were identified. Then, two AI techniques, machine learning (ML) and natural language processing (NLP), were introduced and examined to determine their potential for modeling, extracting, and mapping the elements of a PED. Lastly, state-of-the-art research papers were reviewed to identify any contribution they can make to the determination of the effectiveness of the ML and NLP models. The results suggest that both ML and NLP possess significant potential for modeling most of the identified elements in various areas, such as optimization, control, design, and stakeholder mapping. This potential is realized through the utilization of vast amounts of data, enabling these models to generate accurate and useful insights for PED planning and implementation. Several practical strategies have been identified to enhance the characterization of PEDs. These include a clear definition and quantification of the elements, the utilization of urban-scale energy modeling techniques, and the development of user-friendly interfaces capable of presenting model insights in an accessible manner. 
Thus, developing a holistic approach that integrates existing and novel techniques for PED characterization is essential to achieve sustainable and resilient urban environments. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF