586 results
Search Results
2. An automatic text summarization based on valuable sentences selection
- Author
-
Mahalleh, Elham Rahimzadeh and Gharehchopogh, Farhad Soleimanian
- Published
- 2022
3. A Plea for Neutral Comparison Studies in Computational Sciences.
- Author
-
Boulesteix, Anne-Laure, Lauer, Sabine, and Eugster, Manuel J. A.
- Subjects
BIOINFORMATICS, MACHINE learning, COMPUTATIONAL biology, COMPUTER algorithms, PARALLEL computers, BIOMETRY, MATHEMATICS, MEDICAL research
- Abstract
In computational science literature including, e.g., bioinformatics, computational statistics or machine learning, most published articles are devoted to the development of “new methods”, while comparison studies are generally appreciated by readers but surprisingly given poor consideration by many journals. This paper stresses the importance of neutral comparison studies for the objective evaluation of existing methods and the establishment of standards by drawing parallels with clinical research. The goal of the paper is twofold. Firstly, we present a survey of recent computational papers on supervised classification published in seven high-ranking computational science journals. The aim is to provide an up-to-date picture of current scientific practice with respect to the comparison of methods in both articles presenting new methods and articles focusing on the comparison study itself. Secondly, based on the results of our survey we critically discuss the necessity, impact and limitations of neutral comparison studies in computational sciences. We define three reasonable criteria a comparison study has to fulfill in order to be considered as neutral, and explicate general considerations on the individual components of a “tidy neutral comparison study”. R codes for completely replicating our statistical analyses and figures are available from the companion website http://www.ibe.med.uni-muenchen.de/organisation/mitarbeiter/020_professuren/boulesteix/plea2013. [ABSTRACT FROM AUTHOR]
- Published
- 2013
4. Shear Strength Prediction of Slender Concrete Beams Reinforced with FRP Rebar Using Data-Driven Machine Learning Algorithms.
- Author
-
Karim, Mohammad Rezaul, Islam, Kamrul, Billah, A. H. M. Muntasir, and Alam, M. Shahria
- Subjects
MACHINE learning, CONCRETE beams, REINFORCED concrete, SHEAR strength, GRAPHICAL user interfaces, FIBER-reinforced plastics, STRUCTURAL engineering, DIFFERENCE equations, REINFORCING bars
- Abstract
Estimating the shear strength of a fiber-reinforced polymer (FRP)–reinforced-concrete (RC) beam is a complex task that depends on multiple design variables. The use of FRP bars has emerged as a promising alternative to diminish the corrosion problems that are associated with steel reinforcement in adverse environments; however, an accurate and reliable method of shear strength prediction is needed to ensure the economical use of materials and robust designs. Several optimized design equations are available in the literature; however, when utilizing these equations a substantial difference is observed between the predicted outcome (V_pred) and the experimental shear strength (V_exp) result. Therefore, this paper presented a novel approach toward implementing machine learning (ML) algorithms to accurately estimate the shear strength of FRP–RC beams. A large database that consisted of 302 shear test results on FRP-reinforced slender concrete beams without stirrup was collected from the literature to formulate the most efficient prediction model. The performance of each ML algorithm model was compared with the existing design provisions and models. The model interpretation was performed through feature importance analysis to explain the model output compared with a black box. The proposed data-driven ML models demonstrated a high level of accuracy and excellent performance and were superior to the existing shear strength models. In addition, a simple graphical user interface (GUI) was developed to aid practicing engineers when estimating shear strength without the need for complicated design procedures. The shear strength of FRP–RC beams is calculated using various design codes and guidelines that are heuristically developed based on previous test results. In general, the developed equations are either mechanics-based or empirical. However, this paper demonstrated that data-driven ML algorithms could generate a more reliable and appropriate prediction of the shear strength of FRP–RC beams. Furthermore, as the database increases, it could be automatically updated, which would result in more accurate and reliable results. Designers and practitioners could conveniently use the developed algorithms for the reliable and quick prediction of the shear strength of FRP–RC beams. In addition, the developed GUI is innovative and user-friendly. It allows users to determine the design shear strength without referring to an existing code by employing ML in conjunction with a large, reliable, and authenticated database to ensure accuracy. This could be important for the structural engineering community when assessing the shear capacity of existing FRP–RC beams. [ABSTRACT FROM AUTHOR]
- Published
- 2023
5. Predicting Bank Failures: A Synthesis of Literature and Directions for Future Research.
- Author
-
Li Xian Liu, Shuangzhe Liu, and Milind Sathye
- Subjects
BANK failures, CENTRAL banking industry, COVID-19, FINANCIAL institutions, BANKING industry, FINANCIAL risk
- Abstract
Risk management has been a topic of great interest to Michael McAleer. Even as recent as 2020, his paper on risk management for COVID-19 was published. In his memory, this article is focused on bankruptcy risk in financial firms. For financial institutions in particular, banks are considered special, given that they perform risk management functions that are unique. Risks in banking arise from both internal and external factors. The GFC underlined the need for comprehensive risk management, and researchers since then have been working towards fulfilling that need. Similarly, the central banks across the world have begun periodic stress-testing of banks' ability to withstand shocks. This paper investigates the machine-learning and statistical techniques used in the literature on bank failure prediction. The study finds that though considerable progress has been made using advanced statistical and computational techniques, given the complex nature of banking risk, the ability of statistical techniques to predict bank failures is limited. Machine-learning-based models are increasingly becoming popular due to their significant predictive ability. The paper also suggests the directions for future research. [ABSTRACT FROM AUTHOR]
- Published
- 2021
6. Low-Flow (7-Day, 10-Year) Classical Statistical and Improved Machine Learning Estimation Methodologies.
- Author
-
DelSanto, Andrew, Bhuiyan, Md Abul Ehsan, Andreadis, Konstantinos M., and Palmer, Richard N.
- Subjects
MACHINE learning, CONSTRUCTED wetlands, STANDARD deviations, SEWAGE disposal plants, MUNICIPAL water supply, PEARSON correlation (Statistics), AQUATIC resources
- Abstract
Water resource managers require accurate estimates of the 7-day, 10-year low flow (7Q10) of streams for many reasons, including protecting aquatic species, designing wastewater treatment plants, and calculating municipal water availability. StreamStats, a publicly available web application developed by the United States Geologic Survey that is commonly used by resource managers for estimating the 7Q10 in states where it is available, utilizes state-by-state, locally calibrated regression equations for estimation. This paper expands StreamStats' methodology and improves 7Q10 estimation by developing a more regionally applicable and generalized methodology for 7Q10 estimation. In addition to classical methodologies, namely multiple linear regression (MLR) and multiple linear regression in log space (LTLR), three promising machine learning algorithms, random forest (RF) decision trees, neural networks (NN), and generalized additive models (GAM), are tested to determine if more advanced statistical methods offer improved estimation. For illustrative purposes, this methodology is applied to and verified for the full range of unimpaired, gaged basins in both the northeast and mid-Atlantic hydrologic regions of the United States (with basin sizes ranging from 2–1419 mi²) using leave-one-out cross-validation (LOOCV). Pearson's correlation coefficient (R²), root mean square error (RMSE), Kling–Gupta Efficiency (KGE), and Nash–Sutcliffe Efficiency (NSE) are used to evaluate the performance of each method. Results suggest that each method provides varying results based on basin size, with RF displaying the smallest average RMSE (5.85) across all ranges of basin sizes. [ABSTRACT FROM AUTHOR]
- Published
- 2023
7. Modeling of kappa factor using multivariate adaptive regression splines: application to the western Türkiye ground motion dataset
- Author
-
Kurtulmuş, Tevfik Özgür, Yerlikaya–Özkurt, Fatma, and Askan, Aysegul
- Published
- 2024
8. Anniversary article: Then and now: 25 years of progress in natural language engineering.
- Author
-
Tait, John and Wilks, Yorick
- Subjects
NATURAL languages, CHATBOTS, DATA mining, MACHINE learning, CRITICAL currents, PARTS of speech
- Abstract
The paper reviews the state of the art of natural language engineering (NLE) around 1995, when this journal first appeared, and makes a critical comparison with the current state of the art in 2018, as we prepare the 25th Volume. Specifically the then state of the art in parsing, information extraction, chatbots, and dialogue systems, speech processing and machine translation are briefly reviewed. The emergence in the 1980s and 1990s of machine learning (ML) and statistical methods (SM) is noted. Important trends and areas of progress in the subsequent years are identified. In particular, the move to the use of n-grams or skip grams and/or chunking with part of speech tagging and away from whole sentence parsing is noted, as is the increasing dominance of SM and ML. Some outstanding issues which merit further research are briefly pointed out, including metaphor processing and the ethical implications of NLE. [ABSTRACT FROM AUTHOR]
- Published
- 2019
9. Credit risk evaluation: a comprehensive study.
- Author
-
Bhattacharya, Arijit, Biswas, Saroj Kr., and Mandal, Ardhendu
- Subjects
CREDIT risk, RISK assessment, CREDIT analysis, FINANCIAL stress, MACHINE learning
- Abstract
To date, there has been relatively little research in the field of credit risk analysis that compares all of the well known statistical, optimization technique (heuristic methods) and machine learning based approaches in a single article. Review on credit risk assessment using sixteen well-known approaches has been conducted in this work. The accuracy of the machine learning approaches in dealing with financial difficulties is superior to that of traditional statistical methods, especially when dealing with nonlinear patterns, according to the findings. Hybrid or Ensemble algorithms, on the other hand have been found to outperform their traditional counterparts – standalone classifiers in the vast majority of situations. Finally, the paper compares the models with nine machine learning classifiers utilizing two benchmark datasets. In this study, we have encountered with 46 datasets, among them 35 datasets have been utilized for once; whereas among the other 11 datasets, Australian, German and Japanese are the three most frequently utilized datasets by the researchers. The study showed that the performance of ensemble classifiers were very much significant. As per the experimental result, for both datasets ensemble classifiers outperformed other standalone classifiers which validate with the prior research also. Although some of these approaches have a high level of accuracy, additional study is required to discover the right parameters and procedures for better outcomes in a transparent manner. Additionally this study is a valuable reference source for analyzing credit risk for both academic and practical domains, since it contains relevant information on the most major machine learning approaches employed so far. [ABSTRACT FROM AUTHOR]
- Published
- 2023
10. Study of the Influence of Operating and Geometric Parameters on the Critical Outflow of Subcooled and Boiling Water through Channels of Different Geometry
- Author
-
Konovalov, I. A., Bol’shukhin, M. A., Khizbullin, A. M., Sokolov, A. N., Barinov, A. A., Loktionov, V. D., Dmitriev, S. M., and Zyryanova, T. K.
- Published
- 2024
11. Research on the forward-looking behavior judgment of heating oil price evolution based on complex networks.
- Author
-
Tian, Lixin, Chen, Huan, and Zhen, Zaili
- Subjects
PETROLEUM sales & prices, ARTIFICIAL neural networks, INFORMATION technology, INFORMATION storage & retrieval systems, PREDICTION models
- Abstract
Analyzing and predicting the trend of price fluctuation has been receiving more and more attention, as price risk has become the focus of risk control research in the heating oil futures market. A novel time series prediction model combined with the complex network method is put forward in the paper. First of all, this paper counts the cumulative time interval of different nodes in the network, and fits its growth trend with the Fourier model. Then a novel price fluctuation prediction model is established based on effective information, such as topology properties, extracted from the network. The results show that the Fourier model can predict the emergence time of new nodes in the next stage, and the established price fluctuation prediction model can infer the names of nodes in the prediction interval, so as to determine the forward-looking behavior of price evolution. Besides, compared with the NAR neural network, the prediction results obtained by the proposed method also show superiority, which has important theoretical value and academic significance for early warning and prediction of price behavior in the heating oil futures market. [ABSTRACT FROM AUTHOR]
- Published
- 2018
12. Modeling obesity in complex food systems: Systematic review.
- Author
-
Bhatia, Anita, Smetana, Sergiy, Heinz, Volker, and Hertzberg, Joachim
- Subjects
OBESITY, FOOD habits, CONCEPTUAL design, MACHINE learning, SOCIAL influence
- Abstract
Obesity-related data derived from multiple complex systems spanning media, social, economic, food activity, health records, and infrastructure (sensors, smartphones, etc.) can assist us in understanding the relationship between obesity drivers for more efficient prevention and treatment. Reviewed literature shows a growing adaptation of the machine-learning model in recent years dealing with mechanisms and interventions in social influence, nutritional diet, eating behavior, physical activity, built environment, obesity prevalence prediction, distribution, and healthcare cost-related outcomes of obesity. Most models are designed to reflect through time and space at the individual level in a population, which indicates the need for a macro-level generalized population model. The model should consider all interconnected multi-system drivers to address obesity prevalence and intervention. This paper reviews existing computational models and datasets used to compute obesity outcomes to design a conceptual framework for establishing a macro-level generalized obesity model. [ABSTRACT FROM AUTHOR]
- Published
- 2022
13. Upscaling and downscaling Monte Carlo ensembles with generative models.
- Author
-
Scheiter, Matthias, Valentine, Andrew, and Sambridge, Malcolm
- Subjects
MONTE Carlo method, INVERSE problems, NUMERICAL calculations, CORE-mantle boundary, FRICTION velocity
- Abstract
Monte Carlo methods are widespread in geophysics and have proved to be powerful in non-linear inverse problems. However, they are associated with significant practical challenges, including long calculation times, large output ensembles of Earth models, and difficulties in the appraisal of the results. This paper addresses some of these challenges using generative models, a family of tools that have recently attracted much attention in the machine learning literature. Generative models can, in principle, learn a probability distribution from a set of given samples and also provide a means for rapid generation of new samples which follow that approximated distribution. These two features make them well suited for application to the outputs of Monte Carlo algorithms. In particular, training a generative model on the posterior distribution of a Bayesian inference problem provides two main possibilities. First, the number of parameters in the generative model is much smaller than the number of values stored in the ensemble, leading to large compression rates. Secondly, once trained, the generative model can be used to draw any number of samples, thereby eliminating the dependence on an often large and unwieldy ensemble. These advantages pave new pathways for the use of Monte Carlo ensembles, including improved storage and communication of the results, enhanced calculation of numerical integrals, and the potential for convergence assessment of the Monte Carlo procedure. Here, these concepts are initially demonstrated using a simple synthetic example that scales into higher dimensions. They are then applied to a large ensemble of shear wave velocity models of the core–mantle boundary, recently produced in a Monte Carlo study. These examples demonstrate the effectiveness of using generative models to approximate posterior ensembles, and indicate directions to address various challenges in Monte Carlo inversion. [ABSTRACT FROM AUTHOR]
- Published
- 2022
14. Modeling Dynamic Systems with Efficient Ensembles of Process-Based Models.
- Author
-
Simidjievski, Nikola, Todorovski, Ljupčo, and Džeroski, Sašo
- Subjects
DYNAMICAL systems, MACHINE learning, SET theory, DECISION making, SIMULATION methods & models
- Abstract
Ensembles are a well established machine learning paradigm, leading to accurate and robust models, predominantly applied to predictive modeling tasks. Ensemble models comprise a finite set of diverse predictive models whose combined output is expected to yield an improved predictive performance as compared to an individual model. In this paper, we propose a new method for learning ensembles of process-based models of dynamic systems. The process-based modeling paradigm employs domain-specific knowledge to automatically learn models of dynamic systems from time-series observational data. Previous work has shown that ensembles based on sampling observational data (i.e., bagging and boosting), significantly improve predictive performance of process-based models. However, this improvement comes at the cost of a substantial increase of the computational time needed for learning. To address this problem, the paper proposes a method that aims at efficiently learning ensembles of process-based models, while maintaining their accurate long-term predictive performance. This is achieved by constructing ensembles with sampling domain-specific knowledge instead of sampling data. We apply the proposed method to and evaluate its performance on a set of problems of automated predictive modeling in three lake ecosystems using a library of process-based knowledge for modeling population dynamics. The experimental results identify the optimal design decisions regarding the learning algorithm. The results also show that the proposed ensembles yield significantly more accurate predictions of population dynamics as compared to individual process-based models. Finally, while their predictive performance is comparable to the one of ensembles obtained with the state-of-the-art methods of bagging and boosting, they are substantially more efficient. [ABSTRACT FROM AUTHOR]
- Published
- 2016
15. Distributed Function Mining for Gene Expression Programming Based on Fast Reduction.
- Author
-
Deng, Song, Yue, Dong, Yang, Le-chan, Fu, Xiong, and Feng, Ya-zhou
- Subjects
GENE expression, BIOLOGICAL evolution, GENETICS, DATA mining, ALGORITHMS, COMPARATIVE studies
- Abstract
For high-dimensional and massive data sets, traditional centralized gene expression programming (GEP) or improved algorithms lead to increased run-time and decreased prediction accuracy. To solve this problem, this paper proposes a new improved algorithm called distributed function mining for gene expression programming based on fast reduction (DFMGEP-FR). In DFMGEP-FR, fast attribution reduction in binary search algorithms (FAR-BSA) is proposed to quickly find the optimal attribution set, and the function consistency replacement algorithm is given to solve integration of the local function model. Thorough comparative experiments for DFMGEP-FR, centralized GEP and the parallel gene expression programming algorithm based on simulated annealing (parallel GEPSA) are included in this paper. For the waveform, mushroom, connect-4 and musk datasets, the comparative results show that the average time-consumption of DFMGEP-FR drops by 89.09%, 88.85%, 85.79% and 93.06%, respectively, in contrast to centralized GEP and by 12.5%, 8.42%, 9.62% and 13.75%, respectively, compared with parallel GEPSA. Six well-studied UCI test data sets demonstrate the efficiency and capability of our proposed DFMGEP-FR algorithm for distributed function mining. [ABSTRACT FROM AUTHOR]
- Published
- 2016
16. Shilling attack detection for recommender systems based on credibility of group users and rating time series.
- Author
-
Zhou, Wei, Wen, Junhao, Qu, Qiang, Zeng, Jun, and Cheng, Tian
- Subjects
SHILLING, TRUTHFULNESS & falsehood, TIME series analysis, SUSTAINABILITY, PREDICTION models
- Abstract
Recommender systems are vulnerable to shilling attacks. Forged user-generated content data, such as user ratings and reviews, are used by attackers to manipulate recommendation rankings. Shilling attack detection in recommender systems is of great significance to maintain the fairness and sustainability of recommender systems. The current studies have problems in terms of the poor universality of algorithms, difficulty in selection of user profile attributes, and lack of an optimization mechanism. In this paper, a shilling behaviour detection structure based on abnormal group user findings and rating time series analysis is proposed. This paper adds to the current understanding in the field by studying the credibility evaluation model in-depth based on the rating prediction model to derive proximity-based predictions. A method for detecting suspicious ratings based on suspicious time windows and target item analysis is proposed. Suspicious rating time segments are determined by constructing a time series, and data streams of the rating items are examined and suspicious rating segments are checked. To analyse features of shilling attacks by a group user’s credibility, an abnormal group user discovery method based on time series and time window is proposed. Standard testing datasets are used to verify the effect of the proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2018
17. Statistical and Machine Learning forecasting methods: Concerns and ways forward.
- Author
-
Makridakis, Spyros, Spiliotis, Evangelos, and Assimakopoulos, Vassilios
- Subjects
MACHINE learning, TIME series analysis, FORECASTING, STATISTICAL sampling, STATISTICAL models
- Abstract
Machine Learning (ML) methods have been proposed in the academic literature as alternatives to statistical ones for time series forecasting. Yet, scant evidence is available about their relative performance in terms of accuracy and computational requirements. The purpose of this paper is to evaluate such performance across multiple forecasting horizons using a large subset of 1045 monthly time series used in the M3 Competition. After comparing the post-sample accuracy of popular ML methods with that of eight traditional statistical ones, we found that the former are dominated across both accuracy measures used and for all forecasting horizons examined. Moreover, we observed that their computational requirements are considerably greater than those of statistical methods. The paper discusses the results, explains why the accuracy of ML models is below that of statistical ones and proposes some possible ways forward. The empirical results found in our research stress the need for objective and unbiased ways to test the performance of forecasting methods that can be achieved through sizable and open competitions allowing meaningful comparisons and definite conclusions. [ABSTRACT FROM AUTHOR]
- Published
- 2018
18. A novel stock forecasting model based on High-order-fuzzy-fluctuation Trends and Back Propagation Neural Network.
- Author
-
Guan, Hongjun, Dai, Zongli, Zhao, Aiwu, and He, Jie
- Subjects
STOCK prices, ECONOMIC trends, TIME series analysis, BACK propagation, ARTIFICIAL neural networks
- Abstract
In this paper, we propose a hybrid method to forecast stock prices called the High-order-fuzzy-fluctuation-Trends-based Back Propagation (HTBP) Neural Network model. First, we compare each value of the historical training data with the previous day's value to obtain a fluctuation trend time series (FTTS). On this basis, the FTTS is blurred into a fuzzy time series (FFTS) based on the amplitude and direction of the increasing, equal, and decreasing fluctuations. Since the relationship between FFTS and future wave trends is nonlinear, the HTBP neural network algorithm is used to find the mapping rules in the form of self-learning. Finally, the results of the algorithm output are used to predict future fluctuations. The proposed model provides some innovative features: (1) It combines fuzzy set theory and a neural network algorithm to avoid the overfitting problems that exist in traditional models. (2) The BP neural network algorithm can intelligently explore the internal rules that actually exist in sequential data, without the need to analyze the influence factors of specific rules and the path of action. (3) The hybrid model can reasonably remove noise from the internal rules by proper fuzzy treatment. This paper takes the TAIEX data set of the Taiwan stock exchange as an example, and compares and analyzes the prediction performance of the model. The experimental results show that this method can predict the stock market in a very simple way. At the same time, we use this method to predict the Shanghai stock exchange composite index, and further verify the effectiveness and universality of the method. [ABSTRACT FROM AUTHOR]
- Published
- 2018
19. A review of regression and classification techniques for analysis of common and rare variants and gene-environmental factors.
- Author
-
Miller, Anthony, Panneerselvam, John, and Liu, Lu
- Subjects
GENOME-wide association studies, GENOTYPE-environment interaction, MACHINE learning, TYPE 1 diabetes, GENOMICS
- Abstract
Statistical techniques incorporated with machine-learning algorithms in unison with gene-environment interaction are giving unparalleled understanding of complex diseases. Accurate analysis and intricate capturing of common, rare, and low MAF (Minor Allele Frequency) variants alongside gene-environmental interaction is pivotal whilst concluding reliable and accurate classification of complex diseases. Various complex diseases including genres of diabetes Type 1 and Type 2 alongside the vastly under-researched Lada (Latent Autoimmune Diabetes in Adults) diabetes require further investigation alongside significant machine learning research to gain a deeper understanding of the disease complexities. Despite existing efforts, an ideal combination of statistical techniques with optimal machine-learning algorithms that can accurately capture and model the gene-environment interaction is lacking. Intentionally exploring future and simultaneously exploiting modern-day computational methods in genomic analysis, this paper profoundly investigates both the future and present interaction of statistical analysis techniques and machine-learning algorithms and Ensembles with gene-environmental factors. In this context, this paper firstly presents a conceptual understanding of genomic conventions; secondly, conducts potential future machine learning algorithms alongside an extensive analysis of a range of classification, regression and Ensemble techniques along with exhibiting their imperative relationship and roles in investigating and classifying common, rare variants and a wide array of gene-environmental factors; and thirdly, utilisation of statistical techniques in Genome Wide Association Studies is scrutinised whilst analysing common, rare and MAF variants. As an important contribution, this paper identifies efficient machine-learning algorithms alongside Ensemble models and future potential analysis techniques and exhibits their inherent characteristics that can enhance the reliability and accuracy of the gene-environment classification analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2022
20. Statistical Methods with Applications in Data Mining: A Review of the Most Recent Works.
- Author
-
Pinto da Costa, Joaquim Fernando and Cabral, Manuel
- Subjects
DATA mining, STATISTICS
- Abstract
The importance of statistical methods in finding patterns and trends in otherwise unstructured and complex large sets of data has grown over the past decade, as the amount of data produced keeps growing exponentially and knowledge obtained from understanding data allows to make quick and informed decisions that save time and provide a competitive advantage. For this reason, we have seen considerable advances over the past few years in statistical methods in data mining. This paper is a comprehensive and systematic review of these recent developments in the area of data mining. [ABSTRACT FROM AUTHOR]
- Published
- 2022
21. Predicting Key Events in the Popularity Evolution of Online Information.
- Author
-
Hu, Ying, Hu, Changjun, Fu, Shushen, Fang, Mingzhe, and Xu, Wenwen
- Subjects
METRIC spaces, ONLINE information services, MACHINE learning, ARTIFICIAL intelligence, MACHINE theory
- Abstract
The popularity of online information generally experiences a rising and falling evolution. This paper considers the “burst”, “peak”, and “fade” key events together as a representative summary of popularity evolution. We propose a novel prediction task—predicting when popularity undergoes these key events. It is of great importance to know when these three key events occur, because doing so helps recommendation systems, online marketing, and containment of rumors. However, it is very challenging to solve this new prediction task due to two issues. First, popularity evolution has high variation and can follow various patterns, so how can we identify “burst”, “peak”, and “fade” in different patterns of popularity evolution? Second, these events usually occur in a very short time, so how can we accurately yet promptly predict them? In this paper we address these two issues. To handle the first one, we use a simple moving average to smooth variation, and then a universal method is presented for different patterns to identify the key events in popularity evolution. To deal with the second one, we extract different types of features that may have an impact on the key events, and then a correlation analysis is conducted in the feature selection step to remove irrelevant and redundant features. The remaining features are used to train a machine learning model. The feature selection step improves prediction accuracy, and in order to emphasize prediction promptness, we design a new evaluation metric which considers both accuracy and promptness to evaluate our prediction task. Experimental and comparative results show the superiority of our prediction solution. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
22. Predicting Fluctuations in Cryptocurrency Transactions Based on User Comments and Replies.
- Author
-
Kim, Young Bin, Kim, Jun Gi, Kim, Wook, Im, Jae Ho, Kim, Tae Hyeong, Kang, Shin Jin, and Kim, Chang Hun
- Subjects
TRANSACTION systems (Computer systems) ,CRYPTOCURRENCIES ,MARKETS ,ECONOMIC forecasting ,ECONOMIC research ,HARD currencies - Abstract
This paper proposes a method to predict fluctuations in the prices of cryptocurrencies, which are increasingly used for online transactions worldwide. Little research has been conducted on predicting fluctuations in the price and number of transactions of a variety of cryptocurrencies. Moreover, the few methods proposed to predict fluctuation in currency prices are inefficient because they fail to take into account the differences in attributes between real currencies and cryptocurrencies. This paper analyzes user comments in online cryptocurrency communities to predict fluctuations in the prices of cryptocurrencies and the number of transactions. By focusing on three cryptocurrencies, each with a large market size and user base, this paper attempts to predict such fluctuations by using a simple and efficient method. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
23. Optimizing predictive performance of criminal recidivism models using registration data with binary and survival outcomes.
- Author
-
Tollenaar, Nikolaj and van der Heijden, Peter G. M.
- Subjects
REGRESSION trees ,DRUG registration ,CENSORING (Statistics) ,EXAMPLE ,DATA - Abstract
In a recidivism prediction context, there is no consensus on which modeling strategy should be followed for obtaining an optimal prediction model. In previous papers, a range of statistical and machine learning techniques were benchmarked on recidivism data with a binary outcome. However, two important tree ensemble methods, namely gradient boosting and random forests, were not extensively evaluated. In this paper, we further explore the modeling potential of these techniques in the binary outcome criminal prediction context. Additionally, we explore the predictive potential of classical statistical and machine learning methods for censored time-to-event data. A range of manually specified statistical and (semi-)automatic machine learning models is fitted on Dutch recidivism data, both for the binary outcome case and censored outcome case. To enhance generalizability of results, the same models are applied to two historical American data sets, the North Carolina prison data. For all datasets, (semi-)automatic modeling in the binary case seems to provide no improvement over an appropriately manually specified traditional statistical model. There is however evidence of slightly improved performance of gradient boosting in survival data. Results on the reconviction data from two sources suggest that both statistical and machine learning should be tried out for obtaining an optimal model. Even if a flexible black-box model does not improve upon the predictions of a manually specified model, it can serve as a test whether important interactions are missing or other misspecifications of the model are present and can thus provide more security in the modeling process. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
24. Drug sensitivity prediction with high-dimensional mixture regression.
- Author
-
Li, Qianyun, Shi, Runmin, and Liang, Faming
- Subjects
DRUG analysis ,FEATURE selection ,REGRESSION analysis ,RANDOM forest algorithms ,PREDICTION models - Abstract
This paper proposes a mixture regression model-based method for drug sensitivity prediction. The proposed method explicitly addresses two fundamental issues in drug sensitivity prediction, namely, population heterogeneity and feature selection pertaining to each of the subpopulations. The mixture regression model is estimated using the imputation-conditional consistency algorithm, and the resulting estimator is consistent. This paper also proposes an average-BIC criterion for determining the number of components for the mixture regression model. The proposed method is applied to the CCLE dataset, and the numerical results indicate that the proposed method can make a drastic improvement over the existing ones, such as random forest, support vector regression, and regularized linear regression, in both drug sensitivity prediction and feature selection. The p-values for the comparisons in drug sensitivity prediction can reach the order O(10⁻⁸) or lower for the drugs with heterogeneous populations. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
25. Machine learning framework for assessment of microbial factory performance.
- Author
-
Oyetunde, Tolutola, Liu, Di, Martin, Hector Garcia, and Tang, Yinjie J.
- Subjects
MACHINE learning ,MICROBIAL cells ,METABOLIC models ,ESCHERICHIA coli ,GENOMES - Abstract
Metabolic models can estimate intrinsic product yields for microbial factories, but such frameworks struggle to predict cell performance (including product titer or rate) under suboptimal metabolism and complex bioprocess conditions. On the other hand, machine learning, complementary to metabolic modeling, necessitates large amounts of data. Building such a database for metabolic engineering designs requires significant manpower and is prone to human errors and bias. We propose an approach to integrate data-driven methods with a genome-scale metabolic model for assessment of microbial bio-production (yield, titer and rate). Using engineered E. coli as an example, we manually extracted and curated a data set comprising about 1200 experimentally realized cell factories from ~100 papers. We furthermore augmented the key design features (e.g., genetic modifications and bioprocess variables) extracted from literature with additional features derived from running the genome-scale metabolic model iML1515 simulations with constraints that match the experimental data. Then, data augmentation and ensemble learning (e.g., support vector machines, gradient boosted trees, and neural networks in a stacked regressor model) are employed to alleviate the challenges of sparse, non-standardized, and incomplete data sets, while multiple correspondence analysis/principal component analysis are used to rank influential factors on bio-production. The hybrid framework demonstrates a reasonably high cross-validation accuracy for prediction of E. coli factory performance metrics under presumed bioprocess and pathway conditions (Pearson correlation coefficients between 0.8 and 0.93 on new data not seen by the model). [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
26. Automatic detection and classification of manufacturing defects in metal boxes using deep neural networks.
- Author
-
Essid, Oumayma, Laga, Hamid, and Samir, Chafik
- Subjects
ARTIFICIAL neural networks ,CLASSIFICATION algorithms ,COMPUTER vision ,COMPUTATIONAL complexity ,SUPPORT vector machines - Abstract
This paper develops a new machine vision framework for efficient detection and classification of manufacturing defects in metal boxes. Previous techniques, which are based on either visual inspection or on hand-crafted features, are both inaccurate and time consuming. In this paper, we show that by using autoencoder deep neural network (DNN) architecture, we are able to not only classify manufacturing defects, but also localize them with high accuracy. Compared to traditional techniques, DNNs are able to learn, in a supervised manner, the visual features that achieve the best performance. Our experiments on a database of real images demonstrate that our approach overcomes the state-of-the-art while remaining computationally competitive. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
27. Ship roll motion prediction based on ℓ1 regularized extreme learning machine.
- Author
-
Guan, Binglei, Yang, Wei, Wang, Zhibin, and Tang, Yinggan
- Subjects
ARTIFICIAL intelligence ,LIFE sciences ,NEUROSCIENCES ,PHYSICAL sciences ,BIOLOGICAL neural networks - Abstract
In this paper, a new method is proposed for prediction of ship roll motion based on extreme learning machine (ELM). To improve the prediction accuracy and avoid over or under fitting, two techniques are adopted to select the appropriate structure of ELM. First, the inputs of the ELM are selected from the roll motion time series using Lipschitz quotient method. Second, the number of hidden layer nodes is determined via ℓ1 regularized technique. Finally, the ℓ1 regularized ELM is solved by least angle regression (LAR) algorithm. The effectiveness of the proposed method is demonstrated by ship roll motion prediction experiments based on the real measured ship roll motion time series. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
28. Extreme value theory for anomaly detection – the GPD classifier
- Author
-
Vignotto, Edoardo and Engelke, Sebastian
- Published
- 2020
- Full Text
- View/download PDF
29. Training radial basis function networks for wind speed prediction using PSO enhanced differential search optimizer.
- Author
-
Rani R., Hannah Jessie and Victoire T., Aruldoss Albert
- Subjects
WIND speed ,RADIAL basis functions ,PARTICLE swarm optimization ,EVOLUTIONARY computation ,ALGORITHMS - Abstract
This paper presents an integrated hybrid optimization algorithm for training the radial basis function neural network (RBF NN). Training of neural networks is still a challenging exercise in the machine learning domain. Traditional training algorithms in general become trapped in local optima and converge prematurely, which makes them ineffective when applied to datasets with diverse features. Training algorithms based on evolutionary computations are becoming popular due to their robust nature in overcoming the drawbacks of the traditional algorithms. Accordingly, this paper proposes a hybrid training procedure with differential search (DS) algorithm functionally integrated with the particle swarm optimization (PSO). To surmount the local trapping of the search procedure, a new population initialization scheme is proposed using Logistic chaotic sequence, which enhances the population diversity and aids the search capability. To demonstrate the effectiveness of the proposed RBF hybrid training algorithm, experimental analyses on 7 publicly available benchmark datasets are performed. Subsequently, experiments were conducted on a practical application case for wind speed prediction to expound the superiority of the proposed RBF training algorithm in terms of prediction accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
30. TEXT CLASSIFICATION TECHNIQUES: A LITERATURE REVIEW.
- Author
-
Thangaraj, M. and Sivakami, M.
- Subjects
ARTIFICIAL intelligence ,INFORMATION storage & retrieval systems ,ARTIFICIAL neural networks ,MACHINE learning ,SUPPORT vector machines - Abstract
Aim/Purpose: The aim of this paper is to analyze various text classification techniques employed in practice, their strengths and weaknesses, to provide an improved awareness regarding various knowledge extraction possibilities in the field of data mining. Background: Artificial Intelligence is reshaping text classification techniques to better acquire knowledge. However, in spite of the growth and spread of AI in all fields of research, its role with respect to text mining is not well understood yet. Methodology: For this study, various articles written between 2010 and 2017 on "text classification techniques in AI", selected from leading journals of computer science, were analyzed. Each article was completely read. The research problems related to text classification techniques in the field of AI were identified and techniques were grouped according to the algorithms involved. These algorithms were divided based on the learning procedure used. Finally, the findings were plotted as a tree structure for visualizing the relationship between learning procedures and algorithms. Contribution: This paper identifies the strengths, limitations, and current research trends in text classification in an advanced field like AI. This knowledge is crucial for data scientists. They could utilize the findings of this study to devise customized data models. It also helps the industry to understand the operational efficiency of text mining techniques. It further contributes to reducing the cost of the projects and supports effective decision making. Findings: It has been found more important to study and understand the nature of data before proceeding into mining. The automation of text classification process is required, with the increasing amount of data and need for accuracy. Another interesting research opportunity lies in building intricate text data models with deep learning systems. It has the ability to execute complex Natural Language Processing (NLP) tasks with semantic requirements. Recommendations for Practitioners: Frame analysis, deception detection, narrative science where data expresses a story, healthcare applications to diagnose illnesses and conversation analysis are some of the recommendations suggested for practitioners. Recommendation for Researchers: Developing simpler algorithms in terms of coding and implementation, better approaches for knowledge distillation, multilingual text refining, domain knowledge integration, subjectivity detection, and contrastive viewpoint summarization are some of the areas that could be explored by researchers. Impact on Society: Text classification forms the base of data analytics and acts as the engine behind knowledge discovery. It supports state-of-the-art decision making, for example, predicting an event before it actually occurs, classifying a transaction as 'Fraudulent' etc. The results of this study could be used for developing applications dedicated to assisting decision making processes. These informed decisions will help to optimize resources and maximize benefits to mankind. Future Research: In the future, better methods for parameter optimization will be identified by selecting better parameters that reflect effective knowledge discovery. The role of streaming data processing is still rarely explored when it comes to text classification. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
31. INTELLIGENT MODELLING WITH ALTERNATIVE APPROACH: APPLICATION OF ADVANCED ARTIFICIAL INTELLIGENCE INTO TRAFFIC MANAGEMENT.
- Author
-
Lendel, Viliam, Pancikova, Lucia, Falat, Lukas, and Marcek, Dusan
- Subjects
ARTIFICIAL intelligence ,INTELLIGENT transportation systems ,MACHINE learning ,SUPPORT vector machines ,ARTIFICIAL neural networks - Abstract
The currently existing transport infrastructures are failing due to many problems. This paper deals with presenting a new approach of modelling and forecasting transport processes using artificial intelligence. Firstly, the current state of forecasting transport data is presented; the traditional as well as new artificial intelligence methods, such as artificial neural networks, are discussed and described. After that, a support vector regression prediction model is briefly presented and an empirical analysis is performed. Finally, on the basis of our experiment and performed comparative analysis we state that artificial intelligence (AI) intelligent methods have potential in the transport area as they can improve the efficiency, safety, and environmental compatibility of transport systems. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
32. Sequential Monte Carlo-guided ensemble tracking.
- Author
-
Wang, Yuru, Liu, Qiaoyuan, Jiang, Longkui, Yin, Minghao, and Wang, Shengsheng
- Subjects
MONTE Carlo method ,VISUAL perception ,BAYESIAN analysis ,SUPPORT vector machines ,ARTIFICIAL intelligence - Abstract
A great deal of robustness is allowed when visual tracking is considered as a classification problem. This paper combines a finite number of weak classifiers in a SMC framework as a strong classifier. The time-varying ensemble parameters (confidence of weak classifiers) are regarded as sequential arriving states and their posterior distribution is estimated in a Bayesian manner. Therefore, both the adaptiveness and stability are kept for the ensemble classification in handling scene changes and target deformation. Moreover, to increase the tracking accuracy, weak classifiers including Support Vector Machine (SVM) and Large Margin Distribution Machine (LDM) are combined as a hybrid strong one, with adaptiveness to the sample scales. Comprehensive experiments are performed on benchmark videos with various tracking challenges, and the proposed method is demonstrated to be better than or comparable to the state-of-the-art trackers. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
33. SVM and SVM Ensembles in Breast Cancer Prediction.
- Author
-
Huang, Min-Wei, Chen, Chih-Wen, Lin, Wei-Chao, Ke, Shih-Wen, and Tsai, Chih-Fong
- Subjects
SUPPORT vector machines ,BREAST cancer diagnosis ,MACHINE learning ,RECEIVER operating characteristic curves ,KERNEL functions - Abstract
Breast cancer is an all too common disease in women, making how to effectively predict it an active research problem. A number of statistical and machine learning techniques have been employed to develop various breast cancer prediction models. Among them, support vector machines (SVM) have been shown to outperform many related techniques. To construct the SVM classifier, it is first necessary to decide the kernel function, and different kernel functions can result in different prediction performance. However, there have been very few studies focused on examining the prediction performances of SVM based on different kernel functions. Moreover, it is unknown whether SVM classifier ensembles which have been proposed to improve the performance of single classifiers can outperform single SVM classifiers in terms of breast cancer prediction. Therefore, the aim of this paper is to fully assess the prediction performance of SVM and SVM ensembles over small and large scale breast cancer datasets. The classification accuracy, ROC, F-measure, and computational times of training SVM and SVM ensembles are compared. The experimental results show that linear kernel based SVM ensembles based on the bagging method and RBF kernel based SVM ensembles with the boosting method can be the better choices for a small scale dataset, where feature selection should be performed in the data pre-processing stage. For a large scale dataset, RBF kernel based SVM ensembles based on boosting perform better than the other classifiers. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
34. A Personalized Electronic Movie Recommendation System Based on Support Vector Machine and Improved Particle Swarm Optimization.
- Author
-
Wang, Xibin, Luo, Fengji, Qian, Ying, and Ranzi, Gianluca
- Subjects
VIDEO production & direction ,SUPPORT vector machines ,PARTICLE swarm optimization ,INFORMATION filtering systems ,REGRESSION analysis - Abstract
With the rapid development of ICT and Web technologies, a large amount of information is becoming available and this is producing, in some instances, a condition of information overload. Under these conditions, it is difficult for a person to locate and access useful information for making decisions. To address this problem, there are information filtering systems, such as the personalized recommendation system (PRS) considered in this paper, that assist a person in identifying possible products or services of interest based on his/her preferences. Among available approaches, collaborative filtering (CF) is one of the most widely used recommendation techniques. However, CF has some limitations, e.g., the relatively simple similarity calculation, cold start problem, etc. In this context, this paper presents a new regression model based on the support vector machine (SVM) classification and an improved PSO (IPSO) for the development of an electronic movie PRS. In its implementation, an SVM classification model is first established to obtain a preliminary movie recommendation list based on which an SVM regression model is applied to predict movies’ ratings. The proposed PRS not only considers the movie’s content information but also integrates the users’ demographic and behavioral information to better capture the users’ interests and preferences. The efficiency of the proposed method is verified by a series of experiments based on the MovieLens benchmark data set. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
35. DyHAP: Dynamic Hybrid ANFIS-PSO Approach for Predicting Mobile Malware.
- Author
-
Afifi, Firdaus, Anuar, Nor Badrul, Shamshirband, Shahaboddin, and Choo, Kim-Kwang Raymond
- Subjects
MALWARE ,MOBILE apps ,FUZZY systems ,PARTICLE swarm optimization ,PROGRAM transformation - Abstract
To deal with the large number of malicious mobile applications (e.g. mobile malware), a number of malware detection systems have been proposed in the literature. In this paper, we propose a hybrid method to find the optimum parameters that can be used to facilitate mobile malware identification. We also present a multi agent system architecture comprising three system agents (i.e. sniffer, extraction and selection agent) to capture and manage the pcap file for data preparation phase. In our hybrid approach, we combine an adaptive neuro fuzzy inference system (ANFIS) and particle swarm optimization (PSO). Evaluations using data captured on a real-world Android device and the MalGenome dataset demonstrate the effectiveness of our approach, in comparison to two hybrid optimization methods which are differential evolution (ANFIS-DE) and ant colony optimization (ANFIS-ACO). [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
36. Fault Prediction Using Statistical and Machine Learning Methods for Improving Software Quality.
- Author
-
Malhotra, Ruchika and Jain, Ankita
- Subjects
COMPUTER software quality control ,STATISTICS ,MACHINE learning ,OBJECT-oriented methods (Computer science) ,SOFTWARE reliability ,COMPUTER software ,ELECTRONIC file management ,RECEIVER operating characteristic curves ,FAULT location (Engineering) - Abstract
An understanding of quality attributes is relevant for the software organization to deliver high software reliability. An empirical assessment of metrics to predict the quality attributes is essential in order to gain insight about the quality of software in the early phases of software development and to ensure corrective actions. In this paper, we predict a model to estimate fault proneness using Object Oriented CK metrics and QMOOD metrics. We apply one statistical method and six machine learning methods to predict the models. The proposed models are validated using a dataset collected from Open Source software. The results are analyzed using Area Under the Curve (AUC) obtained from Receiver Operating Characteristics (ROC) analysis. The results show that the model predicted using the random forest and bagging methods outperformed all the other models. Hence, based on these results it is reasonable to claim that quality models have a significant relevance with Object Oriented metrics and that machine learning methods have a comparable performance with statistical methods. [ABSTRACT FROM AUTHOR]
- Published
- 2012
- Full Text
- View/download PDF
37. Improving the forecasting performance of temporal hierarchies.
- Author
-
Spiliotis, Evangelos, Petropoulos, Fotios, and Assimakopoulos, Vassilios
- Subjects
MATHEMATICAL functions ,PHYSICAL sciences ,COGNITIVE science ,APPLIED mathematics ,LIFE sciences - Abstract
Temporal hierarchies have been widely used during the past few years as they are capable of providing more accurate coherent forecasts at different planning horizons. However, they still display some limitations, being mainly subject to the forecasting methods used for generating the base forecasts and the particularities of the examined series. This paper deals with such limitations by considering three different strategies: (i) combining forecasts of multiple methods, (ii) applying bias adjustments and (iii) selectively implementing temporal hierarchies to avoid seasonal shrinkage. The proposed strategies can be applied either separately or simultaneously, being complements to the method considered for reconciling the base forecasts and completely independent from each other. Their effect is evaluated using the monthly series of the M and M3 competitions. The results are very promising, displaying lots of potential for improving the performance of temporal hierarchies, both in terms of accuracy and bias. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
38. LOTUS: A single- and multitask machine learning algorithm for the prediction of cancer driver genes.
- Author
-
Collier, Olivier, Stoven, Véronique, and Vert, Jean-Philippe
- Subjects
CANCER genes ,MACHINE learning ,LEARNING strategies ,P53 antioncogene ,PROTEIN-protein interactions ,COMPUTATIONAL biology ,TUMOR suppressor genes - Abstract
Cancer driver genes, i.e., oncogenes and tumor suppressor genes, are involved in the acquisition of important functions in tumors, providing a selective growth advantage, allowing uncontrolled proliferation and avoiding apoptosis. It is therefore important to identify these driver genes, both for the fundamental understanding of cancer and to help finding new therapeutic targets or biomarkers. Although the most frequently mutated driver genes have been identified, it is believed that many more remain to be discovered, particularly for driver genes specific to some cancer types. In this paper, we propose a new computational method called LOTUS to predict new driver genes. LOTUS is a machine-learning based approach which allows to integrate various types of data in a versatile manner, including information about gene mutations and protein-protein interactions. In addition, LOTUS can predict cancer driver genes in a pan-cancer setting as well as for specific cancer types, using a multitask learning strategy to share information across cancer types. We empirically show that LOTUS outperforms five other state-of-the-art driver gene prediction methods, both in terms of intrinsic consistency and prediction accuracy, and provide predictions of new cancer genes across many cancer types. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
39. Are screening methods useful in feature selection? An empirical study.
- Author
-
Wang, Mingyuan and Barbu, Adrian
- Subjects
FEATURE selection ,MACHINE learning ,BOOSTING algorithms ,RECEIVER operating characteristic curves ,COGNITIVE science - Abstract
Filter or screening methods are often used as a preprocessing step for reducing the number of variables used by a learning algorithm in obtaining a classification or regression model. While there are many such filter methods, there is a need for an objective evaluation of these methods. Such an evaluation is needed to compare them with each other and also to answer whether they are at all useful, or a learning algorithm could do a better job without them. For this purpose, many popular screening methods are partnered in this paper with three regression learners and five classification learners and evaluated on ten real datasets to obtain accuracy criteria such as R-square and area under the ROC curve (AUC). The obtained results are compared through curve plots and comparison tables in order to find out whether screening methods help improve the performance of learning algorithms and how they fare with each other. Our findings revealed that the screening methods were useful in improving the prediction of the best learner on two regression and two classification datasets out of the ten datasets evaluated. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
40. Empirical mode decomposition based long short-term memory neural network forecasting model for the short-term metro passenger flow.
- Author
-
Chen, Quanchao, Wen, Di, Li, Xuqiang, Chen, Dingjun, Lv, Hongxia, Zhang, Jie, and Gao, Peng
- Subjects
ARTIFICIAL neural networks ,HILBERT-Huang transform ,LOAD forecasting (Electric power systems) ,SHORT-term memory ,INTELLIGENT transportation systems ,NATURAL language processing ,BOX-Jenkins forecasting - Abstract
Short-term metro passenger flow forecasting is an essential component of intelligent transportation systems (ITS) and can be applied to optimize the passenger flow organization of a station and offer data support for metro passenger flow early warning and system management. LSTM neural networks have recently achieved remarkable results in the field of natural language processing (NLP) because they are well suited for learning from experience to predict time series. For this purpose, we propose an empirical mode decomposition (EMD)-based long short-term memory (LSTM) neural network model for predicting short-term metro inbound passenger flow. The EMD algorithm decomposes the original sequential passenger flow into several intrinsic mode functions (IMFs) and a residual. Selected IMFs that are strongly correlated with the original data can be obtained via feature selection. The selected IMFs and the original data are integrated into inputs for LSTM neural networks, and a single LSTM prediction model and an EMD-LSTM hybrid forecasting model are developed. Finally, historical real automatic fare collection (AFC) data from metro passengers are collected from Chengdu Metro to verify the validity of the proposed EMD-LSTM prediction model. The results indicate that the proposed EMD-LSTM hybrid forecasting model outperforms the LSTM, ARIMA and BPN models. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
41. An improved gray prediction model for China’s beef consumption forecasting.
- Author
-
Zeng, Bo, Li, Shuliang, Meng, Wei, and Zhang, Dehai
- Subjects
BEEF industry ,PREDICTION models ,SUPPLY & demand ,FOOD consumption ,GOVERNMENT policy ,ANIMAL products ,VECTOR error-correction models - Abstract
To balance the supply and demand in China's beef market, beef consumption must be scientifically and effectively forecasted. Beef consumption is affected by many factors and is characterized by gray uncertainty. Therefore, gray theory can be used to forecast beef consumption. In this paper, the structural defects and unreasonable parameter design of the traditional gray model are analyzed. Then, a new gray model, termed EGM(1,1,r), is built, and the modeling conditions and error checking methods of EGM(1,1,r) are studied. Then, EGM(1,1,r) is used to simulate and forecast China’s beef consumption. The results show that both the simulation and prediction precisions of the new model are better than those of other gray models. Finally, the new model is used to forecast China’s beef consumption for the period 2019–2025. The findings will serve as an important reference for the Chinese government in formulating policies to ensure the balance between the supply and demand for Chinese beef. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
42. Analytics-statistics mixed training and its fitness to semisupervised manufacturing.
- Author
-
Parashar, Parag, Chen, Chun Han, Akbar, Chandni, Fu, Sze Ming, Rawat, Tejender S., Pratik, Sparsh, Butola, Rajat, Chen, Shih Han, and Lin, Albert S.
- Subjects
MACHINE learning ,STATISTICAL learning ,STATISTICAL models ,SEMICONDUCTOR devices ,MANUFACTURING processes ,SEMICONDUCTOR manufacturing - Abstract
While there have been many studies using machine learning (ML) algorithms to predict process outcomes and device performance in semiconductor manufacturing, the extensively developed technology computer-aided design (TCAD) physical models should play a more significant role in conjunction with ML. While TCAD models have been effective in predicting the trends of experiments, a machine learning statistical model is more capable of predicting the anomalous effects that can be dependent on the chambers, machines, fabrication environment, and specific layouts. In this paper, we use an analytics-statistics mixed training (ASMT) approach using TCAD. Under this method, the TCAD models are incorporated into the machine learning training procedure. The mixed dataset with the experimental and TCAD results improved the prediction in terms of accuracy. With the application of ASMT to the BOSCH process, we show that the mean square error (MSE) can be effectively decreased when the analytics-statistics mixed training (ASMT) scheme is used instead of the classic neural network (NN) used in the baseline study. In this method, statistical induction and analytical deduction can be combined to increase the prediction accuracy of future intelligent semiconductor manufacturing. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
43. Machine learning applied to asteroid dynamics.
- Author
-
Carruba, V., Aljbaae, S., Domingos, R. C., Huaman, M., and Barletta, W.
- Subjects
*DEEP learning , *SUPERVISED learning , *MACHINE learning , *ARTIFICIAL neural networks , *ASTEROIDS , *BIOLOGICAL systems , *SOLAR system - Abstract
Machine learning (ML) is the branch of computer science that studies computer algorithms that can learn from data. It is mainly divided into supervised learning, where the computer is presented with examples of entries, and the goal is to learn a general rule that maps inputs to outputs, and unsupervised learning, where no label is provided to the learning algorithm, leaving it alone to find structures. Deep learning is a branch of machine learning based on numerous layers of artificial neural networks, which are computing systems inspired by the biological neural networks that constitute animal brains. In asteroid dynamics, machine learning methods have been recently used to identify members of asteroid families, small bodies images in astronomical fields, and to identify resonant arguments images of asteroids in three-body resonances, among other applications. Here, we will conduct a full review of available literature in the field and classify it in terms of metrics recently used by other authors to assess the state of the art of applications of machine learning in other astronomical subfields. For comparison, applications of machine learning to Solar System bodies, a larger area that includes imaging and spectrophotometry of small bodies, have already reached a state classified as progressing. Research communities and methodologies are more established, and the use of ML led to the discovery of new celestial objects or features, or new insights in the area. ML applied to asteroid dynamics, however, is still in the emerging phase, with smaller groups, methodologies still not well-established, and fewer papers producing discoveries or insights. Large observational surveys, like those conducted at the Zwicky Transient Facility or at the Vera C. Rubin Observatory, will produce in the next years very substantial datasets of orbital and physical properties for asteroids. 
Applications of ML for clustering, image identification, and anomaly detection, among others, are currently being developed and are expected to be of great help in the next few years. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
44. Exploring performance and robustness of shallow landslide susceptibility modeling at regional scale using different training and testing sets
- Author
-
Conforti, Massimo, Borrelli, Luigi, Cofone, Gino, and Gullà, Giovanni
- Published
- 2023
- Full Text
- View/download PDF
45. Internal validation of Automated Visual Evaluation (AVE) on smartphone images for cervical cancer screening in a prospective study in Zambia.
- Author
-
Hu, Liming, Mwanahamuntu, Mulindi H., Sahasrabuddhe, Vikrant V., Barrett, Caroline, Horning, Matthew P., Shah, Ishan, Laverriere, Zohreh, Banik, Dipayan, Ji, Ye, Shibemba, Aaron Lunda, Chisele, Samson, Munalula, Mukatimui Kalima, Kaunga, Friday, Musonda, Francis, Malyangu, Evans, Hariharan, Karen Milch, and Parham, Groesbeck P.
- Subjects
EARLY detection of cancer ,CERVICAL cancer ,HEALTH facilities ,SMARTPHONES ,HUMAN papillomavirus ,MEDICAL triage ,RURAL health clinics - Abstract
Objectives: Visual inspection with acetic acid (VIA) is a low‐cost approach for cervical cancer screening used in most low‐ and middle‐income countries (LMICs) but, similar to other visual tests, is subjective and requires sustained training and quality assurance. We developed, trained, and validated an artificial‐intelligence‐based "Automated Visual Evaluation" (AVE) tool that can be adapted to run on smartphones to assess smartphone‐captured images of the cervix and identify precancerous lesions, helping augment VIA performance. Design: Prospective study. Setting: Eight public health facilities in Zambia. Participants: A total of 8204 women aged 25–55. Interventions: Cervical images captured on commonly used low‐cost smartphone models were matched with key clinical information including human immunodeficiency virus (HIV) and human papillomavirus (HPV) status, plus histopathology analysis (where applicable), to develop and train an AVE algorithm and evaluate its performance for use as a primary screen and triage test for women who are HPV positive. Main Outcome Measures: Area under the receiver operating curve (AUC); sensitivity; specificity. Results: As a general population screening tool for cervical precancerous lesions, AVE identified cases of cervical precancerous and cancerous (CIN2+) lesions with high performance (AUC = 0.91, 95% confidence interval [CI] = 0.89–0.93), which translates to a sensitivity of 85% (95% CI = 81%–90%) and specificity of 86% (95% CI = 84%–88%) based on maximizing the Youden's index. This represents a considerable improvement over naked eye VIA, which as per a meta‐analysis by the World Health Organization (WHO) has a sensitivity of 66% and specificity of 87%. For women living with HIV, the AUC of AVE was 0.91 (95% CI = 0.88–0.93), and among those testing positive for high‐risk HPV types, the AUC was 0.87 (95% CI = 0.83–0.91). 
Conclusions: These results demonstrate the feasibility of utilizing AVE on images captured using a commonly available smartphone by nurses in a screening program, and support our ongoing efforts to more broadly evaluate AVE for its clinical sensitivity, specificity, feasibility, and acceptability across a wider range of settings. Limitations of this study include potential inflation of performance estimates due to verification bias (as biopsies were only obtained from participants with visible aceto‐white cervical lesions) and due to this being an internal validation (the test data, while independent from that used to develop the algorithm, was drawn from the same study). [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
46. Design of a tomato classifier based on machine vision.
- Author
-
Liu, Li, Li, Zhengkun, Lan, Yufei, Shi, Yinggang, and Cui, Yongjie
- Subjects
COMPUTER vision ,COLOR vision ,TOMATOES ,REGRESSION analysis ,LINEAR equations - Abstract
This paper attempts to design an automated, efficient and intelligent tomato grading method that facilitates the graded selling of the fruit. Based on machine vision, the color images of tomatoes with different morphologies were studied, and the color, shape and size were selected as the key features. On this basis, an automated grading classifier was created based on the surface features of tomatoes, and a grading platform was set up to verify the effect of the classifier. Specifically, the Hue value distributions of tomatoes with different maturities were investigated, and the Hue value ranges were determined for mature, semi-mature and immature tomatoes, producing the color classifier. Next, the first-order Fourier descriptor (1D-FD) was adopted to describe the radius sequence of tomato contour, and an equation was established to compute the irregularity of tomato contour, creating the shape classifier. After that, a linear regression equation was constructed to reflect the relationship between the transverse diameters of actual tomatoes and tomato images, and a classifier between large, medium and small tomatoes was produced based on the transverse diameter. Finally, a comprehensive tomato classifier was built based on the color, shape and size diameters. The experimental results show that the mean grading accuracy of the proposed method was 90.7%. This means our method can achieve automated real-time grading of tomatoes. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
47. Genome-Wide Analysis of MDR and XDR Tuberculosis from Belarus: Machine-Learning Approach.
- Author
-
Sergeev, Roman Sergeevich, Kavaliou, Ivan S., Sataneuski, Uladzislau V., Gabrielian, Andrei, Rosenthal, Alex, Tartakovsky, Michael, and Tuzikov, Alexander V.
- Abstract
Emergence of drug-resistant microorganisms has been recognized as a serious threat to public health worldwide. This problem is extensively discussed in the context of tuberculosis treatment. Alterations in pathogen genomes are among the main mechanisms by which microorganisms exhibit drug resistance. Analysis of 144 M. tuberculosis strains of different phenotypes including drug susceptible, MDR, and XDR isolated in Belarus was fulfilled in this paper. A wide range of machine learning methods that can discover SNPs related to drug-resistance in the whole bacteria genomes was investigated. Besides single-SNP testing approaches, methods that allow detecting joint effects from interacting SNPs were considered. We proposed a framework for automated selection of the best performing statistical model in terms of recall, precision, and accuracy to identify drug resistance-associated mutations. Analysis of whole-genome sequences often leads to situations where the number of treated features exceeds the number of available observations. For this reason, special attention is paid to fair evaluation of the model prediction quality and minimizing the risk of overfitting while estimating the underlying parameters. Results of our experiments aimed at identifying top-scoring resistance mutations to the major first-line and second-line anti-TB drugs are presented. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
48. Three-dimensional GPU-accelerated active contours for automated localization of cells in large images.
- Author
-
Lotfollahi, Mahsa, Berisha, Sebastian, Saadatifard, Leila, Montier, Laura, Žiburkus, Jokūbas, and Mayerich, David
- Abstract
Cell segmentation in microscopy is a challenging problem, since cells are often asymmetric and densely packed. Successful cell segmentation algorithms rely on identifying seed points, and are highly sensitive to variability in cell size. In this paper, we present an efficient and highly parallel formulation for symmetric three-dimensional contour evolution that extends previous work on fast two-dimensional snakes. We provide a formulation for optimization on 3D images, as well as a strategy for accelerating computation on consumer graphics hardware. The proposed software takes advantage of Monte-Carlo sampling schemes in order to speed up convergence and reduce thread divergence. Experimental results show that this method provides superior performance for large 2D and 3D cell localization tasks when compared to existing methods on large 3D brain images. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
49. Sentimental text mining based on an additional features method for text classification.
- Author
-
Cheng, Ching-Hsue and Chen, Hsien-Hsiu
- Subjects
SINGULAR value decomposition ,MULTIPLE correspondence analysis (Statistics) ,WEBSITES ,CLASSIFICATION ,SENTIMENT analysis ,TEXT processing (Computer science) - Abstract
Owing to the emergence of the Internet and its rapid growth, people can use mobile devices on many social media platforms (blogs, Facebook forums, etc.), and the platforms provide well-known websites for people to express and share their daily activities and ideas on global issues. Many consumers utilize product review websites before making a purchase. Many well-known websites are searched for relevant product reviews and experiences of product use. We can easily collect large amounts of structured and unstructured product data and further analyze the data to determine the desired product information. For this reason, many researchers are gradually focusing on sentiment analysis or opinion exploration (opinion mining) and use this technique to extract and analyze customer opinions and emotions. This paper proposes a sentimental text mining method based on an additional features method to enhance accuracy and reduce implementation time and uses singular value decomposition and principal component analysis for data dimension reduction. This study has four contributions: (1) the proposed algorithm for preprocessing the data for sentiment classification, (2) the additional features to enhance the accuracy of the sentiment classification, (3) the application of singular value decomposition and principal component analysis for data dimension reduction, and (4) the design of five modules based on different features, with or without stemming, to compare the performance results. The experimental results show that the proposed method has better accuracy than other methods and that the proposed method can decrease the implementation time. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
50. PPCD: Privacy-preserving clinical decision with cloud support.
- Author
-
Ma, Hui, Guo, Xuyang, Ping, Yuan, Wang, Baocang, Yang, Yuehua, Zhang, Zhili, and Zhou, Jingxian
- Subjects
DECISION support systems ,INVESTMENT analysis ,MACHINE learning ,MULTILAYER perceptrons ,CLOUD computing ,PHYSICAL sciences - Abstract
With the prosperity of machine learning and cloud computing, meaningful information can be mined from mass electronic medical data, which helps physicians make proper disease diagnoses for patients. However, using medical data and disease information of patients frequently raises privacy concerns. In this paper, based on single-layer perceptron, we propose a scheme of privacy-preserving clinical decision with cloud support (PPCD), which securely conducts disease model training and prediction for the patient. Each party learns nothing about the other’s private information. In PPCD, a lightweight secure multiplication is presented and introduced to improve the model training. Security analysis and experimental results on real data confirm the high accuracy of disease prediction achieved by the proposed PPCD without the risk of privacy disclosure. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
Discovery Service for Jio Institute Digital Library