362 results for "Empirical research"
Search Results
2. An intelligent risk assessment on prediction of COVID-19 pandemic using DNN and TSA: An empirical case study in Thailand.
- Author: Kengpol, Athakorn and Klunngien, Jakkarin
- Subjects: COVID-19 pandemic; ARTIFICIAL neural networks; DEEP learning; RISK assessment; DATA envelopment analysis; EMPIRICAL research
- Abstract
The WHO has declared that the COVID-19 pandemic is a severe health crisis. Currently, the variants of concern are delta and omicron, including the omicron sub-lineages XBB and BQ.1. Decision planning with situation awareness is important during the COVID-19 pandemic, especially demand planning for medical supplies according to pandemic probability or severity via pandemic risk assessment. Therefore, this research proposes an intelligent risk assessment on the prediction of the COVID-19 pandemic using deep learning with a deep neural network (DNN) and the tunicate swarm algorithm (TSA). The results show the model can accurately predict the distance and elapsed time of the next COVID-19 case based upon the previous case and evaluate the associated risks. The contribution of this research is that, because the prediction model is based upon a DNN, it has the ability to learn, and by implementing the TSA it can theoretically improve the performance of the DNN for more precise prediction and faster convergence to the optimal solution. The prediction results are practically expanded to analyze risk using probability and data envelopment analysis (DEA). The benefit of this research is that the proposed methodology presents the prediction results as intelligent risk assessment charts. The government or those involved can use the proposed methodology to achieve better decision-making and management to control the COVID-19 pandemic in terms of delivering medical supplies to pandemic areas. [ABSTRACT FROM AUTHOR]
- Published: 2024
3. Exploring ChatGPT's code refactoring capabilities: An empirical study.
- Author: DePalma, Kayla, Miminoshvili, Izabel, Henselder, Chiara, Moss, Kate, and AlOmar, Eman Abdullah
- Subjects: CHATGPT; SOFTWARE refactoring; DATA structures; SOFTWARE engineering; EMPIRICAL research; SOURCE code
- Abstract
ChatGPT has shown great potential in the field of software engineering with its ability to generate code. Yet, ChatGPT's ability to interpret code has been deemed unreliable and faulty, which causes concern for the platform's ability to properly refactor code. To confront this concern, we carried out a study to assess ChatGPT's abilities and limitations in refactoring code. We divided the study into three parts: whether ChatGPT can refactor the code, whether the refactored code preserves the behavior of the original code segments, and whether ChatGPT is capable of providing documentation for the refactored code to provide insights into intent, instructions, and impact. We focused our research specifically on eight quality attributes to use when prompting ChatGPT to refactor our dataset of 40 Java code segments. After collecting the refactored code segments from ChatGPT, as well as data on whether the behavior was preserved, we ran the refactored code through PMD, a source code analyzer, to find programming flaws. We also tested ChatGPT's accuracy in generating documentation for the refactored code and analyzed the difference between the results of each quality attribute. We conclude that ChatGPT can provide many useful refactoring changes that improve code quality, which is crucial. ChatGPT offered improved versions of the provided code segments 39 out of 40 times, even if the improvement is as simple as suggesting clearer names for variables or better formatting. ChatGPT was able to recommend numerous options ranging from minor changes such as renaming methods and variables to major changes such as modifying the data structure. ChatGPT's strengths and accuracy were in suggesting minor changes because it had difficulty addressing and understanding complex errors and operations. Although most of the changes were minor, they made significant improvements because converting loops, simplifying calculations, and removing redundant statements have a crucial effect on runtime, memory, and readability. However, our results also indicate how ChatGPT can be unpredictable in its responses, which threatens its reliability. Asking ChatGPT the same prompt often yields different results, so some outputs were more accurate than others. This makes it difficult to fully assess ChatGPT's capabilities due to its variation and inconsistency. Because ChatGPT is limited by its reliance on its training data, it lacks understanding of the broader context, so it may occasionally make errors and suggest alterations that are neither applicable nor necessary. Overall, ChatGPT has proved to be a beneficial tool for programming as it is capable of providing advantageous suggestions, even if on a small scale. However, human programmers are still needed to oversee these changes and determine their significance. ChatGPT should be used as an aid to programmers since we cannot completely depend on it yet. [ABSTRACT FROM AUTHOR]
- Published: 2024
4. Exploiting causality signals in medical images: A pilot study with empirical results.
- Author: Carloni, Gianluca and Colantonio, Sara
- Subjects: BREAST; COMPUTER-assisted image analysis (Medicine); CONVOLUTIONAL neural networks; DIAGNOSTIC imaging; MAGNETIC resonance imaging; EMPIRICAL research
- Abstract
We present a novel technique to discover and exploit weak causal signals directly from images via neural networks for classification purposes. This way, we model how the presence of a feature in one part of the image affects the appearance of another feature in a different part of the image. Our method consists of a convolutional neural network backbone and a causality-factors extractor module, which computes weights to enhance each feature map according to its causal influence in the scene. We develop different architecture variants and empirically evaluate all the models on two public datasets of prostate MRI images and breast histopathology slides for cancer diagnosis. We study the effectiveness of our module both in fully-supervised and few-shot learning, we assess its addition to existing attention-based solutions, we conduct ablation studies, and investigate the explainability of our models via class activation maps. Our findings show that our lightweight block extracts meaningful information and improves the overall classification, together with producing more robust predictions that focus on relevant parts of the image. That is crucial in medical imaging, where accurate and reliable classifications are essential for effective diagnosis and treatment planning. • Feature maps from CNNs are exploited to extract causal signals in medical images. • Our causality factors extractor is a new way to embed such information into CNNs. • Feature maps are attended according to their causal influence in the scene. • Our Mulcat option works in both fully-supervised and few-shot learning settings. • Causality-driven CNNs perform better and have more robust visual explanations. [ABSTRACT FROM AUTHOR]
- Published: 2024
5. Empirical study of outlier impact in classification context.
- Author: Khan, Hufsa, Rasheed, Muhammad Tahir, Zhang, Shengli, Wang, Xizhao, and Liu, Han
- Subjects: OUTLIER detection; DATA distribution; DATA mining; EMPIRICAL research; CLASSIFICATION; FUZZY clustering technique; CENTROID
- Abstract
In the field of data mining, outlier detection is an important and challenging task. This paper focuses on studying the impacts of outliers on the performance of models learned from large-scale datasets when it is unknown whether a data point is an outlier or not. In the proposed approach, a fuzzy c-means clustering algorithm is applied and outliers are defined as those samples that have greater membership values in the cluster yet are located further away from the cluster centroid. Ideally, a sample with a higher membership value should be located closer to the cluster centroid. In this context, we calculate the weight of each sample using the AdaBoost algorithm, where a weight determines the representativeness of each sample within the data distribution. Additionally, in this study, the impact of weighted loss functions in different situations is discussed in detail. Finally, our method is evaluated on 12 UCI datasets, and the accuracy of our method is greater than 95% on some datasets, such as banknote 99.99%, biodeg 99.01%, optdigits 99.19%, and letters 97.37%. The experimental results show the efficiency and effectiveness of the proposed approach. • Outlier impact is investigated in a classification context. • Outliers are identified as samples with higher membership values that lie far from the cluster centroid. • Different weighted loss functions are also discussed in detail in various situations. • The efficiency and effectiveness of the proposed method are verified experimentally. [ABSTRACT FROM AUTHOR]
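A minimal sketch of the membership-plus-distance idea described in this abstract, using a hand-rolled fuzzy c-means rather than the authors' exact implementation; the thresholds and synthetic data are placeholders. Samples whose strongest cluster membership is high yet that lie unusually far from that cluster's centroid are flagged as candidate outliers.

```python
import numpy as np

def fuzzy_cmeans(X, c=3, m=2.0, iters=100, seed=0):
    """Plain fuzzy c-means; returns centroids and the membership matrix U (n x c)."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    p = 2.0 / (m - 1.0)
    for _ in range(iters):
        centroids = (U**m).T @ X / (U**m).sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2) + 1e-12
        U = 1.0 / (d**p * (1.0 / d**p).sum(axis=1, keepdims=True))
    return centroids, U

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(6, 1, (100, 2))])
centroids, U = fuzzy_cmeans(X, c=2)

best = U.argmax(axis=1)                       # cluster with the highest membership
memb = U[np.arange(len(X)), best]             # that membership value
dist = np.linalg.norm(X - centroids[best], axis=1)

# Flag the anomalous combination highlighted in the abstract:
# comparatively high membership together with a large distance to the centroid.
suspicious = (memb > np.quantile(memb, 0.5)) & (dist > np.quantile(dist, 0.95))
print("candidate outliers:", np.where(suspicious)[0])
```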
- Published: 2024
6. Fuzzy cognitive map and mean square method in empirical modeling: Application in economics.
- Author: Rotshtein, Alexander, Yosef, Arthur, Neskorodeva, Tatiana, and Katielnikov, Denys
- Subjects: COGNITIVE maps (Psychology); BANKING industry; LEAST squares; EMPIRICAL research; GENETIC algorithms; REGRESSION analysis; INPUT-output analysis
- Abstract
This article offers a Fuzzy Cognitive Map (FCM)-based method of empirical modeling which can be considered an alternative to multivariate regression analysis for extracting input–output relationships from experimental data. Vertices of the FCM graph are interpreted as input–output variables of the empirical model, and the weights of the arcs are unknown parameters that are estimated in two stages. The first stage consists of estimating the arc weights based on the proximity of the columns' values in the input–output observation data table. At the second stage, the observation table is used for offline adjustment of the weights using the least squares method and an optimization procedure based on a genetic algorithm. The method is illustrated by the example of World Bank data, testing the relations between Gross Domestic Product per Capita and demographic, educational, economic, technological and other factors. The novelty of the method in comparison to the Soft Regression technique lies in adjusting the FCM arc weights according to the least squares criterion. [ABSTRACT FROM AUTHOR]
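A very small sketch of the second-stage idea only: adjusting the arc weights that feed one output vertex against an input–output observation table by least squares. The linear relation, variable names and data are illustrative assumptions, not the authors' formulation, and the genetic-algorithm refinement is not shown.

```python
import numpy as np

# Hypothetical observation table: columns are input concepts x1..x3, y is the output concept.
rng = np.random.default_rng(0)
X = rng.random((50, 3))                        # input concept activations
true_w = np.array([0.6, -0.2, 0.4])
y = X @ true_w + 0.05 * rng.normal(size=50)    # observed output concept activation

# Offline least-squares adjustment of the arc weights, i.e. fit y ≈ X @ w over the table.
w_hat, residuals, *_ = np.linalg.lstsq(X, y, rcond=None)
print("estimated arc weights:", np.round(w_hat, 3))
```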
- Published: 2024
7. Development of technology predicting based on EEMD-GRU: An empirical study of aircraft assembly technology.
- Author: Zhang, Huyi, Feng, Lijie, Wang, Jinfeng, and Gao, Na
- Subjects: DEEP learning; HILBERT-Huang transform; EMPIRICAL research; STANDARD deviations; TECHNOLOGY transfer; DECOMPOSITION method
- Abstract
Technology prediction has been the subject of many prior studies, which suffer from issues of limited long-term effectiveness, high uncertainty, and low predictive accuracy. To address these problems, this study developed a model based on a mixed neural network that combines the Ensemble Empirical Mode Decomposition (EEMD) signal decomposition method with the Gated Recurrent Unit (GRU) deep learning model. In this study, the literature data is first preprocessed using Latent Dirichlet Allocation (LDA) topic modeling, and clusters of key technology topics are obtained accordingly. Secondly, within the identified technology topics, the EEMD signal processing method is employed to decompose complex time-series data into simpler subsequences, and GRU prediction models are established. Thirdly, the ultimate technological prediction results are obtained by integrating each subsequence's prediction results. In addition, Mean Absolute Percentage Error (MAPE) and Root Mean Square Error (RMSE) were used to evaluate the prediction results. Finally, the field of aircraft assembly technology is analyzed as a case study. The results show that the EEMD-GRU hybrid model excels in prediction accuracy and brings a new perspective and method to the field of technological prediction. [ABSTRACT FROM AUTHOR]
- Published: 2024
8. Conformal prediction of option prices.
- Author: Bastos, João A.
- Subjects: PRICES; STOCK options; OPTIONS (Finance); FORECASTING; EMPIRICAL research
- Abstract
The uncertainty associated with option price predictions has largely been overlooked in the literature. This paper aims to fill this gap by quantifying such uncertainty using conformal prediction. Conformal prediction is a model-agnostic procedure that constructs prediction intervals, ensuring valid coverage in finite samples without relying on distributional assumptions. Through the simulation of synthetic option prices, we find that conformal prediction generates prediction intervals for gradient boosting machines with an empirical coverage close to the nominal level. Conversely, non-conformal prediction intervals exhibit empirical coverage levels that fall short of the nominal target. In other words, they fail to contain the actual option price more frequently than expected for a given coverage level. As anticipated, we also observe a decrease in the width of prediction intervals as the size of the training data increases. However, we uncover significant variations in the width of these intervals across different options. Specifically, out-of-the-money options and those with a short time-to-maturity exhibit relatively wider prediction intervals. Then, we perform an empirical study using American call and put options on individual stocks. We find that the empirical results replicate those obtained in the simulation experiment. • Conformal prediction quantifies well the uncertainty of option price predictions. • Conformal prediction intervals have empirical coverage near the nominal level. • Non-conformal prediction intervals have empirical coverage below the nominal level. • Large variations in prediction intervals are found for American call and put options. • Larger intervals are found for out-of-the-money and short time-to-maturity options. [ABSTRACT FROM AUTHOR]
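A minimal sketch of split conformal prediction around a gradient boosting regressor, in the spirit of the procedure described above; the synthetic data and the 90% nominal level are placeholders rather than the paper's option-pricing setup.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(3000, 3))
y = np.sin(X[:, 0]) + X[:, 1]**2 + 0.2 * rng.normal(size=len(X))  # stand-in for option prices

X_fit, X_test, y_fit, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_cal, y_train, y_cal = train_test_split(X_fit, y_fit, test_size=0.25, random_state=0)

model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

# Split conformal: the ceil((n+1)(1-alpha))-th smallest absolute calibration residual
# gives a half-width with finite-sample marginal coverage, with no distributional assumption.
alpha = 0.10
resid = np.sort(np.abs(y_cal - model.predict(X_cal)))
k = int(np.ceil((len(resid) + 1) * (1 - alpha)))
q = resid[min(k, len(resid)) - 1]

pred = model.predict(X_test)
coverage = np.mean((y_test >= pred - q) & (y_test <= pred + q))
print(f"empirical coverage at nominal {1 - alpha:.0%}: {coverage:.3f}")
```

As in the abstract, the check of interest is whether the empirical coverage on held-out data lands close to the nominal level.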
- Published: 2024
9. DEA cross-efficiency models with prospect theory and distance entropy: An empirical study on high-tech industries.
- Author: Chen, Xiaoqing, Liu, Xinwang, Zhu, Qingyuan, and Wang, Zhiwei
- Subjects: DATA envelopment analysis; PROSPECT theory; HIGH technology industries; MODEL theory; ENTROPY; EMPIRICAL research
- Abstract
• An evaluation model is proposed by considering risk attitudes and aggregation method. • Our distance entropy function reduces the inconsistent degree of evaluation results. • Empirical example is performed to validate the applicability of the proposed model. • Sensitivity analysis is conducted to test the effect of parameter changes on results. Cross-efficiency evaluation with the data envelopment analysis (DEA) model is an effective way to assess performance and provide a complete ranking of decision-making units. However, it is generally assumed that decision makers are perfectly rational in the cross-efficiency model, which fails to consider the subjective preferences of decision-makers. Moreover, the arithmetic average method is usually adopted to aggregate efficiency scores in traditional cross-efficiency methods, which underestimates the importance of self-evaluation. To address these issues, we extend cross-efficiency with the DEA model by incorporating prospect theory and the distance entropy function. First, we calculate the prospect values of decision-making units to describe the non-rational subjective preferences under the risk of decision-makers. Second, based on prospect cross-efficiency, a new distance entropy function is developed to aggregate the ultimate prospect cross-efficiency values. More specifically, some traditional cross-efficiency evaluation models can be considered special cases of prospect cross-efficiency models with appropriate adjustments to the parameters. Finally, an empirical example is used to evaluate the prospect cross-efficiency results with the high-tech industries of 29 provinces in China to illustrate the applicability and effectiveness of the proposed model in ranking observations. [ABSTRACT FROM AUTHOR]
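A small sketch of the prospect-theory value transform that this kind of model applies to efficiency scores before aggregation. The reference point (the average peer rating) and the Tversky–Kahneman parameter values (alpha = beta = 0.88, lambda = 2.25) are standard textbook choices assumed here for illustration, not values taken from the paper, and the distance-entropy aggregation is not reproduced.

```python
import numpy as np

def prospect_value(x, reference, alpha=0.88, beta=0.88, lam=2.25):
    """Tversky-Kahneman value function applied to gains/losses relative to a reference point."""
    gain = x - reference
    g = np.abs(gain)
    # Gains are valued concavely, losses convexly and amplified by the loss-aversion factor lam.
    return np.where(gain >= 0, g**alpha, -lam * g**beta)

# Hypothetical cross-efficiency scores of one DMU as rated by five peers.
scores = np.array([0.62, 0.71, 0.55, 0.80, 0.66])
reference = scores.mean()
print(np.round(prospect_value(scores, reference), 4))
```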
- Published: 2024
10. Assessing the environmental efficiency with complex structure and component processes: An empirical analysis of regional industries in China.
- Author: Wu, Jie, Ke, Yue, Zhou, Zhixiang, and Xu, Guangcheng
- Subjects: EMPIRICAL research; TOBITS; RETURNS to scale; ENVIRONMENTAL economics; ECONOMIC conditions in China
- Abstract
Over recent decades, China's economy has continued to develop steadily and rapidly, but at a large environmental cost. Therefore, many researchers have proposed various methods to evaluate environmental efficiency in order to deal with these serious issues accurately. This paper is the first to evaluate environmental efficiency using a model featuring not only a parallel structure with three component processes (primary, secondary, and tertiary industries) in the first stage but also a sewage treatment process in the second stage. Simultaneously, we propose both radial and nonradial versions of the Environmental Multiple Variable Return to Scale (EMVRS) model. These two models both consider shared inputs as well as shared outputs, and the proportion of the shared inputs and shared outputs utilized in each component process is assumed to be unknown. Additionally, we further analyze the factors influencing environmental efficiency by utilizing the Tobit model. The results when EMVRS is applied to Chinese provinces can be divided into three parts: (1) Most provinces do not purify wastewater appropriately, which contributes to poor environmental efficiency. (2) The efficiency values of the three industry types are similar or identical to each other in most provinces. (3) The size of the employed population negatively impacts environmental efficiency. [ABSTRACT FROM AUTHOR]
- Published: 2024
11. Platform collaboration patterns, network structure, and innovation performance: An NK simulation approach.
- Author: Tang, Fangcheng, Qian, Zeqiang, Cheng, Liyan, Baig, Jibal, and Cui, Fushang
- Subjects: TECHNOLOGICAL innovations; EMPIRICAL research; DIFFUSION of innovations; SIMULATION methods & models
- Abstract
Prior research lacks understanding about how collaborative arrangements and network structures influence innovation outcomes in platform-based ecosystems. This paper addresses this gap by using an NK simulation model to investigate the impacts of different collaboration patterns and network structures on innovation performance. The NK simulation approach overcomes the shortcomings of empirical methods and enables examining the dynamic impacts. The results reveal that the "special platform + generic complementor" pattern leads to the highest innovation output. The impact of component correlations on the innovation performance follows an inverted U-shape. The small world network structure promotes innovation versus regular or random networks. The results provide novel theoretical insights into strategically configuring platform partnerships and network connections to optimize innovation. The findings offer practical guidance for firms to choose beneficial collaboration pattern and design proper network structure. [ABSTRACT FROM AUTHOR]
- Published: 2024
12. The effects of modular process models on gaze patterns - A follow-up investigation about modularization in process model literacy.
- Author: Winter, Michael and Pryss, Rüdiger
- Subjects: GAZE; MODULAR design; INFORMATION overload; INFORMATION storage & retrieval systems; LITERACY; EYE movements
- Abstract
Advanced organizational processes today require many process-relevant artifacts to be formally represented in process models. As these processes become more complex, process designers face the challenge of maintaining high comprehensibility while adequately visualizing process-relevant information. Modularization has been introduced to address the increasing complexity and information overload of process models. While research acknowledges the importance of process model comprehensibility in order to benefit from the merits of such models, there is still a lack of clarity about the benefits of process model modularization. Recent research has focused on the human visual system and investigated visual behavior in the context of process model literacy. Continuing this branch of research, this paper analyses gaze behavior when comprehending modularized process models. A follow-up eye-tracking study was conducted in which participants comprehended differently modularized BPMN 2.0 process models (i.e., vertical, horizontal, orthogonal). The results indicated that process model readers comprehend modularization approaches differently. Vertical modularization was perceived as the most comprehensible approach, whereas horizontal modularization received the least acceptance. The eye movement performance measures considered confirm the prior observation that attention is deployed more effectively in vertical modularization. Based upon the findings, the paper proposes a revised theoretical model from prior work that relates visual tasks to the execution of visual routines, explaining comprehension and information processing in modularized process models. The theoretical model elaborates how the visual system processes and comprehends information obtained from modularized process models. Furthermore, the paper derives implications for modeling and practical applications, pointing out the representational strengths and shortcomings of each modularization approach regarding how modularized process models are read and comprehended. [ABSTRACT FROM AUTHOR]
- Published: 2024
13. Portfolio allocation with the TODIM method.
- Author: Alali, Fatih and Tolga, A. Cagri
- Subjects: BUSINESS turnover; FUZZY decision making; PARAMETER estimation; INVESTORS; EMPIRICAL research
- Abstract
Highlights • Empirical analysis of the TODIM approach for portfolio allocation is presented. • A detailed application of portfolio allocation with TODIM is provided. • The Sharpe ratio generated by TODIM is better than that of 1/N portfolios and similar to the MVP. • TODIM performs better than the MVP from a turnover and concentration risk perspective. • The TODIM method could be applied as an efficient portfolio screening mechanism. Abstract The aim of this study is to adapt a well-known interactive and multi-criteria decision-making method, TODIM, to the portfolio allocation process. The proposed method is applied to empirical US equity data by employing variance, correlation and returns calculated over different observation periods as decision criteria. A total of 440 different configurations are applied to analyze the impact of several parameters in TODIM. Based on the results for the test period, outperforming TODIM configurations are selected. In the validation period, it is empirically demonstrated that portfolios based on outperforming TODIM configurations yield significantly better results than equally weighted portfolios (1/N) and only insignificantly inferior results to the minimum variance portfolio (MVP) in terms of the Sharpe ratio. However, TODIM may still be a better choice than the MVP for investors sensitive to concentration risk and turnover costs. [ABSTRACT FROM AUTHOR]
- Published: 2019
14. A paired neural network model for tourist arrival forecasting.
- Author: Yao, Yuan, Cao, Yi, Ding, Xuemei, Zhai, Jia, Liu, Junxiu, Luo, Yuling, Ma, Shuai, and Zou, Kailin
- Subjects: ARTIFICIAL neural networks; PREDICTION models; TOURISTS; EMPIRICAL research; ECONOMIC development
- Abstract
Highlights • We developed a novel structural neural network (sNN) model for forecasting tourism demand. • The sNN captures the trend and seasonal patterns of tourism demand accurately. • Empirical results show significantly superior performance of the sNN over benchmark models. Abstract Tourist arrival and tourist demand forecasting are crucial issues for the tourism economy and community economic development as well. Tourist demand forecasting has attracted much attention from tourism academics as well as industries. In recent years, it has attracted increasing attention in the computational literature, as advances in machine learning methods allow us to construct models that significantly improve the precision of tourism prediction. In this paper, we draw upon both strands of the literature and propose a novel paired neural network model. The tourist arrival data is decomposed by two low-pass filters into long-term trend and short-term seasonal components, which are then modelled by a pair of autoregressive neural network models in a parallel structure. The proposed model is evaluated on tourist arrival data to the United States from twelve source markets. The empirical studies show that our proposed paired neural network model outperforms the selected benchmark models across all error measures and over different horizons. [ABSTRACT FROM AUTHOR]
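A rough sketch of the pairing idea: split the series into a slow trend and a faster seasonal remainder, fit one autoregressive network per component, and recombine the forecasts. The centered moving-average filter, the small MLPs and the synthetic monthly series are illustrative assumptions, not the authors' exact filters or networks.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
t = np.arange(240)
series = 100 + 0.5 * t + 20 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 3, len(t))  # stand-in arrivals

# Low-pass decomposition: 12-month centered moving average as trend, remainder as seasonal part.
trend = np.convolve(series, np.ones(12) / 12, mode="same")
seasonal = series - trend

def fit_ar_nn(data, lags=12):
    """Fit a small autoregressive neural network: predict data[i] from the previous `lags` values."""
    X = np.array([data[i - lags:i] for i in range(lags, len(data))])
    y = data[lags:]
    return MLPRegressor(hidden_layer_sizes=(16,), max_iter=3000, random_state=0).fit(X, y)

lags = 12
trend_nn, seas_nn = fit_ar_nn(trend, lags), fit_ar_nn(seasonal, lags)

# One-step-ahead forecast: each network forecasts its own component, then the pair is recombined.
next_trend = trend_nn.predict(trend[-lags:].reshape(1, -1))[0]
next_seasonal = seas_nn.predict(seasonal[-lags:].reshape(1, -1))[0]
print("paired forecast:", round(next_trend + next_seasonal, 2))
```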
- Published: 2018
15. Eliciting and utilising knowledge for security event log analysis: An association rule mining and automated planning approach.
- Author: Khan, Saad and Parkinson, Simon
- Subjects: AUTOMATED planning & scheduling; DATA logging; DATA security failures; DECISION making; EMPIRICAL research
- Abstract
Vulnerability assessment and security configuration activities are heavily reliant on expert knowledge. This requirement often results in many systems being left insecure due to a lack of analysis expertise and access to specialist resources. It has long been known that a system’s event logs provide historical information depicting potential security breaches, as well as recording configuration activities. However, identifying and utilising knowledge within the event logs is challenging for the non-expert. In this paper, a novel technique is developed to process security event logs of a computer that has been assessed and configured by a security professional, extract key domain knowledge indicative of their expert decision making, and automatically apply learnt knowledge to previously unseen systems by non-experts. The technique converts event log entries into an object-based model and dynamically extracts associative rules. The rules are further improved in terms of quality using a temporal metric to autonomously establish temporal-association rules and acquire a domain model of expert configuration tasks. The acquired domain model and problem instance generated from a previously unseen system can then be used to produce a plan-of-action, which can be exploited by non-professionals to improve their system’s security. Empirical analysis is subsequently performed on 20 event logs, where identified plan traces are discussed in terms of accuracy and performance. [ABSTRACT FROM AUTHOR]
- Published: 2018
16. An empirical study of pattern leakage impact during data preprocessing on machine learning-based intrusion detection models reliability.
- Author: Bouke, Mohamed Aly and Abdullah, Azizol
- Subjects: MACHINE learning; LEAKAGE; SUPPORT vector machines; EMPIRICAL research; RANDOM forest algorithms; DECISION trees
- Abstract
In this paper, we investigate the impact of pattern leakage during data preprocessing on the reliability of Machine Learning (ML) based intrusion detection systems (IDS). Data leakage, also known as pattern leakage, occurs during data preprocessing when information from the testing set is used in training, leading to overfitting and inflated accuracy scores. Our study uses three well-known intrusion detection datasets: NSL-KDD, UNSW-NB15, and KDDCUP99. We preprocess the data to create versions with and without pattern leakage and train and test six ML models: Decision Tree (DT), Gradient Boosting (GB), K-neighbours (KNN), Support Vector Machine (SVM), Random Forest (RF), Logistic Regression (LR). Our results show that building IDS models with data leakage leads to higher accuracy but is unreliable. Additionally, we find that some algorithms are more sensitive to data leakage than others, as seen by the drop in model accuracy when built without leakage. To address this problem, we provide suggestions for mitigating data leakage in the training process and analyzing the sensitivity of different algorithms. Overall, our study emphasizes the importance of addressing data leakage in the training process to ensure the reliability of ML-based IDS models. [ABSTRACT FROM AUTHOR]
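A compact sketch of the kind of preprocessing leakage the study examines: fitting a scaler on the full dataset before the train/test split (leaky) versus fitting it on the training fold only (leak-free). The synthetic data, the scaler and the random forest are placeholders for the NSL-KDD/UNSW-NB15/KDDCUP99 features and the six models compared in the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=5000, n_features=20, weights=[0.9, 0.1], random_state=0)

# Leaky pipeline: the scaler "sees" the test rows before the split.
X_scaled = StandardScaler().fit_transform(X)
Xtr, Xte, ytr, yte = train_test_split(X_scaled, y, test_size=0.3, random_state=0)
leaky = RandomForestClassifier(random_state=0).fit(Xtr, ytr)
print("leaky preprocessing:   ", accuracy_score(yte, leaky.predict(Xte)))

# Leak-free pipeline: split first, fit the scaler on the training fold only.
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)
scaler = StandardScaler().fit(Xtr)
clean = RandomForestClassifier(random_state=0).fit(scaler.transform(Xtr), ytr)
print("leak-free preprocessing:", accuracy_score(yte, clean.predict(scaler.transform(Xte))))
```

How much the leak inflates the score depends on the preprocessing step and the model, which is the sensitivity question the paper investigates; the point of the sketch is only the ordering of the steps.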
- Published: 2023
17. Link prediction in heterogeneous networks based on metapath projection and aggregation.
- Author: Zhao, Yuncong, Sun, Yiyang, Huang, Yaning, Li, Longjie, and Dong, Hu
- Subjects: FORECASTING; NETWORK analysis (Planning); EMPIRICAL research
- Abstract
A heterogeneous network, which contains multiple types of nodes and edges, is a special kind of network. Link prediction in heterogeneous networks is a consistently interesting research topic owing to its practical value in various applications. In this work, we present an end-to-end link prediction method for heterogeneous networks. By leveraging the metapath projection and semantic graph aggregation, the proposed method can learn the embeddings of node pairs from different metapaths. Specifically, the proposed method projects a heterogeneous network into multiple semantic graphs based on a number of metapaths, and then learns the embedding of a node pair from a probability subgraph extracted in each semantic graph via a graph neural network. Afterward, a semantic aggregation module is designed to combine the embeddings of the node pair obtained from multiple semantic graphs using an attention mechanism. Empirical study manifests that the accuracy of the proposed link prediction method is superior to that of the competing methods. • A new end-to-end method is proposed for link prediction in heterogeneous networks. • The probability subgraph is defined to extract the related nodes. • Embeddings of node pairs are learned using metapath projection and semantic graph aggregation. • Experiments show the proposed method achieves superior performance. [ABSTRACT FROM AUTHOR]
- Published: 2023
18. A multi-attribute fusion approach extending Dempster–Shafer theory for combinatorial-type evidences.
- Author: Sun, Lin and Wang, Yanzhang
- Subjects: DEMPSTER-Shafer theory; COMBINATORICS; EMERGENCY management; RELIABILITY in engineering; EMPIRICAL research
- Abstract
Human cognition of the world generally begins with attribute perception of things. The knowledge element, which is able to reveal microscopic regularities through an attribute network, provides intelligent support for attribute-based cognition. In the field of emergency management, knowledge elements have been widely used in evolution rules, risk analysis and machine learning based on emergency cases. However, since the knowledge of emergency management is multidisciplinary and is, in addition, limited by diverse cognitive perspectives and language expressions, integrating heterogeneous knowledge from domain experts and data sources for in-depth mining and decision support is difficult. Especially with the advent of big data, the problem of developing an efficient multi-attribute fusion method to reorganize complex and massive data in a consensual knowledge framework must be addressed. In this paper, a novel mathematical approach, which extends Dempster–Shafer theory to fuse combinatorial-type evidences, is elaborated to handle multi-attribute integration through the use of the knowledge element model. This methodology makes it possible to establish a complete knowledge structure for the attribute description of things by implementing new uncertainty measures to determine a degree of belief when combining evidences. To some extent, it is also meaningful for optimizing the fusion algorithm in view of reliability and extensibility. Furthermore, the processing has the advantage of being effective without any semantic preprocessing. The application of the proposed model is shown in marine disaster monitoring for emergency management. We make an empirical analysis of the attribute fusion of the knowledge element "sea". [ABSTRACT FROM AUTHOR]
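For readers unfamiliar with the base machinery this paper extends, a minimal sketch of Dempster's rule of combination for two mass functions over a small frame of discernment; the hypotheses and masses are made-up numbers, and the paper's combinatorial-type extension and new uncertainty measures are not reproduced here.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions (dicts: frozenset hypothesis -> mass) with Dempster's rule."""
    combined, conflict = {}, 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb        # mass assigned to conflicting (empty) intersections
    # Normalize by 1 - K, where K is the total conflict mass.
    return {h: v / (1.0 - conflict) for h, v in combined.items()}, conflict

A, B = frozenset({"flood"}), frozenset({"storm surge"})
theta = A | B                          # the whole frame (full uncertainty)
m1 = {A: 0.6, B: 0.1, theta: 0.3}      # evidence from one attribute source
m2 = {A: 0.5, B: 0.3, theta: 0.2}      # evidence from another source

fused, K = dempster_combine(m1, m2)
for h, v in fused.items():
    print(sorted(h), round(v, 3))
print("conflict mass K:", round(K, 3))
```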
- Published: 2018
19. Electricity price estimation using deep learning approaches: An empirical study on Turkish markets in normal and Covid-19 periods.
- Author: Kaya, Mustafa, Karan, Mehmet Baha, and Telatar, Erdinç
- Subjects: ELECTRICITY pricing; STANDARD deviations; PRICES; ELECTRICITY markets; EMPIRICAL research; DEEP learning
- Abstract
This study aims to estimate the prices in the next 24 h with deep learning methods in the Turkish electricity market. The model is based on hourly data for the period 2017–2021 using electricity prices. The model's Root Mean Square Error (RMSE) value is 3.14, and the explanatory power R2 is 0.94. Since this model also considers the subgroups in the database, it can make price predictions for the pandemic period. To test the robustness and consistency of the model, twelve RNN-based models were re-estimated with the same data set. Although all models successfully predict the prices, The TEDSE Model performs better than the others. This study will be especially beneficial to electricity market players and policymakers. In further studies, the TEDSE model can be used for price prediction in intraday energy markets. This study's most important contribution is methodology innovation, using the Transformer Encoder-Decoder with Self-Attention (TEDSE) model for the first time to estimate electricity prices. [ABSTRACT FROM AUTHOR]
- Published: 2023
20. An empirical study of dynamic selection and random under-sampling for the class imbalance problem.
- Author: Liu, Shuhua Monica, Chen, Jiun-Hung, and Liu, Zhiheng
- Subjects: EMPIRICAL research; RESEARCH questions; SCIENTIFIC community; RANDOM forest algorithms
- Abstract
A detailed and extensive empirical study of dynamic selection (DS) and random under-sampling (RUS) for the class imbalance problem is conducted in this paper. A total of 20 state-of-the-art DS methods are compared on 54 datasets. The empirical results clearly answer the following six key research questions in this direction: (1) how performances of ensembles with and without RUS compare with respect to different DS and static ensemble (SE) methods; (2) how whether RUS is used affects performances of ensembles with respect to different DS/SE methods; (3) how performances of different DS/SE methods compare with respect to ensembles with and without RUS; (4) whether DS methods perform better than SE methods regardless of whether RUS is used and what types of ensembles are used; (5) how the number of base classifiers affects how performances of ensembles with and without RUS compare with respect to different DS/SE methods; and (6) how the number of base classifiers affects how performances of different DS/SE methods compare with respect to ensembles with and without RUS. The answers to the six research questions based on the experimental results in this study and the experimental findings are the main contributions of this work. • An extensive empirical study of DES with RUS for class imbalance is conducted. • A total of 20 state-of-the-art dynamic selection methods are compared on 54 datasets. • Three types of ensembles, bagging, random forests and AdaBoosting, are compared. • The experimental results and findings are important for the DES research community. [ABSTRACT FROM AUTHOR]
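A small sketch of one comparison underlying research question (1): a bagging ensemble trained with and without random under-sampling of the majority class. The dynamic-selection step itself is omitted, and the dataset, ensemble size and metric are illustrative assumptions rather than the paper's protocol.

```python
from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=4000, weights=[0.95, 0.05], random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)

# Ensemble trained on the imbalanced pool as-is.
plain = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0).fit(Xtr, ytr)

# Ensemble trained on a pool rebalanced by random under-sampling of the majority class.
Xrus, yrus = RandomUnderSampler(random_state=0).fit_resample(Xtr, ytr)
rus = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0).fit(Xrus, yrus)

for name, clf in [("without RUS", plain), ("with RUS   ", rus)]:
    print(name, round(balanced_accuracy_score(yte, clf.predict(Xte)), 3))
```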
- Published: 2023
21. Self-Organizing Map Oversampling (SOMO) for imbalanced data set learning.
- Author: Douzas, Georgios and Bacao, Fernando
- Subjects: BIG data; LEARNING; ALGORITHMS; EMPIRICAL research; SAMPLING (Process)
- Abstract
Learning from imbalanced datasets is challenging for standard algorithms, as they are designed to work with balanced class distributions. Although there are different strategies to tackle this problem, methods that address the problem through the generation of artificial data constitute a more general approach compared to algorithmic modifications. Specifically, they generate artificial data that can be used by any algorithm, not constraining the options of the user. In this paper, we present a new oversampling method, Self-Organizing Map-based Oversampling (SOMO), which through the application of a Self-Organizing Map produces a two dimensional representation of the input space, allowing for an effective generation of artificial data points. SOMO comprises three major stages: Initially a Self-Organizing Map produces a two-dimensional representation of the original, usually high-dimensional, space. Next it generates within-cluster synthetic samples and finally it generates between cluster synthetic samples. Additionally we present empirical results that show the improvement in the performance of algorithms, when artificial data generated by SOMO are used, and also show that our method outperforms various oversampling methods. [ABSTRACT FROM AUTHOR]
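A rough sketch of the cluster-then-interpolate idea behind SOMO's within-cluster stage. For brevity a k-means clustering stands in for the Self-Organizing Map's two-dimensional grid, so this only illustrates the general mechanism under that substitution; the between-cluster generation step and SOMO's actual density-based sample allocation are not shown.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
X_min = X[y == 1]                                   # minority-class samples

# Cluster the minority class (k-means here as a stand-in for the SOM grid).
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X_min)

rng = np.random.default_rng(0)
synthetic = []
n_needed = int((y == 0).sum() - (y == 1).sum())     # generate until the classes are balanced
while len(synthetic) < n_needed:
    c = rng.integers(labels.max() + 1)
    members = X_min[labels == c]
    if len(members) < 2:
        continue
    a, b = members[rng.choice(len(members), 2, replace=False)]
    synthetic.append(a + rng.random() * (b - a))    # SMOTE-style within-cluster interpolation

X_bal = np.vstack([X, np.array(synthetic)])
y_bal = np.concatenate([y, np.ones(len(synthetic))])
print("class counts after oversampling:", np.bincount(y_bal.astype(int)))
```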
- Published: 2017
22. A composite spatio-temporal modeling approach for age invariant face recognition.
- Author: Alvi, Fahad Bashir and Pears, Russel
- Subjects: HUMAN facial recognition software; TEXTURE analysis (Image processing); INFORMATION theory; DATA analysis; TWO-phase flow; EMPIRICAL research; ARTIFICIAL neural networks
- Abstract
In this research we propose a novel method of face recognition based on texture and shape information. Age invariant face recognition enables matching of an image obtained at a given point in time against an image of the same individual obtained at an earlier point in time and thus has important applications, notably in law enforcement. We investigate various types of models built on different levels of data granularity. At the global level a model is built on training data that encompasses the entire set of available individuals, whereas at the local level, data from homogeneous sub-populations is used and finally at the individual level a personalized model is built for each individual. We narrow down the search space by dividing the whole database into subspaces for improving recognition time. We use a two-phased process for age invariant face recognition. In the first phase we identify the correct subspace by using a probabilistic method, and in the second phase we find the probe image within that subspace. Finally, we use a decision tree approach to combine models built from shape and texture features. Our empirical results show that the local and personalized models perform best when rated on both Rank-1 accuracy and recognition time. [ABSTRACT FROM AUTHOR]
- Published: 2017
23. Dealing with endogeneity in data envelopment analysis applications.
- Author: Santín, Daniel and Sicilia, Gabriela
- Subjects: ENDOGENEITY (Econometrics); DATA envelopment analysis; PRODUCTION (Economic theory); EMPIRICAL research; MONTE Carlo method
- Abstract
Although the presence of the endogeneity is frequently observed in economic production processes, it tends to be overlooked when practitioners apply data envelopment analysis (DEA). In this paper we deal with this issue in two ways. First, we provide a simple statistical heuristic procedure that enables practitioners to identify the presence of endogeneity in an empirical application. Second, we propose the use of an instrumental input DEA (II-DEA) as a potential tool to address this problem and thus improve DEA estimations. A Monte Carlo experiment confirms that the proposed II-DEA approach outperforms standard DEA in finite samples under the presence of high positive endogeneity. To illustrate our theoretical findings, we perform an empirical application on the education sector. [ABSTRACT FROM AUTHOR]
- Published: 2017
24. Candidate project selection in cross project defect prediction using hybrid method.
- Author: Kanwar, Shailza, Awasthi, Lalit Kumar, and Shrivastava, Vivek
- Subjects: NAIVE Bayes classification; FORECASTING; EMPIRICAL research; PREDICTION models; RECOMMENDER systems; STATISTICAL bootstrapping
- Abstract
Cross Project Defect Prediction (CPDP) is a process that develops a defect prediction model on source projects and then applies the same model to the target project. New software projects are being developed every day, so selecting an appropriate training project from existing projects and from new upcoming projects to train a predictor model is a challenging task in CPDP. In the present study, we have proposed a hybrid selection method to select a candidate project from existing projects and a probabilistic method to select a candidate project from new projects. The proposed hybrid method is a weighted combination of the Collaborative Filtering (CF) method and the Content-Based (CB) method. The probabilistic method is based on a Naïve Bayes classifier and is used to predict the relation between the target project and the new target project. In the CF method, a usability score is generated for each project by making use of classification techniques, and the CB method calculates the matching score of candidate projects by using a K-dimensional tree. Finally, both methods are combined by a parallelized hybridization design, and weights for the proposed method are estimated with an empirical bootstrapping method. The score generated by the proposed hybrid technique is then used to identify the most suitable candidate project for the new project. The experimental results show that the suggestion list of the best three candidate projects is consistent when employing different classifiers. The recommendation performance is evaluated in terms of F-score and Mean Average Precision (MAP), and the proposed method has shown improved performance as compared to the existing methods in both terms. • A recommender system is proposed to recommend training data for the target project in CPDP. • Five machine learning algorithms are employed to validate the performance of the proposed system. • 13 software projects from the Jureczko datasets are taken to conduct experiments. • It improves on the CFPS method in terms of both F-score and MAP. [ABSTRACT FROM AUTHOR]
- Published: 2023
25. Building detection from orthophotos using a machine learning approach: An empirical study on image segmentation and descriptors.
- Author: Dornaika, Fadi, Moujahid, Abdelmalik, El Merabet, Youssef, and Ruichek, Yassine
- Subjects: IMAGE segmentation; MACHINE learning; EMPIRICAL research; PIXELS; DATA extraction
- Abstract
Building detection from aerial images has many applications in fields like urban planning, real-estate management, and disaster relief. In the last two decades, a large variety of methods on automatic building detection have been proposed in the remote sensing literature. Many of these approaches make use of local features to classify each pixel or segment to an object label, therefore involving an extra step to fuse pixelwise decisions. This paper presents a generic framework that exploits recent advances in image segmentation and region descriptors extraction for the automatic and accurate detection of buildings on aerial orthophotos. The proposed solution is supervised in the sense that appearances of buildings are learnt from examples. For the first time in the context of building detection, we use the matrix covariance descriptor, which proves to be very informative and compact. Moreover, we introduce a principled evaluation that allows selecting the best pair segmentation algorithm-region descriptor for the task of building detection. Finally, we provide a performance evaluation at pixel level using different classifiers. This evaluation is conducted over 200 buildings using different segmentation algorithms and descriptors. The performance analysis quantifies the quality of both the image segmentation and the descriptor used. The proposed approach presents several advantages in terms of scalability, suitability and simplicity with respect to the existing methods. Furthermore, the proposed scheme (detection chain and evaluation) can be deployed for detecting multiple object categories that are present in images and can be used by intelligent systems requiring scene perception and parsing such as intelligent unmanned aerial vehicle navigation and automatic 3D city modeling. [ABSTRACT FROM AUTHOR]
- Published: 2016
26. Detection of fake opinions using time series.
- Author: Heydari, Atefeh, Tavakoli, Mohammadali, and Salim, Naomie
- Subjects: ELECTRONIC commerce; PATTERN recognition systems; DATA mining; TIME series analysis; EMPIRICAL research
- Abstract
Today's e-commerce is highly dependent on the growing number of online customer reviews posted on opinion-sharing websites. This fact, unfortunately, has tempted spammers to target opinion-sharing websites in order to promote and demote products. To date, different types of opinion spam detection methods have been proposed in order to provide reliable resources for customers, manufacturers and researchers. However, supervised approaches suffer from imbalanced data due to the scarcity of spam reviews in datasets, rating-deviation-based filtering systems are easily cheated by smart spammers, and content-based methods are very expensive and the majority of them have not yet been tested on real data. The aim of this paper is to propose a robust review spam detection system wherein the rating deviation, content-based factors and activeness of reviewers are employed efficiently. To overcome the aforementioned drawbacks, all these factors are investigated together in suspicious time intervals captured from the time series of reviews by a pattern recognition technique. The proposed method could be a great asset in online spam filtering systems and could be used in data mining and knowledge discovery tasks as a standalone system to purify product review datasets. These systems can benefit from our method in terms of time efficiency and high accuracy. Empirical analyses on a real dataset show that the proposed approach is able to successfully detect spam reviews. Comparison with two of the current common methods indicates that our method is able to achieve higher detection accuracy (F-Score: 0.86) while removing the need for specific fields of metadata and reducing the heavy computation required for investigation purposes. [ABSTRACT FROM AUTHOR]
- Published: 2016
27. Nonparametric machine learning models for predicting the credit default swaps: An empirical study.
- Author: Son, Youngdoo, Byun, Hyeongmin, and Lee, Jaewook
- Subjects: MACHINE learning; NONPARAMETRIC estimation; PREDICTION models; CREDIT default swaps; EMPIRICAL research
- Abstract
A credit default swap, which reflects the credit risk of a firm, is one of the most frequently traded credit derivatives. In this paper, we conduct a comprehensive study to verify the predictive performance of nonparametric machine learning models and two conventional parametric models on the daily credit default swap spreads of different maturities and different rating groups, from AA to C. The data set used in this study runs from January 2001 to February 2014, which includes the global financial crisis period when the credit risk of firms was very high. Through experiments, it is shown that most nonparametric models used in this study outperformed the parametric benchmark models in terms of prediction accuracy as well as practical hedging measures, irrespective of the different credit ratings of the firms and the different maturities of their spreads. In particular, artificial neural networks showed better performance than the other parametric and nonparametric models. [ABSTRACT FROM AUTHOR]
- Published: 2016
28. Empirical distributions of daily equity index returns: A comparison.
- Author: Corlu, Canan G., Meterelliyoz, Melike, and Tiniç, Murat
- Subjects: EQUITY indexed annuities; GAUSSIAN distribution; EMPIRICAL research; COMPARATIVE studies; SYSTEMS theory
- Abstract
The normality assumption concerning the distribution of equity returns has long been challenged both empirically and theoretically. Alternative distributions have been proposed to better capture the characteristics of equity return data. This paper investigates the ability of five alternative distributions to represent the behavior of daily equity index returns over the period 1979–2014: the skewed Student-t distribution, the generalized lambda distribution, the Johnson system of distributions, the normal inverse Gaussian distribution, and the g-and-h distribution. We find that the generalized lambda distribution is a prominent alternative for modeling the behavior of daily equity index returns. [ABSTRACT FROM AUTHOR]
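A small sketch of this style of comparison, fitting two of the candidate families available in scipy (Student's t and Johnson SU) alongside the normal to a return series and comparing log-likelihood and AIC; the generalized lambda and g-and-h distributions need specialized estimators and are not shown, and the data here are simulated rather than the paper's index returns.

```python
import numpy as np
from scipy import stats

returns = stats.t.rvs(df=4, loc=0.0003, scale=0.01, size=5000, random_state=0)  # stand-in daily returns

candidates = {
    "normal": stats.norm,
    "Student-t": stats.t,
    "Johnson SU": stats.johnsonsu,
}

for name, dist in candidates.items():
    params = dist.fit(returns)                        # maximum-likelihood fit
    loglik = np.sum(dist.logpdf(returns, *params))
    aic = 2 * len(params) - 2 * loglik                # lower AIC = better fit/complexity trade-off
    print(f"{name:10s}  log-likelihood={loglik:10.1f}  AIC={aic:10.1f}")
```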
- Published: 2016
29. Associative learning on imbalanced environments: An empirical study.
- Author: Cleofas-Sánchez, L., Sánchez, J.S., García, V., and Valdovinos, R.M.
- Subjects: ASSOCIATIVE learning; EMPIRICAL research; COMPUTATIONAL complexity; ARTIFICIAL neural networks; PROBABILITY theory
- Abstract
Associative memories have emerged as a powerful computational neural network model for several pattern classification problems. Like most traditional classifiers, these models assume that the classes share similar prior probabilities. However, in many real-life applications the ratios of prior probabilities between classes are extremely skewed. Although the literature has provided numerous studies that examine the performance degradation of renowned classifiers on different imbalanced scenarios, so far this effect has not been supported by a thorough empirical study in the context of associative memories. In this paper, we fix our attention on the applicability of the associative neural networks to the classification of imbalanced data. The key questions here addressed are whether these models perform better, the same or worse than other popular classifiers, how the level of imbalance affects their performance, and whether distinct resampling strategies produce a different impact on the associative memories. In order to answer these questions and gain further insight into the feasibility and efficiency of the associative memories, a large-scale experimental evaluation with 31 databases, seven classification models and four resampling algorithms is carried out here, along with a non-parametric statistical test to discover any significant differences between each pair of classifiers. [ABSTRACT FROM AUTHOR]
- Published: 2016
30. Evaluating machine learning classification for financial trading: An empirical approach.
- Author: Gerlein, Eduardo A., McGinnity, Martin, Belatreche, Ammar, and Coleman, Sonya
- Subjects: MACHINE learning; TRADING bands (Securities); EMPIRICAL research; COMPUTATIONAL complexity; SUPPORT vector machines; ARTIFICIAL neural networks
- Abstract
Technical and quantitative analysis in financial trading use mathematical and statistical tools to help investors decide on the optimum moment to initiate and close orders. While these traditional approaches have served their purpose to some extent, new techniques arising from the field of computational intelligence, such as machine learning and data mining, have emerged to analyse financial information. While the main financial engineering research has focused on complex computational models such as Neural Networks and Support Vector Machines, there are also simpler models that have demonstrated their usefulness in applications other than financial trading, and that are worth considering to determine their advantages and inherent limitations when used as trading analysis tools. This paper analyses the role of simple machine learning models in achieving profitable trading through a series of trading simulations in the FOREX market. It assesses the performance of the models and how particular setups of the models produce systematic and consistent predictions for profitable trading. Due to the inherent complexities of financial time series, the role of attribute selection, periodic retraining and training set size is discussed in order to obtain a combination of those parameters that is not only capable of generating positive cumulative returns for each one of the machine learning models but also demonstrates how simple algorithms traditionally precluded from financial forecasting for trading applications present performance similar to that of their more complex counterparts. The paper discusses how a combination of attributes beyond technical indicators, such as price-related features, seasonality features and the lagged values used in classical time series analysis, is used as input to the machine learning-based predictors to enhance their classification capabilities, which directly impacts final profitability. [ABSTRACT FROM AUTHOR]
- Published: 2016
31. An empirical study on iris recognition in a mobile phone.
- Author: Kim, Dongik, Jung, Yujin, Toh, Kar-Ann, Son, Byungjun, and Kim, Jaihie
- Subjects: IRIS recognition; SMARTPHONES; EMPIRICAL research; TECHNOLOGICAL innovations; NEAR infrared radiation
- Abstract
Iris recognition on a mobile phone differs from conventional iris recognition implemented on a dedicated device in that the computational power of a mobile phone and the space for placing NIR (near infrared) LED (light emitting diode) illuminators and the iris camera are limited. This paper raises these issues in detail based on a real implementation of an iris recognition system in a mobile phone and proposes some solutions. An experimental study was conducted to search for the appropriate power and wavelength of NIR LED illuminators, and for their positioning on a phone, for capturing a good-quality iris image. Subsequently, in view of the disparity between the user's gazing point and the center of the iris camera, which causes degradation of acquired iris images, an experiment was performed to locate the appropriate gazing point for good iris image capture. A fast eye detection algorithm was proposed for implementation on the mobile platform with its limited computational capacity. The experiments were conducted on a currently released mobile phone, and the results showed promising potential for the adoption of iris recognition as a reliable authentication means. As a result, two 850 nm LEDs were selected for iris illumination, placed 1.1 cm away from the iris camera for a phone of size 7 cm × 13.7 cm. In terms of performance, the recognition accuracy was 0.1% EER (equal error rate), and the eye detection rate on the mobile phone was 99.4% at a speed of 17.64 ms. [ABSTRACT FROM AUTHOR]
- Published: 2016
32. An automatic apnea screening algorithm for children.
- Author: Ríos, Sebastián A. and Erazo, Lili
- Subjects: SLEEP disorders in children; SLEEP disorder diagnosis; PULMONARY function tests; POLYSOMNOGRAPHY; ELECTROCARDIOGRAPHY; EMPIRICAL research
- Abstract
Sleep Disordered Breathing (SDB) is a group of diseases that affect normal respiratory function during sleep, ranging from primary snoring to obstructive sleep apnea (OSA), the most severe. SDB can be detected using a complex and expensive exam called polysomnography. This exam monitors the sleep of a person during the night by measuring 21 different signals, from the electrocardiogram to nasal air flow. Several automatic methods have been developed to detect this disorder in adults, with very high performance and using only one signal. However, we have not found similar algorithms developed especially for children. We benchmarked six different methods developed for adults. We showed empirically that those models' performance is drastically reduced when used on children (under 15 years old). Afterwards, we present a new approach for screening children at risk of having SDB. Moreover, our algorithm uses less information than a polysomnography and outperforms state-of-the-art techniques when used on children. We also showed empirically that no single signal alone is a good SDB screening tool in children. Moreover, we discovered that combinations of three signals which are not used in any other previous work are the best for this task in children. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
33. An improved grey neural network model for predicting transportation disruptions.
- Author
-
Liu, Chunxia, Shu, Tong, Chen, Shou, Wang, Shouyang, Lai, Kin Keung, and Gan, Lu
- Subjects
- *
ARTIFICIAL neural networks , *SUPPLY chains , *BUSINESS enterprises , *TRANSPORTATION research , *EMPIRICAL research , *EXPONENTIAL functions - Abstract
Transportation disruption is the direct result of various accidents in supply chains and has multiple negative impacts on supply chains and member enterprises. After a transportation disruption, market demand becomes highly unpredictable, so it is necessary for enterprises to better predict market demand and optimize purchasing, inventory and production. This article therefore designs an improved grey neural network model to help enterprises better predict market demand after a transportation disruption, and an empirical study tests its feasibility. The improved model overcomes limitations of the conventional grey model GM(1,1), which assumes that the raw data grow roughly exponentially and that data variation is moderate; it exploits the nonlinear approximation capability of neural networks, sets up criteria for selecting the number of neurons in the input layer of the BP neural network, increases the fitting degree and prediction accuracy, and enhances the stability and reliability of prediction. It can be applied to sequential data prediction under transportation disruption or mutation, contributing to the prediction of transportation disruption. The forecasting results can provide scientific evidence for demand prediction, inventory management and production planning in supply chain enterprises. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
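The abstract references the conventional grey model GM(1,1) that the proposed grey neural network improves upon. For readers unfamiliar with it, the sketch below is a plain-numpy implementation of standard GM(1,1) forecasting, not the authors' improved hybrid model; the demand series is made up.

```python
import numpy as np

def gm11_forecast(x0, steps=3):
    """Standard GM(1,1): fit on the raw series x0 and forecast `steps` values ahead."""
    x0 = np.asarray(x0, dtype=float)
    x1 = np.cumsum(x0)                                 # accumulated generating operation (AGO)
    z1 = 0.5 * (x1[1:] + x1[:-1])                      # background values
    B = np.column_stack([-z1, np.ones(len(z1))])
    Y = x0[1:]
    a, b = np.linalg.lstsq(B, Y, rcond=None)[0]        # least-squares development coefficient and grey input
    n = len(x0)
    k = np.arange(n + steps)
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a  # time-response function
    x0_hat = np.diff(x1_hat, prepend=x1_hat[0])        # inverse AGO
    x0_hat[0] = x0[0]                                  # first fitted value equals the first observation
    return x0_hat[:n], x0_hat[n:]

demand = [412, 431, 455, 483, 509, 541]                # made-up post-disruption demand series
fitted, forecast = gm11_forecast(demand, steps=2)
print("fitted:", np.round(fitted, 1), "forecast:", np.round(forecast, 1))
```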
34. An improved global feature selection scheme for text classification.
- Author
-
Uysal, Alper Kursat
- Subjects
- *
FEATURE selection , *CLASSIFICATION , *EMPIRICAL research , *PERFORMANCE evaluation , *SET theory - Abstract
Feature selection is known as a good solution to the high dimensionality of the feature space, and the most commonly preferred feature selection methods for text classification are filter-based ones. In a common filter-based feature selection scheme, unique scores are assigned to features depending on their discriminative power, and the features are sorted in descending order according to these scores. The last step is then to add the top-N features to the feature set, where N is generally an empirically determined number. In this paper, an improved global feature selection scheme (IGFSS), in which the last step of a common feature selection scheme is modified in order to obtain a more representative feature set, is proposed. Although a feature set constructed by a common feature selection scheme successfully represents some of the classes, a number of classes may not be represented at all. Consequently, IGFSS aims to improve the classification performance of global feature selection methods by creating a feature set that represents all classes almost equally. For this purpose, a local feature selection method is used in IGFSS to label features according to their discriminative power on classes, and these labels are used while producing the feature sets. Experimental results on well-known benchmark datasets with various classifiers indicate that IGFSS improves classification performance in terms of two widely-known metrics, namely Micro-F1 and Macro-F1. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
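The exact scoring functions of IGFSS are not given in the abstract; the sketch below only illustrates the general idea it describes, namely labelling each feature with the class it discriminates best (via a one-vs-rest local score) and then filling the final feature set round-robin so every class is represented roughly equally. The use of chi-square as the local score and the 20 Newsgroups dataset are assumptions.

```python
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import chi2

cats = ["sci.space", "rec.autos", "talk.politics.misc"]
data = fetch_20newsgroups(subset="train", categories=cats, remove=("headers", "footers", "quotes"))
X = CountVectorizer(max_features=5000, stop_words="english").fit_transform(data.data)
y = np.array(data.target)

# Local (one-vs-rest) scores: how well each feature separates one class from the rest.
classes = np.unique(y)
local_scores = np.vstack([chi2(X, (y == c).astype(int))[0] for c in classes])  # (n_classes, n_features)
best_class = local_scores.argmax(axis=0)       # label each feature with its "best" class
best_score = local_scores.max(axis=0)

def class_balanced_top_n(n):
    """Pick top-n features so that each class contributes (almost) equally."""
    per_class = n // len(classes)
    selected = []
    for c in range(len(classes)):
        idx = np.where(best_class == c)[0]
        idx = idx[np.argsort(best_score[idx])[::-1]][:per_class]
        selected.extend(idx.tolist())
    return sorted(selected)

sel = class_balanced_top_n(300)
print(f"selected {len(sel)} features, per-class counts:",
      np.bincount(best_class[sel], minlength=len(classes)))
```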
35. Direct marketing campaigns in retail banking with the use of deep learning and random forests
- Author
-
Piotr Ładyżyński, Piotr Gawrysiak, and Kamil Żbikowski
- Subjects
Operations research, Computer science, Deep learning, General Engineering, Banking sector, Computer Science Applications, Direct marketing, Empirical research, Artificial intelligence, Loan, Retail banking, Artificial intelligence & image processing, Transaction data
Credit products are a crucial part of the business of banks and other financial institutions. We present a novel approach, based on a time-series representation of customer data, for predicting willingness to take a personal loan. The proposed moving-window testing procedure allows detection of complex, sequential, time-based dependencies between particular transactions. Moreover, this approach reduces noise by eliminating irrelevant dependencies that would arise without an analysis of the time dimension. A system for identifying customers interested in credit products, based on classification with random forests and deep neural networks, is proposed. The promising results of empirical studies show that the system is able to extract significant patterns from customers' historical transfer and transactional data and predict credit purchase likelihood. Our approach, including the testing method, is not limited to the banking sector and can easily be transferred and implemented as a general-purpose direct marketing campaign system.
- Published
- 2019
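The paper's moving-window testing procedure is described only at a high level; below is a hypothetical sketch of the idea: aggregate each customer's transactions per period, train a random forest on one window, and test on the next. The feature names, window length and synthetic data are assumptions, not the authors' pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
# Synthetic transaction log: one row per customer per month with simple aggregates.
months = pd.period_range("2017-01", "2018-12", freq="M")
rows = []
for cust in range(500):
    propensity = rng.uniform()
    for m in months:
        rows.append({
            "customer": cust, "month": m,
            "n_transfers": rng.poisson(5),
            "total_spend": rng.gamma(2.0, 300.0),
            "salary_inflow": rng.normal(2500, 400),
            # label: customer took a personal loan in the following month (synthetic)
            "took_loan_next": int(rng.uniform() < 0.05 + 0.1 * propensity),
        })
df = pd.DataFrame(rows)

features = ["n_transfers", "total_spend", "salary_inflow"]
aucs = []
for i in range(len(months) - 1):                  # moving window: train on month i, test on month i+1
    train = df[df["month"] == months[i]]
    test = df[df["month"] == months[i + 1]]
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(train[features], train["took_loan_next"])
    aucs.append(roc_auc_score(test["took_loan_next"], clf.predict_proba(test[features])[:, 1]))

print(f"mean out-of-window AUC: {np.mean(aucs):.3f}")
```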
36. Exploiting intra-day patterns for market shock prediction: A machine learning approach
- Author
-
Wenjun Zhou, Hui Xiong, Jinwen Sun, Keli Xiao, and Chuanren Liu
- Subjects
Artificial neural network, General Engineering, Feature selection, Machine learning, Computer Science Applications, Empirical research, Artificial Intelligence, Financial modeling, Stock market, Predictability, Time series
Discovering hidden patterns under unexpected market shocks is a significant and challenging problem, which continually attracts attention from research communities of mathematics, economics, and data science. Classic financial pricing models present unsatisfactory prediction accuracy when applied to real-world data due to limited capacity in depicting complex market movements. In this paper, we develop a machine learning approach, called ARMA-GARCH-NN, to capture intra-day patterns for stock market shock forecasting. Specifically, we integrate classical financial pricing models with artificial neural networks, with explicitly designed feature selection and cross-validation methods. We conduct empirical studies on high-frequency data of the U.S. stock market for evaluation. Our results provide initial evidence of the predictability of market shocks. Additionally, we confirm the effectiveness of ARMA-GARCH-NN by recognizing patterns in massive stock data without strong assumptions on distribution. Our method can serve as a portable methodology that integrates the advantages of traditional financial models and data-driven methods to reveal hidden patterns in large-scale financial data.
- Published
- 2019
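The authors' ARMA-GARCH-NN integration is not detailed in the abstract; a minimal sketch of one plausible wiring is shown below: fit an ARMA model to returns, fit a GARCH(1,1) to its residuals, then feed lagged returns plus the conditional volatility into a small neural network that flags large next-period moves. The libraries (statsmodels, arch, scikit-learn), model orders and the shock threshold are assumptions, not the paper's specification.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from arch import arch_model
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(2)
returns = rng.standard_t(df=4, size=3000) * 0.01     # synthetic heavy-tailed intraday returns

# 1) ARMA(1,1) mean model, 2) GARCH(1,1) on its residuals.
arma_res = ARIMA(returns, order=(1, 0, 1)).fit()
garch_res = arch_model(arma_res.resid, vol="Garch", p=1, q=1, rescale=False).fit(disp="off")
cond_vol = np.asarray(garch_res.conditional_volatility)

# 3) Neural network on [lagged returns, conditional volatility] to flag "shocks"
#    (crudely defined here as a next-period move beyond 2.5 conditional std devs).
lags = 5
X, y = [], []
for t in range(lags, len(returns) - 1):
    X.append(np.concatenate([returns[t - lags:t], [cond_vol[t]]]))
    y.append(int(abs(returns[t + 1]) > 2.5 * cond_vol[t]))
X, y = np.array(X), np.array(y)

split = int(0.8 * len(X))
clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0)
clf.fit(X[:split], y[:split])
print(f"out-of-sample shock-detection accuracy: {clf.score(X[split:], y[split:]):.3f}")
```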
37. Integrating data decomposition and machine learning methods: An empirical proposition and analysis for renewable energy generation forecasting.
- Author
-
Ding, Song, Zhang, Huahan, Tao, Zui, and Li, Ruojin
- Subjects
- *
RENEWABLE energy sources , *MACHINE learning , *EMPIRICAL research , *FORECASTING , *GEOTHERMAL resources , *ENERGY shortages , *LOAD forecasting (Electric power systems) - Abstract
• An ensemble data-driven STL-LSTM forecasting framework is proposed. • This proposed model is used for renewable energy generation forecasting. • The new model can grasp the nonlinear, trend, and periodic patterns in datasets. • The proposed method strikingly outperforms a range of prevalent benchmarks. Renewable energy generation (REG) has been vigorously developed under the dual pressure of environmental contamination and fossil fuel energy shortage. However, the stochastic volatility brought by rapidly increasing REG may threaten REG deployment and grid stability. Because enhancing grid flexibility depends on the expansion of grid infrastructure, which belongs to the mid-term management domain, precise mid-term predictions, especially of monthly REG sourced from diverse renewable energies, are indispensable for power plant configuration, promotion of electricity end-uses, and grid expansion to propel the energy system transformation. To this end, for diverse monthly REG datasets characterized by periodicity, nonlinearity, and volatility, the seasonal-trend decomposition procedure based on loess (STL) is first employed to extract the trend and periodic features of the monthly REG datasets and generate trend, seasonal, and remainder subseries. Based on the decomposed data, long short-term memory (LSTM) networks are used to predict each subseries, and the projections are then integrated to compose the final forecasts for the original REG observations. To validate the efficacy and adaptability of the proposed data-driven STL-LSTM framework, several machine learning methods and autoregressive models are used to predict practical monthly electricity generation datasets sourced from solar, wind, hydropower, and geothermal energy in different countries. The results indicate that the proposed framework delivers superior prediction performance and strong adaptability for generation forecasting across diverse renewable energy sources. The novel STL-LSTM framework may therefore constitute a promising alternative for REG forecasting. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
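The abstract describes the STL-LSTM pipeline only conceptually; the following sketch, under assumed hyperparameters and with a synthetic monthly generation series, shows one plausible implementation: STL splits the series into trend, seasonal and remainder components, a small LSTM is fitted per component on lagged windows, and the component forecasts are summed.

```python
import numpy as np
from statsmodels.tsa.seasonal import STL
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

rng = np.random.default_rng(3)
t = np.arange(180)                                        # 15 years of monthly generation (synthetic)
series = 50 + 0.3 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 2, len(t))

res = STL(series, period=12).fit()                        # trend / seasonal / remainder decomposition
components = [res.trend, res.seasonal, res.resid]

def fit_and_forecast(x, window=24):
    """Fit a small LSTM on sliding windows of one component and forecast one step ahead."""
    X = np.array([x[i:i + window] for i in range(len(x) - window)])[..., None]
    y = x[window:]
    model = Sequential([LSTM(16, input_shape=(window, 1)), Dense(1)])
    model.compile(optimizer="adam", loss="mse")
    model.fit(X, y, epochs=30, batch_size=16, verbose=0)
    return float(model.predict(x[-window:].reshape(1, window, 1), verbose=0)[0, 0])

forecast = sum(fit_and_forecast(np.asarray(c)) for c in components)
print(f"one-step-ahead REG forecast: {forecast:.2f}")
```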
38. A time series-based statistical approach for outbreak spread forecasting: Application of COVID-19 in Greece
- Author
-
Christos Katris
- Subjects
Frequentist probability, Multivariate adaptive regression splines, Artificial neural network, Computer science, tSIR epidemiological model, Exponential smoothing, Univariate, General Engineering, Computer Science Applications, Forecast combinations, Machine learning approaches, Empirical research, Artificial Intelligence, Time series forecasting, Classical time series models, Econometrics, Autoregressive integrated moving average, Time series
The aim of this paper is to develop a time-series-based, statistical, data-driven procedure for tracking an outbreak. First, univariate time series models are used to predict the evolution of the reported cases. Combinations of these models are then considered in order to provide more accurate and robust results. Additionally, statistical probability distributions are considered in order to generate future scenarios. The final step is to build and use an epidemiological model (tSIR) and calculate an epidemiological ratio (R0) to estimate the termination of the outbreak. The time series models include Exponential Smoothing and ARIMA approaches from the classical models, as well as Feed-Forward Artificial Neural Networks and Multivariate Adaptive Regression Splines from the machine learning toolbox. Combinations include the simple mean, Newbold-Granger and Bates-Granger approaches. Finally, the tSIR model and the R0 ratio are used to estimate the spread and the reversion of the pandemic. The suggested procedure is used to track the COVID-19 epidemic in Greece. The epidemic appeared in China in December 2019 and has since spread all over the world. Greece is the focus of this empirical study, as it is considered an early successful example of resistance against the virus.
- Published
- 2021
- Full Text
- View/download PDF
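Of the combination schemes mentioned (simple mean, Newbold-Granger, Bates-Granger), the Bates-Granger scheme has a particularly simple closed form: each model's weight is proportional to the inverse of its recent mean squared forecast error. A small sketch with made-up forecasts follows; it is not the author's code.

```python
import numpy as np

def bates_granger_weights(errors):
    """Weights inversely proportional to each model's mean squared forecast error."""
    mse = np.mean(np.square(errors), axis=1)          # errors: shape (n_models, n_past_periods)
    inv = 1.0 / mse
    return inv / inv.sum()

# Made-up recent one-step-ahead errors of three case-count models (e.g. ARIMA, ETS, neural net).
past_errors = np.array([
    [12.0, -8.0, 15.0, -6.0],
    [ 5.0,  7.0, -4.0,  6.0],
    [20.0, 18.0, -25.0, 22.0],
])
w = bates_granger_weights(past_errors)

todays_forecasts = np.array([132.0, 128.0, 150.0])    # each model's forecast for tomorrow's cases
print("weights:", np.round(w, 3))
print("combined forecast:", round(float(w @ todays_forecasts), 1))
```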
39. A business intelligence approach using web search tools and online data reduction techniques to examine the value of product-enabled services.
- Author
-
Tanev, Stoyan, Liotta, Giacomo, and Kleismantas, Andrius
- Subjects
- *
BUSINESS intelligence , *INTERNET searching , *ONLINE data processing , *WEB services , *EMPIRICAL research , *MOTIVATION (Psychology) - Abstract
This article summarizes the results of an empirical study focusing on the value of product-enabled services in intensive R&D spenders. The focus is on product-driven firms for which new service development is expected to be particularly promising but also quite challenging. Part of the motivation is based on the fact that existing studies on the value attributes of hybrid offerings are mostly conceptual and need to be further substantiated through more systematic empirical studies. The research includes two samples with a total of 83 product-driven firms selected among the top R&D spenders in Canada and Europe. It adopts an innovative methodology based on online textual data that could be implemented in advanced business intelligence tools aiming at the facilitation of innovation, marketing and business decision making. Combinations of keywords referring to different aspects of service value were designed and used in a web search resulting in the frequency of their use on companies’ websites. Principal component analysis was applied to identify distinctive groups of keyword combinations that were interpreted in terms of specific service value attributes. Finally, the firms were classified by means of K-means cluster analysis in order to identify the firms with a high degree of articulation of their service value attributes. This work articulates a relatively simple and intuitive method for quantitative and qualitative semantic analysis of online textual data that is similar to latent semantic analysis but could be used as part of more user-friendly expert system solutions and business intelligence tools based on easily accessible business statistics packages. The results show that the main service value attributes of the Canadian firms are: better service effectiveness, higher market share, higher service quality, and customer satisfaction. The service value attributes for the European firms include, among others, product added-value, product modernization and optimization of customer time and efforts. Canadian firms focus on collaboration and co-creation with suppliers and customers for the sake of product–service innovation as a competitive advantage on the marketplace. On the other hand, the focus of EU firms on innovative hybrid offerings is not explicitly related to business differentiation and competitiveness. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
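The pipeline described in this entry (keyword-hit frequencies per firm, then principal component analysis, then K-means clustering) maps directly onto standard tooling; the minimal sketch below uses made-up keyword counts, and the number of components and clusters are assumptions, not the study's settings.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Rows: firms; columns: web-search hit counts for keyword combinations describing service value
# (e.g. "service quality", "customer satisfaction", "co-creation", ...). Counts are made up.
rng = np.random.default_rng(4)
keyword_counts = rng.poisson(lam=[3, 8, 2, 5, 1, 6], size=(83, 6)).astype(float)

X = StandardScaler().fit_transform(keyword_counts)

pca = PCA(n_components=3)                     # distinctive groups of keyword combinations
scores = pca.fit_transform(X)
print("explained variance ratios:", np.round(pca.explained_variance_ratio_, 2))
print("loadings of first component:", np.round(pca.components_[0], 2))

# Firms grouped by how strongly they articulate the service value attributes.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(scores)
print("firms per cluster:", np.bincount(km.labels_))
```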
40. Mixed dissimilarity measure for piecewise linear approximation based time series applications.
- Author
-
Bankó, Zoltán and Abonyi, János
- Subjects
- *
APPROXIMATION theory , *TIME series analysis , *PERFORMANCE evaluation , *EMPIRICAL research , *LINEAR systems - Abstract
In recent years, expert systems built around time series-based methods have been enthusiastically adopted in engineering applications, thanks to their ease of use and effectiveness. This effectiveness depends on how precisely the raw data can be approximated and how precisely these approximations can be compared. When the performance of a time series-based system needs to be improved, it is desirable to consider other time series representations and comparison methods. The approximation, however, is often generated by a non-replaceable element, and eventually the only way to find a more advanced comparison method is either to create a new dissimilarity measure or to improve the existing one further. In this paper, it is shown how a mixture of different comparison methods can be utilized to improve the effectiveness of a system without modifying the time series representation itself. For this purpose, a novel, mixed comparison method is presented for the widely used piecewise linear approximation (PLA), called the mixed dissimilarity measure for PLA (MDPLA). It combines one of the most popular dissimilarity measures, which utilizes the means of PLA segments, with the authors' previously presented approach that replaces the mean of a segment with its slope. On the basis of empirical studies, three advantages of such combined dissimilarity measures are presented. First, it is shown that the mixture ensures that MDPLA outperforms the most popular dissimilarity measures created for PLA segments. Moreover, in many cases, MDPLA provides results that make the application of dynamic time warping (DTW) unnecessary, yielding improvement not only in accuracy but also in speed. Finally, it is demonstrated that a mixed measure, such as MDPLA, shortens the warping path of DTW and thus helps to avoid pathological warpings, i.e. the unwanted alignments of DTW. In this way, DTW can be applied without penalizing or constraining the warping path itself, while the chance of unwanted alignments is significantly lowered. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
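The abstract describes MDPLA as mixing a mean-based and a slope-based dissimilarity over PLA segments; the exact weighting is not given, so the sketch below is only an illustrative convex combination of the two per-segment differences, with equal-length segments and a made-up weight. It is not the paper's exact formula.

```python
import numpy as np

def pla_segments(x, seg_len):
    """Piecewise linear approximation: (mean, slope) per fixed-length segment."""
    segs = []
    for i in range(0, len(x) - seg_len + 1, seg_len):
        w = x[i:i + seg_len]
        slope = np.polyfit(np.arange(seg_len), w, 1)[0]
        segs.append((w.mean(), slope))
    return np.array(segs)

def mixed_pla_dissimilarity(a, b, seg_len=10, alpha=0.5):
    """Convex mix of mean-based and slope-based segment differences (illustrative only)."""
    sa, sb = pla_segments(a, seg_len), pla_segments(b, seg_len)
    mean_term = np.abs(sa[:, 0] - sb[:, 0]).sum()
    slope_term = np.abs(sa[:, 1] - sb[:, 1]).sum()
    return alpha * mean_term + (1 - alpha) * slope_term

rng = np.random.default_rng(5)
t = np.linspace(0, 4 * np.pi, 200)
x = np.sin(t) + rng.normal(0, 0.05, 200)
y = np.sin(t + 0.3) + rng.normal(0, 0.05, 200)
print(f"mixed PLA dissimilarity: {mixed_pla_dissimilarity(x, y):.3f}")
```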
41. User ratings analysis in social networks through a hypernetwork method.
- Author
-
Suo, Qi, Sun, Shiwei, Hajli, Nick, and Love, Peter E.D.
- Subjects
- *
SOCIAL networks , *DATA analysis , *TOPOLOGY , *RESOURCE management , *EMPIRICAL research - Abstract
This study utilizes the critical properties of a complex social network to reveal its intrinsic characteristics and the laws governing the way information propagates across the network, in order to identify the central, active users and opinion leaders. The hypernetwork method is applied to analyze user ratings in social networks (SNSs). After introducing the concept of a hypernetwork and its topological characteristics, such as node degree, node strength and node hyperdegree, collaborative recommendations in hypernetworks are formulated based on these topological characteristics. Finally, the new method is applied to analyze data from the Douban social network. In this hypernetwork, users are defined as hyperedges and the objects as nodes. Three hypernetworks focused on reviews of books, movies and music were constructed using the proposed method and found to follow similar trends. These topological characteristics are clearly an effective way to reflect the relationship between users and objects. This research will enable SNS providers to offer better object resource management and a personalized service for users, as well as contributing to empirical analyses of other similar SNSs. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
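The topological quantities named in the abstract (node degree, node strength, node hyperdegree) are easy to compute once ratings are organized as a user-object hypernetwork in which each user is a hyperedge over the objects they rated. The sketch below uses made-up rating data and simple working definitions (hyperdegree as the number of hyperedges containing an object, strength as the sum of ratings it received), which are assumptions about the paper's exact formulations.

```python
from collections import defaultdict

# Made-up ratings: each user (a hyperedge) rates a set of objects (nodes), e.g. books.
ratings = {
    "user_a": {"book_1": 5, "book_2": 3, "book_4": 4},
    "user_b": {"book_2": 4, "book_3": 2},
    "user_c": {"book_1": 5, "book_3": 4, "book_4": 1},
}

hyperdegree = defaultdict(int)     # number of hyperedges (users) containing each node (object)
strength = defaultdict(float)      # sum of ratings each object received
neighbours = defaultdict(set)      # node degree: distinct objects co-appearing in some hyperedge

for user, user_ratings in ratings.items():
    objs = list(user_ratings)
    for obj, score in user_ratings.items():
        hyperdegree[obj] += 1
        strength[obj] += score
        neighbours[obj].update(o for o in objs if o != obj)

for obj in sorted(hyperdegree):
    print(f"{obj}: hyperdegree={hyperdegree[obj]}, strength={strength[obj]:.0f}, degree={len(neighbours[obj])}")

# A naive collaborative recommendation: suggest the highest-strength objects a user has not rated yet.
target = "user_b"
unseen = sorted((o for o in hyperdegree if o not in ratings[target]), key=strength.get, reverse=True)
print(f"recommendations for {target}:", unseen[:2])
```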
42. Diversity-driven generation of link-based cluster ensemble and application to data classification.
- Author
-
Iam-On, Natthakan and Boongoen, Tossapon
- Subjects
- *
FEATURE selection , *GENE expression , *RANDOM forest algorithms , *EMPIRICAL research , *MATHEMATICAL optimization - Abstract
Over the decades, a large number of research studies have concentrated on improving the accuracy of classification models. This is because several types of classifiers prove useful in real-life problems, including the prediction of system failure risk and microarray-based cancer diagnosis. Despite this, the accuracy of existing classifiers has been constrained by the uninformative variables typically observed in modern data. In addition to feature selection, one may transform the original data to another representation in which only key feature components are included. Unlike conventional transformation-based techniques found in the literature, this paper presents a novel method that makes use of cluster ensembles, specifically the summarized information matrix, as the transformed data for the subsequent classification step. Among different state-of-the-art methods, the link-based cluster ensemble approach (LCE) provides a highly accurate clustering and is thus employed here. This is uniquely coupled with a diversity-driven generation of the ensemble, which provides informative and diverse sets of clusterings. The performance of this transformation model is evaluated on published synthetic, standard and gene expression datasets, using C4.5, Naive Bayes, KNN, Neural Network and Random Forest classifiers, in comparison with benchmark techniques. The findings suggest that the new model can improve the classification accuracy of the original data and performs better than the other transformation methods investigated in the empirical study. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
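LCE itself involves a link-based refinement of the cluster-association matrix that is beyond a short snippet; the sketch below only illustrates the overall transformation idea the abstract describes, i.e. replacing the raw features with cluster-membership indicators from a diverse ensemble of clusterings before training a classifier. It is a simplified stand-in, not the LCE algorithm, and the dataset and ensemble sizes are assumptions.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.cluster import KMeans
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import OneHotEncoder
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

# Diversity-driven ensemble: vary k and the random seed across ensemble members.
labels = []
for seed, k in enumerate([2, 3, 4, 5, 6, 8, 10, 12]):
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
    labels.append(km.labels_)
labels = np.column_stack(labels)                          # (n_samples, n_members) cluster ids

# Transformed data: one-hot cluster memberships (a crude stand-in for LCE's refined matrix).
X_transformed = OneHotEncoder().fit_transform(labels).toarray()

clf = RandomForestClassifier(n_estimators=300, random_state=0)
print("original   :", cross_val_score(clf, X, y, cv=5).mean().round(3))
print("transformed:", cross_val_score(clf, X_transformed, y, cv=5).mean().round(3))
```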
43. An empirical study of empty prediction of multi-label classification.
- Author
-
Liu, Shuhua (Monica) and Chen, Jiun-Hung
- Subjects
- *
PROBLEM solving , *SET theory , *DATA analysis , *EMPIRICAL research , *COMPUTER science - Abstract
A detailed and extensive empirical study of empty prediction in multi-label classification is conducted in this paper; to the best of our knowledge, this is the first empirical study of this problem. A total of 8 state-of-the-art multi-label classification methods, BR, CC, CLR, HOMER, RAkEL, ECC, MLkNN, and BRkNN, are compared on 11 datasets. The empirical results clearly answer the two research questions: (1) whether empty prediction problems occur in commonly used state-of-the-art multi-label classification methods and what their empty prediction rates (EPR) on different test sets are, and (2) which multi-label classification methods have the overall highest/lowest EPRs. Specifically, it is empirically shown that every method considered made empty predictions on different datasets. In addition, several thresholding methods which can in theory solve empty prediction are compared. The clear answers to the two research questions and the experimental findings are the main contributions of this work to multi-label classification. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
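The empty prediction rate (EPR) studied in this entry is the fraction of test instances for which a multi-label classifier predicts no label at all. A minimal sketch, using a one-vs-rest classifier on a synthetic multi-label dataset and a simple "fall back to the top-scoring label" thresholding fix, is shown below; the dataset, classifier and fallback rule are assumptions, not the paper's benchmarked methods.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, Y = make_multilabel_classification(n_samples=2000, n_classes=8, n_labels=2, random_state=0)
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.3, random_state=0)

clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X_tr, Y_tr)
proba = clf.predict_proba(X_te)
pred = (proba >= 0.5).astype(int)                 # standard per-label thresholding

epr = (pred.sum(axis=1) == 0).mean()              # empty prediction rate: no label assigned at all
print(f"EPR with fixed 0.5 threshold: {epr:.3f}")

# Simple fix: when the prediction is empty, assign the single highest-scoring label.
empty = pred.sum(axis=1) == 0
pred[empty, proba[empty].argmax(axis=1)] = 1
print(f"EPR after top-1 fallback: {(pred.sum(axis=1) == 0).mean():.3f}")
```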
44. Ontology-based feature modeling: An empirical study in changing scenarios.
- Author
-
Dermeval, Diego, Tenório, Thyago, Bittencourt, Ig Ibert, Silva, Alan, Isotani, Seiji, and Ribeiro, Márcio
- Subjects
- *
ONTOLOGY , *EMPIRICAL research , *SOFTWARE product line engineering , *MARKET segmentation , *SCALABILITY , *SELF-expression - Abstract
A software product line (SPL) is a set of software systems that have a particular set of common features and that satisfy the needs of a particular market segment or mission. Feature modeling is one of the key activities involved in the design of SPLs. The feature diagram produced in this activity captures the commonalities and variabilities of SPLs. In some complex domains (e.g., ubiquitous computing, autonomic systems and context-aware computing), it is difficult to foresee all the functionalities and variabilities a specific SPL may require. Thus, Dynamic Software Product Lines (DSPLs) bind variation points at runtime to adapt to fluctuations in user needs as well as to changes in the environment. In this context, relying on formal representations of feature models is important to allow them to be automatically analyzed during system execution. Among the mechanisms used for representing and analyzing feature models, description logic (DL) based approaches deserve to be better investigated in DSPLs, since they provide capabilities such as automated inconsistency detection, reasoning efficiency, scalability and expressivity. Ontology is the most common way to represent feature model knowledge based on DL reasoners. Previous works conceived ontologies for feature modeling either based on OWL classes and properties or based on OWL individuals. However, considering change or evolution scenarios of feature models, we need to compare whether a class-based or an individual-based feature modeling style is preferable for describing feature models that support SPLs, and especially their capability to deal with changes in feature models, as required by DSPLs. In this paper, we conduct a controlled experiment to empirically compare two approaches based on each of these modeling styles in several changing scenarios (e.g., add/remove mandatory feature, add/remove optional feature and so on). We measure the time to perform changes, the structural impact of changes (flexibility) and the correctness of the performed changes in our experiment. Our results indicate that using OWL individuals requires less time to change and is more flexible than using OWL classes and properties. These results provide insightful guidance towards the definition of an approach relying on the reasoning capabilities of ontologies that can effectively support product reconfiguration in the context of DSPLs. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
45. Generating project risk response strategies based on CBR: A case study.
- Author
-
Fan, Zhi-Ping, Li, Yong-Hai, and Zhang, Yao
- Subjects
- *
CASE-based reasoning , *INFORMATION theory , *INFORMATION retrieval , *PROJECT managers , *EMPIRICAL research - Abstract
Risk response is an important task in project risk management (PRM). To generate project risk response strategies, retrieving and reusing information and knowledge from similar historical cases is important, yet research concerning this issue is still relatively scarce. Taking the risk response of the subway project in S city, China, as a case problem, this paper proposes a pragmatic method for generating project risk response strategies based on case-based reasoning (CBR). The procedure of the method includes five parts: first, representing the target case and the historical cases; second, retrieving the available historical cases by judging whether the risks involved in each historical case cover or are the same as those in the target case; third, retrieving the similar historical cases by measuring the similarity between each available historical case and the target case; fourth, revising the inapplicable risk response strategies involved in the similar historical cases by analyzing the response relation between each strategy and each risk of the current project; and fifth, generating the desirable risk response strategies by evaluating each candidate risk response strategy set. To illustrate the use of the proposed method, an empirical analysis of generating the risk response strategies for the subway station project is given. The proposed method can support project managers in making better decisions in PRM. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
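The second and third steps of the procedure above (keeping only historical cases whose risks cover the target case's risks, then ranking them by similarity) can be illustrated with a tiny sketch. The risk names, the Jaccard similarity measure and the example data below are assumptions for illustration, not the paper's case base or similarity formula.

```python
# Hypothetical historical project cases: the risks each case covered and the strategies used.
historical_cases = [
    {"risks": {"ground_collapse", "water_inflow", "schedule_delay"},
     "strategies": ["grouting reinforcement", "dewatering wells", "schedule buffer"]},
    {"risks": {"water_inflow", "equipment_failure"},
     "strategies": ["dewatering wells", "preventive maintenance plan"]},
    {"risks": {"ground_collapse", "cost_overrun"},
     "strategies": ["grouting reinforcement", "contingency budget"]},
]
target_risks = {"ground_collapse", "water_inflow"}

# Step 2: keep only cases whose risks cover (or equal) the target case's risks.
available = [c for c in historical_cases if target_risks <= c["risks"]]

# Step 3: rank the available cases by similarity of their risk sets to the target case.
def jaccard(a, b):
    return len(a & b) / len(a | b)

ranked = sorted(available, key=lambda c: jaccard(target_risks, c["risks"]), reverse=True)
for case in ranked:
    print(f"similarity={jaccard(target_risks, case['risks']):.2f}  strategies={case['strategies']}")
```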
46. Knowledge management capabilities and firm performance: A test of universalistic, contingency and complementarity perspectives.
- Author
-
Cohen, Jason F. and Olsen, Karen
- Subjects
- *
KNOWLEDGE management , *ORGANIZATIONAL performance , *UNIVERSALISM (Theology) , *CONTINGENCY theory (Management) , *INTERPERSONAL complementarity , *EMPIRICAL research , *PERSPECTIVE (Philosophy) , *HUMAN capital - Abstract
Competing theoretical perspectives regarding the effects of knowledge management (KM) on performance have underpinned past empirical studies. By explicitly surfacing and comparing three such perspectives, we contribute to the theoretical advancement of the KM field. We develop hypotheses consistent with the underlying logics of universalistic, complementarity and contingency theories and we empirically test these hypotheses to determine which is best supported. Data was collected from a sample of hospitality services firms operating in South Africa. Our results show that the universalistic perspective is less preferred. We find support for the complementarity perspective by revealing that codification and human capital KM capabilities interact to influence customer service outcomes. The contingency perspective also received support as the links between KM capabilities and performance were found to be contingent on the business strategy of the firm. Our results suggest that future researchers should explicitly acknowledge the theoretical perspective from which they are observing the performance impacts of KM and ensure that empirical tests are consistent with the logic of the selected perspective. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
47. An empirical study of the design choices for local citation recommendation systems.
- Author
-
Medić, Zoran and Šnajder, Jan
- Subjects
- *
RECOMMENDER systems , *ABSOLUTE value , *LOGITS , *KALMAN filtering , *EMPIRICAL research , *EXPERIMENTAL design - Abstract
As the number of published research articles grows on a daily basis, it is becoming increasingly difficult for scientists to keep up with the published work. Local citation recommendation (LCR) systems, which produce a list of relevant articles to be cited in a given text passage, could help alleviate the burden on scientists and facilitate research. While research on LCR is gaining popularity, building such systems involves a number of important design choices that are often overlooked. We present an empirical study of the impact of three design choices in two-stage LCR systems consisting of a prefiltering and a reranking phase. In particular, we investigate (1) the impact of the prefiltering models' parameters on the model's performance, as well as the impact of (2) the training regime and (3) the negative sampling strategy on the performance of the reranking model. We evaluate various combinations of these parameters on two datasets commonly used for LCR and demonstrate that specific combinations improve the model's performance over the widely used standard approaches. Specifically, we demonstrate that (1) optimizing the prefiltering models' parameters improves R@1000 in the range of 3% to 12% in absolute value, (2) using the strict training regime improves both R@10 and MRR (up to a maximum of 3.4% and 2.6%, respectively) in all combinations of dataset and prefiltering model, and (3) a careful choice of negative examples can further improve both R@10 and MRR (up to a maximum of 11.9% and 8%, respectively) on both datasets used. Our results show that the design choices we considered are important and should be given greater consideration when building LCR systems. • Certain design choices affect the accuracy of local citation recommendation systems. • Training the reranker model with a strict regime improves the model's performance. • Triplet-based reranking models benefit from non-random negative sampling strategies. • The best negative sampling strategy for triplet construction depends on the dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
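The prefiltering stage and the recall-at-k metric discussed in this entry can be illustrated with a simple lexical retriever; the sketch below is a toy stand-in for the first stage only (the neural reranker and the negative-sampling strategies are out of scope here), and the corpus, citing context and relevance labels are made up.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy candidate pool of papers and one citing "context" passage (all made up).
corpus = [
    "neural citation recommendation with document embeddings",
    "bm25 ranking functions for ad hoc information retrieval",
    "transformer based rerankers for passage retrieval",
    "collaborative filtering for scientific article recommendation",
]
context = "we rerank candidate citations for a passage using a neural model"
true_cited = {0, 2}                                    # indices of the papers actually cited (made up)

vec = TfidfVectorizer().fit(corpus + [context])
sims = cosine_similarity(vec.transform([context]), vec.transform(corpus))[0]

def recall_at_k(scores, relevant, k):
    """Fraction of truly cited papers that the prefilter keeps among its top-k candidates."""
    top_k = set(np.argsort(scores)[::-1][:k])
    return len(top_k & relevant) / len(relevant)

print("prefilter ranking:", np.argsort(sims)[::-1])
print("R@2:", recall_at_k(sims, true_cited, k=2))
```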
48. Enhanced sine cosine algorithm with crossover: A comparative study and empirical analysis.
- Author
-
Gupta, Shubham
- Subjects
- *
METAHEURISTIC algorithms , *BENCHMARK problems (Computer science) , *ALGORITHMS , *COSINE function , *GLOBAL optimization , *PARTICLE swarm optimization , *EMPIRICAL research , *COMPARATIVE studies - Abstract
Sine cosine algorithm (SCA) is a recently developed and widely used metaheuristic for global optimization tasks. Due to its structural simplicity and reasonable performance, it has been utilized to solve several real-world applications. This paper proposes an alternative version of the SCA that adopts a greedy search approach, crossover and an exponentially decreasing transition control parameter to overcome the issues of low exploitation, insufficient diversity and premature convergence. The proposed algorithm, called ECr-SCA, is validated and compared with the original SCA using computational time, diversity, performance index, statistical and convergence analysis on a set of 23 standard benchmark problems. The proposed ECr-SCA is then compared with seventeen other algorithms, including improved versions of the SCA and state-of-the-art algorithms. Furthermore, the ECr-SCA is used to train multi-layer perceptrons, and the results are compared with variants of the SCA and other metaheuristics. An overall comparison based on several different metrics illustrates the significant improvement that the ECr-SCA brings to the search strategy of the SCA. • The enhanced sine cosine algorithm (ECr-SCA) is proposed for global optimization. • A greedy search and a modified transition parameter are used to increase exploitation. • Multi-parent crossover is embedded to enhance diversity and global search. • The excellent performance of the ECr-SCA is validated on benchmark problems. • The ECr-SCA is also used to train a multilayer perceptron. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
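The baseline sine cosine algorithm that ECr-SCA modifies has a compact update rule; the sketch below implements that baseline plus two of the ingredients named in the abstract, an exponentially decreasing transition parameter and greedy (keep-if-better) selection, on a toy sphere function. The multi-parent crossover operator and the exact decay schedule of ECr-SCA are not reproduced; the decay constant here is an assumption.

```python
import numpy as np

def sphere(x):
    return float(np.sum(x ** 2))          # toy objective to minimise

def sca_greedy(obj, dim=10, pop=30, iters=500, lb=-10.0, ub=10.0, a=2.0, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, (pop, dim))
    fitness = np.array([obj(x) for x in X])
    best = X[fitness.argmin()].copy()
    for t in range(iters):
        r1 = a * np.exp(-4.0 * t / iters)                 # exponentially decreasing transition parameter (assumed decay)
        for i in range(pop):
            r2 = rng.uniform(0, 2 * np.pi, dim)
            r3 = rng.uniform(0, 2, dim)
            r4 = rng.uniform(0, 1, dim)
            step = np.where(r4 < 0.5,
                            r1 * np.sin(r2) * np.abs(r3 * best - X[i]),
                            r1 * np.cos(r2) * np.abs(r3 * best - X[i]))
            candidate = np.clip(X[i] + step, lb, ub)
            f = obj(candidate)
            if f < fitness[i]:                            # greedy selection: accept only improvements
                X[i], fitness[i] = candidate, f
        best = X[fitness.argmin()].copy()
    return best, fitness.min()

best_x, best_f = sca_greedy(sphere)
print(f"best objective value found: {best_f:.3e}")
```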
49. Creating technological innovation via green supply chain management: An empirical analysis.
- Author
-
Voon-Hsien Lee, Keng-Boon Ooi, Alain Yee-Loong Chong, and Seow, Christopher
- Subjects
- *
TECHNOLOGICAL innovations , *GREEN technology , *SUPPLY chain management , *EMPIRICAL research , *MANUFACTURING industries , *STRUCTURAL equation modeling - Abstract
This study sets out to empirically test the research framework and identify the relationship between green supply chain management (GSCM) practices and technological innovation (TI) in manufacturing firms. It is one of the first studies to experimentally validate the relationship between GSCM practices and TI from the perspective of a developing nation. In this study, 133 usable sets of data were collected from manufacturing firms in Malaysia. Results obtained from the partial least squares structural equation modeling (PLS-SEM) approach supported a significant positive relationship between three GSCM practices (i.e. internal environmental management, eco-design and investment recovery) and TI, but the study found that green purchasing and cooperation with customers do not have a significant positive correlation with TI. GSCM practices have been shown to enhance firms' TI, in addition to improving the environment, thereby bringing about a positive impact on the manufacturing establishment. Therefore, this study is relevant to all manufacturers seeking to enhance their TI through the effective use of GSCM practices. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
50. A genetic algorithm approach for SMEs bankruptcy prediction: Empirical evidence from Italy.
- Author
-
Gordini, Niccolò
- Subjects
- *
GENETIC algorithms , *SMALL business , *BANKRUPTCY , *EMPIRICAL research , *ECONOMIC forecasting , *SUPPORT vector machines - Abstract
Highlights: • GAs are a very promising method in SMEs default prediction analysis. • GAs are capable of extracting rules that are easy to understand for users. • GAs give a better SMEs default prediction accuracy rate compared with SVM and LR. • GAs significantly reduce misclassification costs compared to SVM and LR. • The prediction accuracy rate of GAs is markedly higher for the smallest sized firms and in the firms operating in the north. [Copyright Elsevier]
- Published
- 2014
- Full Text
- View/download PDF