49 results for "Lin, Jerry Chun-Wei"
Search Results
2. Federated deep learning for smart city edge-based applications
- Author
-
Djenouri, Youcef, Michalak, Tomasz P., and Lin, Jerry Chun-Wei
- Published
- 2023
- Full Text
- View/download PDF
3. An efficient biobjective evolutionary algorithm for mining frequent and high utility itemsets
- Author
-
Fang, Wei, Li, Chongyang, Zhang, Qiang, Zhang, Xin, and Lin, Jerry Chun-Wei
- Published
- 2023
- Full Text
- View/download PDF
4. Hybrid graph convolution neural network and branch-and-bound optimization for traffic flow forecasting
- Author
-
Djenouri, Youcef, Belhadi, Asma, Srivastava, Gautam, and Lin, Jerry Chun-Wei
- Published
- 2023
- Full Text
- View/download PDF
5. High-utility itemsets mining based on binary particle swarm optimization with multiple adjustment strategies
- Author
-
Fang, Wei, Zhang, Qiang, Lu, Hengyang, and Lin, Jerry Chun-Wei
- Published
- 2022
- Full Text
- View/download PDF
6. EANDC: An explainable attention network based deep adaptive clustering model for mental health treatment
- Author
-
Ahmed, Usman, Srivastava, Gautam, Yun, Unil, and Lin, Jerry Chun-Wei
- Published
- 2022
- Full Text
- View/download PDF
7. Reliable customer analysis using federated learning and exploring deep-attention edge intelligence
- Author
-
Ahmed, Usman, Srivastava, Gautam, and Lin, Jerry Chun-Wei
- Published
- 2022
- Full Text
- View/download PDF
8. A multi-strategy fusion identification model for failure mode of reinforced concrete column.
- Author
-
Gai, Tongtong, Yu, Dehu, Zeng, Sen, and Lin, Jerry Chun-Wei
- Subjects
- FAILURE mode & effects analysis, CONCRETE columns, ARTIFICIAL neural networks, REINFORCED concrete, IDENTIFICATION, COLUMNS, COMPOSITE columns
- Abstract
Accurate identification of the failure modes of Reinforced Concrete (RC) columns based on the design parameters of the structural members is critical for earthquake-resistant design and safety evaluation of existing structures. Existing identification methods have some problems, such as high cost, incomplete consideration of influencing factors, and low precision or recall in identifying shear or flexural-shear failure. In this paper, the main factors behind the failure modes of RC columns are first analyzed and studied. Then, the problem of class imbalance in the data samples is investigated. To identify the failure modes of RC columns, oversampling of data (BSB-FMC), model ensembling (RFB-FMC), cost-sensitive learning (CSB-FMC) and a fusion of the three strategies (BSFCB-FMC) are proposed. Finally, the SHapley Additive exPlanations (SHAP) method is used to provide a better interpretation of the designed model. The results show that the developed strategies improve the accuracy of identifying the failure modes of RC columns compared to models using a single Artificial Neural Network (ANN), a Support Vector Machine (SVM), a Random Forest (RF), or Adaptive Boosting (AdaBoost). The overall accuracy of the developed BSFCB-FMC model reaches 97%, and the precision and recall for the three failure modes are all above 90%. The designed model provides a solution for fast, accurate and cost-effective identification of the failure modes of RC columns. • We develop models for identifying the failure modes of RC columns considering the class imbalance problem. • The BSFCB-FMC model is developed by combining data oversampling, model ensembling, and cost-sensitive learning. • The designed model provides engineers with an interpretable, fast and stable model for identifying failure modes. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
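The abstract above rests on three imbalance-handling strategies. As a minimal sketch of the data-oversampling step only (plain random duplication of minority-class samples; the function name and interface are hypothetical, and the paper's BSB-FMC strategy is more sophisticated), one might write:

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until every class
    reaches the size of the largest class (naive oversampling)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    out_x, out_y = list(samples), list(labels)
    for y, xs in by_class.items():
        for _ in range(target - counts[y]):
            out_x.append(rng.choice(xs))
            out_y.append(y)
    return out_x, out_y
```

Real pipelines usually prefer synthetic-sample methods such as SMOTE over plain duplication, since duplicated rows add no new information.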
9. Efficient closed high-utility pattern fusion model in large-scale databases
- Author
-
Lin, Jerry Chun-Wei, Djenouri, Youcef, and Srivastava, Gautam
- Published
- 2021
- Full Text
- View/download PDF
10. A predictive GA-based model for closed high-utility itemset mining
- Author
-
Lin, Jerry Chun-Wei, Djenouri, Youcef, Srivastava, Gautam, Yun, Unil, and Fournier-Viger, Philippe
- Published
- 2021
- Full Text
- View/download PDF
11. Privacy reinforcement learning for faults detection in the smart grid
- Author
-
Belhadi, Asma, Djenouri, Youcef, Srivastava, Gautam, Jolfaei, Alireza, and Lin, Jerry Chun-Wei
- Published
- 2021
- Full Text
- View/download PDF
12. Hiding sensitive information in eHealth datasets
- Author
-
Wu, Jimmy Ming-Tai, Srivastava, Gautam, Jolfaei, Alireza, Fournier-Viger, Philippe, and Lin, Jerry Chun-Wei
- Published
- 2021
- Full Text
- View/download PDF
13. Deep learning for pedestrian collective behavior analysis in smart cities: A model of group trajectory outlier detection
- Author
-
Belhadi, Asma, Djenouri, Youcef, Srivastava, Gautam, Djenouri, Djamel, Lin, Jerry Chun-Wei, and Fortino, Giancarlo
- Published
- 2021
- Full Text
- View/download PDF
14. Discovering rare correlated periodic patterns in multiple sequences
- Author
-
Fournier-Viger, Philippe, Yang, Peng, Li, Zhitian, Lin, Jerry Chun-Wei, and Kiran, Rage Uday
- Published
- 2020
- Full Text
- View/download PDF
15. A CMFFP-tree algorithm to mine complete multiple fuzzy frequent itemsets
- Author
-
Lin, Jerry Chun-Wei, Hong, Tzung-Pei, and Lin, Tsung-Ching
- Published
- 2015
- Full Text
- View/download PDF
16. Special Issue Editorial: Advances in Computational Intelligence for Perception and Decision-Making for Autonomous Systems.
- Author
-
Lin, Jerry Chun-Wei, Srivastava, Gautam, and Zhang, Yu-Dong
- Subjects
- DECISION making, COMPUTATIONAL intelligence
- Published
- 2023
- Full Text
- View/download PDF
17. A Bi-LSTM mention hypergraph model with encoding schema for mention extraction.
- Author
-
Lin, Jerry Chun-Wei, Shao, Yinan, Zhou, Yujie, Pirouz, Matin, and Chen, Hsing-Chung
- Subjects
- NATURAL language processing, NOUN phrases (Grammar), NATURAL languages
- Abstract
Natural language processing is a technique to process data such as text and speech. Some fundamental research includes named-entity recognition, which recognizes named entities (i.e., persons, companies) in texts; semantic parsing, which converts a natural language utterance to a logical-form representation; and co-reference resolution, which extracts nouns (including pronouns and noun phrases) pointing to the same reference body. In this paper, we mainly focus on the task of mention extraction, which extracts and classifies overlapping or nested structure mentions. We propose a neural-encoded mention-hypergraph (NEMH) model that uses a hypergraph to model overlapping or nested structure mentions and uses neural networks to extract features for the hypergraph automatically. Unlike existing approaches, our hypergraph model can effectively capture nested mention entities of unlimited length. The proposed model is also highly scalable, and its time complexity is linear in the number of mention classes and the number of input words. Extensive experiments are conducted on several standard datasets to demonstrate the effectiveness of the proposed model. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
18. ProUM: Projection-based utility mining on sequence data.
- Author
-
Gan, Wensheng, Lin, Jerry Chun-Wei, Zhang, Jiexiong, Chao, Han-Chieh, Fujita, Hamido, and Yu, Philip S.
- Subjects
- SEQUENTIAL pattern mining, DATA mining, UTILITY theory, ACCOUNTING
- Abstract
Utility is an important concept in economics. A variety of applications consider utility in real-life situations, which has led to the emergence of utility-oriented mining (also called utility mining) in the recent decade. Utility mining has attracted a great amount of attention, but most existing studies have been developed to deal with itemset-based data. Time-ordered sequence data is more commonly seen in real-world situations and differs from itemset-based data. Because they are time-consuming and require a large amount of memory, current utility mining algorithms still have limitations when dealing with sequence data. In addition, the mining efficiency of utility mining on sequence data still needs to be improved, especially for long sequences or when the minimum utility threshold is low. In this paper, we propose an efficient Projection-based Utility Mining (ProUM) approach to discover high-utility sequential patterns from sequence data. The utility-array structure is designed to store the necessary information about sequence order and utility. ProUM can significantly improve mining efficiency by utilizing the projection technique in generating the utility-array, and it effectively reduces memory consumption. Furthermore, a new upper bound named sequence extension utility is proposed, and several pruning strategies are further applied to improve the efficiency of ProUM. By taking utility theory into account, the derived high-utility sequential patterns carry more insightful and interesting information than other kinds of patterns. Experimental results showed that the proposed ProUM algorithm significantly outperformed the state-of-the-art algorithms in terms of execution time, memory usage, and scalability. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
19. Correlated utility-based pattern mining.
- Author
-
Gan, Wensheng, Lin, Jerry Chun-Wei, Chao, Han-Chieh, Fujita, Hamido, and Yu, Philip S.
- Subjects
- COMMERCIAL products, UTILITY theory, CONSUMER behavior, PURCHASING, RESEARCH methodology, STATISTICAL correlation, ALGORITHMS
- Abstract
Recently, a new research field called utility-oriented mining has attracted great attention. However, previous studies have a limitation in that they rarely consider the inherent correlation of items among patterns. For example, considering the purchase behaviors of consumers, a high-utility group of products (w.r.t. multiple products) may contain several very high-utility products together with some low-utility products. Such a group is considered a valuable pattern even if it is not highly correlated, or even if it occurs by chance. In light of these challenges, we propose an efficient utility-mining approach, called non-redundant Correlated high-Utility Pattern Miner (CoUPM), that considers both positive correlation and profitable value. The derived patterns with high utility and strong positive correlation are more insightful than patterns that only have high profitable values. The utility-list structure is revised and applied to store the necessary information about both correlation and utility. Several pruning strategies are further developed to improve the efficiency of discovering the desired patterns. Experimental results showed that non-redundant correlated high-utility patterns are more effective than some other kinds of interesting patterns. Moreover, the proposed CoUPM algorithm significantly outperformed the state-of-the-art algorithm in terms of efficiency. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
20. BILU-NEMH: A BILU neural-encoded mention hypergraph for mention extraction.
- Author
-
Lin, Jerry Chun-Wei, Shao, Yinan, Fournier-Viger, Philippe, and Hamido, Fujita
- Subjects
- NATURAL language processing, ELECTRONIC data processing, HYPERGRAPHS, NEURAL computers, ENCODING
- Abstract
Natural language processing (NLP) denotes a technique used to process data such as text and speech. Some of the fundamental research in NLP includes named entity recognition, which recognizes named entities (i.e., persons and companies) in texts; semantic parsing, which converts a natural language utterance to a logical form; and co-reference resolution, which extracts the nouns (including pronouns and noun phrases) pointing to the same reference body. In this paper, we focus on mention extraction and classification, proposing a neural-encoded mention-hypergraph model named BILU-NEMH to extract mention entities from content. The proposed BILU-NEMH model combines a mention hypergraph model with an encoding schema and a neural network. The proposed model can effectively capture overlapping mention entities of unbounded length. The model was verified by experiments, and the obtained results showed that it achieved better performance and greater effectiveness than the existing related models on most standard datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
21. Mining of skyline patterns by considering both frequent and utility constraints.
- Author
-
Lin, Jerry Chun-Wei, Yang, Lu, Fournier-Viger, Philippe, and Hong, Tzung-Pei
- Subjects
- DATA mining, COMPUTATIONAL acoustics, ARTIFICIAL intelligence, ARTIFICIAL neural networks, COMPUTER simulation, DECISION making
- Abstract
Association-rule mining (ARM) or frequent itemset mining (FIM) is the most fundamental task in knowledge discovery, which is used to find the occurrence frequencies of item/sets in a transactional database. Other factors such as weight, interestingness or unit profit of the items are not considered in either ARM or FIM. To reveal more information, high-utility itemset mining (HUIM) was designed to consider both the quantity and unit profit of items to discover high-utility itemsets (HUIs). Several algorithms for FIM or HUIM have been extensively studied, but fewer works consider both frequency and utility together to provide better solutions for decision-making. In the past, the SKYMINE algorithm was designed to find skyline frequent-utility patterns (SFUPs). A SFUP is a non-dominated pattern, i.e., a pattern that no other pattern dominates when considering both frequency and utility. The SKYMINE algorithm requires, however, a large amount of computation to discover the SFUPs level-wisely. In this paper, an efficient utility-list structure is used instead of the UP-tree structure used in SKYMINE to mine the SFUPs. Two algorithms are respectively designed using depth-first search (called SKYFUP-D) and breadth-first search (SKYFUP-B) to mine the SFUPs. An efficient structure is also designed to record the maximal utility of the potential itemsets, thus reducing the computations for finding the SFUPs in the search space. Extensive experiments are conducted on several real-world and simulated datasets, and the results indicate that the two designed algorithms have better performance than the state-of-the-art SKYMINE algorithm in terms of runtime, memory usage, search-space size and scalability. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
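The skyline (non-dominated) condition the abstract relies on is easy to state in code. The sketch below, with hypothetical names, checks dominance over (frequency, utility) pairs by brute force; SKYFUP-D/B avoid this exhaustive comparison via utility-list structures:

```python
def dominates(p, q):
    """p dominates q if p is no worse in both frequency and utility,
    and strictly better in at least one."""
    return (p[0] >= q[0] and p[1] >= q[1]) and (p[0] > q[0] or p[1] > q[1])

def skyline(points):
    """Keep only the non-dominated (frequency, utility) pairs."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]
```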
22. Efficiently updating the discovered high average-utility itemsets with transaction insertion.
- Author
-
Lin, Jerry Chun-Wei, Ren, Shifeng, Fournier-Viger, Philippe, Pan, Jeng-Shyan, and Hong, Tzung-Pei
- Subjects
- DATA mining, ALGORITHMS, DATABASES, GRAPH theory, INFORMATION theory
- Abstract
High-utility itemset mining (HUIM) is an extension of frequent-itemset mining (FIM) that considers the unit profit and quantity of items to discover the set of high-utility itemsets (HUIs). Traditionally, the utility of an itemset is the summation of its utilities in all the transactions, regardless of its length. This approach is, however, inappropriate in real-world applications, since the utility of an itemset increases along with the number of items within it. High average-utility itemset mining (HAUIM) was designed to provide a more reasonable utility measure by taking the size of the itemset into account. Existing algorithms can, however, only handle static databases and are unsuitable for dynamic environments, since the size of the data frequently changes in real-life situations. In this paper, an incremental high average-utility pattern mining (IHAUPM) algorithm is presented to handle incremental databases with transaction insertion. The well-known fast updated (FUP) concept from FIM is modified for the designed algorithm, thus efficiently updating the discovered HAUIs. Based on the designed model for HAUIM with transaction insertion, the proposed IHAUPM algorithm can handle only the inserted transactions. Experiments are carried out on six datasets, and the results showed that the designed algorithm has better performance than the state-of-the-art algorithms operating in batch manner. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
23. A two-phase approach to mine short-period high-utility itemsets in transactional databases.
- Author
-
Lin, Jerry Chun-Wei, Zhang, Jiexiong, Fournier-Viger, Philippe, Hong, Tzung-Pei, and Zhang, Ji
- Subjects
- DATA mining, DATABASE design, DECISION making
- Abstract
The discovery of high-utility itemsets (HUIs) in transactional databases has attracted much interest from researchers in recent years, since it can uncover hidden information that is useful for decision making, and it is widely used in many domains. Nonetheless, traditional methods for high-utility itemset mining (HUIM) utilize the utility measure as the sole criterion to determine which item/sets should be presented to the user. These methods ignore the timestamps of transactions and do not consider the period constraint. Hence, these algorithms often find HUIs that are profitable but that seldom occur in transactions. In this paper, we address this limitation of previous methods by pushing the period constraint into the HUI mining process. A new framework called short-period high-utility itemset mining (SPHUIM) is designed to identify patterns in a transactional database that appear regularly, are profitable, and also yield a high utility under the period constraint. The aim of discovering short-period high-utility itemsets (SPHUIs) is hence to identify patterns that are interesting both in terms of period and utility. The paper proposes a baseline two-phase short-period high-utility itemset mining algorithm (SPHUI-TP) to mine SPHUIs in a level-wise manner. Then, to reduce the search space of the SPHUI-TP algorithm and speed up the discovery of SPHUIs, two pruning strategies are developed and integrated into the baseline algorithm. The resulting algorithms are denoted SPHUI-MT and SPHUI-TID, respectively. Substantial experiments on both real-life and synthetic datasets show that the three proposed algorithms can efficiently and effectively discover the complete set of SPHUIs, and that considering the short-period constraint and the utility measure can greatly reduce the number of patterns found. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
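One plausible formalization of the period constraint mentioned above (an assumption for illustration, not necessarily the paper's exact definition) treats a pattern's period as the largest gap between consecutive occurrence timestamps, with the database boundaries counted as virtual occurrences:

```python
def max_period(timestamps, db_end, db_start=0):
    """Largest gap between consecutive occurrences of a pattern,
    including the stretches before its first and after its last one."""
    ts = sorted(timestamps)
    gaps = [ts[0] - db_start]
    gaps += [b - a for a, b in zip(ts, ts[1:])]
    gaps.append(db_end - ts[-1])
    return max(gaps)
```

A pattern is then "short-period" when this maximum gap does not exceed a user-given threshold.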
24. Extracting recent weighted-based patterns from uncertain temporal databases.
- Author
-
Gan, Wensheng, Lin, Jerry Chun-Wei, Fournier-Viger, Philippe, Chao, Han-Chieh, Wu, Jimmy Ming-Tai, and Zhan, Justin
- Subjects
- COMPUTER algorithms, DATA mining, TEMPORAL databases, APRIORI algorithm, CONSTRAINT algorithms
- Abstract
Weighted Frequent Itemset Mining (WFIM) has been proposed as an extension of frequent itemset mining that considers not only the frequency of items but also their relative importance. However, using WFIM algorithms in real applications raises some problems. First, they do not consider how recent the patterns are. Second, traditional WFIM algorithms cannot handle uncertain data, although this type of data is common in real life. To address these limitations, this paper introduces the concept of Recent High Expected Weighted Itemset (RHEWI), which considers the recency, weight and uncertainty of patterns. By considering these three factors, more up-to-date and relevant results are found. A projection-based algorithm named RHEWI-P is presented to mine RHEWIs using a novel upper-bound downward closure (UBDC) property. An improved version of this algorithm called RHEWI-PS is further proposed based on a novel sorted upper-bound downward closure (SUBDC) property for pruning unpromising candidate itemsets early. An experimental evaluation against the state-of-the-art HEWI-Uapriori algorithm was carried out on both real-world and synthetic datasets. Results show that the proposed algorithms are highly efficient and acceptable for mining the desired patterns. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
25. Mining of frequent patterns with multiple minimum supports.
- Author
-
Gan, Wensheng, Lin, Jerry Chun-Wei, Fournier-Viger, Philippe, Chao, Han-Chieh, and Zhan, Justin
- Subjects
- DATA mining, DATABASES, SET theory, ALGORITHMS, INFORMATION & communication technologies
- Abstract
Frequent pattern mining (FPM) is an important topic in data mining for discovering implicit but useful information. Many algorithms have been proposed for this task, but most of them suffer from an important limitation: they rely on a single uniform minimum support threshold as the sole criterion to identify frequent patterns (FPs). Using a single threshold value to assess the usefulness of all items in a database is inadequate and unfair in real-life applications, since each item is different and not all items should be treated the same. Several algorithms have been developed for mining FPs with multiple minimum supports, but most of them are time-consuming and require a large amount of memory. In this paper, we address this issue by introducing a novel approach named Frequent Pattern mining with Multiple minimum supports from the Enumeration-tree (FP-ME). In the developed Set-Enumeration-tree with Multiple minimum supports (ME-tree) structure, a new sorted downward closure (SDC) property of FPs and the least minimum support (LMS) concept are used to effectively prune the search space. The proposed FP-ME algorithm can directly discover FPs from the ME-tree without candidate generation. Moreover, an improved algorithm, named FP-ME-DiffSet, is also developed based on the DiffSet concept to further increase mining performance. Substantial experiments on both real-life and synthetic datasets show that the proposed algorithms not only avoid the "rare item problem", but also efficiently and effectively discover the complete set of FPs in transactional databases while considering multiple minimum supports, and they outperform the state-of-the-art CFP-growth++ algorithm in terms of execution time, memory usage and scalability. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
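The least-minimum-support idea can be illustrated with a brute-force miner: an itemset qualifies when its support reaches the smallest per-item threshold among its members (the usual convention in multiple-minimum-support mining). The names below are hypothetical, and FP-ME itself prunes via a set-enumeration tree rather than enumerating all candidates:

```python
from itertools import combinations

def itemset_threshold(itemset, mis):
    """An itemset's minimum support is the least of its items' thresholds."""
    return min(mis[i] for i in itemset)

def frequent_itemsets_multi_mis(transactions, mis):
    """Brute-force miner with per-item minimum supports
    (exponential; for illustration only). Transactions are sets of items."""
    items = sorted({i for t in transactions for i in t})
    result = {}
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            support = sum(1 for t in transactions if set(cand) <= t)
            if support >= itemset_threshold(cand, mis):
                result[cand] = support
    return result
```

With per-item thresholds, a rare but important item (low threshold) can still form frequent patterns, avoiding the "rare item problem" the abstract mentions.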
26. Mining high-utility itemsets based on particle swarm optimization.
- Author
-
Lin, Jerry Chun-Wei, Yang, Lu, Fournier-Viger, Philippe, Wu, Jimmy Ming-Thai, Hong, Tzung-Pei, Wang, Leon Shyue-Liang, and Zhan, Justin
- Subjects
- DATA mining, PARTICLE swarm optimization, ASSOCIATION rule mining, SEARCH algorithms, HEURISTIC algorithms, EVOLUTIONARY computation, GENETIC algorithms
- Abstract
High-utility itemset mining (HUIM) has become a critical issue in recent years, since it reveals profitable products by considering both the quantity and profit factors, unlike frequent itemset mining (FIM) or association-rule mining (ARM). Several algorithms have been presented to mine high-utility itemsets (HUIs), and most of them have to handle an exponential search space when the number of distinct items and the size of the database are very large. In the past, a heuristic HUPEumu-GRAM algorithm was proposed to mine HUIs based on a genetic algorithm (GA). Among evolutionary computation (EC) techniques, particle swarm optimization (PSO) requires fewer parameters than GA-based approaches. Since the traditional PSO mechanism handles continuous problems, in this paper discrete PSO is adopted to encode the particles as binary variables. An efficient PSO-based algorithm named HUIM-BPSOsig is proposed to efficiently find HUIs. It first sets the number of discovered high-transaction-weighted utilization 1-itemsets (1-HTWUIs) as the particle size based on the transaction-weighted utility (TWU) model, which greatly reduces the combinatorial problem in the evolution process. The sigmoid function is adopted in the particle-updating process of the designed HUIM-BPSOsig algorithm. Substantial experiments on real-life datasets show that the proposed algorithm achieves better results than the state-of-the-art GA-based algorithm. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
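The sigmoid-based particle update the abstract describes follows the standard binary-PSO scheme: velocities stay real-valued, and each bit is resampled with probability sigmoid(velocity). A minimal sketch with a hypothetical interface (the paper adds TWU-based particle sizing on top of this):

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def update_particle(position, velocity, pbest, gbest,
                    w=0.9, c1=2.0, c2=2.0, vmax=6.0, rng=random):
    """One discrete-PSO step: update real-valued velocities toward the
    personal and global bests, then resample each bit with probability
    sigmoid(velocity), as in standard binary PSO."""
    new_v, new_x = [], []
    for x, v, pb, gb in zip(position, velocity, pbest, gbest):
        v = w * v + c1 * rng.random() * (pb - x) + c2 * rng.random() * (gb - x)
        v = max(-vmax, min(vmax, v))  # clamp to avoid sigmoid saturation
        new_v.append(v)
        new_x.append(1 if rng.random() < sigmoid(v) else 0)
    return new_x, new_v
```

In the HUIM setting, each bit would mark whether a candidate 1-HTWUI is included in the itemset the particle encodes.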
27. Fast algorithms for hiding sensitive high-utility itemsets in privacy-preserving utility mining.
- Author
-
Lin, Jerry Chun-Wei, Wu, Tsu-Yang, Fournier-Viger, Philippe, Lin, Guo, Zhan, Justin, and Voznak, Miroslav
- Subjects
- DATA mining, DATA security, UTILITIES (Computer programs), INFORMATION sharing, COMPUTER algorithms, SECURITY systems
- Abstract
High-Utility Itemset Mining (HUIM) is an extension of frequent itemset mining, which discovers itemsets yielding a high profit (HUIs) in transaction databases. In recent years, a major issue that has arisen is that data publicly published or shared by organizations may lead to privacy threats, since sensitive or confidential information may be uncovered by data mining techniques. To address this issue, techniques for privacy-preserving data mining (PPDM) have been proposed. Recently, privacy-preserving utility mining (PPUM) has become an important topic in PPDM. PPUM is the process of hiding sensitive HUIs (SHUIs) appearing in a database, such that the resulting sanitized database will not reveal these itemsets. In the past, the HHUIF and MSICF algorithms were proposed to hide SHUIs, and they are the state-of-the-art approaches for PPUM. In this paper, two novel algorithms, namely Maximum Sensitive Utility-MAximum item Utility (MSU-MAU) and Maximum Sensitive Utility-MInimum item Utility (MSU-MIU), are proposed to minimize the side effects of the sanitization process for hiding SHUIs. The proposed algorithms are designed to efficiently delete SHUIs or decrease their utilities using the concepts of maximum and minimum utility. A projection mechanism is also adopted in the two designed algorithms to speed up the sanitization process. Besides, since the evaluation criteria proposed for PPDM are insufficient and inappropriate for evaluating the sanitization performed by PPUM algorithms, this paper introduces three similarity measures to respectively assess the database structure, database utility and item utility of a sanitized database. These criteria are proposed as a new evaluation standard for PPUM. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
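The core PPUM move, driving a sensitive itemset's utility below the mining threshold by deleting item quantities, can be sketched greedily. This illustrates only the minimum-item-utility intuition behind MSU-MIU, with hypothetical names, and omits the projection mechanism and side-effect accounting:

```python
def utility(itemset, transactions, profit):
    """Summed utility of an itemset over transactions containing all its
    items (each transaction maps item -> purchased quantity)."""
    return sum(sum(t[i] * profit[i] for i in itemset)
               for t in transactions if all(i in t for i in itemset))

def hide_itemset(itemset, transactions, profit, min_util):
    """Greedy sanitization: while the sensitive itemset is still
    high-utility, remove one unit of its lowest-profit item from a
    supporting transaction. Each step lowers the itemset's utility,
    so the loop terminates for any positive min_util."""
    while utility(itemset, transactions, profit) >= min_util:
        victim = min(itemset, key=lambda i: profit[i])
        for t in transactions:
            if all(i in t for i in itemset):
                t[victim] -= 1
                if t[victim] == 0:
                    del t[victim]
                break
    return transactions
```

Deleting units of the cheapest item is one heuristic for keeping the sanitized database close to the original; the paper's similarity measures quantify exactly that closeness.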
28. A sanitization approach for hiding sensitive itemsets based on particle swarm optimization.
- Author
-
Lin, Jerry Chun-Wei, Liu, Qiankun, Fournier-Viger, Philippe, Hong, Tzung-Pei, Voznak, Miroslav, and Zhan, Justin
- Subjects
- PARTICLE swarm optimization, GENETIC algorithms, DATA mining, PERTURBATION theory, INFORMATION theory, DATABASES
- Abstract
Privacy-preserving data mining (PPDM) has become an important research field in recent years, as approaches for PPDM can discover important information in databases while ensuring that sensitive information is not revealed. Several algorithms have been proposed to hide sensitive information in databases. They apply addition and deletion operations to perturb an original database and hide the sensitive information. Finding an appropriate set of transactions/itemsets to be perturbed for hiding sensitive information while preserving other important information is an NP-hard problem. In the past, genetic algorithm (GA)-based approaches were developed to hide sensitive itemsets in an original database through transaction deletion. In this paper, a particle swarm optimization (PSO)-based algorithm called PSO2DT is developed to hide sensitive itemsets while minimizing the side effects of the sanitization process. Each particle in the designed PSO2DT algorithm represents a set of transactions to be deleted. Particles are evaluated using a fitness function that is designed to minimize the side effects of sanitization. Unlike the state-of-the-art GA-based approaches, the proposed algorithm can also determine the maximum number of transactions to be deleted for efficiently hiding sensitive itemsets. Besides, an important strength of the proposed approach is that few parameters need to be set, and it can still find better solutions to the sanitization problem than GA-based approaches. Furthermore, the pre-large concept is also adopted in the designed algorithm to speed up the evolution process. Substantial experiments on both real-world and synthetic datasets show that the proposed PSO2DT algorithm performs better than the Greedy algorithm and GA-based algorithms in terms of runtime, fail-to-be-hidden (F-T-H), not-to-be-hidden (N-T-H), and database similarity (DS). [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
29. An efficient algorithm to mine high average-utility itemsets.
- Author
-
Lin, Jerry Chun-Wei, Li, Ting, Fournier-Viger, Philippe, Hong, Tzung-Pei, Zhan, Justin, and Voznak, Miroslav
- Subjects
- DATA mining, SEARCH algorithms, SET theory, COMPUTER storage devices, COMPUTER algorithms
- Abstract
With the ever-increasing number of applications of data mining, high-utility itemset mining (HUIM) has become a critical issue in recent decades. In traditional HUIM, the utility of an itemset is defined as the sum of the utilities of its items in the transactions where it appears. An important problem with this definition is that it does not take itemset length into account. Because the utility of a larger itemset is generally greater than that of a smaller itemset, traditional HUIM algorithms tend to be biased toward finding large itemsets. Thus, this definition is not a fair measurement of utility. To provide a better assessment of each itemset's utility, the task of high average-utility itemset mining (HAUIM) was proposed. It introduces the average utility measure, which considers both the length of itemsets and their utilities, and is thus more appropriate in real-world situations. Several algorithms have been designed for this task; they can generally be categorized as either level-wise or pattern-growth approaches. Both, however, require a large amount of computation to find the actual high average-utility itemsets (HAUIs). In this paper, we present an efficient average-utility (AU)-list structure to discover the HAUIs more efficiently. A depth-first search algorithm named HAUI-Miner is proposed to explore the search space without candidate generation, and an efficient pruning strategy is developed to reduce the search space and speed up the mining process. Extensive experiments are conducted to compare the performance of HAUI-Miner with the state-of-the-art HAUIM algorithms in terms of runtime, number of determining nodes, memory usage and scalability. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
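The average-utility measure described above divides an itemset's utility by its length, removing the bias toward long itemsets. A small sketch with hypothetical names (HAUI-Miner itself works on AU-lists rather than scanning raw transactions):

```python
def average_utility(itemset, transactions, profit):
    """Summed utility over supporting transactions, divided by itemset
    length (each transaction maps item -> quantity)."""
    total = 0
    for t in transactions:
        if all(i in t for i in itemset):
            total += sum(t[i] * profit[i] for i in itemset)
    return total / len(itemset)

def is_high_average_utility(itemset, transactions, profit, min_au):
    """An itemset is a HAUI when its average utility meets the threshold."""
    return average_utility(itemset, transactions, profit) >= min_au
```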
30. A fast updated algorithm to maintain the discovered high-utility itemsets for transaction modification.
- Author
-
Lin, Jerry Chun-Wei, Gan, Wensheng, and Hong, Tzung-Pei
- Subjects
- ASSOCIATION rule mining, DATABASES, ALGORITHMS, PROFIT, INFORMATION science
- Abstract
High-utility itemset mining (HUIM) is a critical issue which concerns not only the occurrence frequencies of itemsets, as in association-rule mining (ARM), but also the factors of quantity and profit in real-life applications. Many algorithms have been developed to efficiently mine high-utility itemsets (HUIs) from a static database. Discovered HUIs may become invalid, or new HUIs may arise, when transactions are inserted, deleted or modified. Existing approaches are required to re-process the updated database and re-mine HUIs each time, as previously discovered HUIs are not maintained. Previously, a pre-large concept was proposed to efficiently maintain and update the discovered information in ARM, but it cannot be directly applied to HUIM. In this paper, a maintenance algorithm (PRE-HUI-MOD) with transaction modification based on a new pre-large strategy is presented to efficiently maintain and update the discovered HUIs. When transactions are modified in the original database, the discovered information is divided into three parts with nine cases. A specific procedure is then performed to maintain and update the discovered information for each case. With the designed PRE-HUI-MOD algorithm, it is unnecessary to rescan the original database until the accumulated total utility of the modified transactions reaches the designed safety bound, which greatly reduces the computations of multiple database scans compared to batch-mode approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
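The pre-large idea above, deferring database rescans until modified transactions exceed a safety bound, can be sketched as follows. The bound formula here is an illustrative assumption (the gap between an upper and a lower utility threshold scaled by the database's total utility), not the paper's exact definition:

```python
# Hedged sketch of pre-large-style maintenance: accumulate the utility of
# modified transactions and trigger a full rescan only past a safety bound.

def needs_rescan(modified_utilities, upper_ratio, lower_ratio, total_db_utility):
    # Assumed safety bound: threshold gap scaled by total database utility.
    safety_bound = (upper_ratio - lower_ratio) * total_db_utility
    return sum(modified_utilities) > safety_bound

# Two modifications totaling 80 utility stay under a bound of ~100: no rescan yet.
print(needs_rescan([50, 30], upper_ratio=0.3, lower_ratio=0.2, total_db_utility=1000))  # False
```

Between rescans, itemsets whose utility falls in the gap between the two thresholds ("pre-large" itemsets) are kept so small updates can be absorbed incrementally.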
31. Efficient algorithms for mining up-to-date high-utility patterns.
- Author
-
Lin, Jerry Chun-Wei, Gan, Wensheng, Hong, Tzung-Pei, and Tseng, Vincent S.
- Subjects
- *
ASSOCIATION rule mining , *DECISION making , *TIMESTAMPS , *A priori , *DATA mining - Abstract
High-utility pattern mining (HUPM) has emerged in recent years as an alternative to association-rule mining for discovering more interesting and useful information for decision making. Many algorithms have been developed to find high-utility patterns (HUPs) in quantitative databases without considering the timestamps of patterns, especially in recent intervals. A pattern may not be a HUP in the entire database but may be a HUP in recent intervals. In this paper, a new concept named up-to-date high-utility pattern (UDHUP) is designed. It considers not only the utility measure but also the timestamp factor to discover recent HUPs. UDHUP-apriori is first proposed to mine UDHUPs in a level-wise way. Since UDHUP-apriori uses an Apriori-like approach to recursively derive UDHUPs, a second algorithm, UDHUP-list, is then presented to discover UDHUPs efficiently based on the developed UDU-list structures and a pruning strategy without candidate generation, thus speeding up the mining process. A flexible minimum-length strategy with two specific lifetimes is also designed to find UDHUPs more efficiently based on users' specifications. Experiments are conducted to evaluate the performance of the two proposed algorithms in terms of execution time, memory consumption, and number of generated UDHUPs on several real-world and synthetic datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
32. Editorial for the special issue: Satellite imagery analysis and mapping for urban ecology.
- Author
-
Díaz, Vicente García, Lin, Jerry Chun-Wei, and Molinera, Juan Antonio Morente
- Published
- 2022
- Full Text
- View/download PDF
33. Introduction to the special section on Human-computer Interaction enabled Augmentative communication (VSI-hcac).
- Author
-
García-Díaz, Vicente, Lin, Jerry Chun Wei, and Molinera, Juan Antonio Morente
- Subjects
- *
HUMAN-computer interaction - Published
- 2022
- Full Text
- View/download PDF
34. Mitigating adversarial evasion attacks of ransomware using ensemble learning.
- Author
-
Ahmed, Usman, Lin, Jerry Chun-Wei, and Srivastava, Gautam
- Subjects
- *
RANSOMWARE , *PERSONALLY identifiable information , *MACHINE learning , *MALWARE - Abstract
Ransomware continues to pose a significant threat to cybersecurity by extorting money from users by locking their devices and personal data; attackers force payment of a ransom to restore access to personal files. Because of structural similarities, distinguishing ransomware from benign applications is vulnerable to evasion attacks. Ensemble learning can provide countermeasures, while attackers can use the same technique to improve the effectiveness of their attacks. This motivates us to investigate whether distinct ensemble methods can achieve better performance when combined with a voting-based method. This research proposes a hybrid approach that examines permission, text, and network-based features both statically and dynamically by monitoring memory usage, system call logs, and CPU usage. Ensemble machine learning analyzers are then trained in the designed model on static and dynamic features extracted from Android malware applications (ransomware and non-ransomware). Our experimental results show that the proposed ensemble classification and detection technique can classify unknown static and dynamic ransomware behavior to mitigate adversarial evasion attacks. [Display omitted] • The designed model extracts and analyzes static network-based features. • Two ML-based ensemble models are proposed for static and dynamic feature sets. • The model targets Android ransomware adversarial evasion attacks. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
35. Efficient algorithms to identify periodic patterns in multiple sequences.
- Author
-
Fournier-Viger, Philippe, Li, Zhitian, Lin, Jerry Chun-Wei, Kiran, Rage Uday, and Fujita, Hamido
- Subjects
- *
PATTERN recognition systems , *DATA mining , *COMPUTER algorithms , *PROBLEM solving , *SEQUENCE analysis - Abstract
Periodic pattern mining is a popular data mining task, which consists of identifying patterns that periodically appear in data. Traditional periodic pattern mining algorithms are designed to find patterns in a single sequence. However, in several domains, it is desirable to discover patterns that are periodic in many sequences. An example of such an application is market basket analysis: given a database of sequences of transactions made by customers, discovering sets of items that are periodically bought by customers can help understand customer behavior. To discover periodic patterns common to multiple sequences, this paper extends the traditional problem of mining periodic patterns in a sequence. Two novel measures are defined, called the standard deviation of periods and the sequence periodic ratio. Two algorithms, MPFPS_BFS and MPFPS_DFS, are proposed to mine these patterns efficiently, performing a breadth-first search and a depth-first search, respectively. Because the sequence periodic ratio is neither monotone nor anti-monotone, these algorithms rely on a novel upper bound called boundRa and two novel search space pruning properties to find periodic patterns efficiently. The algorithms have been evaluated on multiple datasets. Results show that they are efficient and can filter numerous non-periodic itemsets to identify periodic patterns. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
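The "standard deviation of periods" measure named in the abstract above can be sketched with a small, hypothetical helper (not the MPFPS code; here a pattern's periods are taken simply as the gaps between its consecutive occurrence positions):

```python
import statistics

# Sketch of the period-regularity idea: a pattern occurring at regular intervals
# has periods with low standard deviation; an irregular one scores high.

def period_std(positions):
    """Population standard deviation of the gaps between consecutive occurrences."""
    periods = [b - a for a, b in zip(positions, positions[1:])]
    return statistics.pstdev(periods)

print(period_std([2, 5, 8, 11]))   # gaps 3, 3, 3 -> 0.0 (perfectly periodic)
print(period_std([2, 5, 8, 20]) > 0)  # True: the last gap breaks the regularity
```

A threshold on this value is one natural way to filter the non-periodic itemsets the abstract mentions.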
36. An air quality prediction model based on improved Vanilla LSTM with multichannel input and multiroute output.
- Author
-
Fang, Wei, Zhu, Runsu, and Lin, Jerry Chun-Wei
- Subjects
- *
AIR quality , *DEEP learning , *PREDICTION models , *SIMILARITY (Physics) , *MULTI-channel integration - Abstract
Long short-term memory (LSTM), especially vanilla LSTM (VLSTM), has been widely used in the air quality prediction field. However, VLSTM has many parameters, which makes training slow and prediction performance unstable, and its input data are typically not selected for better efficiency. In this paper, we propose an air quality prediction model based on an improved VLSTM with multichannel input and multiroute output (IVLSTM-MCMR). The proposed model includes the IVLSTM and MCMR modules. The IVLSTM module improves the inner structure of VLSTM to reduce the number of parameters, which helps accelerate convergence. A new approach to using historical information is further proposed to obtain a stable training process. For the MCMR module, a multichannel data input model (MC) with an improved linear-similarity dynamic time warping is introduced to choose valid data as the input of IVLSTM. A multiroute output model (MR) is designed to integrate the results from MC, in which the results of different target stations with different features are output by different routes. We evaluate the proposed model with data collected from Beijing, China, and the experimental results show that our model improves prediction performance. • An air quality prediction model IVLSTM-MCMR is proposed based on deep learning. • The number of parameters in IVLSTM-MCMR is reduced to accelerate convergence. • An improved linear-similarity dynamic time warping is introduced in IVLSTM-MCMR. • The integration of multichannel data input and multiroute output is designed. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
37. AF-GCN: Completing various graph tasks efficiently via adaptive quadratic frequency response function in graph spectral domain.
- Author
-
Pang, Shanchen, Zhang, Kuijie, Wang, Gan, Lin, Jerry Chun-Wei, Wang, Fuyu, Meng, Xiangyu, Wang, Shudong, and Zhang, Yuanyuan
- Subjects
- *
COMPLETE graphs , *CONVOLUTIONAL neural networks , *ADAPTIVE filters , *DRUG repositioning , *SOCIAL network analysis , *INFERENCE (Logic) - Abstract
Graph neural networks are a breakthrough in applying deep learning to non-Euclidean spaces. They are widely used for tasks such as social network analysis, molecular function inference, drug repositioning and protein modeling, achieving outstanding performance on relational models. Despite this great success, most graph neural networks cannot be generalized to various scenarios: the graph information needed varies across tasks, and fixed models limit the flexibility of feature extraction. To address this challenge, we design a graph filter that can be adaptively adjusted according to the graph task. This filter combines a multi-view strategy with a learnable quadratic frequency response function, using the crests of the quadratic functions to adaptively emphasize the required information. We further design a graph convolutional network model based on this adaptive filter, named AF-GCN. Extensive experiments are performed against 13 SOTA models on 12 different real-world datasets, including homogeneous and heterogeneous datasets for the node classification task, and biological and social network datasets for the graph classification task. AF-GCN achieves state-of-the-art results in various scenarios. In addition, AF-GCN has high interpretability in the graph spatial domain. The development from Graph Convolutional Networks (GCN) to AF-GCN has historical similarities with the development of Convolutional Neural Networks (CNN). [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
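The idea of a quadratic frequency response in the graph spectral domain, as described above, can be sketched generically. This is not AF-GCN itself: the coefficients `a`, `b`, `c` stand in for its learnable parameters, and the dense eigendecomposition is only practical for tiny graphs:

```python
import numpy as np

# Sketch of a quadratic spectral graph filter h(lambda) = a*lambda^2 + b*lambda + c,
# applied to a node signal x via the eigendecomposition of the graph Laplacian.

def quadratic_spectral_filter(adj, x, a, b, c):
    deg = np.diag(adj.sum(axis=1))
    lap = deg - adj                           # combinatorial graph Laplacian
    evals, evecs = np.linalg.eigh(lap)        # graph frequencies and modes
    response = a * evals**2 + b * evals + c   # quadratic frequency response
    return evecs @ np.diag(response) @ evecs.T @ x

adj = np.array([[0.0, 1.0], [1.0, 0.0]])      # two connected nodes
x = np.array([[1.0], [0.0]])
# With a = b = 0 and c = 1, h(lambda) = 1 everywhere, so the filter is the
# identity and returns x (up to floating-point error).
print(quadratic_spectral_filter(adj, x, a=0.0, b=0.0, c=1.0))
```

Making `a`, `b`, `c` learnable lets training place the parabola's crest over whichever frequency band a given task needs, which is the adaptivity the abstract describes.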
38. A consensus protocol for unmanned aerial vehicle networks in the presence of Byzantine faults.
- Author
-
Cheng, Chien-Fu, Srivastava, Gautam, Lin, Jerry Chun-Wei, and Lin, Ying-Chen
- Subjects
- *
AERIAL photography , *AERIAL spraying & dusting in agriculture , *DRONE aircraft - Abstract
This paper discusses the fault-tolerant consensus problem in Unmanned Aerial Vehicle Networks (UAVNets). In recent years, applications of UAVNets have become more and more popular, including aerial photography, geological and topographic surveys, disaster monitoring, military applications, and so on. It is therefore very important to build highly reliable and fault-tolerant UAVNets. However, the network architecture of UAVNets is very different from previous network architectures: because UAVs fly at high speed, the topology also varies quickly. Hence, collecting sufficient messages for UAVs to reach a consensus on the network is a challenge. In this paper, the characteristics of distributed UAVNets are explored first. Then, based on these characteristics, a new fault-tolerant consensus protocol called the UAV Consensus Protocol (UCP) is proposed. The proposed UCP consists of two phases: the message exchanging phase and the consensus making phase. UCP can solve the consensus problem with ⌊(n_O − 1)/3⌋ + 1 rounds of message exchange in the presence of ⌊(n_O − 1 − a_O)/3⌋ Byzantine faulty UAVs and a_O away UAVs, where n_O is the number of UAVs in the UAVNet. Moreover, the correctness of the proposed UCP is also proved in this paper. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
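The fault-tolerance bounds quoted in the abstract above reduce to simple floor arithmetic, which can be checked directly (function names are ours, for illustration only):

```python
# Arithmetic check of the UCP bounds: floor((n_O - 1)/3) + 1 rounds of message
# exchange tolerate floor((n_O - 1 - a_O)/3) Byzantine UAVs, with a_O UAVs away.

def ucp_rounds(n_o: int) -> int:
    return (n_o - 1) // 3 + 1

def max_byzantine(n_o: int, a_o: int) -> int:
    return (n_o - 1 - a_o) // 3

print(ucp_rounds(10))        # 4 rounds for a 10-UAV network
print(max_byzantine(10, 2))  # tolerates 2 Byzantine UAVs when 2 UAVs are away
```

These mirror the classical Byzantine agreement requirement that fewer than a third of the participants be faulty.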
39. Deep learning based hashtag recommendation system for multimedia data.
- Author
-
Djenouri, Youcef, Belhadi, Asma, Srivastava, Gautam, and Lin, Jerry Chun-Wei
- Subjects
- *
DEEP learning , *RECOMMENDER systems , *MULTIMEDIA systems , *CONVOLUTIONAL neural networks , *EVOLUTIONARY algorithms - Abstract
This work provides a novel hybrid architecture to suggest appropriate hashtags for a collection of orphan tweets. The methodology starts by defining the collection of batches used in the convolutional neural network, based on frequent pattern extraction methods. The hashtags of the tweets are then learned using the convolutional neural network applied to the collection of batches of tweets. In addition, a pruning approach ensures that the learning process proceeds properly by reducing the number of common patterns. An evolutionary algorithm is also involved to extract the optimal parameters of the deep learning model used in the learning process; this is achieved with a genetic algorithm that learns the hyper-parameters of the deep architecture. The effectiveness of our methodology has been demonstrated in a series of detailed experiments on a set of Twitter archives. The results of the experiments make clear that the proposed method is superior to the baseline methods in terms of efficiency. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
40. Motion Context guided Edge-preserving network for video salient object detection.
- Author
-
Huang, Kan, Tian, Chunwei, Xu, Zhijing, Li, Nannan, and Lin, Jerry Chun-Wei
- Subjects
- *
VIDEO compression , *VIDEO processing , *VIDEOS , *MOTION - Abstract
Video salient object detection aims to extract the most conspicuous objects in a video sequence, which facilitates various video processing tasks, e.g., video compression and video recognition. Although remarkable progress has been made in video salient object detection, most existing methods still suffer from coarse edge boundaries, which may hinder their usage in real-world applications. To alleviate this problem, we propose a Motion Context guided Edge-preserving network (MCE-Net) model for video salient object detection. MCE-Net generates temporally consistent salient edges, which are then leveraged to refine the salient object regions completely and uniformly. The core innovation in MCE-Net is an Asymmetric Cross-Reference Module (ACRM), designed to exploit the cross-modal complementarity between spatial structure and motion flow, facilitating robust salient object edge extraction. To leverage the extracted edge features for salient object refinement, we fuse them with multi-level spatial–temporal embeddings in a paralleled guidance manner, generating the final saliency results. The proposed method is end-to-end trainable, and the edge annotations are generated automatically from ground-truth saliency maps. Experimental evaluations on five widely used benchmarks demonstrate that our proposed method achieves superior performance to other outstanding methods. Moreover, the experimental results indicate that our method can preserve salient objects with clear boundary structures in video sequences. [Display omitted] • We propose to address the coarse boundary issue in video salient object detection. • A novel method that uses object boundaries to refine salient objects is presented. • The complementarity between spatial and motion cues is exploited to generate edges. • Evaluations on five benchmarks verify the efficacy of the edge refinement strategy. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
41. Fuzzy high-utility pattern mining in parallel and distributed Hadoop framework.
- Author
-
Wu, Jimmy Ming-Tai, Srivastava, Gautam, Wei, Min, Yun, Unil, and Lin, Jerry Chun-Wei
- Subjects
- *
SEQUENTIAL pattern mining , *BIG data - Abstract
• An efficient EFUPM to discover fuzzy high-utility patterns is proposed. • A Hadoop-based HFUPM is proposed to handle large-scale databases. • Two upper bounds are designed to remove unpromising candidates early. • Experiments show that better performance is obtained. Over the past decade, high-utility itemset mining (HUIM) has received widespread attention because it can emphasize more critical information than frequent itemset mining (FIM). Unfortunately, HUIM is very similar to FIM since the methodology determines itemsets using a binary model based on a pre-defined minimum utility threshold. Additionally, most previous HUIM works focused only on single, small datasets, which is not realistic for today's big data environments. In this work, fuzzy-set theory and a MapReduce framework are both utilized to design a novel high fuzzy utility pattern mining algorithm to resolve these issues. Fuzzy-set theory is first applied, and a new algorithm called efficient high fuzzy utility itemset mining (EFUPM) is designed to discover high fuzzy utility patterns on a single machine. Two upper bounds are then estimated to allow early pruning of unpromising candidates in the search space. To handle large-scale datasets, a Hadoop-based high fuzzy utility pattern mining (HFUPM) algorithm is then developed to discover high fuzzy utility patterns on the Hadoop framework. Experimental results clearly show that, compared to current state-of-the-art approaches, the proposed algorithms perform strongly in mining the required high fuzzy utility patterns, whether on a single machine or in a large-scale environment. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
42. An optimization neural network model for bridge cable force identification.
- Author
-
Gai, Tongtong, Yu, Dehu, Zeng, Sen, and Lin, Jerry Chun-Wei
- Subjects
- *
CABLE structures , *CABLE-stayed bridges , *OPTIMIZATION algorithms , *PARTICLE swarm optimization , *FINITE element method , *FREQUENCIES of oscillating systems , *CABLES - Abstract
Accurate determination of cable force values is the most important technical means of avoiding damage to cable bridges. To avoid the influence of hard-to-distinguish boundary conditions and missing low-order natural frequencies on cable force determination, an intelligent vibration-based method for determining bridge cable forces is proposed. With cable length, linear density, flexural stiffness and input frequency as input units and cable force as the output unit, a neural network is established to identify the cable force using finite element simulation data, and the model is optimized using an intelligent swarm optimization algorithm. The results show that, compared with cable force prediction models using a generalized regression neural network (GRNN), a GRNN optimized with particle swarm optimization (PSO-GRNN), and canonical identification methods, the GRNN optimized with the sparrow search algorithm (SSA-GRNN) proposed in this paper identifies cable forces better. The prediction error for short cables is essentially within 10%, and that for long cables within 5%. The method not only accurately identifies bridge cable forces while ignoring the boundary conditions and vibration frequency order of cables, but also has a wide range of applications. • Establish a finite element model to provide a data basis for the designed model. • Identify cable force while ignoring the boundary conditions and vibration frequency order. • GRNN is optimized by SSA and PSO to improve the performance of the designed model. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
43. Prediction and control of water quality in Recirculating Aquaculture System based on hybrid neural network.
- Author
-
Yang, Junchao, Jia, Lulu, Guo, Zhiwei, Shen, Yu, Li, Xianwei, Mou, Zhenping, Yu, Keping, and Lin, Jerry Chun-Wei
- Subjects
- *
WATER quality management , *CONVOLUTIONAL neural networks , *FEATURE extraction , *AQUACULTURE , *WATER quality - Abstract
In Recirculating Aquaculture Systems (RAS), control of water quality indices remains essential to the survival and growth of aquaculture stock. This requires effective prediction of future water status in advance, which can help generate subsequent control strategies. However, conventional water quality prediction methods mostly depend on redundant model parameters, which leads to inefficiency and low accuracy. In addition, the complexity of the multiple RAS units requires intelligent control of the water quality unit. Thus, a framework for predicting and controlling water quality in RAS is proposed in this paper. Specifically, a hybrid deep learning structure combining a Convolutional Neural Network (CNN), a Gated Recurrent Unit (GRU) and an attention mechanism is presented. First, the CNN is used to extract local features for each timestamped water quality parameter. After the local features have been extracted, the GRU model captures the global sequential features of the parameters. The attention mechanism is then applied to focus on the more critical features, promoting the efficiency and accuracy of prediction. Finally, to demonstrate the efficiency and stability of the prediction and control framework with the mixture of CNN, GRU and attention (PC-CGA), multiple groups of experiments and evaluations are carried out on a medium-size RAS. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
44. Energy grid management system with anomaly detection and Q-learning decision modules.
- Author
-
Syu, Jia-Hao, Srivastava, Gautam, Fojcik, Marcin, Cupek, Rafał, and Lin, Jerry Chun-Wei
- Subjects
- *
ANOMALY detection (Computer security) , *ENERGY management , *ENERGY subsidies , *ARTIFICIAL intelligence , *ENERGY security , *LEARNING Management System - Abstract
Stability and security issues in energy management have become widespread research topics, in which artificial intelligence techniques are often embedded in management systems to efficiently manage the smart grid. In this paper, we propose an energy grid management system with anomaly detection and Q-learning decision modules (EMSAD). The anomaly detection module is a multitask learning network that simultaneously classifies suppliers and predicts actual supply quantities. The Q-learning decision module then determines the operating reserve and subsidies to manage the energy grid. Experimental results show that the proposed anomaly detection module has an excellent performance in classifying malicious suppliers with F1-scores from 73.3% to 100.0%. The robustness evaluation also shows that EMSAD maintains high performance even in unseen environments without fine-tuning. Thus, the simulation results demonstrate the security, efficiency, transferability, and robustness of the proposed EMSAD in smart grid energy management. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
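The Q-learning decision module described in the abstract above rests on the standard tabular Q-learning update, sketched here on a toy grid-management state space (the states, actions, and environment are hypothetical, not EMSAD's):

```python
# Minimal tabular Q-learning update of the kind a decision module could use:
# Q(s, a) += alpha * (reward + gamma * max_a' Q(s', a') - Q(s, a)).

def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    best_next = max(q[next_state].values())
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])

# Toy state/action table: reserve level -> {action: estimated value}.
q = {"low_reserve": {"raise": 0.0, "hold": 0.0},
     "ok_reserve":  {"raise": 0.0, "hold": 0.0}}
q_update(q, "low_reserve", "raise", reward=1.0, next_state="ok_reserve")
print(q["low_reserve"]["raise"])  # 0.1 after one rewarded update
```

Repeated updates of this form let the module learn, for each grid state, which operating-reserve or subsidy action maximizes long-run reward.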
45. Fast and effective cluster-based information retrieval using frequent closed itemsets.
- Author
-
Djenouri, Youcef, Belhadi, Asma, Fournier-Viger, Philippe, and Lin, Jerry Chun-Wei
- Subjects
- *
INFORMATION retrieval , *DOCUMENT clustering , *ALGORITHMS , *CLOUD computing , *DATA mining - Abstract
Document information retrieval consists of finding the documents in a collection that are most relevant to a user query. Information retrieval techniques are widely used by organizations to facilitate the search for information. However, applying traditional information retrieval techniques is time consuming for large document collections. Recently, cluster-based information retrieval approaches have been developed. Although these approaches are often much faster than traditional approaches for processing large document collections, the quality of the documents they retrieve is often lower. To address this drawback and improve the performance of information retrieval in terms of both runtime and quality of retrieved documents, this paper proposes a new cluster-based information retrieval approach named ICIR (Intelligent Cluster-based Information Retrieval). The proposed approach combines k-means clustering with frequent closed itemset mining to extract clusters of documents and find frequent terms in each cluster. Patterns discovered in each cluster are then used to select the most relevant document clusters to answer each user query. Four alternative heuristics are proposed to select the most relevant clusters, and two alternative heuristics for choosing documents in the selected clusters, yielding eight versions of the proposed approach. To validate the approach, extensive experiments have been carried out on well-known document collections. Results show that the designed approach outperforms traditional and cluster-based information retrieval approaches in terms of both execution time and quality of the returned documents. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
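One plausible cluster-selection heuristic in the spirit of the approach above can be sketched as follows (illustrative only; the paper proposes four such heuristics and this is not necessarily one of them):

```python
# Sketch: score each cluster by how many of its mined frequent terms overlap the
# query, then restrict the search to the top-scoring cluster instead of the
# whole collection.

def best_cluster(query_terms, cluster_terms):
    # cluster_terms: cluster id -> set of frequent closed terms mined in that cluster
    scores = {cid: len(query_terms & terms) for cid, terms in cluster_terms.items()}
    return max(scores, key=scores.get)

clusters = {0: {"sport", "match", "goal"},
            1: {"stock", "market", "price"}}
print(best_cluster({"market", "price", "crash"}, clusters))  # 1
```

Searching only the selected cluster is what yields the runtime advantage over scoring every document, while the mined terms aim to keep retrieval quality high.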
46. Explainable Artificial Intelligence for Cybersecurity.
- Author
-
Sharma, Deepak Kumar, Mishra, Jahanavi, Singh, Aeshit, Govil, Raghav, Srivastava, Gautam, and Lin, Jerry Chun-Wei
- Subjects
- *
ARTIFICIAL intelligence , *INTERNET security , *TRUST , *MACHINE learning - Abstract
Recently, numerous Machine Learning (ML) algorithms have been applied in many areas of cybersecurity. However, most of these systems can only be seen as a black box by users. To improve our understanding of such systems, adversarial machine learning approaches can be used: the main features are detected by analyzing the extent of adversarial changes, which helps identify the main reasons for misclassification. In this paper, the presented approach obtains satisfactory results that accurately explain the reasons for misclassifications. Some features of the presented method can be applied to any classifier with defined gradients without modification. The proposed model can be extended to perform more diagnoses and can be used for a deeper analysis of systems, obtaining more than 95% classification accuracy on the datasets used in the experiments. [Display omitted] • Explains misclassifications by data-driven AI models using an adversarial approach. • Computes the minimum number of changes to input features required. • Increased average classification accuracy by 2.5% post modification. • Designed a black-box attack to test correctness and trustworthiness. • Used explanation maps to examine the effectiveness of attacks. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
47. Space–time series clustering: Algorithms, taxonomy, and case study on urban smart cities.
- Author
-
Belhadi, Asma, Djenouri, Youcef, Nørvåg, Kjetil, Ramampiaro, Heri, Masseglia, Florent, and Lin, Jerry Chun-Wei
- Subjects
- *
URBAN studies , *FUZZY logic , *TAXONOMY , *CASE studies , *OPEN-ended questions - Abstract
This paper provides a short overview of space–time series clustering, which can generally be grouped into three main categories: hierarchical, partitioning-based, and overlapping clustering. The hierarchical category identifies hierarchies in space–time series data. The partitioning-based category focuses on determining disjoint partitions among the space–time series data, whereas the overlapping category explores fuzzy logic to determine the different correlations between space–time series clusters. We further describe solutions for each category and show applications of these solutions to urban traffic data captured in two smart cities (Odense in Denmark and Beijing in China). Open questions and research challenges are also discussed, providing a better understanding of the intuition, limitations, and benefits of the various space–time series clustering methods. This work can thus guide practitioners in selecting the most suitable methods for their use cases, domains, and applications. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
48. Erasable pattern mining based on tree structures with damped window over data streams.
- Author
-
Baek, Yoonji, Yun, Unil, Kim, Heonho, Nam, Hyoju, Lee, Gangin, Yoon, Eunchul, Vo, Bay, and Lin, Jerry Chun-Wei
- Subjects
- *
RIVERS , *SEQUENTIAL pattern mining , *SCALABILITY , *ART techniques - Abstract
Several pattern mining methods have been proposed to process dynamic data streams, because data generated in industrial fields is continually accumulated. Erasable pattern mining techniques for processing dynamic data streams are needed to discover erasable patterns from such streams. Previous erasable pattern mining approaches for dynamic data streams consider all data to have the same importance regardless of its timestamp. However, in dynamic data streams new data is relatively more significant than old data, and an approach that accounts for this characteristic is desirable in erasable pattern mining. For this reason, we propose an erasable pattern mining algorithm over dynamic data streams based on the damped window model. Since the suggested technique considers new data more important than previous data, it can find more useful erasable patterns. In addition, erasable pattern mining based on the damped window model is conducted efficiently by employing tree and table structures. In the performance tests, we show that our pruning techniques efficiently remove unnecessary operations related to invalid erasable patterns from damped-window-based data streams. Performance evaluation results on real and synthetic datasets show that the proposed approach performs well with regard to execution time, pattern generation, and scalability compared to state-of-the-art algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
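The damped window model named above can be sketched with a simple decayed aggregate (the decay factor and batch layout are assumed parameters for illustration, not the paper's structures):

```python
# Sketch of damped-window weighting: each batch's contribution is multiplied by
# decay^age, so new data dominates while old data fades rather than being dropped.

def damped_support(batch_counts, decay=0.9):
    # batch_counts[0] is the oldest batch; the newest batch has age 0.
    n = len(batch_counts)
    return sum(c * decay ** (n - 1 - i) for i, c in enumerate(batch_counts))

# Three batches of 10 occurrences each: 10*0.81 + 10*0.9 + 10*1.0
print(round(damped_support([10, 10, 10]), 2))  # 27.1
```

A pattern whose occurrences are concentrated in recent batches thus scores higher than one with the same raw count spread over old batches, which is the time-sensitivity the abstract argues for.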
49. Efficient approach for incremental weighted erasable pattern mining with list structure.
- Author
-
Nam, Hyoju, Yun, Unil, Yoon, Eunchul, and Lin, Jerry Chun-Wei
- Subjects
- *
PRUNING , *MANUFACTURING industries , *SEQUENTIAL pattern mining - Abstract
• Efficient incremental weighted erasable pattern mining is suggested. • List structures for incremental weighted erasable patterns are proposed. • Pruning techniques considering the list structures are presented. • Performance improvements are shown in various experiments. Erasable pattern mining is one of the important fields of frequent pattern mining; it diagnoses and solves economic problems that arise in the manufacturing industry. Real-world databases are continually accumulated over time, and each item has a different importance. Therefore, if conventional erasable pattern mining is used without considering these characteristics, less meaningful patterns can be extracted. Also, when mining a real-world database, the algorithm must be able to process operations quickly and efficiently. In this paper, to meet these requirements, we propose an algorithm, implemented with a list structure, for mining erasable patterns in an incremental database under a weighted condition. Compared to existing state-of-the-art mining algorithms, the proposed algorithm performs pattern pruning by applying the weighted condition to a dynamic database, so it extracts fewer candidate patterns and runs fast. We test our algorithm and previously presented algorithms on various real and synthetic datasets, measuring run time, memory usage, scalability, and accuracy. Analysis and comparison of these experimental results show that the proposed algorithm has outstanding performance. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF