Descriptor: "Conceptual clustering" / Topic: business.industry - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Conceptual clustering"' showing total 1,534 results

1. Unsupervised Feature Selection for Histogram-Valued Symbolic Data Using Hierarchical Conceptual Clustering

Author: Kadri Umbleja, Manabu Ichino, and Hiroyuki Yaguchi
Subjects: Computer science, Conceptual clustering, Feature selection, 02 engineering and technology, Similarity measure, hierarchical conceptual clustering, 01 natural sciences, 010104 statistics & probability, multi-role measure, Histogram, 0202 electrical engineering, electronic engineering, information engineering, 0101 mathematics, Cluster analysis, histogram-valued data, visualization, business.industry, Statistics, Pattern recognition, HA1-4737, Data set, ComputingMethodologies_PATTERNRECOGNITION, Compact space, Feature (computer vision), compactness, 020201 artificial intelligence & image processing, Artificial intelligence, business, unsupervised feature selection
Abstract: This paper presents an unsupervised feature selection method for multi-dimensional histogram-valued data. We define a multi-role measure, called the compactness, based on the concept size of given objects and/or clusters described using a fixed number of equal probability bin-rectangles. In each step of clustering, we agglomerate objects and/or clusters so as to minimize the compactness for the generated cluster. This means that the compactness plays the role of a similarity measure between objects and/or clusters to be merged. Minimizing the compactness is equivalent to maximizing the dis-similarity of the generated cluster, i.e., concept, against the whole concept in each step. In this sense, the compactness plays the role of cluster quality. We also show that the average compactness of each feature with respect to objects and/or clusters in several clustering steps is useful as a feature effectiveness criterion. Features having small average compactness are mutually covariate and are able to detect a geometrically thin structure embedded in the given multi-dimensional histogram-valued data. We obtain thorough understandings of the given data via visualization using dendrograms and scatter diagrams with respect to the selected informative features. We illustrate the effectiveness of the proposed method by using an artificial data set and real histogram-valued data sets.
Published: 2021

2. Hierarchical conceptual clustering based on quantile method for identifying microscopic details in distributional data

Author: Kadri Umbleja, Manabu Ichino, and Hiroyuki Yaguchi
Subjects: Statistics and Probability, Property (programming), Computer science, business.industry, Applied Mathematics, Big data, Conceptual clustering, 02 engineering and technology, computer.software_genre, 01 natural sciences, Data type, Computer Science Applications, 010104 statistics & probability, Monotone polygon, Histogram, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Data mining, 0101 mathematics, Cluster analysis, business, computer, Quantile
Abstract: Symbolic data is aggregated from bigger traditional datasets in order to hide entry-specific details and to enable analysing large amounts of data, like big data, which would otherwise not be possible. Symbolic data may appear in many different but complex forms like intervals and histograms. Identifying patterns and finding similarities between objects is one of the most fundamental tasks of data mining. In order to accurately cluster these sophisticated data types, usual methods are not enough. Throughout the years different approaches have been proposed but they mainly concentrate on the “macroscopic” similarities between objects. Distributional data, for example symbolic data, has been aggregated from sets of large data and thus even the smallest microscopic differences and similarities become extremely important. In this paper a method is proposed for clustering distributional data based on these microscopic similarities by using quantile values. Having multiple points for comparison makes it possible to identify similarities in small sections of the distribution while producing more adequate hierarchical concepts. The proposed algorithm, called microscopic hierarchical conceptual clustering, has a monotone property and has been found to produce more adequate conceptual clusters during experimentation. Furthermore, thanks to the usage of quantiles, this algorithm allows us to compare different types of symbolic data easily without any additional complexity.
Published: 2020

3. AutoName: A Corpus-Based Set Naming Framework

Author: James Allan, Zhiqi Huang, Jingbo Shang, Puxuan Yu, and Razieh Rahimi
Subjects: Text corpus, Information seeking, business.industry, Computer science, Conceptual clustering, Context (language use), 02 engineering and technology, computer.software_genre, Set (abstract data type), Task (computing), 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Benchmark (computing), 020201 artificial intelligence & image processing, Language model, Artificial intelligence, business, computer, Natural language processing
Abstract: We propose AutoName, an unsupervised framework that extracts a name for a set of query entities from a large-scale text corpus. Entity-set naming is useful in many tasks related to natural language processing and information retrieval such as session-based and conversational information seeking. Previous studies mainly extract set names from knowledge bases which provide highly reliable entity relations, but suffer from limited coverage of entities and set names that represent broad semantic classes. To address these problems, AutoName generates hypernym-anchored candidate phrases via probing a pre-trained language model and the entities' context in documents. Phrases are then clustered to identify ones that describe common concepts among query entities. Finally, AutoName ranks refined phrases based on the co-occurrences of their words with query entities and the conceptual integrity of their respective clusters. We built a new benchmark dataset for this task, consisting of 130 entity sets with name labels. Experimental results show that AutoName generates coherent and meaningful set names and significantly outperforms all baselines.
Published: 2021

4. Unsupervised Feature Selection for Histogram-Valued Symbolic Data by Hierarchical Conceptual Clustering

Author: Manabu Ichino, Hiroyuki Yaguchi, and Kadri Umbleja
Subjects: ComputingMethodologies_PATTERNRECOGNITION, Compact space, Computer science, business.industry, Histogram, Conceptual clustering, Feature selection, Pattern recognition, Artificial intelligence, business, algebra_number_theory, Visualization
Abstract: This paper presents an unsupervised feature selection method for multi-dimensional histogram-valued data. We define a multi-role measure, called the compactness, based on the concept size of given objects and/or clusters described by a fixed number of equal probability bin-rectangles. In each step of clustering, we agglomerate objects and/or clusters so as to minimize the compactness for the generated cluster. This means that the compactness plays the role of a similarity measure between objects and/or clusters to be merged. To minimize the compactness is equivalent to maximize the dis-similarity of the generated cluster, i.e., concept, against the whole concept in each step. In this sense, the compactness plays the role of cluster quality. We also show that the average compactness of each feature with respect to objects and/or clusters in several clustering steps is useful as feature effectiveness criterion. Features having small average compactness are mutually covariate, and are able to detect geometrically thin structure embedded in the given multi-dimensional histogram-valued data. We obtain thorough understandings of the given data by the visualization using dendrograms and scatter diagrams with respect to the selected informative features. We illustrate the effectiveness of the proposed method by using an artificial data set and real histogram-valued data sets.
Published: 2021

5. The novel application of dynamic graphics to unsupervised learning, graphs, and cycle clustering

Author: Zachary Thomas Cox
Subjects: business.industry, Computer science, Conceptual clustering, Unsupervised learning, Artificial intelligence, Graphics, business, Cluster analysis, Machine learning, computer.software_genre, computer
Published: 2020

6. Fuzzy-Based Concept Learning Method: Exploiting Data With Fuzzy Conceptual Clustering

Author: Yunlong Mi, Wenqi Liu, Mengyu Yan, Jinhai Li, and Yong Shi
Subjects: business.industry, Computer science, Conceptual clustering, 02 engineering and technology, Similarity measure, Machine learning, computer.software_genre, Fuzzy logic, Computer Science Applications, Human-Computer Interaction, Control and Systems Engineering, 020204 information systems, Concept learning, Similarity (psychology), 0202 electrical engineering, electronic engineering, information engineering, Fuzzy concept, 020201 artificial intelligence & image processing, Artificial intelligence, Electrical and Electronic Engineering, business, Cluster analysis, computer, Software, MNIST database, Information Systems
Abstract: Concepts have been adopted in concept-cognitive learning (CCL) and conceptual clustering for concept classification and concept discovery. However, the standard CCL algorithms are incapable of tackling continuous data directly, and some standard conceptual clustering methods mainly focus on the attribute information, ignoring the object information that is also important to improve clustering analysis and concept classification ability. Therefore, in this article, we present a novel concept learning method, called the fuzzy-based concept learning model (FCLM), to address these two issues by exploiting concept hierarchical relations in concept lattices. More specifically, we first show some new related notions for FCLM based on a regular fuzzy formal decision context; among these notions, the object-oriented and attribute-oriented fuzzy concept similarities are used to achieve the concept similarity measure in concept lattices. Moreover, a novel fuzzy concept learning framework is designed, and its corresponding learning algorithms are developed. Finally, we conduct some experiments on various real-world datasets to demonstrate that the proposed method can achieve the state-of-the-art classification performance among similarity-based learning methods. In addition, we further verify the effectiveness of our method in concept discovery on the MNIST dataset.
Published: 2020

7. Feature Selection Algorithms for Classification and Clustering

Author: Arvind Kumar Tiwari
Subjects: 0303 health sciences, Fuzzy clustering, business.industry, Computer science, Correlation clustering, 030302 biochemistry & molecular biology, Conceptual clustering, Pattern recognition, Biclustering, 03 medical and health sciences, CURE data clustering algorithm, Canopy clustering algorithm, FLAME clustering, Artificial intelligence, business, Cluster analysis, 030304 developmental biology
Abstract: Feature selection is an important topic in data mining, especially for high-dimensional datasets. Feature selection is a process commonly used in machine learning, wherein subsets of the features available from the data are selected for application of a learning algorithm. The best subset contains the least number of dimensions that most contribute to accuracy. Feature selection methods can be decomposed into three main classes: filter methods, wrapper methods, and embedded methods. This chapter presents an empirical comparison of feature selection methods and their algorithms. In view of the substantial number of existing feature selection algorithms, the need arises for criteria that make it possible to decide which algorithm to use in a given situation. This chapter reviews several fundamental algorithms found in the literature and assesses their performance in a controlled scenario.
Published: 2020

8. Extreme Learning Machine for Joint Embedding and Clustering

Author: Zhiping Lin, Tianchi Liu, Chamara Kasun Liyanaarachchi Lekamalage, Guang-Bin Huang, and School of Electrical and Electronic Engineering
Subjects: 0209 industrial biotechnology, Fuzzy clustering, Cognitive Neuroscience, Correlation clustering, Conceptual clustering, 02 engineering and technology, Machine learning, computer.software_genre, 020901 industrial engineering & automation, Artificial Intelligence, CURE data clustering algorithm, Consensus clustering, 0202 electrical engineering, electronic engineering, information engineering, Cluster analysis, Mathematics, Feature Learning, business.industry, Constrained clustering, Computer Science Applications, Data stream clustering, Computer science and engineering [Engineering], 020201 artificial intelligence & image processing, Artificial intelligence, Data mining, business, computer, Embedding
Abstract: Clustering generic data, i.e., data not specific to a particular field, is a challenging problem due to their diverse complex structures in the original feature space. Traditional approaches address this problem by complementing clustering with feature learning methods, which either capture the intrinsic structure of the data or represent the data such that clusters are better revealed. In this paper, we propose an approach referred to as Extreme Learning Machine for Joint Embedding and Clustering (ELM-JEC), which incorporates desirable properties of both types of feature learning methods at the same time, specifically by (1) preserving the manifold structure of the data in the original space; (2) maximizing the class separability of the data in the embedded space. Since either type of method has improved clustering performance in some cases, our motivation is to integrate the two desirable properties to further improve the accuracy and robustness of clustering. Additional notable features of ELM-JEC are that it provides nonlinear feature mappings and achieves feature learning and clustering in the same formulation. The proposed approach can be implemented using alternating optimization, and its clustering performance compares favorably with several state-of-the-art methods on the real-world benchmark datasets. MOE (Min. of Education, S’pore)
Published: 2018

9. Semi supervised classification of scientific and technical literature based on semi supervised hierarchical description of improved latent dirichlet allocation (LDA)

Author: Yongjun Zhang, Zijian Wang, and Jialin Ma
Subjects: Conceptualization, Computer Networks and Communications, Computer science, business.industry, Conceptual clustering, 020206 networking & telecommunications, 02 engineering and technology, computer.software_genre, Technical literature, Latent Dirichlet allocation, symbols.namesake, Application domain, Test set, 0202 electrical engineering, electronic engineering, information engineering, Ontology, symbols, Graph (abstract data type), 020201 artificial intelligence & image processing, Artificial intelligence, business, computer, Software, Natural language processing
Abstract: Chinese text classification problem was studied based on domain ontology graph (DOG) of semi-supervised conceptual clustering to solve the problem that English word disambiguation method cannot be applied to Chinese text classification. Structure model of domain ontology graph, text classification algorithm in HowNet dictionary and KLSeeker ontology and so on were used to realize accurate classification of Chinese text and display effectiveness of algorithm. Chinese text classification model in domain ontology graph based on conceptual clustering was developed from the angle of decreasing human participation in ontology construction as much as possible in the paper. Aimed at application domain of Chinese web text, the algorithm can generate DOG of knowledge conceptualization automatically. At the same time, document ontology graph (DocOG) was defined to represent contents of individual text document. DocOG extracting target realized text classification based on ontology by matching of single document ontology and domain ontology. Finally, example calculation analysis and actual data test set experiment were given in experimental stage. The result shows that proposed Chinese text classification method has higher classification accuracy and reflects effectiveness of design.
Published: 2018

10. Clustervision: Visual Supervision of Unsupervised Clustering

Author: Ben Eysenbach, Christopher De Filippi, Walter F. Stewart, Adam Perer, Janu Verma, Kenney Ng, and Bum Chul Kwon
Subjects: Clustering high-dimensional data, Fuzzy clustering, Computer science, Correlation clustering, Conceptual clustering, 02 engineering and technology, Machine learning, computer.software_genre, CURE data clustering algorithm, Consensus clustering, 0202 electrical engineering, electronic engineering, information engineering, Cluster analysis, Brown clustering, business.industry, 020207 software engineering, Computer Graphics and Computer-Aided Design, Data stream clustering, Signal Processing, Unsupervised learning, 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Data mining, Artificial intelligence, business, computer, Software
Abstract: Clustering, the process of grouping together similar items into distinct partitions, is a common type of unsupervised machine learning that can be useful for summarizing and aggregating complex multi-dimensional data. However, data can be clustered in many ways, and there exists a large body of algorithms designed to reveal different patterns. While having access to a wide variety of algorithms is helpful, in practice, it is quite difficult for data scientists to choose and parameterize algorithms to get the clustering results relevant for their dataset and analytical tasks. To alleviate this problem, we built Clustervision, a visual analytics tool that helps ensure data scientists find the right clustering among the large number of techniques and parameters available. Our system clusters data using a variety of clustering techniques and parameters and then ranks clustering results utilizing five quality metrics. In addition, users can guide the system to produce more relevant results by providing task-relevant constraints on the data. Our visual user interface allows users to find high quality clustering results, explore the clusters using several coordinated visualization techniques, and select the cluster result that best suits their task. We demonstrate this novel approach using a case study with a team of researchers in the medical domain and showcase that our system empowers users to choose an effective representation of their complex data.
Published: 2018

11. Unsupervised clustering of service performance behaviors

Author: Hala S. Own and Hamdi Yahyaoui
Subjects: Clustering high-dimensional data, Multivariate statistics, Information Systems and Management, Fuzzy clustering, Computer science, Correlation clustering, Conceptual clustering, 02 engineering and technology, computer.software_genre, Machine learning, 01 natural sciences, Theoretical Computer Science, Biclustering, 010104 statistics & probability, Artificial Intelligence, CURE data clustering algorithm, 0202 electrical engineering, electronic engineering, information engineering, Entropy (information theory), 0101 mathematics, Cluster analysis, k-medians clustering, business.industry, Computer Science Applications, Data stream clustering, Control and Systems Engineering, Canopy clustering algorithm, 020201 artificial intelligence & image processing, Data mining, Artificial intelligence, business, computer, Software
Abstract: We propose in this paper a novel approach for unsupervised clustering of services’ behaviors. These behaviors are modeled as multivariate time series that capture the evaluation of several service quality attributes for a period of time. The importance weights of quality attributes are derived based on the Shannon’s entropy concept and the service data is flattened in a format that is convenient for clustering. The flattening process spans over a time oriented aggregation transformation, which leverages Haar reduction. The reduction is modeled as a maximization of an objective function. The absence of ground truth is tackled by performing a set of tests to determine the best number of clusters and clustering algorithms. Extensive experiments were conducted to validate the proposed unsupervised clustering approach.
Published: 2018
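
Note on entry 11: the abstract says only that attribute importance weights are "derived based on the Shannon's entropy concept" and does not give a formula. As a rough illustration, the sketch below uses the common entropy weight method, which is an assumption here and not necessarily the authors' derivation. The function name entropy_weights and the sample matrix are hypothetical.

```python
import math

def entropy_weights(matrix):
    """Entropy weight method (assumed, not taken from the paper).

    matrix: rows are observations, columns are quality attributes.
    All values must be positive and there must be at least two rows.
    Returns one weight per attribute; the weights sum to 1.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    divergences = []
    for j in range(n_cols):
        column = [row[j] for row in matrix]
        total = sum(column)
        # Turn the column into a probability distribution over observations.
        probs = [value / total for value in column]
        # Shannon entropy normalised to [0, 1] by dividing by ln(n_rows).
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n_rows)
        # An attribute that varies more (lower entropy) is more informative.
        divergences.append(1.0 - entropy)
    total_divergence = sum(divergences)
    if total_divergence == 0:
        # Every attribute is uniform: fall back to equal weights.
        return [1.0 / n_cols] * n_cols
    return [d / total_divergence for d in divergences]

# Hypothetical data: attribute 0 is constant, attribute 1 varies.
weights = entropy_weights([[1.0, 1.0], [1.0, 3.0]])
```

In this toy input the constant attribute receives a weight of approximately zero and the varying attribute receives nearly all of the weight.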

12. Learning Automata Clustering

Author: Alireza Rezvanian and Mohammad Hasanzadeh-Mofrad
Subjects: Computer Science::Machine Learning, Fuzzy clustering, General Computer Science, Learning automata, Computer science, business.industry, Correlation clustering, k-means clustering, Conceptual clustering, 020206 networking & telecommunications, 02 engineering and technology, Semi-supervised learning, Nonlinear Sciences::Cellular Automata and Lattice Gases, Theoretical Computer Science, ComputingMethodologies_PATTERNRECOGNITION, Modeling and Simulation, Consensus clustering, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, Cluster analysis, Computer Science::Formal Languages and Automata Theory
Abstract: Clustering of data points has been a profound research avenue in the history of machine learning algorithms. Using learning automata which are autonomous decision making entities, in this paper, the learning automata clustering algorithm is proposed. In learning automata clustering, each data point is affiliated with a learning automaton where the learning automaton determines the cluster membership of that data point. The cluster rectification is done through a reinforcement signal for each learning automaton which is fabricated from the Euclidean distance of that data point and the mean value of its designated cluster. Finally, the learning automata clustering is compared with four centroid-based clustering algorithms, K-means, K-means++, K-medians, and K-medoids and results demonstrate the high clustering accuracy and comparable Silhouette coefficient of the proposed method.
Published: 2018
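
Note on entry 12: the abstract describes one learning automaton per data point, rewarded according to the Euclidean distance between the point and the mean of its chosen cluster. The sketch below is one possible reading of that description, not the authors' algorithm. It assumes a standard linear reward-inaction update and rewards an automaton only when its chosen cluster has the nearest mean. The names la_clustering, rate, and iterations are hypothetical.

```python
import math
import random

def la_clustering(points, k, iterations=200, rate=0.1, seed=0):
    """Toy learning-automata clustering (assumed reward rule, see note above)."""
    rng = random.Random(seed)
    n = len(points)
    # Each automaton starts with a uniform action probability vector.
    probs = [[1.0 / k] * k for _ in range(n)]
    for _ in range(iterations):
        # Every automaton samples a cluster according to its probabilities.
        labels = [rng.choices(range(k), weights=p)[0] for p in probs]
        # Compute the mean of each non-empty cluster.
        means = []
        for c in range(k):
            members = [points[i] for i in range(n) if labels[i] == c]
            if members:
                means.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                means.append(None)
        for i in range(n):
            dists = [math.dist(points[i], m) if m is not None else math.inf
                     for m in means]
            chosen = labels[i]
            # Reward only if the chosen cluster's mean is the closest one.
            if dists[chosen] <= min(dists):
                probs[i] = [p + rate * (1.0 - p) if c == chosen else p * (1.0 - rate)
                            for c, p in enumerate(probs[i])]
            # Linear reward-inaction: no update on a penalty.
    final = [max(range(k), key=lambda c: probs[i][c]) for i in range(n)]
    return final, probs

# Hypothetical data: two small groups of 2-D points.
data = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9)]
labels, probabilities = la_clustering(data, k=2)
```

The reward update keeps each probability vector summing to 1. Convergence to a good partition is not guaranteed by this sketch.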

13. Information Clustering Using Manifold-Based Optimization of the Bag-of-Features Representation

Author: Nikolaos Passalis and Anastasios Tefas
Subjects: DBSCAN, Clustering high-dimensional data, Fuzzy clustering, Computer science, Correlation clustering, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Conceptual clustering, 02 engineering and technology, computer.software_genre, Biclustering, Text mining, CURE data clustering algorithm, Consensus clustering, 0202 electrical engineering, electronic engineering, information engineering, Electrical and Electronic Engineering, Cluster analysis, Brown clustering, K-SVD, business.industry, 020206 networking & telecommunications, Pattern recognition, Spectral clustering, Manifold, Computer Science Applications, Human-Computer Interaction, ComputingMethodologies_PATTERNRECOGNITION, Data stream clustering, Control and Systems Engineering, Computer Science::Computer Vision and Pattern Recognition, Canopy clustering algorithm, FLAME clustering, 020201 artificial intelligence & image processing, Artificial intelligence, Data mining, business, computer, Software, Information Systems
Abstract: In this paper, a manifold-based dictionary learning method for the bag-of-features (BoF) representation optimized toward information clustering is proposed. First, the spectral representation, which unwraps the manifolds of the data and provides better clustering solutions, is formed. Then, a new dictionary is learned in order to make the histogram space, i.e., the space where the BoF histograms exist, as similar as possible to the spectral space. The ability of the proposed method to improve the clustering solutions is demonstrated using a wide range of datasets: two image datasets, the 15-scene dataset and the Corel image dataset, one video dataset, the KTH dataset, and one text dataset, the RT-2k dataset. The proposed method improves both the internal and the external clustering criteria for two different clustering algorithms: 1) the k-means and 2) the spectral clustering. Also, the optimized histogram space can be used to directly assign a new object to its cluster, instead of using the spectral space (which requires reapplying the spectral clustering algorithm or using incremental spectral clustering techniques). Finally, the learned representation is also evaluated using an information retrieval setup and it is demonstrated that it improves the retrieval precision over the baseline BoF representation.
Published: 2018

14. Improving the Fuzzy Min–Max neural network performance with an ensemble of clustering trees

Author: Manjeevan Seera, Kuldeep Randhawa, and Chee Peng Lim
Subjects: Clustering high-dimensional data, 0209 industrial biotechnology, Fuzzy clustering, Computer science, Cognitive Neuroscience, Correlation clustering, MathematicsofComputing_NUMERICALANALYSIS, Decision tree, Conceptual clustering, 02 engineering and technology, Machine learning, computer.software_genre, Fuzzy logic, 020901 industrial engineering & automation, Artificial Intelligence, CURE data clustering algorithm, Consensus clustering, 0202 electrical engineering, electronic engineering, information engineering, Cluster analysis, k-medians clustering, Artificial neural network, business.industry, Computer Science Applications, Data set, Data stream clustering, Canopy clustering algorithm, FLAME clustering, 020201 artificial intelligence & image processing, Artificial intelligence, Data mining, business, computer
Abstract: In this paper, an ensemble of clustering trees (ECTs) is adopted to improve the performance of the Fuzzy Min–Max (FMM) network with individual clustering trees. The key advantage of combining FMM and ECT together is to formulate an accurate and useful learning model that is able to perform online clustering and to explain its predictions. The online clustering capability is inherited from the FMM hyperboxes, while the explanatory capability arises from the underlying decision trees of ECT. Four different mean measures, namely harmonic, geometric, arithmetic, and root mean square, are incorporated into FMM for computing its hyperbox centroids. A series of benchmark and real-world data sets are used for evaluating the FMM-ECT performance. The results are analyzed and compared with those from other models. The outcomes indicate that FMM-ECT is able to achieve comparable clustering performances, with the advantage of providing explanations of its predictions using a decision tree.
Published: 2018
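
Note on entry 14: the abstract names four mean measures (harmonic, geometric, arithmetic, and root mean square) used for computing hyperbox centroids. The sketch below shows only the four means themselves for a list of positive values. How they are applied inside FMM is not stated in the abstract, so the per-coordinate use suggested in the comments is an assumption, and the function name four_means is hypothetical.

```python
import math

def four_means(values):
    """Return the four mean measures for a list of positive numbers.

    One possible use (an assumption): call this once per dimension on the
    coordinates of the points inside a hyperbox to get candidate centroids.
    """
    n = len(values)
    arithmetic = sum(values) / n
    # Geometric mean via logarithms to avoid overflow in long products.
    geometric = math.exp(sum(math.log(v) for v in values) / n)
    harmonic = n / sum(1.0 / v for v in values)
    root_mean_square = math.sqrt(sum(v * v for v in values) / n)
    return {
        "harmonic": harmonic,
        "geometric": geometric,
        "arithmetic": arithmetic,
        "root_mean_square": root_mean_square,
    }

# Hypothetical coordinates along one dimension of a hyperbox.
means = four_means([1.0, 2.0, 4.0])
```

For positive inputs the four values are always ordered harmonic <= geometric <= arithmetic <= root mean square, so the choice of mean shifts the centroid toward smaller or larger coordinates.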

15. A SOM prototype-based cluster analysis methodology

Author: Jorge Calle-Espinosa, Soledad Delgado, Francisco Montero, Clara Higuera, and Federico Morán
Subjects: Self-organizing map, Clustering high-dimensional data, Fuzzy clustering, Computer science, Correlation clustering, Conceptual clustering, 02 engineering and technology, Machine learning, computer.software_genre, 03 medical and health sciences, 0302 clinical medicine, Artificial Intelligence, CURE data clustering algorithm, Consensus clustering, 0202 electrical engineering, electronic engineering, information engineering, Cluster analysis, Brown clustering, business.industry, General Engineering, Constrained clustering, Computer Science Applications, Data set, ComputingMethodologies_PATTERNRECOGNITION, Data stream clustering, Canopy clustering algorithm, FLAME clustering, 020201 artificial intelligence & image processing, Artificial intelligence, Data mining, business, computer, 030217 neurology & neurosurgery
Abstract: An original computational approach for cluster analysis is proposed. The method consists of two phases, which are based on Self-Organizing Map. Topology-preserving and connectivity functions are used in the clustering process. The method is proved using three benchmark datasets and a real biological dataset. Automation in parameterization results in a user-friendly methodology. Data clustering is aimed at finding groups of data that share common hidden properties. These kinds of techniques are especially critical at early stages of data analysis where no information about the dataset is available. One of the major shortcomings of the clustering algorithms is the difficulty for non-expert users to configure them and, in some cases, interpret the results. In this work a computational approach with a two-layer structure based on Self-Organizing Map (SOM) is presented for cluster analysis. In the first level, a quantization of the data samples using topology-preserving metrics to automatically determine the number of units in the SOM is proposed. In the second level the obtained SOM prototypes are clustered by means of a connectivity analysis to explore the quality of the partitioning with different number of clusters. The most important benefit of this two-layer procedure is that computational load decreases considerably in comparison with data based clustering methods, making it possible to cluster large data sets and to consider several different clustering alternatives in a limited time. This methodology produces a two-dimensional map representation of the, usually, high dimensional input space, along with quantitative information on viable clustering alternatives, which facilitates the exploration of the possible partitions in a dataset. The efficiency and interpretation of the methodology is illustrated by its application to artificial, benchmark and real complex biological datasets. The experimental results demonstrate the ability of the method to identify possible segmentations in a dataset, compared to algorithms that only yield a single clustering solution. The proposed algorithm tackles the intrinsic limitations of SOM and the parameter settings associated with the clustering methodology, without requiring the number of clusters or the SOM architecture as a prerequisite, among others. This makes its application possible even for researchers with limited expertise in machine learning.
Published: 2017

16. Dynamic Rough-Fuzzy Support Vector Clustering

Author: Ramiro Saltos, Sebastián Maldonado, and Richard Weber
Subjects: Fuzzy clustering, business.industry, Computer science, Applied Mathematics, Correlation clustering, Conceptual clustering, Constrained clustering, 020207 software engineering, 02 engineering and technology, Machine learning, computer.software_genre, Data stream clustering, Computational Theory and Mathematics, Artificial Intelligence, Control and Systems Engineering, CURE data clustering algorithm, 0202 electrical engineering, electronic engineering, information engineering, Canopy clustering algorithm, 020201 artificial intelligence & image processing, Artificial intelligence, Data mining, business, Cluster analysis, computer
Abstract: Clustering is one of the main data mining tasks with many proven techniques and successful real-world applications. However, in changing environments, the existing systems need to be regularly updated in order to describe in the best possible way an observed phenomenon at each point in time. Since changes lead to uncertainty, the respective systems also require an adequate modeling of the involved kinds of uncertainty. This paper presents a novel method for dynamic clustering called dynamic rough-fuzzy support vector clustering (D-RFSVC). Its main idea is to take advantage of the knowledge acquired in previous cycles to speed up model updating while tracking the structural changes that clusters can experience over time. The core method of the proposed approach is the well-known support vector clustering algorithm, which can be used for large datasets employing powerful optimization techniques. The computational experiments, together with a conceptual and numerical comparative study, highlight the potential D-RFSVC has in dynamic environments.
Published: 2017

17. Support Vector Motion Clustering

Author: Davide Anguita, Andrea Cavallaro, Isah A. Lawal, and Fabio Poiesi
Subjects: Clustering high-dimensional data, Fuzzy clustering, business.industry, Correlation clustering, Conceptual clustering, 020207 software engineering, Pattern recognition, 02 engineering and technology, ComputingMethodologies_PATTERNRECOGNITION, CURE data clustering algorithm, 0202 electrical engineering, electronic engineering, information engineering, Media Technology, Canopy clustering algorithm, FLAME clustering, 020201 artificial intelligence & image processing, Artificial intelligence, Electrical and Electronic Engineering, Cluster analysis, business, Mathematics
Abstract: We present a closed-loop unsupervised clustering method for motion vectors extracted from highly dynamic video scenes. Motion vectors are assigned to nonconvex homogeneous clusters characterizing direction, size and shape of regions with multiple independent activities. The proposed method is based on support vector clustering. Cluster labels are propagated over time via incremental learning. The proposed method uses a kernel function that maps the input motion vectors into a high-dimensional space to produce nonconvex clusters. We improve the mapping effectiveness by quantifying feature similarities via a blend of position and orientation affinities. We use the Quasiconformal Kernel Transformation to boost the discrimination of outliers. The temporal propagation of the clusters’ identities is achieved via incremental learning based on the concept of feature obsolescence to deal with appearing and disappearing features. Moreover, we design an online clustering performance prediction algorithm used as a feedback that refines the cluster model at each frame in an unsupervised manner. We evaluate the proposed method on synthetic data sets and real-world crowded videos and show that our solution outperforms state-of-the-art approaches.
Published: 2017

18. Automatic clustering constraints derivation from object-oriented software using weighted complex network with graph theory analysis

Author: Chun Yong Chong and Sai Peck Lee
Subjects: Computer science, Conceptual clustering, 02 engineering and technology, computer.software_genre, Machine learning, Software, Software sizing, 0202 electrical engineering, electronic engineering, information engineering, Domain analysis, Software system, Cluster analysis, business.industry, Constrained clustering, 020207 software engineering, Software metric, ComputingMethodologies_PATTERNRECOGNITION, Hardware and Architecture, Software design, Domain engineering, 020201 artificial intelligence & image processing, Data mining, Artificial intelligence, business, computer, Information Systems
Abstract: Constrained clustering or semi-supervised clustering has received a lot of attention due to its flexibility of incorporating minimal supervision of domain experts or side information to help improve clustering results of classic unsupervised clustering techniques. In the domain of software remodularisation, classic unsupervised software clustering techniques have proven to be useful to aid in recovering a high-level abstraction of the software design of poorly documented or designed software systems. However, there is a lack of work that integrates constrained clustering for the same purpose to help improve the modularity of software systems. Moreover, due to time and budget constraints, it is laborious and unrealistic for domain experts who have prior knowledge about the software to review each and every software artifact and provide supervision on an on-demand basis. We aim to fill this research gap by proposing an automated approach to derive clustering constraints from the implicit structure of a software system based on graph theory analysis of the analysed software. Evaluations conducted on 40 open-source object-oriented software systems show that the proposed approach can serve as an alternative solution to derive clustering constraints in situations where domain experts are non-existent, thus helping to improve the overall accuracy of clustering results.
Published: 2017

19. Electricity clustering framework for automatic classification of customer loads

Author: Antonio García, Juan I. Guerrero, Carlos León, Iñigo Monedero, and Félix Biscarri
Subjects: Fuzzy clustering, Computer science, business.industry, 020209 energy, Correlation clustering, General Engineering, Conceptual clustering, 02 engineering and technology, computer.software_genre, Machine learning, Computer Science Applications, ComputingMethodologies_PATTERNRECOGNITION, Data stream clustering, Artificial Intelligence, CURE data clustering algorithm, Consensus clustering, 0202 electrical engineering, electronic engineering, information engineering, Canopy clustering algorithm, Artificial intelligence, Data mining, business, Cluster analysis, computer
Abstract: Clustering in energy markets is a topic of high significance for expert and intelligent systems. The main impact of this paper is the proposal of a new clustering framework for the automatic classification of electricity customers’ loads. An automatic selection of the clustering classification algorithm is also highlighted. Finally, new customers can be assigned to a predefined set of clusters in the classification phase. The computation time of the proposed framework is less than that of previous classification techniques, which enables the processing of a complete electric company sample in a matter of minutes on a personal computer. The high accuracy of the predicted classification results verifies the performance of the clustering technique. This classification phase is of significant assistance in interpreting the results, and the simplicity of the clustering phase is sufficient to demonstrate the quality of the complete mining framework.
Published: 2017

20. Active learning through density clustering

Author: Yan-Xue Wu, Min Wang, Zhi-Heng Zhang, and Fan Min
Subjects: Clustering high-dimensional data, 0209 industrial biotechnology, Fuzzy clustering, Computer science, Correlation clustering, Single-linkage clustering, Conceptual clustering, 02 engineering and technology, Semi-supervised learning, Machine learning, computer.software_genre, 020901 industrial engineering & automation, Artificial Intelligence, CURE data clustering algorithm, Consensus clustering, 0202 electrical engineering, electronic engineering, information engineering, Instance-based learning, Instance selection, Cluster analysis, k-medians clustering, business.industry, General Engineering, Constrained clustering, Pattern recognition, Computer Science Applications, ComputingMethodologies_PATTERNRECOGNITION, Data stream clustering, Canopy clustering algorithm, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer
Abstract: Active learning is used for classification when labeling data is costly, and the main challenge is to identify the critical instances that should be labeled. Clustering-based approaches take advantage of the structure of the data to select representative instances. In this paper, we developed the active learning through density peak clustering (ALEC) algorithm with three new features. First, a master tree was built to express the relationships among the nodes and assist the growth of the cluster tree. Second, a deterministic instance selection strategy was designed using a new importance measure. Third, tri-partitioning was employed to determine the action to be taken on each instance during iterative clustering, labeling, and classifying. Experiments were performed with 14 datasets to compare against state-of-the-art active learning algorithms. Results demonstrated that the new algorithm had higher classification accuracy using the same number of labeled data.
Published: 2017
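
Illustrative note for record 20: the abstract says ALEC builds on density peak clustering and a "master tree". The sketch below only computes the standard density-peaks quantities (a local density, and a link from each point to its nearest higher-density point). Reading those links as the master tree is an assumption, and the names `density_peaks` and `cutoff` are hypothetical. It is not the authors' implementation and omits instance selection and tri-partitioning.

```python
import math


def density_peaks(points, cutoff):
    """Return (rho, parent, delta) for each point.

    rho[i]    : number of other points closer than `cutoff` (local density)
    parent[i] : index of the nearest point ranked as denser (-1 for the top point)
    delta[i]  : distance to that parent (largest distance for the top point)
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff)
           for i in range(n)]
    # Rank points by decreasing density; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    parent = [-1] * n
    delta = [0.0] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])
            continue
        # Nearest neighbour among the points ranked before this one.
        j = min(order[:rank], key=lambda k: dist[i][k])
        parent[i] = j
        delta[i] = dist[i][j]
    return rho, parent, delta


points = [(0, 0), (0, 1), (1, 0), (0.4, 0.4),
          (10, 10), (10, 11), (11, 10), (10.4, 10.4)]
rho, parent, delta = density_peaks(points, cutoff=1.2)
```

Points with both high `rho` and high `delta` are the usual candidates for cluster centres. In this toy data, point 4 links across to the first group at a large distance, which marks it as the centre of the second group.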

21. Subspace multi-clustering: a review

Author: Jian Pei and Juhua Hu
Subjects: Clustering high-dimensional data, Fuzzy clustering, Computer science, Correlation clustering, Conceptual clustering, 02 engineering and technology, Machine learning, computer.software_genre, Artificial Intelligence, CURE data clustering algorithm, 020204 information systems, Consensus clustering, 0202 electrical engineering, electronic engineering, information engineering, Cluster analysis, business.industry, Constrained clustering, Human-Computer Interaction, ComputingMethodologies_PATTERNRECOGNITION, Hardware and Architecture, 020201 artificial intelligence & image processing, Data mining, Artificial intelligence, business, computer, Software, Information Systems
Abstract: Clustering has been widely used to identify possible structures in data and help users to understand data in an unsupervised manner. Traditional clustering methods often provide a single partitioning of the data that groups similar data objects in one group while separating dissimilar ones into different groups. However, it has been well recognized that assuming only a single clustering for a data set can be too strict and cannot capture the diversity in applications. Multiple clustering structures can be hidden in the data, and each represents a unique perspective of the data. Different multi-clustering methods, which aim to discover multiple independent structures hidden in data, have been proposed in the past decade. Although multi-clustering methods may provide more information for users, it is still challenging for users to efficiently and effectively understand each clustering structure. Subspace multi-clustering methods address this challenge by providing each clustering a feature subspace. Moreover, most subspace multi-clustering methods are especially scalable for high-dimensional data, which has become more and more popular in real applications due to the advances of big data technologies. In this paper, we focus on the subject of subspace multi-clustering, which has not been reviewed by any previous survey. We formulate the subspace multi-clustering problem and categorize the methodologies in different perspectives (e.g., de-coupled methods and coupled methods). We compare different methods on a series of specific properties (e.g., input parameters and different kinds of subspaces) and analyze the advantages and disadvantages. We also discuss several interesting and meaningful future directions.
Published: 2017

22. Improving medication adherence in hypertensive patients: A scoping review

Author: André Ramalho, Rute Sampaio, Filipa Ferreira, Simão Pinho, and Mariana Cruz
Subjects: medicine.medical_specialty, Databases, Factual, Web of science, Epidemiology, business.industry, 010102 general mathematics, Public Health, Environmental and Occupational Health, Psychological intervention, Conceptual clustering, MEDLINE, Medication adherence, Context (language use), 01 natural sciences, Medication Adherence, 03 medical and health sciences, 0302 clinical medicine, Categorization, Intervention (counseling), Hypertension, medicine, Humans, 030212 general & internal medicine, 0101 mathematics, Intensive care medicine, business
Abstract: In recent years, interest in medication adherence has greatly increased. Adherence has been particularly well studied in the context of arterial hypertension treatment. Numerous interventions have addressed this issue; however, efforts to improve adherence have often been frustrating and frequently disorganized. The aim of the present study was to perform a scoping review of medication adherence interventions in hypertensive patients, so that a clear overview was achieved. Moreover, an evidence-based categorization of interventions was developed. The review was performed according to the PRISMA-ScR statement. MEDLINE and Web of Science were searched, and studies published from database inception until August 17, 2020 were included. A total of 2994 non-duplicate studies were retrieved. After screening and eligibility phases, a total of 45 articles were included. Studies were analyzed regarding their design, participant characteristics and management of adherence strategies employed. Furthermore, medication adherence and blood pressure outcomes, as well as adherence measuring tools were evaluated. Each study's intervention was then categorized using a novel evidence-based system of categorization, derived from the conceptual clustering framework used in machine learning. This work is an important step in pushing for better informed and more efficient future research efforts, both by providing an overview of the research field and by creating a new, evidence-based intervention categorization tool. It also provides valuable information to clinicians about medication adherence to antihypertensive therapy.
Published: 2021

23. Constructive and Clustering Methods

Author: Sayyid Samir Al-Busaidi, Medhat Awadalla, and Afaq Ahmad
Subjects: business.industry, Computer science, Conceptual clustering, Pattern recognition, Artificial intelligence, Cluster analysis, business, Gray (horse), Encoder
Published: 2017

24. Approach to Clustering with Variance-Based XCS

Author: Takato Tatsumi, Keiki Takadama, Masaya Nakata, and Caili Zhang
Subjects: TheoryofComputation_COMPUTATIONBYABSTRACTDEVICES, Computer science, Conceptual clustering, Multi-task learning, 02 engineering and technology, Semi-supervised learning, Machine learning, computer.software_genre, ComputingMethodologies_ARTIFICIALINTELLIGENCE, 050105 experimental psychology, Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, 0501 psychology and cognitive sciences, Cluster analysis, Learning classifier system, business.industry, 05 social sciences, Variance (accounting), Human-Computer Interaction, ComputingMethodologies_PATTERNRECOGNITION, Margin classifier, Unsupervised learning, 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Artificial intelligence, business, computer
Abstract: This paper presents an approach to clustering that extends the variance-based Learning Classifier System (XCS-VR). In real-world problems, the ability to combine similar rules is crucial in the knowledge discovery and data mining field. Conventionally, XCS-VR is able to acquire generalized rules, but it cannot further acquire more generalized rules from these rules. The proposed approach (called XCS-VRc) accomplishes this by integrating similar generalized rules. To validate the proposed approach, we designed a benchmark problem to examine whether XCS-VRc can cluster both the generalized and more generalized features in the input data. The proposed XCS-VRc proved to be more efficient than XCS and the conventional XCS-VR.
Published: 2017

25. Discrete Nonnegative Spectral Clustering

Author: Heng Tao Shen, Yang Yang, Xuelong Li, Fumin Shen, and Zi Huang
Subjects: Clustering high-dimensional data, Fuzzy clustering, Optimization problem, Computer science, Correlation clustering, Conceptual clustering, 02 engineering and technology, Matrix decomposition, Discriminative model, CURE data clustering algorithm, Robustness (computer science), 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Cluster analysis, Brown clustering, business.industry, Constrained clustering, Pattern recognition, Spectral clustering, Computer Science Applications, Data set, Data stream clustering, Computational Theory and Mathematics, Canopy clustering algorithm, 020201 artificial intelligence & image processing, Artificial intelligence, business, Information Systems
Abstract: Spectral clustering has been playing a vital role in various research areas. Most traditional spectral clustering algorithms comprise two independent stages (e.g., first learning continuous labels and then rounding the learned labels into discrete ones), which may cause unpredictable deviation of resultant cluster labels from genuine ones, thereby leading to severe information loss and performance degradation. In this work, we study how to achieve discrete clustering as well as reliably generalize to unseen data. We propose a novel spectral clustering scheme which deeply explores cluster label properties, including discreteness, nonnegativity, and discrimination, as well as learns robust out-of-sample prediction functions. Specifically, we explicitly enforce a discrete transformation on the intermediate continuous labels, which leads to a tractable optimization problem with a discrete solution. Besides, we preserve the natural nonnegative characteristic of the clustering labels to enhance the interpretability of the results. Moreover, to further compensate for the unreliability of the learned clustering labels, we integrate an adaptive robust module with ℓ2,p loss to learn a prediction function for grouping unseen data. We also show that the out-of-sample component can inject discriminative knowledge into the learning of cluster labels under certain conditions. Extensive experiments conducted on various data sets have demonstrated the superiority of our proposal as compared to several existing clustering approaches.
Published: 2017

26. Multi-task clustering through instances transfer

Author: Xinyue Liu, Han Liu, Xiaotong Zhang, and Xianchao Zhang
Subjects: DBSCAN, Clustering high-dimensional data, Fuzzy clustering, Computer science, Cognitive Neuroscience, Correlation clustering, Conceptual clustering, Multi-task learning, 02 engineering and technology, 010501 environmental sciences, computer.software_genre, Machine learning, 01 natural sciences, Biclustering, Text mining, Artificial Intelligence, CURE data clustering algorithm, Consensus clustering, 0202 electrical engineering, electronic engineering, information engineering, Cluster analysis, 0105 earth and related environmental sciences, Brown clustering, business.industry, Constrained clustering, Computer Science Applications, Hierarchical clustering, Data set, ComputingMethodologies_PATTERNRECOGNITION, Data stream clustering, Canopy clustering algorithm, Affinity propagation, FLAME clustering, 020201 artificial intelligence & image processing, Data mining, Artificial intelligence, business, computer, Subspace topology
Abstract: We propose a multi-task clustering method by transferring knowledge of instances. The sample distance in different tasks is reweighted by learning a shared subspace. Related samples from other tasks are reused as auxiliary data to aid clustering. Our method maintains the label marginal distribution of each individual task. Better performance is observed compared with other multi-task clustering methods. Clustering is an essential issue in machine learning and data mining. As there are many related tasks in the real world, multi-task clustering, which improves the clustering performance of each task by transferring knowledge across the related tasks, has received increasing attention recently. Generally, knowledge transfer can be accomplished in different ways. Nevertheless, besides transferring knowledge of feature representations, other ways of transferring knowledge have seldom been adopted for multi-task clustering. In this paper, we propose a general multi-task clustering algorithm by transferring knowledge of instances. Our algorithm reweights the distance between samples in different tasks by learning a shared subspace, then selects the nearest neighbors for each sample from the other tasks in the learned shared subspace as the auxiliary data to aid the clustering process of each individual task. Experiments on real data sets in text mining and image mining demonstrate that our proposed algorithm outperforms the traditional single-task clustering methods and existing cross-domain multi-task clustering methods.
Published: 2017

27. A mutual information based online evolving clustering approach and its applications

Author: Dimitar Petrov Filev, Ratna Babu Chinnam, and Fling Tseng
Subjects: Clustering high-dimensional data, Control and Optimization, Fuzzy clustering, Computer science, business.industry, Correlation clustering, Conceptual clustering, 02 engineering and technology, Machine learning, computer.software_genre, Computer Science Applications, Data stream clustering, Control and Systems Engineering, CURE data clustering algorithm, 020204 information systems, Modeling and Simulation, 0202 electrical engineering, electronic engineering, information engineering, Canopy clustering algorithm, 020201 artificial intelligence & image processing, Artificial intelligence, Data mining, Cluster analysis, business, computer
Abstract: In this article, a new recursive evolving clustering method is proposed based on the well-known Gustafson–Kessel algorithm. The novelty of the proposed method involves the adaptation and integration of the mutual information based formulation to accommodate the Mahalanobis distance, which functions as the similarity measure and the unification of the clustering generation and pruning mechanisms. Example applications of the method are also discussed in the areas of data compression and knowledge extraction.
Published: 2017
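
Illustrative note for record 27: the abstract names the Mahalanobis distance as the similarity measure. As a minimal sketch of that measure only, the function below evaluates d(x) = sqrt((x - m)^T S^-1 (x - m)) for a 2-D point, given a cluster mean m and covariance S. The function name and the 2-D restriction are assumptions made for brevity, and this is not the recursive evolving method described in the abstract.

```python
import math


def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance of a 2-D point from a cluster (mean, cov)."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Closed-form inverse of a 2x2 matrix.
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = (x[0] - mean[0], x[1] - mean[1])
    # Quadratic form dx^T * inv * dx.
    q = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
         + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return math.sqrt(q)


# A cluster stretched along the first axis: the same Euclidean offset counts
# for less along the direction of larger variance.
stretched = ((4.0, 0.0), (0.0, 1.0))
along = mahalanobis_2d((2.0, 0.0), (0.0, 0.0), stretched)
across = mahalanobis_2d((0.0, 2.0), (0.0, 0.0), stretched)
```

With an identity covariance the measure reduces to the Euclidean distance. A non-spherical covariance lets a cluster take an elongated shape.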

28. Reverse clustering: an outline for a concept and its use

Author: Jan W. Owsiński, Sławomir Zadrożny, Jarosław Stańczak, Karol Opara, and Janusz Kacprzyk
Subjects: Fuzzy clustering, business.industry, Computer science, Health, Toxicology and Mutagenesis, Correlation clustering, Constrained clustering, Conceptual clustering, Evolutionary algorithm, 02 engineering and technology, Machine learning, computer.software_genre, 01 natural sciences, Pollution, Rendering (computer graphics), 010104 statistics & probability, Consensus clustering, Statistics, 0202 electrical engineering, electronic engineering, information engineering, Environmental Chemistry, 020201 artificial intelligence & image processing, Artificial intelligence, 0101 mathematics, business, Cluster analysis, computer
Abstract: In this study, a new perspective on the application of the clustering approach is proposed. The perspective aims to identify the values of the parameters of clustering, including the choice of the algorithm itself, which lead to a rendering, as faithful as possible, of a partition of data that is known a priori. Motivation and possible interpretations are discussed which can be associated with such a reverse identification process. The essential motivation is associated with, but not limited to, the primary objective of cluster analysis, i.e. gaining insight into the structure of the given data-set or family of data-sets. We propose to use evolutionary strategies for reverse analysis to be carried out in view of the characteristics of the problem considered. The concept and the feasibility of the proposed computational approach are illustrated by the analysis of an exemplary data-set. The preliminary results obtained are promising in both technical and cognitive terms.
Published: 2017

29. A novel clustering algorithm based on data transformation approaches

Author: Hedieh Sajedi, Rasool Azimi, M. Ghayekhloo, and M. Ghofrani
Subjects: DBSCAN, Clustering high-dimensional data, Fuzzy clustering, Computer science, Correlation clustering, Conceptual clustering, Initialization, 02 engineering and technology, computer.software_genre, Biclustering, Artificial Intelligence, CURE data clustering algorithm, Consensus clustering, 0202 electrical engineering, electronic engineering, information engineering, Cluster analysis, k-medians clustering, Brown clustering, Artificial neural network, business.industry, General Engineering, Constrained clustering, 020206 networking & telecommunications, Pattern recognition, Computer Science Applications, Determining the number of clusters in a data set, Data set, Data stream clustering, Principal component analysis, Canopy clustering algorithm, Affinity propagation, FLAME clustering, 020201 artificial intelligence & image processing, Artificial intelligence, Data mining, business, computer
Abstract: A new initialization technique is proposed to improve the performance of K-means. A data transformation approach is proposed to solve the empty cluster problem. An efficient method is proposed to estimate the optimal number of clusters. The proposed clustering method provides more accurate clustering results. Clustering provides a knowledge acquisition method for intelligent systems. This paper proposes a novel data-clustering algorithm, by combining a new initialization technique, K-means algorithm and a new gradual data transformation approach to provide more accurate clustering results than the K-means algorithm and its variants by increasing the cluster coherence. The proposed data transformation approach solves the problem of generating empty clusters, which frequently occurs for other clustering algorithms. An efficient method based on the principal component transformation and a modified silhouette algorithm is also proposed in this paper to determine the number of clusters. Several different data sets are used to evaluate the efficacy of the proposed method to deal with the empty cluster generation problem and its accuracy and computational performance in comparison with other K-means based initialization techniques and clustering methods. The developed estimation method for determining the number of clusters is also evaluated and compared with other estimation algorithms. Significances of the proposed method include addressing the limitations of the K-means based clustering and improving the accuracy of clustering as an important method in the field of data mining and expert systems. Application of the proposed method for the knowledge acquisition in time series data such as wind, solar, electric load and stock market provides a pre-processing tool to select the most appropriate data to feed in neural networks or other estimators in use for forecasting such time series. In addition, utilization of the knowledge discovered by the proposed K-means clustering to develop rule based expert systems is one of the main impacts of the proposed method.
Published: 2017
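
Illustrative note for record 29: the abstract mentions a modified silhouette algorithm for estimating the number of clusters but gives no details of the modification. The sketch below therefore uses the standard mean silhouette coefficient as a stand-in, with a hypothetical helper name `mean_silhouette`. It scores a candidate labeling, and comparing the scores of labelings with different cluster counts is one common way to choose among them.

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette coefficient of a labeling (standard definition)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            continue  # a singleton cluster contributes a silhouette of 0
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = mean_silhouette(points, [0, 0, 1, 1])  # groups the nearby points
bad = mean_silhouette(points, [0, 1, 0, 1])   # pairs up distant points
```

A value near 1 indicates compact, well-separated clusters. A negative value indicates that points sit closer to another cluster than to their own.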

30. Predicting Student Performance Based On Clustering And Classification

Author: Purva Naik, Rubana Shaikh, Odelia Diukar, Saylee Dessai, and Prof. Snehal Bhogan [Project Guide]
Subjects: Computer science, business.industry, 05 social sciences, Conceptual clustering, 050301 education, Machine learning, computer.software_genre, 0502 economics and business, 050211 marketing, Artificial intelligence, Cluster analysis, business, 0503 education, computer
Published: 2017

31. MIFuzzy clustering for incomplete longitudinal data in smart health

Author: Hua Fang
Subjects: Soft computing, Engineering, 030505 public health, Fuzzy clustering, business.industry, Conceptual clustering, Medicine (miscellaneous), Health Informatics, Missing data, computer.software_genre, Article, Computer Science Applications, 03 medical and health sciences, 0302 clinical medicine, Health Information Management, CURE data clustering algorithm, Unsupervised learning, Observational study, 030212 general & internal medicine, Data mining, 0305 other medical science, business, Cluster analysis, computer, Information Systems
Abstract: Missing data are common in longitudinal observational and randomized controlled trials in smart health studies. Multiple-imputation based fuzzy clustering is an emerging non-parametric soft computing method, used for either semi-supervised or unsupervised learning. Multiple imputation (MI) has been widely used in missing data analyses, but has not yet been scrutinized for unsupervised learning methods, although they are important for explaining the heterogeneity of treatment effects. Built upon our previous work on MIFuzzy clustering, this paper introduces the MIFuzzy concepts and performance, and theoretically, empirically and numerically demonstrates how the MI-based approach can reduce the uncertainty of clustering accuracy in comparison to non- and single-imputation based clustering approaches. This paper advances our understanding of the utility and strength of the MIFuzzy clustering approach to processing incomplete longitudinal behavioral intervention data.
Published: 2017

32. Subspace clustering guided unsupervised feature selection

Author: Qinghua Hu, Pengfei Zhu, Wangmeng Zuo, Wencheng Zhu, and Changqing Zhang
Subjects: Fuzzy clustering, business.industry, Correlation clustering, Conceptual clustering, Feature selection, Pattern recognition, 02 engineering and technology, Machine learning, computer.software_genre, Spectral clustering, ComputingMethodologies_PATTERNRECOGNITION, Artificial Intelligence, 020204 information systems, Signal Processing, 0202 electrical engineering, electronic engineering, information engineering, Canopy clustering algorithm, Graph (abstract data type), 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Artificial intelligence, business, Cluster analysis, computer, Software, Mathematics
Abstract: Unsupervised feature selection (UFS) aims to reduce the time complexity and storage burden, and to improve the generalization ability of learning machines, by removing redundant, irrelevant and noisy features. Due to the lack of training labels, most existing UFS methods generate pseudo labels by spectral clustering, matrix factorization or dictionary learning, and convert UFS to a supervised problem. The learned clustering labels reflect the data distribution with respect to classes and therefore are vital to the UFS performance. In this paper, we propose a novel subspace clustering guided unsupervised feature selection (SCUFS) method. The clustering labels of the training samples are learned by representation based subspace clustering, and features that can well preserve the cluster labels are selected. SCUFS can well learn the data distribution in that it uncovers the underlying multi-subspace structure of the data and iteratively learns the similarity matrix and clustering labels. Experimental results on benchmark datasets for unsupervised feature selection show that SCUFS outperforms the state-of-the-art UFS methods. Highlights: A novel subspace clustering guided unsupervised feature selection (SCUFS) model is proposed. SCUFS learns a similarity graph by self-representation of samples and can uncover the underlying multi-subspace structure of data. The iterative updating of similarity graph and pseudo label matrix can learn a more accurate data distribution.
Published: 2017

33. Optimal Decision Tree Based Unsupervised Learning Method for Data Clustering

Author: Babu Mukkala, Nagarjuna Seelam, and Sai Seelam
Subjects: Incremental decision tree, General Computer Science, business.industry, Computer science, 020209 energy, Decision tree learning, Correlation clustering, General Engineering, ID3 algorithm, Conceptual clustering, 02 engineering and technology, Semi-supervised learning, computer.software_genre, Machine learning, 0202 electrical engineering, electronic engineering, information engineering, Unsupervised learning, Data mining, Artificial intelligence, business, Cluster analysis, computer
Published: 2017

34. Unsupervised text feature selection technique based on hybrid particle swarm optimization algorithm with genetic operators for the text clustering

Author: Laith Abualigah and Ahamad Tajudin Khader
Subjects: Clustering high-dimensional data, Fuzzy clustering, Computer science, Correlation clustering, Conceptual clustering, Feature selection, 02 engineering and technology, computer.software_genre, Theoretical Computer Science, CURE data clustering algorithm, 0202 electrical engineering, electronic engineering, information engineering, Cluster analysis, Brown clustering, business.industry, Particle swarm optimization, 020207 software engineering, Pattern recognition, Document clustering, Hybrid algorithm, Data stream clustering, Hardware and Architecture, Canopy clustering algorithm, Unsupervised learning, Affinity propagation, FLAME clustering, 020201 artificial intelligence & image processing, Data mining, Artificial intelligence, business, computer, Algorithm, Software, Information Systems
Abstract: Text clustering is an appropriate method for partitioning a huge number of text documents into groups. The size of the documents affects text clustering by decreasing its performance. Moreover, text documents contain sparse and uninformative features, which reduce the performance of the underlying text clustering algorithm and increase the computational time. Feature selection is a fundamental unsupervised learning technique used to select a new subset of informative text features to improve the performance of text clustering and reduce the computational time. This paper proposes a hybrid of the particle swarm optimization algorithm with genetic operators for the feature selection problem. K-means clustering is used to evaluate the effectiveness of the obtained feature subsets. The experiments were conducted using eight common text datasets with varying characteristics. The results show that the proposed hybrid algorithm (H-FSPSOTC) improved the performance of the clustering algorithm by generating a new subset of more informative features. The proposed algorithm is compared with other comparative algorithms published in the literature. Finally, the feature selection technique helps the clustering algorithm obtain accurate clusters.
Published: 2017

35. A semi-supervised probabilistic model for clustering large databases of complex images

Author: Ankush Mittal, Durgaprasad Gangodkar, and S. Nisha Chandran
Subjects: Clustering high-dimensional data, Fuzzy clustering, Computer Networks and Communications, Computer science, Feature vector, Single-linkage clustering, Correlation clustering, Conceptual clustering, 02 engineering and technology, Content-based image retrieval, computer.software_genre, CURE data clustering algorithm, Consensus clustering, 0202 electrical engineering, electronic engineering, information engineering, Media Technology, Cluster analysis, k-medians clustering, Brown clustering, Database, business.industry, Constrained clustering, 020207 software engineering, Pattern recognition, Mixture model, Hierarchical clustering, Determining the number of clusters in a data set, ComputingMethodologies_PATTERNRECOGNITION, Data stream clustering, Hardware and Architecture, Metric (mathematics), Canopy clustering algorithm, FLAME clustering, 020201 artificial intelligence & image processing, Artificial intelligence, Data mining, business, computer, Software, Semantic gap
Abstract: Image content clustering is an effective way to organize large databases, thereby making the content-based image retrieval process much easier. However, clustering images with varied background and foreground is quite challenging. In this paper, we propose a novel image content clustering paradigm suitable for clustering large and diverse image databases. In our approach, images are represented in a continuous domain based on a probabilistic Gaussian Mixture Model (GMM), with the images modeled as a mixture of Gaussian distributions in the selected feature space. The distance metric between the Gaussian distributions is defined in the sense of Kullback–Leibler (KL) divergence. The clustering is done using a semi-supervised learning framework in which labeled data in the form of cluster templates is used to classify the unlabeled data. The clusters are formed around initially chosen seeds and are updated in due course based on user inputs. In our clustering approach, the user interaction is structured so as to get the maximum input from the user in a limited time. We propose two methods to carry out the structured user interaction, through which the cluster templates are updated to improve the quality of the clusters formed. The proposed method is experimentally evaluated on benchmark datasets that are specifically chosen to include a wide variation of images around a common theme, which is typically encountered in applications like photo-summarization and poses a major semantic gap challenge to conventional clustering approaches. The experimental results presented demonstrate the effectiveness of the proposed approach.
Published: 2017

36. Implementation of Hierarchical Clustering for Improved Classification of Incomplete Pattern

Author: Kartik S. Thakre
Subjects: Brown clustering, Computer science, business.industry, Correlation clustering, Conceptual clustering, Pattern recognition, 02 engineering and technology, Hierarchical clustering, CURE data clustering algorithm, 020204 information systems, Consensus clustering, 0202 electrical engineering, electronic engineering, information engineering, Canopy clustering algorithm, 020201 artificial intelligence & image processing, Artificial intelligence, Cluster analysis, business
Published: 2017

37. UALM: Unsupervised Active Learning Method for clustering low-dimensional data

Author: Saeed Bagheri Shouraki and Mohammad Javadian
Subjects: Statistics and Probability, Active learning (machine learning), business.industry, Computer science, General Engineering, Conceptual clustering, 02 engineering and technology, Semi-supervised learning, Machine learning, computer.software_genre, Artificial Intelligence, 020204 information systems, Consensus clustering, 0202 electrical engineering, electronic engineering, information engineering, Unsupervised learning, 020201 artificial intelligence & image processing, Artificial intelligence, business, Cluster analysis, computer
Published: 2017

38. Modified Active Learning for Document Level Clustering

Author: Snehal Patil and Jayant Jadhav
Subjects: Document level, Computer science, Active learning (machine learning), business.industry, Consensus clustering, Conceptual clustering, Artificial intelligence, Document clustering, Cluster analysis, Machine learning, computer.software_genre, business, computer
Published: 2017

39. An Improved Clustering Method for Detection System of Public Security Events Based on Genetic Algorithm and Semisupervised Learning

Author: Heng Wang, Zhenzhen Zhao, Zhiwei Guo, Zhenfeng Wang, and Guangyin Xu
Subjects: Clustering high-dimensional data, 0209 industrial biotechnology, Fuzzy clustering, Article Subject, General Computer Science, Computer science, Population-based incremental learning, Correlation clustering, Conceptual clustering, 02 engineering and technology, Machine learning, computer.software_genre, Fuzzy logic, lcsh:QA75.5-76.95, 020901 industrial engineering & automation, CURE data clustering algorithm, Genetic algorithm, 0202 electrical engineering, electronic engineering, information engineering, Cluster analysis, Multidisciplinary, business.industry, Constrained clustering, Data stream clustering, Canopy clustering algorithm, FLAME clustering, 020201 artificial intelligence & image processing, lcsh:Electronic computers. Computer science, Data mining, Artificial intelligence, business, computer
Abstract: The occurrence of a series of events is always associated with news reports, social networks, and Internet media. In this paper, a detection system for public security events is designed, which clusters relevant text data in order to help the relevant departments with evaluation and handling. Firstly, texts are mapped into three-dimensional space using the vector space model. Then, to overcome the shortcomings of the traditional clustering algorithm, an improved fuzzy c-means (FCM) algorithm based on an adaptive genetic algorithm and semisupervised learning is proposed. In the proposed algorithm, the adaptive genetic algorithm is employed to select optimal initial clustering centers. Meanwhile, motivated by semisupervised learning, the guiding effect of prior knowledge is used to accelerate the iterative process. Finally, simulation experiments are conducted from the two aspects of qualitative analysis and quantitative analysis, which demonstrate that the proposed algorithm performs excellently in terms of clustering centers, clustering results, and time consumption.
Published: 2017

40. Featured Based Pattern Analysis using Machine Learning and Artificial Intelligence Techniques for Multiple Featured Dataset

Author: Annaram Soujanya, O. Subhash Chander Goud, G. Prabhakar Reddy, and K. Sai Prasad
Subjects: Computer science, business.industry, 020208 electrical & electronic engineering, Correlation clustering, Single-linkage clustering, Data classification, Conceptual clustering, Pattern recognition, 02 engineering and technology, Machine learning, computer.software_genre, Data stream clustering, CURE data clustering algorithm, Consensus clustering, 0202 electrical engineering, electronic engineering, information engineering, Artificial intelligence, Data mining, Cluster analysis, business, computer
Abstract: Data mining is the process of extracting patterns from large datasets. We try to uncover features bound to the data that are hard to visualize; when the data has many features, it becomes difficult to analyze. The main aim of this work is to classify the features and, on that basis, to classify the data sets. Machine learning is one of the techniques of Artificial Intelligence used for extracting valuable knowledge from large databases. Machine learning is also used for extracting patterns and models from data. In this paper, we try to group the data based on multi-dimensional feature classification. The clustering process places similar features into one group or into multiple groups; here we group the features that are similar and form multiple groups. The US schooling data is in the form of flat files. The classification process is performed on the raw data, according to hierarchical clustering. A filtration process is used to obtain non-zero values. The saturation points are generated by performing clustering. Based on the clusters obtained, the patterns can be extracted. Attribute-based classification and hierarchical clustering are performed on the data. The attributes obtained are named income and expenses. By combining the income and expenses attributes, patterns can be identified; using some combinations of the two attributes, some patterns have been obtained. By performing clustering on all the combinations of each attribute, we can identify the patterns. Following this process, we compare the patterns generated by feature classification or clustering against those generated by data classification or clustering. This new methodology tries to give the correlation between the data and the properties of a data set and how they behave.
Published: 2017

41. Semi-Supervised Clustering Algorithms for Grouping Scientific Articles

Author: Diego Vallejo-Huanga, Paulina Morillo, and Cèsar Ferri
Subjects: 0301 basic medicine, Clustering high-dimensional data, DBSCAN, Fuzzy clustering, Computer science, Correlation clustering, Conceptual clustering, 02 engineering and technology, Machine learning, computer.software_genre, Biclustering, 03 medical and health sciences, CURE data clustering algorithm, Consensus clustering, 0202 electrical engineering, electronic engineering, information engineering, Cluster analysis, General Environmental Science, Brown clustering, k-medoids, business.industry, Constrained clustering, ComputingMethodologies_PATTERNRECOGNITION, 030104 developmental biology, Data stream clustering, Canopy clustering algorithm, General Earth and Planetary Sciences, FLAME clustering, 020201 artificial intelligence & image processing, Artificial intelligence, Data mining, business, computer, Algorithm
Abstract: Creating sessions in scientific conferences consists of grouping papers with common topics while taking into account the size restrictions imposed by the conference schedule. Therefore, this problem can be considered as semi-supervised clustering of documents based on their content. This paper proposes modifications to traditional clustering algorithms to incorporate size constraints in each cluster. Specifically, two new algorithms are proposed for semi-supervised clustering, based on binary integer linear programming with cannot-link constraints and on a variation of the K-Medoids algorithm, respectively. The applicability of the proposed semi-supervised clustering methods is illustrated by addressing the problem of automatic configuration of conference schedules by clustering articles by similarity. We include experiments applying the new techniques to real conference datasets: ICMLA-2014, AAAI-2013 and AAAI-2014. The results of these experiments show that the new methods are able to solve practical, real problems.
Published: 2017

42. Comparing clustering models in bank customers: Based on Fuzzy relational clustering approach

Author: Mohsen Gheitasi, Jafar Razmi, and Ayad Hendalianpour
Subjects: C-mean, Fuzzy clustering, Correlation clustering, Conceptual clustering, Pharmaceutical Science, computer.software_genre, Machine learning, Kernel K-mean, lcsh:Accounting. Bookkeeping, Accounting, Consensus clustering, Fuzzy variables, Cluster analysis, Mathematics, Fuzzy relation clustering (FRC), Brown clustering, business.industry, k-means clustering, lcsh:HF5601-5689, K-mean, Fuzzy C-mean, ComputingMethodologies_PATTERNRECOGNITION, FLAME clustering, Data mining, Artificial intelligence, business, computer
Abstract: Article history: received December 5, 2015; received in revised format February 16, 2016; accepted August 15, 2016; available online August 16, 2016. Clustering is a useful technique for exploring data structures and has been employed in many fields. It organizes a set of objects into groups called clusters, such that the objects within one cluster are highly similar to one another and dissimilar to the objects in other clusters. The K-mean, C-mean, Fuzzy C-mean and Kernel K-mean algorithms are the most popular clustering algorithms because they are easy to implement and fast, but in some cases these algorithms cannot be used. In this paper, a hybrid model for customer clustering is presented that is applicable in five banks of Fars Province, Shiraz, Iran. The fuzzy relation among customers is defined by using their features, described in linguistic and quantitative variables. The customers of the banks are then grouped according to the K-mean, C-mean, Fuzzy C-mean and Kernel K-mean algorithms and the proposed Fuzzy Relation Clustering (FRC) algorithm. The aim of this paper is to show how to choose the best clustering algorithms based on density-based clustering and to present a new clustering algorithm for both crisp and fuzzy variables. Finally, we apply the proposed approach to five datasets of customer segmentation in banks. The results show the accuracy and high performance of FRC compared with the other clustering methods. © 2017 Growing Science Ltd. All rights reserved.
Published: 2017

43. A Self-Enforcing Network as a Tool for Clustering and Analyzing Complex Data

Author: Christina Klüver
Subjects: Complex data type, 0209 industrial biotechnology, Fuzzy clustering, Artificial neural network, Computer science, business.industry, Conceptual clustering, 02 engineering and technology, computer.software_genre, Machine learning, Informatik, 020901 industrial engineering & automation, Consensus clustering, 0202 electrical engineering, electronic engineering, information engineering, General Earth and Planetary Sciences, 020201 artificial intelligence & image processing, Data mining, Artificial intelligence, Cluster analysis, business, computer, General Environmental Science
Abstract: The Self-Enforcing Network (SEN), which is a self-organized learning neural network, is introduced as a tool for clustering to define reference types in complex data. In order to achieve this, a cue validity factor is defined, which first steers the clustering of the data. Finding reference types allows the analysis and classification of new data. The results show that a user can influence the clustering of data by SEN, thus allowing the analysis of the data depending on specific interests. The described tool includes concrete examples with real clinical data and shows the potential of such a network for the analysis of complex data.
Published: 2017

44. Experiment on Methods for Clustering and Categorization of Polish Text

Author: Agnieszka Dabrowska-Boruch, Pawel Grzegorz Russek, Ernest Jamro, Marcin Pietron, Maciej Wielgosz, Kazimierz Wiatr, and Rafal Fraczek
Subjects: Scheme (programming language), Computer Networks and Communications, Computer science, Conceptual clustering, computer.software_genre, Machine learning, Polish text, Task (project management), Cluster analysis, tf–idf, computer.programming_language, VSM, business.industry, TF-IDF, categorization, ComputingMethodologies_PATTERNRECOGNITION, Computational Theory and Mathematics, Categorization, Hardware and Architecture, Vector space model, Unsupervised learning, Artificial intelligence, business, computer, Software, Natural language processing, clustering
Abstract: The main goal of this work was to experimentally verify methods for the challenging task of categorizing and clustering Polish text. Supervised and unsupervised learning were employed for the categorization and the clustering, respectively. A thorough examination of the employed methods was carried out on a custom-built corpus of Polish texts. The corpus was assembled by the authors from Internet resources. The corpus data was acquired from a news portal and was therefore sorted by type by journalists according to their specialization. The presented algorithms employ the Vector Space Model (VSM) and the TF-IDF (Term Frequency-Inverse Document Frequency) weighting scheme. A series of experiments was conducted that revealed certain properties of the algorithms and their accuracy. The accuracy of the algorithms was evaluated with regard to their ability to match the human arrangement of the documents by topic. For both the categorization and the clustering, the authors used the F-measure to assess the quality of allocation.
Published: 2017

45. Using Data Mining on Students’ Learning Features: A Clustering Approach for Student Classification

Author: Yuanxing Dong, Jianqi An, Xiaolan Zhou, and Xin Zhao
Subjects: Computer science, business.industry, 05 social sciences, Conceptual clustering, 050301 education, 02 engineering and technology, Machine learning, computer.software_genre, Human-Computer Interaction, Artificial Intelligence, Consensus clustering, ComputingMilieux_COMPUTERSANDEDUCATION, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Artificial intelligence, Student learning, business, Cluster analysis, 0503 education, computer, Ward's method
Abstract: Students have different levels of motivation, approaches to learning, and intellectual levels. The better that instructors understand these differences, the better their chances of improving the quality of their teaching. To explore the differences thoroughly, we focus on three crucial factors in student learning features (personality, learning style and multiple intelligences) and propose an effective approach to classifying students, with the purpose of guiding instructors in optimizing their teaching process. We collected data on learning features from a class of 58 college students, analyzed these data using principal component analysis (PCA), and then classified them using Ward clustering. The experimental results indicate that our proposal effectively classifies students based on their learning features and that the classification results help instructors create personalized teaching strategies.
Published: 2016

46. Comparing Clustering with Pairwise and Relative Constraints

Author: Teresa Vania Tjahja, Xiaoli Z. Fern, Yuanli Pei, and Romer Rosales
Subjects: Clustering high-dimensional data, Fuzzy clustering, Brown clustering, General Computer Science, business.industry, Correlation clustering, Conceptual clustering, Constrained clustering, 02 engineering and technology, computer.software_genre, Machine learning, CURE data clustering algorithm, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Data mining, Artificial intelligence, Cluster analysis, business, computer, Mathematics
Abstract: Clustering can be improved with the help of side information about the similarity relationships among instances. Such information has been commonly represented by two types of constraints: pairwise constraints and relative constraints, regarding similarities about instance pairs and triplets, respectively. Prior work has mostly considered these two types of constraints separately and developed individual algorithms to learn from each type. In practice, however, it is critical to understand/compare the usefulness of the two types of constraints as well as the cost of acquiring them, which has not been studied before. This paper provides an extensive comparison of clustering with these two types of constraints. Specifically, we compare their impacts both on human users that provide such constraints and on the learning system that incorporates such constraints into clustering. In addition, to ensure that the comparison of clustering is performed on equal ground (without the potential bias introduced by different learning algorithms), we propose a probabilistic semi-supervised clustering framework that can learn from either type of constraints. Our experiments demonstrate that the proposed semi-supervised clustering framework is highly effective at utilizing both types of constraints to aid clustering. Our user study provides valuable insights regarding the impact of the constraints on human users, and our experiments on clustering with the human-labeled constraints reveal that relative constraint is often more efficient at improving clustering.
Published: 2016

47. Multi-Task Multi-View Clustering

Author: Xinyue Liu, Xiaotong Zhang, Han Liu, and Xianchao Zhang
Subjects: Clustering high-dimensional data, DBSCAN, Fuzzy clustering, Computer science, Single-linkage clustering, Correlation clustering, Conceptual clustering, 02 engineering and technology, computer.software_genre, Machine learning, Biclustering, CURE data clustering algorithm, 020204 information systems, Consensus clustering, 0202 electrical engineering, electronic engineering, information engineering, Cluster analysis, k-medians clustering, Brown clustering, business.industry, Constrained clustering, Computer Science Applications, Hierarchical clustering, Data set, Data stream clustering, Computational Theory and Mathematics, Bipartite graph, Canopy clustering algorithm, FLAME clustering, Affinity propagation, 020201 artificial intelligence & image processing, Algorithm design, Data mining, Artificial intelligence, Hierarchical clustering of networks, business, computer, Information Systems
Abstract: Multi-task clustering and multi-view clustering have each found wide applications and received much attention in recent years. Nevertheless, there are many clustering problems that involve both multi-task clustering and multi-view clustering, i.e., the tasks are closely related and each task can be analyzed from multiple views. In this paper, we introduce a multi-task multi-view clustering framework which integrates within-view-task clustering, multi-view relationship learning, and multi-task relationship learning. Under this framework, we propose two multi-task multi-view clustering algorithms: the bipartite graph based multi-task multi-view clustering algorithm and the semi-nonnegative matrix tri-factorization based multi-task multi-view clustering algorithm. The former can deal with the multi-task multi-view clustering of nonnegative data; the latter is a general multi-task multi-view clustering method, i.e., it can deal with data with negative feature values. Experimental results on publicly available data sets in web page mining and image mining show the superiority of the proposed multi-task multi-view clustering algorithms over both multi-task clustering algorithms and multi-view clustering algorithms for multi-task clustering of multi-view data.
Published: 2016

48. Visual Focus of Attention Estimation With Unsupervised Incremental Learning

Author: Christophe Garcia, Stefan Duffner, Extraction de Caractéristiques et Identification (imagine), Laboratoire d'InfoRmatique en Image et Systèmes d'information (LIRIS), Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-École Centrale de Lyon (ECL), Université de Lyon-Université Lumière - Lyon 2 (UL2)-Institut National des Sciences Appliquées de Lyon (INSA Lyon), and Université de Lyon-Université Lumière - Lyon 2 (UL2)
Subjects: Computer science, ACM: I.: Computing Methodologies/I.4: IMAGE PROCESSING AND COMPUTER VISION/I.4.8: Scene Analysis, pattern clustering, Conceptual clustering, 02 engineering and technology, [INFO.INFO-NE]Computer Science [cs]/Neural and Evolutionary Computing [cs.NE], Unsupervised learning, ACM: I.: Computing Methodologies/I.4: IMAGE PROCESSING AND COMPUTER VISION/I.4.8: Scene Analysis/I.4.8.12: Tracking, [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], 0202 electrical engineering, electronic engineering, information engineering, Media Technology, [INFO.INFO-RB]Computer Science [cs]/Robotics [cs.RO], 0501 psychology and cognitive sciences, Computer vision, image sequence analysis, Electrical and Electronic Engineering, Cluster analysis, Hidden Markov model, Pose, 050107 human factors, business.industry, 05 social sciences, [INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV], Pattern recognition, Mixture model, Visualization, [INFO.INFO-TI]Computer Science [cs]/Image Processing [eess.IV], Face (geometry), 020201 artificial intelligence & image processing, Artificial intelligence, business
Abstract: In this paper, we propose a new method for estimating the Visual Focus Of Attention (VFOA) in a video stream captured by a single distant camera and showing several persons sitting around a table, as in formal meeting or video-conferencing settings. The visual targets for a given person are automatically extracted on-line using an unsupervised algorithm that incrementally learns the different appearance clusters from low-level visual features computed from face patches provided by a face tracker, without the need for an intermediate, error-prone head-pose estimation step as in classical approaches. The clusters learnt in this way can then be used to classify the different visual attention targets of the person during a tracking run, without any prior knowledge of the environment, the configuration of the room or the visible persons. Experiments on public datasets containing almost two hours of annotated videos from meetings and video-conferencing show that the proposed algorithm produces state-of-the-art results and even outperforms a traditional supervised method that is based on head orientation estimation and that classifies visual focus of attention using Gaussian Mixture Models.
Published: 2016

49. Power Quality Analysis Using a Hybrid Model of the Fuzzy Min–Max Neural Network and Clustering Tree

Author: Manjeevan Seera, Chee Peng Lim, Chu Kiong Loo, and Harapajan Singh
Subjects: Self-organizing map, Fuzzy clustering, Computer Networks and Communications, Computer science, 020209 energy, Feature extraction, Correlation clustering, MathematicsofComputing_NUMERICALANALYSIS, Conceptual clustering, 02 engineering and technology, computer.software_genre, Machine learning, Fuzzy logic, Data modeling, Artificial Intelligence, CURE data clustering algorithm, 0202 electrical engineering, electronic engineering, information engineering, Cluster analysis, Artificial neural network, business.industry, Computer Science Applications, Tree (data structure), Canopy clustering algorithm, 020201 artificial intelligence & image processing, Data mining, Artificial intelligence, business, computer, Software
Abstract: A hybrid intelligent model comprising a modified fuzzy min-max (FMM) clustering neural network and a modified clustering tree (CT) is developed. A review of clustering models with rule extraction capabilities is presented. The hybrid FMM-CT model is explained. We first use several benchmark problems to illustrate the cluster evolution patterns from the proposed modifications in FMM. Then, we employ a case study with real data related to power quality monitoring to assess the usefulness of FMM-CT. The results are compared with those from other clustering models. More importantly, we extract explanatory rules from FMM-CT to justify its predictions. The empirical findings indicate the usefulness of the proposed model in tackling data clustering and power quality monitoring problems under different environments.
Published: 2016

50. Online Feature Selection Based on Fuzzy Clustering and Its Applications

Author: Thanh Minh Nguyen and Q. M. Jonathan Wu
Subjects: Fuzzy clustering, Computer science, business.industry, Applied Mathematics, Correlation clustering, Constrained clustering, Conceptual clustering, 02 engineering and technology, Machine learning, computer.software_genre, Determining the number of clusters in a data set, ComputingMethodologies_PATTERNRECOGNITION, Data stream clustering, Computational Theory and Mathematics, Artificial Intelligence, Control and Systems Engineering, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, FLAME clustering, 020201 artificial intelligence & image processing, Data mining, Artificial intelligence, Cluster analysis, business, computer
Abstract: Fuzzy c-means (FCM) clustering has been successfully applied in various pattern recognition areas. While FCM is gaining attention, an important issue arising from these studies is the need to determine which attributes of the data should be used. Answering this question is difficult, because there is no labeled training data available in clustering to guide the search. We present a feature selection for FCM. The advantage of our method is that it is intuitively appealing, avoiding combinatorial searches, and allowing us to prune the feature set. Our method is also adaptable and can change through complex scenes in an online environment. We do not have to wait until all data have been generated before learning begins. Finally, to estimate the model parameters, the gradient method is adopted to minimize the fuzzy objective function with the Kullback–Leibler divergence information. Numerical experiments are presented to demonstrate the robustness and accuracy of our method.
Published: 2016
