Search results: 4,177 results for "Constrained clustering"
2. Constrained clustering and multiple kernel learning without pairwise constraint relaxation.
- Author
Boecking, Benedikt, Jeanselme, Vincent, and Dubrawski, Artur
- Abstract
Clustering under pairwise constraints is an important knowledge discovery tool that enables the learning of appropriate kernels or distance metrics to improve clustering performance. These pairwise constraints, which come in the form of must-link and cannot-link pairs, arise naturally in many applications and are intuitive for users to provide. However, the common practice of relaxing discrete constraints to a continuous domain to ease optimization when learning kernels or metrics can harm generalization, as information which only encodes linkage is transformed to informing distances. We introduce a new constrained clustering algorithm that jointly clusters data and learns a kernel in accordance with the available pairwise constraints. To generalize well, our method is designed to maximize constraint satisfaction without relaxing pairwise constraints to a continuous domain where they inform distances. We show that the proposed method outperforms existing approaches on a large number of diverse publicly available datasets, and we discuss how our method can scale to handling large data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
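The abstract's goal of maximizing constraint satisfaction can be measured with the usual satisfaction rate: the fraction of must-link pairs placed in the same cluster plus cannot-link pairs placed in different clusters. The following minimal Python sketch computes that rate for a hard clustering. The function name `constraint_satisfaction` is hypothetical, and this is an evaluation metric only, not the authors' joint clustering and kernel-learning algorithm.

```python
from typing import Sequence, Tuple

Pair = Tuple[int, int]


def constraint_satisfaction(labels: Sequence[int],
                            must_link: Sequence[Pair],
                            cannot_link: Sequence[Pair]) -> float:
    """Fraction of pairwise constraints respected by a hard cluster assignment."""
    satisfied = sum(labels[i] == labels[j] for i, j in must_link)
    satisfied += sum(labels[i] != labels[j] for i, j in cannot_link)
    total = len(must_link) + len(cannot_link)
    return satisfied / total if total else 1.0  # no constraints: trivially satisfied


labels = [0, 0, 1, 1, 2]
ml = [(0, 1), (2, 4)]  # (2, 4) is violated: clusters 1 and 2
cl = [(0, 2), (3, 4)]  # both satisfied
print(constraint_satisfaction(labels, ml, cl))  # 0.75
```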
3. Dynamic Generation of Subordinate Clusters Based on Bayesian Information Criterion for Must-Link Constrained K-Means.
- Author
Shimizu, Shota, Sakayauchi, Shun, Shibata, Hiroki, and Takama, Yasufumi
- Subjects
K-means clustering
- Abstract
This paper proposes a constrained K-means clustering method that dynamically generates subordinate clusters based on the Bayesian information criterion (BIC). COP K-means, which considers pairwise constraints in partition-based clustering, has difficulty handling the case in which a must-link is given to instances located far away from each other. To address this problem, the proposed method generates, during the clustering process, subordinate clusters that have a must-link to a master cluster. The final clustering result is obtained by merging the subordinate clusters. The proposed method uses the BIC to decide whether or not to generate subordinate clusters. This paper also introduces an idea for mitigating the sensitivity to the initial positions of subordinate clusters. The effectiveness of the proposed methods is shown through experiments on two synthetic datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
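For context, the assignment step of COP K-means can be sketched as follows: each instance goes to the nearest center that does not break a constraint with an already-assigned instance, and the pass fails if no such center exists. This is a simplified, hypothetical sketch (`cop_assign` and `violates` are illustrative names; it checks direct constraints only and omits the transitive closure of must-links and the center-update loop), and it is not the proposed BIC-based method. The example reproduces the failure mode the abstract describes: a must-link between two distant instances drags one of them into a far-away cluster.

```python
import math
from typing import Dict, List, Optional, Sequence, Set, Tuple

Point = Tuple[float, ...]
Pair = Tuple[int, int]


def violates(i: int, c: int, assign: Dict[int, int],
             ml: Dict[int, Set[int]], cl: Dict[int, Set[int]]) -> bool:
    """True if putting instance i in cluster c breaks a constraint with an assigned instance."""
    if any(j in assign and assign[j] != c for j in ml.get(i, ())):
        return True
    return any(assign.get(j) == c for j in cl.get(i, ()))


def cop_assign(points: Sequence[Point], centers: Sequence[Point],
               must_link: Sequence[Pair], cannot_link: Sequence[Pair]) -> Optional[List[int]]:
    """One COP K-means style assignment pass; returns None if an instance has no feasible cluster."""
    ml: Dict[int, Set[int]] = {}
    cl: Dict[int, Set[int]] = {}
    for a, b in must_link:
        ml.setdefault(a, set()).add(b)
        ml.setdefault(b, set()).add(a)
    for a, b in cannot_link:
        cl.setdefault(a, set()).add(b)
        cl.setdefault(b, set()).add(a)

    assign: Dict[int, int] = {}
    for i, p in enumerate(points):
        # Try centers from nearest to farthest.
        order = sorted(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
        for c in order:
            if not violates(i, c, assign, ml, cl):
                assign[i] = c
                break
        else:
            return None  # every cluster would violate a constraint
    return [assign[i] for i in range(len(points))]


pts = [(0.0, 0.0), (0.1, 0.0), (10.0, 0.0), (10.1, 0.0)]
ctr = [(0.0, 0.0), (10.0, 0.0)]
print(cop_assign(pts, ctr, must_link=[(0, 3)], cannot_link=[]))  # [0, 0, 1, 0]
```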
4. A k-Means Algorithm for Clustering with Soft Must-link and Cannot-link Constraints
- Author
Baumann, Philipp and Hochbaum, Dorit
- Subjects
Constrained Clustering, Must-link and Cannot-link Constraints, Mixed-binary Linear Programming
- Published
- 2022
5. Semi-supervised K-Means Clustering via DC Programming Approach
- Author
Gruzdeva, Tatiana V., Ushakov, Anton V., Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Khachay, Michael, editor, Kochetov, Yury, editor, Eremeev, Anton, editor, Khamisov, Oleg, editor, Mazalov, Vladimir, editor, and Pardalos, Panos, editor
- Published
- 2023
6. Knowledge Integration in Deep Clustering
- Author
Nghiem, Nguyen-Viet-Dung, Vrain, Christel, Dao, Thi-Bich-Hanh, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Amini, Massih-Reza, editor, Canu, Stéphane, editor, Fischer, Asja, editor, Guns, Tias, editor, Kralj Novak, Petra, editor, and Tsoumakas, Grigorios, editor
- Published
- 2023
7. Double-Constrained Consensus Clustering with Application to Online Anti-Counterfeiting.
- Author
Carpineto, Claudio and Romano, Giovanni
- Subjects
WEBSITES, FORGERY, MARKETING
- Abstract
Semi-supervised consensus clustering is a promising strategy to compensate for the subjectivity of clustering and its sensitivity to design factors, with various techniques being recently proposed to integrate domain knowledge and multiple clustering partitions. In this article, we present a new approach that makes double use of domain knowledge, namely to build the initial partitions, as well as to combine them. In particular, we show how to model and integrate must-link and cannot-link constraints into the objective function of a generic consensus clustering (CC) framework that maximizes the similarity between the consensus partition and the input partitions, which have, in turn, been enriched with the same constraints. In addition, borrowing from the theory of functional dependencies, the integrated framework exploits the notions of deductive closure and minimal cover to take full advantage of the logical implication between constraints. Using standard UCI benchmarks, we found that the resulting algorithm, termed CCC (double-constrained consensus clustering), was more effective than plain CC at combining base-constrained partitions, with an average performance improvement of 5.54%. We then argue that CCC is especially well-suited for profiling counterfeit e-commerce websites, as constraints can be acquired by leveraging specific domain features, and demonstrate its potential for detecting affiliate marketing programs. Taken together, our experiments suggest that CCC makes the process of clustering more robust and able to withstand changes in clustering algorithms, datasets, and features, with a remarkable improvement in average performance. [ABSTRACT FROM AUTHOR]
- Published
- 2023
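The deductive closure mentioned in this abstract can be illustrated with the two standard inference rules for pairwise constraints: must-link is transitive, and a cannot-link between two instances extends to every pair drawn from their must-link components. The sketch below applies those rules with a union-find structure. It is a hypothetical illustration (`constraint_closure` is an invented name), not the CCC implementation, and it does not compute the minimal cover.

```python
from itertools import combinations, product
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[int, int]


def constraint_closure(n: int, must_link: Iterable[Pair],
                       cannot_link: Iterable[Pair]) -> Tuple[Set[Pair], Set[Pair]]:
    """Deductive closure of must-link (ML) and cannot-link (CL) constraints on n instances.

    Pairs are returned as (smaller index, larger index).
    """
    parent = list(range(n))

    def find(x: int) -> int:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in must_link:
        parent[find(a)] = find(b)

    # Group instances by ML component; members are appended in increasing order.
    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)

    # Rule 1: ML is transitive, so every pair inside a component is must-linked.
    ml_closed = {pair for members in groups.values() for pair in combinations(members, 2)}

    # Rule 2: a CL between two components holds for all cross pairs.
    cl_closed: Set[Pair] = set()
    for a, b in cannot_link:
        ra, rb = find(a), find(b)
        if ra == rb:
            raise ValueError(f"inconsistent: {a} and {b} are both must-linked and cannot-linked")
        for i, j in product(groups[ra], groups[rb]):
            cl_closed.add((min(i, j), max(i, j)))
    return ml_closed, cl_closed


ml, cl = constraint_closure(5, must_link=[(0, 1), (1, 2)], cannot_link=[(2, 3)])
print(sorted(ml))  # [(0, 1), (0, 2), (1, 2)]
print(sorted(cl))  # [(0, 3), (1, 3), (2, 3)]
```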
8. Enhanced Multi-View Subspace Clustering via Twist Tensor Nuclear Norm and Constraint Propagation
- Author
Wei Yan, Yu Wang, Mengxin Wang, and Junjie Yang
- Subjects
Multi-view clustering, low-rank tensor representation, tensor singular value decomposition (T-SVD), constrained clustering, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Multi-view subspace clustering (MVSC) can effectively group multi-view data distributed around several low-dimensional subspaces. Despite encouraging results, most existing methods suffer from two typical limitations that degrade clustering performance. First, they ignore the high-order correlations underlying multi-view data, which weakens the complementary power of the views; second, they rely on a large amount of prior knowledge (e.g., pairwise constraints) to enhance clustering. In this paper, a novel algorithm called Enhanced Multi-view Subspace Clustering (EMVSC) is proposed to address both limitations. EMVSC can effectively exploit high-order correlations and make optimal use of limited prior knowledge for better clustering performance. Specifically, EMVSC imposes the twist tensor nuclear norm on a multi-view tensor representation constructed by stacking view-specific self-representations; in addition, EMVSC exploits prior knowledge of pairwise constraints across the whole dataset by employing constraint propagation, which propagates limited constraint knowledge from constrained samples to unconstrained samples. To optimize EMVSC efficiently, an extended intact augmented Lagrangian method with good convergence is derived. Experimental results on seven standard multi-view databases demonstrate its efficacy.
- Published
- 2023
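The abstract does not say which constraint-propagation scheme EMVSC uses. As an assumption, the sketch below follows the closed form of E2CP (exhaustive and efficient constraint propagation), one widely used graph-based scheme: with normalized affinity $\bar L = D^{-1/2} W D^{-1/2}$ and initial constraint matrix $Z$ (+1 must-link, -1 cannot-link, 0 unknown), the propagated matrix is $F = (1-\alpha)^2 (I-\alpha\bar L)^{-1} Z (I-\alpha\bar L)^{-1}$. The function name and the default `alpha = 0.5` are hypothetical choices for illustration.

```python
import numpy as np


def propagate_constraints(W: np.ndarray, Z: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """E2CP-style closed-form propagation of pairwise constraints over a graph.

    W: symmetric non-negative affinity matrix with zero diagonal and positive degrees.
    Z: symmetric initial constraints (+1 must-link, -1 cannot-link, 0 unknown).
    The sign of the returned F[i, j] suggests the propagated relation.
    """
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_bar = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]  # D^{-1/2} W D^{-1/2}
    n = W.shape[0]
    # P = (1 - alpha) (I - alpha * L_bar)^{-1}; symmetric because L_bar is symmetric.
    P = (1.0 - alpha) * np.linalg.inv(np.eye(n) - alpha * L_bar)
    # Vertical then horizontal propagation, combined: F = P Z P.
    return P @ Z @ P


# Chain graph 0 - 1 - 2 - 3 with a single cannot-link between the two ends.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
Z = np.zeros((4, 4))
Z[0, 3] = Z[3, 0] = -1.0
F = propagate_constraints(W, Z)
print(F[1, 3] < 0)  # True: node 1, a neighbour of node 0, inherits the cannot-link with node 3
```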
9. Interpretable Multi-Criteria ABC Analysis Based on Semi-Supervised Clustering and Explainable Artificial Intelligence
- Author
Alaa Asim Qaffas, Mohamed Aymen Ben Hajkacem, Chiheb-Eddine Ben Ncir, and Olfa Nasraoui
- Subjects
ABC classification, inventory classification, eXplainable Artificial Intelligence (XAI), explainable clustering, SHapley Additive exPlanations (SHAP), constrained clustering, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Multi-criteria ABC classification is an effective technique that allows rapid and automatic organization of a growing number of inventory items into classes having different managerial levels. These built classes help decision-makers efficiently control the inventory and optimize the whole supply chain. However, existing ABC classification methods work as black-box processes that produce ABC classes without providing any explanations behind the assignment of the items. Given the multi-criteria nature of the ABC classification problem, managers cannot easily analyze and interpret the item managerial classes. Another problem of existing methods is their inability to follow the Pareto principle which states that items must be Pareto distributed over the ABC classes. To solve these two problems, we propose a semi-supervised explainable approach based on both semi-supervised clustering and explainable artificial intelligence. The semi-supervised technique is used to integrate an intelligent initialization and a constrained clustering process that guides the classification process to lead to Pareto distributed items, whereas explainable artificial intelligence is used to build detailed micro and macro explanations of the inventory classes at the item and the class levels. Application of the proposed approach for the automatic classification of chemical products of a distribution company has shown the effectiveness of the proposed approach in providing accurate, transparent, and well-explained ABC classes.
- Published
- 2023
10. Constrained Clustering: General Pairwise and Cardinality Constraints
- Author
Adel Bibi, Ali Alqahtani, and Bernard Ghanem
- Subjects
Constrained clustering, K-means, pairwise, cardinality constraints, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
In this work, we study constrained clustering, where constraints are used to guide the clustering process. Existing works have widely explored two categories of constraints, namely pairwise and cardinality constraints. Pairwise constraints require the cluster labels of two instances to be the same (must-link constraints) or different (cannot-link constraints). Cardinality constraints encourage cluster sizes to follow a user-specified distribution. However, most existing constrained clustering models can only use one category of constraints at a time. In this paper, we integrate both categories into a unified clustering model, starting from the integer program formulation of standard K-means. As the two categories provide useful information at different levels, using both of them is expected to yield better clustering performance. However, the optimization is difficult due to the binary and quadratic constraints in the proposed unified formulation. To alleviate this difficulty, we use two techniques: one equivalently replaces the binary constraints with the intersection of two continuous constraints; the other transforms the quadratic constraints into bilinear constraints by introducing extra variables. We then derive an equivalent continuous reformulation with simple constraints, which can be solved efficiently by the Alternating Direction Method of Multipliers (ADMM). Extensive experiments on both synthetic and real data demonstrate that: 1) when using a single category of constraints, the proposed model is superior to or competitive with state-of-the-art constrained clustering models, and 2) when using both categories of constraints jointly, the proposed model performs better than when using a single category.
The experimental results show that the proposed method exploits the constraints to achieve perfect clustering performance, improving classical clustering metrics such as the Adjusted Rand Index (ARI), Mirkin's Index (MI), and Hubert's Index (HI) by 2-5% and outperforming all compared methods across the board. Moreover, we show that our method is robust to initialization.
- Published
- 2023
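To make the abstract's starting point concrete, one standard way to write K-means with both constraint categories as an integer program is shown below, together with an identity that describes binary vectors by two continuous constraints. The notation ($z_i$ for the one-hot row $i$ of the assignment matrix $Z$, must-link set $\mathcal{M}$, cannot-link set $\mathcal{C}$, target cluster sizes $\mathbf{n}$) is hypothetical, and the paper's exact formulation may differ, for example by using inequalities for cardinality. The second identity is the $\ell_2$ case of the $\ell_p$-box equivalence, which is assumed here to be what the abstract refers to.

```latex
\begin{aligned}
\min_{Z,\;\mu_1,\dots,\mu_K}\quad & \sum_{i=1}^{n}\sum_{k=1}^{K} z_{ik}\,\lVert x_i-\mu_k\rVert_2^2\\
\text{s.t.}\quad & Z\mathbf{1}_K=\mathbf{1}_n,\qquad Z\in\{0,1\}^{n\times K},\\
& z_i = z_j \quad \forall (i,j)\in\mathcal{M}\quad\text{(must-link)},\\
& z_i^{\top} z_j = 0 \quad \forall (i,j)\in\mathcal{C}\quad\text{(cannot-link, quadratic)},\\
& Z^{\top}\mathbf{1}_n = \mathbf{n}\quad\text{(cardinality)}.
\end{aligned}

\{0,1\}^{m} \;=\; [0,1]^{m}\,\cap\,\Bigl\{\,z\in\mathbb{R}^{m} : \bigl\lVert z-\tfrac12\mathbf{1}_m\bigr\rVert_2^2=\tfrac{m}{4}\Bigr\}
```

The second identity holds because, inside the box, each term $(z_l-\tfrac12)^2$ is at most $\tfrac14$, so the sum reaches $m/4$ only when every coordinate is exactly 0 or 1.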
11. Active Learning Person Re-identification Based on Support Pair Mining (基于支持对挖掘的主动学习行人再识别).
- Author
金大鹏 and 李先
- Subjects
BLENDED learning, ACTIVE learning, LEARNING strategies, SUPERVISED learning, ANNOTATIONS, ALGORITHMS
- Abstract
Supervised-learning-based person re-identification requires a large amount of manually labeled data, which is impractical in real deployments. This paper proposes a support pairs active learning (SPAL) re-identification framework to lower the manual labeling cost of large-scale person re-identification. Specifically, this paper builds an unsupervised active learning framework in which a dual uncertainty selection strategy iteratively discovers support pairs and requests human annotations for them. Afterwards, it introduces a constrained clustering algorithm to propagate the relationships of the labeled support pairs to other unlabeled samples. Moreover, a hybrid learning strategy consisting of an unsupervised contrastive loss and a supervised support pairs loss is proposed to learn discriminative feature representations. On the large-scale person re-identification dataset MSMT17, compared with the state-of-the-art method, the proposed method reduces the labeling cost by 64% and increases mAP and Rank-1 accuracy by 11.0% and 14.9%, respectively. Extensive experiments demonstrate that it can effectively lower the labeling cost and is superior to state-of-the-art unsupervised active learning person re-identification methods. [ABSTRACT FROM AUTHOR]
- Published
- 2023
12. A Novel Semi-supervised Clustering Algorithm: CoExDBSCAN
- Author
Ertl, Benjamin, Schneider, Matthias, Meyer, Jörg, Streit, Achim, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Fred, Ana, editor, Aveiro, David, editor, Dietz, Jan, editor, Salgado, Ana, editor, and Bernardino, Jorge, editor
- Published
- 2022
13. Decomposition-Based Job-Shop Scheduling with Constrained Clustering
- Author
El-Kholany, Mohammed M. S., Schekotihin, Konstantin, Gebser, Martin, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Cheney, James, editor, and Perri, Simona, editor
- Published
- 2022
14. Combining Active Semi-supervised Learning and Rare Category Detection
- Author
Loveland, Rohan, Kaplan, Noah, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Troiano, Luigi, editor, Vaccaro, Alfredo, editor, Tagliaferri, Roberto, editor, Kesswani, Nishtha, editor, Díaz Rodriguez, Irene, editor, Brigui, Imene, editor, and Parente, Domenico, editor
- Published
- 2022
15. Global optimization for cardinality-constrained minimum sum-of-squares clustering via semidefinite programming
- Author
Piccialli, Veronica and Sudoso, Antonio M.
- Published
- 2023
16. Hierarchical Clustering with Contiguity Constraint in R
- Author
Guillaume Guénard and Pierre Legendre
- Subjects
R language, hclust, constrained clustering, space, chronological clustering, Lance & Williams algorithm, Statistics, HA1-4737
- Abstract
This article presents a new implementation of hierarchical clustering for the R language that allows one to apply spatial or temporal contiguity constraints during the clustering process. The need for a contiguity constraint arises, for instance, when one wants to partition a map into different domains of similar physical conditions, identify discontinuities in time series, group regional administrative units with respect to their performance, and so on. To increase computational efficiency, we programmed the core functions in plain C. The result is a new R function, constr.hclust, which is distributed in the package adespatial. The program implements the general agglomerative hierarchical clustering algorithm described by Lance and Williams (1966; 1967), with the particularity of allowing only clusters that are contiguous in geographic space or along time to fuse at any given step. Contiguity can be defined with respect to space or time. Information about spatial contiguity is provided by a connection network among sites, with edges describing the links between connected sites. Clustering with a temporal contiguity constraint is also known as chronological clustering. Information on temporal contiguity can be implicitly provided as the rank positions of observations in the time series. The implementation was modeled on that of the hierarchical clustering function hclust of the standard R package stats (R Core Team 2022). We transcribed that function from Fortran to C and added the functionality to apply constraints when running the function. The implementation is efficient. It is limited mainly by input/output access, as massive amounts of memory are potentially needed to store copies of the dissimilarity matrix and update its elements when analyzing large problems. We also provide R code for plotting results for different numbers of clusters.
- Published
- 2022
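The core idea of contiguity-constrained agglomeration, that only neighbouring clusters may fuse, can be sketched for the temporal case (chronological clustering) in a few lines of Python. This is a hypothetical illustration, not constr.hclust: `chronological_ward` is an invented name, it works on one-dimensional values, it computes Ward's merge cost directly from segment sizes and means instead of using the general Lance and Williams update, and it rescans all adjacent pairs at each step (quadratic time).

```python
from typing import List, Sequence, Tuple


def chronological_ward(values: Sequence[float], k: int) -> List[int]:
    """Agglomerate a 1-D series into k contiguous segments using Ward's criterion.

    Only temporally adjacent segments may merge (the contiguity constraint).
    Each segment is stored as (size, mean); merging a and b increases the
    within-segment sum of squares by n_a * n_b / (n_a + n_b) * (m_a - m_b) ** 2.
    """
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    segs: List[Tuple[int, float]] = [(1, float(v)) for v in values]

    def merge_cost(t: int) -> float:
        (na, ma), (nb, mb) = segs[t], segs[t + 1]
        return na * nb / (na + nb) * (ma - mb) ** 2

    while len(segs) > k:
        t = min(range(len(segs) - 1), key=merge_cost)  # cheapest adjacent pair
        (na, ma), (nb, mb) = segs[t], segs[t + 1]
        segs[t:t + 2] = [(na + nb, (na * ma + nb * mb) / (na + nb))]

    # Expand segment sizes back into one label per observation, in time order.
    labels: List[int] = []
    for label, (size, _) in enumerate(segs):
        labels.extend([label] * size)
    return labels


series = [1.0, 1.1, 0.9, 5.0, 5.2, 9.0, 9.1]
print(chronological_ward(series, 3))  # [0, 0, 0, 1, 1, 2, 2]
```

Spatial contiguity would replace "adjacent in time" with "connected in the site network", which needs a graph of allowed merges rather than a simple list.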
17. Active constrained deep embedded clustering with dual source.
- Author
Hazratgholizadeh, R., Balafar, M. A., and Derakhshi, M. R. F.
- Subjects
DEEP learning, ACTIVE learning, SUPERVISED learning, PRIOR learning
- Abstract
Deep clustering using a deep neural network (DNN) is widely used for simultaneously learning feature representations and clustering. Existing constrained deep clustering methods use prior knowledge to improve deep clustering. However, most of these methods select prior knowledge (pairwise constraints) randomly and fail to use it appropriately in the deep clustering process. The present study addresses this limitation by proposing a new scheme that integrates and improves constrained deep clustering through active learning from a dual source. The scheme consists of a DNN that initializes the nonlinear transformation of the original feature space, a clustering layer, and a constrained clustering layer that runs parallel to the clustering layer and uses prior knowledge as a set of neighborhoods. In addition, active learning simultaneously uses these two layers as sources for selecting informative and diverse data. The proposed method simultaneously performs constrained clustering, learns the latent feature space under the guidance of the constraint set, and indirectly draws the data belonging to one neighborhood closer to its center (i.e., away from the centers of other neighborhoods). Experiments on several datasets indicate the efficiency and robustness of the proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2023
18. Deep semi-supervised clustering for multi-variate time-series.
- Author
Ienco, Dino and Interdonato, Roberto
- Subjects
CRANES (Birds), MULTIVARIATE analysis, TIME series analysis, DATA analysis
- Abstract
Huge amounts of data are nowadays produced by a large and disparate family of sensors, which typically measure multiple variables over time. Such rich information can be profitably organized as multivariate time series. Collecting enough labelled samples to set up a supervised analysis of such data is challenging, whereas it is reasonable to assume that limited background knowledge is available and can be injected into the analysis process. In this context, semi-supervised clustering methods are a well-suited tool for getting the most out of such a limited amount of knowledge. To deal with multivariate time-series analysis under limited background knowledge, we propose a semi-supervised (constrained) deep embedding time-series clustering framework that exploits knowledge supervision modeled as Must- and Cannot-link constraints. More specifically, our proposal, named conDetSEC (constrained Deep embedding time SEries Clustering), is based on Gated Recurrent Units (GRUs) in order to explicitly manage the temporal dimension of multivariate time-series data. conDetSEC combines an embedding generation step with a clustering refinement step. Both steps exploit the small amount of available knowledge provided by Must- and Cannot-link constraints. During embedding generation, the constraints are used by jointly optimizing the network parameters via both unsupervised and semi-supervised tasks, while at the refinement step they are used together with the goal of stretching the embedding manifold towards the cluster centroids to recover a clearer cluster structure. Experimental evaluation on real-world benchmarks from diverse domains has highlighted the effectiveness of our proposal in comparison with state-of-the-art unsupervised and semi-supervised time-series clustering methods. [ABSTRACT FROM AUTHOR]
- Published
- 2023
19. Scalable Feature Matching Across Large Data Collections.
- Author
Degras, David
- Subjects
EUCLIDEAN distance, ASSIGNMENT problems (Programming), DATABASES, COMBINATORIAL optimization, ALGORITHMS
- Abstract
This article is concerned with matching feature vectors in a one-to-one fashion across large collections of datasets. Formulating this task as a multidimensional assignment problem with decomposable costs (MDADC), we develop fast algorithms with time complexity roughly linear in the number n of datasets and space complexity a small fraction of the data size. These remarkable properties hinge on using the squared Euclidean distance as the dissimilarity function, which can reduce the n(n-1)/2 matching problems between pairs of datasets to n problems and enables assignment costs to be calculated on the fly. To our knowledge, no other method applicable to the MDADC possesses these linear-scaling and low-storage properties, which are necessary for large-scale applications. In numerical experiments, the novel algorithms outperform competing methods and show excellent computational and optimization performance. An application of feature matching to a large neuroimaging database is presented. The algorithms of this article are implemented in the R package matchFeat available at github.com/ddegras/matchFeat. Supplementary materials for this article are available online. [ABSTRACT FROM AUTHOR]
- Published
- 2023
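One way to see why squared Euclidean costs allow the pairwise problems to collapse is the following standard identity for any vectors $a_1,\dots,a_n$ with mean $\bar a$: the cost summed over all $n(n-1)/2$ pairs equals $n$ times the scatter around the mean. Applied to the feature vectors matched together across the $n$ datasets, the all-pairs matching cost can be evaluated against a single average template, so each dataset can be matched to that template separately. This is an interpretation of the abstract's claim; matchFeat's actual algorithms may differ in detail.

```latex
\sum_{1\le i<j\le n}\lVert a_i-a_j\rVert_2^2
  = \frac12\sum_{i=1}^{n}\sum_{j=1}^{n}\lVert a_i-a_j\rVert_2^2
  = n\sum_{i=1}^{n}\lVert a_i\rVert_2^2-\Bigl\lVert\sum_{i=1}^{n}a_i\Bigr\rVert_2^2
  = n\sum_{i=1}^{n}\lVert a_i-\bar a\rVert_2^2,
\qquad \bar a=\frac1n\sum_{i=1}^{n}a_i .
```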
20. Towards more efficient local search algorithms for constrained clustering.
- Author
Gao, Jian, Tao, Xiaoxia, and Cai, Shaowei
- Subjects
SEARCH algorithms, CONSTRAINT satisfaction, SUM of squares, DOCUMENT clustering, ALGORITHMS, INFORMATION filtering
- Abstract
• The constrained clustering problem is studied.
• An efficient local search algorithm is proposed.
• A node filtering strategy is introduced for improving efficiency.
• The proposed algorithm is more effective than state-of-the-art heuristics.
Constrained clustering extends clustering by integrating user constraints, and aims to determine an optimal assignment under the constraints. In this paper, we propose a local search algorithm called FastCCP to solve the constrained clustering problem. In the algorithm, instances connected by must-link constraints are first merged into nodes, and then, a local search method is performed to handle the cannot-link constraints while minimizing the Within-Cluster Sum of Squares (WCSS). Several strategies are proposed to enhance the solution diversity and achieve a trade-off between constraint satisfaction and WCSS minimization during the search. Furthermore, a node-filtering strategy is proposed to improve the efficiency of the algorithm. Experiments are performed on benchmark datasets to evaluate our algorithm. The comparative results indicate that our algorithm outperforms state-of-the-art algorithms in terms of both the solution quality and CPU runtime. [ABSTRACT FROM AUTHOR]
- Published
- 2023
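The first step described in the abstract, merging must-linked instances into nodes, can be sketched as follows: once instances are grouped, any assignment of whole nodes to clusters satisfies every must-link by construction, so a local search only has to deal with cannot-links and the WCSS objective. The names `merge_must_links` and `wcss_of_node_assignment` are hypothetical, and FastCCP's actual local search, diversification, and node filtering are not shown.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def merge_must_links(n: int, must_link: Sequence[Tuple[int, int]]) -> List[List[int]]:
    """Collapse instances joined by must-link constraints into nodes (lists of indices)."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in must_link:
        parent[find(a)] = find(b)
    nodes: Dict[int, List[int]] = {}
    for i in range(n):
        nodes.setdefault(find(i), []).append(i)
    return list(nodes.values())


def wcss_of_node_assignment(points: Sequence[Point], nodes: List[List[int]],
                            node_labels: Sequence[int]) -> float:
    """Within-Cluster Sum of Squares when whole nodes are assigned to clusters."""
    clusters: Dict[int, List[int]] = {}
    for node, label in zip(nodes, node_labels):
        clusters.setdefault(label, []).extend(node)  # a node always moves as a unit
    total = 0.0
    for members in clusters.values():
        dim = len(points[members[0]])
        mean = [sum(points[i][d] for i in members) / len(members) for d in range(dim)]
        total += sum((points[i][d] - mean[d]) ** 2 for i in members for d in range(dim))
    return total


pts = [(0.0, 0.0), (0.0, 2.0), (5.0, 5.0), (6.0, 5.0)]
nodes = merge_must_links(len(pts), must_link=[(0, 1)])
print(nodes)                                           # [[0, 1], [2], [3]]
print(wcss_of_node_assignment(pts, nodes, [0, 1, 1]))  # 2.5
```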
21. Active Pairwise Constraint Learning in Constrained Time-Series Clustering for Crop Mapping from Airborne SAR Imagery.
- Author
Qin, Xingli, Zhao, Lingli, Yang, Jie, Li, Pingxiang, Wu, Bingfang, Sun, Kaimin, and Xu, Yubin
- Subjects
CROPS, ITERATIVE learning control, SYNTHETIC aperture radar
- Abstract
Airborne SAR is an important data source for crop mapping and has important applications in agricultural monitoring and food safety. However, the incidence-angle effects of airborne SAR imagery decrease crop mapping accuracy. To address this problem, an active pairwise constraint learning method (APCL) is proposed for constrained time-series clustering. Based on the incidence angles of the samples and a non-iterative batch-mode active selection scheme, APCL constructs two types of instance-level pairwise constraints: must-link constraints, which link two objects of the same crop type with large differences in backscattering coefficients and time-series curve shapes, and cannot-link constraints, which link two objects of different crop types with only small differences in backscattering coefficient values. Experiments were conducted using 12 time-series images with incidence angles ranging from 21.2° to 64.3°, and the results demonstrate the effectiveness of APCL in improving crop mapping accuracy. More specifically, when dynamic time warping (DTW) was used as the similarity measure, APCL increased the kappa coefficient by 9.5%, 8.7%, and 5.2% compared with three other methods. APCL thus provides a new solution for reducing incidence-angle effects in crop mapping from airborne SAR time-series images. [ABSTRACT FROM AUTHOR]
- Published
- 2022
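Since the abstract reports results with dynamic time warping (DTW) as the similarity measure, a textbook DTW distance is sketched below for reference. The choices here (absolute difference as the local cost, no warping window, unnormalized total) are assumptions; the abstract does not state the settings APCL used, and `dtw_distance` is an illustrative name.

```python
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic-programming DTW with |x - y| local cost and no window constraint."""
    n, m = len(a), len(b)
    inf = float("inf")
    # D[i][j] = minimal cumulative cost of aligning a[:i] with b[:j].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


print(dtw_distance([0, 1, 2], [0, 1, 1, 2]))  # 0.0: the repeated 1 is absorbed by warping
print(dtw_distance([0, 0], [1, 1]))           # 2.0
```

Unlike a point-by-point Euclidean comparison, DTW tolerates local shifts and stretches between two backscatter curves, which is why it is a common choice for time-series clustering.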
22. Maintaining Consistency with Constraints: A Constrained Deep Clustering Method
- Author
Cui, Yi, Zhang, Xianchao, Zong, Linlin, Mu, Jie, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Karlapalem, Kamal, editor, Cheng, Hong, editor, Ramakrishnan, Naren, editor, Agrawal, R. K., editor, Reddy, P. Krishna, editor, Srivastava, Jaideep, editor, and Chakraborty, Tanmoy, editor
- Published
- 2021
23. CDEC: a constrained deep embedded clustering
- Author
Amirizadeh, Elham and Boostani, Reza
- Published
- 2021
24. Double-Constrained Consensus Clustering with Application to Online Anti-Counterfeiting
- Author
Claudio Carpineto and Giovanni Romano
- Subjects
semi-supervised consensus clustering, ensemble clustering, constrained clustering, analysis of clustering constraints, online anti-counterfeiting, clustering fraudulent websites, Technology, Engineering (General). Civil engineering (General), TA1-2040, Biology (General), QH301-705.5, Physics, QC1-999, Chemistry, QD1-999
- Abstract
Semi-supervised consensus clustering is a promising strategy to compensate for the subjectivity of clustering and its sensitivity to design factors, with various techniques being recently proposed to integrate domain knowledge and multiple clustering partitions. In this article, we present a new approach that makes double use of domain knowledge, namely to build the initial partitions, as well as to combine them. In particular, we show how to model and integrate must-link and cannot-link constraints into the objective function of a generic consensus clustering (CC) framework that maximizes the similarity between the consensus partition and the input partitions, which have, in turn, been enriched with the same constraints. In addition, borrowing from the theory of functional dependencies, the integrated framework exploits the notions of deductive closure and minimal cover to take full advantage of the logical implication between constraints. Using standard UCI benchmarks, we found that the resulting algorithm, termed CCC (double-constrained consensus clustering), was more effective than plain CC at combining base-constrained partitions, with an average performance improvement of 5.54%. We then argue that CCC is especially well-suited for profiling counterfeit e-commerce websites, as constraints can be acquired by leveraging specific domain features, and demonstrate its potential for detecting affiliate marketing programs. Taken together, our experiments suggest that CCC makes the process of clustering more robust and able to withstand changes in clustering algorithms, datasets, and features, with a remarkable improvement in average performance.
- Published
- 2023
25. DAFi: A directed recursive data filtering and clustering approach for improving and interpreting data clustering identification of cell populations from polychromatic flow cytometry data.
- Author
Lee, Alexandra J, Chang, Ivan, Burel, Julie G, Lindestam Arlehamn, Cecilia S, Mandava, Aishwarya, Weiskopf, Daniela, Peters, Bjoern, Sette, Alessandro, Scheuermann, Richard H, and Qian, Yu
- Subjects
Lymphocytes, Humans, Flow Cytometry, Cluster Analysis, Data Interpretation, Statistical, Pattern Recognition, Automated, Data Analysis, autogating, cell population identification, constrained clustering, data prefiltering, recursive clustering, Bioengineering, Biochemistry and Cell Biology, Immunology
- Abstract
Computational methods for identification of cell populations from polychromatic flow cytometry data are changing the paradigm of cytometry bioinformatics. Data clustering is the most common computational approach to unsupervised identification of cell populations from multidimensional cytometry data. However, interpretation of the identified data clusters is labor-intensive. Certain types of user-defined cell populations are also difficult to identify by fully automated data clustering analysis. Both are roadblocks to a cytometry lab adopting the data clustering approach for cell population identification in routine use. We found that combining recursive data filtering and clustering with constraints converted from the user manual gating strategy can effectively address these two issues. We named this new approach DAFi: Directed Automated Filtering and Identification of cell populations. The design of DAFi preserves the data-driven characteristics of unsupervised clustering for identifying novel cell subsets, but also makes the results interpretable to experimental scientists through mapping and merging the multidimensional data clusters into the user-defined two-dimensional gating hierarchy. The recursive data filtering process in DAFi helped identify small data clusters that are otherwise difficult to resolve by a single run of the data clustering method due to the statistical interference of the irrelevant major clusters. Our experimental results showed that the proportions of the cell populations identified by DAFi, while being consistent with those from expert centralized manual gating, have smaller technical variances across samples than those from individual manual gating analysis and the nonrecursive data clustering analysis. Compared with manual gating segregation, DAFi-identified cell populations avoided the abrupt cut-offs on the boundaries.
DAFi has been implemented to be used with multiple data clustering methods including K-means, FLOCK, FlowSOM, and the ClusterR package. For cell population identification, DAFi supports multiple options including clustering, bisecting, slope-based gating, and reversed filtering to meet various autogating needs from different scientific use cases. © 2018 International Society for Advancement of Cytometry.
- Published
- 2018
26. A Constrained Cluster Analysis with Homogeneity of External Criterion
- Author
Takahashi, Masao, Asakawa, Tomoo, Sato-Ilic, Mika, Howlett, Robert J., Series Editor, Jain, Lakhmi C., Series Editor, and Czarnowski, Ireneusz, editor
- Published
- 2020
27. An Effective and Efficient Constrained Ward’s Hierarchical Agglomerative Clustering Method
- Author
Aljohani, Abeer A., Edirisinghe, Eran A., Lai, Daphne Teck Ching, Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Bi, Yaxin, editor, Bhatia, Rahul, editor, and Kapoor, Supriya, editor
- Published
- 2020
28. Multi Object Tracking for Similar Instances: A Hybrid Architecture
- Author
Fóthi, Áron, Faragó, Kinga B., Kopácsi, László, Milacski, Zoltán Á., Varga, Viktor, Lőrincz, András, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Yang, Haiqin, editor, Pasupa, Kitsuchart, editor, Leung, Andrew Chi-Sing, editor, Kwok, James T., editor, Chan, Jonathan H., editor, and King, Irwin, editor
- Published
- 2020
29. Agglomerative Constrained Clustering Through Similarity and Distance Recalculation
- Author
González-Almagro, Germán, Suarez, Juan Luis, Luengo, Julián, Cano, José-Ramón, García, Salvador, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, de la Cal, Enrique Antonio, editor, Villar Flecha, José Ramón, editor, Quintián, Héctor, editor, and Corchado, Emilio, editor
- Published
- 2020
30. Constrained Clustering via Post-processing
- Author
Nghiem, Nguyen-Viet-Dung, Vrain, Christel, Dao, Thi-Bich-Hanh, Davidson, Ian, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Appice, Annalisa, editor, Tsoumakas, Grigorios, editor, Manolopoulos, Yannis, editor, and Matwin, Stan, editor
- Published
- 2020
31. Mining Constrained Regions of Interest: An Optimization Approach
- Author
Dubray, Alexandre, Derval, Guillaume, Nijssen, Siegfried, Schaus, Pierre, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Appice, Annalisa, editor, Tsoumakas, Grigorios, editor, Manolopoulos, Yannis, editor, and Matwin, Stan, editor
- Published
- 2020
32. A Framework for Deep Constrained Clustering - Algorithms and Advances
- Author
Zhang, Hongjing, Basu, Sugato, Davidson, Ian, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Brefeld, Ulf, editor, Fromont, Elisa, editor, Hotho, Andreas, editor, Knobbe, Arno, editor, Maathuis, Marloes, editor, and Robardet, Céline, editor
- Published
- 2020
33. Improving structural variant clustering to reduce the negative effect of the breakpoint uncertainty problem
- Author
Geryk, Jan, Zinkova, Alzbeta, Zedníková, Iveta, Simková, Halina, Stenzl, Vlastimil, and Korabecna, Marie
- Subjects
Structural variants; Breakpoints uncertainty problem; Whole genome sequencing; Mendelian inheritance error; Constrained clustering; Computer applications to medicine. Medical informatics; R858-859.7; Biology (General); QH301-705.5
- Abstract
Background: Structural variants (SVs) represent an important source of genetic variation. One of the most critical problems in their detection is breakpoint uncertainty, associated with the inability to determine their exact genomic position. Breakpoint uncertainty is a characteristic issue of structural variants detected via short-read sequencing methods and complicates subsequent population analyses. The commonly used heuristic strategy reduces this issue by clustering/merging nearby structural variants of the same type before the data from individual samples are merged. Results: We compared the two most commonly used dissimilarity measures for SV clustering in terms of Mendelian inheritance errors (MIE), kinship prediction, and deviation from Hardy–Weinberg equilibrium. We analyzed the occurrence of Mendelian-inconsistent SV clusters that can be collapsed into one Mendelian-consistent SV as a new measure of dataset consistency. We also developed a new method based on constrained clustering that explicitly identifies these types of clusters. Conclusions: We found that the dissimilarity measure based on the distance between SV breakpoints produces slightly better results than the measure based on SV overlap. This difference is evident in the trivial and corrected clustering strategies, but not in the constrained clustering strategy. However, the constrained clustering strategy provided the best results in all respects, regardless of the dissimilarity measure used.
- Published
- 2021
34. Theoretical analysis of classic and capacity constrained fuzzy clustering.
- Author
Benatti, Kléber A., Pedroso, Lucas G., and Ribeiro, Ademir A.
- Subjects
POINT set theory; A priori
- Abstract
In this paper we present a theoretical analysis of fuzzy centroid-based clustering methods. In addition to the formulation of the classical approaches, we consider constraints that may be useful in some practical applications, such as restrictions on the number of points in each group, and methods that deal with these constraints. We propose a more general formulation of the constrained clustering problem, where each point has an associated weight and the sum of the weights of the points that compose each group is established a priori. For both the classical and the proposed approaches we discuss the existence and uniqueness of solutions to the involved problems, providing mathematical foundations for the established formulas. Preliminary numerical experiments, performed by means of two-dimensional examples, are also presented. [ABSTRACT FROM AUTHOR]
- Published
- 2022
35. 3SHACC: Three stages hybrid agglomerative constrained clustering.
- Author
González-Almagro, Germán, Suárez, Juan Luis, Luengo, Julián, Cano, José-Ramón, and García, Salvador
- Subjects
HIERARCHICAL clustering (Cluster analysis)
- Abstract
Traditionally within the unsupervised learning paradigm, hierarchical and partitional clustering techniques have been shown to produce better results when provided with partial information, leading to renewed attention to this topic. Constrained clustering is a semi-supervised learning problem that combines classic clustering techniques with background knowledge given in the form of a set of constraints. In this paper, we propose to incorporate constraints into the clustering process in three phases: the first phase is devoted to quantifying constraint relevance and learning a metric matrix according to that relevance, the second phase computes similarities between instances by means of the reconstruction coefficient and pairwise distances, and the third phase performs agglomerative hierarchical clustering with a reward-style stepped affinity function favoring merges that satisfy the highest possible number of constraints. Experimental results, supported by Bayesian statistical testing, show a consistent improvement in favor of our proposal over previous approaches to the constrained clustering problem. [ABSTRACT FROM AUTHOR]
- Published
- 2022
36. Identification of piecewise affine model for batch processes based on constrained clustering technique.
- Author
Liu, Jiaxin, Xu, Zuhua, Zhao, Jun, and Shao, Zhijiang
- Subjects
BATCH processing; GOLDEN ratio; K-means clustering; PHASE partition
- Abstract
In this paper, a novel method for identifying piecewise affine (PWA) models of batch processes based on a constrained clustering technique is proposed. In traditional clustering-based identification approaches, data classification and region partition are performed separately, so an inseparability problem usually occurs in the partition phase. The proposed method uses a constrained K-means clustering algorithm to perform data classification and region partition simultaneously, which is accomplished by imposing complete and non-overlapping partition constraints on the clustering optimization problem. We employ a greedy iterative approach combined with golden section search to solve the constrained clustering problem efficiently. This method can greatly improve the accuracy of the identified PWA model. Finally, we demonstrate the effectiveness of the proposed identification method.
● An identification method for PWA models of batch processes based on a constrained clustering technique is proposed.
● It uses a constrained K-means clustering algorithm to perform data classification and region partition simultaneously.
● We employ a greedy iterative approach combined with golden section search to solve the constrained clustering problem. [ABSTRACT FROM AUTHOR]
- Published
- 2022
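The abstract above names golden section search as part of the solver. As a minimal, generic sketch (not the authors' constrained K-means solver, whose details are not given here), golden section search minimizes a unimodal one-dimensional function by shrinking a bracketing interval by the factor 1/φ ≈ 0.618 at each step while reusing one interior evaluation. The function name `golden_section_min` and the default tolerance are illustrative assumptions.

```python
import math

# 1/phi, the golden-ratio shrink factor (about 0.618).
INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0


def golden_section_min(f, a, b, tol=1e-8):
    """Return an approximate minimizer of a unimodal function f on [a, b].

    Hypothetical helper for illustration; assumes f has a single minimum in [a, b].
    """
    # Two interior points placed symmetrically by the golden ratio.
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]: the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]: the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return (a + b) / 2.0
```

For example, `golden_section_min(lambda x: (x - 2) ** 2, 0.0, 5.0)` returns a value very close to 2.0. Because one interior point is carried over, each iteration costs a single new function evaluation.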
37. Weighted sparse simplex representation: a unified framework for subspace clustering, constrained clustering, and active learning.
- Author
Peng, Hankui and Pavlidis, Nicos G.
- Subjects
ACTIVE learning; IMAGE recognition (Computer vision)
- Abstract
Spectral-based subspace clustering methods have proved successful in many challenging applications such as gene sequencing, image recognition, and motion segmentation. In this work, we first propose a novel spectral-based subspace clustering algorithm that seeks to represent each point as a sparse convex combination of a few nearby points. We then extend the algorithm to a constrained clustering and active learning framework. Our motivation for developing such a framework stems from the fact that typically either a small amount of labelled data is available in advance, or it is possible to label some points at a cost. The latter scenario is typically encountered in the process of validating a cluster assignment. Extensive experiments on simulated and real datasets show that the proposed approach is effective and competitive with state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2022
38. Iterative and Semi-Supervised Design of Chatbots Using Interactive Clustering.
- Author
Schild, Erwan, Durantin, Gautier, Lamirel, Jean-Charles, and Miconi, Florian
- Abstract
Chatbots represent a promising tool to automate the processing of requests in a business context. However, despite major progress in natural language processing technologies, constructing a dataset deemed relevant by business experts is a manual, iterative, and error-prone process. To assist these experts during modelling and labelling, the authors propose an active learning methodology coined interactive clustering. It relies on interactions between computer-guided segmentation of data into intents and response-driven human annotations that impose constraints on clusters to improve relevance. This article applies interactive clustering to a realistic dataset and determines the optimal settings required for relevant segmentation with a minimal number of annotations. The usability of the method is discussed in terms of computation time and the achieved compromise between business relevance and classification performance during training. In this context, interactive clustering appears to be a suitable methodology, combining human and computer initiatives to efficiently develop a usable chatbot. [ABSTRACT FROM AUTHOR]
- Published
- 2022
39. A review on declarative approaches for constrained clustering.
- Author
Dao, Thi-Bich-Hanh and Vrain, Christel
- Subjects
CONSTRAINT programming; LINEAR programming; INTEGER programming; ALGORITHMS
- Abstract
Clustering is an important Machine Learning task, which aims at discovering the implicit structure of data. Applying a clustering algorithm is easy, but since clustering is an unsupervised task, tuning it so that the results match the experts' expectations is much less obvious. To overcome this, expert knowledge can be integrated into a clustering process; this is generally formalized as constraints on the desired output, thus leading to constrained clustering. There are two lines of research for clustering: distance-based clustering, where data are grouped into clusters according to their dissimilarity, and conceptual clustering, where a cluster must be a concept, that is, a set of objects and a set of properties that describe them. This second approach relies on Formal Concept Analysis and benefits from advances in Pattern Mining. [66] showed the value of declarative approaches for pattern mining and opened a new research direction for clustering based on declarative frameworks, such as Integer Linear Programming, Constraint Programming or SAT. This has several advantages: finding a global optimum, integrating different kinds of constraints, even complex ones, into a clustering process, and even combining conceptual and distance-based clustering. In this paper we present an inventory of constraints and a survey of declarative methods for constrained clustering. [ABSTRACT FROM AUTHOR]
- Published
- 2024
40. Fuzzy clustering with capacity constraints: Algorithm, convergence analysis and numerical experiments.
- Author
Benatti, Kléber A., Pedroso, Lucas G., and Ribeiro, Ademir A.
- Subjects
CONSTRAINT algorithms; NUMERICAL analysis; MATHEMATICAL optimization; PROBLEM solving; LINEAR systems
- Abstract
In this paper we study the fuzzy clustering problem with capacity constraints. Although the fuzzy clustering approach is widely encountered in the literature, the inclusion of capacity constraints is recent and has several practical applications. We propose a general formulation of the clustering problem, where each point has an associated weight and the sum of the weights of the points that compose each group is established a priori. We discuss the existence of solutions to the involved problems, providing a mathematical foundation for the established formulas. In addition, we propose a practical algorithm for solving this problem and present its convergence analysis. This algorithm follows an alternating minimization scheme, wherein each iteration addresses the problem first in terms of the probabilities $u_{ij}$ of each point $x_j$ belonging to each cluster $i$, and subsequently finds the positions of the centroids $c_i$, for $i = 1, \ldots, g$ and $j = 1, \ldots, n$. This procedure is K-means-like, with the distinction that, as a point does not belong exclusively to one group, the computation of $u_{ij}$ requires optimization techniques. In our case, this involves solving a linear system derived from the Karush–Kuhn–Tucker (KKT) conditions. To validate our algorithm, we present numerical tests with synthetic and real-world data that demonstrate its performance on the given problems. Since the proposal solved these numerical tests successfully within a reasonable computational time, it can be considered a valuable resource for addressing real-world applications. [ABSTRACT FROM AUTHOR]
- Published
- 2024
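The alternating scheme described in entry 40 can be illustrated with classic, unconstrained fuzzy c-means, which uses the same notation ($x_j$, $u_{ij}$, $c_i$, $g$ clusters) plus a fuzzifier $m > 1$, but has a closed-form membership update. This is a sketch under that simplifying assumption: it omits the capacity constraints, which according to the abstract require solving a KKT-derived linear system in the membership step. The function name, the default `m=2.0`, and the fixed iteration count are assumptions for illustration.

```python
import math
import random


def fuzzy_c_means(points, g, m=2.0, iters=100, seed=0, eps=1e-12):
    """Classic (unconstrained) fuzzy c-means by alternating minimization.

    Hypothetical helper for illustration; no capacity constraints are enforced.
    points: list of equal-length coordinate tuples x_j, j = 1..n
    g:      number of clusters
    m:      fuzzifier, must be > 1
    Returns (centroids c_i, memberships u[i][j]).
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Initialize the centroids at g distinct, randomly chosen data points.
    centroids = [list(p) for p in rng.sample(points, g)]
    u = [[0.0] * n for _ in range(g)]
    for _ in range(iters):
        # Step 1: update memberships u_ij with the centroids held fixed.
        for j, x in enumerate(points):
            # Floor distances at eps to avoid dividing by zero.
            d = [max(math.dist(x, c), eps) for c in centroids]
            for i in range(g):
                u[i][j] = 1.0 / sum(
                    (d[i] / d[k]) ** (2.0 / (m - 1.0)) for k in range(g)
                )
        # Step 2: update centroids c_i with the memberships held fixed.
        for i in range(g):
            w = [u[i][j] ** m for j in range(n)]
            total = sum(w)
            centroids[i] = [
                sum(w[j] * points[j][t] for j in range(n)) / total
                for t in range(dim)
            ]
    return centroids, u
```

For every point $x_j$, the memberships $u_{ij}$ sum to 1 over the $g$ clusters. A capacity-constrained variant would replace Step 1 with a constrained optimization, which this sketch does not attempt.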
41. A geospatial clustering algorithm and its integration into a techno-economic rural electrification planning model.
- Author
Torres-Pérez, Mirelys, Domínguez, Javier, Arribas, Luis, Amador, Julio, Ciller, Pedro, and González-García, Andrés
- Published
- 2024
42. Constrained Density-Based Spatial Clustering of Applications with Noise (DBSCAN) using hyperparameter optimization.
- Author
Kim, Jongwon, Lee, Hyeseon, and Ko, Young Myoung
- Abstract
This article proposes a hyperparameter optimization method for density-based spatial clustering of applications with noise (DBSCAN) with constraints, termed HC-DBSCAN. While DBSCAN is effective at creating non-convex clusters, it cannot limit the number of clusters, a limitation that is difficult to address with simple adjustments or heuristic methods. We approach constrained DBSCAN as an optimization problem and solve it using a customized alternating direction method of multipliers Bayesian optimization (ADMMBO). Our custom ADMMBO enables HC-DBSCAN to reuse clustering results for enhanced computational efficiency, handle integer-valued parameters, and incorporate constraint functions that account for the degree of violations to improve clustering performance. Furthermore, we propose an evaluation metric, the penalized Davies–Bouldin score, with a computational cost of O(N). This metric aims to mitigate the high computational cost associated with existing metrics and to manage noise instances in DBSCAN efficiently. Numerical experiments demonstrate that HC-DBSCAN, equipped with the proposed metric, generates high-quality non-convex clusters and outperforms benchmark methods on both simulated and real datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
43. Incremental Constrained Random Walk Clustering
- Author
He, Ping, Jing, Tianyu, Xu, Xiaohua, Lin, Huihui, Liao, Zheng, Fan, Baichuan, Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Bhatia, Sanjiv K., editor, Tiwari, Shailesh, editor, Mishra, Krishn K., editor, and Trivedi, Munesh C., editor
- Published
- 2019
44. Modernizing Deep Unsupervised Learning with Human Experience
- Author
Zhang, Hongjing
- Subjects
Computer science; Anomaly Detection; Constrained Clustering; Explainable AI; Fair Machine Learning; Representation Learning
- Abstract
Deep unsupervised learning has emerged as a promising alternative to supervised approaches. Supervised learning needs a tremendous amount of information in the form of annotations on specific pre-defined tasks; in contrast, human learning requires far fewer annotations and is flexible. Recent research has explored different deep unsupervised learning algorithms that leverage massive unlabeled data for applications beyond the supervised learning setting. While recent deep unsupervised learning works have shown success in representation learning, clustering, and anomaly detection, many challenges remain unsolved. For example, how can the quality of learned representations used for downstream applications be improved (the quality of learned representations challenge)? How can deep unsupervised learning model predictions be interpreted and understood (the explainability challenge)? Is there any risk of bias in deep unsupervised learning applications (the bias and fairness challenge)? To gain insights into these challenges, we propose a broad range of novel techniques, each of which injects human-level knowledge into deep unsupervised learning. Specifically, this dissertation presents five approaches. The first two address the quality of representation challenge, the third the explainability challenge, and the last two the bias and fairness challenge. Our first formulation introduces a deep constrained clustering framework that enhances clustering performance via various constraints. Our second formulation is a self-supervised representation learning framework that automatically discovers and differentiates categories. The third formulation simultaneously performs representation learning for clustering and describes the generated clusters with semantic tags associated with the clustered instances.
Our fourth formulation proposes a novel deep fair anomaly detection architecture that uses adversarial learning to inject human fairness rules. Finally, our fifth formulation enforces disparate impact rules into deep clustering models via minimal modification learning. These methods are unified in modernizing deep unsupervised learning with different types of human guidance.
- Published
- 2022
45. A quantitative comparison of regionalization methods.
- Author
Aydin, Orhun, Janikas, Mark V., Assunção, Renato Martins, and Lee, Ting-Hwan
- Subjects
ECOLOGICAL regions; CURRICULUM; DATA mining
- Abstract
Regionalization is the task of partitioning a set of contiguous areas into spatial clusters or regions. The theoretical and empirical literature focusing on regionalization is extensive, yet few quantitative comparisons have been conducted. We present a simulation study and explore the quality of frequently used and state-of-the-art regionalization algorithms, namely AZP, AZP-SA, AZPTabu, ARISEL, REDCAP, and SKATER, where the number of regions is an exogenous variable. The simulated benchmark data set consists of model realizations that represent various complexities in spatial data. Model families are defined with respect to regions' shapes, value-mixing between regions, and the number of underlying spatial clusters. We evaluate the performance of the different regionalization methods for families of realizations using internal and external measures of regionalization quality. A large number of regionalization quality metrics expose a detailed profile of the analyzed methods' strengths and weaknesses. We investigate the computational efficiency of every method as a function of the number of spatial units studied. We summarize results for different region families and discuss circumstances that make a certain method more desirable. We illustrate the implications of different regionalization algorithms for defining ecological regions for the conterminous US and compare them against expert-defined ecoregions. [ABSTRACT FROM AUTHOR]
- Published
- 2021
46. Interactive Steering of Hierarchical Clustering.
- Author
Yang, Weikai, Wang, Xiting, Lu, Jie, Dou, Wenwen, and Liu, Shixia
- Subjects
HIERARCHICAL clustering (Cluster analysis); ANT algorithms; DATA distribution
- Abstract
Hierarchical clustering is an important technique to organize big data for exploratory data analysis. However, existing one-size-fits-all hierarchical clustering methods often fail to meet the diverse needs of different users. To address this challenge, we present an interactive steering method to visually supervise constrained hierarchical clustering by utilizing both public knowledge (e.g., Wikipedia) and private knowledge from users. The novelty of our approach includes 1) automatically constructing constraints for hierarchical clustering using knowledge (knowledge-driven) and intrinsic data distribution (data-driven), and 2) enabling the interactive steering of clustering through a visual interface (user-driven). Our method first maps each data item to the most relevant items in a knowledge base. An initial constraint tree is then extracted using the ant colony optimization algorithm. The algorithm balances the tree width and depth and covers the data items with high confidence. Given the constraint tree, the data items are hierarchically clustered using evolutionary Bayesian rose tree. To clearly convey the hierarchical clustering results, an uncertainty-aware tree visualization has been developed to enable users to quickly locate the most uncertain sub-hierarchies and interactively improve them. The quantitative evaluation and case study demonstrate that the proposed approach facilitates the building of customized clustering trees in an efficient and effective manner. [ABSTRACT FROM AUTHOR]
- Published
- 2021
47. Active Pairwise Constraint Learning in Constrained Time-Series Clustering for Crop Mapping from Airborne SAR Imagery
- Author
Qin, Xingli, Zhao, Lingli, Yang, Jie, Li, Pingxiang, Wu, Bingfang, Sun, Kaimin, and Xu, Yubin
- Subjects
synthetic aperture radar (SAR); crop mapping; time-series images; constrained clustering; active constraint learning; Science
- Abstract
Airborne SAR is an important data source for crop mapping and has important applications in agricultural monitoring and food safety. However, the incidence-angle effects of airborne SAR imagery decrease the crop mapping accuracy. An active pairwise constraint learning method (APCL) is proposed for constrained time-series clustering to address this problem. APCL constructs two types of instance-level pairwise constraints based on the incidence angles of the samples and a non-iterative batch-mode active selection scheme: the must-link constraint, which links two objects of the same crop type with large differences in backscattering coefficients and the shapes of time-series curves; the cannot-link constraint, which links two objects of different crop types with only small differences in the values of backscattering coefficients. Experiments were conducted using 12 time-series images with incidence angles ranging from 21.2° to 64.3°, and the experimental results prove the effectiveness of APCL in improving crop mapping accuracy. More specifically, when using dynamic time warping (DTW) as the similarity measure, the kappa coefficient obtained by APCL was increased by 9.5%, 8.7%, and 5.2% compared to the results of the three other methods. It provides a new solution for reducing the incidence-angle effects in the crop mapping of airborne SAR time-series images.
- Published
- 2022
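Entry 47 relies on the standard instance-level pairwise constraints: a must-link pair should share a cluster, and a cannot-link pair should not. A minimal sketch, assuming a hard clustering given as a list of integer labels and constraints given as pairs of instance indices, counts how many constraints of each kind a labeling violates. The function name is hypothetical, and this is only a bookkeeping aid, not the APCL method itself.

```python
def constraint_violations(labels, must_link, cannot_link):
    """Return (must-link violations, cannot-link violations) for a hard labeling.

    Hypothetical helper for illustration.
    labels:      labels[k] is the cluster id assigned to instance k
    must_link:   iterable of (a, b) index pairs that should share a cluster
    cannot_link: iterable of (a, b) index pairs that should be in different clusters
    """
    # A must-link pair is violated when its two instances land in different clusters.
    ml_violated = sum(1 for a, b in must_link if labels[a] != labels[b])
    # A cannot-link pair is violated when its two instances land in the same cluster.
    cl_violated = sum(1 for a, b in cannot_link if labels[a] == labels[b])
    return ml_violated, cl_violated
```

For example, `constraint_violations([0, 0, 1, 1], [(0, 1), (1, 2)], [(0, 3), (2, 3)])` returns `(1, 1)`: the must-link pair (1, 2) spans two clusters, and the cannot-link pair (2, 3) shares cluster 1.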
48. Improving structural variant clustering to reduce the negative effect of the breakpoint uncertainty problem.
- Author
Geryk, Jan, Zinkova, Alzbeta, Zedníková, Iveta, Simková, Halina, Stenzl, Vlastimil, and Korabecna, Marie
- Subjects
HEREDITY; GENETIC variation; WHOLE genome sequencing
- Abstract
Background: Structural variants (SVs) represent an important source of genetic variation. One of the most critical problems in their detection is breakpoint uncertainty, associated with the inability to determine their exact genomic position. Breakpoint uncertainty is a characteristic issue of structural variants detected via short-read sequencing methods and complicates subsequent population analyses. The commonly used heuristic strategy reduces this issue by clustering/merging nearby structural variants of the same type before the data from individual samples are merged. Results: We compared the two most commonly used dissimilarity measures for SV clustering in terms of Mendelian inheritance errors (MIE), kinship prediction, and deviation from Hardy–Weinberg equilibrium. We analyzed the occurrence of Mendelian-inconsistent SV clusters that can be collapsed into one Mendelian-consistent SV as a new measure of dataset consistency. We also developed a new method based on constrained clustering that explicitly identifies these types of clusters. Conclusions: We found that the dissimilarity measure based on the distance between SV breakpoints produces slightly better results than the measure based on SV overlap. This difference is evident in the trivial and corrected clustering strategies, but not in the constrained clustering strategy. However, the constrained clustering strategy provided the best results in all respects, regardless of the dissimilarity measure used. [ABSTRACT FROM AUTHOR]
- Published
- 2021
49. Clustering in the presence of side information: a non-linear approach
- Author
Abin, Ahmad Ali
- Published
- 2019
50. Expert-driven trace clustering with instance-level constraints.
- Author
De Koninck, Pieter, Nelissen, Klaas, vanden Broucke, Seppe, Baesens, Bart, Snoeck, Monique, and De Weerdt, Jochen
- Subjects
PROCESS mining
- Abstract
Within the field of process mining, several different trace clustering approaches exist for partitioning traces or process instances into similar groups. Typically, this partitioning is based on certain patterns or similarity between the traces, or driven by the discovery of a process model for each cluster. The main drawback of these techniques, however, is that their solutions are usually hard for domain experts to evaluate or justify. In this paper, we present two constrained trace clustering techniques that are capable of leveraging expert knowledge in the form of instance-level constraints. In an extensive experimental evaluation using two real-life datasets, we show that our novel techniques are indeed capable of producing clustering solutions that are more justifiable without a substantial negative impact on their quality. [ABSTRACT FROM AUTHOR]
- Published
- 2021
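Entry 50 uses instance-level constraints to guide clustering. One well-known way to enforce such constraints while assigning points is the COP-KMeans rule: each point goes to the nearest centroid that would not violate a must-link or cannot-link constraint with an already-assigned point, and the pass fails if no such centroid exists. The sketch below is a generic illustration of that assignment rule only, not the trace clustering techniques of the cited paper. The function name and the data layout (coordinate tuples, index pairs for constraints) are assumptions.

```python
import math


def cop_assign(points, centroids, must_link, cannot_link):
    """One COP-KMeans-style assignment pass.

    Hypothetical helper for illustration. Returns a list of cluster indices,
    or None if some point cannot be placed without violating a constraint.
    """
    n = len(points)
    # Build symmetric adjacency sets for both constraint types.
    ml = {j: set() for j in range(n)}
    cl = {j: set() for j in range(n)}
    for a, b in must_link:
        ml[a].add(b)
        ml[b].add(a)
    for a, b in cannot_link:
        cl[a].add(b)
        cl[b].add(a)

    labels = [None] * n
    for j, x in enumerate(points):
        # Try centroids from nearest to farthest.
        order = sorted(range(len(centroids)), key=lambda i: math.dist(x, centroids[i]))
        for i in order:
            # Must-link partners must be unassigned or already in cluster i;
            # cannot-link partners must not be in cluster i.
            ml_ok = all(labels[o] in (None, i) for o in ml[j])
            cl_ok = all(labels[o] != i for o in cl[j])
            if ml_ok and cl_ok:
                labels[j] = i
                break
        else:
            return None  # no feasible cluster for point j
    return labels
```

Because points are processed in a fixed order and only checked against earlier assignments, the result depends on that order and the greedy pass can fail even when a feasible labeling exists; that is a known property of this rule.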
Discovery Service for Jio Institute Digital Library