153 results for "Ludmila I. Kuncheva"
Search Results
2. Combination of Object Tracking and Object Detection for Animal Recognition
- Author
-
Francis Williams, Ludmila I. Kuncheva, Juan J. Rodríguez, and Samuel L. Hennessey
- Published
- 2022
- Full Text
- View/download PDF
3. Basic Ensembles of Vanilla-Style Deep Learning Models Improve Liver Segmentation From CT Images
- Author
-
A. Emre Kavur, Ludmila I. Kuncheva, and M. Alper Selver
- Published
- 2022
- Full Text
- View/download PDF
4. Prototype Classifiers and the Big Fish: The Case of Prototype (Instance) Selection
- Author
-
Ludmila I. Kuncheva
- Subjects
0209 industrial biotechnology ,Artificial neural network ,Computer Networks and Communications ,business.industry ,Computer science ,Human Factors and Ergonomics ,02 engineering and technology ,Excuse ,Computer Science Applications ,Human-Computer Interaction ,Statistical classification ,020901 industrial engineering & automation ,Control and Systems Engineering ,Pattern recognition (psychology) ,0202 electrical engineering, electronic engineering, information engineering ,Cybernetics ,Fish ,020201 artificial intelligence & image processing ,Artificial intelligence ,Instance selection ,business - Abstract
Jim Bezdek once told me, "Write just the same way you talk!" That is my excuse for the unashamedly colloquial text to follow.
- Published
- 2020
- Full Text
- View/download PDF
5. An experiment on animal re-identification from video
- Author
-
Ludmila I. Kuncheva, José Luis Garrido-Labrador, Ismael Ramos-Pérez, Samuel L. Hennessey, and Juan J. Rodríguez
- Subjects
Informática ,Ecology ,Biología ,Applied Mathematics ,Ecological Modeling ,Classification ,Computer science ,Animal re-identification ,Computer Science Applications ,Computational Theory and Mathematics ,Modeling and Simulation ,Computer vision ,Comparative study ,Biology ,Ecology, Evolution, Behavior and Systematics ,Convolutional networks - Abstract
In the face of the global concern about climate change and endangered ecosystems, monitoring individual animals is of paramount importance. Computer vision methods for animal recognition and re-identification from video or image collections are a modern alternative to more traditional but intrusive methods such as tagging or branding. While there are many studies reporting results on various animal re-identification databases, there is a notable lack of comparative studies between different classification methods. In this paper we offer a comparison of 25 classification methods including linear, non-linear and ensemble models, as well as deep learning networks. Since the animal databases are vastly different in characteristics and difficulty, we propose an experimental protocol that can be applied to a chosen data collection. We use a publicly available database of five video clips, each containing multiple identities (9 to 27), where the animals are typically present as a group in each video frame. Our experiment involves five data representations: colour, shape, texture, and two feature spaces extracted by deep learning. In our experiments, simpler models (linear classifiers) and just the colour feature space gave the best classification accuracy, demonstrating the importance of running a comparative study before resorting to complex, time-consuming, and potentially less robust methods.
This work is supported by the UKRI Centre for Doctoral Training in Artificial Intelligence, Machine Learning and Advanced Computing (AIMLAC), funded by grant EP/S023992/1. This work is also supported by the Junta de Castilla y León under project BU055P20 (JCyL/FEDER, UE), and the Ministry of Science and Innovation under project PID2020-119894GB-I00, co-financed through European Union FEDER funds. J.L. Garrido-Labrador is supported through the Consejería de Educación of the Junta de Castilla y León and the European Social Fund through a pre-doctoral grant (EDU/875/2021). I. Ramos-Pérez is supported by a predoctoral grant (BDNS 510149) awarded by the Universidad de Burgos, Spain. J.J. Rodríguez was supported by mobility grant PRX21/00638 of the Spanish Ministry of Universities.
- Published
- 2023
- Full Text
- View/download PDF
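The headline finding above (simple linear models on a plain colour representation can beat heavier machinery) is easy to illustrate. The sketch below uses invented toy data, not the paper's five-video database: each frame is reduced to a mean-RGB feature vector, and identities are assigned by a nearest-centroid rule, arguably the simplest linear classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the colour representation: each frame is reduced to a
# mean-RGB feature vector; identities differ mainly in coat colour.
# (Hypothetical data; the paper's database has 5 videos with 9-27 identities.)
n_ids, frames_per_id = 5, 20
centres = rng.uniform(0, 255, size=(n_ids, 3))   # true mean colour per animal
X = np.vstack([c + rng.normal(0, 10, size=(frames_per_id, 3)) for c in centres])
y = np.repeat(np.arange(n_ids), frames_per_id)

# A linear classifier in the simplest sense: nearest class centroid.
def fit_centroids(X, y):
    return np.array([X[y == k].mean(axis=0) for k in np.unique(y)])

def predict(centroids, X):
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

centroids = fit_centroids(X, y)
acc = (predict(centroids, X) == y).mean()
```

On well-separated colour distributions this trivial model is already accurate, which mirrors the paper's point about trying cheap baselines before complex ones.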
6. Classification and comparison of on-line video summarisation methods
- Author
-
Paria Yousefi, Clare E. Matthews, and Ludmila I. Kuncheva
- Subjects
Similarity (geometry) ,business.industry ,Computer science ,Feature vector ,Frame (networking) ,020207 software engineering ,02 engineering and technology ,computer.software_genre ,Computer Science Applications ,Hardware and Architecture ,Taxonomy (general) ,Line (geometry) ,Pattern recognition (psychology) ,0202 electrical engineering, electronic engineering, information engineering ,Selection (linguistics) ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,Representation (mathematics) ,computer ,Software ,Natural language processing - Abstract
Many methods exist for generating keyframe summaries of videos. However, relatively few methods consider on-line summarisation, where memory constraints mean it is not practical to wait for the full video to be available for processing. We propose a classification (taxonomy) of on-line video summarisation methods based upon their descriptive and distinguishing properties, such as the feature space for frame representation, strategies for grouping time-contiguous frames, and techniques for selecting representative frames. Nine existing on-line methods are presented within the terms of our taxonomy and subsequently compared by testing on two synthetic data sets and a collection of short videos. We find that the success of the methods is largely independent of the techniques for grouping time-contiguous frames and for measuring similarity between frames. On the other hand, decisions about the number of keyframes and the selection mechanism may substantially affect the quality of the summary. Finally, we remark on the difficulty of tuning the parameters of the methods “on-the-fly”, without knowledge of the video duration, dynamics or content.
- Published
- 2019
- Full Text
- View/download PDF
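The on-line setting described above forbids waiting for the whole video. A minimal single-pass selector (an illustration of the constraint only, not one of the nine surveyed methods) keeps a frame whenever its feature vector drifts far enough from the last kept keyframe:

```python
import numpy as np

# Minimal on-line keyframe selector: keep the first frame; thereafter keep
# any frame further than `threshold` from the last kept keyframe.
def online_keyframes(frames, threshold):
    keys = [0]
    for t in range(1, len(frames)):
        if np.linalg.norm(frames[t] - frames[keys[-1]]) > threshold:
            keys.append(t)
    return keys

# Synthetic "video": three visually distinct segments in a 2-D feature space.
frames = np.vstack([
    np.full((10, 2), 0.0),
    np.full((10, 2), 5.0),
    np.full((10, 2), 10.0),
])
summary = online_keyframes(frames, threshold=3.0)  # one keyframe per segment: [0, 10, 20]
```

Note that the summary is produced frame by frame, with O(1) memory beyond the kept keyframes, which is exactly the constraint that rules out off-line methods.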
7. Restricted Set Classification with prior probabilities: A case study on chessboard recognition
- Author
-
Ludmila I. Kuncheva and James H.V. Constance
- Subjects
business.industry ,Computer science ,Pattern recognition ,02 engineering and technology ,01 natural sciences ,010305 fluids & plasmas ,Artificial Intelligence ,0103 physical sciences ,Signal Processing ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,Classifier (UML) ,Software - Abstract
In the Restricted Set Classification approach (RSC), a set of instances must be labelled simultaneously into a given number of classes, while observing an upper limit on the number of instances from each class. In this study we expand RSC by incorporating prior probabilities for the classes and demonstrate the improvement in classification accuracy gained by doing so. As a case study, we chose the challenging task of recognising the pieces on a chessboard from top-view images, without any previous knowledge of the game. This task fits elegantly into the RSC approach as the number of pieces on the board is limited, and each class (type of piece) may have only a fixed number of instances. We prepared an image dataset by sampling from existing competition games, arranging the pieces on the chessboard, and taking top-view snapshots. Using the grey-level intensities of each square as features, we applied single and ensemble classifiers within the RSC approach. Our results demonstrate that including prior probabilities calculated from existing chess games improves the RSC classification accuracy, which, in its own right, is better than the accuracy of the same classifier applied independently.
- Published
- 2018
- Full Text
- View/download PDF
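The restricted-set idea with priors can be reconstructed from the abstract as a small sketch. The scores, capacities and priors below are invented, and an exhaustive search over label assignments stands in for an efficient solver; the point is only to show how the capacity constraint overrules the independent argmax.

```python
import itertools
import numpy as np

# Illustrative reconstruction of RSC with priors (not the paper's code):
# label a whole set of instances at once, maximising the joint log-score
# subject to a per-class capacity, with class log-priors folded in.
def rsc_label(log_post, capacity, log_prior):
    n, c = log_post.shape
    scores = log_post + log_prior            # broadcast priors over instances
    best, best_labels = -np.inf, None
    for labels in itertools.product(range(c), repeat=n):
        counts = np.bincount(labels, minlength=c)
        if np.any(counts > capacity):        # respect the restricted set
            continue
        s = scores[np.arange(n), labels].sum()
        if s > best:
            best, best_labels = s, labels
    return list(best_labels)

# Chess-flavoured toy case: two pieces, classes (pawn, king), only one king.
log_post = np.log(np.array([[0.4, 0.6],     # instance 0: pawn 0.4, king 0.6
                            [0.3, 0.7]]))   # instance 1: pawn 0.3, king 0.7
capacity = np.array([2, 1])                 # at most 2 pawns, 1 king
log_prior = np.log(np.array([0.5, 0.5]))
labels = rsc_label(log_post, capacity, log_prior)
```

Here the independent argmax would label both instances "king"; the restricted search returns `[0, 1]` instead, keeping only the stronger king claim.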
8. Edited nearest neighbour for selecting keyframe summaries of egocentric videos
- Author
-
Paria Yousefi, Ludmila I. Kuncheva, and Jurandy Almeida
- Subjects
Ground truth ,Computer science ,business.industry ,Feature vector ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Nearest neighbour ,020207 software engineering ,Pattern recognition ,02 engineering and technology ,Convolutional neural network ,Signal Processing ,0202 electrical engineering, electronic engineering, information engineering ,Media Technology ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Artificial intelligence ,Electrical and Electronic Engineering ,business ,Classifier (UML) - Abstract
A keyframe summary of a video must be concise, comprehensive and diverse. Current video summarisation methods may not be able to enforce diversity of the summary if the events have highly similar visual content, as is the case of egocentric videos. We cast the problem of selecting a keyframe summary as a problem of prototype (instance) selection for the nearest neighbour classifier (1-nn). Assuming that the video is already segmented into events of interest (classes), and represented as a dataset in some feature space, we propose a Greedy Tabu Selector algorithm (GTS) which picks one frame to represent each class. An experiment with the UT (Egocentric) video database and seven feature representations illustrates the proposed keyframe summarisation method. GTS leads to improved match to the user ground truth compared to the closest-to-centroid baseline summarisation method. Best results were obtained with feature spaces obtained from a convolutional neural network (CNN).
- Published
- 2018
- Full Text
- View/download PDF
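Casting keyframe selection as prototype selection for the 1-nn classifier, as the abstract does, can be sketched with a plain greedy selector on synthetic event data. This is a simplification of the paper's Greedy Tabu Selector (GTS), which adds a tabu mechanism; only the one-prototype-per-event structure is kept here.

```python
import numpy as np

# Accuracy of the 1-nn rule when only the chosen prototypes are retained.
def one_nn_accuracy(prototypes, proto_labels, X, y):
    d = np.linalg.norm(X[:, None, :] - prototypes[None, :, :], axis=2)
    return (proto_labels[d.argmin(axis=1)] == y).mean()

# Greedy sketch: for each event (class), pick the frame that maximises
# 1-nn accuracy over all frames, given the prototypes chosen so far.
def greedy_prototypes(X, y):
    chosen = []
    for k in np.unique(y):
        idx = np.flatnonzero(y == k)
        best_i, best_acc = idx[0], -1.0
        for i in idx:
            cand = chosen + [i]
            acc = one_nn_accuracy(X[cand], y[cand], X, y)
            if acc > best_acc:
                best_i, best_acc = i, acc
        chosen.append(best_i)
    return chosen

# Synthetic video: three events, 15 frames each, in a 4-D feature space.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.3, size=(15, 4)) for m in (0.0, 2.0, 4.0)])
y = np.repeat([0, 1, 2], 15)
protos = greedy_prototypes(X, y)
```

The result is one representative frame per event, chosen for its value as a 1-nn prototype rather than for mere centrality.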
9. Random Balance ensembles for multiclass imbalance learning
- Author
-
Ludmila I. Kuncheva, Álvar Arnaiz-González, Juan J. Rodríguez, and José-Francisco Díez-Pastor
- Subjects
Informática ,Information Systems and Management ,Training set ,Boosting (machine learning) ,Computer science ,business.industry ,02 engineering and technology ,Imbalanced data ,Machine learning ,computer.software_genre ,Management Information Systems ,Multiclass classification ,Artificial Intelligence ,Classifier ensembles ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,computer ,Classifier (UML) ,Software - Abstract
Random Balance strategy (RandBal) has recently been proposed for constructing classifier ensembles for imbalanced, two-class data sets. In RandBal, each base classifier is trained with a sample of the data with a random class prevalence, independent of the a priori distribution. Hence, for each sample, one of the classes will be undersampled while the other will be oversampled. RandBal can be applied on its own or can be combined with any other ensemble method. One particularly successful variant is RandBalBoost, which integrates Random Balance and boosting. Encouraged by the success of RandBal, this work proposes two approaches which extend RandBal to multiclass imbalance problems. Multiclass imbalance implies that at least two classes have substantially different proportions of instances. In the first approach proposed here, termed Multiple Random Balance (MultiRandBal), we deal with all classes simultaneously. The training data for each base classifier are sampled with random class proportions. The second approach we propose decomposes the multiclass problem into two-class problems using one-vs-one or one-vs-all, and builds an ensemble of RandBal ensembles. We call the two versions of the second approach OVO-RandBal and OVA-RandBal, respectively. These two approaches were chosen because they are the most straightforward extensions of RandBal for multiple classes. Our main objective is to evaluate both approaches for multiclass imbalanced problems. To this end, an experiment was carried out with 52 multiclass data sets. The results suggest that both MultiRandBal and OVO/OVA-RandBal are viable extensions of the original two-class RandBal. Collectively, they consistently outperform acclaimed state-of-the-art methods for multiclass imbalanced problems.
This work was supported by the Ministerio de Economía y Competitividad [http://dx.doi.org/10.13039/501100003329] of the Spanish Government through project TIN2015-67534-P (MINECO/FEDER, UE) and by the Junta de Castilla y León through project BU085P17 (JCyL/FEDER, UE).
- Published
- 2020
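The core MultiRandBal step, as we read the abstract, is sampling each base classifier's training set with random class proportions regardless of the priors. A sketch of that resampling step alone, with invented data (any base classifier could then be trained on the result; the Dirichlet draw is our choice of a random point on the proportion simplex, not necessarily the authors'):

```python
import numpy as np

rng = np.random.default_rng(2)

# Draw a training sample of size n whose class proportions are random,
# over- and under-sampling each class as needed (sampling with replacement).
def random_balance_sample(X, y, n, rng):
    classes = np.unique(y)
    props = rng.dirichlet(np.ones(len(classes)))        # random proportions
    counts = np.maximum(1, np.round(props * n).astype(int))
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == k), size=m, replace=True)
        for k, m in zip(classes, counts)
    ])
    return X[idx], y[idx]

# Severely imbalanced toy data: a 90/9/1 split over three classes.
y = np.repeat([0, 1, 2], [90, 9, 1])
X = y[:, None].astype(float)
Xs, ys = random_balance_sample(X, y, n=60, rng=rng)
```

Every class is present in every sample, but in proportions that change from one ensemble member to the next, which is the source of diversity the method exploits.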
10. Animal reidentification using restricted set classification
- Author
-
Ludmila I. Kuncheva
- Subjects
0106 biological sciences ,Ecology ,Individual animal ,biology ,business.industry ,010604 marine biology & hydrobiology ,Applied Mathematics ,Ecological Modeling ,Pattern recognition ,010603 evolutionary biology ,01 natural sciences ,Convolutional neural network ,Computer Science Applications ,Image (mathematics) ,Set (abstract data type) ,ComputingMethodologies_PATTERNRECOGNITION ,Computational Theory and Mathematics ,Modeling and Simulation ,biology.protein ,Fish ,Artificial intelligence ,Chromatin structure remodeling (RSC) complex ,business ,Classifier (UML) ,Ecology, Evolution, Behavior and Systematics - Abstract
Individual animal recognition and re-identification from still images or video are useful for research in animal behaviour, environment preservation, biology and more. We propose to use Restricted Set Classification (RSC) for classifying multiple animals simultaneously from the same image. Our literature review revealed that this problem has not been solved thus far. We applied RSC on a koi fish video using a convolutional neural network (CNN) as the individual classifier. Our results demonstrate that RSC is significantly better than applying just the CNN, as it eliminates duplicate labels in the same image and improves the overall classification accuracy.
- Published
- 2021
- Full Text
- View/download PDF
11. An experimental evaluation of mixup regression forests
- Author
-
Ludmila I. Kuncheva, Juan J. Rodríguez, Mario Juez-Gil, and Álvar Arnaiz-González
- Subjects
0209 industrial biotechnology ,Artificial neural network ,business.industry ,Computer science ,General Engineering ,Decision tree ,02 engineering and technology ,Machine learning ,computer.software_genre ,Ensemble learning ,Regularization (mathematics) ,Regression ,Computer Science Applications ,Random forest ,Range (mathematics) ,020901 industrial engineering & automation ,Artificial Intelligence ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,computer - Abstract
Over the past few decades, the remarkable prediction capabilities of ensemble methods have been used within a wide range of applications. Maximising the accuracy and the diversity of the base models are the keys to the heightened performance of these methods. One way to achieve diversity when training the base models is to generate artificial/synthetic instances and use them alongside the original instances. Recently, the mixup method was proposed for improving the classification power of deep neural networks (Zhang, Cisse, Dauphin, and Lopez-Paz, 2017). The mixup method generates artificial instances by combining pairs of instances and their labels; these new instances are used for training the neural network, promoting its regularization. In this paper, new regression tree ensembles trained with mixup, which we refer to as Mixup Regression Forests, are presented and tested. The experimental study with 61 datasets showed that the mixup approach improved the results of both Random Forest and Rotation Forest.
- Published
- 2020
- Full Text
- View/download PDF
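The mixup recipe referenced above carries over to regression directly: convex-combine random pairs of instances and their numeric targets. A sketch of the augmentation step only, on invented data (the forest training itself is left to any tree-ensemble library):

```python
import numpy as np

rng = np.random.default_rng(3)

# Mixup for regression: each artificial instance is a convex combination of
# a random pair of originals, with the mixing weight drawn from Beta(a, a).
def mixup(X, y, n_new, alpha, rng):
    i = rng.integers(0, len(X), size=n_new)
    j = rng.integers(0, len(X), size=n_new)
    lam = rng.beta(alpha, alpha, size=n_new)
    X_new = lam[:, None] * X[i] + (1 - lam[:, None]) * X[j]
    y_new = lam * y[i] + (1 - lam) * y[j]
    return X_new, y_new

X = rng.normal(size=(50, 5))
y = X.sum(axis=1)                    # simple linear target
X_mix, y_mix = mixup(X, y, n_new=100, alpha=0.4, rng=rng)
```

For the purely linear target used here the mixed labels remain exactly consistent (`y_mix == X_mix.sum(axis=1)`); with non-linear targets the mixed pairs act as a regulariser instead.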
12. Combining univariate approaches for ensemble change detection in multivariate data
- Author
-
Ludmila I. Kuncheva, Juan J. Rodríguez, and William J. Faithfull
- Subjects
Informática ,Multivariate statistics ,Computer science ,business.industry ,Univariate ,Pattern recognition ,02 engineering and technology ,Class (biology) ,Hardware and Architecture ,Feature (computer vision) ,020204 information systems ,Signal Processing ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Control chart ,Artificial intelligence ,Network intrusion detection ,business ,Software ,Change detection ,Information Systems - Abstract
Detecting change in multivariate data is a challenging problem, especially when class labels are not available. There is a large body of research on univariate change detection, notably in control charts developed originally for engineering applications. We evaluate univariate change detection approaches (including those in the MOA framework) built into ensembles where each member observes a feature in the input space of an unsupervised change detection problem. We present a comparison between the ensemble combinations and three established ‘pure’ multivariate approaches over 96 data sets, and a case study on the KDD Cup 1999 network intrusion detection dataset. We found that ensemble combination of univariate methods consistently outperformed multivariate methods on the four experimental metrics.
This work was supported by project RPG-2015-188 funded by The Leverhulme Trust, UK; by the Spanish Ministry of Economy and Competitiveness through project TIN2015-67534-P; and by the Spanish Ministry of Education, Culture and Sport through Mobility Grant PRX16/00495. The 96 datasets were originally curated for use in the work of Fernández-Delgado et al. [53] and accessed from the personal web page of the author. The KDD Cup 1999 dataset used in the case study was accessed from the UCI Machine Learning Repository [10].
- Published
- 2019
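The ensemble construction described above (one univariate detector per feature, combined by voting) can be sketched with a plain three-sigma control chart as the member detector. The chart is a stand-in chosen for brevity; the paper evaluates established detectors, including those in the MOA framework.

```python
import numpy as np

# Member detector: a z-score control chart that fires when an observation
# leaves the mu +/- limit*sigma band estimated on calm (in-control) data.
def zscore_detector(stream, mu, sigma, limit=3.0):
    return np.abs(stream - mu) > limit * sigma

# Ensemble: one chart per feature; change is flagged where a given
# fraction of the per-feature detectors fire simultaneously.
def ensemble_change(data, mu, sigma, vote=0.5):
    votes = np.column_stack([
        zscore_detector(data[:, f], mu[f], sigma[f]) for f in range(data.shape[1])
    ])
    return votes.mean(axis=1) >= vote

rng = np.random.default_rng(4)
calm = rng.normal(0, 1, size=(100, 6))
mu, sigma = calm.mean(axis=0), calm.std(axis=0)
shifted = rng.normal(5, 1, size=(20, 6))          # mean shift in all features
flags = ensemble_change(np.vstack([calm, shifted]), mu, sigma)
```

The vote threshold is the knob the ensemble view adds: a change that touches only a few features needs a lower threshold, at the cost of more false alarms.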
13. Selective Keyframe Summarisation for Egocentric Videos Based on Semantic Concept Search
- Author
-
Paria Yousefi and Ludmila I. Kuncheva
- Subjects
Vocabulary ,Information retrieval ,Concept search ,Computer science ,business.industry ,media_common.quotation_subject ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Semantic search ,020206 networking & telecommunications ,02 engineering and technology ,Lifelog ,Convolutional neural network ,Compass ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,business ,media_common - Abstract
Large volumes of egocentric video data are being continually collected every day. While the standard video summarisation approach offers all-purpose summaries, here we propose a method for selective video summarisation. The user can query the video with an unlimited vocabulary of terms. The result is a time-tagged summary of keyframes related to the query concept. Our method uses a pre-trained Convolutional Neural Network (CNN) for the semantic search, and visualises the generated summary as a compass. Two commonly used datasets were chosen for the evaluation: UTEgo egocentric video and EDUB lifelog.
- Published
- 2018
- Full Text
- View/download PDF
14. On feature selection protocols for very low-sample-size data
- Author
-
Ludmila I. Kuncheva and Juan J. Rodríguez
- Subjects
0301 basic medicine ,Computer science ,Feature selection ,02 engineering and technology ,03 medical and health sciences ,Discriminative model ,Artificial Intelligence ,0202 electrical engineering, electronic engineering, information engineering ,Statistical hypothesis testing ,Informática ,Experimental protoco ,business.industry ,Pattern recognition ,Cross-validation ,Wide datasets ,030104 developmental biology ,Training/testing ,Sample size determination ,Signal Processing ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,Classifier (UML) ,Software - Abstract
High-dimensional data with very few instances are typical in many application domains. Selecting a highly discriminative subset of the original features is often the main interest of the end user. The widely used feature selection protocol for this type of data consists of two steps. First, features are selected from the data (possibly through cross-validation), and, second, a cross-validation protocol is applied to test a classifier using the selected features. The selected feature set and the testing accuracy are then returned to the user. For the lack of a better option, the same low-sample-size dataset is used in both steps. Questioning the validity of this protocol, we carried out an experiment using 24 high-dimensional datasets, three feature selection methods and five classifier models. We found that the accuracy returned by the above protocol is heavily biased, and we therefore propose an alternative protocol which avoids the contamination by including both steps in a single cross-validation loop. Statistical tests verify that the classification accuracy returned by the proper protocol is significantly closer to the true accuracy (estimated from an independent testing set) compared to that returned by the currently favoured protocol.
This work was supported by project RPG-2015-188 funded by The Leverhulme Trust, UK, and by project TIN2015-67534-P (MINECO/FEDER, UE) funded by the Ministerio de Economía y Competitividad of the Spanish Government and European Union FEDER funds.
- Published
- 2018
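The proper protocol advocated above is easy to state in code: feature selection must happen inside each cross-validation fold, never once on the full data set. A self-contained sketch with a correlation-based selector and a nearest-centroid classifier (both are stand-ins chosen for brevity; the paper tests three selectors and five classifiers):

```python
import numpy as np

rng = np.random.default_rng(5)

# Selector: top-k features by absolute correlation with the labels.
def select_top_k(X, y, k):
    corr = np.abs(np.corrcoef(X.T, y)[:-1, -1])
    return np.argsort(corr)[-k:]

# Classifier: nearest class centroid on the selected features.
def nearest_centroid_acc(Xtr, ytr, Xte, yte):
    cents = np.array([Xtr[ytr == c].mean(axis=0) for c in (0, 1)])
    d = np.linalg.norm(Xte[:, None, :] - cents[None], axis=2)
    return (d.argmin(axis=1) == yte).mean()

# The proper protocol: selection happens INSIDE each fold.
def proper_cv(X, y, k, folds=5):
    idx = np.arange(len(y)); accs = []
    for f in range(folds):
        te = idx[f::folds]; tr = np.setdiff1d(idx, te)
        feats = select_top_k(X[tr], y[tr], k)
        accs.append(nearest_centroid_acc(X[tr][:, feats], y[tr],
                                         X[te][:, feats], y[te]))
    return float(np.mean(accs))

# Wide toy data: 30 instances, 200 pure-noise features (true accuracy ~ 0.5).
X = rng.normal(size=(30, 200))
y = np.repeat([0, 1], 15)
acc = proper_cv(X, y, k=5)
```

On the pure-noise data above the fold-wise protocol reports roughly chance-level accuracy, whereas selecting features once from all 30 instances and then cross-validating would typically report a strongly optimistic figure, which is the bias the paper quantifies.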
15. Instance Selection Improves Geometric Mean Accuracy: A Study on Imbalanced Data Classification
- Author
-
Ludmila I. Kuncheva, José-Francisco Díez-Pastor, Álvar Arnaiz-González, and Iain A. D. Gunn
- Subjects
FOS: Computer and information sciences ,Computer science ,Computer Vision and Pattern Recognition (cs.CV) ,Computer Science - Computer Vision and Pattern Recognition ,Machine Learning (stat.ML) ,Computational intelligence ,02 engineering and technology ,Imbalanced data ,Machine Learning (cs.LG) ,Artificial Intelligence ,Statistics - Machine Learning ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Instance selection ,business.industry ,Pattern recognition ,Ensemble learning ,Computer Science - Learning ,True negative ,020201 artificial intelligence & image processing ,Artificial intelligence ,Geometric mean ,Benchmark data ,business ,Classifier (UML) ,62H30 - Abstract
A natural way of handling imbalanced data is to attempt to equalise the class frequencies and train the classifier of choice on balanced data. For two-class imbalanced problems, the classification success is typically measured by the geometric mean (GM) of the true positive and true negative rates. Here we prove that GM can be improved upon by instance selection, and give the theoretical conditions for such an improvement. We demonstrate that GM is non-monotonic with respect to the number of retained instances, which discourages systematic instance selection. We also show that balancing the distribution frequencies is inferior to a direct maximisation of GM. To verify our theoretical findings, we carried out an experimental study of 12 instance selection methods for imbalanced data, using 66 standard benchmark data sets. The results reveal possible room for new instance selection methods for imbalanced data.
- Published
- 2018
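The GM criterion from the abstract, GM = sqrt(TPR x TNR), rewards balanced performance across the two classes. A worked toy example (invented predictions, for illustration only) shows how removing majority-class influence can raise GM even while plain accuracy falls, which is the kind of effect the paper formalises:

```python
import numpy as np

# Geometric mean of the true positive and true negative rates.
def geometric_mean(y_true, y_pred):
    tpr = np.mean(y_pred[y_true == 1] == 1)
    tnr = np.mean(y_pred[y_true == 0] == 0)
    return np.sqrt(tpr * tnr)

# 90 negatives, 10 positives. A majority-vote classifier vs a (hypothetical)
# classifier obtained after instance selection balanced the training data.
y_true = np.array([0] * 90 + [1] * 10)
majority_vote = np.zeros(100, dtype=int)
balanced_pred = np.array([0] * 80 + [1] * 10 + [0] * 2 + [1] * 8)

gm_before = geometric_mean(y_true, majority_vote)    # TPR = 0, so GM = 0
gm_after = geometric_mean(y_true, balanced_pred)
acc_before = (majority_vote == y_true).mean()        # 0.90
acc_after = (balanced_pred == y_true).mean()         # 0.88
```

Accuracy drops from 0.90 to 0.88, yet GM rises from 0 to about 0.84: the majority-vote classifier is useless on the minority class despite its high accuracy.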
16. Diversity techniques improve the performance of the best imbalance learning ensembles
- Author
-
José F. Díez-Pastor, César García-Osorio, Ludmila I. Kuncheva, and Juan J. Rodríguez
- Subjects
Information Systems and Management ,business.industry ,Computer science ,Credit card fraud ,Ensemble diversity ,computer.software_genre ,Machine learning ,Ensemble learning ,Computer Science Applications ,Theoretical Computer Science ,ComputingMethodologies_PATTERNRECOGNITION ,Ensembles of classifiers ,Artificial Intelligence ,Control and Systems Engineering ,Feature (machine learning) ,Data mining ,Artificial intelligence ,Medical diagnosis ,business ,computer ,Software ,Diversity (business) - Abstract
Many real-life problems can be described as imbalanced, where the number of instances belonging to one of the classes is much larger than the numbers in the other classes. Examples are spam detection, credit card fraud detection and medical diagnosis. Ensembles of classifiers have acquired popularity for this kind of problem thanks to their ability to obtain better results than individual classifiers. The techniques most commonly used by ensembles especially designed to deal with imbalanced problems are, for example, re-weighting, oversampling and undersampling. Other techniques, originally intended to increase ensemble diversity, have not been systematically studied for their effect on imbalanced problems. Among these are Random Oracles, Disturbing Neighbors, Random Feature Weights and Rotation Forest. This paper presents an overview and an experimental study of various ensemble-based methods for imbalanced problems. The methods were tested in their original form and in conjunction with several diversity-increasing techniques, using 84 imbalanced data sets from two well-known repositories. The paper shows that these diversity-increasing techniques significantly improve the performance of ensemble methods for imbalanced problems and provides some guidance on when it is more convenient to use them.
- Published
- 2015
- Full Text
- View/download PDF
17. Random Balance: Ensembles of variable priors classifiers for imbalanced data
- Author
-
Juan J. Rodríguez, José F. Díez-Pastor, César García-Osorio, and Ludmila I. Kuncheva
- Subjects
Information Systems and Management ,Training set ,Computer science ,business.industry ,computer.software_genre ,Machine learning ,Ensemble learning ,Management Information Systems ,Random subspace method ,Data set ,ComputingMethodologies_PATTERNRECOGNITION ,Ensembles of classifiers ,Artificial Intelligence ,AdaBoost ,Artificial intelligence ,Data mining ,business ,computer ,Software - Abstract
Highlights: (1) Proportions of the classes for each ensemble member are chosen randomly. (2) Member training data are sub-sampled and over-sampled through SMOTE. (3) RB-Boost combines Random Balance with AdaBoost.M2. (4) Experiments with 86 data sets demonstrate the advantage of Random Balance.
In Machine Learning, a data set is imbalanced when the class proportions are highly skewed. Imbalanced data sets arise routinely in many application domains and pose a challenge to traditional classifiers. We propose a new approach to building ensembles of classifiers for two-class imbalanced data sets, called Random Balance. Each member of the Random Balance ensemble is trained with data sampled from the training set and augmented by artificial instances obtained using SMOTE. The novelty in the approach is that the proportions of the classes for each ensemble member are chosen randomly. The intuition behind the method is that the proposed diversity heuristic will ensure that the ensemble contains classifiers that are specialized for different operating points on the ROC space, thereby leading to larger AUC compared to other ensembles of classifiers. Experiments have been carried out to test the Random Balance approach by itself, and also in combination with standard ensemble methods. As a result, we propose a new ensemble creation method called RB-Boost which combines Random Balance with AdaBoost.M2. This combination involves enforcing random class proportions in addition to instance re-weighting. Experiments with 86 imbalanced data sets from two well known repositories demonstrate the advantage of the Random Balance approach.
- Published
- 2015
- Full Text
- View/download PDF
18. Pattern Recognition and Classification
- Author
-
Christopher J. Whitaker and Ludmila I. Kuncheva
- Subjects
Computer Science::Machine Learning ,Probabilistic classification ,business.industry ,Supervised learning ,Linear classifier ,Pattern recognition ,Bayes classifier ,Quadratic classifier ,Machine learning ,computer.software_genre ,ComputingMethodologies_PATTERNRECOGNITION ,Margin classifier ,Unsupervised learning ,Artificial intelligence ,business ,computer ,Classifier (UML) ,Mathematics - Abstract
Pattern recognition concerns assigning objects to classes. The objects are described by features (variables or measurements) organized as p-dimensional points in some feature space. A classifier is a formula, an algorithm or a technique that can assign a class label to any given point in the feature space. Pattern recognition comprises supervised learning (predefined class labels) and unsupervised learning (unknown class labels). Supervised learning includes choosing a classifier model, training and testing the classifier using a data set and selecting the relevant features. Unsupervised learning is usually approached by cluster analysis. Multiple classifier systems or classifier ensembles are a recent branch of pattern recognition whereby the outputs of several classifiers (regarded as experts) are combined for improved accuracy. Keywords: classification; discriminant analysis; cluster analysis; feature selection; classifier ensembles
- Published
- 2015
- Full Text
- View/download PDF
19. Budget-Constrained Online Video Summarisation of Egocentric Video Using Control Charts
- Author
-
Clare E. Matthews, Paria Yousefi, and Ludmila I. Kuncheva
- Subjects
Reduction (complexity) ,Ground truth ,Information retrieval ,Video capture ,Computer science ,Event (computing) ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,0202 electrical engineering, electronic engineering, information engineering ,020206 networking & telecommunications ,020201 artificial intelligence & image processing ,Control chart ,02 engineering and technology ,Online video - Abstract
Despite the existence of a large number of approaches for generating summaries from egocentric video, online video summarisation has not been fully explored yet. We present an online video summarisation algorithm to generate keyframe summaries during video capture. Event boundaries are identified using control charts, and a keyframe is subsequently selected for each event. The number of keyframes is restricted from above, which requires a constant review and possible reduction of the cumulatively built summary. The new method was compared against a baseline method and a state-of-the-art online video summarisation method. The evaluation was done on an egocentric video database (Activities of Daily Living (ADL)). The semantic content of the frames in the video was used to evaluate matches with the ground truth. The summaries generated by the proposed method outperform those generated by the two competitors.
- Published
- 2018
- Full Text
- View/download PDF
20. Comparing keyframe summaries of egocentric videos: Closest-to-centroid baseline
- Author
-
Paria Yousefi, Ludmila I. Kuncheva, and Jurandy Almeida
- Subjects
Computer science ,business.industry ,Feature vector ,Feature extraction ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Centroid ,020207 software engineering ,Pattern recognition ,02 engineering and technology ,Convolutional neural network ,Visualization ,Feature (computer vision) ,Histogram ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,ComputingMethodologies_COMPUTERGRAPHICS - Abstract
Evaluation of keyframe video summaries is a notoriously difficult problem. So far, there is no consensus on guidelines, protocols, benchmarks and baseline models. This study contributes in three ways: (1) We propose a new baseline model for creating a keyframe summary, called Closest-to-Centroid (CC), and show that it is a better contestant compared to the two most popular baselines: uniform sampling and choosing the mid-event frame. (2) We also propose a method for matching the visual appearance of keyframes, suitable for comparing summaries of egocentric videos and lifelogging photostreams. (3) We examine 24 image feature spaces (different descriptors) including colour, texture, shape, motion and a feature space extracted by a pre-trained convolutional neural network (CNN). Our results using the four egocentric videos in the UTE database favour low-level shape and colour feature spaces for use with CC.
- Published
- 2017
- Full Text
- View/download PDF
21. Restricted set classification: Who is there?
- Author
-
Juan J. Rodríguez, Aaron S. Jackson, and Ludmila I. Kuncheva
- Subjects
0209 industrial biotechnology ,Class (set theory) ,Posterior probability ,02 engineering and technology ,Image (mathematics) ,Set (abstract data type) ,020901 industrial engineering & automation ,Artificial Intelligence ,Pattern recognition ,0202 electrical engineering, electronic engineering, information engineering ,One-class classification ,Mathematics ,Informática ,Compound decision problem ,Object (computer science) ,Chess pieces classification ,Computer science ,Data set ,Signal Processing ,Pattern recognition (psychology) ,Object classification ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Algorithm ,Software ,Restricted set classification - Abstract
We consider a problem where a set X of N objects (instances) coming from c classes have to be classified simultaneously. A restriction is imposed on X in that the maximum possible number of objects from each class is known, hence we dubbed the problem who-is-there? We compare three approaches to this problem: (1) independent classification whereby each object is labelled in the class with the largest posterior probability; (2) a greedy approach which enforces the restriction; and (3) a theoretical approach which, in addition, maximises the likelihood of the label assignment, implemented through the Hungarian assignment algorithm. Our experimental study consists of two parts. The first part includes a custom-made chess data set where the pieces on the chess board must be recognised together from an image of the board. In the second part, we simulate the restricted set classification scenario using 96 datasets from a recently collated repository (University of Santiago de Compostela, USC). Our results show that the proposed approach (3) outperforms approaches (1) and (2). (Funding: Spanish Ministry of Economy and Competitiveness, project TIN 2015-67534-P.)
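Approach (3) can be illustrated with a brute-force sketch that enumerates all joint label assignments. The paper solves the same maximisation efficiently with the Hungarian assignment algorithm; this toy version is only practical for tiny N and is not the authors' implementation.

```python
import itertools
import math

def restricted_set_labels(posteriors, max_per_class):
    """Choose a joint labelling of all objects maximising the sum of log
    posteriors, subject to the known cap on the number of objects from
    each class. posteriors[i][k] = P(class k | object i)."""
    n = len(posteriors)
    c = len(posteriors[0])
    best, best_ll = None, float("-inf")
    for labels in itertools.product(range(c), repeat=n):
        counts = [labels.count(k) for k in range(c)]
        if any(counts[k] > max_per_class[k] for k in range(c)):
            continue  # violates the who-is-there restriction
        ll = sum(math.log(posteriors[i][labels[i]] + 1e-12) for i in range(n))
        if ll > best_ll:
            best, best_ll = labels, ll
    return list(best)
```

With two objects both leaning towards class 0 but at most one object allowed per class, the maximiser keeps the more confident object in class 0 and moves the other to class 1.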
- Published
- 2017
22. PCA Feature Extraction for Change Detection in Multidimensional Unlabeled Data
- Author
-
Ludmila I. Kuncheva and William J. Faithfull
- Subjects
Contextual image classification ,Computer Networks and Communications ,business.industry ,Computer science ,Dimensionality reduction ,Feature extraction ,Pattern recognition ,Image segmentation ,computer.software_genre ,Computer Science Applications ,Artificial Intelligence ,Feature (computer vision) ,Principal component analysis ,Segmentation ,Data mining ,Artificial intelligence ,business ,computer ,Classifier (UML) ,Software ,Change detection ,Feature detection (computer vision) - Abstract
When classifiers are deployed in real-world applications, it is assumed that the distribution of the incoming data matches the distribution of the data used to train the classifier. This assumption is often incorrect, which necessitates some form of change detection or adaptive classification. While there has been a lot of work on change detection based on the classification error monitored over the course of the operation of the classifier, finding changes in multidimensional unlabeled data is still a challenge. Here, we propose to apply principal component analysis (PCA) for feature extraction prior to the change detection. Supported by a theoretical example, we argue that the components with the lowest variance should be retained as the extracted features because they are more likely to be affected by a change. We chose a recently proposed semiparametric log-likelihood change detection criterion that is sensitive to changes in both mean and variance of the multidimensional distribution. An experiment with 35 datasets and an illustration with a simple video segmentation demonstrate the advantage of using extracted features compared to raw data. Further analysis shows that feature extraction through PCA is beneficial, specifically for data with multiple balanced classes.
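The proposed feature extraction, keeping the principal components with the lowest variance, can be sketched as follows. The eigendecomposition route and the `keep` parameter are illustrative choices, not taken from the paper.

```python
import numpy as np

def low_variance_pca_features(X_train, X_new, keep=2):
    """Project data onto the principal components with the *lowest*
    variance, which the paper argues are more likely to be affected
    by a change in the distribution."""
    mu = X_train.mean(axis=0)
    cov = np.cov(X_train - mu, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: ascending eigenvalues
    W = eigvecs[:, :keep]                   # smallest-variance directions
    return (X_new - mu) @ W
```

The change detector (e.g. the semiparametric log-likelihood criterion) is then run on the projected data instead of the raw features.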
- Published
- 2014
- Full Text
- View/download PDF
23. Change Detection in Streaming Multivariate Data Using Likelihood Detectors
- Author
-
Ludmila I. Kuncheva
- Subjects
Normalization (statistics) ,Multivariate statistics ,Receiver operating characteristic ,Computer science ,business.industry ,Detector ,Pattern recognition ,Computer Science Applications ,Computational Theory and Mathematics ,Artificial intelligence ,business ,Change detection ,Information Systems ,Statistical hypothesis testing - Abstract
Change detection in streaming data relies on a fast estimation of the probability that the data in two consecutive windows come from different distributions. Choosing the criterion is one of the multitude of questions that need to be addressed when designing a change detection procedure. This paper gives a log-likelihood justification for two well-known criteria for detecting change in streaming multidimensional data: Kullback-Leibler (K-L) distance and Hotelling's T-square test for equal means (H). We propose a semiparametric log-likelihood criterion (SPLL) for change detection. Compared to the existing log-likelihood change detectors, SPLL trades some theoretical rigor for computational simplicity. We examine SPLL together with K-L and H on detecting induced change on 30 real data sets. The criteria were compared using the area under the respective Receiver Operating Characteristic (ROC) curve (AUC). SPLL was found to be on a par with H and better than K-L for the nonnormalized data, and better than both on the normalized data.
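One of the compared criteria, Hotelling's T-square test for equal means of two windows, can be written compactly. This is the textbook two-sample statistic, sketched here for illustration rather than reproduced from the authors' code.

```python
import numpy as np

def hotelling_t2(W1, W2):
    """Hotelling's two-sample T^2 statistic for equal means of two
    data windows W1 (n1 x d) and W2 (n2 x d)."""
    n1, n2 = len(W1), len(W2)
    m1, m2 = W1.mean(axis=0), W2.mean(axis=0)
    S1 = np.cov(W1, rowvar=False)
    S2 = np.cov(W2, rowvar=False)
    Sp = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)  # pooled covariance
    d = m1 - m2
    return (n1 * n2) / (n1 + n2) * d @ np.linalg.solve(Sp, d)
```

A large value of the statistic for two consecutive windows signals a change in the mean of the stream.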
- Published
- 2013
- Full Text
- View/download PDF
24. A Bound on Kappa-Error Diagrams for Analysis of Classifier Ensembles
- Author
-
Ludmila I. Kuncheva
- Subjects
business.industry ,Feature extraction ,Decision tree ,Pattern recognition ,Ensemble diversity ,Upper and lower bounds ,Computer Science Applications ,Diversity methods ,Computational Theory and Mathematics ,Pairwise comparison ,Artificial intelligence ,business ,Algorithm ,Classifier (UML) ,Kappa ,Information Systems ,Mathematics - Abstract
Kappa-error diagrams are used to gain insights about why an ensemble method is better than another on a given data set. A point on the diagram corresponds to a pair of classifiers. The x-axis is the pairwise diversity (kappa), and the y-axis is the averaged individual error. In this study, kappa is calculated from the 2 × 2 correct/wrong contingency matrix. We derive a lower bound on kappa which determines the feasible part of the kappa-error diagram. Simulations and experiments with real data show that there is unoccupied feasible space on the diagram corresponding to (hypothetical) better ensembles, and that individual accuracy is the leading factor in improving the ensemble accuracy.
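The x-axis quantity, kappa computed from the 2 × 2 correct/wrong contingency table of a classifier pair, can be sketched as follows. The inputs are assumed to be per-instance correctness indicators (1 = correct, 0 = wrong) for the two classifiers.

```python
def pairwise_kappa(correct_i, correct_j):
    """Cohen's kappa from the 2x2 correct/wrong contingency table of two
    classifiers, as plotted on the x-axis of the kappa-error diagram."""
    n = len(correct_i)
    a = sum(ci and cj for ci, cj in zip(correct_i, correct_j)) / n      # both correct
    b = sum(ci and not cj for ci, cj in zip(correct_i, correct_j)) / n  # only i correct
    c = sum((not ci) and cj for ci, cj in zip(correct_i, correct_j)) / n  # only j correct
    d = 1 - a - b - c                                                   # both wrong
    p_obs = a + d
    p_exp = (a + b) * (a + c) + (c + d) * (b + d)
    return (p_obs - p_exp) / (1 - p_exp)
```

Identical classifiers give kappa 1, while statistically independent correctness patterns give kappa 0; the y-coordinate of the pair is simply the average of the two individual error rates.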
- Published
- 2013
- Full Text
- View/download PDF
25. A concept-drift perspective on prototype selection and generation
- Author
-
Ludmila I. Kuncheva and Iain A. D. Gunn
- Subjects
Concept drift ,Computer science ,business.industry ,Data editing ,02 engineering and technology ,Machine learning ,computer.software_genre ,Discriminative model ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Adaptive learning ,Artificial intelligence ,business ,computer ,Classifier (UML) - Abstract
This study brings together systematised views of two related areas: data editing for the nearest neighbour classifier and adaptive learning in the presence of concept drift. The growing number of studies in the intersection of these areas warrants a closer look. We revise and update the taxonomies of the two areas proposed in the literature and argue that they are not sufficiently discriminative with respect to methods for prototype selection and prototype generation in the presence of concept drift. We proceed to create a bespoke taxonomy of these methods and illustrate it with ten examples from the literature. The new taxonomy can serve as a road-map for researching the intersection area and inform the development of new methods.
- Published
- 2016
- Full Text
- View/download PDF
26. Interval feature extraction for classification of event-related potentials (ERP) in EEG data analysis
- Author
-
Ludmila I. Kuncheva and Juan J. Rodríguez
- Subjects
Computer science ,business.industry ,Feature extraction ,Pattern recognition ,Perceptron ,computer.software_genre ,Independent component analysis ,Random forest ,Support vector machine ,ComputingMethodologies_PATTERNRECOGNITION ,Wavelet ,Artificial Intelligence ,Principal component analysis ,Artificial intelligence ,Data mining ,business ,Classifier (UML) ,computer - Abstract
Event-related potential data can be used to index perceptual and cognitive operations. However, they are typically high-dimensional and noisy. This study examines the original raw data and six feature-extraction methods as a pre-processing step before classification. Four traditionally used feature-extraction methods were considered: principal component analysis, independent component analysis, auto-regression, and wavelets. We add to these a less well-known method called interval feature extraction. It overproduces features from the ERP signal and then eliminates irrelevant and redundant features by the fast correlation-based filter. To make the comparisons fair, the other feature-extraction methods were also run with the filter. An experiment on two EEG datasets (four classification scenarios) was carried out to examine the classification accuracy of four classifiers on the extracted features: support vector machines with linear and perceptron kernel, the nearest neighbour classifier and the random forest ensemble method. The interval features led to the best classification accuracy in most of the configurations, specifically when used with the Random Forest classifier ensemble.
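The overproduction step of interval feature extraction can be sketched as follows. The interval widths and the 50% overlap are assumptions for illustration, and the subsequent pruning by the fast correlation-based filter is omitted.

```python
import numpy as np

def interval_features(signal, widths=(4, 8, 16)):
    """Overproduce interval features: the mean of the signal over every
    window of each width, with 50% overlap between windows. Irrelevant
    and redundant features would then be removed by a filter."""
    feats = []
    for w in widths:
        for start in range(0, len(signal) - w + 1, w // 2):
            feats.append(float(np.mean(signal[start:start + w])))
    return feats
```

Other interval statistics (slope, standard deviation) could be added in the same loop; the point of the method is to generate many candidate features cheaply and let the filter discard most of them.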
- Published
- 2012
- Full Text
- View/download PDF
27. A weighted voting framework for classifiers ensembles
- Author
-
Ludmila I. Kuncheva and Juan J. Rodríguez
- Subjects
Weighted majority vote ,Randomized weighted majority algorithm ,Majority rule ,business.industry ,Small number ,Weighted voting ,Pattern recognition ,Human-Computer Interaction ,Naive Bayes classifier ,Artificial Intelligence ,Hardware and Architecture ,Artificial intelligence ,Probabilistic framework ,business ,Classifier (UML) ,Software ,Information Systems ,Mathematics - Abstract
We propose a probabilistic framework for classifier combination, which gives rigorous optimality conditions (minimum classification error) for four combination methods: majority vote, weighted majority vote, recall combiner and the naive Bayes combiner. The framework is based on two assumptions: class-conditional independence of the classifier outputs and an assumption about the individual accuracies. The four combiners are derived subsequently from one another, by progressively relaxing and then eliminating the second assumption. In parallel, the number of the trainable parameters increases from one combiner to the next. Simulation studies reveal that if the parameter estimates are accurate and the first assumption is satisfied, the order of preference of the combiners is: naive Bayes, recall, weighted majority and majority. By inducing label noise, we expose a caveat coming from the stability-plasticity dilemma. Experimental results with 73 benchmark data sets reveal that there is no definitive best combiner among the four candidates, giving a slight preference to naive Bayes. This combiner was better for problems with a large number of fairly balanced classes while weighted majority vote was better for problems with a small number of unbalanced classes.
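In this framework, the weighted majority vote uses weights of the form w_i = log(p_i / (1 - p_i)), where p_i is the estimated accuracy of classifier i, derived under the class-conditional independence assumption. A minimal sketch:

```python
import math

def weighted_majority_vote(labels, accuracies):
    """Weighted majority vote: each classifier's vote for its predicted
    label carries weight log(p / (1 - p)), where p is its accuracy
    (assumed in (0, 1))."""
    scores = {}
    for y, p in zip(labels, accuracies):
        w = math.log(p / (1 - p))
        scores[y] = scores.get(y, 0.0) + w
    return max(scores, key=scores.get)
```

Note how a single highly accurate classifier can outvote two mediocre ones, which a plain (unweighted) majority vote cannot do.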
- Published
- 2012
- Full Text
- View/download PDF
28. Naive random subspace ensemble with linear classifiers for real-time classification of fMRI data
- Author
-
Stephen J. Johnston, Ludmila I. Kuncheva, Catrin Plumpton, and Nikolaas N. Oosterhof
- Subjects
Ground truth ,medicine.diagnostic_test ,business.industry ,Computer science ,Pattern recognition ,ComputingMethodologies_PATTERNRECOGNITION ,Artificial Intelligence ,Signal Processing ,medicine ,Computer Vision and Pattern Recognition ,Artificial intelligence ,Neurofeedback ,business ,Real time classification ,Functional magnetic resonance imaging ,Classifier (UML) ,Software ,Subspace topology - Abstract
Functional magnetic resonance imaging (fMRI) provides a spatially accurate measure of brain activity. Real-time classification allows the use of fMRI in neurofeedback experiments. With limited labelled data available, a fixed pre-trained classifier may be inaccurate. We propose that streaming fMRI data may be classified using a classifier ensemble which is updated through naive labelling. Naive labelling is a protocol where in the absence of ground truth, updates are carried out using the label assigned by the classifier. We perform experiments on three fMRI datasets to demonstrate that naive labelling is able to improve upon a pre-trained initial classifier.
- Published
- 2012
- Full Text
- View/download PDF
29. A spatial discrepancy measure between voxel sets in brain imaging
- Author
-
Kenneth S. L. Yuen, David Martínez-Rego, David Edmund Johannes Linden, Stephen J. Johnston, and Ludmila I. Kuncheva
- Subjects
business.industry ,Pattern recognition ,Voxel-based morphometry ,computer.software_genre ,Measure (mathematics) ,Neuroimaging ,Voxel ,Distortion ,Signal Processing ,Pattern recognition (psychology) ,Medical imaging ,Computer vision ,Artificial intelligence ,Electrical and Electronic Engineering ,business ,Set (psychology) ,computer ,Mathematics - Abstract
Functional Magnetic Resonance Imaging serves to identify networks and regions in the brain engaged in various mental activities, represented as a set of voxels in the 3D image. It is important to be able to measure how similar two selected voxel sets are. The major flaw of the currently used correlation-based and overlap-based measures is that they disregard the spatial proximity of the selected voxel sets. Here, we propose a measure for comparing two voxel sets, called Spatial Discrepancy, based upon the average Hausdorff distance. We demonstrate that Spatial Discrepancy can detect genuine similarities and differences where other commonly used measures fail to do so. A simulation experiment was carried out where distorted copies of the same voxel sets were compared, varying the level of distortion. The experiment revealed that the proposed measure correlates better with the level of distortion than any of the other measures. Data from a 10-subject experiment were used to demonstrate the advantages of the Spatial Discrepancy measure in multi-subject studies.
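The average Hausdorff distance underlying Spatial Discrepancy can be sketched directly. Representing the voxel sets as arrays of 3D coordinates is an assumption about the data layout, not a detail from the paper.

```python
import numpy as np

def average_hausdorff(A, B):
    """Average Hausdorff distance between two voxel sets given as
    (n x 3) and (m x 3) arrays of voxel coordinates: the mean nearest-
    neighbour distance from A to B averaged with that from B to A."""
    A, B = np.asarray(A, float), np.asarray(B, float)
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)  # pairwise distances
    return 0.5 * (D.min(axis=1).mean() + D.min(axis=0).mean())
```

Unlike overlap-based measures, this value degrades gracefully as one set drifts away from the other, which is why it tracks the level of spatial distortion.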
- Published
- 2012
- Full Text
- View/download PDF
30. Classifier Ensemble Methods for Diagnosing COPD from Volatile Organic Compounds in Exhaled Air
- Author
-
Christopher Phillips, Juan J. Rodríguez, Keir Lewis, Yasir Syed, and Ludmila I. Kuncheva
- Subjects
Spirometry ,medicine.medical_specialty ,COPD ,medicine.diagnostic_test ,business.industry ,Pulmonary disease ,Physical examination ,Airflow obstruction ,medicine.disease ,Ensemble learning ,Exhaled air ,respiratory tract diseases ,Emergency medicine ,medicine ,business ,Classifier (UML) ,Simulation - Abstract
The diagnosis of Chronic Obstructive Pulmonary Disease (COPD) is based on symptoms, clinical examination, exposure to risk factors (smoking and certain occupational dusts) and confirming lung airflow obstruction (on spirometry). However, most people with COPD remain undiagnosed and controversies regarding spirometry persist. Developing accurate and reliable automated tests for the early diagnosis of COPD would aid successful management. We evaluated the diagnostic potential of a non-invasive test of chemical analysis (volatile organic compounds - VOCs) from exhaled breath. We applied 26 individual classifier methods and 30 state-of-the-art classifier ensemble methods to a large VOC data set from 109 patients with COPD and 63 healthy controls of similar age; we evaluated the classification error, the F measure and the area under the ROC curve (AUC). The results show that classifying the VOCs leads to substantial gain over chance but of varying accuracy. We found that Rotation Forest ensemble (AUC 0.825) had the highest accuracy for COPD classification from exhaled VOCs.
- Published
- 2012
- Full Text
- View/download PDF
31. Classifier ensembles for fMRI data analysis: an experiment
- Author
-
Ludmila I. Kuncheva and Juan J. Rodríguez
- Subjects
Computer science ,Statistics as Topic ,Biomedical Engineering ,Biophysics ,Machine learning ,computer.software_genre ,Pattern Recognition, Automated ,Artificial Intelligence ,Humans ,Computer Simulation ,Radiology, Nuclear Medicine and imaging ,Brain Mapping ,Probabilistic classification ,Structured support vector machine ,Computers ,business.industry ,Brain ,Reproducibility of Results ,Bayes Theorem ,Signal Processing, Computer-Assisted ,Pattern recognition ,Quadratic classifier ,Image Enhancement ,Magnetic Resonance Imaging ,Ensemble learning ,Random subspace method ,Support vector machine ,ComputingMethodologies_PATTERNRECOGNITION ,Margin classifier ,Artificial intelligence ,business ,Classifier (UML) ,computer ,Algorithms - Abstract
Functional magnetic resonance imaging (fMRI) is becoming a forefront brain–computer interface tool. To decipher brain patterns, fast, accurate and reliable classifier methods are needed. The support vector machine (SVM) classifier has been traditionally used. Here we argue that state-of-the-art methods from pattern recognition and machine learning, such as classifier ensembles, offer more accurate classification. This study compares 18 classification methods on a publicly available real data set due to Haxby et al. [Science 293 (2001) 2425–2430]. The data comes from a single-subject experiment, organized in 10 runs where eight classes of stimuli were presented in each run. The comparisons were carried out on voxel subsets of different sizes, selected through seven popular voxel selection methods. We found that, while SVM was robust, accurate and scalable, some classifier ensemble methods demonstrated significantly better performance. The best classifiers were found to be the random subspace ensemble of SVM classifiers, rotation forest and ensembles with random linear and random spherical oracle.
- Published
- 2010
- Full Text
- View/download PDF
32. On the window size for classification in changing environments
- Author
-
Ludmila I. Kuncheva and Indrė Žliobaitė
- Subjects
Concept drift ,business.industry ,Gaussian ,aintel ,Word error rate ,Moving window ,Pattern recognition ,Theoretical Computer Science ,symbols.namesake ,Artificial Intelligence ,Streaming data ,symbols ,Probability distribution ,Electricity market ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,Classifier (UML) ,Mathematics - Abstract
Classification in changing environments (commonly known as concept drift) requires adaptation of the classifier to accommodate the changes. One approach is to keep a moving window on the streaming data and constantly update the classifier on it. Here we consider an abrupt change scenario where one set of probability distributions of the classes is instantly replaced with another. For a fixed 'transition period' around the change, we derive a generic relationship between the size of the moving window and the classification error rate. We derive expressions for the error in the transition period and for the optimal window size for the case of two Gaussian classes where the concept change is a geometrical displacement of the whole class configuration in the space. A simple window resize strategy based on the derived relationship is proposed and compared with fixed-size windows on a real benchmark data set (Electricity Market).
- Published
- 2009
- Full Text
- View/download PDF
33. Stability of Kerogen Classification with Regard to Image Segmentation
- Author
-
Ik Soo Lim, James Charles, Ludmila I. Kuncheva, and B. Wells
- Subjects
Ground truth ,business.industry ,Stability (learning theory) ,Mineralogy ,Pattern recognition ,Image segmentation ,chemistry.chemical_compound ,Range (mathematics) ,Mathematics (miscellaneous) ,Inertinite ,chemistry ,Robustness (computer science) ,Kerogen ,General Earth and Planetary Sciences ,Segmentation ,Artificial intelligence ,business ,Mathematics - Abstract
This paper investigates the stability of an automatic system for classifying kerogen material from images of sieved rock samples. The system comprises four stages: image acquisition, background removal, segmentation, and classification of the segmented kerogen pieces as either inertinite or vitrinite. Depending upon a segmentation parameter d, called "overlap", touching pieces of kerogen may be split differently. The aim of this study is to establish how robust the classification result is to variations of the segmentation parameter. There are two issues that pose difficulties in carrying out an experiment. First, even a trained professional may be uncertain when distinguishing between isolated pieces of inertinite and vitrinite, extracted from transmitted-light microscope images. Second, because manual labelling of a large amount of data for training the system is an arduous task, we acquired the true labels (ground truth) only for the pieces obtained at overlap d=0.5. To construct ground truth for various values of d we propose here label-inheritance trees. With the ground truth estimated in this way, an experiment was carried out to evaluate the robustness of the system to changes in the segmentation through varying the overlap value d. The average system accuracy across values of d spanning the range from 0 to 1 was 86.5%, which is only slightly lower than the accuracy of the system at the design value of d=0.5 (89.07%).
- Published
- 2009
- Full Text
- View/download PDF
34. A case-study on naïve labelling for the nearest mean and the linear discriminant classifiers
- Author
-
A. Narasimhamurthy, Christopher J. Whitaker, and Ludmila I. Kuncheva
- Subjects
business.industry ,Gaussian ,Supervised learning ,Pattern recognition ,Semi-supervised learning ,Quadratic classifier ,Machine learning ,computer.software_genre ,Linear discriminant analysis ,symbols.namesake ,ComputingMethodologies_PATTERNRECOGNITION ,Artificial Intelligence ,Labelling ,Signal Processing ,symbols ,Computer Vision and Pattern Recognition ,Artificial intelligence ,Adaptive learning ,business ,computer ,Classifier (UML) ,Software ,Mathematics - Abstract
The abundance of unlabelled data alongside limited labelled data has provoked significant interest in semi-supervised learning methods. "Naive labelling" refers to the following simple strategy for using unlabelled data in on-line classification. A new data point is first labelled by the current classifier and then added to the training set together with the assigned label. The classifier is updated before seeing the subsequent data point. Although the danger of a run-away classifier is obvious, versions of naive labelling pervade in on-line adaptive learning. We study the asymptotic behaviour of naive labelling in the case of two Gaussian classes and one variable. The analysis shows that if the classifier model assumes correctly the underlying distribution of the problem, naive labelling will drive the parameters of the classifier towards their optimal values. However, if the model is not guessed correctly, the benefits are outweighed by the instability of the labelling strategy (run-away behaviour of the classifier). The results are based on exact calculations of the point of convergence, simulations, and experiments with 25 real data sets. The findings in our study are consistent with concerns about general use of unlabelled data, flagged up in the recent literature.
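The naive labelling protocol for the nearest mean classifier can be sketched as a running-average update. Initialising each class count to one (as if each initial mean came from a single labelled point) is an illustrative choice, not a detail from the paper.

```python
import numpy as np

class NaiveLabellingNearestMean:
    """Naive labelling for the nearest mean classifier: each new point
    is labelled by the current classifier, and the winning class mean
    is then updated with that point via a running average."""

    def __init__(self, means):
        self.means = [np.asarray(m, float) for m in means]
        self.counts = [1] * len(means)

    def classify_and_update(self, x):
        x = np.asarray(x, float)
        y = int(np.argmin([np.linalg.norm(x - m) for m in self.means]))
        self.counts[y] += 1
        self.means[y] += (x - self.means[y]) / self.counts[y]  # running mean
        return y
```

The run-away behaviour discussed in the abstract arises exactly here: a mislabelled point drags its class mean towards the wrong region, making further mislabelling more likely.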
- Published
- 2008
- Full Text
- View/download PDF
35. Object segmentation within microscope images of palynofacies
- Author
-
James Charles, Ludmila I. Kuncheva, B. Wells, and Ik Soo Lim
- Subjects
Data processing ,Microscope ,Basis (linear algebra) ,business.industry ,Computer science ,Image processing ,Object (computer science) ,law.invention ,Sieve ,Inertinite ,law ,Segmentation ,Computer vision ,Artificial intelligence ,Computers in Earth Sciences ,business ,Information Systems - Abstract
Identification of fossil material under a microscope is the basis of micropalaeontology. Our task is to locate and count the pieces of inertinite and vitrinite in images of sieve sampled rock. The classical watershed algorithm oversegments the objects because of their irregular shapes. In this paper we propose a method for locating multiple objects in a black and white image while accounting for possible overlapping or touching. The method, called Centre Supported Segmentation (CSS), eliminates oversegmentation and is robust against differences in size and shape of the objects.
- Published
- 2008
- Full Text
- View/download PDF
36. Automated Kerogen Classification in Microscope Images of Dispersed Kerogen Preparation
- Author
-
B. Wells, A. Collins, Ik Soo Lim, N. Miles, Ludmila I. Kuncheva, and James Charles
- Subjects
business.industry ,Computer science ,Image processing ,Pattern recognition ,Perceptron ,Random forest ,Support vector machine ,Naive Bayes classifier ,Mathematics (miscellaneous) ,General Earth and Planetary Sciences ,Artificial intelligence ,AdaBoost ,business ,Classifier (UML) ,LogitBoost - Abstract
We develop the classification part of a system that analyses transmitted light microscope images of dispersed kerogen preparation. The system automatically extracts kerogen pieces from the image and labels each piece as either inertinite or vitrinite. The image pre-processing analysis consists of background removal, identification of kerogen material, object segmentation, object extraction (individual images of pieces of kerogen) and feature calculation for each object. An expert palynologist was asked to label the objects into categories inertinite and vitrinite, which provided the ground truth for the classification experiment. Ten state-of-the-art classifiers and classifier ensembles were compared: Naive Bayes, decision tree, nearest neighbour, the logistic classifier, multilayered perceptron (MLP), support vector machines (SVM), AdaBoost, Bagging, LogitBoost and Random Forest. The logistic classifier was singled out as the most accurate classifier, with an accuracy greater than 90%. Using a 10×10-fold cross-validation provided within the Weka software, we found that the logistic classifier was significantly better than five classifiers (p
- Published
- 2008
- Full Text
- View/download PDF
37. Diagnosing scrapie in sheep: A classification experiment
- Author
-
Ludmila I. Kuncheva, Victor J. Del Rio Vilas, and Juan J. Rodríguez
- Subjects
Sheep ,Databases, Factual ,Receiver operating characteristic ,business.industry ,Health Informatics ,Scrapie ,Machine learning ,computer.software_genre ,United Kingdom ,Computer Science Applications ,Random forest ,ROC Curve ,Animals ,Classification methods ,Computer Simulation ,Diagnosis, Computer-Assisted ,AdaBoost ,Artificial intelligence ,business ,computer - Abstract
Scrapie is a neuro-degenerative disease in small ruminants. A data set of 3113 records of sheep reported to the Scrapie Notifications Database in Great Britain has been studied. Clinical signs were recorded as present/absent in each animal by veterinary officials (VO) and a post-mortem diagnosis was made. In an attempt to detect healthy animals within the set of suspects using only the clinical signs, 18 classification methods were applied ranging from simple linear classifiers to classifier ensembles such as Bagging, AdaBoost and Random Forests. The results suggest that the clinical classification by the VO was adequate as no further differentiation within the set of suspects was feasible.
- Published
- 2007
- Full Text
- View/download PDF
38. Classifier Ensembles with a Random Linear Oracle
- Author
-
Ludmila I. Kuncheva and Juan J. Rodríguez
- Subjects
Ensemble forecasting ,business.industry ,Computer science ,Decision tree ,Pattern recognition ,Machine learning ,computer.software_genre ,Sensor fusion ,Ensemble learning ,Oracle ,Computer Science Applications ,Computational Theory and Mathematics ,Hyperplane ,Artificial intelligence ,business ,computer ,Classifier (UML) ,Subspace topology ,Information Systems - Abstract
We propose a combined fusion-selection approach to classifier ensemble design. Each classifier in the ensemble is replaced by a mini-ensemble of a pair of subclassifiers with a random linear oracle to choose between the two. It is argued that this approach encourages extra diversity in the ensemble while allowing for high accuracy of the individual ensemble members. Experiments were carried out with 35 data sets from UCI and 11 ensemble models. Each ensemble model was examined with and without the oracle. The results showed that all ensemble methods benefited from the new approach, most markedly so random subspace and bagging. A further experiment with seven real medical data sets demonstrates the validity of these findings outside the UCI data collection.
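The random linear oracle can be sketched as follows. The hyperplane is built here as the perpendicular bisector of two randomly chosen training points, one common construction for this oracle; the `train_base(X, y) -> predict` interface is an assumed convention, not the paper's code.

```python
import numpy as np

def random_linear_oracle(X, y, train_base, rng=None):
    """Random linear oracle: a random hyperplane splits the training
    data; one subclassifier is trained on each side, and a test point
    is routed to the subclassifier on its own side. Each side contains
    at least one of the two anchor points, so neither side is empty."""
    if rng is None:
        rng = np.random.default_rng(0)
    i, j = rng.choice(len(X), size=2, replace=False)
    w = X[i] - X[j]
    b = w @ (X[i] + X[j]) / 2
    side = X @ w > b                      # X[i] always lands on the True side
    models = {True: train_base(X[side], y[side]),
              False: train_base(X[~side], y[~side])}
    return lambda x: models[bool(np.asarray(x) @ w > b)](x)
```

Any base learner can be plugged in; repeating the construction with different random hyperplanes yields the diverse mini-ensembles described in the abstract.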
- Published
- 2007
- Full Text
- View/download PDF
39. Theoretical and Empirical Criteria for the Edited Nearest Neighbour Classifier
- Author
-
Mikel Galar and Ludmila I. Kuncheva
- Subjects
business.industry ,Computer science ,Bayesian probability ,Probabilistic logic ,Nearest neighbour ,Bayes classifier ,Machine learning ,computer.software_genre ,Artificial intelligence ,Data mining ,business ,Voronoi diagram ,computer ,Classifier (UML) - Abstract
We aim to dispel the blind faith in theoretical criteria for optimisation of the edited nearest neighbour classifier and its version called the Voronoi classifier. Three criteria from past and recent literature are considered: two bounds using Vapnik-Chervonenkis (VC) dimension and a probabilistic criterion derived by a Bayesian approach. We demonstrate the shortcomings of these criteria for selecting the best reference set, and summarise alternative empirical criteria found in the literature.
- Published
- 2015
- Full Text
- View/download PDF
40. Rotation Forest: A New Classifier Ensemble Method
- Author
-
Juan J. Rodríguez, Carlos J. Alonso, and Ludmila I. Kuncheva
- Subjects
Computer science ,Feature extraction ,Decision tree ,Information Storage and Retrieval ,computer.software_genre ,Sensitivity and Specificity ,Pattern Recognition, Automated ,Artificial Intelligence ,Cluster Analysis ,Computer Simulation ,AdaBoost ,Principal Component Analysis ,Models, Statistical ,Training set ,Ensemble forecasting ,business.industry ,Applied Mathematics ,Supervised learning ,Reproducibility of Results ,Numerical Analysis, Computer-Assisted ,Pattern recognition ,Random forest ,Computational Theory and Mathematics ,Principal component analysis ,Computer Vision and Pattern Recognition ,Artificial intelligence ,Data mining ,business ,Classifier (UML) ,computer ,Algorithms ,Software - Abstract
We propose a method for generating classifier ensembles based on feature extraction. To create the training data for a base classifier, the feature set is randomly split into K subsets (K is a parameter of the algorithm) and Principal Component Analysis (PCA) is applied to each subset. All principal components are retained in order to preserve the variability information in the data. Thus, K axis rotations take place to form the new features for a base classifier. The idea of the rotation approach is to encourage simultaneously individual accuracy and diversity within the ensemble. Diversity is promoted through the feature extraction for each base classifier. Decision trees were chosen here because they are sensitive to rotation of the feature axes, hence the name "forest." Accuracy is sought by keeping all principal components and also using the whole data set to train each base classifier. Using WEKA, we examined the Rotation Forest ensemble on a random selection of 33 benchmark data sets from the UCI repository and compared it with Bagging, AdaBoost, and Random Forest. The results were favorable to Rotation Forest and prompted an investigation into the diversity-accuracy landscape of the ensemble models. Diversity-error diagrams revealed that Rotation Forest ensembles construct individual classifiers which are more accurate than those in AdaBoost and Random Forest, and more diverse than those in Bagging, sometimes more accurate as well.
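The rotation step can be sketched as building a block-diagonal rotation matrix, one per base classifier. This is a simplified sketch: the class- and bootstrap-sampling applied before PCA in the full algorithm is omitted, and each subset must have at least as many samples as features.

```python
import numpy as np

def rotation_matrix(X, K, rng=None):
    """One Rotation Forest rotation: randomly split the features into K
    subsets, run PCA on each subset (keeping all components), and place
    the component matrices as blocks of a full rotation matrix R. A base
    decision tree would then be trained on X @ R."""
    if rng is None:
        rng = np.random.default_rng(0)
    n_features = X.shape[1]
    order = rng.permutation(n_features)
    subsets = np.array_split(order, K)
    R = np.zeros((n_features, n_features))
    for idx in subsets:
        Xi = X[:, idx] - X[:, idx].mean(axis=0)
        _, _, Vt = np.linalg.svd(Xi, full_matrices=False)  # PCA via SVD
        R[np.ix_(idx, idx)] = Vt.T          # all components retained
    return R
```

Because every block is orthogonal and the blocks act on disjoint feature subsets, R itself is orthogonal: the transform rotates the data without discarding any variability.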
- Published
- 2006
- Full Text
- View/download PDF
41. Moderate diversity for better cluster ensembles
- Author
-
Ludmila Todorova, Ludmila I. Kuncheva, and Stefan Hadjitodorov
- Subjects
business.industry ,Rand index ,Diversity measure ,Pattern recognition ,Measure (mathematics) ,Hardware and Architecture ,Signal Processing ,Pattern recognition (psychology) ,Cluster (physics) ,Artificial intelligence ,Cluster analysis ,business ,Software ,Selection (genetic algorithm) ,Information Systems ,Diversity (business) ,Mathematics - Abstract
Adjusted Rand index is used to measure diversity in cluster ensembles and a diversity measure is subsequently proposed. Although the measure was found to be related to the quality of the ensemble, this relationship appeared to be non-monotonic. In some cases, ensembles which exhibited a moderate level of diversity gave a more accurate clustering. Based on this, a procedure for building a cluster ensemble of a chosen type is proposed (assuming that an ensemble relies on one or more random parameters): generate a small random population of cluster ensembles, calculate the diversity of each ensemble and select the ensemble corresponding to the median diversity. We demonstrate the advantages of both our measure and procedure on 5 data sets and carry out statistical comparisons involving two diversity measures for cluster ensembles from the recent literature. An experiment with 9 data sets was also carried out to examine how the diversity-based selection procedure fares on ensembles of various sizes. For these experiments the classification accuracy was used as the performance criterion. The results suggest that selection by median diversity is no worse and in some cases is better than building and holding on to one ensemble.
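The selection procedure in the abstract can be illustrated with k-means ensembles, using the adjusted Rand index as the pairwise (dis)agreement measure. This is a hedged sketch under assumed details: the paper's exact diversity measure and ensemble type may differ, and all function and parameter names here are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score


def ensemble_diversity(labelings):
    """Mean pairwise diversity of an ensemble of partitions:
    1 - adjusted Rand index, averaged over all pairs."""
    M = len(labelings)
    scores = [1 - adjusted_rand_score(labelings[i], labelings[j])
              for i in range(M) for j in range(i + 1, M)]
    return float(np.mean(scores))


def select_median_diversity(X, n_ensembles=7, ensemble_size=5, k=3, seed=0):
    """Generate a small random population of cluster ensembles
    (here: k-means with random initialisations), compute the
    diversity of each, and return the ensemble whose diversity is
    the median of the population."""
    rng = np.random.default_rng(seed)
    ensembles = []
    for _ in range(n_ensembles):
        labelings = [KMeans(n_clusters=k, n_init=1,
                            random_state=int(rng.integers(1_000_000))
                            ).fit_predict(X)
                     for _ in range(ensemble_size)]
        ensembles.append(labelings)
    diversities = [ensemble_diversity(e) for e in ensembles]
    median_idx = int(np.argsort(diversities)[len(diversities) // 2])
    return ensembles[median_idx], diversities[median_idx]
```

Picking the median rather than the maximum reflects the abstract's finding that the diversity-quality relationship is non-monotonic: moderately diverse ensembles tended to cluster more accurately than the most diverse ones.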
- Published
- 2006
- Full Text
- View/download PDF
42. ROC curves and video analysis optimization in intestinal capsule endoscopy
- Author
-
Petia Radeva, Ludmila I. Kuncheva, and Fernando Vilariño
- Subjects
Receiver operating characteristic ,business.industry ,education ,Pattern recognition ,Inspection time ,Machine learning ,computer.software_genre ,Ensemble learning ,law.invention ,Artificial Intelligence ,Capsule endoscopy ,law ,Signal Processing ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,Classifier (UML) ,computer ,Software ,Mathematics - Abstract
Wireless capsule endoscopy involves inspection of hours of video material by a highly qualified professional. Time episodes corresponding to intestinal contractions, which are of interest to the physician, constitute about 1% of the video. The problem is to automatically label the time episodes containing contractions so that only a fraction of the video needs inspection. As the classes of contraction and non-contraction images in the video are largely imbalanced, ROC curves are used to optimize the trade-off between the false positive and false negative rates. Classifier ensemble methods and simple classifiers were examined. Our results reinforce the claims from the recent literature that classifier ensemble methods specifically designed for imbalanced problems have substantial advantages over simple classifiers and standard classifier ensembles. By using ROC curves with the bagging ensemble method, the inspection time can be drastically reduced at the expense of a small fraction of missed contractions.
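The trade-off the abstract describes, choosing an operating point on the ROC curve so that nearly all contractions are kept while most of the video is excluded from inspection, can be sketched as follows. This is not the paper's pipeline or features; the bagging ensemble, the synthetic data, and the 95% sensitivity target are stand-in assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import roc_curve
from sklearn.tree import DecisionTreeClassifier


def threshold_for_sensitivity(y_true, scores, min_tpr=0.95):
    """Pick the score threshold on the ROC curve with the lowest
    false-positive rate among those reaching at least `min_tpr`
    sensitivity, i.e. missing at most 1 - min_tpr of the positives."""
    fpr, tpr, thresholds = roc_curve(y_true, scores)
    ok = np.where(tpr >= min_tpr)[0]        # tpr reaches 1.0, so non-empty
    best = ok[np.argmin(fpr[ok])]
    return thresholds[best], fpr[best], tpr[best]


# Illustrative imbalanced data: ~5% "contraction" frames (class 1).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (950, 4)), rng.normal(2, 1, (50, 4))])
y = np.array([0] * 950 + [1] * 50)

bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=25,
                        random_state=0).fit(X, y)
scores = bag.predict_proba(X)[:, 1]
thr, fpr, tpr = threshold_for_sensitivity(y, scores, min_tpr=0.95)
```

The fraction of video that still needs manual inspection is roughly the positive-prediction rate at the chosen threshold, which the physician can tune against the tolerated miss rate.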
- Published
- 2006
- Full Text
- View/download PDF
43. On the optimality of Naïve Bayes with dependent binary features
- Author
-
Ludmila I. Kuncheva
- Subjects
business.industry ,Binary number ,Pattern recognition ,Bayes classifier ,Combinatorics ,Naive Bayes classifier ,Artificial Intelligence ,Signal Processing ,Statistics::Methodology ,Bayes error rate ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,Software ,Mathematics - Abstract
While the Naive Bayes classifier (NB) is Bayes-optimal for independent features, we prove that it is also optimal for two equiprobable classes and two features with equal class-conditional covariances. Although strict optimality does not extend to three features, equal covariances are expected to be beneficial in higher-dimensional spaces.
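For reference, the (standard) Naive Bayes decision rule the abstract refers to assigns a point $x = (x_1, \dots, x_n)$ to the class maximising the product of the class prior and the individual feature likelihoods, as if the features were independent given the class:

```latex
\hat{y} \;=\; \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)
```

The paper's result is that for binary features this rule can remain optimal even when the independence assumption is violated, provided the two equiprobable classes share the same class-conditional covariance.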
- Published
- 2006
- Full Text
- View/download PDF
44. Selection Of Independent Binary Features Using Probabilities: An Example From Veterinary Medicine
- Author
-
Zoe Hoare, Ludmila I. Kuncheva, and Peter D. Cockcroft
- Subjects
Statistics and Probability ,Binary number ,Table (database) ,Feature selection ,Data mining ,Statistics, Probability and Uncertainty ,computer.software_genre ,Mutually exclusive events ,computer ,Selection (genetic algorithm) ,Mathematics - Abstract
Supervised classification into c mutually exclusive classes based on n binary features is considered. The only information available is an n×c table of probabilities. Since the d individually best features do not necessarily form the best subset of size d, simulations were run for four feature selection methods, and an application to diagnosing BSE in cattle and scrapie in sheep is presented.
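A minimal sketch of individual-feature ranking from such a probability table, using one simple (illustrative) criterion rather than any of the paper's four methods: rank features by how much their presence rate varies across classes. The table layout and the criterion are assumptions for the example.

```python
import numpy as np


def rank_binary_features(P):
    """Rank binary features given an n-by-c table P, where P[i, j]
    is the probability that feature i is 'present' in class j.
    Illustrative criterion: the spread of the conditional
    probabilities across classes -- features whose presence rates
    differ most between classes are ranked first."""
    spread = P.max(axis=1) - P.min(axis=1)
    return np.argsort(-spread)          # best-to-worst feature indices
```

Note that any such individual ranking is exactly what the abstract warns about: the top d individually ranked features need not be the best subset of size d, which is why the paper compares several selection methods by simulation.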
- Published
- 2005
- Full Text
- View/download PDF
45. An ensemble-based method for linear feature extraction for two-class problems
- Author
-
Ludmila I. Kuncheva, David Masip, and Jordi Vitrià
- Subjects
Boosting (machine learning) ,business.industry ,Dimensionality reduction ,Feature extraction ,Kanade–Lucas–Tomasi feature tracker ,Pattern recognition ,Linear discriminant analysis ,Artificial Intelligence ,Sample size determination ,Computer Vision and Pattern Recognition ,AdaBoost ,Artificial intelligence ,business ,Mathematics ,Curse of dimensionality - Abstract
In this paper we propose three variants of a linear feature extraction technique based on Adaboost for two-class classification problems. Unlike other feature extraction techniques, we do not make any assumptions about the distribution of the data. At each boosting step we select from a pool of linear projections the one that minimizes the weighted error. We propose three different variants of the feature extraction algorithm, depending on the way the pool of individual projections is constructed. Using nine real and two artificial data sets of different original dimensionality and sample size, we compare the performance of the three proposed techniques with three classical techniques for linear feature extraction: Fisher linear discriminant analysis (FLD), nonparametric discriminant analysis (NDA) and a recently proposed feature extraction method for heteroscedastic data based on the Chernoff criterion. Our results show that for data sets of relatively low original dimensionality FLD appears to be both the most accurate and the most economical feature extraction method (giving just one dimension in the case of two classes). The techniques based on Adaboost fare better than the classical techniques for data sets of large original dimensionality.
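The core loop, selecting at each boosting step the projection from a pool that minimizes the weighted error, can be sketched as below. This is a bare-bones discrete-AdaBoost illustration, not one of the paper's three variants: the pool construction, the stump search, and all names are assumptions, and labels are taken to be -1/+1.

```python
import numpy as np


def boosted_projections(X, y, pool, n_steps=5):
    """AdaBoost-style selection of linear projections for a two-class
    problem (labels in {-1, +1}).  `pool` is a list of direction
    vectors; at each step the projection whose best threshold stump
    has the lowest weighted error is kept, and the sample weights are
    updated as in discrete AdaBoost."""
    w = np.full(len(y), 1.0 / len(y))
    chosen = []
    for _ in range(n_steps):
        best = None                       # (error, direction, threshold, sign)
        for v in pool:
            z = X @ v
            for thr in np.unique(z):      # exhaustive stump search
                for sign in (1, -1):
                    pred = sign * np.where(z > thr, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, v, thr, sign)
        err, v, thr, sign = best
        err = min(max(err, 1e-10), 1 - 1e-10)   # guard the log
        alpha = 0.5 * np.log((1 - err) / err)
        pred = sign * np.where(X @ v > thr, 1, -1)
        w *= np.exp(-alpha * y * pred)          # up-weight mistakes
        w /= w.sum()
        chosen.append((v, alpha))
    return chosen
```

The selected directions (with their weights) form the extracted linear features; because nothing is assumed about the class-conditional distributions, the method sidesteps the normality assumptions behind FLD.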
- Published
- 2005
- Full Text
- View/download PDF
46. Using diversity measures for generating error-correcting output codes in classifier ensembles
- Author
-
Ludmila I. Kuncheva
- Subjects
business.industry ,Computer science ,Evolutionary algorithm ,Hamming distance ,Pattern recognition ,Machine learning ,computer.software_genre ,ComputingMethodologies_PATTERNRECOGNITION ,Artificial Intelligence ,Signal Processing ,Margin classifier ,Error correcting ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,Classifier (UML) ,computer ,Software - Abstract
Error-correcting output codes (ECOC) are used to design diverse classifier ensembles. Diversity within ECOC is traditionally measured by Hamming distance. Here we argue that this measure is insufficient for assessing the quality of the code for the purposes of building accurate ensembles. We propose to use diversity measures from the literature on classifier ensembles and suggest an evolutionary algorithm to construct the code.
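The traditional criterion the abstract argues against is easy to state in code: the minimum pairwise Hamming distance between the class codewords (rows) of a binary ECOC matrix. A small sketch, with an illustrative function name:

```python
import numpy as np


def min_row_hamming(code):
    """Minimum pairwise Hamming distance between class codewords
    (rows) of a binary ECOC matrix -- the traditional quality
    measure for an error-correcting output code."""
    c = code.shape[0]
    return min(int((code[i] != code[j]).sum())
               for i in range(c) for j in range(i + 1, c))
```

A code with minimum row distance h can correct up to floor((h - 1) / 2) base-classifier errors per instance; the paper's point is that this number alone says nothing about whether the induced binary problems yield diverse, accurate base classifiers, hence the proposal to optimise ensemble diversity measures instead.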
- Published
- 2005
- Full Text
- View/download PDF
47. Multiple Classifier Systems
- Author
-
Ludmila I. Kuncheva
- Subjects
Computer science ,business.industry ,Speech recognition ,Margin classifier ,Pattern recognition ,Artificial intelligence ,Quadratic classifier ,business ,Multiple classifier - Published
- 2004
- Full Text
- View/download PDF
48. Fusion of Continuous‐Valued Outputs
- Author
-
Ludmila I. Kuncheva
- Subjects
Algebra ,Fusion ,Mathematical optimization ,Generalized mean ,Mathematics - Published
- 2004
- Full Text
- View/download PDF
49. Fusion of Label Outputs
- Author
-
Ludmila I. Kuncheva
- Subjects
Weighted majority vote ,Naive Bayes classifier ,Majority rule ,Fusion ,Computer science ,business.industry ,Pattern recognition ,Artificial intelligence ,business - Published
- 2004
- Full Text
- View/download PDF
50. Theoretical Views and Results
- Author
-
Ludmila I. Kuncheva
- Subjects
business.industry ,Artificial intelligence ,Machine learning ,computer.software_genre ,business ,computer ,Mathematics - Published
- 2004
- Full Text
- View/download PDF