A fast and flexible instance selection algorithm adapted to non-trivial database sizes
- Source :
- Intelligent Data Analysis, IOS Press, 2015, 19 (3), pp.631-658. ⟨10.3233/IDA-150736⟩
- Publication Year :
- 2015
- Publisher :
- IOS Press, 2015.
Abstract
- In this paper, a new instance selection algorithm is proposed in the context of classification to manage non-trivial database sizes. The algorithm is hybrid and runs with only a few parameters that directly control the balance between the three objectives of classification, i.e. errors, storage requirements and runtime. It comprises different mechanisms involving neighborhood and stratification algorithms that specifically speed up the runtime without significantly degrading efficiency. Instead of applying an instance selection (IS) algorithm to the whole database, IS is applied to strata derived from regions, each region representing a set of patterns selected from the original training set. The application of IS is conditioned by the purity of each region (i.e. the extent to which different categories of patterns are mixed in the region), and the stratification strategy is adapted to the region components. For each region, the number of delivered instances is first limited by an iterative process that takes the boundary complexity into account, and then optimized by removing superfluous instances. The sets of instances obtained from all regions are merged into an intermediate instance set, which undergoes a dedicated filtering process to produce the final set. Experiments performed on various synthetic and real data sets demonstrate the advantages of the proposed approach.
- Subjects :
- K-NEAREST NEIGHBORS
- SYNTHETIC AND REAL DATA
- SUPERVISED CLASSIFICATION
- Computer science
- Iterative method
- Nearest neighbor search
- NEAREST NEIGHBOR SEARCH
- Boundary (topology)
- Context (language use)
- 02 engineering and technology
- computer.software_genre
- CLASSIFICATION
- Theoretical Computer Science
- Set (abstract data type)
- CLASSIFICATION (OF INFORMATION)
- DIGITAL STORAGE
- ALGORITHM
- Artificial Intelligence
- 020204 information systems
- INSTANCE SELECTION
- 0202 electrical engineering, electronic engineering, information engineering
- CLUSTERING ALGORITHM
- Cluster analysis
- ITERATIVE PROCESS
- Iterative and incremental development
- Database
- ITERATIVE METHODS
- ALGORITHMS
- Process (computing)
- DIFFERENT MECHANISMS
- DATABASE
- CLUSTERING ALGORITHMS
- DATABASE SYSTEMS
- COMPUTER SCIENCE
- STORAGE REQUIREMENTS
- [SDE]Environmental Sciences
- 020201 artificial intelligence & image processing
- Computer Vision and Pattern Recognition
- Data mining
- FILTERING PROCESS
- computer
- Algorithm
Details
- ISSN :
- 1571-4128 and 1088-467X
- Volume :
- 19
- Database :
- OpenAIRE
- Journal :
- Intelligent Data Analysis
- Accession number :
- edsair.doi.dedup.....6b68a4067dfa67f6da9be21abf1a30f6
- Full Text :
- https://doi.org/10.3233/ida-150736
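The abstract describes the selection pipeline only at a high level (regions, purity-conditioned IS on strata, merging, final filtering). The following is a minimal illustrative sketch of that general shape, not the authors' algorithm. It assumes the regions are already given as lists of (features, label) pairs (how they are built is not shown), takes purity to be the majority-class share, uses round-robin slicing as a stand-in stratification, and substitutes Hart's condensed nearest neighbour (CNN) both for the per-stratum IS algorithm and for the final filter. The iterative boundary-complexity limit is not modelled. The names `select_instances`, `purity_threshold` and `n_strata` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical data model: an instance is a (features, label) pair,
# e.g. ((0.0, 1.0), "a"); a region is a non-empty list of instances.

def purity(region):
    """Share of the majority class in a region (1.0 = single class)."""
    counts = Counter(label for _, label in region)
    return max(counts.values()) / len(region)

def nearest_label(x, prototypes):
    """Label of the prototype closest to x (1-NN, Euclidean distance)."""
    _, label = min(prototypes, key=lambda p: math.dist(x, p[0]))
    return label

def condensed_nn(instances):
    """Hart's condensed nearest neighbour (CNN): grow a subset until
    1-NN on the subset labels every input instance correctly."""
    if not instances:
        return []
    kept = [instances[0]]
    changed = True
    while changed:
        changed = False
        for inst in instances:
            if inst not in kept and nearest_label(inst[0], kept) != inst[1]:
                kept.append(inst)
                changed = True
    return kept

def select_instances(regions, purity_threshold=0.95, n_strata=2):
    """Purity-conditioned selection per region, then a final pass
    over the merged intermediate set (illustrative sketch only)."""
    intermediate = []
    for region in regions:
        if purity(region) >= purity_threshold:
            # (Nearly) pure region: keep one majority-class instance.
            majority = Counter(l for _, l in region).most_common(1)[0][0]
            intermediate.append(next(i for i in region if i[1] == majority))
        else:
            # Mixed region: split into disjoint strata, condense each one.
            for s in range(n_strata):
                intermediate.extend(condensed_nn(region[s::n_strata]))
    # Stand-in for the final filtering step: condense the merged set,
    # dropping instances made redundant by other regions.
    return condensed_nn(intermediate)

regions = [
    [((0.0, 0.0), "a"), ((0.1, 0.0), "a"), ((0.0, 0.1), "a")],                    # pure
    [((1.0, 1.0), "a"), ((1.1, 1.0), "b"), ((1.0, 1.1), "a"), ((1.2, 1.2), "b")],  # mixed
]
print(select_instances(regions))
# [((0.0, 0.0), 'a'), ((1.1, 1.0), 'b'), ((1.0, 1.0), 'a')]
```

In this toy run, the pure region contributes a single representative and the mixed region contributes one instance per stratum. The final pass then keeps all three, because 1-NN on these three instances still labels all seven training instances correctly. Running IS per stratum rather than on the whole set is what bounds the cost of the quadratic CNN loop.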