
A fast and flexible instance selection algorithm adapted to non-trivial database sizes

Authors :: Rachid Harba
Frédéric Ros
Serge Guillaume
Marco Pintore
Laboratoire Pluridisciplinaire de Recherche en Ingénierie des Systèmes, Mécanique et Energétique (PRISME)
Université d'Orléans (UO)-Ecole Nationale Supérieure d'Ingénieurs de Bourges (ENSI Bourges)
aucun
PILA
Information – Technologies – Analyse Environnementale – Procédés Agricoles (UMR ITAP)
Institut national de recherche en sciences et technologies pour l'environnement et l'agriculture (IRSTEA)-Institut national d’études supérieures agronomiques de Montpellier (Montpellier SupAgro)
Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro)-Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro)
Institut national d’études supérieures agronomiques de Montpellier (Montpellier SupAgro)-Institut national de recherche en sciences et technologies pour l'environnement et l'agriculture (IRSTEA)
Source :: Intelligent Data Analysis, IOS Press, 2015, 19 (3), pp. 631-658. ⟨10.3233/IDA-150736⟩
Publication Year :: 2015
Publisher :: IOS Press, 2015.
Abstract: In this paper, a new instance selection algorithm is proposed in the context of classification to manage non-trivial database sizes. The algorithm is hybrid and runs with only a few parameters that directly control the balance between the three objectives of classification, i.e. errors, storage requirements and runtime. It combines neighborhood and stratification mechanisms that reduce the runtime without significantly degrading efficiency. Instead of applying an IS (Instance Selection) algorithm to the whole database, IS is applied to strata derived from regions, each region representing a set of patterns selected from the original training set. The application of IS is conditioned by the purity of each region (i.e. the extent to which different categories of patterns are mixed in the region), and the stratification strategy is adapted to the region components. For each region, the number of delivered instances is first limited by an iterative process that takes into account the boundary complexity, and then optimized by removing the superfluous instances. The sets of instances determined from all the regions are put together to provide an intermediate instance set, which undergoes a dedicated filtering process to deliver the final set. Experiments performed with various synthetic and real data sets demonstrate the advantages of the proposed approach.
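The abstract does not give the algorithm's details, so the following is only a rough sketch of the idea of conditioning instance selection on region purity. It assumes the regions are already given as lists of points and labels. Hart's condensed nearest neighbor rule stands in for the IS step, which may differ from what the paper uses. The names region_purity, condensed_nn, select_instances and purity_threshold are hypothetical, and the iterative limiting and final filtering stages are not modeled.

```python
from collections import Counter
from math import dist


def region_purity(labels):
    """Share of the majority class among the labels of one region."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def condensed_nn(points, labels):
    """Hart's condensed nearest neighbor rule (a stand-in IS method).

    Starts from the first pattern and adds every pattern that the
    current subset misclassifies with a 1-NN rule, until stable.
    Returns the indices of the kept patterns.
    """
    keep = [0]
    changed = True
    while changed:
        changed = False
        for i, p in enumerate(points):
            if i in keep:
                continue
            nearest = min(keep, key=lambda j: dist(p, points[j]))
            if labels[nearest] != labels[i]:
                keep.append(i)
                changed = True
    return keep


def select_instances(regions, purity_threshold=1.0):
    """Apply IS only to regions whose purity is below the threshold.

    `regions` is a list of (points, labels) pairs. A sufficiently pure
    region is summarized by one majority-class pattern, the one closest
    to the region mean (an assumption made for this sketch).
    """
    selected = []
    for points, labels in regions:
        if region_purity(labels) >= purity_threshold:
            majority = Counter(labels).most_common(1)[0][0]
            dim = len(points[0])
            mean = [sum(p[d] for p in points) / len(points) for d in range(dim)]
            candidates = [i for i, lab in enumerate(labels) if lab == majority]
            best = min(candidates, key=lambda i: dist(points[i], mean))
            selected.append((points[best], labels[best]))
        else:
            for i in condensed_nn(points, labels):
                selected.append((points[i], labels[i]))
    return selected


# Toy data: one pure region and one region mixing two classes.
pure = ([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)], ["a", "a", "a"])
mixed = ([(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.1, 0.0)], ["a", "a", "b", "b"])
result = select_instances([pure, mixed])
```

In this toy run the pure region contributes a single representative, while the mixed region keeps one pattern on each side of the class boundary. Lowering purity_threshold would treat more regions as pure and trade accuracy for storage and runtime.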

Subjects :: K-NEAREST NEIGHBORS
SYNTHETIC AND REAL DATA
SUPERVISED CLASSIFICATION
Computer science
Iterative method
Nearest neighbor search
NEAREST NEIGHBOR SEARCH
Boundary (topology)
Context (language use)
02 engineering and technology
computer.software_genre
CLASSIFICATION
Theoretical Computer Science
Set (abstract data type)
CLASSIFICATION (OF INFORMATION)
DIGITAL STORAGE
ALGORITHM
Artificial Intelligence
020204 information systems
INSTANCE SELECTION
0202 electrical engineering, electronic engineering, information engineering
CLUSTERING ALGORITHM
Cluster analysis
ITERATIVE PROCESS
Iterative and incremental development
Database
ITERATIVE METHODS
ALGORITHMS
Process (computing)
DIFFERENT MECHANISMS
DATABASE
CLUSTERING ALGORITHMS
DATABASE SYSTEMS
COMPUTER SCIENCE
STORAGE REQUIREMENTS
[SDE]Environmental Sciences
020201 artificial intelligence & image processing
Computer Vision and Pattern Recognition
Data mining
FILTERING PROCESS
computer
Algorithm

Details

ISSN :: 15714128 and 1088467X
Volume :: 19
Database :: OpenAIRE
Journal :: Intelligent Data Analysis
Accession number :: edsair.doi.dedup.....6b68a4067dfa67f6da9be21abf1a30f6
Full Text :: https://doi.org/10.3233/ida-150736
