[Untitled]

Authors :: Thierry Pun
Wolfgang Müller
Stéphane Marchand-Maillet
Henning Müller
David McG. Squire
Source :: Multimedia Tools and Applications. 21:55-73
Publication Year :: 2003
Publisher :: Springer Science and Business Media LLC, 2003.
Abstract: Content-based image retrieval (CBIR) has been a very active research area for more than ten years. In the last few years the number of publications and retrieval systems produced has grown steadily. Despite this, there is still no agreed-upon objective way to compare the performance of any two of these systems. This is blocking the further development of the field, since good or promising techniques cannot be identified objectively, and the potential commercial success of CBIR systems is hindered because it is hard to establish the quality of an application. We are thus in the position in which other research areas, such as text retrieval or database systems, found themselves several years ago. To have serious applications, as well as commercial success, objective proof of system quality is needed: in text retrieval the TREC benchmark is a widely accepted performance measure, and in the transaction processing field for databases it is the TPC benchmark that has wide support. This paper describes a framework that enables the creation of a benchmark for CBIR. Parts of this framework have already been developed, and systems can be evaluated against a small, freely available database via a web interface. Much work remains to be done with respect to making available large, diverse image databases and obtaining relevance judgments for those large databases. We also need to establish an independent body, accepted by the entire community, that would organize a benchmarking event, give out official results and update the benchmark regularly. The Benchathlon could take on this role if it manages to gain the confidence of the field. This should also prevent the negative effects, e.g., “benchmarketing”, experienced with other benchmarks, such as the TPC predecessors. This paper sets out our ideas for an open framework for performance evaluation.
We hope to stimulate discussion on evaluation in image retrieval so that systems can be compared on the same grounds. We also identify query paradigms beyond query by example (QBE) that may be integrated into a benchmarking framework, and we give examples of application-based benchmarking areas.