Efficient Benchmarking of NLP APIs using Multi-armed Bandits
- Source: EACL (1)
- Publication Year: 2017
- Publisher: Association for Computational Linguistics, 2017.
Abstract
- Comparing NLP systems to select the best one for a task of interest, such as named entity recognition, is critical for practitioners and researchers. A rigorous approach involves setting up a hypothesis testing scenario using the performance of the systems on query documents. However, often the hypothesis testing approach needs to send a lot of document queries to the systems, which can be problematic. In this paper, we present an effective alternative based on the multi-armed bandit (MAB). We propose a hierarchical generative model to represent the uncertainty in the performance measures of the competing systems, to be used by Thompson Sampling to solve the resulting MAB. Experimental results on both synthetic and real data show that our approach requires significantly fewer queries compared to the standard benchmarking technique to identify the best system according to F-measure.
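The abstract frames system selection as a multi-armed bandit whose arms are the competing NLP systems, solved with Thompson Sampling. As a rough, non-authoritative illustration of that idea only, the sketch below uses independent Beta posteriors over per-document correctness (a Bernoulli simplification) rather than the paper's hierarchical generative model over F-measures; the `systems` callables and the function name are hypothetical.

```python
import random

def thompson_sampling_benchmark(systems, documents, n_queries):
    """Choose which NLP system to query next via Thompson Sampling.

    `systems` maps a system name to a hypothetical callable that returns
    1 if the system's output on a document is judged correct, else 0.
    Each system gets a Beta(1, 1) prior over its per-document accuracy
    (a simplification of the paper's hierarchical model over F-measures).
    """
    # Beta posterior parameters per system: alpha counts successes + 1,
    # beta counts failures + 1.
    alpha = {name: 1.0 for name in systems}
    beta = {name: 1.0 for name in systems}

    for _ in range(n_queries):
        # Sample a plausible accuracy for each system from its posterior
        # and send the next document query to the highest sample.
        sampled = {name: random.betavariate(alpha[name], beta[name])
                   for name in systems}
        chosen = max(sampled, key=sampled.get)

        doc = random.choice(documents)
        reward = systems[chosen](doc)  # 1 if correct on this document, else 0

        alpha[chosen] += reward
        beta[chosen] += 1 - reward

    # Declare the system with the highest posterior mean the winner.
    return max(alpha, key=lambda name: alpha[name] / (alpha[name] + beta[name]))
```

The point of the sketch is the query-saving mechanism: systems whose posteriors already look clearly worse are sampled less often, so fewer document queries are spent on them than under uniform benchmarking.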
- Subjects: Computer science; Benchmarking; Named-entity recognition; Generative model; Thompson sampling; Statistical hypothesis testing; Natural language processing; Artificial intelligence
Details
- Database: OpenAIRE
- Journal: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers
- Accession number: edsair.doi...........600a017aa7e7821a53a78123596c9e15
- Full Text: https://doi.org/10.18653/v1/e17-1039