
On the reproducibility of experiments of indexing repetitive document collections.

Authors :
Fariña, Antonio
Martínez-Prieto, Miguel A.
Claude, Francisco
Navarro, Gonzalo
Lastra-Díaz, Juan J.
Prezza, Nicola
Seco, Diego
Source :
Information Systems. Jul 2019, Vol. 83, p181-194. 14p.
Publication Year :
2019

Abstract

This work introduces a companion reproducibility paper that allows the exact replication of the methods, experiments, and results discussed in a previous work, Claude et al. (2016). In that parent paper, we proposed a wide variety of techniques for compressing indexes that exploit the fact that highly repetitive collections are formed mostly of documents that are near-copies of others. More concretely, we describe a replication framework, called uiHRDC (universal indexes for Highly Repetitive Document Collections), that allows our original experimental setup to be easily replicated using various document collections. The corresponding experimentation is carefully explained, providing precise details about the parameters that can be tuned for each indexing solution. Finally, we also provide uiHRDC as a reproducibility package.
• We summarize the original results and motivate the proposed experimental setup.
• We explain the replication framework, including datasets, query patterns, source code, and scripts.
• We detail all configuration parameters for each solution, explaining which configurations perform best.
• We host the framework on GitHub and publish it through Mendeley Data.
[ABSTRACT FROM AUTHOR]
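The abstract's premise, that collections made mostly of near-copies are highly compressible, can be illustrated with a small sketch. It assumes a synthetic model (50 versions of a 300-word document, each differing from the previous one in at most three words, compared against 50 unrelated documents of the same length) and uses Python's general-purpose zlib purely as a stand-in compressor. Neither the model nor zlib comes from the paper, and all function names are hypothetical.

```python
import random
import zlib

# Hypothetical toy model, for illustration only: a "versioned" collection in
# which every document is a near-copy of the previous one, versus a
# collection of unrelated documents drawn from the same vocabulary.

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def make_vocabulary(rng, size=2000):
    """Random pseudo-words of 3 to 9 letters."""
    return ["".join(rng.choice(ALPHABET) for _ in range(rng.randint(3, 9)))
            for _ in range(size)]


def repetitive_collection(rng, vocab, num_docs, words_per_doc, edits_per_version):
    """Each document copies the previous one and replaces a few words."""
    doc = [rng.choice(vocab) for _ in range(words_per_doc)]
    docs = []
    for _ in range(num_docs):
        docs.append(" ".join(doc))
        doc = list(doc)  # the next version starts as a copy of this one
        for _ in range(edits_per_version):
            doc[rng.randrange(words_per_doc)] = rng.choice(vocab)
    return docs


def distinct_collection(rng, vocab, num_docs, words_per_doc):
    """Independent documents drawn from the same vocabulary."""
    return [" ".join(rng.choice(vocab) for _ in range(words_per_doc))
            for _ in range(num_docs)]


def compression_ratio(docs):
    """Compressed size divided by raw size, using zlib (DEFLATE) at level 9."""
    raw = "\n".join(docs).encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)


rng = random.Random(42)
vocab = make_vocabulary(rng)
rep = repetitive_collection(rng, vocab, num_docs=50, words_per_doc=300,
                            edits_per_version=3)
dis = distinct_collection(rng, vocab, num_docs=50, words_per_doc=300)

rep_ratio = compression_ratio(rep)
dis_ratio = compression_ratio(dis)
print(f"near-copies: {rep_ratio:.3f}  unrelated: {dis_ratio:.3f}")
```

DEFLATE only searches the previous 32 KiB of input, so this toy example benefits from consecutive versions being stored next to each other; near-copies scattered further apart in a large collection would not be detected by zlib.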

Details

Language :
English
ISSN :
0306-4379
Volume :
83
Database :
Academic Search Index
Journal :
Information Systems
Publication Type :
Academic Journal
Accession number :
136201893
Full Text :
https://doi.org/10.1016/j.is.2019.03.007