
Analysis of Gallica and Data BnF logs and Modelling of Behaviour Patterns

Authors :
d'Alché-Buc, Florence
Beaudouin, Valérie
Bermès, Emmanuelle
Chevallier, Philippe
Le Moullec-Rieux, Aude
Nouvellet, Adrien
Prieur, Christophe
Roueff, François
Laboratoire Traitement et Communication de l'Information (LTCI)
Institut Mines-Télécom [Paris] (IMT)-Télécom Paris
Télécom ParisTech
Bibliothèque nationale de France (BnF)
Bibliothèque nationale de France, Délégation à la Stratégie et à la recherche (BnF_DSG)
Bibliothèque Nationale de France
Bibliothèque nationale de France, Département des Métadonnées (BnF_MET)
Bibliothèque nationale de France (Paris)
BibliLab
Source :
[Research Report] Bibliothèque nationale de France (Paris); Télécom ParisTech. 2017
Publication Year :
2017
Publisher :
HAL CCSD, 2017.

Abstract

Gallica (http://gallica.bnf.fr) is one of the major digital libraries freely available on the Internet. It provides access to millions of documents of all types and receives around 1.5 million visits per month. As part of a research partnership between the BnF and Télécom ParisTech, the connection logs of the Gallica servers were analysed using machine-learning methods. The aim was not to collect information on users or their profiles, but to use the logs, as records of usage, to identify typical clickstreams. Over 15 months, a data clustering algorithm was developed that groups Gallica sessions with similar sequences and durations of actions. The logs analysed covered periods ranging from one week to one month, and the stability of the resulting models was checked systematically. Such learning methods take advantage of the very factor that undermines traditional methods of gathering usage information: the extremely high number of connections. Despite the power of the algorithms involved, machine learning still requires numerous decisions to be made, which calls for other sources of knowledge about usage and users. For this reason, the preferred methodological choice was to bring the statistical models into dialogue with results obtained through other approaches (ethnographic observations, interviews, etc.). The value of the work on the Gallica logs persuaded the BnF and Télécom ParisTech to add a further stage to the research, devoted to the Data BnF logs and to clickstreams between Gallica, Data BnF and the BnF General Catalogue.
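The abstract does not specify the clustering algorithm. As a purely illustrative sketch, and not the report's method, the kind of pipeline it describes (cutting raw log hits into sessions, then grouping sessions by the order and duration of their actions) might look like the following. The action vocabulary, the 30-minute inactivity gap, the transition-frequency features and the use of plain k-means are all hypothetical choices made for this example.

```python
import math
from collections import defaultdict

# Hypothetical action vocabulary; the report's actual action categories are not given here.
ACTIONS = ["search", "view", "download", "browse"]
GAP_SECONDS = 30 * 60  # assumed inactivity threshold that closes a session


def sessionize(events, gap=GAP_SECONDS):
    """Split (visitor_id, timestamp_seconds, action) hits into per-visitor sessions."""
    by_visitor = defaultdict(list)
    for visitor, ts, action in sorted(events, key=lambda e: (e[0], e[1])):
        by_visitor[visitor].append((ts, action))
    sessions = []
    for hits in by_visitor.values():
        current = [hits[0]]
        for prev, hit in zip(hits, hits[1:]):
            # A long silence between two hits starts a new session.
            if hit[0] - prev[0] > gap:
                sessions.append(current)
                current = []
            current.append(hit)
        sessions.append(current)
    return sessions


def features(session):
    """Transition frequencies between action types (sequencing) plus log duration."""
    index = {a: i for i, a in enumerate(ACTIONS)}  # assumes every action is in ACTIONS
    n = len(ACTIONS)
    vec = [0.0] * (n * n)
    actions = [a for _, a in session]
    transitions = list(zip(actions, actions[1:]))
    for a, b in transitions:
        vec[index[a] * n + index[b]] += 1.0
    if transitions:
        vec = [v / len(transitions) for v in vec]
    duration = session[-1][0] - session[0][0]
    vec.append(math.log1p(duration))  # compress long sessions onto a log scale
    return vec


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means, deterministically initialised on the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each session vector to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

The relative weighting of the duration feature against the transition frequencies, the gap threshold and the number of clusters `k` are exactly the kind of modelling decisions the abstract says cannot be settled by the algorithm alone.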

Details

Language :
English
Database :
OpenAIRE
Accession number :
edsair.dedup.wf.001..bd5d4291550a4a8d10c2f1b0cc22ef2c