1. Discovery of usage patterns in digital library web logs using Markov modeling
- Author
Nouvellet, Adrien; Beaudouin, Valérie; d'Alché-Buc, Florence; Prieur, Christophe; Roueff, François
- Affiliations
Laboratoire Traitement et Communication de l'Information (LTCI), Télécom ParisTech-Institut Mines-Télécom [Paris] (IMT)-Centre National de la Recherche Scientifique (CNRS); Sociologie Information-Communication Design (SID), Institut interdisciplinaire de l’innovation (I3, a joint research unit of CNRS, UMR 9217), Centre National de la Recherche Scientifique (CNRS)-École polytechnique (X)-Télécom ParisTech-MINES ParisTech - École nationale supérieure des mines de Paris, Université Paris sciences et lettres (PSL); Département Sciences Economiques et Sociales (SES), Télécom ParisTech
- Subjects
[STAT.AP]Statistics [stat]/Applications [stat.AP], [SHS.SOCIO]Humanities and Social Sciences/Sociology, [SHS.STAT]Humanities and Social Sciences/Methods and statistics, [INFO.INFO-CY]Computer Science [cs]/Computers and Society [cs.CY], [INFO.INFO-WB]Computer Science [cs]/Web, [SHS.MUSEO]Humanities and Social Sciences/Cultural heritage and museology, [SHS]Humanities and Social Sciences
- Abstract
This paper proposes a family of tools based on Markov modeling to quantitatively analyze how people access the digital collections of the Bibliothèque nationale de France (BnF, the national library of France) through its web platform, Gallica. The aim is to provide the BnF with relevant information about the various usage patterns, helping it better understand its users and improve both its mediation efforts and the design of the website, in order to increase general-public use of its 4-million-document collection. To this end, the study focuses on the access logs retrieved from the Apache HTTP servers of Gallica, which are converted into sequences of actions. To study user navigation behavior, we model these data with Markov models: Markov chains when sequences of actions are considered without duration, and Markov processes when duration is taken into account. The models are used either to capture average behavior through meaningful statistics or to cluster the data into interpretable types of usage. The numerical results bring new insights into the way users interact with the platform, in particular the mean duration of actions such as interacting with the search engine or consulting documents. Although our approach requires additional information to properly interpret the models and the correlations they highlight, it is able to discover all types of behavior, including the stealthiest ones that are the most difficult to capture in traditional surveys, and gives them their fair weight in terms of audience. We also show how this approach fits into a broader research effort combining data mining and ethnography.
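As a rough illustration of the basic modeling step described in the abstract, the Python sketch below estimates a first-order Markov chain from sessions of actions and, for the duration-aware variant, the mean time spent in each action. The session format (a list of `(action, duration_seconds)` pairs), the action labels (`home`, `search`, `view`) and the function names are hypothetical; the paper's actual conversion of the Gallica Apache logs and its exact models are not reproduced here. The duration part assumes exponential holding times with every visit fully observed, in which case the mean duration per visit is the inverse of the maximum-likelihood exit rate of that state.

```python
from collections import Counter, defaultdict

# Hypothetical session format: (action, duration_in_seconds) pairs in the order
# the actions were performed. Labels and durations are illustrative only.
Session = list[tuple[str, float]]


def transition_matrix(sessions: list[Session]) -> dict[str, dict[str, float]]:
    """Maximum-likelihood transition probabilities of a first-order Markov chain.

    P(a -> b) = (#transitions a -> b) / (#transitions leaving a); durations are ignored.
    States that never precede another action get no row.
    """
    counts: dict[str, Counter] = defaultdict(Counter)
    for session in sessions:
        actions = [action for action, _ in session]
        for current, nxt in zip(actions, actions[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for state, row in counts.items()
    }


def mean_durations(sessions: list[Session]) -> dict[str, float]:
    """Mean time spent per visit to each action.

    Assuming exponential holding times and fully observed visits, this equals
    1 / (maximum-likelihood exit rate) of the corresponding state.
    """
    total_time: dict[str, float] = defaultdict(float)
    visits: Counter = Counter()
    for session in sessions:
        for action, duration in session:
            total_time[action] += duration
            visits[action] += 1
    return {action: total_time[action] / visits[action] for action in visits}


if __name__ == "__main__":
    sessions = [
        [("home", 5.0), ("search", 20.0), ("view", 120.0)],
        [("search", 10.0), ("view", 60.0), ("search", 30.0)],
    ]
    print(transition_matrix(sessions))
    print(mean_durations(sessions))
```

Clustering sessions into usage types, as the abstract mentions, would go further (for example by fitting several such chains as a mixture); this sketch only covers the single "average behavior" model.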
- Published
2019