1. Bringing a Feature Selection Metric from Machine Learning to Complex Networks
- Author
- Jean-Charles Lamirel, Anthony Perez, Nicolas Dugué
- Affiliations: Laboratoire d'Informatique de l'Université du Mans (LIUM), Le Mans Université (UM); Natural Language Processing : representations, inference and semantics (SYNALP) and Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria); Laboratoire d'Informatique Fondamentale d'Orléans (LIFO) and Graphes, Algorithmes et Modèles de Calcul (GAMoC), Université d'Orléans (UO)-Institut National des Sciences Appliquées - Centre Val de Loire (INSA CVL), Institut National des Sciences Appliquées (INSA)
- Subjects
Computer science, Node (networking), Data Structures and Algorithms [cs.DS], Feature selection, Engineering and technology, Complex network, Neural and Evolutionary Computing [cs.NE], Machine learning, Social and Information Networks [cs.SI], Artificial Intelligence [cs.AI], Information systems, Cluster labeling, Metric (mathematics), Electrical engineering, electronic engineering, information engineering, Feature (machine learning), Artificial intelligence & image processing, Centrality, Cluster analysis
- Abstract
Introduced in the context of machine learning, the Feature F-measure is a parameter-free statistical feature selection metric that describes classes through a set of salient features. It has been shown to be effective for classification, cluster labeling, and clustering model quality measurement. In this paper, we introduce the Node F-measure, its transposition to the context of networks, where it can by analogy be applied to detect salient nodes in communities. This approach benefits from the parameter-free nature of the Feature F-measure, its low computational complexity, and its well-evaluated performance. Interestingly, we show that, in addition to these properties, the Node F-measure is correlated with certain centrality measures and with measures designed to characterize the community roles of nodes. We also observe that the usual community role measures depend strongly on the size of the communities, whereas the ones we propose are by definition linked to the density of the community, which makes their results comparable from one network to another. Finally, the parameter-free selection process applied to nodes yields a universal system, in contrast to the thresholds previously defined empirically for establishing community roles. These results may have applications in studying leadership in scientific communities or in the temporal monitoring of communities.
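The abstract does not state the formula, so the following is only a minimal sketch of what an F-measure over nodes could look like, under assumptions that are not taken from this record: a node's recall is assumed to be the share of its edges that stay inside its own community, its predominance is assumed to be its share of the community's total internal degree, and the score is their harmonic mean. The function name `node_f_measure` and the toy graph are hypothetical.

```python
from collections import defaultdict

def node_f_measure(edges, community):
    """Sketch: harmonic mean of an assumed node recall and node predominance.

    edges: iterable of undirected (u, v) pairs
    community: dict mapping each node to its community label
    """
    deg = defaultdict(int)     # total degree of each node
    deg_in = defaultdict(int)  # degree restricted to the node's own community
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
        if community[u] == community[v]:
            deg_in[u] += 1
            deg_in[v] += 1

    # Sum of internal degrees per community (assumed predominance denominator).
    internal = defaultdict(int)
    for u, k in deg_in.items():
        internal[community[u]] += k

    scores = {}
    for u in deg:
        if deg_in[u] == 0:
            scores[u] = 0.0  # no internal edge: score defined as 0 here
            continue
        recall = deg_in[u] / deg[u]                         # assumption
        predominance = deg_in[u] / internal[community[u]]   # assumption
        scores[u] = 2 * recall * predominance / (recall + predominance)
    return scores

# Toy graph: two triangles joined by the single edge (c, d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("c", "d"),
         ("d", "e"), ("e", "f"), ("d", "f")]
community = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
scores = node_f_measure(edges, community)
```

Under these assumed definitions, node `a`, whose edges all stay inside its community, scores higher than the bridge node `c`, which spends one edge outside. No threshold is needed to compute the scores, in keeping with the parameter-free character the abstract describes.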
- Published
- 2018