
INDUZIONE DI CATEGORIE GRAMMATICALI E LESSICALI (Induction of grammatical and lexical categories)

Authors :: D'Errico, Marianna; Paternesi Melloni, Serena; Grandi, Nicola; Tamburini, Fabio
Francesco DEDÈ, Annamaria BARTOLOTTA, Laura BIONDI, Maria Patrizia BOLOGNA, Maria Margherita CARDELLA, Marina CASTAGNETO, Diego SIDRASCHI, Pierluigi CUZZOLIN, Marianna D’ERRICO, Nicola GRANDI, Serena PATERNESI MELLONI, Fabio TAMBURINI, Elisabetta MAGNI, Alberto MANCO, Paolo MILIZIA, Anna POMPEI, Flavia POMPEO, Domenica ROMAGNO, Giancarlo SCHIRRU, Anna M. THORNTON
Francesco Dedè
Publication Year :: 2016
Publisher :: Il Calamo, 2016.
Abstract: The aim of this paper is to give an ‘a-theoretical’ definition of the main parts of speech by extracting the set of categories from the actual distribution of the data, that is, from the contexts in which words occur. Definitions obtained in this way depend solely on contextual information and on distributional similarities among words, and are not conditioned by any theoretical framework. The research hypothesis is that two words which are formally and semantically similar and which share the same syntactic behavior will occur in similar contexts. Consequently, if we classify words according to their contexts of occurrence, formally and semantically similar words should end up in the same class. Given a large, representative corpus of a language, it should therefore be possible to extract all of its parts of speech automatically by surveying the contexts of occurrence. In this article we test this approach on Italian, basing our analysis on CORIS, a representative corpus of written Italian.
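The abstract does not specify the context window, similarity measure, or clustering procedure applied to CORIS. As a toy illustration of the general distributional idea only (not the authors' method), the Python sketch below assumes that a word's context is its immediate left and right neighbour, compares words by cosine similarity of their context-count vectors, and groups them with single-link agglomerative clustering above a hypothetical threshold. The six-sentence corpus, the function names, and the threshold value are all invented for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences):
    """Map each word to counts of its immediate left ("L:") and right ("R:") neighbours.

    Assumption: a one-word window on each side, with <s> and </s> marking sentence edges.
    """
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            word = tokens[i]
            vectors[word]["L:" + tokens[i - 1]] += 1
            vectors[word]["R:" + tokens[i + 1]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(vectors, threshold):
    """Single-link clustering: merge two clusters if any pair of members is similar enough."""
    clusters = [{w} for w in sorted(vectors)]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                best = max(cosine(vectors[a], vectors[b])
                           for a in clusters[i] for b in clusters[j])
                if best >= threshold:
                    clusters[i] |= clusters.pop(j)  # j > i, so index i stays valid
                    merged = True
                    break
            if merged:
                break
    return clusters


# Hypothetical toy corpus: determiners, nouns and verbs in fixed positions.
CORPUS = [
    "il gatto dorme", "il cane dorme",
    "il gatto corre", "il cane corre",
    "un gatto mangia", "un cane mangia",
]

classes = cluster(context_vectors(CORPUS), threshold=0.5)
print(sorted(sorted(c) for c in classes))
# [['cane', 'gatto'], ['corre', 'dorme', 'mangia'], ['il', 'un']]
```

On this artificial input the words sharing a distribution fall into three groups that happen to match determiners, nouns and verbs. A real corpus would need wider or weighted contexts, frequency filtering, and a more scalable clustering method.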