Author: "Xavier Sevillano" / Publisher: springer berlin heidelberg - Searchworks@Jio Institute Digital Library Search Results

Search keyword "Xavier Sevillano": 3 results

Author: Xavier Sevillano, Germán Cobo, Joan Claudi Socoró, and Francesc Alías
Subjects: Text corpus, ComputingMethodologies_PATTERNRECOGNITION, Brown clustering, Fuzzy clustering, Information retrieval, Computer science, Latent semantic analysis, InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL, Consensus clustering, Search engine indexing, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, Document clustering, Cluster analysis
Abstract: Deriving a thematically meaningful partition of an unlabeled text corpus is a challenging task. In comparison to classic term-based document indexing, the use of document representations based on latent thematic generative models can lead to improved clustering. However, determining a priori the optimal indexing technique is not straightforward, as it depends on the clustering problem faced and the partitioning strategy adopted. So as to overcome this indeterminacy, we propose deriving a consensus labeling upon the results of clustering processes executed on several document representations. Experiments conducted on subsets of two standard text corpora evaluate distinct clustering strategies based on latent thematic spaces and highlight the usefulness of consensus clustering to overcome the optimal document indexing indeterminacy.
Published: 2007

Author: Joan Claudi Socoró, Germán Cobo, Xavier Sevillano, and Francesc Alías
Subjects: Fuzzy clustering, Brown clustering, Computer science, Single-linkage clustering, Correlation clustering, Constrained clustering, Machine learning, CURE data clustering algorithm, Consensus clustering, Data mining, Artificial intelligence, Cluster analysis
Abstract: A major problem encountered by text clustering practitioners is the difficulty of determining a priori which is the optimal text representation and clustering technique for a given clustering problem. As a step towards building robust document partitioning systems, we present a strategy based on a hierarchical consensus clustering architecture that operates on a wide diversity of document representations and partitions. The conducted experiments show that the proposed method is capable of yielding a consensus clustering that is comparable to the best individual clustering available even in the presence of a large number of poor individual labelings, outperforming classic nonhierarchical consensus approaches in terms of performance and computational cost.
Published: 2007

Author: Xavier Sevillano, Francesc Alías, and Joan Claudi Socoró
Subjects: Text corpus, Latent semantic analysis, Computer science, Artificial intelligence, Classifier (UML), Independent component analysis, Natural language, Natural language processing
Abstract: This paper introduces a novel approach for improving the reliability of ICA-based text classifiers, attempting to make the most of the independent components of the text data. In this framework, two issues are addressed. First, a relative relevance measure for category assignment is presented. Second, a reliability control process is included in the classifier, which avoids classifying documents that belong to none of the categories defined during the training stage. The experiments have been conducted on a journalistic-style text corpus in Catalan, achieving encouraging results in terms of rejection accuracy. However, similar results are obtained when comparing the proposed relevance measure to the classic magnitude-based technique for category assignment.
Published: 2004