Weakly supervised learning for an effective focused web crawler.
- Author
- Joe Dhanith, P.R., Saeed, Khalid, Rohith, G., and Raja, S.P.
- Subjects
- *HYPERLINKS, *WEBSITES, *AMBIGUITY, *SEARCH engines, *SCALABILITY
- Abstract
A focused crawler traverses the Web to collect only pages relevant to a particular topic, and is increasingly considered a way around the scalability issues of current general-purpose search engines. But the diversity of data on the Web confronts these crawlers with three significant problems: (i) inconsistency, (ii) ubiquity, and (iii) ambiguity, which misguide the crawl. To handle these issues, this paper proposes a weakly supervised Gated Recurrent Unit (GRU) mechanism for an adaptive focused web crawler framework that matches semantically relevant topics and web page content. The weakly supervised GRU model accepts the vector forms of the topic and the fetched web page as input, produces meaningful semantic vectors, and applies the Manhattan distance rule to compute the topical relevance of the web page. The proposed mechanism guides the focused crawler to download more relevant web pages by following the hyperlinks relevant to the topic and omitting the irrelevant ones. It also helps the crawler semantically find, arrange, and index web pages in a relatively narrow segment of the Web, addressing the inconsistency, ubiquity, and ambiguity problems of focused crawlers. The experimental results indicate that the proposed technique outperforms state-of-the-art approaches in terms of harvest rate, precision, recall, harmonic mean, and irrelevance ratio. In summary, the strategy described here works well and is important for focused crawlers. [ABSTRACT FROM AUTHOR]
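The relevance-scoring idea the abstract describes, encoding the topic and the fetched page with a GRU and scoring them with a Manhattan distance rule, can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: the siamese (shared-weight) encoder, the `exp(-L1)` similarity, all dimensions, and all variable names are assumptions made here for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 8   # hidden-state size (illustrative assumption)
EMBED = 6    # token-embedding size (illustrative assumption)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUEncoder:
    """Minimal GRU that encodes a sequence of token embeddings into one vector.

    One shared encoder is used for both the topic and the page, i.e. a
    siamese arrangement (an assumption, not confirmed by the abstract).
    """
    def __init__(self, embed, hidden):
        s = 1.0 / np.sqrt(hidden)
        # Update-gate, reset-gate, and candidate weights over [input; hidden].
        self.Wz = rng.uniform(-s, s, (hidden, embed + hidden))
        self.Wr = rng.uniform(-s, s, (hidden, embed + hidden))
        self.Wh = rng.uniform(-s, s, (hidden, embed + hidden))

    def encode(self, xs):
        h = np.zeros(HIDDEN)
        for x in xs:
            xh = np.concatenate([x, h])
            z = sigmoid(self.Wz @ xh)                              # update gate
            r = sigmoid(self.Wr @ xh)                              # reset gate
            h_tilde = np.tanh(self.Wh @ np.concatenate([x, r * h]))
            h = (1 - z) * h + z * h_tilde                          # interpolate
        return h

def manhattan_relevance(u, v):
    # exp(-L1 distance): 1.0 for identical vectors, decaying toward 0
    # as the semantic vectors diverge.
    return float(np.exp(-np.sum(np.abs(u - v))))

encoder = GRUEncoder(EMBED, HIDDEN)
topic = [rng.standard_normal(EMBED) for _ in range(3)]  # toy "topic" tokens
page = [rng.standard_normal(EMBED) for _ in range(5)]   # toy "page" tokens

score = manhattan_relevance(encoder.encode(topic), encoder.encode(page))
print(round(score, 4))
```

A crawler built on this scheme would follow a hyperlink only when the score of its surrounding context or target page exceeds some relevance threshold, discarding the irrelevant links.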
- Published
- 2024