New Generation Focused Crawler

Jitasha Mishra, Debashis Hati, Amritesh Kumar, Lizashree Mishra

Abstract


Vertical search engines use focused crawlers as their key component and develop specific algorithms to select web pages relevant to a predefined set of topics. For such search engines, it is therefore essential to build an effective semantic pattern for each topic. Crawlers are programs that traverse the Internet and retrieve web pages by following hyperlinks. A focused crawler traverses the web, selecting pages relevant to a predefined topic and leaving out irrelevant ones when prioritizing the URL queue. During crawling, it is difficult to analyze the reference information among pages, to identify relevant pages, and to predict which links lead to high-quality pages. In our proposed work, we calculate a link score based on PageRank and the average relevancy score of the parent pages. The rationale is that a parent page is related to its child pages, because an author who wants to give detailed information generally places it on a child page. After computing the link score, we compare it with a threshold value. If the link score is greater than or equal to the threshold, the link is considered relevant; otherwise, it is discarded. Among the relevant links, the focused crawler first fetches the one with the highest link score.
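The scoring and queueing steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how PageRank and average parent relevancy are combined, so the sketch assumes a weighted sum with a hypothetical weight `alpha`. Because the keywords mention the vector space model, parent relevancy is assumed here to be the cosine similarity between term-frequency vectors of a page and the topic terms. The names `cosine_relevance`, `link_score`, `Frontier`, `alpha`, and the threshold value are all hypothetical.

```python
import heapq
import math
from collections import Counter


def cosine_relevance(page_text: str, topic_terms: list[str]) -> float:
    """Assumed VSM relevancy: cosine similarity of raw term-frequency vectors."""
    page_tf = Counter(page_text.lower().split())
    topic_tf = Counter(t.lower() for t in topic_terms)
    dot = sum(page_tf[t] * w for t, w in topic_tf.items())
    norm_page = math.sqrt(sum(v * v for v in page_tf.values()))
    norm_topic = math.sqrt(sum(v * v for v in topic_tf.values()))
    if norm_page == 0 or norm_topic == 0:
        return 0.0  # empty page or empty topic: no similarity
    return dot / (norm_page * norm_topic)


def link_score(page_rank: float, parent_relevances: list[float],
               alpha: float = 0.5) -> float:
    """Hypothetical combination: alpha * PageRank + (1 - alpha) * mean parent relevancy."""
    if parent_relevances:
        avg_parent = sum(parent_relevances) / len(parent_relevances)
    else:
        avg_parent = 0.0
    return alpha * page_rank + (1 - alpha) * avg_parent


class Frontier:
    """URL queue that keeps only links scoring >= threshold and pops the best first."""

    def __init__(self, threshold: float):
        self.threshold = threshold
        self._heap: list[tuple[float, str]] = []  # (negated score, url) for max-heap order
        self._seen: set[str] = set()               # URLs already accepted into the queue

    def offer(self, url: str, score: float) -> bool:
        """Enqueue the link if it meets the threshold; return False if it is discarded."""
        if url in self._seen or score < self.threshold:
            return False
        self._seen.add(url)
        heapq.heappush(self._heap, (-score, url))
        return True

    def pop_best(self) -> tuple[str, float]:
        """Remove and return the queued link with the highest link score."""
        neg_score, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self) -> int:
        return len(self._heap)
```

For example, with a hypothetical threshold of 0.4, a link with PageRank 0.6 and parent relevancies 0.5 and 0.3 scores about 0.5 and is queued, while a link with PageRank 0.2 and a single parent relevancy of 0.1 scores about 0.15 and is discarded. A max-heap is used so that the highest-scoring relevant link is always fetched next, matching the fetch order stated in the abstract.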

 

Keywords: vertical search engine, focused crawler, page rank, vector space model



DOI: https://doi.org/10.26483/ijarcs.v3i7.1398

Copyright (c) 2016 International Journal of Advanced Research in Computer Science