Indexing Size Approximation of WWW Repository with Leading Information Retrieval and Web Filtering Robots

Ijaz Ali Shoukat, Mohsin Iftikhar, Abdul Haseeb


The index size of the World Wide Web, the largest information system, is difficult to estimate. The Web is a useful and growing scientific
utility that, like a digital library, lets readers explore electronic literature. Estimating how much of the WWW is indexed has been an open
problem since 1998. Yahoo has claimed an index of 19 billion web documents, a figure Google disputes because the most recent published
study, by Gulli and Signorini, put the total “indexed web size” at around 11.5 billion pages. The Web is growing rapidly, so what is its
current size? Which search engine indexes the most authentic information (PDF files)? Which search engine indexes the most web pages of all
types? This article answers these questions. We estimated the index sizes of the leading search engines (Google, Yahoo and MSN) using a
simple and cost-effective approach, on the grounds that complex heuristics are unnecessary when a simple method suffices. Our technique
relies on querying the search engines with selected common affixes that can occur in any document or web page. The paper reports the total
size of the current “indexed web contents” and provides a comparative analysis showing scholars which search engine has more authentic
information and the larger index.




Keywords: Index Size of Search Engines, Total Web Size, Comparison of Google, Yahoo and MSN, Web Crawlers, Web Robots


Copyright (c) 2016 International Journal of Advanced Research in Computer Science