GENERATING QUERIES TO CRAWL HIDDEN WEB USING KEYWORD SAMPLING AND RANDOM FOREST CLASSIFIER
COPYRIGHT
Submission of a manuscript implies that the work described has not been published before; that it is not under consideration for publication elsewhere; and that, if and when the manuscript is accepted for publication, the authors agree to automatic transfer of the copyright to the publisher.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
- The journal allows the author(s) to retain publishing rights without restrictions.
- The journal allows the author(s) to hold the copyright without restrictions.