IMPACT SCORE ESTIMATION WITH PRIVACY PRESERVATION IN INFORMATION RETRIEVAL

KINJAL SHETH; Dr. Harshad Bhadka; Dr. Ashish Jani

doi:10.26483/ijarcs.v9i1.5237

Published: Feb 20, 2018

Keywords:

Information Retrieval (IR), semantic keyword extraction, impact score estimation, novel similarity estimation, Levenshtein distance.

KINJAL SHETH, C U SHAH UNIVERSITY

Abstract

Nowadays, Information Retrieval (IR) is becoming an increasingly popular technique owing to the tremendous growth of resources on the internet. However, present information retrieval techniques have several limitations, such as a lack of semantic keywords, high time consumption, and vague user queries. To mitigate these issues, this paper proposes a novel IR framework for effective access to data available online. The proposed IR system comprises five major steps. First, the documents shared as resources are pre-processed, and domain analysis is performed to find the category of each document. Second, keywords are extracted using semantic keyword extraction and indexing, and an impact score is estimated to determine the importance of each keyword in each document. Third, document similarity is estimated using a novel similarity estimation algorithm, and the documents are clustered based on the resulting score. Fourth, the documents are ranked based on the similarity score and the impact score of the keywords in the query. Finally, users register their personal information through a novel privacy preservation algorithm, which maintains the privacy of the querying users. Simulation results show that the proposed framework achieves a significant improvement over existing approaches in terms of average precision, recall, mean average precision, and execution time.
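
The keywords list Levenshtein distance, but the abstract does not say how it enters the similarity estimation. The following is therefore only a minimal sketch of the standard edit distance and one common way to normalize it into a similarity in [0, 1]; the function names (`levenshtein`, `normalized_similarity`, `closest_keyword`) and the normalization by the longer string's length are assumptions made for illustration, not the authors' algorithm.

```python
def levenshtein(a: str, b: str) -> int:
    """Standard edit distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # iterate the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def normalized_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(a, b) / longest


def closest_keyword(query_term: str, keywords: list[str]) -> str:
    """Hypothetical helper: pick the indexed keyword most similar to a query term."""
    return max(keywords, key=lambda kw: normalized_similarity(query_term, kw))
```

A helper such as `closest_keyword` could, for example, map a misspelled query term to the nearest indexed keyword before ranking; whether the proposed framework uses the distance in this way is not stated in the abstract.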

Issue

Vol. 9 No. 1 (2018): January-February 2018

Section

Articles
