Kinjal Sheth, Dr. Harshad Bhadka, Dr. Ashish Jani


Nowadays, Information Retrieval (IR) is becoming an increasingly popular technique due to the tremendous growth of resources on the internet. However, present IR techniques have several limitations, such as the lack of semantic keywords, high time consumption, and vague user queries. To mitigate these issues, this paper proposes a novel IR framework for effective access to data available online. The proposed IR system comprises five major steps. First, the documents shared as resources are pre-processed, and domain analysis is performed to find the category of each document. Second, keywords are extracted through semantic keyword extraction and indexing, and an impact score is estimated to determine the importance of each keyword in each document. Third, document similarity is estimated using a novel similarity estimation algorithm, and the documents are clustered based on the resulting scores. Fourth, the documents are ranked based on the similarity score and the impact score of the query keywords. Finally, users register their personal information under a novel privacy preservation algorithm that protects the privacy of querying users. Simulation results show that the proposed framework achieves significant improvements over existing approaches in terms of average precision, recall, mean average precision, and execution time.
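The abstract does not give formulas for the impact score, the similarity measure, or how the two are combined during ranking. As a minimal sketch only, assuming TF-IDF as a stand-in for the keyword impact score, cosine similarity between the query and each document's keyword weights, and a hypothetical weighting parameter `alpha` that linearly combines them, steps two to four might look like the following. None of these choices is confirmed as the authors' method, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Minimal pre-processing: lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def impact_scores(docs: list[list[str]]) -> list[dict[str, float]]:
    """Hypothetical impact score (assumption): TF-IDF weight of each keyword per document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return scores


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank(query: str, texts: list[str], alpha: float = 0.5) -> list[int]:
    """Return document indices, best first, by alpha * impact + (1 - alpha) * similarity."""
    weights = impact_scores([tokenize(t) for t in texts])
    q_terms = set(tokenize(query))
    q_vec = {t: 1.0 for t in q_terms}  # unweighted query vector (assumption)

    def score(i: int) -> float:
        impact = sum(weights[i].get(t, 0.0) for t in q_terms)
        return alpha * impact + (1 - alpha) * cosine(q_vec, weights[i])

    return sorted(range(len(texts)), key=score, reverse=True)
```

For example, with `texts = ["information retrieval ranks documents", "cooking recipes for pasta", "semantic keyword extraction for retrieval"]`, calling `rank("information retrieval", texts)` places the first document ahead of the third, and the unrelated second document last. The linear combination is only one plausible way to merge the two scores.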


Information Retrieval (IR), semantic keyword extraction, impact score estimation, novel similarity estimation, Levenshtein distance.
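The keywords list Levenshtein distance, but the abstract does not say at which step it is applied. For reference, the standard edit distance counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another; the sketch below uses the usual two-row dynamic program. The `closest_term` helper, which matches a possibly misspelled query word against a vocabulary, is a hypothetical illustration of one place such a measure might fit, not a description of the authors' system.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def closest_term(word: str, vocabulary: list[str]) -> str:
    """Hypothetical helper: vocabulary term with the smallest edit distance to word."""
    return min(vocabulary, key=lambda v: levenshtein(word, v))
```

For example, `levenshtein("kitten", "sitting")` is 3, and `closest_term("retreival", ["retrieval", "relevance", "ranking"])` returns "retrieval", which is two substitutions away.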


Copyright (c) 2018 International Journal of Advanced Research in Computer Science