PROBABILISTIC LATENT FEATURE DISCOVERY MODEL AND MULTI-LABEL CONTENT CATEGORIZATION IN E-LEARNING USING R PACKAGES

Arulselvarani Pugazh

Abstract

In an e-learning environment, users have access to a huge collection of online learning materials, which makes finding suitable learning content increasingly difficult. In this paper, the author applies a machine learning approach implemented with R packages and proposes a probabilistic latent feature discovery method for multi-label content categorization in e-learning. The probabilistic latent feature discovery model is a generative model for multi-label e-learning content categorization that improves both accuracy and efficiency. Latent Dirichlet Allocation (LDA) is an effective probabilistic approach to topic modeling: it rests on a formal generative model of a document and provides a viable, efficient algorithm for modeling e-learning text. The author proposes a generative LDA-based model within an information retrieval framework, evaluates it in an e-learning environment, and trains it on the learning documents via Gibbs sampling. The predictive distribution of the fitted LDA model is used to predict new words. Experimental results on multi-label e-learning content categorization demonstrate the accuracy and effectiveness of the proposed approach.
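To make the workflow concrete, the sketch below shows how an LDA topic model can be fitted with collapsed Gibbs sampling in R using the tm and lda packages cited in the references. It is only an illustration under assumptions, not the author's actual code: the toy documents, the number of topics K, and the hyperparameters alpha and eta are placeholder choices.

library(tm)    # text preprocessing (Feinerer)
library(lda)   # collapsed Gibbs sampler and predictive distribution (Chang)

# Toy stand-ins for e-learning documents (placeholders, not the paper's corpus)
docs <- c("linear regression and model fitting in statistics",
          "object oriented programming with classes and inheritance",
          "probability distributions bayesian inference and gibbs sampling")

# Light cleaning with tm, then conversion to the index format lda expects
docs   <- removePunctuation(removeWords(tolower(docs), stopwords("en")))
corpus <- lexicalize(docs)

set.seed(1)
K   <- 3    # number of latent topics (assumed; tuned in practice)
fit <- lda.collapsed.gibbs.sampler(corpus$documents, K, corpus$vocab,
                                   num.iterations = 500,
                                   alpha = 0.1, eta = 0.1)

# Most probable words for each discovered latent topic
top.topic.words(fit$topics, num.words = 5, by.score = TRUE)

# Predictive word distribution for each document under the fitted model,
# the quantity used above to predict new words
pred <- predictive.distribution(fit$document_sums, fit$topics,
                                alpha = 0.1, eta = 0.1)

The per-document topic proportions recovered by the sampler (fit$document_sums, normalised) are the latent features on which a multi-label categorization step of the kind described above could operate.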

Author Biography

Arulselvarani Pugazh, Govt. Arts College, Trichy-22

Ph.D. in Computer Science from Mother Teresa Women's University, Kodaikanal, Tamil Nadu, India.

References

Chong Wang and David M. Blei, “Collaborative Topic Modeling for Recommending Scientific Articles,” Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '11), San Diego, CA, USA, August 21–24, 2011.

Jason D. M. Rennie, “Improving Multi-class Text Classification with Naive Bayes,” Massachusetts Institute of Technology, 2001. http://citeseer.ist.psu.edu/cs

Y. Yang, J. Zhang, and B. Kisiel, “A scalability analysis of classifiers in text categorization,” ACM SIGIR '03, 2003.

Canasai Kruengkrai and Chuleerat Jaruskulchai, “A Parallel Learning Algorithm for Text Classification,” The Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2002), Canada, July 2002.

H. Wallach, “Topic modeling: beyond bag-of-words,” Proceedings of the 23rd International Conference on Machine Learning, 2006.

S. Deerwester, S. Dumais, G. Furnas, T. Landauer, and R. Harshman, “Indexing by latent semantic analysis,” Journal of the American Society for Information Science, 41(6), 391–407, 1990.

T. Hofmann, “Probabilistic Latent Semantic Indexing,” Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 50–57, 1999.

D. Blei, A. Ng, and M. Jordan, “Latent Dirichlet Allocation,” Journal of Machine Learning Research, 3, 993–1022, 2003.

D. Cai, X. Wang, and X. He, “Probabilistic dyadic data analysis with local and global consistency,” Proceedings of the 26th Annual International Conference on Machine Learning, ACM, New York, NY, USA, 2009.

M. Girolami and A. Kaban, “On an equivalence between PLSI and LDA,” Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 433–434, 2003.

J. Ponte and W. Croft, “A Language Modeling Approach to Information Retrieval,” Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 275–281, 1998.

T. Griffiths and M. Steyvers, “Finding scientific topics,” Proceedings of the National Academy of Sciences, 101, 5228–5235, 2004.

J. Lasserre, C. Bishop, and T. Minka, “Principled hybrids of generative and discriminative models,” IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, 2006.

A. Bosch, A. Zisserman, and X. Munoz, “Scene classification using a hybrid generative/discriminative approach,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 30, 712, 2008.

M. Jordan, Z. Ghahramani, T. Jaakkola, and L. Saul, “Introduction to variational methods for graphical models,” Machine Learning, 37, 183–233, 1999.

M. Wainwright and M. Jordan, “Graphical models, exponential families, and variational inference,” Technical Report 649, Department of Statistics, U.C. Berkeley, 2003.

I. Feinerer, “An introduction to text mining in R,” R News, 8(2), 19–22, October 2008. http://CRAN.R-project.org/doc/Rnews/

I. Feinerer, K. Hornik, and D. Meyer, “Text mining infrastructure in R,” Journal of Statistical Software, 25(5), 1–54, March 2008. http://www.jstatsoft.org/v25/i05

I. Feinerer, “tm: Text Mining Package,” R package version 0.5-5, 2011. http://CRAN.R-project.org/package=tm

R Development Core Team, “R: A Language and Environment for Statistical Computing,” R Foundation for Statistical Computing, Vienna, Austria, 2011. ISBN 3-900051-07-0. http://www.R-project.org/

J. Chang, “lda: Collapsed Gibbs Sampling Methods for Topic Models,” R package version 1.2.3, 2010. http://CRAN.R-project.org/package=lda

D. Gruhl, R. Guha, D. Liben-Nowell, and A. Tomkins, “Information diffusion through blogspace,” Proceedings of the International Conference on World Wide Web, 491–501, 2004.

S. Morinaga and K. Yamanishi, “Tracking dynamics of topic trends using a finite mixture model,” Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, USA, 811–816, 2004.

Q. Mei, X. Ling, M. Wondra, H. Su, and C. X. Zhai, “Topic sentiment mixture: Modeling facets and opinions in weblogs,” Proceedings of the International Conference on World Wide Web, 2007.

J. P. Zeng, S. Y. Zhang, C. R. Wu, and X. W. Ji, “Modelling the topic propagation over the internet,” Mathematical and Computer Modelling of Dynamical Systems, 15(1), 83–93, 2009.

G. Salton and M. McGill, “Introduction to Modern Information Retrieval,” McGraw-Hill, New York, 1983.

K. Nigam, A. McCallum, S. Thrun, and T. Mitchell, “Text classification from labeled and unlabeled documents using EM,” Machine Learning, 2/3, 103–134, 2000.

M. Rosen-Zvi, T. Griffiths, M. Steyvers, and P. Smyth, “The author-topic model for authors and documents,” Proceedings of the Conference on Uncertainty in Artificial Intelligence, vol. 20, 487–494, 2004.

W. Li and A. McCallum, “Pachinko allocation: DAG-structured mixture models of topic correlations,” Proceedings of the International Conference on Machine Learning, vol. 23, 577–584, ACM, New York, NY, 2006.

C. Chemudugunta, P. Smyth, and M. Steyvers, “Modeling general and specific aspects of documents with a probabilistic topic model,” Advances in Neural Information Processing Systems 19, 241–248, MIT Press, Cambridge, MA, 2007.

Yanwei Wang, Xiaoqing Ding, and Changsong Liu, “Topic Model Adaptation for Recognition of Homologous Offline Handwritten Chinese Text Image,” IEEE Transactions, vol. 19, no. 12, 2014.

Jia Zeng, W. K. Cheung, and Jiming Liu, “Learning Topic Models by Belief Propagation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 5, 2013.