Diabetes prediction and validation model using ML classification algorithms

Subhrapratim Nath
Indrajit Das
Pradyut Nath
Sumagna Dey
Dyuti Mohapatra


Diabetes is now a global concern that can critically disrupt an individual's normal lifestyle and everyday activities. It arises from a lack of insulin and a high glucose content in the body, and it can affect anyone. Apart from the medical factors, a few additional non-medical factors in an individual's daily life, such as hypertension, heredity, daily activity level, smoking habits and body mass index, might play a part in triggering diabetes. Several medical studies reveal that, for women, the number of pregnancies or any kind of heart issue can also trigger diabetes. This paper aims to identify the most critical factor that contributes to triggering diabetes in an individual by using classification and predictive analysis algorithms. Five well-known machine learning classification algorithms are used. A filtering scheme first retains only the classifiers that reach a 75% accuracy threshold, and the retained classifiers are then verified with the AUROC metric, aiming for a low error rate and high prediction accuracy. Additionally, the model uses ensemble learning to make predictions, and the proposed scheme is validated against the PIMA Indian Diabetes dataset.
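The filtering and verification steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the three models, their scores and the labels are invented placeholders standing in for the paper's five classifiers, the 0.5 decision cut-off is an assumption, and soft voting (averaging scores) is only one possible reading of "ensemble learning", which the abstract does not specify.

```python
from typing import Dict, List, Sequence

ACCURACY_THRESHOLD = 0.75  # the 75% accuracy filter stated in the abstract


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive sample is scored
    above a random negative sample (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def filter_models(
    y_true: Sequence[int],
    model_scores: Dict[str, List[float]],
    threshold: float = ACCURACY_THRESHOLD,
) -> Dict[str, List[float]]:
    """Keep only models whose accuracy at a 0.5 cut-off (assumed) meets the threshold."""
    kept = {}
    for name, scores in model_scores.items():
        preds = [int(s >= 0.5) for s in scores]
        if accuracy(y_true, preds) >= threshold:
            kept[name] = scores
    return kept


def soft_vote(model_scores: Dict[str, List[float]]) -> List[float]:
    """Average the retained models' scores per sample (soft voting, assumed)."""
    n_models = len(model_scores)
    return [sum(col) / n_models for col in zip(*model_scores.values())]


# Hypothetical held-out labels (1 = diabetic) and per-model predicted scores.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = {
    "model_a": [0.9, 0.2, 0.8, 0.6, 0.4, 0.1, 0.7, 0.3],
    "model_b": [0.8, 0.6, 0.7, 0.45, 0.3, 0.2, 0.9, 0.1],
    "model_c": [0.4, 0.7, 0.6, 0.3, 0.8, 0.2, 0.55, 0.45],
}

# Step 1: accuracy filter. Step 2: AUROC check on the survivors.
kept = filter_models(y_true, scores)
aurocs = {name: auroc(y_true, s) for name, s in kept.items()}

# Step 3: combine the surviving models and threshold the averaged score.
ensemble_scores = soft_vote(kept)
ensemble_preds = [int(s >= 0.5) for s in ensemble_scores]

print("kept models:", sorted(kept))
print("AUROC per kept model:", aurocs)
print("ensemble accuracy:", accuracy(y_true, ensemble_preds))
```

In practice the scores would come from classifiers trained on the dataset's features and evaluated on a held-out split. The pairwise AUROC used here is quadratic in the number of samples, which is acceptable for a dataset of this size.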

