LIVER CANCER PREDICTION FOR TYPE-II DIABETES USING CLASSIFICATION ALGORITHM

S. Agilan
J. Kumaran Kumar

Abstract

In recent years, type II diabetes accompanied by liver cancer has become a serious condition that threatens human physical and mental health, and medical researchers and practitioners need efficient predictive modelling. This work develops a prediction model, using data mining techniques, for liver cancer occurring in type II diabetes patients within 6 years of diagnosis. Data were collected from the NHIRD (National Health Insurance Research Database) by selecting patients who were newly diagnosed with type II diabetes. A total of 2060 cases were identified and assigned to either a case group (patients diagnosed with liver cancer) or a control group (patients not diagnosed with liver cancer). We propose a liver cancer prediction model for type II diabetes patients based on random forest, which analyses readily available indicators (age, liver diseases, alcoholic fatty liver diseases, hyperlipidaemia, etc.). These indicators were used to identify risk factors, and a chi-square test was then conducted on each independent variable to differentiate between patients with liver cancer and patients without liver cancer. The dataset was randomly divided into a training group and a testing group. The training group contains 70% of the dataset (1442 cases) and was used to build the prediction model. The remaining 30% of the dataset was assigned to the testing group for model validation. The random forest algorithm trains multiple decision trees on the samples and integrates the weight of each tree to obtain the final result. The validation results show that the random forest algorithm can greatly reduce the modelling error of a single decision tree and can effectively predict the impact of these readily available indicators on the risk of liver cancer in diabetes patients. Additionally, the random forest model achieves better prediction accuracy than the Artificial Neural Network (ANN), AdaBoost, and Logistic Regression algorithms.
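
The 70/30 split and the per-variable chi-square test described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `split_70_30` and `chi_square_2x2`, the placeholder patient IDs, and the 2x2 counts are all hypothetical and chosen only for illustration. The chi-square statistic is the plain Pearson form without continuity correction, which the abstract does not specify.

```python
import random


def split_70_30(records, seed=0):
    """Shuffle the records and return (training, testing) in a 70/30 split."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 7 // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for the table
    [[a, b], [c, d]]: rows = indicator present/absent, columns = case/control."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Placeholder IDs standing in for the 2060 patients (not real NHIRD records).
patient_ids = range(2060)
train, test = split_70_30(patient_ids)
print(len(train), len(test))  # 1442 618

# Made-up counts for one binary indicator, purely to exercise the formula.
stat = chi_square_2x2(30, 10, 20, 40)
CRITICAL_95_DF1 = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05
print(round(stat, 2), stat > CRITICAL_95_DF1)  # 16.67 True
```

With 2060 cases, the split reproduces the 1442 training cases reported in the abstract and leaves 618 cases for testing. An indicator whose statistic exceeds the critical value would be treated as differing significantly between the case and control groups.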
