PREDICTIVE MODELING FOR SMOKING STATUS AND LUNG CANCER RISK CLASSIFICATION: A MACHINE LEARNING APPROACH

Nagarjuna Pasupuleti

Abstract

Lung cancer is the leading cause of cancer death worldwide, responsible for an estimated 1.8 million deaths annually, or nearly one in five cancer-related deaths (18.7%). According to the World Health Organization (WHO), it caused more deaths than any other cancer in 2022, a year in which 2.48 million new cases were reported globally. The burden is particularly high in low- and middle-income countries, where healthcare access and early screening programs are limited. Despite advances in treatment, survival rates remain low, largely because of late-stage diagnosis and continued tobacco use. Smoking is the primary risk factor, linked to approximately 85% of all lung cancer cases.


Beyond its health implications, lung cancer carries a heavy economic cost. In 2023, the global cancer drug market was valued at $223 billion, and lung cancer accounted for a significant share of that figure. The lung cancer treatment market reached $17.65 billion in 2023 and is projected to grow at a compound annual growth rate (CAGR) of 14.21%, potentially exceeding $44.17 billion by 2030. These market figures capture the direct cost of treatment; indirect costs such as lost productivity, caregiver burden, and long-term disability add further to the overall burden.
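The market projection above rests on compound growth arithmetic, which can be checked with a short sketch. It assumes 2023 is the base year and that growth compounds once per year through 2030 (seven periods); neither detail is stated in the abstract, and the function names are illustrative only.

```python
def project_value(start_value, annual_rate, years):
    """Compound a starting value forward at a fixed annual growth rate."""
    return start_value * (1 + annual_rate) ** years


def implied_cagr(start_value, end_value, years):
    """Recover the annual growth rate implied by a start and end value."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the abstract (billions of US dollars).
market_2023 = 17.65
cagr = 0.1421

# Assumption: seven annual compounding periods, 2023 through 2030.
projected_2030 = project_value(market_2023, cagr, 2030 - 2023)
print(f"Projected 2030 market: ${projected_2030:.2f} billion")
```

Under these assumptions the projection lands a little under $45 billion, which is consistent with the abstract's statement that the market may exceed $44.17 billion by 2030.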


Early identification of smoking behaviour is a critical lever in lung cancer prevention and early detection strategies. However, traditional approaches—relying on self-reporting or delayed clinical diagnostics—are often inconsistent or inaccessible. There is an urgent need for data-driven tools that can proactively classify individuals based on their smoking behaviour and estimate their risk for lung cancer using routine clinical and demographic data.


This white paper introduces a robust machine learning-based predictive framework that addresses this gap. The proposed model utilizes health features such as age, BMI, blood pressure, cholesterol levels, and behavioural indicators to classify smoking status and stratify lung cancer risk. Developed and tested using publicly available datasets, the model achieved high accuracy and interpretability, making it suitable for integration into digital health platforms and primary care systems.
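The abstract does not specify the algorithm, the feature encoding, or the datasets, so the following is only a minimal sketch of what a smoking-status classifier over routine health features might look like. It substitutes plain logistic regression, trained by gradient descent, for whatever model the paper actually uses, and it runs on deliberately artificial records generated in the code. The feature list, the value ranges, and every function name are assumptions made for illustration; nothing here reproduces the paper's data or results.

```python
import math

# Hypothetical feature order; the abstract names these inputs but not their encoding.
FEATURES = ["age", "bmi", "systolic_bp", "cholesterol"]


def make_synthetic_records(n=40):
    """Build obviously artificial records (label 1 = smoker, 0 = non-smoker)."""
    rows, labels = [], []
    for i in range(n):
        smoker = i % 2
        rows.append([
            30 + (i % 10) * 3,                # age (years)
            22 + (i % 5),                     # BMI
            115 + 15 * smoker + (i % 7),      # systolic blood pressure
            180 + 30 * smoker + (i % 9) * 2,  # total cholesterol
        ])
        labels.append(smoker)
    return rows, labels


def fit_scaler(rows):
    """Return per-feature means and standard deviations."""
    columns = list(zip(*rows))
    means = [sum(c) / len(c) for c in columns]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
        for c, m in zip(columns, means)
    ]
    return means, stds


def scale(rows, means, stds):
    """Standardize each feature using previously fitted statistics."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def predict_proba(weights, bias, x):
    """Logistic model: estimated probability that a record is a smoker."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, epochs=500, learning_rate=0.5):
    """Fit logistic regression with batch gradient descent."""
    weights, bias = [0.0] * len(rows[0]), 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(rows, labels):
            error = predict_proba(weights, bias, x) - y
            grad_w = [g + error * v for g, v in zip(grad_w, x)]
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Hold out the last ten synthetic records for evaluation.
rows, labels = make_synthetic_records()
train_x, test_x = rows[:30], rows[30:]
train_y, test_y = labels[:30], labels[30:]

means, stds = fit_scaler(train_x)
weights, bias = train_logistic(scale(train_x, means, stds), train_y)

predictions = [
    int(predict_proba(weights, bias, x) >= 0.5)
    for x in scale(test_x, means, stds)
]
test_accuracy = sum(p == y for p, y in zip(predictions, test_y)) / len(test_y)
print(f"held-out accuracy on synthetic data: {test_accuracy:.2f}")
```

A linear model is used in this sketch because its per-feature weights can be read directly, which is one simple route to the interpretability the abstract mentions. The accuracy printed here reflects only how cleanly the synthetic classes were separated by construction and says nothing about performance on real clinical data.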
