PREDICTIVE MODELING FOR SMOKING STATUS AND LUNG CANCER RISK CLASSIFICATION: A MACHINE LEARNING APPROACH
Abstract
Lung cancer is the deadliest cancer worldwide, responsible for an estimated 1.8 million deaths annually, or nearly one in five cancer-related deaths (18.7%). According to the World Health Organization (WHO), it caused more deaths than any other cancer in 2022, with 2.48 million new cases reported globally. The burden is particularly high in low- and middle-income countries, where healthcare access and early screening programs are limited. Despite advances in treatment, the survival rate remains low, largely because of late-stage diagnosis and continued tobacco consumption. Smoking is the primary risk factor, linked to approximately 85% of all lung cancer cases.
Beyond its health implications, the economic cost of lung cancer is staggering. In 2023, the global cancer drug market was valued at $223 billion, and lung cancer alone contributed significantly to this figure. The lung cancer treatment market reached $17.65 billion in 2023 and is projected to grow at a compound annual growth rate (CAGR) of 14.21%, potentially exceeding $44.17 billion by 2030. These figures reflect not only the direct cost of treatment but also indirect costs such as loss of productivity, caregiver burden, and long-term disability.
Early identification of smoking behaviour is a critical lever in lung cancer prevention and early detection strategies. However, traditional approaches that rely on self-reporting or delayed clinical diagnostics are often inconsistent or inaccessible. There is an urgent need for data-driven tools that can proactively classify individuals by smoking behaviour and estimate their lung cancer risk from routine clinical and demographic data.
This white paper introduces a robust machine learning-based predictive framework that addresses this gap. The proposed model utilizes health features such as age, BMI, blood pressure, cholesterol levels, and behavioural indicators to classify smoking status and stratify lung cancer risk. Developed and tested using publicly available datasets, the model achieved high accuracy and interpretability, making it suitable for integration into digital health platforms and primary care systems.
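As a rough illustration of the kind of pipeline described above, the following minimal sketch trains a classifier on routine health features and buckets its predicted probability into tiers. It rests on several assumptions that do not come from the article: logistic regression stands in for the unnamed model, the data are synthetic with an invented label rule, and the feature names, tier thresholds, and every function name (`make_synthetic`, `train_logistic`, `risk_tier`, and so on) are hypothetical. Only the Python standard library is used.

```python
import math
import random

# Assumed feature names; the article lists these kinds of features but not exact columns.
FEATURES = ["age", "bmi", "systolic_bp", "cholesterol"]


def make_synthetic(n, seed=0):
    """Generate toy records. The label rule is invented so the data has learnable signal."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        row = [rng.uniform(20, 80), rng.uniform(18, 40),
               rng.uniform(95, 180), rng.uniform(140, 300)]
        # Hypothetical rule, not a clinical finding: label depends on two features plus noise.
        score = 0.04 * (row[2] - 137) + 0.02 * (row[3] - 220) + rng.gauss(0, 0.5)
        rows.append(row)
        labels.append(1 if score > 0 else 0)
    return rows, labels


def standardize(rows):
    """Return z-scored rows together with the per-feature means and standard deviations."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) for c, m in zip(cols, means)]
    return apply_scaling(rows, means, stds), means, stds


def apply_scaling(rows, means, stds):
    """Scale rows with statistics computed on the training split."""
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Predicted probability of the positive class (smoker) for one scaled record."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def train_logistic(X, y, lr=0.5, epochs=300):
    """Fit logistic regression by batch gradient descent on the mean log-loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b


def risk_tier(p, low=0.33, high=0.66):
    """Map a probability to a tier. The thresholds are illustrative placeholders."""
    if p < low:
        return "low"
    return "moderate" if p < high else "high"


# Train on the first 300 synthetic records and evaluate on the remaining 100.
rows, labels = make_synthetic(400)
X_train, means, stds = standardize(rows[:300])
weights, bias = train_logistic(X_train, labels[:300])
X_test = apply_scaling(rows[300:], means, stds)
probs = [predict_proba(weights, bias, x) for x in X_test]
accuracy = sum((p >= 0.5) == bool(y) for p, y in zip(probs, labels[300:])) / len(probs)

print("held-out accuracy:", round(accuracy, 3))
print("coefficients:", dict(zip(FEATURES, (round(w, 3) for w in weights))))
print("tier for first test record:", risk_tier(probs[0]))
```

The coefficients on standardized features give a simple view of which inputs drive a prediction, which is one common way to approach the interpretability the abstract mentions. In practice the tiers would be set by calibrating against observed outcomes rather than by fixed cut-offs.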
Article Details
COPYRIGHT
Submission of a manuscript implies that the work described has not been published before; that it is not under consideration for publication elsewhere; and that, if and when the manuscript is accepted for publication, the authors agree to automatic transfer of the copyright to the publisher.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
- The journal allows the author(s) to retain publishing rights without restrictions.
- The journal allows the author(s) to hold the copyright without restrictions.