APPLICATION OF MACHINE LEARNING BASED RANDOM FOREST REGRESSOR IN IMAGE DEHAZING

Abstract: Haze in images, caused by fog or dust particles in the atmosphere, lowers visibility. It distorts colors and can blur the captured images. A machine learning approach has been considered to improve haze removal and to produce higher-quality images from which information can be extracted. In this context, a machine learning-based random forest regressor algorithm, combined with post-processing techniques, is proposed as a superior solution for de-hazing images, generating higher-quality images than other direct de-hazing methods.


I. INTRODUCTION
Haze is a significant problem in the imaging industry because it reduces the visibility of image details [1]. Many haze-removal methods have been developed, and machine learning techniques have lately been introduced as an alternative way to achieve better dehazing.
Machine learning is the process of teaching machines to perform tasks by learning from past experience. Machine learning algorithms are fitted to previous data and are distinctive in that they do not rely on predetermined equations or models. Machine learning has come a long way in the past two decades, enabling achievements that range from self-driving cars to things as simple and essential as efficient web search [2]. It is a crucial factor in the de-hazing of images, as it has reached milestones in image processing and computer vision, including motion and object detection.
Most de-noising and de-hazing algorithms that are not machine learning based aim at restoring all image details degraded by haze [3]. These algorithms de-noise and de-haze images, but they leave a gap: their estimation of atmospheric light is inaccurate. The atmospheric light may be overestimated or underestimated, resulting in weak recovery of scene detail. That is why a machine learning based approach is worth considering for de-hazing: its estimation of atmospheric light is superior, and it incorporates post-processing functions that significantly boost the quality of the results [4]. Machine learning is used in the de-hazing of images because the task involves large quantities of data and many variables, such as noise and fog, that have no specific formula for handling them. Machine learning is divided into two categories, supervised and unsupervised learning.
Supervised learning trains a model on known, labeled data. This data becomes the basis on which the model can reasonably predict outcomes in new situations. Supervised learning models use classification and regression techniques: classification predicts discrete responses, while regression predicts continuously varying responses. Unsupervised learning differs from supervised learning in that it works on unlabeled data and tries to find patterns and structures in it in order to predict outcomes. Prediction is made possible through data clustering, for which several algorithms exist. Clustering finds patterns in data to make reasonable predictions, as is done in practice in object recognition. Choosing among the many available techniques is not easy, since none guarantees success, but for the removal of haze in outdoor images a regression model can generate transmission estimates for hazy scenes. Regression algorithms perform best in the prediction of continuously varying quantities such as the haziness of patches [5]. Predicting the transmission of each patch allows the visibility of the image to be restored, making it easier to extract data.
This study proposes machine learning-based techniques, specifically the random forest regressor algorithm, as a superior solution for de-hazing, with higher accuracy in the estimation of atmospheric light and with guided filtering to achieve better results than other direct de-hazing methods.

II. RESEARCH METHOD
The proposed learning-based haze removal model proceeds in several steps: training first, then haze removal, and finally post-processing to eliminate the remaining haze and produce high-quality images. A random decision forest regression approach is used, which constructs many different decision trees during training and outputs the mean of the individual trees' predictions.
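The averaging step of a regression forest can be illustrated with a minimal NumPy sketch. It is not the implementation used in this study: for brevity each tree is reduced to a depth-1 stump on one randomly chosen feature, and the names fit_stump, fit_forest and predict_forest, the number of trees, and the seed are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_stump(X, y, feature):
    """Fit a depth-1 regression tree on one feature by minimising squared error."""
    best = (np.inf, 0.0, y.mean(), y.mean())
    values = np.unique(X[:, feature])
    # Candidate thresholds: midpoints between consecutive distinct feature values.
    for thr in (values[:-1] + values[1:]) / 2.0:
        left = y[X[:, feature] <= thr]
        right = y[X[:, feature] > thr]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[0]:
            best = (sse, thr, left.mean(), right.mean())
    _, thr, left_mean, right_mean = best
    return feature, thr, left_mean, right_mean

def fit_forest(X, y, n_trees=50, seed=0):
    """Train each stump on a bootstrap sample and a randomly chosen feature."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    forest = []
    for _ in range(n_trees):
        rows = rng.integers(0, n, size=n)   # bootstrap sample (with replacement)
        feature = int(rng.integers(0, d))   # random feature for this tree
        forest.append(fit_stump(X[rows], y[rows], feature))
    return forest

def predict_forest(forest, X):
    """Forest output is the mean of the individual tree predictions."""
    per_tree = np.array([
        np.where(X[:, f] <= thr, lo, hi) for f, thr, lo, hi in forest
    ])
    return per_tree.mean(axis=0)
```

In the dehazing setting, each row of X would hold the haze-relevant features of one patch and y the transmission used to synthesize it; a full implementation would grow deeper trees than the stumps shown here.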

A. Data Training
Training data is first prepared to enable the machine to learn patterns. The data preparation follows from the physical properties of a hazy image, in which the transmission $t(x)$ decays with the scene depth $d(x)$ as $t(x) = e^{-\beta d(x)}$, where $\beta$ is the scattering coefficient of the atmosphere. Pairs of hazy and haze-free outdoor images are synthesized without reference to scene depth, as the image content may be independent of it.
Haze-free sample images are first obtained, and the data is prepared by synthesizing a hazy patch $I$ from a clean, haze-free patch $J$ using set values of the variables involved to avoid inconsistencies. These variables are the atmospheric light ($A = [1, 1, 1]$) and the transmission ($t \in [0, 1]$) [6]:
$$I = tJ + (1 - t)A$$
The data from the synthesis of the hazy patches is input to the learning model with random forest. The data includes the haze-relevant features, which are extracted from the image content by sorting all the values at each image scale, so that each output is tied to its own input. The model can therefore learn the relationship between the image features and the randomly chosen transmission.
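The synthesis of hazy training patches from haze-free ones can be sketched as follows. This is an illustrative sketch only: it assumes RGB patches scaled to [0, 1], white atmospheric light, and a transmission drawn uniformly at random per patch; the function names and the seed are hypothetical.

```python
import numpy as np

def synthesize_hazy_patch(clean_patch, t, A=(1.0, 1.0, 1.0)):
    """Blend a haze-free RGB patch J with atmospheric light A: t*J + (1 - t)*A."""
    J = np.asarray(clean_patch, dtype=float)
    A = np.asarray(A, dtype=float)
    return t * J + (1.0 - t) * A

def make_training_pairs(clean_patches, seed=0):
    """Return hazy patches and the random transmissions used to create them."""
    rng = np.random.default_rng(seed)
    t_values = rng.uniform(0.0, 1.0, size=len(clean_patches))
    hazy = np.stack([
        synthesize_hazy_patch(patch, t) for patch, t in zip(clean_patches, t_values)
    ])
    return hazy, t_values
```

The transmissions returned by make_training_pairs serve as the regression targets, while features computed from the hazy patches serve as the inputs.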

B. Haze removal using random forest regressor
The dehazing begins with an estimation of the atmospheric light $A$. The median of the 0.1% of pixels with the highest dark channel values is used as the estimate. Taking the median over these pixels avoids color distortion and maintains the quality of the images [7]. At test time, a white-balance correction is applied to the input image using the estimated atmospheric light $A$, so that the patches see the same white atmospheric light as in the training procedure. The patch-by-patch transmission estimates are aggregated into a transmission map, and a guided filter with a suitable window size $r$ is used to suppress blocky artifacts, yielding a realistic scene [8]. The dark channel referred to here is the patch-wise, channel-wise minimum $D_r(x; I) = \min_{y \in \Omega_r(x)} \min_{c \in \{r,g,b\}} I^c(y)/A^c$, where $\Omega_r(x)$ is the patch of size $r$ centered at pixel $x$.
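The atmospheric light estimate and the white-balance step can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions: the image is RGB with values in [0, 1], the dark channel is computed on the raw image (since $A$ is not yet known at this stage), and the patch radius and function names are hypothetical choices rather than settings taken from this study.

```python
import numpy as np

def dark_channel(img, radius=1):
    """Minimum over colour channels, then over a (2r+1)x(2r+1) patch around each pixel."""
    min_rgb = img.min(axis=2)
    padded = np.pad(min_rgb, radius, mode="edge")
    h, w = min_rgb.shape
    out = np.empty_like(min_rgb)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + 2 * radius + 1, j:j + 2 * radius + 1].min()
    return out

def estimate_atmospheric_light(img, radius=1, top_fraction=0.001):
    """Median colour of the 0.1% of pixels with the highest dark channel values."""
    dark = dark_channel(img, radius).ravel()
    n_top = max(1, int(round(top_fraction * dark.size)))
    idx = np.argsort(dark)[-n_top:]          # indices of the largest dark-channel values
    candidates = img.reshape(-1, 3)[idx]
    return np.median(candidates, axis=0)

def white_balance(img, A):
    """Divide each channel by the estimated A so the atmospheric light becomes white."""
    return np.clip(img / A, 0.0, 1.0)
```

The explicit double loop keeps the sketch readable; a practical implementation would replace it with a vectorized minimum filter.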

C. Post-processing
After the scene is reconstructed using the white-balance restoration and the guided filter, some patches in the image may appear too dim compared to others, because the radiance of the scene may have been lower in brightness than the atmospheric light. Post-processing is therefore necessary to boost the image quality [9]. The post-processing applies an adaptive atmospheric light, because the assumption that atmospheric light is uniform is not practical: dark image patches become too dark, which lowers the amount of detail that can be extracted after haze removal [10]. The adaptive changes to the atmospheric light are then made in relation to the brightness and radiance of the initial image according to the following equation [11]. The execution of the robust processing equation takes place in two phases: (1) solving for the atmospheric light estimate (A) without the smoothness factor, and (2) applying the guided filter for smoothing. Where the variables are: = 1 − 1, = the smoothening factor.

C. Analysis
There is a great need for machine learning algorithms that provide an automatic processing system and carefully tune the learning parameters. The image results show that the random forest algorithm recovers more detail because of its ability to fine-tune these settings. This is because the estimation of atmospheric light is more accurate with the learning-based algorithm than with direct de-hazing using the color neutral-hazer for Photoshop, so it performs better than other de-hazing methods [12]. However, the random forest model considers only one pixel at a time and is thus subject to noise, which results in color distortion almost similar to that of other de-hazing methods [13]. The use of the guided filter, together with the dark channel feature $D_r(x; I) = \min_{y \in \Omega_r(x)} \min_{c \in \{r,g,b\}} I^c(y)/A^c$, helps to reduce this distortion by smoothing the image and recovering some of the lost details in the pictures.
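The smoothing role of the guided filter can be illustrated with the commonly used box-filter formulation. The sketch below assumes a single-channel guide image and a single-channel transmission map of the same size; the radius r, the regularization eps, and the function names are illustrative assumptions, not settings reported in this study.

```python
import numpy as np

def box_mean(x, r):
    """Mean over a (2r+1)x(2r+1) window, replicating edge values at the borders."""
    k = 2 * r + 1
    p = np.pad(x, r, mode="edge")
    # Integral image with a leading row and column of zeros.
    s = np.zeros((p.shape[0] + 1, p.shape[1] + 1))
    s[1:, 1:] = p.cumsum(axis=0).cumsum(axis=1)
    h, w = x.shape
    total = s[k:k + h, k:k + w] - s[:h, k:k + w] - s[k:k + h, :w] + s[:h, :w]
    return total / (k * k)

def guided_filter(guide, src, r=2, eps=1e-3):
    """Smooth src so that it follows the local linear structure of guide."""
    mean_I = box_mean(guide, r)
    mean_p = box_mean(src, r)
    var_I = box_mean(guide * guide, r) - mean_I ** 2
    cov_Ip = box_mean(guide * src, r) - mean_I * mean_p
    a = cov_Ip / (var_I + eps)     # local linear coefficient
    b = mean_p - a * mean_I        # local offset
    return box_mean(a, r) * guide + box_mean(b, r)
```

For the use described above, src would be the blocky patch-wise transmission map and guide a grayscale version of the hazy image, so that edges in the filtered map follow edges in the image.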

D. Evaluation
The transmission estimated for the test patches by the regression technique tends to center around the 45° line, where estimated and true transmission agree, unlike the estimates of the direct de-hazing method, which usually fall below the 45° line or on it. This means that the regression model estimates transmission better, because it does not rely on a single dark channel value as direct de-hazing does [14]. The dark channel features captured by the regression model are of higher visual quality [15].
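Agreement with the 45° line can be quantified with a simple line fit of estimated against true transmission. The sketch below is an assumption about how such a check might be coded, not the evaluation procedure used in this study; the function name and the chosen summary statistics are hypothetical.

```python
import numpy as np

def compare_to_identity_line(t_true, t_pred):
    """Summarise how closely predictions follow the 45-degree line t_pred = t_true."""
    t_true = np.asarray(t_true, dtype=float)
    t_pred = np.asarray(t_pred, dtype=float)
    # np.polyfit returns coefficients from highest degree down: slope, then intercept.
    slope, intercept = np.polyfit(t_true, t_pred, deg=1)
    rmse = float(np.sqrt(np.mean((t_pred - t_true) ** 2)))
    frac_below = float(np.mean(t_pred < t_true))   # share of points under the line
    return float(slope), float(intercept), rmse, frac_below
```

A slope near 1 with an intercept near 0 indicates estimates centered on the 45° line, whereas a large fraction of points below the line indicates systematic underestimation of transmission.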

IV. CONCLUSION
The machine learning process is quite efficient for dehazing, as it enables faster, high-quality analysis of massive quantities of data, meaning that there is minimal or no loss of information in the images. The learning-based algorithm built on the random forest performs better than the Photoshop plugin that is considered to be state of the art. The algorithm extracts and evaluates the haze-relevant features in images, processed through the dark channel, with an accurate estimation of haze features, ensuring that optimal results are obtained [16]. The algorithm's major problem is the difficulty of obtaining proper training data, and most of the data must first be synthesized before it can be used; otherwise, it is the best of the de-hazing methods considered.

V. FUTURE SCOPE OF THE RESEARCH
De-hazing of images is very important in the restoration of image clarity for ease of data extraction. The quality of images obtained from visual systems such as remote sensing systems and aerial imagery operations is compromised by the presence of media like fog and dust particles. The data collected from these optical systems is otherwise highly reliable, hence the need to establish a stable solution to the problem of poor image clarity.
The proposed machine learning algorithm based on random forest techniques will enhance a system's capability to capture images in adverse weather conditions, owing to the training data it is exposed to. This means that the risk of losing image data is reduced, if not eliminated, and the work needed to de-haze the images is reduced because the haze is removed automatically. The proposed machine learning algorithm is more efficient and faster in repairing the transmission map, and the post-processing with adaptive atmospheric light improves the graphic structure of the images captured by outdoor visual systems.

VI. ACKNOWLEDGMENT
We are thankful to the Faculty of Computing and Information Technology Rabigh (FCITR) of King Abdulaziz University, Jeddah, for supporting this project and providing the necessary platform for its successful completion.