EFFECT OF TEXTURE FEATURE COMBINATION ON SATELLITE IMAGE CLASSIFICATION

Abstract: This paper presents a satellite image classification (SIC) method evaluated on PatternNet, a satellite image dataset collected from Google Earth and the Google Maps API and released in 2017. The method first applies a pre-processing step that determines how the gray level of each sample image should be represented, and it uses several feature descriptors to address the problem of descriptor selection for SIC. Several texture feature extraction methods are considered, and each is tested with a Support Vector Machine (SVM) classifier to verify its ability to extract discriminative features. A feature selection method then removes the less informative features of each method so that only the more relevant ones are kept. Finally, the classifier decisions obtained from the individual feature extraction methods are fused by feature combination methods, which improves the final decision of the SIC method.


I. INTRODUCTION
Remote Sensing (RS) is the art and science of acquiring information about a specific area, object, or phenomenon by analysing data obtained from a device that is not in contact with that area, object, or phenomenon. The human eye acts like a sensor that responds to the light reflected from an object and pictures the object according to the received light intensities [1]. RS in the classical sense is not a scientific discipline; it is rather a wide variety of diagnostic methods that use electromagnetic waves. Where electromagnetic methods fail, sound waves or other elastic waves are used. RS therefore requires many different techniques and skills. Its applications cover a wide range of disciplines, e.g. botany, archaeology, geology, meteorology, and security [2].
Several methods and techniques are used with RS images. The choice of classification method depends on many considerations, such as the information gathered from different sensors, whether the sample labels are known, the nature of the training samples, the information carried by each pixel, and the number of outputs for each spatial data element [3]. Different methods and algorithms are used for RS image classification, such as maximum likelihood, minimum distance, Artificial Neural Network (ANN), decision tree, linear discriminant analysis, and Support Vector Machine (SVM), while the search for the optimal method for different applications is still ongoing [4]. Feature selection methods are used to minimize the number of features [5].

II. LITERATURE REVIEW
Considerable attention has been given to satellite image classification, and numerous methods have been developed to obtain more efficient techniques that serve applications in the field. The most significant works are reviewed below.
Thirteen different descriptors based on color, texture, and structure were used to extract image features from Very High Resolution (VHR) satellite images retrieved from Google Earth. The dataset has nineteen classes, such as airport, beach, bridge, football field, and river, with fifty samples per class used for training and testing the classifier. The feature extraction methods include Scale-Invariant Feature Transform (SIFT), Geometric Blur (GB), GIST, Self-Similarity (SSIM), Local Binary Pattern (LBP), Maximum Response (MR) filter bank, Leung-Malik (LM) filter bank, Gabor filter bank, Color Histogram (Colorhist), Hue, Opponent (OPP), and Color Pool. The Support Vector Machine (SVM) and K-Nearest Neighbor (KNN) classifiers were used for the classification step. Each feature method was first fed to the classifier on its own, and then all features were combined to obtain their joint score. The results show that the highest accuracy was achieved with the SIFT feature extraction method and the SVM classifier, which outperformed the KNN classifier [6].
A new classification system was established to classify multispectral images. The proposed system used a multi-feature method based on pixel-oriented and region-oriented methods, and a fusion system then made the decision based only on the good features. The test included data segmentation, feature extraction, and SVM-based classification. The results showed reliable classification of multispectral data [7].
An ANN was used to classify satellite images provided by Landsat-8. The image is first encoded by dividing it into equal-size blocks, which are then quantized to estimate the code of each block. The features of each block are extracted according to the probability of that block. These features are used to train the ANN, and the trained network is then applied to new image samples to validate the ANN classifier [8].

III. PROPOSED SATELLITE IMAGE CLASSIFICATION
The proposed Satellite Image Classification (SIC) method includes four major steps. The pre-processing step loads several image samples from a class chosen at random from the satellite image dataset, and then converts each sample from the RGB color bands into a grayscale image. Three feature extraction methods are adopted: the Rotated Local Binary Pattern (RLBP) with different radii, the Gray Level Co-occurrence Matrix (GLCM) applied to the gray and RGB bands with different block sizes, and the Edge Histogram Descriptor (EHD). These methods are first employed individually and then combined with each other to test the effect of the combination on classification performance. The descriptors obtained from each method are stored as comparable feature vectors in the database. Scatter analysis is then used as a feature selection method to test each feature vector and remove the unusable descriptors from the database. The SVM classifier decides the class of each image sample by comparing the feature vector of the test sample with those found in the database. Finally, the feature combination method combines the classification results of the individual feature extraction methods to improve the classification result. As shown in Figure (

A. Pre-Processing Step
The first pre-processing step converts the RGB color image to a grayscale image. Two conversion methods are proposed, given in Equations (1) and (2), and each has advantages and disadvantages depending on the feature extraction method that consumes the gray image. The second pre-processing step is image enhancement, which increases the contrast of the image by applying a linear fitting model to each pixel so that the occupied gray scale is stretched to the full contrast range and the pixel values span 0-255, as shown in Equation (3).
Here Max is the maximum gray pixel value, Min is the minimum gray pixel value, G_I is the gray image pixel value, and G_g is the enhanced gray image pixel value.
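The two pre-processing operations can be illustrated with a minimal Python sketch (the authors report a C# implementation, so this is not their code). Equations (1) to (3) are not reproduced in the text, so two assumptions are made: the weighted conversion uses the common BT.601 luma weights, and the enhancement is the usual min-max stretch G_g = 255 (G_I - Min) / (Max - Min). The function names are hypothetical.

```python
def to_gray_weighted(rgb, weights=(0.299, 0.587, 0.114)):
    """Convert rows of (R, G, B) tuples to gray levels with a weighted sum.

    The default weights are the common BT.601 luma coefficients. They are an
    assumption, because Equation (1) is not reproduced in the text.
    """
    wr, wg, wb = weights
    return [[wr * r + wg * g + wb * b for (r, g, b) in row] for row in rgb]


def stretch_contrast(gray):
    """Linearly map gray levels from [Min, Max] to the full range [0, 255]."""
    flat = [v for row in gray for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A flat image has no contrast to stretch; return all zeros.
        return [[0 for _ in row] for row in gray]
    scale = 255.0 / (hi - lo)
    return [[int(round((v - lo) * scale)) for v in row] for row in gray]
```

With this form, the darkest pixel of any non-flat image maps to 0 and the brightest to 255, which is the "full contrast" situation described above.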

B. Feature Extraction Step
Texture features are a type of low-level feature. They are not as common as color features, but they are used in many pattern recognition applications and give good information, especially in image classification systems. Such features describe the content of real-world images, including trees, clouds, airplanes, and others. Three types of texture features are used: GLCM, EHD, and RLBP.


The GLCM method first quantizes the image into a small number of gray levels, which removes noisy pixels that may be present in the sample image. The most dominant colors then remain in the quantized image and can be used to represent the objects clearly. The image is partitioned into a number of blocks of predetermined size. The frequency of the quantized values in each block is used to compute the probability of occurrence of each pixel value in that block, and these probabilities are substituted into Equations (4-16) to compute the textural features of the current image block. For a fuller description, the computation is carried out in the four orientations 0°, 45°, 90°, and 135°. As a result, 13 descriptive features are obtained for each image block.
… (4)
… (5)
where ω is the angular second moment, P_ij is the co-occurrence value at position (i, j), C is the contrast, and µ_i, µ_j and σ_i, σ_j are the means and standard deviations of the rows and columns, respectively.
… (7)
where σ² is the variance and µ is the mean of µ_i and µ_j.
… (8)
… (9)
where Equation (8) gives the inverse difference moment, α is the sum average, and P_x+y(i) is the probability of the co-occurrence matrix entries whose coordinates x and y sum to i.
… (10) … (11) … (12) … (13) … (14) … (15) … (16)
where S_v is the sum variance, S_e is the sum entropy, E is the entropy, D_v is the difference variance (the variance of P_x-y), D_e is the difference entropy, and A and B are the information measures of correlation.
Here H_X and H_Y are the entropies of P_x and P_y, which are given as follows:

EHD computes texture features that describe the edges forming the object shapes in the image sample. A filter technique is used to extract the most dominant edge in an image block. EHD is applied to the gray image, which is partitioned into 16 equal sub-images. For each sub-image, the five edge filters of the EHD are computed over sub-blocks of size 2×2, as shown in Figure (2). The maximum of the five filter responses is compared with a predetermined threshold value. If it is greater than the threshold, the histogram bin of the corresponding edge type is increased by one; otherwise the sub-block is considered to contain no edge. As a result, a histogram of 80 bins (five for each of the 16 sub-images) is obtained, in which each bin counts the edges of one type in one sub-image, as shown in Table (1). The five bins of each sub-image are then normalized separately by the total number of sub-blocks in that sub-image.

… (20)
where DD is the maximum neighbor, N is the number of neighbors, I_p is one of the neighbor pixels for p = 0, 1, 2, …, N, and I_c is the center pixel.
DD, the maximum value among the neighbors around each pixel, is determined according to the number of neighbors and the length of the radius. The threshold is found in the same way as for the standard LBP; the only difference is the length of the radius, which must be taken into account. The computation therefore starts from the DD pixel and continues through the pixels on the circle around the center pixel.
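The idea of starting the binary weights at the maximum neighbor can be sketched in Python as follows. Equation (20) is not reproduced in the text, so this is an illustrative reading rather than the authors' exact formulation: neighbors are sampled at the nearest pixel on the circle (no interpolation), a neighbor contributes a 1 when it is not smaller than the center, and ties for the maximum are broken by taking the first one. The function names are hypothetical.

```python
import math


def circle_offsets(radius, neighbors=8):
    """Integer (dy, dx) offsets of the neighbours on a circle, counter-clockwise from east."""
    offsets = []
    for p in range(neighbors):
        angle = 2.0 * math.pi * p / neighbors
        dy = int(round(-radius * math.sin(angle)))
        dx = int(round(radius * math.cos(angle)))
        offsets.append((dy, dx))
    return offsets


def rlbp_code(image, y, x, radius=1, neighbors=8):
    """Rotated LBP code of pixel (y, x); the caller keeps (y, x) away from the border."""
    center = image[y][x]
    values = [image[y + dy][x + dx] for dy, dx in circle_offsets(radius, neighbors)]
    # Dominant direction: index of the largest neighbour (first one on ties).
    start = max(range(neighbors), key=lambda p: values[p])
    code = 0
    for p in range(neighbors):
        if values[p] >= center:
            # Weights are assigned relative to the dominant direction.
            code += 1 << ((p - start) % neighbors)
    return code
```

Because the weights are anchored to the dominant direction, rotating a patch by a multiple of 45 degrees (for eight neighbors) leaves the code unchanged as long as the maximum neighbor is unique; a plain LBP code would change under the same rotation.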

C. Feature Selection Step
Feature selection is important in any classification methodology. Such methods try to select the most relevant and effective features from the set of extracted features and to remove, as far as possible, the inefficient ones. Of the several available methods, the proposed SIC method adopts scatter analysis. It makes use of the inter scatter, since this detects the features that can change the right decision of the classifier. After the feature vector V is computed by each feature extraction method, it is passed to the scatter analysis, and the inter scatter I_a of the vector is calculated through Equation (21). The inter scatter values are then sorted and analysed to detect ineffective features; the features with the highest inter scatter values are removed because they distort the accuracy result.

… (21)
In the intra scatter equation, i indexes the classes and j indexes the features. In the inter scatter equation, N is the number of classes, σ_i,j is the standard deviation of the j-th feature within the i-th class, and μ_i,j is the mean of the j-th feature within the i-th class.
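The selection procedure can be sketched in Python under a stated assumption. Equation (21) is not reproduced in the text, so the score below is a stand-in, not the authors' formula: for each feature it divides the average within-class standard deviation σ_i,j by the spread of the class means μ_i,j, so a high score marks a feature on which the classes overlap. Only the surrounding procedure (compute per-class statistics, sort the scores, drop the highest ones) follows the description above. All function names are hypothetical.

```python
import math


def class_stats(samples_by_class):
    """Per-class mean and standard deviation of every feature.

    samples_by_class: one list of feature vectors per class.
    Returns (means, stds), each indexed as [class][feature].
    """
    means, stds = [], []
    for vectors in samples_by_class:
        n, dims = len(vectors), len(vectors[0])
        mu = [sum(v[j] for v in vectors) / n for j in range(dims)]
        sd = [math.sqrt(sum((v[j] - mu[j]) ** 2 for v in vectors) / n)
              for j in range(dims)]
        means.append(mu)
        stds.append(sd)
    return means, stds


def overlap_score(means, stds, eps=1e-12):
    """Assumed stand-in for Equation (21): within-class spread / between-class spread."""
    n_classes, dims = len(means), len(means[0])
    scores = []
    for j in range(dims):
        within = sum(stds[i][j] for i in range(n_classes)) / n_classes
        centre = sum(means[i][j] for i in range(n_classes)) / n_classes
        between = math.sqrt(
            sum((means[i][j] - centre) ** 2 for i in range(n_classes)) / n_classes)
        scores.append(within / (between + eps))
    return scores


def pass_features(scores, n_drop):
    """Indices of the features kept after dropping the n_drop highest scores."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    dropped = set(ranked[:n_drop])
    return [j for j in range(len(scores)) if j not in dropped]
```

In use, `n_drop` would be increased run by run and the classifier re-evaluated each time, keeping the feature list that gives the best score.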

D. Classification Step
Several conditions must be considered when selecting the classifier in SIC systems: whether classification is per pixel or per subpixel, whether the fusion is multi-sensor or multi-resolution, the type of data to be classified, the spectral resolution, the spatial resolution, and others. The classifiers most commonly used in recent satellite image classification are Support Vector Machine (SVM), Artificial Neural Network (ANN), Maximum Likelihood (MML), fuzzy set classifiers, and decision tree classifiers. Each of these methods has its advantages and disadvantages.
SVM is a supervised machine learning method originally proposed to solve the binary (two-class) classification problem and later extended to multiclass classification. In the multiclass case, two approaches are used: one-versus-rest and pairwise classification. In the first approach, the sample is assigned to the class with the highest probability. In the second approach, a comparison is made between every pair of classes, and the sample is assigned to the class that receives the most votes.
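The two multiclass decision schemes can be illustrated independently of any SVM library with a short Python sketch. The binary classifiers are represented by a hypothetical `decide(a, b)` callback that returns the winning class of one pairwise comparison; the scores in the usage lines are made-up illustrative values, not results from the paper.

```python
from collections import Counter
from itertools import combinations


def one_vs_rest(scores):
    """scores: dict mapping class name to its score; return the top-scoring class."""
    return max(scores, key=scores.get)


def pairwise_vote(classes, decide):
    """Run every pairwise comparison and return the class with the most votes.

    decide(a, b) stands in for a trained binary classifier and returns a or b.
    """
    votes = Counter(decide(a, b) for a, b in combinations(classes, 2))
    return votes.most_common(1)[0][0]


# Illustrative values only: a toy "classifier" that prefers the higher score.
toy_scores = {"airplane": 0.2, "beach": 0.9, "river": 0.5}
winner = pairwise_vote(
    list(toy_scores),
    lambda a, b: a if toy_scores[a] >= toy_scores[b] else b,
)
```

For K classes the pairwise scheme needs K(K-1)/2 binary classifiers, while one-versus-rest needs K, which is one practical difference between the two approaches.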

E. Features Combination
Different feature methods are combined by first obtaining the classifier probabilities from testing each method separately with the training set, then weighting and combining them into a single combined vector that gives a more accurate classification result. The same classifier can be used for all feature methods, or a different classifier can be used for each of them [41].
Feature combination is an important step for improving classification accuracy. Combining several types of features provides more information about the object, which improves the categorization. The ensemble approach is used in the present work with two combination rules: the mean rule and the product rule. The mean rule takes the probability outputs of the SVM classifier for each test sample under the different feature methods, averages the probabilities of each class, and assigns the sample to the class with the maximum mean probability. The number of correctly classified samples is then counted to obtain the final classification accuracy.
The product rule follows the same steps as the mean rule; the only difference is that the product of the probability outputs from the different feature methods is taken instead of their mean.
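The mean and product rules can be sketched in a few lines of Python. The probability vectors in the usage lines are made-up illustrative values (one vector per feature method, same class order), not outputs from the paper's classifiers, and the function names are hypothetical.

```python
def fuse(prob_sets, rule="mean"):
    """Fuse per-method class probabilities and return the winning class index.

    prob_sets: list of probability lists, one per feature method.
    """
    n_classes = len(prob_sets[0])
    fused = []
    for c in range(n_classes):
        if rule == "mean":
            fused.append(sum(p[c] for p in prob_sets) / len(prob_sets))
        elif rule == "product":
            value = 1.0
            for p in prob_sets:
                value *= p[c]
            fused.append(value)
        else:
            raise ValueError("rule must be 'mean' or 'product'")
    return max(range(n_classes), key=lambda c: fused[c])


def accuracy(predictions, labels):
    """Fraction of samples whose predicted class matches the true label."""
    correct = sum(1 for p, t in zip(predictions, labels) if p == t)
    return correct / len(labels)


# Illustrative probabilities for one sample from three feature methods.
glcm, rlbp, ehd = [0.0, 0.4, 0.6], [0.9, 0.1, 0.0], [0.5, 0.3, 0.2]
by_mean = fuse([glcm, rlbp, ehd], rule="mean")
by_product = fuse([glcm, rlbp, ehd], rule="product")
```

The two rules can disagree: the product rule vetoes any class that one method considers impossible (probability near zero), whereas the mean rule lets a single confident method dominate.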

IV. RESULTS AND DISCUSSIONS
The development of satellite imagery has made it easy to obtain useful information about land cover, and this information can be used in many fields such as security, geology, military operations, and meteorology. The performance of the classification system is therefore examined through analysis of satellite imagery using training and testing steps. In the present work, the pre-processing step converts the input images to gray levels in a way that depends on the feature extraction method. Several feature extraction methods are used to describe the information in the different classes of satellite imagery, and the features are extracted from all images. These features are passed to the feature selection method, which removes the features that are not useful as far as possible. The selected features are then divided into one part used to train the classifier and another part used to test the performance of the extracted features. The post-processing step takes the classifier results of each feature extraction method, analyses their performance, and uses them in the combination method to improve the classification result. The following paragraphs explain each of these steps in detail, present the results in tables and figures, and evaluate the proposed satellite image classification method qualitatively and quantitatively. The proposed methods were implemented in the C# programming language and executed under the 64-bit Windows 10 operating system.

A. Remote Sensing Dataset
Several benchmark datasets have been proposed to evaluate the performance of satellite image applications and to verify that such systems are built in an effective and efficient way. Some of these datasets are UC Merced, WHU-RS19, RSSCN7, the Aerial Image Dataset, and NWPU-RESISC45. Each benchmark dataset poses several challenges that must be taken into consideration when choosing the steps of the application, such as whether the images are land use or land cover images, the spatial resolution, the spectral resolution, the number of classes, and the number of samples per class. All of these must be considered when selecting the feature extraction methods, the classifier, and the other steps of the classification system [10]. Different datasets can be used in different applications and to evaluate the proposed work [11]. The proposed SIC method uses the PatternNet dataset.
The PatternNet dataset was proposed in 2017. It consists of a large number of high-resolution remote sensing images collected for remote sensing image retrieval (RSIR) research. The images were collected from the Google Maps API (GMA) and Google Earth Imagery (GEI) for cities in the United States (US). The dataset contains 38 classes, such as airplane, baseball field, beach, football field, intersection, oil gas field, and swimming pool, with 800 samples per class. Each sample image is 256×256 pixels. The spatial resolution varies from one sample to another, from about 0.062 m at the finest to about 4.693 m at the coarsest; Table (2) gives a numerical description of the spatial resolution of the PatternNet dataset. The dataset varies in spectral and spatial resolution, and the objects in each sample may differ in scale, rotation, illumination, and other conditions, which poses real challenges for the classification method. The classification method and the employed descriptors should therefore be chosen carefully. The first 30 classes of the PatternNet dataset, with 40 samples per class, were adopted for testing the performance of the proposed method: 20 samples were used for training the SVM classifier, and the other 20 samples were used for testing and evaluating the proposed method.
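The sampling scheme (40 of the 800 samples per class, half for training and half for testing) can be sketched as below. This is a minimal illustration in Python; the function name, the use of integer sample identifiers, and the fixed seed are assumptions made for reproducibility of the example, not details reported in the paper.

```python
import random


def split_class(sample_ids, n_train=20, n_test=20, seed=0):
    """Randomly pick n_train + n_test samples of one class and split them.

    Returns two disjoint lists: training identifiers and test identifiers.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sample_ids, n_train + n_test)
    return chosen[:n_train], chosen[n_train:]


# One class of 800 samples, identified here by hypothetical indices 0..799.
train_ids, test_ids = split_class(list(range(800)))
```

Repeating the split with different seeds gives the kind of repeated random runs over which average classification scores can be reported.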

B. Pre-Processing Results
The pre-processing step is applied first to convert the input sample image from the RGB color space (24 bit) to a grayscale image (8 bit). This conversion suits machine learning analysis, decreases the computation time, and affects the final classification accuracy. Two algorithms were used in this step, and each was tested with the feature extraction methods to verify its ability to yield useful descriptors from the gray image. Equation (1) takes a weighted sum of the three color bands (RGB); this algorithm gives the brighter result because of the weights assigned to the input wavelengths, as shown in Figure (4.b).
Equation (2) uses the relation between the luminance and chrominance of the input image. This algorithm gives sharper object edges, reduces the contrast and the shadows, and retains the luminance and chrominance properties of the color image. The resulting grayscale image is darker than that of the first algorithm, as shown in Figure (4.c). Figure (5) shows the results of the two algorithms on a sample image from the PatternNet dataset.

C. Feature Extraction Result
Feature extraction is an important stage because it prepares the image information for analysis and yields useful features that serve as descriptors in the classification stage. This stage uses several types of descriptors to extract effective features from the dataset. The results of the employed texture features are presented below.

GLCM Results
GLCM is applied to the four image bands, and the best maximum quantization level was found to be 80. Each quantized pixel is paired with the four neighbors that lie along the four considered orientations 0°, 45°, 90°, and 135°. This is carried out for five values of the distance between the quantized pixel and its neighbors: 1, 2, 3, 4, and 5 pixels. The normalized values of the 13 GLCM features are extracted from the four color bands (R, G, B, and gray) for two distances between the center pixel and its neighbors, which gives 13 × 4 × 2 = 104 features for each sample image.
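The co-occurrence computation can be sketched in Python as follows. Since Equations (4-16) are not reproduced in the text, the three features below use the standard Haralick definitions of angular second moment, contrast, and entropy; they are shown as representative examples of the 13 features, and the non-symmetric counting and the function names are choices made for this sketch rather than details from the paper.

```python
import math

# (dy, dx) unit steps for the four orientations 0, 45, 90 and 135 degrees.
ORIENTATIONS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def quantize(image, levels=80, max_value=255):
    """Map gray values in [0, max_value] to integer levels in [0, levels - 1]."""
    return [[min(v * levels // (max_value + 1), levels - 1) for v in row]
            for row in image]


def glcm(q, levels, distance=1, angle=0):
    """Normalised (non-symmetric) co-occurrence matrix of a quantised image."""
    dy, dx = ORIENTATIONS[angle]
    dy, dx = dy * distance, dx * distance
    h, w = len(q), len(q[0])
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                counts[q[y][x]][q[ny][nx]] += 1
                total += 1
    if total == 0:
        return [[0.0] * levels for _ in range(levels)]
    return [[c / total for c in row] for row in counts]


def angular_second_moment(p):
    return sum(v * v for row in p for v in row)


def contrast(p):
    return sum((i - j) ** 2 * v for i, row in enumerate(p) for j, v in enumerate(row))


def entropy(p):
    return -sum(v * math.log(v) for row in p for v in row if v > 0)
```

Calling `glcm` for each band, each chosen distance, and each orientation, and then collecting the feature values, reproduces the structure of the feature vector described above; how the four orientations are merged into one value per feature is not specified in the text.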

EHD Results
EHD is applied to extract edge texture features, which yields a histogram of 80 bins for each image sample. This histogram describes the edges detected in the image sample according to a specific threshold. The proposed SIC method uses the normalized feature vector of the EHD histogram computed on the gray band with a threshold value of 12.
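An 80-bin edge histogram of this kind can be sketched in Python as below. The filter coefficients are those of the standard MPEG-7 edge histogram descriptor and are assumed here, because the filters of Figure (2) are not reproduced in the text. For simplicity each 2×2 group of pixels is treated as one sub-block, whereas the authors' sub-block handling may differ; the function name is hypothetical.

```python
import math

R2 = math.sqrt(2.0)
# Coefficients for a 2x2 cell in the order
# (top-left, top-right, bottom-left, bottom-right); assumed MPEG-7 values.
FILTERS = [
    (1, -1, 1, -1),    # vertical edge
    (1, 1, -1, -1),    # horizontal edge
    (R2, 0, 0, -R2),   # 45-degree edge
    (0, R2, -R2, 0),   # 135-degree edge
    (2, -2, -2, 2),    # non-directional edge
]


def edge_histogram(gray, threshold=12, grid=4):
    """Return grid*grid*5 normalised bins (80 for the default 4x4 grid)."""
    h, w = len(gray), len(gray[0])
    sub_h, sub_w = h // grid, w // grid
    hist = []
    for gy in range(grid):
        for gx in range(grid):
            bins = [0] * len(FILTERS)
            cells = 0
            for y in range(gy * sub_h, (gy + 1) * sub_h - 1, 2):
                for x in range(gx * sub_w, (gx + 1) * sub_w - 1, 2):
                    cell = (gray[y][x], gray[y][x + 1],
                            gray[y + 1][x], gray[y + 1][x + 1])
                    strengths = [abs(sum(c * v for c, v in zip(f, cell)))
                                 for f in FILTERS]
                    best = max(range(len(FILTERS)), key=lambda k: strengths[k])
                    # Count an edge only when the strongest response passes the threshold.
                    if strengths[best] > threshold:
                        bins[best] += 1
                    cells += 1
            # Normalise the five bins by the number of sub-blocks in this sub-image.
            hist.extend(b / cells if cells else 0.0 for b in bins)
    return hist
```

For a 256×256 PatternNet sample, each of the 16 sub-images is 64×64 pixels and contains 1024 such 2×2 cells, so every bin is a fraction between 0 and 1.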

RLBP Results
RLBP takes rotation invariance into account to improve the classification result. As discussed before, the idea is to take the maximum value among the neighbors and to start the weighting of all neighbors from this value. Different radii were tested with eight neighbors in order to select the best circular neighborhood.

D. Feature Selection Results
The scatter analysis method is adopted to implement the feature selection function. This method is used because it is efficient and requires little time to compare the contributing features. Scatter analysis depends on finding the inter scatter (between classes) and the intra scatter (within the same class) in order to point out the features that may cause conflicts in the classification decision. The inter scatter is therefore used to find the features that may change the right decision of the classifier. The extracted feature vector V is passed to the scatter analysis, which calculates the inter scatter by Equation (21). The resulting inter scatter is analysed to detect ineffective features, which are those with the highest inter scatter values; they are candidates for removal because of the distortion they cause. The features identified as usable, which can contribute effectively to the classification decision, are stored in a list of usable features called the Pass Features List.

GLCM Features Selection Results
GLCM features are first analysed with different numbers of quantization levels (Q_L) applied to the four adopted color bands R, G, B, and G_r. Thirteen features are extracted from each band, so 52 features are obtained from the four color bands. The highest classification score is achieved when the number of quantization levels is 80. GLCM features are then analysed with different coverage distances, which can take the values 1, 2, 3, 4, and 5 pixels. The average classification scores are computed for the different distances between the central pixel and its neighbors on the four adopted color bands, with the number of quantization levels fixed at 80. The highest classification result is achieved at the eighth run, when two distances (1 and 4 pixels) are used together, so that 104 features are extracted from each sample image. The best discrimination for the two chosen distances is likewise found when 80 quantization levels are applied to the four adopted color bands (R, G, B, and G_r). Table (4) shows the 104 features that are passed to the scatter analysis, which then selects the most dominant ones. The information recorded in this table comes from 12 SVM classifier runs in which the image samples are chosen at random. The scatter analysis recommends keeping 68 features and discarding the other 36. The 68 features marked in the table are therefore the ones that contribute to the subsequent training and classification stages.

EHD Features Selection Results
Several threshold values in the range 5-15 were tested to choose the one that gives the best classification score. Table (5) presents the average SVM classifier accuracy of the 80 EHD features with different edge threshold values applied to 20 samples of the airplane class. The threshold value of 12 gave the best classification results in comparison with the other values, so the threshold is set to 12 in the following processing.
Table (6) presents the average classification scores of several SVM classifier tests based on EHD features, in which the 80 extracted features per sample are analysed by the scatter analysis to select the most discriminative features. The classification score increases as a weak feature is discarded at each run until it reaches the best score of about 72.66% at the eighth run, where 69 features contribute to the classification, and then decreases as more features are discarded. This indicates that the EHD features work together best when only 69 features contribute.
A total of 256 features contribute to the classification decision. In the proposed SIC method, two kinds of classification accuracy are considered: that of a single classifier, and that of the combined result of multiple classifiers. All the experimental results in the previous sections are single-classifier results, meaning that the classification accuracy is obtained for each feature extraction method separately from its own feature vector. These results show that a class can be represented by more than one template. The final classification decision is therefore made to depend on multiple feature vectors, which increases the information available for each sample and improves the categorization and the final accuracy.
In the proposed SIC method, the post-processing feature combination uses two types of fusion, the mean rule and the product rule, both of which are ensemble approaches and both of which are applied to obtain the performance of the final decision. After testing different feature vector outputs from the three methods, the best classification decision was found when the classifier outputs of the three feature extraction methods GLCM, RLBP, and EHD were combined. Table (9) shows the combination results with the mean and product rules.

V. CONCLUSION
The results of combining the three feature extraction methods show that the combination of multiple classifiers is better than the output of a single classifier. Comparing the classification accuracy of the two fusion methods, the product rule gives the better result, with an accuracy of 99.516664% and a standard deviation of 0.1657362. The conversion of the sample image from color to grayscale also directly affects the classification accuracy. The feature selection method works well in selecting the most informative descriptors from the different feature extraction methods, and as a result it improves the classification accuracy.

VI. ACKNOWLEDGMENT
Special thanks go to my supervisor, Prof. Dr. Mohammed S. Altaei, for his support, assistance, encouragement, and valuable advice, for guiding me through the major steps of exploring the subject, and for sharing his ideas with me. Grateful thanks are also due to the Head of the Computer Science Department and to the staff of the Department at the College of Sciences of Al-Nahrain University for their kind attention.