DATA MINING TECHNIQUE TO ANALYZE SOIL NUTRIENTS BASED ON HYBRID CLASSIFICATION

Abstract: Data mining methods are widely used in agricultural research. Agricultural factors such as weather, rainfall, soil, pesticides and fertilizers are mainly responsible for raising crop yields, and soil is the fundamental basis for growing crops. Soil analysis is an important part of soil resource management in agriculture: it helps farmers determine which crops to grow under a particular soil condition. The main objective of this work is to analyze soil nutrients using data mining classification techniques. A large soil nutrient status database was collected from the Department of Agriculture, Cooperation and Farmers Welfare. The database contains soil nutrient measurements for all states, and this work analyzes the soil nutrients of selected districts of Tamil Nadu, India. Different soil types contain different amounts of nutrients. This paper selects nutrients such as nitrogen, phosphorus, potassium, calcium, magnesium, sulfur, iron and zinc, and analyzes them using Naïve Bayes, Decision Tree and a hybrid of Naïve Bayes and Decision Tree. The classification algorithms are compared on two criteria: accuracy and execution time.


I. INTRODUCTION
Agriculture depends largely on soil quality, but over time intensive agricultural production depletes the nutrients present in the soil. Methods are therefore needed that slow this depletion and restore the required nutrients to the soil, so that crops continue to be produced in high quality and quantity [1]. In agriculture, good soil health means the capacity of the soil to sustain the physical, chemical and biological activities needed for consistently high crop yields. Good-quality soil retains and releases water and nutrients, supports steady root growth, maintains biotic conditions, and resists pollution [2].
Soil is essential for plant life. It is composed of solids (minerals and organic matter), liquids (water and dissolved substances), and gases (mostly oxygen and carbon dioxide), and it contains living organisms. Together these elements determine its physical and chemical properties. Proper soil management is necessary to preserve fertility, obtain better yields and respect the environment, and soil testing is in turn a prerequisite for managing the soil properly.
A soil test is the analysis of a soil sample to determine its nutrient content, composition and other characteristics. Soil tests are usually performed to determine fertility and to reveal deficiencies that need to be remedied [3]. Soil nutrient analysis helps farmers decide which crops to grow under a particular soil condition. In this work, data mining classification methods are used to analyze soil nutrients. Data mining is the process of extracting information from a data set and transforming it into an understandable structure for further use. Various data analysis methods are available for agricultural research. Classification is a data mining technique that automatically builds a model of classes from a set of records that contain class labels. Popular classification techniques include decision trees, neural networks, k-nearest neighbor, SVM and the Naïve Bayes classifier.
Soil has three types of characteristics: physical, chemical, and biological. This paper mainly focuses on chemical characteristics. Chemical properties of soil involve the management of soil nutrients at the most basic level. Mineral nutrients within the soil have either a positive or a negative charge. The nutrients with a positive charge (including calcium, magnesium, potassium, ammonium, and sodium) are held by negatively charged clay particles and organic matter.
The soil supplies the essential nutrients needed for proper plant growth and production. These nutrients can originate from weathered minerals or from decomposing organic matter. The nutrients are classified into three groups: primary, secondary and micronutrients. The primary nutrients are used by the plant in the greatest amounts. Nutrient analysis is one of the main topics in agriculture, as it helps farmers determine which crops to grow under a particular soil condition. This paper focuses on classifying soil nutrients based on the selected nutrients for several districts of Tamil Nadu (Ariyalur, Coimbatore, Karur, Salem, Thanjavur and Trichy).

II. RELATED WORK
Baskar et al. [4] analyzed soil data using different algorithms and prediction methods, presenting a report based on the Naïve Bayes, J48 (C4.5) and JRip classification algorithms implemented in the data mining tool WEKA. Jay Gholap [5] predicts soil fertility using a decision tree algorithm. In [6], the author predicted soil characteristics and analyzed soil data using classification techniques: soil properties such as pH value, Electrical Conductivity (EC), potassium, iron and copper were classified using Naïve Bayes, J48 and JRip.
Suman et al. [7] use KNN, Naïve Bayes and J48 to analyze soil data. Their research recommends fertilizers based on the nutrient levels found in the soil test set.
Bhuyar [2] focuses on classifying the soil fertility rate using J48, Naïve Bayes and Random Forest. J48 gives better results than the other algorithms. The decision tree built by J48 helps farmers and decision makers identify the soil fertility rate, and different fertilizers can be recommended based on the nutrients found in the soil sample.
Hemageetha et al. [8] use data mining classification techniques to investigate whether the soil of Salem district is suitable for crop growth based on its pH value. The results show that J48 achieves higher accuracy than Naïve Bayes, JRip and BayesNet, and that most of the soil in Salem district is suitable for growing many crops.
Bhargavi et al. [9] applied GATree, fuzzy classification rules and the Fuzzy C-Means algorithm to classify soil texture in agricultural soil data. Classification based on fuzzy rules performs much better than GATree.
In [10], the Naïve Bayes, JRip and J48 algorithms are compared. JRip gives better results and correctly classifies the largest number of instances compared with the other two.
Ramesh et al. [11] compare different classifiers; the outcome of their research could improve soil management practices across a wide range of fields, including agriculture, horticulture, environmental management and land-use management.
Dildarkhan et al. [12] analyze soil data using different classification algorithms and prediction methods. Soil testing laboratories analyze the soil and provide the sample data set, and classifying these soil data sets manually would take a considerable amount of time.
Shivnath et al. [13] analyze soil properties using a back-propagation network trained with the gradient descent algorithm. The back-propagation algorithm produces better results.
This work proposes a method that uses data mining to analyze soil nutrients with a hybrid classification algorithm.

III. SOIL NUTRIENTS ANALYSIS
Data mining is crucial for determining agricultural facts such as soil fertility, yield prediction and soil nutrient levels. This section analyzes soil nutrients using a hybrid classification algorithm. Fig 1 shows the architecture of the proposed work.

A. Overview of Dataset
As with any data mining problem, the first step is to bring all the data together.
The data set contains 13 attributes: District, pH, EC (electrical conductivity), OC (organic carbon), N, P, K, S, Zn, Fe, Cu, Mn and B. Table 1 describes the attributes.

B. Methodology
The proposed work starts with a preprocessing step, in which the collected data is cleaned: records with missing attribute values were removed from the dataset. In the data conversion step, the preprocessed data was converted into levels based on the nutrient values. Table 2 shows the value ranges for the three nutrient levels.
After data conversion, the macro- and micronutrients are split into three types: Type-1 contains the pH, EC, OC and N attributes, Type-2 contains P, K, S and Zn, and Type-3 contains Fe, Cu, Mn and B.
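The preprocessing and conversion steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions: each record is a dictionary in which `None` marks a missing value, and the `(low_max, medium_max)` cut-offs in the hypothetical `THRESHOLDS` table are illustrative placeholders, not the ranges of Table 2, which should be substituted. The function names are likewise hypothetical.

```python
# Hypothetical (low_max, medium_max) cut-offs per attribute. These are
# illustrative placeholders only; substitute the ranges from Table 2.
THRESHOLDS = {
    "N": (280.0, 560.0),
    "P": (10.0, 25.0),
    "K": (108.0, 280.0),
}

def drop_incomplete(records):
    """Remove every record that has at least one missing (None) value."""
    return [r for r in records if all(v is not None for v in r.values())]

def to_level(value, low_max, medium_max):
    """Map a numeric measurement to Low / Medium / High."""
    if value < low_max:
        return "Low"
    if value <= medium_max:
        return "Medium"
    return "High"

def convert(records):
    """Replace each thresholded numeric attribute with its nutrient level."""
    converted = []
    for r in records:
        row = {}
        for attr, value in r.items():
            if attr in THRESHOLDS:
                row[attr] = to_level(value, *THRESHOLDS[attr])
            else:
                row[attr] = value  # e.g. District is kept unchanged
        converted.append(row)
    return converted

raw = [
    {"District": "Salem", "N": 300.0, "P": 8.0, "K": 300.0},
    {"District": "Karur", "N": None, "P": 12.0, "K": 150.0},  # missing N
]
clean = drop_incomplete(raw)
print(convert(clean))
# [{'District': 'Salem', 'N': 'Medium', 'P': 'Low', 'K': 'High'}]
```

Dropping incomplete records mirrors the removal step described in the text; imputing missing values instead would keep more of the roughly 3000 records but is not what the paper describes.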
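A minimal sketch of the split into three attribute types, assuming each converted record is a dictionary keyed by the attribute names of Table 1. `TYPE_GROUPS` and `split_by_type` are hypothetical names, and the sample record is made up for illustration.

```python
# Attribute groups exactly as listed in the text.
TYPE_GROUPS = {
    "Type-1": ["pH", "EC", "OC", "N"],
    "Type-2": ["P", "K", "S", "Zn"],
    "Type-3": ["Fe", "Cu", "Mn", "B"],
}

def split_by_type(records):
    """Project each converted record onto the attributes of each type."""
    return {
        group: [{a: r[a] for a in attrs} for r in records]
        for group, attrs in TYPE_GROUPS.items()
    }

# Made-up converted record, for illustration only.
row = {"District": "Trichy", "pH": "Medium", "EC": "Low", "OC": "Low", "N": "Low",
       "P": "Medium", "K": "High", "S": "Low", "Zn": "Low",
       "Fe": "High", "Cu": "High", "Mn": "Medium", "B": "Low"}
subsets = split_by_type([row])
print(subsets["Type-2"])
# [{'P': 'Medium', 'K': 'High', 'S': 'Low', 'Zn': 'Low'}]
```

Each of the three subsets can then be classified separately, which matches how the results are reported per type.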

Naïve Bayes
In machine learning, Naïve Bayes is a simple probabilistic classification technique. It is based on Bayes' theorem with the assumption that the attributes are independent [10]. Class labels are predicted from their posterior probabilities, and only a small amount of training data is needed to estimate the parameters required for classification [14].
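Under the independence assumption, Naïve Bayes picks the class c that maximizes P(c) × P(x_1 | c) × ... × P(x_n | c). The sketch below is one possible implementation for categorical (Low/Medium/High) attributes, assuming add-one (Laplace) smoothing so that unseen attribute values do not produce zero probabilities. It is not the WEKA implementation used in the paper, and the class labels in the example are hypothetical.

```python
import math
from collections import Counter, defaultdict

class CategoricalNaiveBayes:
    """Naive Bayes for categorical attributes with add-one smoothing."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.class_counts = Counter(y)
        self.n = len(y)
        # counts[(attribute index, class)][value] = frequency in training data
        self.counts = defaultdict(Counter)
        self.values = defaultdict(set)  # distinct values seen per attribute
        for row, label in zip(X, y):
            for i, v in enumerate(row):
                self.counts[(i, label)][v] += 1
                self.values[i].add(v)
        return self

    def _log_posterior(self, row, c):
        # log P(c) + sum_i log P(x_i | c); the evidence term is a shared constant
        score = math.log(self.class_counts[c] / self.n)
        for i, v in enumerate(row):
            k = len(self.values[i])
            score += math.log((self.counts[(i, c)][v] + 1) / (self.class_counts[c] + k))
        return score

    def predict(self, X):
        return [max(self.classes, key=lambda c: self._log_posterior(row, c)) for row in X]

X = [["Low", "Low"], ["Low", "Medium"], ["High", "High"], ["High", "Medium"]]
y = ["Deficient", "Deficient", "Sufficient", "Sufficient"]
model = CategoricalNaiveBayes().fit(X, y)
print(model.predict([["Low", "High"], ["High", "Low"]]))
# ['Deficient', 'Sufficient']
```

Working in log space avoids numerical underflow when many attribute probabilities are multiplied together.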

Decision Tree (J48):
J48 is the WEKA (Java) implementation of the C4.5 decision tree algorithm. A decision tree is a predictive machine learning model that determines the value of the dependent variable (i.e., the target value) from the attributes of the available training data. The internal nodes of a decision tree denote attributes, the branches leaving a node give the possible values of that attribute, and the terminal (leaf) nodes give the classification of the dependent variable. The J48 classifier works in two phases: tree construction and tree pruning [15]. At each node, J48 computes the normalized information gain (gain ratio) that results from splitting the data on each attribute, and the attribute with the highest normalized information gain is chosen to make the decision. The algorithm then recurses on the resulting smaller subsets. Splitting stops when all instances in a subset belong to the same class. When no attribute provides any information gain, J48 creates a decision node using the expected value of the class [10].

Hybrid Algorithm
The hybrid classifier combines the Decision Tree classifier's ability to separate out dependent attributes with the effective classification performed by the Naïve Bayes classifier on independent attributes.
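The sketch below illustrates J48's gain-ratio splitting criterion and one possible reading of the hybrid classifier. Assumptions: the tree is a simplified, unpruned C4.5-style tree (real C4.5/J48 adds pruning and further heuristics), and the hybrid is taken to mean that the attributes the tree actually splits on are kept and Naïve Bayes is then applied to only those attributes, in the spirit of selective Bayesian classifiers. The paper does not specify its hybrid at this level of detail, so this should not be read as the authors' exact method; all function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(X, y, i):
    """C4.5-style normalized information gain for splitting on attribute i."""
    n = len(y)
    parts = defaultdict(list)  # attribute value -> labels of matching rows
    for row, label in zip(X, y):
        parts[row[i]].append(label)
    remainder = sum(len(p) / n * entropy(p) for p in parts.values())
    split_info = -sum(len(p) / n * math.log2(len(p) / n) for p in parts.values())
    gain = entropy(y) - remainder
    return gain / split_info if split_info > 0 else 0.0

def tree_attributes(X, y, attrs, max_depth=3):
    """Grow an unpruned gain-ratio tree; return the attribute indices it splits on."""
    if max_depth == 0 or len(set(y)) == 1 or not attrs:
        return set()
    best = max(attrs, key=lambda i: gain_ratio(X, y, i))
    if gain_ratio(X, y, best) <= 0:
        return set()  # no attribute improves the split: stop here
    used = {best}
    rest = [a for a in attrs if a != best]
    for v in set(row[best] for row in X):
        idx = [k for k, row in enumerate(X) if row[best] == v]
        used |= tree_attributes([X[k] for k in idx], [y[k] for k in idx], rest, max_depth - 1)
    return used

def naive_bayes_predict(X, y, row, attrs):
    """Categorical Naive Bayes with add-one smoothing, using only attrs."""
    best_label, best_score = None, -math.inf
    for c, nc in sorted(Counter(y).items()):
        score = math.log(nc / len(y))  # log prior
        for i in attrs:
            k = len(set(r[i] for r in X))  # distinct values of attribute i
            match = sum(1 for r, lab in zip(X, y) if lab == c and r[i] == row[i])
            score += math.log((match + 1) / (nc + k))
        if score > best_score:
            best_label, best_score = c, score
    return best_label

def hybrid_predict(X, y, rows, max_depth=3):
    """Hypothetical hybrid: the tree selects attributes, Naive Bayes classifies."""
    attrs = sorted(tree_attributes(X, y, list(range(len(X[0]))), max_depth))
    return [naive_bayes_predict(X, y, row, attrs) for row in rows]

# Toy data: attribute 0 is informative, attribute 1 is not.
X = [["Low", "A"], ["Low", "B"], ["High", "A"], ["High", "B"]]
y = ["Deficient", "Deficient", "Sufficient", "Sufficient"]
print(gain_ratio(X, y, 0), gain_ratio(X, y, 1))  # 1.0 0.0
print(hybrid_predict(X, y, [["Low", "B"], ["High", "A"]]))  # ['Deficient', 'Sufficient']
```

Attributes that the tree never splits on (attribute 1 in the toy data) are ignored by the Naïve Bayes step, which is how this reading keeps redundant attributes from violating the independence assumption.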
A large soil nutrient status database was extracted from the Department of Agriculture, Cooperation and Farmers Welfare. The database contains soil nutrient measurements for all states; this work uses the data for selected districts of Tamil Nadu, namely Ariyalur, Coimbatore, Karur, Salem, Thanjavur and Trichy. The resulting dataset contains nearly 3000 records. Table 3 shows a sample of the dataset.

IV. EXPERIMENTAL RESULT
The collected data set was preprocessed and some records were removed. After preprocessing, the data set was converted into Low, Medium and High levels based on the nutrient ranges given in Table 2. The converted data set is shown in Table 4.
Naïve Bayes, Decision Tree and the hybrid classification algorithm were then applied to classify the soil nutrients as Very High, High, Medium, Low and Very Low.
The numbers of correctly and incorrectly classified instances are given in Figs 2, 3 and 4 for (pH, EC, OC, N), (P, K, S, Zn) and (Fe, Cu, Mn, B), respectively. The comparative analysis of the classifiers is given in Tables 5, 6 and 7. Fig 5 shows the execution time of the classification algorithms for the three types of nutrients, and Fig 6 shows their accuracy rate. Figs 5 and 6 show that the hybrid algorithm classifies the dataset in less time and with a higher accuracy rate than the other two algorithms.
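The comparison by correctly and incorrectly classified instances, accuracy and execution time can be computed with a small harness like the one below. This is a sketch: `evaluate` and the `majority_class` baseline are hypothetical, and timing training and prediction together is an assumption, since the paper does not state exactly what its execution time covers. No results from the paper are reproduced here.

```python
import time
from collections import Counter

def evaluate(predict_fn, train_X, train_y, test_X, test_y):
    """Return (accuracy, correct, incorrect, seconds) for one classifier.

    predict_fn(train_X, train_y, test_X) must train on the training data
    and return one predicted label per test row.
    """
    start = time.perf_counter()
    predictions = predict_fn(train_X, train_y, test_X)  # training + prediction
    elapsed = time.perf_counter() - start
    correct = sum(p == t for p, t in zip(predictions, test_y))
    incorrect = len(test_y) - correct
    return correct / len(test_y), correct, incorrect, elapsed

def majority_class(train_X, train_y, test_X):
    """Trivial baseline used only to make the sketch runnable."""
    label = Counter(train_y).most_common(1)[0][0]
    return [label] * len(test_X)

acc, ok, bad, secs = evaluate(
    majority_class,
    [["Low"], ["Low"], ["High"]], ["A", "A", "B"],
    [["Low"], ["High"]], ["A", "B"],
)
print(acc, ok, bad)  # 0.5 1 1
```

Any of the three classifiers can be passed in as `predict_fn` once wrapped to this signature, so all of them are timed and scored the same way for each nutrient type.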

V. CONCLUSION
Data mining techniques in agriculture can help farmers improve crop productivity. Crop yield depends chiefly on soil nutrients, and analyzing soil nutrients indicates which crops can be grown in a particular soil. This paper presented an analysis of soil nutrients using different algorithms, comparing three classification algorithms: Naïve Bayes, Decision Tree and the hybrid algorithm. The hybrid classification algorithm gives better results for this dataset and correctly classifies the largest number of instances compared with the other two. The hybrid algorithm can therefore be recommended for analyzing soil nutrients.