DATABASE QUERY OPTIMIZATION USING GENETIC ALGORITHM

Abstract: Database query optimization technology has advanced steadily as more capable query optimizers have come to market. For newly emerging applications, database management systems (DBMS) must pay attention to the time required to answer user-submitted queries. Since the early development of DBMS, the cost of executing queries has been of prime importance, because it plays an important role in DBMS performance. Among the various operations performed, index selection and join ordering and its evaluation contribute heavily to the performance of query execution. In this paper, we outline the database query optimization problem and give a brief introduction to the Genetic Algorithm. We also present a literature survey of previous work on the join ordering problem.


A. Query Optimization
Query optimization is the process of finding the most efficient execution plan for a given user-submitted query. It has proved very useful in improving the performance of database systems in terms of response time.
A query written in a high-level language must be converted into a form that the system can understand and process further. In this internal form, the relational algebra expression, a number of equivalent representations of the same query are available. Various query optimization strategies and algorithms are also available to compute the answer. Researchers have worked on the individual operations of a query to find the most efficient query execution plan, and on techniques for choosing the optimal solution among the candidate plans. Different query optimization techniques have been applied, such as rule-based optimization, cost-based optimization, deterministic optimization, randomized optimization and their variations.
The results of query optimization can be used by different emerging database management systems. Database users benefit by getting the results of their queries in a timely and predictable manner. Database vendors can use them to improve the efficiency of their DBMS so that it can support rapidly growing volumes of data. Database designers, in turn, can use them to decide which algorithms to use in certain situations and which limitations to cope with [1], [2].
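To illustrate why the choice among equivalent plans matters, the following minimal sketch enumerates every left-deep join order for a three-table query and compares the estimated costs. The table names, cardinalities, selectivities and the cost model (sum of intermediate result sizes) are all assumptions invented for this example; they are not taken from any of the surveyed papers.

```python
from itertools import permutations

# Hypothetical statistics, invented for illustration only.
CARD = {"A": 1000, "B": 100, "C": 10}            # rows per table
SEL = {frozenset(("A", "B")): 0.01,               # join selectivities
       frozenset(("B", "C")): 0.1}


def selectivity(a, b):
    """Selectivity of the join predicate between a and b (1.0 = cross product)."""
    return SEL.get(frozenset((a, b)), 1.0)


def plan_cost(order):
    """Assumed cost model: sum of the intermediate result sizes of a left-deep plan."""
    joined = [order[0]]
    size = CARD[order[0]]
    cost = 0.0
    for table in order[1:]:
        sel = 1.0
        for other in joined:
            sel *= selectivity(other, table)
        size = size * CARD[table] * sel
        cost += size
        joined.append(table)
    return cost


def best_plan(tables):
    """Exhaustive search: feasible only for a handful of tables (n! orders)."""
    return min(permutations(tables), key=plan_cost)


if __name__ == "__main__":
    for order in permutations(CARD):
        print(" JOIN ".join(order), "->", plan_cost(order))
    print("cheapest:", best_plan(CARD))
```

Under these made-up numbers, the orders that start with the B and C join are about ten times cheaper than the orders that start with the A and C cross product, although all six orders return the same answer. Exhaustive enumeration grows factorially with the number of tables, which is the motivation for the heuristic and randomized methods discussed in the rest of the paper.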

B. Genetic Algorithm
Genetic Algorithms (GAs) are search algorithms based on the mechanics of natural selection and natural genetics. Potential solutions to the problem are encoded as simple chromosome-like data structures, and recombination operators are applied to them. Recombination helps preserve the important information carried by the chromosomes.
These algorithms are computationally simple yet powerful in their search for improvement. Genetic algorithms are finding more widespread applications in business, scientific and engineering circles.
A GA works with a coding of the parameter set, whereas traditional techniques work directly with the parameters themselves. The search in a GA starts from a population of points, whereas other procedures start from a single point. A GA uses only objective function information, while traditional methods rely on derivatives or other auxiliary knowledge. Conventional methods apply deterministic transition rules, whereas a GA applies probabilistic transition rules.
How the decision space is traversed is of prime importance. Most optimization techniques use a transition rule to move from a single point to the next point. This point-to-point approach is prone to locating false peaks in multimodal (many-peaked) search spaces. A GA, by contrast, works from a whole population of points (strings) simultaneously, climbing many peaks in parallel, which reduces the probability of settling on a false peak.
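The characteristics described above (an encoded population, payoff-only evaluation and probabilistic transition rules) can be sketched in a few lines. The example below is a generic illustration, not the algorithm of any surveyed paper: the bit-string encoding, the toy objective `one_max`, tournament selection, one-point crossover, elitism and all parameter values are assumptions chosen for brevity.

```python
import random


def one_max(bits):
    """Toy payoff (objective) value: the number of 1-bits in the string."""
    return sum(bits)


def tournament(pop, fitness, rng, k=2):
    """Probabilistic selection: the fittest of k randomly drawn individuals."""
    return max(rng.sample(pop, k), key=fitness)


def crossover(a, b, rng):
    """One-point crossover of two equal-length parent strings."""
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:]


def mutate(bits, rng, rate):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]


def run_ga(fitness, n_bits=20, pop_size=30, generations=40, rate=0.02, seed=0):
    rng = random.Random(seed)
    # The search starts from a population of points, not from a single point.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        elite = max(pop, key=fitness)
        history.append(fitness(elite))
        children = [elite]                      # elitism: never lose the best string
        while len(children) < pop_size:
            p1 = tournament(pop, fitness, rng)
            p2 = tournament(pop, fitness, rng)
            children.append(mutate(crossover(p1, p2, rng), rng, rate))
        pop = children
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history


if __name__ == "__main__":
    best, history = run_ga(one_max)
    print(best, history[0], "->", history[-1])
```

Only `fitness` is ever consulted about the quality of a string, so swapping in a different objective requires no change to the search loop. Because the best string is copied unchanged into each new generation, the best payoff recorded in `history` never decreases.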
Many search techniques require much auxiliary information in order to work properly. By contrast, GAs have no need for this auxiliary information: to perform an effective search for better and better structures, they require only the payoff values (objective function values) associated with individual strings.

Jozef Kratica, Ivana Ljubic and Dusan Tosic [3] proposed a GA for solving the Index Selection Problem (ISP). The results obtained with the GA implementation indicated its reliability and efficiency in the area of optimization.
Anita Thengade and Rucha Dondal [4] addressed the basic functionality of the GA and its operators. The paper also compared the GA with other problem-solving techniques and listed laboratories working on GAs together with their current projects.
Kristin Bennett, Michael C. Ferris and Yannis E. Ioannidis [5] of the University of Wisconsin designed a new version of the genetic algorithm for parallel architectures and obtained significant computational savings over randomized methods through parallel implementation. A set of different queries, each containing up to 16 joins, was tested with the System-R algorithm and the GA. The experiment found that the GA performed relatively better than the System-R optimizer.
Michael L. Rupley, Jr. [6] modified some of the basic existing techniques of query processing and optimization in his MiniDatabase Engine application and compared identical queries on both the existing and the new version of the application. Six new query execution speed enhancements were implemented and tested on a database consisting of several thousand records.
Prof. M. A. Pund, S. R. Jadhav and P. D. Thakare [7] applied the Iterative Improvement method, a randomized algorithm, to the Join Ordering Problem and found that randomized and genetic algorithms are superior to dynamic programming in terms of running time.
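For readers unfamiliar with Iterative Improvement, the sketch below shows the general idea for join ordering: start from a random order, repeatedly accept a random neighbouring order if it is cheaper, and restart from a new random order once no improvement is found. It is a generic illustration and not the implementation evaluated in [7]. The table statistics, the cost model (sum of intermediate result sizes of a left-deep plan), the swap neighbourhood and the parameter values are all assumptions.

```python
import random

# Hypothetical statistics for a 5-table chain query, invented for illustration.
CARD = {"R": 1000, "S": 50, "T": 4000, "U": 200, "V": 10}
SEL = {frozenset(("R", "S")): 0.01, frozenset(("S", "T")): 0.001,
       frozenset(("T", "U")): 0.005, frozenset(("U", "V")): 0.1}


def plan_cost(order):
    """Assumed cost model: sum of intermediate result sizes of a left-deep plan."""
    joined, size, cost = [order[0]], float(CARD[order[0]]), 0.0
    for table in order[1:]:
        for other in joined:
            size *= SEL.get(frozenset((other, table)), 1.0)
        size *= CARD[table]
        cost += size
        joined.append(table)
    return cost


def iterative_improvement(tables, restarts=10, tries_per_step=50, seed=0):
    rng = random.Random(seed)
    best = None
    for _restart in range(restarts):
        current = rng.sample(tables, len(tables))      # random start state
        improved = True
        while improved:                                 # descend to a local minimum
            improved = False
            for _try in range(tries_per_step):
                a, b = rng.sample(range(len(current)), 2)
                neighbour = list(current)
                neighbour[a], neighbour[b] = neighbour[b], neighbour[a]
                if plan_cost(neighbour) < plan_cost(current):
                    current, improved = neighbour, True
                    break
        if best is None or plan_cost(current) < plan_cost(best):
            best = current
    return best


if __name__ == "__main__":
    order = iterative_improvement(sorted(CARD))
    print(order, plan_cost(order))
```

Each descent only ever accepts strictly cheaper neighbours, so it terminates at a local minimum of the assumed cost function; the random restarts are what give the method a chance of escaping poor local minima.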
Surajit Chaudhuri [8] provided an excellent introduction to the System-R optimizer in the context of Select-Project-Join queries. A brief overview of the extensible optimizers Starburst and Volcano/Cascades was also given.
N. Satyanarayana, SK. Sharfuddin and SK. Jan Bhasha [9] proposed a new dynamic query optimization algorithm based on a greedy algorithm that uses randomized strategies. The execution cost of queries and the system resource requirements were reduced significantly, and the approach is applicable to both distributed and centralized database systems.
Pravin Chandra, Anurag Jain and Manoj Kr. Gupta [10] discussed general query optimization techniques such as cost-based optimization (CBO) and rule-based optimization (RBO). They also presented the techniques used by Oracle.
Surajit Chaudhuri and Kyuseok Shim [11] proposed a greedy conservative heuristic as a technique for optimizing a single block of SQL with group-by. It was implemented in a System-R style optimizer. The approach extended the traditional optimization algorithms to multi-block queries using pull-up as well as pull-down transformations. The paper also discussed the join-aggregate class of nested queries and queries containing views with aggregates.
Ishtiaq Ahmed, M. Rizwan Beg, Kapil Kumar Gupta and Mohd.Isha Mansoori [12] applied the GA to the Join Ordering Problem in the context of query optimization. They found that GAs are an emerging technique with a higher probability of reaching the best solutions for large query optimization problems. The results demonstrated the applicability of GAs to the optimization problem.
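A minimal sketch of how a GA can be applied to join ordering is given below. It is not the algorithm of [12] or of any other surveyed paper: the permutation encoding of left-deep plans, the order crossover and swap mutation operators, truncation selection, the cost model (sum of intermediate result sizes) and the table statistics are all assumptions made for illustration.

```python
import random

# Hypothetical cardinalities and join selectivities for a 6-table chain query.
CARD = {"T1": 5000, "T2": 200, "T3": 10000, "T4": 50, "T5": 800, "T6": 3000}
SEL = {frozenset(pair): s for pair, s in [
    (("T1", "T2"), 0.005), (("T2", "T3"), 0.0001), (("T3", "T4"), 0.02),
    (("T4", "T5"), 0.01), (("T5", "T6"), 0.001)]}


def plan_cost(order):
    """Assumed fitness: sum of intermediate result sizes of a left-deep plan."""
    joined, size, cost = [order[0]], float(CARD[order[0]]), 0.0
    for table in order[1:]:
        for other in joined:
            size *= SEL.get(frozenset((other, table)), 1.0)
        size *= CARD[table]
        cost += size
        joined.append(table)
    return cost


def order_crossover(p1, p2, rng):
    """Order crossover: keep a slice of p1 in place, fill the rest in p2's order."""
    i, j = sorted(rng.sample(range(len(p1) + 1), 2))
    kept = p1[i:j]
    rest = [t for t in p2 if t not in kept]
    return rest[:i] + kept + rest[i:]


def swap_mutation(order, rng, rate):
    """With probability `rate`, exchange two randomly chosen positions."""
    order = list(order)
    if rng.random() < rate:
        a, b = rng.sample(range(len(order)), 2)
        order[a], order[b] = order[b], order[a]
    return order


def ga_join_order(tables, pop_size=40, generations=60, rate=0.2, seed=1):
    rng = random.Random(seed)
    # Each chromosome is a permutation of the tables, i.e. one left-deep plan.
    pop = [rng.sample(tables, len(tables)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=plan_cost)
        survivors = pop[: pop_size // 2]        # truncation selection keeps the best plans
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            children.append(swap_mutation(order_crossover(p1, p2, rng), rng, rate))
        pop = survivors + children
    return min(pop, key=plan_cost)


if __name__ == "__main__":
    best = ga_join_order(sorted(CARD))
    print(best, plan_cost(best))
```

Ordinary one-point crossover would produce strings with duplicated and missing tables, which is why a permutation-preserving operator is used here. Because the cheaper half of each generation survives unchanged, the best plan found so far is never lost.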
M. Sinha and S. V. Chande [13] analyzed applications of GAs to query optimization and presented a framework for a genetic query optimizer. They also carried out genetic join ordering with various parameters, along with a comparative analysis.
Suhail S. J. Owais, Pavel Kromer and Vaclav Snasel [14] investigated the use of GAs for optimizing Boolean queries in an information retrieval (IR) system. The study concluded that the quality of the initial population has a very great impact on the results of the Genetic Programming process.
Tansel Dokeroglu [15] developed a set of parallel GAs for multi-way chain join queries in distributed databases as his PhD work and compared the results with a sequential GA, sequential dynamic programming and a parallel exhaustive algorithm. A left-deep tree search space was used in the implementation.
Prof. S. V. Chande and Dr. Madhavi Sinha [16] presented a survey on the applicability of GAs in diverse fields. The paper also focuses on the use of GAs for the join ordering problem and the index selection problem.
Michael Steinbrunn, Guido Moerkotte and Alfons Kemper [17] applied and compared several algorithms for the optimization of join expressions and concluded that randomized and genetic algorithms are much better suited to join ordering: they require a longer running time, but the results are far better.
S. Vellev [18] reviewed several classes of algorithms for the Join Ordering Problem, along with their relative advantages and applicability.
Julian Aron Prenner [19] explained query processing and the declarative and procedural optimization steps involved, illustrating how they work with examples. The paper also explains how various planners work, including System-R, SQLite's planner and PostgreSQL's genetic planner.
The details involved in generating query evaluation plans and estimating their cost are presented in the paper by Christian [20]. The main emphasis is on the application of heuristics in the optimizer. Pipelining, pushing selections down and preferring columns that have an index on them can eventually lead to better query optimization. The paper also explains the fundamental concepts of database query optimization and the genetic algorithm.
Majid Khan and M. N. A. Khan [21] addressed the importance of query optimization in production databases. They reviewed various query optimization techniques and approaches for both centralized and distributed database systems and summarized these techniques along with their strengths and limitations.
Grzegorz Wojarnik [22] carried out a comparative analysis of the performance of databases such as SQLite, MS SQL Server 2014 and Firebird 2.5 when used with a genetic algorithm. The test dataset came from the Warsaw Stock Exchange. The experiment concluded that SQLite could be the best choice for use with a GA.
Stillger and Spiliopoulou [23] presented a Genetic Programming model for query optimization together with Genetic Programming operators, and applied this model to parallel query optimization. The number of joins considered in the experiment ranged from 10 to 100, over database tables having 10^3 to 10^6 tuples. The results encourage the use of Genetic Programming for query optimization.
Dr. P.K.Butey, Prof. Shweta Meshram and Dr. R.L. Sonolikar [24] implemented a GA for database query optimization to solve the large join query problem. A basic overview of the Carquinyoli Genetic Optimizer, which is based on Genetic Programming, concludes that an appropriate selection method and fitness function for processing individuals decrease the query processing time and CPU cost with respect to the number of joins involved in the query.

III. CONCLUSIONS
The Genetic Algorithm (GA) is a widely accepted technique for solving the Join Ordering Problem. This study shows that the GA can be used to build a model that finds an optimal or near-optimal solution to the join ordering problem.
The literature studied suggests that a new system using GA operators can be built that will most likely improve the performance of large join queries.

IV. ACKNOWLEDGMENT
I would like to acknowledge all the people who have assisted me since the beginning of my research work. I am honored to thank my family members and friends for their encouragement and for their suggestions from time to time for the improvement of the project.