Optimisation of ETL Process Using Partitioning and Parallelization Techniques
Abstract
An organization's key asset is well-organized data in a proper format, which helps decision makers take critical business decisions. Organizations have multiple business functions that generate data, and it is increasingly important for decision makers to analyze and act on these data. Extract-Transform-Load (ETL) is a data warehousing process that converts the structure of data and makes it accessible to decision makers and other applications. Because data volumes are growing quickly, ETL processes must handle large amounts of data and manage workloads across many different data flows, so they take a significant amount of time to move data from the source to the target system. In most systems, however, the effectiveness of a decision depends on how quickly it is made. In this work, we analyze the behaviour of the various operations used to extract, transform, and load data. Since ETL processes must complete within a specified time window to meet the service-level agreement (SLA) set by the organization, this paper examines optimization methods that minimize the execution time of ETL jobs. We identify the bottlenecks that delay the execution of ETL processes, and we present techniques based on partitioning, parallelization, and multi-threading to optimize that execution.
Keywords: ETL; Dataflow Partitioning; Parallelization; Optimization.
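As a rough illustration of the partitioning and multi-threading idea named in the abstract, the following minimal sketch splits the extracted rows into partitions and transforms the partitions concurrently before loading them. It is not the authors' implementation: the function names (`partition`, `transform`, `run_etl`), the row fields (`name`, `amount_cents`), the round-robin partitioning scheme, and the in-memory list used as the target are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, num_partitions):
    """Split rows into round-robin partitions (hypothetical helper)."""
    return [rows[i::num_partitions] for i in range(num_partitions)]


def transform(row):
    """Illustrative transformation: tidy the name, convert cents to a decimal amount."""
    return {
        "name": row["name"].strip().upper(),
        "amount": row["amount_cents"] / 100,
    }


def transform_partition(rows):
    """Transform every row of one partition; runs inside a worker thread."""
    return [transform(row) for row in rows]


def run_etl(source_rows, num_partitions=4):
    """Extract -> partition -> transform partitions concurrently -> load."""
    partitions = partition(source_rows, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        # map() yields results in partition order, not completion order.
        transformed = list(pool.map(transform_partition, partitions))
    target = []  # stand-in for the target system (assumption)
    for part in transformed:
        target.extend(part)
    return target


if __name__ == "__main__":
    rows = [{"name": f" user{i} ", "amount_cents": i * 250} for i in range(8)]
    for record in run_etl(rows, num_partitions=3):
        print(record)
```

Round-robin partitioning does not preserve the original row order in the target, so a job that depends on ordering would need a range-based or key-based split instead. In CPython, threads mainly help when the transform or load step is I/O-bound; for CPU-bound transformations, `ProcessPoolExecutor` from the same module is the usual substitute.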
COPYRIGHT
Submission of a manuscript implies that the work described has not been published before; that it is not under consideration for publication elsewhere; and that, if and when the manuscript is accepted for publication, the authors agree to automatic transfer of the copyright to the publisher.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
- The journal allows the author(s) to retain publishing rights without restrictions.
- The journal allows the author(s) to hold the copyright without restrictions.