Optimisation of ETL Process Using Partitioning and Parallelization Techniques

Kumari Deepika

Abstract


An organization's key asset is data organized in a proper format, which helps decision makers take critical business decisions. Organizations have multiple business functions that generate data, and it is increasingly important for decision makers to analyze and act on this data. Extract-Transform-Load (ETL) is a data warehousing process that converts the structure of data and makes it accessible to decision makers as well as other applications. Because data volume is growing so quickly, ETL processes must handle large amounts of data and manage workloads across many different data flows, so they take a significant amount of time to move data from the source to the target system. In most systems, however, the effectiveness of a decision depends on how quickly it is made. ETL processes must also complete their execution within a specified time window in order to meet the service-level agreement (SLA) set by the organization. In this paper, we analyze the behaviour of the various operations used to extract, transform and load data, and we examine optimization methods that minimize the execution time of ETL jobs. We identify the bottlenecks that delay the execution of an ETL process, and we present techniques based on partitioning, parallelization, and multi-threading to optimize its execution.
Keywords: ETL; Dataflow Partitioning; Parallelization; Optimization.
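
As an illustration of the general idea of combining partitioning with multi-threading, the following minimal Python sketch splits the extracted rows into partitions and transforms each partition on its own worker thread before loading the merged result. It is not the implementation described in the paper: the function names (partition, transform, run_etl), the row layout, and the round-robin partitioning scheme are all assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_parts):
    """Split rows into n_parts partitions (round-robin; an assumed scheme)."""
    return [rows[i::n_parts] for i in range(n_parts)]


def transform(row):
    """Hypothetical transform: tidy the name and convert the amount to cents."""
    return {
        "name": row["name"].strip().upper(),
        "amount_cents": round(row["amount"] * 100),
    }


def transform_partition(part):
    """Transform every row of one partition; runs on a single worker thread."""
    return [transform(row) for row in part]


def run_etl(rows, n_parts=4):
    """Partition the extracted rows, transform partitions in parallel, then load."""
    parts = partition(rows, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        # pool.map returns results in the same order as the input partitions
        results = list(pool.map(transform_partition, parts))
    # "Load" step: here the target is just an in-memory list (an assumption)
    target = []
    for part in results:
        target.extend(part)
    return target


if __name__ == "__main__":
    source = [{"name": f" item{i} ", "amount": float(i)} for i in range(10)]
    for loaded_row in run_etl(source, n_parts=3):
        print(loaded_row)
```

Round-robin partitioning does not preserve the original row order in the target. In CPython, threads mainly help when the per-row work is I/O-bound (for example, database or network calls); for CPU-bound transforms, a process pool would be the more likely choice.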


DOI: https://doi.org/10.26483/ijarcs.v8i3.3093


Copyright (c) 2017 International Journal of Advanced Research in Computer Science