A Study on Privacy Preserving Big Data Mining: Techniques and Challenges


Anuradha Dahiya


The basic goal of data mining algorithms is to extract previously undiscovered patterns from data. While the data is being mined, sensitive and confidential information must be protected at the same time to preserve privacy. With the widespread use of information technology, organisations such as hospitals, insurance providers, banks, e-commerce companies, and stock exchanges produce enormous and rapidly growing volumes of data, which makes privacy a crucial concern in data mining. Anonymization, perturbation, generalization, and cryptography are among the privacy-preserving data mining techniques proposed in the literature. In this study, we review these state-of-the-art techniques, present a tabular comparison of the work done by different authors, and discuss the challenges of privacy-preserving data mining.
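
As a rough illustration of two of the technique families named above, the following Python sketch generalizes quasi-identifiers and checks the resulting groups for k-anonymity, then applies additive-noise perturbation to numeric values. It is a minimal sketch under stated assumptions, not a method taken from the reviewed papers: the record fields, the 10-year age bands, the 3-digit ZIP prefix, and all function names are hypothetical.

```python
import random
from collections import Counter


def generalize(record):
    """Coarsen quasi-identifiers: age -> 10-year band, ZIP -> 3-digit prefix.

    The band width and prefix length are illustrative assumptions.
    """
    low = (record["age"] // 10) * 10
    return (f"{low}-{low + 9}", record["zip"][:3] + "**")


def is_k_anonymous(records, k):
    """Return True if every generalized group contains at least k records."""
    groups = Counter(generalize(r) for r in records)
    return all(count >= k for count in groups.values())


def perturb(values, scale, seed=0):
    """Additive-noise perturbation: add zero-mean Gaussian noise to each value."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, scale) for v in values]


# Hypothetical toy records; "diagnosis" plays the role of the sensitive attribute.
records = [
    {"age": 23, "zip": "47677", "diagnosis": "flu"},
    {"age": 27, "zip": "47602", "diagnosis": "asthma"},
    {"age": 35, "zip": "47905", "diagnosis": "flu"},
    {"age": 38, "zip": "47909", "diagnosis": "diabetes"},
]

if __name__ == "__main__":
    print(is_k_anonymous(records, 2))  # each generalized group holds two records
    print(perturb([50000.0, 62000.0], scale=1000.0))
```

Coarser generalization or a larger noise scale generally gives stronger protection but makes the mined patterns less accurate, which is the privacy-utility trade-off that comparisons of these techniques have to weigh.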


