PPS-FPCM: Privacy-Preserving Semi-Fuzzy Possibilistic c-means

Main Article Content

Mohamed Abdelhafeez Mahfouz
Zeinab Abd El Haliem
Ehab Emam Zakaria

Abstract

Applying traditional clustering techniques to big data on the cloud while preserving the privacy of the data is a challenge due to the required division and exponential operations in each iteration, which complicate its implementation on encrypted data. Several existing approaches are based on approximating the formulas of centers, weights, and memberships as three polynomial functions according to the multivariate Taylor formula. However, they usually suffer an increase in complexity and a slight drop in accuracy. In this paper, a novel Privacy-Preserving semi-fuzzy clustering algorithm based on the possibilistic paradigm, termed PPS-FPCM, is presented. Its main feature is that it avoids exponentiation and division operations, at each iteration, without losing accuracy. By restricting the typicality to an ordered set of discrete values between zero and one decided by the data owner (DO), the computation is simplified. The second key idea is the use of this soft typicality to detect outliers and compute the corresponding semi-fuzzy memberships, which is used to increase the in-between cluster distance. However, the initial typicality requires a magnitude relation comparison, which is still difficult to do over encrypted data. In this research study, we show how the existing incomplete re-encryption method can be used to tackle this problem.  In each iteration, centers and distances to the new centers are computed on a calculator cloud server (CaCS) which is responsible for storing the cipher texts of the (DO)’s data and processing them. Then, CaCS sends the incompletely re-encrypted difference between these distances and iteratively updated bin values that correspond to the discrete possibilistic memberships that are initially decided by the (DO) to the comparator cloud server (CoCS). CoCS decrypts the difference and returns the results of comparisons. When CaCS receives the results of comparison from CoCS, it decides on an appropriate soft typicality or resends the difference of the same distance to another bin value. The required number of comparisons is O(log the number of bins). CaCS iteratively computes the corresponding semi-fuzzy memberships, computes the refined memberships, and updates the centers. In the end, CaCS sends the final soft memberships and centers to the (DO).  The proposed algorithm is applicable to normal data and homomorphically encrypted data, is more effective than several related algorithms, and can produce accurate results using large enough (16 or more) discrete values with a high reduction on runtime as the number of comparisons is much less complex than exponential and division operations with added communication cost between CaCS and CoCS.

Downloads

Download data is not yet available.

Article Details

Section
Articles
Author Biographies

Mohamed Abdelhafeez Mahfouz, Associate Professor, Faculty of Computer Science, October University for Modern Sciences and Arts - MSA

Associate Professor, Faculty of Computer Science, October University for Modern Science and Arts - MSA,

Associate Editor for the journal of Intelligent and Fuzzy Systems

Zeinab Abd El Haliem, College of Computer Science, MSA University, Egypt

Lecturer at the College of Computer Science, MSA University, Egypt

Ehab Emam Zakaria, College of Computer Science, MSA University, Egypt

Associate Professor at the College of Computer Science, MSA University, Egypt

References

Al-Fuqaha, A., et al., Internet of things: A survey on enabling technologies, protocols, and applications. IEEE communications surveys & tutorials, 2015. 17(4): p. 2347-2376.

Deng, X., et al., Confident information coverage hole healing in hybrid industrial wireless sensor networks. IEEE Transactions on Industrial Informatics, 2017. 14(5): p. 2220-2229.

Zhao, Z., et al., Link-correlation-aware data dissemination in wireless sensor networks. IEEE Transactions on Industrial Electronics, 2015. 62(9): p. 5747-5757.

ul Islam, F.M.M. and M. Lin, Hybrid DVFS scheduling for real-time systems based on reinforcement learning. IEEE Systems Journal, 2015. 11(2): p. 931-940.

Zhang, Q., et al., An improved deep computation model based on canonical polyadic decomposition. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2017. 48(10): p. 1657-1666.

Zhang, Q., et al., A survey on deep learning for big data. Information Fusion, 2018. 42: p. 146-157.

Khanmohammadi, S., N. Adibeig, and S. Shanehbandy, An improved overlapping k-means clustering method for medical applications. Expert Systems with Applications, 2017. 67: p. 12-18.

Bezdek, J.C., Objective function clustering, in Pattern recognition with fuzzy objective function algorithms. 1981, Springer. p. 43-93.

Krishnapuram, R. and J.M. Keller, The possibilistic c-means algorithm: insights and recommendations. IEEE transactions on Fuzzy Systems, 1996. 4(3): p. 385-393.

Krishnapuram, R. and J.M. Keller, A possibilistic approach to clustering. IEEE transactions on fuzzy systems, 1993. 1(2): p. 98-110.

Rumelhart, D.E. and D. Zipser, Feature discovery by competitive learning. Cognitive science, 1985. 9(1): p. 75-112.

Yang, M.-S., W.-L. Hung, and D.-H. Chen, Self-organizing map for symbolic data. Fuzzy Sets and Systems, 2012. 203: p. 49-73.

Wang, W., J. Yang, and R. Muntz. STING: A statistical information grid approach to spatial data mining. in VLDB. 1997.

Ester, M., et al. A density-based algorithm for discovering clusters in large spatial databases with noise. in Kdd. 1996.

Murtagh, F. and P. Contreras, Algorithms for hierarchical clustering: an overview, II. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2017. 7(6): p. e1219.

Yu, H. and J. Fan, Cutset-type possibilistic c-means clustering algorithm. Applied Soft Computing, 2018. 64: p. 401-422.

Xenaki, S.D., K.D. Koutroumbas, and A.A. Rontogiannis, A novel adaptive possibilistic clustering algorithm. IEEE Transactions on Fuzzy Systems, 2015. 24(4): p. 791-810.

Armbrust, M., et al., A view of cloud computing. Communications of the ACM, 2010. 53(4): p. 50-58.

Zhang, Q. and Z. Chen, A weighted kernel possibilistic câ€means algorithm based on cloud computing for clustering big data. International Journal of Communication Systems, 2014. 27(9): p. 1378-1391.

Havens, T.C., et al., Fuzzy c-means algorithms for very large data. IEEE Transactions on Fuzzy Systems, 2012. 20(6): p. 1130-1146.

Zhang, Q., et al., A High-Order Possibilistic $ C $-Means Algorithm for Clustering Incomplete Multimedia Data. IEEE Systems Journal, 2015. 11(4): p. 2160-2169.

Zhang, Q., et al., Privacy-preserving double-projection deep computation model with crowdsourcing on cloud for big data feature learning. IEEE Internet of Things Journal, 2017. 5(4): p. 2896-2903.

Zhang, Q., et al., Secure weighted possibilistic c-means algorithm on cloud for clustering big data. Information Sciences, 2019. 479: p. 515-525.

Selim, S.Z. and M.A. Ismail, Soft clustering of multidimensional data: a semi-fuzzy approach. Pattern Recognition, 1984. 17(5): p. 559-568.

Lu, Q., et al. Secure collaborative outsourced data mining with multi-owner in cloud computing. in 2012 IEEE 11th International Conference on Trust, Security and Privacy in Computing and Communications. 2012. IEEE.

Mahfouz, M.A. and M. Ismail. Fuzzy relatives of the CLARANS algorithm with application to text clustering. in Proceedings of World Academy of Science, Engineering and Technology. 2009. Citeseer.

Mahfouz, M.A. and M.A. Ismail. Efficient soft relational clustering based on randomized search applied to selection of bio-basis for amino acid sequence analysis. in 2012 Seventh International Conference on Computer Engineering & Systems (ICCES). 2012. IEEE.

Mahfouz, M.A. and M.A. Ismail. Soft flexible overlapping biclustering utilizing hybrid search strategies. in International Conference on Advanced Machine Learning Technologies and Applications. 2012. Springer.

Jiang, L., et al., An effective comparison protocol over encrypted data in cloud computing. Journal of Information Security and Applications, 2019. 48: p. 102367.

Mohamed, A., SPCM: Efficient semi-possibilistic c-means clustering algorithm. 2022.

Zhang, Y., et al., Anonymous attributeâ€based proxy reâ€encryption for access control in cloud computing. Security and Communication Networks, 2016. 9(14): p. 2397-2411.

Xu, P., et al., Conditional identity-based broadcast proxy re-encryption and its application to cloud email. IEEE Transactions on Computers, 2015. 65(1): p. 66-79.

Khan, A.N., et al., Incremental proxy re-encryption scheme for mobile cloud computing environment. The Journal of Supercomputing, 2014. 68(2): p. 624-651.

Malkin, T., I. Teranishi, and M. Yung. Efficient circuit-size independent public key encryption with KDM security. in Annual International Conference on the Theory and Applications of Cryptographic Techniques. 2011. Springer.

Applebaum, B., Key-dependent message security: Generic amplification and completeness. Journal of cryptology, 2014. 27(3): p. 429-451.

Dheeru, D. and E.K. Taniskidou, UCI machine learning repository. University of California, Irvine, School of Information and Computer Sciences. 2017.

Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011. Available from: https://scikit-learn.org/stable/about.html.

UC Irvine Machine Learning Repository available at: http://archive.ics.uci.edu/ml/datasets.