Incorporating density in k-nearest neighbors regression
Abstract
Traditional k-nearest neighbours (kNN) regression suffers from several difficulties when only a limited number of samples is available. In this paper, two density-based decision models are proposed. To reduce testing time, a k-nearest neighbours table (kNN-Table) is maintained that stores, for each object x, its neighbours together with their weighted Manhattan distances to x and a binary vector representing the increase or decrease in each dimension relative to x's values. In the first decision model, if the distance from the unseen sample to one of its neighbours x is smaller than the distance from x to x's farthest neighbour, the label is estimated using linear interpolation; otherwise, linear extrapolation is used. In the second decision model, for each neighbour x of the unseen sample, the distance from the unseen sample to x and the corresponding binary vector are computed, and the set S of x's nearest neighbours is retrieved from the kNN-Table. For each sample in S, a normalized distance to the unseen sample is computed from the information stored in the kNN-Table and used to weight each neighbour-of-neighbour of the unseen object. In both models, the label assigned to the unseen object is a weighted average of the labels computed for its neighbours. The diversity between the two proposed decision models and the traditional kNN regressor motivates an ensemble of the two proposed models together with the traditional kNN regressor. Evaluation shows that the ensemble achieves a significant increase in performance compared with its base regressors and several related algorithms.
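The kNN-Table construction described in the abstract can be sketched as follows. This is a minimal illustration only, assuming unit feature weights and plain inverse-distance label weighting for prediction; the function names (`build_knn_table`, `weighted_manhattan`, `predict`) are hypothetical and do not come from the paper, which additionally uses the stored binary vectors and normalized distances in its two decision models.

```python
import numpy as np

def weighted_manhattan(a, b, w):
    """Weighted Manhattan (L1) distance between two feature vectors."""
    return float(np.sum(w * np.abs(a - b)))

def build_knn_table(X, k, w=None):
    """Build a kNN-Table: for each sample x, keep its k nearest neighbours
    together with their weighted Manhattan distance to x and a binary
    vector marking, per dimension, whether the neighbour's value is
    greater than (1) or not greater than (0) x's value."""
    n, d = X.shape
    if w is None:
        w = np.ones(d)  # assumption: unit weights when none are learned
    table = []
    for i in range(n):
        dists = [(j, weighted_manhattan(X[i], X[j], w))
                 for j in range(n) if j != i]
        dists.sort(key=lambda t: t[1])
        entry = []
        for j, dist in dists[:k]:
            direction = (X[j] > X[i]).astype(int)  # increase/decrease per dimension
            entry.append((j, dist, direction))
        table.append(entry)
    return table

def predict(x, X, y, k, w=None, eps=1e-12):
    """Baseline distance-weighted kNN prediction for an unseen sample x
    (inverse-distance weighted average of the k nearest labels)."""
    n, d = X.shape
    if w is None:
        w = np.ones(d)
    dists = np.array([weighted_manhattan(x, X[j], w) for j in range(n)])
    idx = np.argsort(dists)[:k]
    wts = 1.0 / (dists[idx] + eps)  # eps guards against division by zero
    return float(np.sum(wts * y[idx]) / np.sum(wts))
```

Because the table is built once over the training set, each test-time query only needs distances to its own neighbours plus lookups into the precomputed entries, which is the source of the testing-time saving claimed in the abstract.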
COPYRIGHT
Submission of a manuscript implies that the work described has not been published before; that it is not under consideration for publication elsewhere; and that, if and when the manuscript is accepted for publication, the authors agree to automatic transfer of the copyright to the publisher.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
- The journal allows the author(s) to retain publishing rights without restrictions.
- The journal allows the author(s) to hold the copyright without restrictions.