A New Imputation Method based on GMDH. Changzheng He, Bing Zhu

Abstract. Most existing imputation methods do not take noise into consideration, yet noise-free data is rarely the case in reality. In this paper, we combine the Group Method of Data Handling (GMDH) with the well-known Expectation Maximization (EM) algorithm and propose a new imputation algorithm to deal with missing values in noisy data. Numerous experiments and comparative studies on four UCI datasets show that our method, GMDH imputation, is more robust to noise than the other imputation methods used as benchmarks at high noise levels.

Keywords. Missing values, Noise, Imputation, GMDH.

