E-Book Content
Link¨oping Studies in Science and Technology Dissertation No. 1160
Channel-Coded Feature Maps for Computer Vision and Machine Learning Erik Jonsson
Department of Electrical Engineering Link¨ opings universitet, SE-581 83 Link¨oping, Sweden
Link¨oping February 2008
Channel-Coded Feature Maps for Computer Vision and Machine Learning Erik Jonsson Department of Electrical Engineering Link¨ oping University SE-581 83 Link¨ oping Sweden Link¨ oping Studies in Science and Technology Dissertation No. 1160 c 2008 Erik Jonsson Copyright ISBN 978-91-7393-988-1 ISSN 0345-7524 Back cover illustration by Nikolina Oreˇskovi´c Printed by LiU-Tryck, Link¨ oping 2008
iii
To Helena for patience, love and understanding
iv
v
Abstract This thesis is about channel-coded feature maps applied in view-based object recognition, tracking, and machine learning. A channel-coded feature map is a soft histogram of joint spatial pixel positions and image feature values. Typical useful features include local orientation and color. Using these features, each channel measures the co-occurrence of a certain orientation and color at a certain position in an image or image patch. Channel-coded feature maps can be seen as a generalization of the SIFT descriptor with the options of including more features and replacing the linear interpolation between bins by a more general basis function. The general idea of channel coding originates from a model of how information might be represented in the human brain. For example, different neurons tend to be sensitive to different orientations of local structures in the visual input. The sensitivity profiles tend to be smooth such that one neuron is maximally activated by a certain orientation, with a gradually decaying activity as the input is rotated. This thesis extends previous work on using channel-coding ideas within computer vision and machine learning. By differentiating the channel-coded feature maps with respect to transformations of the underlying image, a method for image registration and tracking is constructed. By using piecewise polynomial basis functions, the channel coding can be computed more efficiently, and a general encoding method for N-dimensional feature spaces is presented. Furthermore, I argue for using channel-coded feature maps in view-based pose estimation, where a continuous pose parameter is estimated from a query image given a number of training views with known pose. The optimization of position, rotation and scale of the object in the image plane is then included in the optimization problem, leading to a simultaneous tracking and pose estimation algorithm. Apart from objects and poses, the thesis examines the use of channel coding in connection with Bayesian networks. The goal here is to avoid the hard discretizations usually required when Markov random fields are used on intrinsically continuous signals like depth for stereo vision or color values in image restoration. Channel coding has previously been used to design machine learning algorithms that are robust to outliers, ambiguities, and discontinuities in the training data. This is obtained by finding a linear mapping between channel-coded input and output values. This thesis extends this method with an incremental version and identifies and analyzes a key feature of the method – that it is able to handle a learning situation where the correspondence structure between the input and output space is not completely known. In contrast to a traditional supervised learning setting, the training examples are groups of unordered input-output points, where the correspondence structure within each group is unknown. This behavior is studied theoretically and the effect of outliers and convergence properties are analyzed. All presented methods have been evaluated experimentally. The work has been co