Joint Segmentation and Recognition of Categorized Objects from Noisy Web Image Collection

IEEE Trans Image Process. 2014 Sep;23(9):4070-4086. doi: 10.1109/TIP.2014.2339196. Epub 2014 Jul 14.

Abstract

The segmentation of categorized objects addresses the problem of jointly segmenting a single category of object across a collection of images, where "categorized objects" refers to objects belonging to the same category. Most existing methods for segmenting categorized objects assume that every image in the given collection contains the target object, i.e., that the collection is noise-free. Consequently, they may not work well when the collection contains noisy images that do not belong to the target category, as is typical of image collections gathered by a text query from modern image search engines. To overcome this limitation, we propose a method for the automatic segmentation and recognition of categorized objects from noisy Web image collections. This is achieved by co-training an automatic object segmentation algorithm, which operates directly on a collection of images, and an object category recognition algorithm, which identifies the images that contain the target object. The object segmentation algorithm is trained on the subset of images from the given collection that are recognized, with high confidence, to contain the target object, while the training of the object category recognition model is guided by the intermediate segmentation results produced by the object segmentation algorithm. In this way, our co-training algorithm automatically identifies the set of true positives in the noisy Web image collection and simultaneously extracts the target objects from all the identified images. Extensive experiments validate the efficacy of the proposed approach on four datasets: 1) the Weizmann horse dataset; 2) the MSRC object category dataset; 3) the iCoseg dataset; and 4) a new 30-category dataset comprising 15,634 Web images with both hand-annotated category labels and ground-truth segmentation labels. The results show that our method compares favorably with the state of the art and can cope with noisy image collections.
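
To make the alternating scheme concrete, the following is a minimal Python sketch of the co-training loop described above. It is an illustration under stated assumptions, not the paper's actual models: recognition_scores and segment are hypothetical toy placeholders (a mean-intensity score and a threshold-based mask), and the confidence parameter stands in for the paper's "high confidence" criterion for selecting the training subset.

    # Minimal sketch of the alternating co-training loop from the abstract.
    # recognition_scores and segment are hypothetical toy stand-ins, NOT the
    # authors' actual recognition model or co-segmentation algorithm.
    import numpy as np

    def recognition_scores(images, masks=None):
        """Toy recognizer: score how likely each image contains the object.
        When intermediate masks are available, they guide the score (here,
        crudely, by weighting with the mask's foreground fraction)."""
        base = np.array([img.mean() for img in images])
        if masks is not None:
            base = base * np.array([m.mean() for m in masks])
        return base / (base.max() + 1e-8)

    def segment(images):
        """Toy co-segmentation: one foreground mask per image, obtained by
        thresholding at the image mean (placeholder for a real method)."""
        return [(img > img.mean()).astype(float) for img in images]

    def cotrain(images, n_rounds=5, confidence=0.6):
        masks = None
        keep = np.ones(len(images), dtype=bool)
        for _ in range(n_rounds):
            # 1) Recognize: identify likely true positives, guided by the
            #    intermediate masks from the previous round (if any).
            scores = recognition_scores(images, masks)
            keep = scores >= confidence
            if not keep.any():
                break
            # 2) Segment: run segmentation only on the high-confidence subset.
            subset = [img for img, k in zip(images, keep) if k]
            seg = iter(segment(subset))
            # 3) Feed the masks back; rejected images get an empty mask.
            masks = [next(seg) if k else np.zeros_like(img)
                     for img, k in zip(images, keep)]
        return keep, masks

    rng = np.random.default_rng(0)
    images = [rng.random((8, 8)) for _ in range(10)]  # stand-in Web images
    keep, masks = cotrain(images)
    print("identified positives:", keep.astype(int))

The key design point the sketch preserves is the direction of information flow: only images the recognizer accepts with high confidence feed the segmentation step, and the resulting masks in turn reshape the recognizer's scores on the next round, so recognition and segmentation improve jointly.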