Automatic Image Annotation of News Images with Large Vocabularies and Low Quality Training Data

ACM Multimedia Conference (2004)

A traditional approach to retrieving images is to manually annotate each image with textual keywords and then retrieve images using these keywords. Manual annotation is expensive, and recently a few approaches have been proposed for automatically annotating images. These techniques usually learn a statistical model from a training set of images annotated with keywords and use this model to automatically annotate test images. While promising, these techniques have generally been tested on a few thousand images, with vocabularies of a few hundred words or fewer, and using relatively high quality training data where the keywords are categories or objects and are directly correlated with the visual data.

Here, we investigate the problem of automatically annotating a large dataset of news photographs using low quality training data and a large vocabulary. We use 56,117 images and captions from Yahoo News Photos for our training and test data. The captions in the training portion of this data often contain a great deal of text, most of which does not directly describe the image; as labels, they are therefore noisy. We use the Normalized Continuous Relevance Model for annotation and discuss how to speed up the model (by a factor of 10) using a voting technique. An improved distance measure also improves precision. To handle noisy text data and the large vocabulary of 4073 words, we investigate using different kinds of words for training and show that words which describe the content of the picture are significantly more useful for annotating images. Previous work on annotating images has largely dealt with high quality
Keywords: image annotation, relevance models, image retrieval, automatic image annotation, statistical model
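
The abstract describes annotating a test image from caption-labelled training images and speeding this up with a voting technique, but it gives no formulas. The sketch below is only an illustration of the general idea, not the paper's Normalized Continuous Relevance Model: it assumes images are already reduced to fixed-length feature vectors, uses a Gaussian kernel as the similarity (an assumption), and lets only the `top_k` most similar training images vote for their caption words. The function names, the `bandwidth` parameter, the per-caption weight normalisation, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def gaussian_similarity(x, y, bandwidth=1.0):
    """Unnormalised Gaussian kernel between two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def annotate(test_features, training_set, top_k=2, num_words=2, bandwidth=1.0):
    """Rank caption words for a test image by similarity-weighted votes.

    training_set is a list of (feature_vector, caption_words) pairs.
    Only the top_k most similar training images vote, which is the
    (assumed) source of the speed-up over scoring every training image.
    """
    scored = [
        (gaussian_similarity(test_features, feats, bandwidth), words)
        for feats, words in training_set
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)

    votes = defaultdict(float)
    for sim, words in scored[:top_k]:
        unique_words = set(words)
        for word in unique_words:
            # Assumption: split each image's vote evenly over its caption
            # words so that long, noisy captions do not dominate.
            votes[word] += sim / len(unique_words)

    # Highest vote first; ties broken alphabetically for determinism.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:num_words]]


# Hypothetical toy data: 2-D features and short captions.
TOY_TRAINING = [
    ([0.9, 0.1], ["president", "podium"]),
    ([0.8, 0.2], ["president", "flag"]),
    ([0.1, 0.9], ["soccer", "stadium"]),
]

if __name__ == "__main__":
    print(annotate([0.85, 0.15], TOY_TRAINING, top_k=2, num_words=1))
```

In this toy setup, a word shared by several nearby training images accumulates more weight than a word that appears in only one caption, which is one simple way noisy caption words can be pushed down the ranking.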