Open Vocabulary Scene Parsing

Hang Zhao1, Xavier Puig1, Bolei Zhou1, Sanja Fidler2, Antonio Torralba1
1Massachusetts Institute of Technology, 2University of Toronto

Abstract

Recognizing arbitrary objects in the wild has been a challenging problem due to the limitations of existing classification models and datasets. In this paper, we propose a new task of parsing scenes with a large and open vocabulary, and explore several evaluation metrics for it. Our approach is a framework that jointly embeds image pixels and word concepts, where word concepts are connected by semantic relations. We validate the open-vocabulary prediction ability of our framework on the ADE20K dataset, which covers a wide variety of scenes and objects, and further explore the trained joint embedding space to show its interpretability.
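The core of the framework is that pixels and word concepts live in one shared embedding space, so a pixel can be labeled with any concept that has an embedding, even one unseen at training time. Below is a minimal PyTorch sketch of this idea, not the authors' released code: the names backbone, feat_dim, embed_dim, and concept_vectors are illustrative assumptions, and the semantic-relation constraints between concepts described in the paper are omitted here.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class JointEmbeddingParser(nn.Module):
        """Sketch of a joint pixel/word-concept embedding model.

        Pixel features from a segmentation backbone and word-concept
        vectors are projected into a shared space; each pixel is labeled
        with its nearest concept, so the vocabulary is open-ended.
        """

        def __init__(self, backbone, feat_dim, embed_dim, concept_vectors):
            super().__init__()
            self.backbone = backbone                       # any fully convolutional feature extractor
            self.pixel_proj = nn.Conv2d(feat_dim, embed_dim, kernel_size=1)
            # Word-side embeddings, e.g. initialized from word vectors;
            # in the paper these are additionally tied by semantic relations.
            self.concepts = nn.Parameter(concept_vectors)  # (num_concepts, embed_dim)

        def forward(self, images):
            feats = self.backbone(images)                  # (B, feat_dim, H, W)
            pix = F.normalize(self.pixel_proj(feats), dim=1)
            con = F.normalize(self.concepts, dim=1)
            # Cosine similarity between every pixel and every concept.
            scores = torch.einsum('bdhw,cd->bchw', pix, con)
            return scores

    # Prediction: each pixel takes the most similar concept. Swapping in a
    # larger concept_vectors matrix at test time enlarges the vocabulary
    # without retraining the pixel branch:
    # labels = model(images).argmax(dim=1)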

Paper and Dataset

Read our ICCV paper HERE.
Download the concept graph for the ADE20K dataset HERE.

Citation

@inproceedings{openvoc2017,
  title = {Open Vocabulary Scene Parsing},
  author = {Zhao, Hang and Puig, Xavier and Zhou, Bolei and Fidler, Sanja and Torralba, Antonio},
  booktitle = {International Conference on Computer Vision (ICCV)},
  year = {2017}
}

Acknowledgement: This work was supported by Samsung and NSF grant No. 1524817 to AT. SF acknowledges support from NSERC. BZ is supported by a Facebook Fellowship. We thank Wei-Chiu Ma and Yusuf Aytar for insightful discussions.