Visual Question Answering and Dialog Workshop
Location: Long Beach Convention & Entertainment Center
at CVPR 2019, Long Beach, California, USA


The goal of this workshop is twofold. The first is to benchmark progress in Visual Question Answering and Visual Dialog by hosting three challenges:

  • VQA Challenge: The 4th edition of the VQA Challenge will be hosted on the VQA v2.0 dataset introduced in Goyal et al., CVPR 2017. The 2nd and 3rd editions of the VQA Challenge were organised at CVPR 2017 and CVPR 2018 on the VQA v2.0 dataset, and the 1st edition was organised at CVPR 2016 on the VQA v1.0 dataset introduced in Antol et al., ICCV 2015. VQA v2.0 is more balanced, reduces the language biases present in VQA v1.0, and is about twice its size.

  • TextVQA Challenge: The 1st edition of TextVQA Challenge will be hosted on the TextVQA dataset. TextVQA requires algorithms to look at an image, read text in the image, reason about it, and answer a given question.
    Challenge Link: Coming soon!

  • Visual Dialog Challenge: The 2nd edition of the Visual Dialog Challenge will be hosted on the VisDial v1.0 dataset introduced in Das et al., CVPR 2017. The 1st edition of the Visual Dialog Challenge was organised on the VisDial v1.0 dataset at ECCV 2018. Visual Dialog is a novel task that requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image and a dialog history (consisting of the image caption and a sequence of previous questions and answers), the agent must answer a follow-up question in the dialog.

The second goal of this workshop is to continue bringing together researchers interested in visually-grounded question answering, dialog systems, and language in general to share state-of-the-art approaches, best practices, and future directions in multi-modal AI. In addition to invited talks from established researchers, we invite submissions of extended abstracts of at most 2 pages describing work in the relevant areas, including: Visual Question Answering, Visual Dialog, (Textual) Question Answering, (Textual) Dialog Systems, Commonsense Knowledge, Vision + Language, etc. All accepted abstracts will be presented as posters at the workshop to disseminate ideas. The workshop will be held at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.

Invited Speakers

Alex Schwing
University of Illinois at Urbana-Champaign

He He
Amazon Web Services → New York University

Lisa Hendricks
University of California, Berkeley

More speakers to be added.


Important Dates

Jan 2019 Challenge Announcement
Mid-May 2019 Challenge Submission
Mar 15, 2019 Workshop Paper Submission
Apr 1, 2019 Notification to Authors
Apr 10, 2019 Camera Ready Submission
Jun 2019 Workshop


Organizers

Abhishek Das
Georgia Tech

Karan Desai
Georgia Tech

Ayush Shrivastava
Georgia Tech

Yash Goyal
Georgia Tech

Aishwarya Agrawal
Georgia Tech

Amanpreet Singh
Facebook AI Research

Meet Shah
Facebook AI Research


Stefan Lee
Georgia Tech

Peter Anderson
Georgia Tech

Xinlei Chen
Facebook AI Research

Marcus Rohrbach
Facebook AI Research

Dhruv Batra
Georgia Tech / Facebook AI Research

Devi Parikh
Georgia Tech / Facebook AI Research