Visual Question Answering and Dialog Workshop
at CVPR 2020, Seattle, Washington, USA




Introduction

The goal of this workshop is two-fold. The first is to benchmark progress in Visual Question Answering and Visual Dialog.

    Visual Question Answering
    There will be four tracks in the Visual Question Answering Challenge this year.

  • VQA: This track is the 5th edition of the VQA Challenge, hosted on the VQA v2.0 dataset introduced in Goyal et al., CVPR 2017. The 2nd, 3rd and 4th editions were organised at CVPR 2017, CVPR 2018 and CVPR 2019 on the VQA v2.0 dataset, and the 1st edition was organised at CVPR 2016 on the VQA v1.0 dataset introduced in Antol et al., ICCV 2015. VQA v2.0 is more balanced than VQA v1.0, reduces its language biases, and is about twice its size.

  • TextVQA: This track is the 2nd challenge on the TextVQA dataset introduced in Singh et al., CVPR 2019. TextVQA requires algorithms to look at an image, read text in the image, reason about it, and answer a given question. The 1st edition of the TextVQA Challenge was organised at CVPR 2019.

  • GQA: This track is the 2nd challenge on the GQA dataset introduced in Hudson and Manning, CVPR 2019. GQA focuses on real-world compositional reasoning. The dataset contains 20M image-question pairs, each of which comes with an underlying structured representation of its semantics (a sketch of one such representation follows this list of tracks). The dataset is complemented with a suite of new evaluation metrics that test consistency, validity and grounding. The 1st edition of the GQA Challenge was organised at CVPR 2019.
    To learn more about GQA, visit https://visualreasoning.org.

  • VizWiz: This track is the 2nd VQA challenge on the VizWiz dataset introduced in Gurari et al., CVPR 2018. The 1st edition was organised at ECCV 2018 on a now-deprecated version of the dataset. This track focuses on answering visual questions that originate from a real use case: blind people submitted images with recorded spoken questions in order to learn about their physical surroundings.

    Challenge link: https://vizwiz.org/tasks-and-datasets/vqa/
    Evaluation Server: Coming soon
    Submission Deadline: Friday, May 15, 2020 5:59:59 CST
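
    Returning to the GQA track above: the sketch below illustrates what a question paired with a structured representation of its semantics might look like. It is a hypothetical record for illustration only; the field names and program steps are not the official GQA schema.

        # Hypothetical example record; field names are illustrative, not the
        # official GQA schema. The "semantics" field sketches the idea of a
        # step-by-step functional program underlying the question.
        example = {
            "image_id": "n12345",  # hypothetical image identifier
            "question": "What color is the food on the plate to the left of the girl?",
            "answer": "brown",
            "semantics": [
                {"operation": "select", "argument": "girl"},
                {"operation": "relate", "argument": "plate, to the left of"},
                {"operation": "relate", "argument": "food, on"},
                {"operation": "query",  "argument": "color"},
            ],
        }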

    Visual Dialog
    The 3rd edition of the Visual Dialog Challenge will be hosted on the VisDial v1.0 dataset introduced in Das et al., CVPR 2017. The 1st and 2nd editions of the Visual Dialog Challenge were organised on the VisDial v1.0 dataset at ECCV 2018 and CVPR 2019, respectively. Visual Dialog requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image and a dialog history (consisting of the image caption and a sequence of previous questions and answers), the agent has to answer a follow-up question in the dialog.
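
    As an illustration of the task interface, the minimal sketch below shows the inputs a Visual Dialog agent receives at each round. All names are hypothetical stand-ins, not the official VisDial API.

        from dataclasses import dataclass, field
        from typing import List, Tuple

        @dataclass
        class DialogState:
            """Inputs to a Visual Dialog agent at one round (hypothetical structure)."""
            image: bytes                  # the image, or precomputed visual features
            caption: str                  # image caption, available from round 0
            history: List[Tuple[str, str]] = field(default_factory=list)  # prior (question, answer) pairs
            question: str = ""            # the follow-up question to answer

        def answer_round(agent, state: DialogState) -> str:
            # One round of dialog: the agent grounds the follow-up question in
            # the image, the caption, and the accumulated question-answer history.
            # `agent.respond` is a stand-in for any Visual Dialog model.
            return agent.respond(state.image, state.caption, state.history, state.question)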

    All challenges will be announced soon!

    The second goal of this workshop is to continue to bring together researchers interested in visually grounded question answering, dialog systems, and language in general to share state-of-the-art approaches, best practices, and future directions in multi-modal AI. In addition to invited talks from established researchers, we invite submissions of extended abstracts of at most 2 pages describing work in relevant areas, including Visual Question Answering, Visual Dialog, (Textual) Question Answering, (Textual) Dialog Systems, Commonsense Knowledge, and Vision + Language. All accepted abstracts will be presented as posters at the workshop to disseminate ideas. The workshop will be held at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2020.


Invited Speakers


Danna Gurari
University of Texas at Austin


Felix Hill
DeepMind


Douwe Kiela
Facebook AI Research


Anna Rohrbach
UC Berkeley


Amanpreet Singh
Facebook AI Research


Nassim Parvin
Georgia Tech


Ani Kembhavi
Allen Institute for Artificial Intelligence


Jiasen Lu
Georgia Tech


Raquel Fernández
University of Amsterdam


Submission Instructions

We invite submissions of extended abstracts of at most 2 pages describing work in areas such as Visual Question Answering, Visual Dialog, (Textual) Question Answering, (Textual) Dialog Systems, Commonsense Knowledge, Video Question Answering, Video Dialog, Vision + Language, and Vision + Language + Action (Embodied Agents). Accepted submissions will be presented as posters at the workshop. Extended abstracts should follow the CVPR formatting guidelines and be emailed as a single PDF to the address given below.


    Dual Submissions
    We encourage submissions of relevant work that has been previously published or is to be presented at the main conference. Accepted abstracts will not appear in the official IEEE proceedings.

    Where to Submit?
    Please send your abstracts to visualqa.workshop@gmail.com


Dates

January 2020: Challenge Announcements
March 15, 2020: Workshop Paper Submission
April 1, 2020: Notification to Authors
Mid-May 2020: Challenge Submission Deadlines
June 2020: Workshop


Organizers


Ayush Shrivastava
Georgia Tech


Drew Hudson
Stanford University


Vishvak Murahari
Georgia Tech


Abhishek Das
Georgia Tech


Satwik Kottur
Facebook AI


Dhruv Batra
Georgia Tech / Facebook AI Research


Devi Parikh
Georgia Tech / Facebook AI Research


Aishwarya Agrawal
DeepMind


Sponsors

This work is supported by grants awarded to Dhruv Batra and Devi Parikh.


Contact: visualqa@gmail.com