
The VQA Challenge Winners and Honorable Mentions will be revealed at the VQA Challenge Workshop
where they will be awarded GPUs sponsored by NVIDIA!

What is VQA?

VQA is a new dataset containing open-ended questions about images. These questions require an understanding of vision, language and commonsense knowledge to answer.

  • 254,721 images (MSCOCO and abstract scenes)
  • 3 questions per image (764,163 total)
  • 10 ground truth answers per question
  • 3 plausible (but likely incorrect) answers per question
    (9,934,119 answers total, ground truth and plausible combined)
  • Open-ended and multiple-choice answering tasks
  • Automatic evaluation metric
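The list above mentions an automatic evaluation metric but this page does not define it. As a rough illustration only, the sketch below assumes the consensus-style accuracy described in the VQA paper, where an answer counts as fully correct if at least 3 of the 10 human annotators gave it: accuracy = min(matching humans / 3, 1). The function names and the simple lowercase/strip normalization are hypothetical and are not taken from the official evaluation code, which may normalize answers differently.

```python
from collections import Counter


def normalize(answer):
    """Hypothetical normalization: lowercase and trim surrounding whitespace."""
    return answer.strip().lower()


def consensus_accuracy(predicted, human_answers, agree=3):
    """Score one predicted answer against the human ground truth answers.

    Assumed formula: min(number of humans who gave this answer / agree, 1).
    """
    counts = Counter(normalize(a) for a in human_answers)
    # Counter returns 0 for answers that no human gave.
    return min(counts[normalize(predicted)] / agree, 1.0)


if __name__ == "__main__":
    # Ten ground truth answers for a single question (made-up example data).
    humans = ["yes"] * 2 + ["no"] * 8
    for guess in ("no", "yes", "maybe"):
        print(guess, consensus_accuracy(guess, humans))
```

Under this assumption a prediction earns partial credit when only one or two annotators agree with it, which softens the penalty for questions with several reasonable answers.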


Subscribe to our group for updates!

Dataset

Details on downloading the latest dataset may be found on the download webpage.

October 2015: Full release (v1.0)

Real Images
  • 204,721 MSCOCO images
    (all of current train/val/test)
  • 614,163 questions
  • 6,141,630 ground truth answers
  • 1,842,489 plausible answers
Abstract Scenes
  • 50,000 abstract scenes
  • 150,000 questions
  • 1,500,000 ground truth answers
  • 450,000 plausible answers
  • 250,000 captions

July 2015: Beta v0.9 release

June 2015: Beta v0.1 release
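The v1.0 counts listed above follow from the per-image and per-question ratios stated in the overview (3 questions per image, 10 ground truth and 3 plausible answers per question). A minimal sketch of that arithmetic is shown here as a consistency check; the dictionary layout and function names are illustrative and not part of any VQA tooling.

```python
# Counts copied from the October 2015 (v1.0) release listing.
RELEASE = {
    "real": {
        "images": 204_721,
        "questions": 614_163,
        "ground_truth_answers": 6_141_630,
        "plausible_answers": 1_842_489,
    },
    "abstract": {
        "images": 50_000,
        "questions": 150_000,
        "ground_truth_answers": 1_500_000,
        "plausible_answers": 450_000,
    },
}

QUESTIONS_PER_IMAGE = 3
GROUND_TRUTH_PER_QUESTION = 10
PLAUSIBLE_PER_QUESTION = 3


def check_split(counts):
    """Return True if a split's counts match the stated ratios."""
    questions = counts["images"] * QUESTIONS_PER_IMAGE
    return (
        counts["questions"] == questions
        and counts["ground_truth_answers"] == questions * GROUND_TRUTH_PER_QUESTION
        and counts["plausible_answers"] == questions * PLAUSIBLE_PER_QUESTION
    )


def totals(release):
    """Sum each count across all splits."""
    keys = next(iter(release.values())).keys()
    return {key: sum(split[key] for split in release.values()) for key in keys}


if __name__ == "__main__":
    for name, counts in RELEASE.items():
        print(name, "consistent:", check_split(counts))
    combined = totals(RELEASE)
    all_answers = combined["ground_truth_answers"] + combined["plausible_answers"]
    print("images:", combined["images"])
    print("questions:", combined["questions"])
    print("answers (ground truth + plausible):", all_answers)
```

The combined figures reproduce the overview numbers: 254,721 images, 764,163 questions, and 9,934,119 answers once ground truth and plausible answers are added together.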

Paper



Download the paper

BibTeX

Papers reporting results on the VQA dataset should:

1) Report test-standard accuracies, which can be calculated using either of the non-test-dev phases, i.e., "test2015" or "Challenge test2015" on the following links: [oe-real | oe-abstract | mc-real | mc-abstract].

2) Compare their test-standard accuracies with those on the corresponding test2015 leaderboards [oe-real-leaderboard | oe-abstract-leaderboard | mc-real-leaderboard | mc-abstract-leaderboard].

For more details, please see the challenge page.