

Introducing VQA v2.0: A More Balanced and Bigger VQA Dataset!

What is VQA?

VQA is a new dataset containing open-ended questions about images. These questions require an understanding of vision, language and commonsense knowledge to answer.

  • 254,721 images (MSCOCO and abstract scenes)
  • 3 questions per image (764,163 total)
  • 10 ground truth answers per question
  • 3 plausible (but likely incorrect) answers per question
  • Open-ended and multiple-choice answering tasks
  • Automatic evaluation metric



9,934,119 answers in total (ground truth plus plausible)
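The list above mentions an automatic evaluation metric without spelling it out. As a rough sketch, assuming the commonly described VQA rule that a predicted answer scores min(number of humans who gave that answer / 3, 1), the per-question accuracy could be computed as below. The function name `vqa_accuracy` and the simple lowercase/strip normalization are illustrative assumptions; the official evaluation code may normalize answers further and is not reproduced here.

```python
from collections import Counter


def vqa_accuracy(predicted: str, human_answers: list[str]) -> float:
    """Score one prediction against the human answers for a question.

    Assumed rule: min(number of humans who gave this answer / 3, 1).
    """
    # Hypothetical normalization: case-insensitive, surrounding whitespace ignored.
    counts = Counter(answer.strip().lower() for answer in human_answers)
    matches = counts[predicted.strip().lower()]
    return min(matches / 3.0, 1.0)


# Ten ground truth answers for one question, as in the dataset description.
answers = ["yes"] * 2 + ["no"] * 8
print(vqa_accuracy("yes", answers))  # partial credit: only 2 humans agree
print(vqa_accuracy("no", answers))   # full credit: at least 3 humans agree
```

Under this rule an answer earns full credit once three or more annotators agree with it, which tolerates some disagreement among the 10 ground truth answers.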


Subscribe to our group for updates!

Dataset

Details on downloading the latest dataset may be found on the download webpage.

October 2015: Full release (v1.0)

Real Images
  • 204,721 MSCOCO images
    (all of current train/val/test)
  • 614,163 questions
  • 6,141,630 ground truth answers
  • 1,842,489 plausible answers
Abstract Scenes
  • 50,000 abstract scenes
  • 150,000 questions
  • 1,500,000 ground truth answers
  • 450,000 plausible answers
  • 250,000 captions

July 2015: Beta v0.9 release

June 2015: Beta v0.1 release
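As a quick consistency check, the v1.0 counts listed above for real images and abstract scenes can be summed and compared with the overall figures in the overview. This is only a sketch; the dictionary keys are illustrative names, and the numbers are copied from this page.

```python
# Counts copied from the v1.0 release listing above.
real = {"images": 204_721, "questions": 614_163,
        "ground_truth": 6_141_630, "plausible": 1_842_489}
abstract = {"images": 50_000, "questions": 150_000,
            "ground_truth": 1_500_000, "plausible": 450_000}

# Each split should follow the per-image and per-question ratios:
# 3 questions per image, 10 ground truth and 3 plausible answers per question.
for split in (real, abstract):
    assert split["questions"] == 3 * split["images"]
    assert split["ground_truth"] == 10 * split["questions"]
    assert split["plausible"] == 3 * split["questions"]

# Combine the two splits and total the answers.
totals = {key: real[key] + abstract[key] for key in real}
total_answers = totals["ground_truth"] + totals["plausible"]

print(totals["images"])     # 254721
print(totals["questions"])  # 764163
print(total_answers)        # 9934119
```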

Paper



Download the paper

BibTeX