Announcing VQA v2.0
Elevating the Role of Image Understanding in Visual Question Answering

Answer distribution for unbalanced dataset - VQA v1.0

Answer distribution for balanced dataset - VQA v2.0

The figures above compare the distribution of answers per question type in our new balanced VQA dataset with that in the original (unbalanced) VQA dataset. We notice several interesting trends.

First, binary questions (e.g., "is the", "is this", "is there", "are", "does") have a significantly more balanced distribution over "yes" and "no" answers in our balanced dataset compared to VQA v1.0.

"baseball" is now slightly more popular than "tennis" under "what sport", and more importantly, overall "baseball" and "tennis" dominate less in the answer distribution. Several other sports like "frisbee", "skiing", "soccer", "skateboarding", "snowboard" and "surfing" are more visible in the answer distribution in the balanced dataset, suggesting that it contains heavier tails. Similar trends can be seen across the board with colors, animals, numbers, etc.

Quantitatively, we find that the entropy of answer distributions averaged across various question types (weighted by frequency of question types) increases by 56% after balancing, confirming the heavier tails in the answer distribution.
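To make the entropy measure concrete, here is a minimal sketch of how a frequency-weighted average entropy could be computed from (question type, answer) pairs. The function names, the toy data, and the choice of base-2 logarithms are assumptions made for illustration, not the code or data behind the 56% figure. The relative increase does not depend on the logarithm base. The sketch also assumes each dataset is weighted by its own question-type frequencies.

```python
import math
from collections import Counter, defaultdict


def answer_entropy(answers):
    """Shannon entropy (in bits) of the empirical distribution of answers."""
    counts = Counter(answers)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def weighted_mean_entropy(qa_pairs):
    """Average per-question-type entropy, weighted by question-type frequency.

    qa_pairs: iterable of (question_type, answer) tuples (hypothetical format).
    """
    answers_by_type = defaultdict(list)
    for question_type, answer in qa_pairs:
        answers_by_type[question_type].append(answer)
    total = sum(len(answers) for answers in answers_by_type.values())
    return sum(
        len(answers) / total * answer_entropy(answers)
        for answers in answers_by_type.values()
    )


# Toy data invented for illustration only; not drawn from VQA.
unbalanced = (
    [("is there", "yes")] * 3
    + [("is there", "no")]
    + [("what sport", "tennis")] * 4
)
balanced = (
    [("is there", "yes")] * 2
    + [("is there", "no")] * 2
    + [("what sport", "tennis")] * 2
    + [("what sport", "baseball")] * 2
)

before = weighted_mean_entropy(unbalanced)
after = weighted_mean_entropy(balanced)
relative_increase = (after - before) / before
print(f"before={before:.3f} bits, after={after:.3f} bits, "
      f"relative increase={relative_increase:.0%}")
```

A question type whose answers are all identical contributes zero entropy, while an even split between two answers contributes one bit, so spreading answers more evenly within each question type raises the weighted average.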

As the statistics show, while our balanced dataset is not perfectly balanced, it is significantly more balanced than the original VQA dataset.