Results Format Overview

This page describes the results format used by the VQA evaluation code.

Results Format

results = [result]

result{
    "question_id": int,
    "answer": str
}
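This is a minimal sketch that assumes predictions are held as (question_id, answer) pairs. The Python below builds a list in this format and serializes it with the standard json module. The function name build_results is hypothetical and is not part of the evaluation code.

```python
import json
from typing import Iterable, List, Tuple


def build_results(predictions: Iterable[Tuple[int, str]]) -> List[dict]:
    """Turn (question_id, answer) pairs into the list-of-dicts results format.

    build_results is a hypothetical helper, not part of the VQA evaluation code.
    """
    results = []
    for question_id, answer in predictions:
        # Each result must carry an integer question id and a string answer.
        if not isinstance(question_id, int) or not isinstance(answer, str):
            raise TypeError(f"bad result entry: {(question_id, answer)!r}")
        results.append({"question_id": question_id, "answer": answer})
    return results


if __name__ == "__main__":
    results = build_results([(1, "yes"), (2, "2")])
    # Serialize to JSON text; writing it to a file works the same way with json.dump.
    print(json.dumps(results))
```

The type check catches a common mistake early: storing question ids as strings, which would not match integer ids elsewhere.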

We have provided an example result JSON file here.

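We introduce a new evaluation metric which is robust to inter-human variability in phrasing the answers:

Acc(ans) = min{ (# humans that said ans) / 3, 1 }

That is, an answer counts as fully correct if at least 3 human annotators gave that exact answer.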

To be consistent with ‘human accuracies’, machine accuracies are averaged over all 10 choose 9 (that is, 10) sets of 9 human annotators.
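This minimal Python sketch computes that averaged accuracy. It scores each leave-one-out subset of annotators with min(#annotators in the subset who gave the answer / 3, 1) and averages the subset scores. It assumes the predicted and human answers have already been normalized, so plain string equality decides a match. The function name vqa_accuracy is hypothetical, and this is not the official evaluation code.

```python
from itertools import combinations
from typing import List


def vqa_accuracy(answer: str, human_answers: List[str]) -> float:
    """Average min(#matching humans / 3, 1) over every subset that leaves out one annotator.

    Hypothetical helper. Assumes all strings are already normalized, so
    exact equality decides whether a human "said" the answer.
    """
    if not human_answers:
        raise ValueError("need at least one human answer")
    n = len(human_answers)
    scores = []
    # combinations(range(n), n - 1) yields each way of dropping one annotator.
    # With n = 10 this gives the 10 choose 9 = 10 subsets of 9 annotators.
    for subset in combinations(range(n), n - 1):
        matches = sum(1 for i in subset if human_answers[i] == answer)
        scores.append(min(matches / 3, 1.0))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    humans = ["yes"] * 3 + ["no"] * 7
    print(vqa_accuracy("yes", humans))
```

In the example, 3 of 10 annotators said "yes". Dropping one of those 3 leaves only 2 matches, which scores 2/3. The other 7 subsets still contain all 3 matches and score 1, so the average is (3 × 2/3 + 7) / 10 = 0.9 rather than a flat 1.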

Before evaluating machine-generated answers, we apply the following processing:

Making all characters lowercase
Removing periods, except when they occur as a decimal point
Converting number words to digits
Removing articles (a, an, the)
Adding apostrophe if a contraction is missing it (e.g., convert "dont" to "don't")
Replacing all punctuation (except apostrophes and colons) with a space character. Apostrophes are kept because removing them can incorrectly turn possessives into plurals, e.g., “girl’s” into “girls”. Colons are kept because they often denote times, e.g., 2:50 pm. A comma between two digits is removed without inserting a space, e.g., 100,978 becomes 100978. (This processing step is applied to ground truth answers as well.)
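The processing steps above can be sketched in Python as follows. This illustrates the listed rules under stated assumptions and is not the official implementation. The number-word and contraction tables are small hypothetical subsets, and the function name normalize_answer is hypothetical. The order of operations is also an assumption: character-level punctuation and period handling come first, then word-level substitutions.

```python
import re
import string

# Hypothetical, illustrative subsets; a real implementation would use larger tables.
NUMBER_WORDS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
                "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
                "ten": "10"}
ARTICLES = {"a", "an", "the"}
CONTRACTIONS = {"dont": "don't", "doesnt": "doesn't", "isnt": "isn't",
                "cant": "can't", "wont": "won't", "im": "i'm"}

# A comma between two digits is dropped outright: "100,978" -> "100978".
DIGIT_COMMA = re.compile(r"(?<=\d),(?=\d)")
# A period is kept only when it sits between two digits (a decimal point).
NON_DECIMAL_PERIOD = re.compile(r"(?<!\d)\.|\.(?!\d)")
# Other punctuation becomes a space; apostrophes and colons are kept,
# and periods are handled separately by NON_DECIMAL_PERIOD.
PUNCT_TO_SPACE = set(string.punctuation) - {"'", ":", "."}


def normalize_answer(text: str) -> str:
    """Apply the listed normalization rules (hypothetical sketch)."""
    text = text.lower()
    text = DIGIT_COMMA.sub("", text)
    text = NON_DECIMAL_PERIOD.sub("", text)
    text = "".join(" " if ch in PUNCT_TO_SPACE else ch for ch in text)
    words = []
    for word in text.split():
        word = NUMBER_WORDS.get(word, word)  # number words -> digits
        if word in ARTICLES:                 # drop a / an / the
            continue
        words.append(CONTRACTIONS.get(word, word))  # restore missing apostrophes
    return " ".join(words)
```

Under this sketch, the same function would be applied to both the machine answer and each ground truth answer before they are compared. For example, "The Dog." and "dog" would then match.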

A demo script of the evaluation code is available here.