Papers reporting results on the VQA v2.0 dataset should:
1) Report test-standard accuracies, which can be calculated using either of the non-test-dev phases, i.e., "test2018" or "Challenge test2018".
2) Compare their test-standard accuracies with those on the test2018 leaderboard and test2017 leaderboard.
We are pleased to announce the Visual Question Answering (VQA) Challenge 2018. Given an image and a natural language question about the image, the task is to provide an accurate natural language answer.
The links to Submission and Leaderboard pages can be found here:
Submission and Leaderboard.
The VQA v2.0 train, validation, and test sets, containing more than 250K images and 1.1M questions, are available on the download page. Each question is annotated with 10 concise, open-ended answers. Annotations for the training and validation sets are publicly available.
VQA Challenge 2018 is the third edition of the VQA Challenge. The previous two editions were organized over the past two years, and their results were announced at the VQA Challenge Workshops at CVPR 2017 and CVPR 2016. More details about past challenges can be found here: VQA Challenge 2017 and VQA Challenge 2016.
Answers to some common questions about the challenge can be found in the FAQ section.
After the challenge deadline, all challenge participant results on test-standard split will be made public on a test-standard leaderboard.
Following COCO, we have divided the test set for VQA v2.0 into a number of splits, including test-dev, test-standard, test-challenge, and test-reserve, to limit overfitting while giving researchers more flexibility to test their systems. Test-dev is used for debugging and validation experiments and allows a maximum of 10 submissions per day (according to the UTC timezone). Test-standard is the default test data for the VQA competition. When comparing to the state of the art (e.g., in papers), results should be reported on test-standard. Test-standard is also used to maintain a public leaderboard that is updated upon submission. Test-reserve is used to protect against possible overfitting. If there are substantial differences between a method's scores on test-standard and test-reserve, this will raise a red flag and prompt further investigation. Results on test-reserve will not be publicly revealed. Finally, test-challenge is used to determine the winners of the challenge.
The evaluation page lists detailed information regarding how submissions will be scored. The evaluation servers are open. Following last year, we are hosting the evaluation servers on a new evaluation platform called EvalAI, developed by the CloudCV team. EvalAI is an open-source web platform designed for organizing and participating in challenges to push the state of the art on AI tasks. Click here to learn more about EvalAI. We encourage participants to first submit to the "test-dev2018" phase to make sure they understand the submission procedure, as it is identical to the full test set submission procedure. Note that the "test-dev2018" and "Challenge test2018" evaluation servers do not have public leaderboards.
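For orientation, the published VQA accuracy metric credits a predicted answer according to how many of the 10 human annotators gave it, Acc(ans) = min(#humans that said ans / 3, 1), averaged over all 9-annotator subsets of the 10 answers. The sketch below illustrates that computation in plain Python; the function name is ours, and the official evaluation code additionally normalizes answers (lowercasing, stripping punctuation and articles) before matching, which this sketch omits. See the evaluation page for the authoritative definition.

```python
from itertools import combinations

def vqa_accuracy(predicted, human_answers):
    """VQA accuracy: min(#matching humans / 3, 1), averaged over all
    leave-one-out (9-annotator) subsets of the 10 human answers.
    Assumes answers are already normalized strings."""
    scores = []
    for subset in combinations(human_answers, len(human_answers) - 1):
        matches = sum(1 for a in subset if a == predicted)
        scores.append(min(matches / 3.0, 1.0))
    return sum(scores) / len(scores)

# Example: 2 of 10 annotators answered "yes".
humans = ["yes"] * 2 + ["no"] * 8
print(vqa_accuracy("yes", humans))  # 0.6
```

An answer given by 3 or more annotators in every subset scores a full 1.0, which is why common answers saturate the metric.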
To enter the competition, you first need to create an account on EvalAI. We allow people to enter our challenge either privately or publicly. Any submission to the "Challenge test2018" phase will be considered to be participating in the challenge. For submissions to the "test2018" phase, only those submitted before the challenge deadline and posted to the public leaderboard will be considered to be participating in the challenge.
Before uploading your results to EvalAI, you will need to create a JSON file containing your results in the correct format as described on the evaluation page.
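As a quick illustration, a results file is a single JSON list with one entry per test question, each pairing a "question_id" with the predicted "answer" string. The question IDs and answers below are made up, and the evaluation page remains the authoritative specification of the format.

```python
import json

# Hypothetical model predictions, keyed by question_id.
predictions = {
    262148000: "yes",
    262148001: "2",
    262148002: "white",
}

# The expected format is a list of {"question_id", "answer"} objects.
results = [{"question_id": qid, "answer": ans}
           for qid, ans in predictions.items()]

with open("vqa_results.json", "w") as f:
    json.dump(results, f)
```

A file in any other shape (e.g., a dict keyed by question ID) will fail server-side evaluation, so it is worth validating the structure locally before uploading.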
To submit your JSON file to the VQA evaluation servers, click on the “Submit” tab on the VQA Challenge 2018 page on EvalAI. Select the phase ("test-dev2018", "test2018", or "Challenge test2018"). Select the JSON file to upload, fill in the required fields such as "method name" and "method description", and click “Submit”. After the file is uploaded, the evaluation server will begin processing. To view the status of your submission, go to the “My Submissions” tab and choose the phase to which the results file was uploaded. Please be patient; evaluation may take some time to complete (~4 minutes). If the status of your submission is “Failed”, please check the "Stderr File" for the corresponding submission.
After evaluation is complete and the server shows a status of “Finished”, you will have the option to download your evaluation results by selecting “Result File” for the corresponding submission. The "Result File" will contain the aggregated accuracy on the corresponding test-split (test-dev split for "test-dev2018" phase, test-standard and test-dev splits for both "test2018" and "Challenge test2018" phases). If you want your submission to appear on the public leaderboard, please submit to "test2018" phase and check the box under "Show on Leaderboard" for the corresponding submission.
Please limit the number of entries to the challenge evaluation server to a reasonable number, e.g., one entry per paper. To avoid overfitting, the number of submissions per user is limited to 1 upload per day (according to the UTC timezone) and a maximum of 5 submissions per user. It is not acceptable to create multiple accounts for a single project to circumvent this limit. The exception is if a group publishes two papers describing unrelated methods; in this case, both sets of results may be submitted for evaluation. Note that test-dev allows for 10 submissions per day.
The download page contains links to all VQA v2.0 train/val/test images, questions, and associated annotations (for train/val only). Please specify any and all external data used for training in the "method description" when uploading results to the evaluation server.
Results must be submitted to the evaluation server by the challenge deadline. Competitors' algorithms will be evaluated according to the rules described on the evaluation page. Challenge participants with the most successful and innovative methods will be invited to present.
Tools and Instructions
We provide API support for the VQA annotations and evaluation code. To download the VQA API, please visit our GitHub repository. For an overview of how to use the API, please visit the download page and consult the section entitled VQA API. To obtain API support for COCO images, please visit the COCO download page. To obtain API support for abstract scenes, please visit the GitHub repository.
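For readers new to the data, the snippet below does not use the VQA API itself; it is a stdlib-only sketch of the question/annotation JSON structure that the API wraps, using tiny in-memory stand-ins for the downloaded files. Field names follow the v2.0 format as we understand it; consult the download page and the GitHub repository for the authoritative schema.

```python
import json

# Tiny stand-ins for the questions and annotations files
# (the real files have top-level "questions" / "annotations" lists).
questions = {"questions": [
    {"question_id": 1, "image_id": 42, "question": "What color is the cat?"},
]}
annotations = {"annotations": [
    {"question_id": 1, "image_id": 42,
     "multiple_choice_answer": "black",
     "answers": [{"answer": "black", "answer_id": i + 1} for i in range(10)]},
]}

# Index annotations by question_id, similar to what the VQA API
# does internally when pairing questions with their answers.
ann_by_qid = {a["question_id"]: a for a in annotations["annotations"]}

for q in questions["questions"]:
    ann = ann_by_qid[q["question_id"]]
    print(q["question"], "->", ann["multiple_choice_answer"])
```

With the real files, you would `json.load` each one (or let the VQA API do so) and iterate the same way; every question carries its 10 human answers in the "answers" list.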
Frequently Asked Questions (FAQ)
As a reminder, any submissions before the challenge deadline whose results are made publicly visible on the "test2018" leaderboard OR are submitted to the "Challenge test2018" phase will be enrolled in the challenge. For further clarity, we answer some common questions below:
- Q: What do I do if I want to make my test-standard results public and participate in the challenge? A: Making your results public (i.e., visible on the leaderboard) on the "test2018" phase implies that you are participating in the challenge.
- Q: What do I do if I want to make my test-standard results public, but I do not want to participate in the challenge? A: We do not allow for this option.
- Q: What do I do if I want to participate in the challenge, but I do not want to make my test-standard results public yet? A: Submit to the "Challenge test2018" phase. This phase was created for this scenario.
- Q: When will I find out my test-challenge accuracies? A: We will reveal challenge results some time after the deadline. Results will first be announced at our VQA Challenge and Visual Dialog workshop at CVPR 2018.
- Q: Can I participate from more than one EvalAI team in the VQA challenge? A: No, you are allowed to participate from one team only.
- Q: Can I add other members to my EvalAI team? A: Yes.
- Q: Is the daily/overall submission limit for a user or for a team? A: It is for a team.
EvalAI: A New Evaluation Platform!
We are hosting VQA Challenge 2018 on EvalAI. EvalAI is an open-source web platform that aims to help researchers, students, and data scientists create, collaborate on, and participate in various AI challenges. EvalAI was started as an open-source initiative by the CloudCV team to overcome limitations of similar benchmarking platforms, such as slow evaluation and limited flexibility and portability. By providing swift and robust backends based on map-reduce frameworks that speed up evaluation on the fly, EvalAI aims to make it easier for researchers to reproduce results from technical papers and perform reliable and accurate analyses. On the front end, this is reflected in active central leaderboards and dynamic submission interfaces that replace time-consuming user-written evaluation scripts with single-click uploads. For more details, visit https://evalai.cloudcv.org.
If you are interested in contributing to EvalAI and making a bigger impact on the AI research community, please visit EvalAI's GitHub repository.