We are pleased to announce the VATEX Captioning Challenge 2019! The challenge will be hosted at the 3rd Workshop on Closing the Loop Between Vision and Language, ICCV 2019.
Please stay tuned for more information!
We will have several prizes and rewards for the winning teams! Details will be released soon.
The VATEX dataset is a new large-scale multilingual video description dataset containing over 41,250 videos and 825,000 captions in English and Chinese. Among the captions, there are over 206,000 English-Chinese parallel translation pairs. Compared to the widely used MSR-VTT dataset, VATEX is multilingual, larger, more linguistically complex, and more diverse in both its videos and its natural language descriptions. Please refer to our ICCV paper for more details. The VATEX Captioning Challenge aims to benchmark progress towards models that can describe videos in multiple languages, such as English and Chinese.
The English and Chinese captions and the video features can be downloaded from the Download page. Please refer to that page for details.
The challenge is hosted on CodaLab. Please go to the challenge page to submit your models.