We are pleased to announce the VATEX Captioning Challenge 2019! The challenge will be hosted at the 3rd Workshop on Closing the Loop Between Vision and Language, ICCV 2019.
Please stay tuned for more information!
The 1st VATEX Captioning Challenge has ended! We plan to archive the competition results from CodaLab on the official VATEX website and thereby further contribute to the vision-and-language research community. We also have several awards for the winning teams, which will be announced at the 3rd CLVL workshop on Oct 28th, 2019. To be eligible for the result archive and award consideration, please send the following information to firstname.lastname@example.org from your team's main contact email:
The VATEX dataset is a new large-scale multilingual video description dataset containing over 41,250 videos and 825,000 captions in English and Chinese. Among these captions are over 206,000 English-Chinese parallel translation pairs. Compared to the widely used MSR-VTT dataset, VATEX is multilingual, larger, linguistically more complex, and more diverse in terms of both video content and natural language descriptions. Please refer to our ICCV paper for more details. The VATEX Captioning Challenge aims to benchmark progress toward models that can describe videos in multiple languages, such as English and Chinese.
Please refer to the Download page for details. The English/Chinese captions and video features can be downloaded there.
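Once the caption files are downloaded, they can be parsed with standard JSON tooling. Below is a minimal sketch of reading VATEX-style caption records; the field names (`videoID`, `enCap`, `chCap`) and the sample record are illustrative assumptions, so check them against the actual files from the Download page.

```python
import json

# Illustrative record mimicking the assumed VATEX caption layout:
# each entry has a "videoID" plus parallel English/Chinese caption lists.
# The videoID and captions below are made-up placeholders.
sample = json.loads("""
[{"videoID": "xxxxxxxxxxx_000010_000020",
  "enCap": ["A person is doing something.", "Someone performs an action."],
  "chCap": ["\\u4e00\\u4e2a\\u4eba\\u5728\\u505a\\u67d0\\u4e8b\\u3002"]}]
""")

for clip in sample:
    vid = clip["videoID"]            # assumed: YouTube ID + start/end seconds
    en_caps = clip.get("enCap", [])  # English captions for this clip
    zh_caps = clip.get("chCap", [])  # Chinese captions for this clip
    print(vid, len(en_caps), len(zh_caps))
```

For the real dataset, replace the inline sample with `json.load(open("vatex_training_v1.0.json"))` or whichever filename the Download page provides.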
The challenge is hosted on CodaLab. Please go to the challenge page to submit your models.
| Rank | Team | BLEU-4 | METEOR | ROUGE-L | CIDEr |
|------|------|--------|--------|---------|-------|
| 1 | Baseline Shared Encoder | 28.4 | 21.7 | 47.0 | 45.1 |
| 3 | Baseline Shared Encoder-Decoder | 27.9 | 21.6 | 46.8 | 44.2 |
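Captioning leaderboards like this one are typically scored with n-gram overlap metrics such as BLEU-4. As a rough, self-contained illustration (not the organizers' evaluation server, which computes corpus-level scores), here is a minimal sentence-level BLEU-4 with clipped n-gram precision and a brevity penalty:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu4(candidate, references):
    """Sentence-level BLEU-4 sketch: uniform weights over 1..4-grams,
    reference-clipped precision, brevity penalty vs. the closest-length
    reference. For illustration only; real leaderboard scores come from
    the official evaluation pipeline."""
    cand = candidate.split()
    refs = [r.split() for r in references]
    log_p = 0.0
    for n in range(1, 5):
        cand_ngrams = ngrams(cand, n)
        if not cand_ngrams:
            return 0.0  # candidate too short to have any n-grams
        # Clip each candidate n-gram count by its max count in any reference.
        max_ref = Counter()
        for r in refs:
            for g, c in ngrams(r, n).items():
                max_ref[g] = max(max_ref[g], c)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_ngrams.items())
        if clipped == 0:
            return 0.0
        log_p += math.log(clipped / sum(cand_ngrams.values())) / 4
    # Brevity penalty against the reference closest in length.
    ref_len = min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
    bp = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / max(len(cand), 1))
    return bp * math.exp(log_p)
```

For example, `bleu4("a man is playing a guitar", ["a man is playing a guitar"])` yields `1.0`, while a candidate sharing no 4-grams with its references scores `0.0`.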