We are pleased to announce the first Video-guided Machine Translation (VMT) Challenge! The challenge will be hosted at the Workshop on Advances in Language and Vision Research (ALVR), ACL 2020.
Please stay tuned for more information!
To be eligible for inclusion in the result archives and for award consideration, please send the following information to firstname.lastname@example.org from your main contact email:
The VATEX dataset is a new large-scale multilingual video description dataset containing over 41,250 videos and 825,000 captions in both English and Chinese. Among the captions, there are over 206,000 English-Chinese parallel translation pairs. Compared to the widely used MSRVTT dataset, VATEX is multilingual, larger, more linguistically complex, and more diverse in both its videos and its natural language descriptions. Please refer to our ICCV paper for more details.
The Video-guided Machine Translation Challenge aims to benchmark progress towards models that translate a source-language sentence into the target language, using video information as additional spatiotemporal context.
The starter code for video-guided machine translation is released here, including the baseline VMT model, the preparation of the data and features, and the submission file generation.
The challenge is hosted on CodaLab. Please go to the Challenge page to submit your models.