Meet VATEX!

A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research.

Why VATEX?



MULTILINGUAL

Both English and Chinese captions.



LARGE-SCALE

826K captions for 41.3K video clips.



VIDEO COVERAGE

Comprehensive and representative video content covering 600 fine-grained human activities.



LEXICAL DIVERSITY

Unique and lexically richer annotations that enable more natural and diverse caption generation.



Comparison






Paper


If you use the VATEX dataset, please cite our paper:

@article{wang2019vatex,
  title={VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research},
  author={Wang, Xin and Wu, Jiawei and Chen, Junkun and Li, Lei and Wang, Yuan-Fang and Wang, William Yang},
  journal={arXiv preprint arXiv:1904.03493},
  year={2019}
}

Contact



Have any questions or suggestions? Feel free to contact us!