VATEX v1.0


Note: The test set annotations are withheld for challenge use, but you can submit your results to our VATEX Captioning Challenge for evaluation.


Training Set
  • 25,991 Videos
  • 259,910 English Captions
  • 259,910 Chinese Captions

(v1.0, 57.3 MB)

Validation Set
  • 3,000 Videos
  • 30,000 English Captions
  • 30,000 Chinese Captions

(v1.0, 6.6 MB)

Public Test Set
  • 6,000 Videos


(v1.0, 0.25 MB)


Pretrained Video Features


Note: Due to legal and privacy concerns, we cannot directly share the downloaded YouTube videos or clips in any way (including but not limited to email, online drives, and GitHub). However, many open-source tools can download the original clips (e.g., [Tool #1] and [Tool #2]). Some videos may be unavailable now (deleted or hidden by either YouTube or the users), although they were all available when we collected the dataset. Because these account for an extremely small percentage of the dataset, we do not expect them to have a significant impact on performance.



In addition to the YouTube video IDs, we provide pretrained video features below for quick development. The features cover all the videos and are extracted using a pretrained I3D model [here]. Each video is represented by a NumPy array of shape (1, num_of_segments, 1024).
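As a rough illustration of working with the feature shape described above, the following sketch loads one array, checks its shape, and drops the leading singleton axis. It assumes each video's features are stored in a separate `.npy` file, which is not stated here; the file name `demo_video.npy`, the helper `load_video_feature`, and the mean pooling step are hypothetical and used only for the demonstration, with a random array standing in for real features.

```python
import os
import tempfile

import numpy as np


def load_video_feature(path):
    """Load one feature file and return an array of shape (num_of_segments, 1024).

    Assumption: one .npy file per video, holding an array of shape
    (1, num_of_segments, 1024) as described on this page.
    """
    feat = np.load(path)
    if feat.ndim != 3 or feat.shape[0] != 1 or feat.shape[2] != 1024:
        raise ValueError(f"unexpected feature shape: {feat.shape}")
    return feat[0]  # drop the leading singleton axis


# Synthetic stand-in for a real feature file (hypothetical name, random values).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_video.npy")
    np.save(demo_path, np.random.rand(1, 7, 1024).astype(np.float32))
    segments = load_video_feature(demo_path)

# One simple way to get a fixed-size vector per video: average over segments.
pooled = segments.mean(axis=0)
print(segments.shape, pooled.shape)
```

Because num_of_segments varies from video to video, a model that consumes the per-segment features directly would typically need padding or masking when batching; the mean pooling shown here is just one simple option.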



I3D Features on AWS S3:

Annotation Format



{
    'videoID': 'YouTubeID_StartTime_EndTime',
    'enCap': 
        [
            'Regular English Caption #1',
            'Regular English Caption #2',
            'Regular English Caption #3',
            'Regular English Caption #4',
            'Regular English Caption #5',
            'Parallel English Caption #1',
            'Parallel English Caption #2',
            'Parallel English Caption #3',
            'Parallel English Caption #4',
            'Parallel English Caption #5'
        ],
    'chCap': 
        [
            'Regular Chinese Caption #1',
            'Regular Chinese Caption #2',
            'Regular Chinese Caption #3',
            'Regular Chinese Caption #4',
            'Regular Chinese Caption #5',
            'Parallel Chinese Caption #1',
            'Parallel Chinese Caption #2',
            'Parallel Chinese Caption #3',
            'Parallel Chinese Caption #4',
            'Parallel Chinese Caption #5'
        ]
}
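
The record layout above can be read with a few lines of Python. This is a minimal sketch under two assumptions that are not stated on this page: that each annotation file is a JSON list of such records, and that StartTime and EndTime are integers. The helpers `split_video_id` and `split_captions` and the sample record are hypothetical. Since YouTube IDs may themselves contain underscores, the ID is split from the right.

```python
import json


def split_video_id(video_id):
    """Split 'YouTubeID_StartTime_EndTime' into (youtube_id, start, end).

    Splitting from the right keeps underscores inside the YouTube ID intact.
    Assumption: StartTime and EndTime are integer strings.
    """
    youtube_id, start, end = video_id.rsplit("_", 2)
    return youtube_id, int(start), int(end)


def split_captions(captions):
    """Return (regular, parallel): the first five and last five captions."""
    if len(captions) != 10:
        raise ValueError(f"expected 10 captions, got {len(captions)}")
    return captions[:5], captions[5:]


# Made-up record that follows the format above (not real VATEX data).
sample_json = json.dumps([
    {
        "videoID": "abc_DEF-123_000010_000020",
        "enCap": [f"Regular English Caption #{i}" for i in range(1, 6)]
        + [f"Parallel English Caption #{i}" for i in range(1, 6)],
        "chCap": [f"Regular Chinese Caption #{i}" for i in range(1, 6)]
        + [f"Parallel Chinese Caption #{i}" for i in range(1, 6)],
    }
])

records = json.loads(sample_json)
record = records[0]
youtube_id, start, end = split_video_id(record["videoID"])
regular_en, parallel_en = split_captions(record["enCap"])
regular_ch, parallel_ch = split_captions(record["chCap"])

# Pair each parallel English caption with the Chinese caption at the same
# position (assumption: the two lists are aligned index by index).
parallel_pairs = list(zip(parallel_en, parallel_ch))
print(youtube_id, start, end, len(parallel_pairs))
```

For a real annotation file, `json.load` on the opened file would replace the `json.loads(sample_json)` call, provided the file really is a JSON list as assumed.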