pytorchvideo.data.charades

Dataset loaders and supporting classes for the Charades dataset stored as frames.

class pytorchvideo.data.charades.Charades(*args, **kwds)

Action recognition video dataset for Charades stored as image frames. <https://prior.allenai.org/projects/charades>

This dataset handles frame parsing, loading, and clip sampling for the videos. All I/O is done with PathManager, so non-local storage URIs can be used.
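As a rough illustration of the frame-parsing step, the sketch below groups the rows of a frame list in the space-separated format that data_path expects (documented under __init__) into per-video frame paths and per-frame labels. It is a minimal sketch, not the library's parser: `parse_frame_list` is a hypothetical helper, it reads from any text stream rather than through PathManager, and it assumes the file has a header row and that the `labels` column holds comma-separated integer class indices (empty when a frame has no action). Column names are copied verbatim from the documented format, including the `original_vido_id` spelling.

```python
import csv
import io
import os
from typing import Dict, List, TextIO, Tuple


def parse_frame_list(
    f: TextIO, video_path_prefix: str = ""
) -> Dict[str, Tuple[List[str], List[List[int]]]]:
    """Hypothetical helper: group frame rows by original video id.

    Returns {video_id: (frame_paths, per_frame_labels)} in file order.
    Assumes a header row and comma-separated integer labels (possibly empty).
    """
    reader = csv.DictReader(f, delimiter=" ")
    videos: Dict[str, Tuple[List[str], List[List[int]]]] = {}
    for row in reader:
        paths, labels = videos.setdefault(row["original_vido_id"], ([], []))
        # Prepend the prefix to each frame path, as video_path_prefix describes.
        paths.append(os.path.join(video_path_prefix, row["path"]))
        # An empty labels field ("") yields an empty list for that frame.
        labels.append([int(x) for x in row["labels"].split(",") if x])
    return videos


if __name__ == "__main__":
    demo = io.StringIO(
        "original_vido_id video_id frame_id path labels\n"
        'ABC 0 1 ABC/f1.jpg "1,2"\n'
        'ABC 0 2 ABC/f2.jpg ""\n'
    )
    print(parse_frame_list(demo, "/data"))
```

Grouping by `original_vido_id` rather than `video_id` is an arbitrary choice for the sketch; either column identifies a video in the format above.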

__init__(data_path, clip_sampler, video_sampler=<class 'torch.utils.data.sampler.RandomSampler'>, transform=None, video_path_prefix='', frames_per_clip=None)

Parameters

data_path (str) –
Path to the data file. This file must be a space-separated CSV with the following columns:

original_vido_id video_id frame_id path labels
clip_sampler (ClipSampler) – Defines how clips should be sampled from each video. See the clip sampling documentation for more information.
video_sampler (Type[torch.utils.data.Sampler]) – Sampler for the internal video container. This defines the order in which videos are decoded and, if necessary, the distributed split.
transform (Optional[Callable]) –
This callable is evaluated on the clip output before the clip is returned. It can be used for user-defined preprocessing and augmentation of the clips. The clip output is a dictionary with the following format:

{
    'video': <video_tensor>,
    'label': <index_label>,        # clip-level label
    'video_label': <index_label>,  # video-level label
    'video_index': <video_index>,
    'clip_index': <clip_index>,
    'aug_index': <aug_index>,      # augmentation index, as augmentations might generate multiple views for one clip
}

If transform is None, the raw clip output in the above format is returned unmodified.
video_path_prefix (str) – Prefix prepended to every frame path read from data_path.
frames_per_clip (Optional[int]) – The number of frames per clip to sample.
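To make frames_per_clip concrete, the sketch below picks a fixed number of evenly spaced frames from the frames of one sampled clip. Uniform temporal subsampling is an assumption here, not a statement of what Charades does internally; `sample_clip_frames` is a hypothetical helper, and treating None as "keep every frame" is likewise an assumption of the sketch.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_clip_frames(frames: Sequence[T], frames_per_clip: Optional[int]) -> List[T]:
    """Hypothetical helper: return `frames_per_clip` evenly spaced frames.

    None keeps every frame (an assumption). Short clips repeat frames so
    the output length always equals `frames_per_clip`.
    """
    if frames_per_clip is None:
        return list(frames)
    if frames_per_clip <= 0:
        raise ValueError("frames_per_clip must be positive")
    if len(frames) == 0:
        raise ValueError("cannot sample from an empty clip")
    if frames_per_clip == 1:
        return [frames[0]]
    # Evenly spaced positions from the first to the last frame (inclusive),
    # rounded to the nearest valid index.
    step = (len(frames) - 1) / (frames_per_clip - 1)
    return [frames[round(i * step)] for i in range(frames_per_clip)]
```

For example, ten frames with frames_per_clip=4 yield indices 0, 3, 6 and 9, while a two-frame clip with frames_per_clip=4 repeats each frame twice.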

Return type

None
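A possible way to construct the dataset with the parameters documented above is sketched below. `Charades`, `make_clip_sampler` and `torch.utils.data.SequentialSampler` are existing pytorchvideo and PyTorch names; `make_charades`, `keep_video_and_label`, the 2-second clip duration and the default of 8 frames per clip are illustrative choices, not library defaults. Imports are deferred inside the function so the sketch can be loaded without torch or pytorchvideo installed.

```python
from typing import Any, Callable, Dict, Optional


def keep_video_and_label(sample: Dict[str, Any]) -> Dict[str, Any]:
    """Example transform: keep only the clip tensor and its clip-level label."""
    return {"video": sample["video"], "label": sample["label"]}


def make_charades(
    data_path: str,
    clip_duration: float = 2.0,
    frames_per_clip: Optional[int] = 8,
    transform: Optional[Callable[[Dict[str, Any]], Dict[str, Any]]] = keep_video_and_label,
):
    """Hypothetical factory; the argument values are illustrative."""
    # Deferred imports: only needed when a dataset is actually built.
    from pytorchvideo.data import Charades, make_clip_sampler
    from torch.utils.data import SequentialSampler

    return Charades(
        data_path=data_path,
        # "random" samples one randomly positioned clip of clip_duration seconds.
        clip_sampler=make_clip_sampler("random", clip_duration),
        # video_sampler takes the sampler class (Type[Sampler]), not an instance.
        video_sampler=SequentialSampler,
        transform=transform,
        frames_per_clip=frames_per_clip,
    )
```

With a frame list at a path of your choosing, `dataset = make_charades(path)` followed by `next(dataset)` should return a dict containing only the `video` and `label` keys, because the transform drops the bookkeeping entries.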

__next__()

Retrieves the next clip based on the clip sampling strategy and video sampler.

Returns

If transform is None, a video clip in the following format:

{
    'video': <video_tensor>,
    'label': <index_label>,        # clip-level label
    'video_label': <index_label>,  # video-level label
    'video_index': <video_index>,
    'clip_index': <clip_index>,
    'aug_index': <aug_index>,      # augmentation index, as augmentations might generate multiple views for one clip
}

Otherwise, the transform defines the clip output.

Return type

dict
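Charades clips can contain several simultaneous actions, so a transform may want to turn the clip-level `label` into a multi-hot target. The format above only says `<index_label>`, so the sketch below assumes `label` holds integer class indices, either flat or grouped per frame, and takes their union. `to_multi_hot` is a hypothetical transform that returns a plain Python list so it needs no torch; the default of 157 classes reflects the number of action classes in Charades.

```python
from typing import Any, Dict, Iterable, Iterator, List

NUM_CHARADES_CLASSES = 157  # Charades defines 157 action classes


def _iter_indices(label: Iterable[Any]) -> Iterator[int]:
    # Accept either [i, j, ...] or per-frame groups like [[i], [i, j], ...].
    for item in label:
        if isinstance(item, (list, tuple)):
            yield from item
        else:
            yield item


def to_multi_hot(sample: Dict[str, Any], num_classes: int = NUM_CHARADES_CLASSES) -> Dict[str, Any]:
    """Hypothetical transform: replace sample['label'] with a multi-hot list."""
    target: List[float] = [0.0] * num_classes
    for idx in _iter_indices(sample["label"]):
        if not 0 <= idx < num_classes:
            raise ValueError(f"label index {idx} out of range")
        target[idx] = 1.0
    out = dict(sample)  # shallow copy so the caller's dict is not mutated
    out["label"] = target
    return out
```

Passing `transform=to_multi_hot` to the constructor would make `__next__` return the same keys as above, with `label` replaced by the multi-hot list.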