pytorchvideo.data.charades

Dataset loaders and supporting classes for the Charades dataset stored as frames.

class pytorchvideo.data.charades.Charades(*args, **kwds)[source]

Action recognition video dataset for Charades stored as image frames. <https://prior.allenai.org/projects/charades>

This dataset handles frame parsing, loading, and clip sampling for the videos. All I/O goes through PathManager, so non-local storage URIs can be used.

__init__(data_path, clip_sampler, video_sampler=<class 'torch.utils.data.sampler.RandomSampler'>, transform=None, video_path_prefix='', frames_per_clip=None)[source]
Parameters
  • data_path (str) –

    Path to the data file. This file must be a space-separated CSV with the format:

    original_vido_id video_id frame_id path labels

  • clip_sampler (ClipSampler) – Defines how clips should be sampled from each video. See the clip sampling documentation for more information.

  • video_sampler (Type[torch.utils.data.Sampler]) – Sampler for the internal video container. This defines the order in which videos are decoded and, if necessary, the distributed split.

  • transform (Optional[Callable]) –

    This callable is evaluated on the clip output before the clip is returned. It can be used for user-defined preprocessing and augmentation of the clips. The clip output is a dictionary with the following format:

    {
        'video': <video_tensor>,
        'label': <index_label>,  # clip-level label
        'video_label': <index_label>,  # video-level label
        'video_index': <video_index>,
        'clip_index': <clip_index>,
        'aug_index': <aug_index>,  # augmentation index, as augmentations might generate multiple views for one clip
    }

    If transform is None, the raw clip output in the above format is returned unmodified.

  • video_path_prefix (str) – Prefix path to add to all paths from data_path.

  • frames_per_clip (Optional[int]) – The number of frames per clip to sample.
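The data file format described for data_path can be illustrated with the standard csv module. This is a minimal sketch, not the library's own parser: the column names are copied from the documented header, while the row values (IDs, paths, and the comma-joined label encoding) and the helper names write_frame_list and read_frame_list are hypothetical.

```python
import csv
import io

# Column names copied verbatim from the documented header.
FIELDS = ["original_vido_id", "video_id", "frame_id", "path", "labels"]


def write_frame_list(rows, handle):
    """Write rows as a space-separated file with a header line."""
    writer = csv.DictWriter(
        handle, fieldnames=FIELDS, delimiter=" ", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


def read_frame_list(handle):
    """Read the file back as a list of dicts; every value is a string."""
    return list(csv.DictReader(handle, delimiter=" "))


# Hypothetical rows: one line per frame. The label encoding is an assumption.
rows = [
    {"original_vido_id": "VID01", "video_id": 0, "frame_id": 0,
     "path": "VID01/VID01-000001.jpg", "labels": "3,7"},
    {"original_vido_id": "VID01", "video_id": 0, "frame_id": 1,
     "path": "VID01/VID01-000002.jpg", "labels": "3"},
]

buffer = io.StringIO()
write_frame_list(rows, buffer)
text = buffer.getvalue()
parsed = read_frame_list(io.StringIO(text))
```

Here text holds the file contents that would be saved at data_path, and parsed shows the per-frame records recovered from it.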

Return type

None
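The documentation above does not say how frames_per_clip frames are chosen from a clip. One plausible reading, assumed here purely for illustration, is to pick evenly spaced frame indices across the clip. The function name evenly_spaced_indices is hypothetical and is not part of pytorchvideo.

```python
def evenly_spaced_indices(num_frames, frames_per_clip=None):
    """Pick frame indices spread evenly over a clip (illustrative assumption).

    Returns every index when frames_per_clip is None or is not smaller than
    the number of available frames.
    """
    if frames_per_clip is not None and frames_per_clip <= 0:
        raise ValueError("frames_per_clip must be positive")
    if frames_per_clip is None or frames_per_clip >= num_frames:
        return list(range(num_frames))
    if frames_per_clip == 1:
        return [0]
    # Spread the picks from the first frame to the last frame inclusive.
    step = (num_frames - 1) / (frames_per_clip - 1)
    return [round(i * step) for i in range(frames_per_clip)]


indices = evenly_spaced_indices(10, 4)
all_indices = evenly_spaced_indices(5)
```

Under this assumption a 10-frame clip sampled with frames_per_clip=4 keeps the first and last frames plus two frames in between.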

__next__()[source]

Retrieves the next clip based on the clip sampling strategy and video sampler.

Returns

A video clip with the following format if transform is None:

{
    'video': <video_tensor>,
    'label': <index_label>,  # clip-level label
    'video_label': <index_label>,  # video-level label
    'video_index': <video_index>,
    'clip_index': <clip_index>,
    'aug_index': <aug_index>,  # augmentation index, as augmentations might generate multiple views for one clip
}

Otherwise, the transform defines the clip output.

Return type

dict
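The relationship between transform and the returned clip can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: apply_transform and keep_video_and_label are hypothetical names, a nested list stands in for the video tensor, and the label values are made up.

```python
def apply_transform(clip, transform=None):
    """Return the raw clip when transform is None, else the transformed clip."""
    if transform is None:
        return clip
    return transform(clip)


def keep_video_and_label(clip):
    """Example transform: keep only the keys a training loop might need."""
    return {"video": clip["video"], "label": clip["label"]}


# Hypothetical clip in the documented dictionary format.
raw_clip = {
    "video": [[0.0, 0.5], [1.0, 0.25]],  # stand-in for <video_tensor>
    "label": [3, 7],                     # clip-level label (made-up values)
    "video_label": [3, 7, 12],           # video-level label (made-up values)
    "video_index": 0,
    "clip_index": 1,
    "aug_index": 0,
}

unchanged = apply_transform(raw_clip)
trimmed = apply_transform(raw_clip, keep_video_and_label)
```

Because the transform receives the whole dictionary, it decides which keys survive, which is why the output format is only guaranteed when transform is None.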