pytorchvideo.data.encoded_video_dataset¶
Dataset loaders and supporting classes for encoded video datasets (e.g. Kinetics, HMDB51, UCF101).
-
class
pytorchvideo.data.encoded_video_dataset.EncodedVideoDataset(*args, **kwds)[source]¶ EncodedVideoDataset handles the storage, loading, decoding and clip sampling for a video dataset. It assumes each video is stored as an encoded video (e.g. mp4, avi).
-
__init__(labeled_video_paths, clip_sampler, video_sampler=<class 'torch.utils.data.sampler.RandomSampler'>, transform=None, decode_audio=True, decoder='pyav')[source]¶ - Parameters
labeled_video_paths (List[Tuple[str, Optional[dict]]]) – List containing video file paths and associated labels.
clip_sampler (ClipSampler) – Defines how clips should be sampled from each video. See the clip sampling documentation for more information.
video_sampler (Type[torch.utils.data.Sampler]) – Sampler for the internal video container. This defines the order videos are decoded and, if necessary, the distributed split.
transform (Callable) –
This callable is evaluated on the clip output before the clip is returned. It can be used for user defined preprocessing and augmentations to the clips. The clip output is a dictionary with the following format:
- {
'video': <video_tensor>,
'label': <index_label>,
'video_index': <video_index>,
'clip_index': <clip_index>,
'aug_index': <aug_index> (the augmentation index, since augmentations might generate multiple views for one clip)
}
If transform is None, the raw clip output in the above format is returned unmodified.
decoder (str) – Defines which decoder is used to decode a video.
decode_audio (bool) –
- Return type
-
__next__()[source]¶ Retrieves the next clip based on the clip sampling strategy and video sampler.
- Returns
A video clip with the following format if transform is None –
- {
'video': <video_tensor>,
'label': <index_label>,
'video_index': <video_index>,
'clip_index': <clip_index>,
'aug_index': <aug_index> (the augmentation index, since augmentations might generate multiple views for one clip)
}
Otherwise, the transform defines the clip output.
- Return type
-
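The transform contract described above (a callable that receives the clip dictionary and returns whatever the dataset should yield) can be sketched as follows. This is a minimal illustration, not part of the library: make_clip_transform and the scale value of 255 are hypothetical names and choices, and a plain float stands in for the video tensor so the snippet runs without any video data.

```python
from typing import Any, Callable, Dict


def make_clip_transform(scale: float = 255.0) -> Callable[[Dict[str, Any]], Dict[str, Any]]:
    """Build a transform that rescales 'video' and keeps only 'video' and 'label'.

    Hypothetical helper for illustration; it is not provided by the library.
    """

    def transform(clip: Dict[str, Any]) -> Dict[str, Any]:
        # Division is elementwise on a tensor; a float stand-in is used in the demo below.
        video = clip["video"] / scale
        # Return a new dict so the raw clip output is left untouched.
        return {"video": video, "label": clip["label"]}

    return transform


# Stand-in for the raw clip output; a real clip holds a video tensor under "video".
raw_clip = {"video": 255.0, "label": 3, "video_index": 0, "clip_index": 1, "aug_index": 0}
clip = make_clip_transform()(raw_clip)
print(clip)
```

A callable of this shape would be passed as the transform argument. Because the transform fully defines the clip output, any keys it drops (here the index fields) are no longer available downstream.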
-
pytorchvideo.data.encoded_video_dataset.labeled_encoded_video_dataset(data_path, clip_sampler, video_sampler=<class 'torch.utils.data.sampler.RandomSampler'>, transform=None, video_path_prefix='', decode_audio=True, decoder='pyav')[source]¶ A helper function to create an EncodedVideoDataset object for the UCF101 and Kinetics datasets.
- Parameters
data_path (pathlib.Path) – Path to the data. The path type defines how the data should be read:
- For a file path, the file is read and each line is parsed into a video path and label.
- For a directory, the directory structure defines the classes (i.e. each subdirectory is a class).
See the LabeledVideoPaths class documentation for specific formatting details and examples.
clip_sampler (ClipSampler) – Defines how clips should be sampled from each video. See the clip sampling documentation for more information.
video_sampler (Type[torch.utils.data.Sampler]) – Sampler for the internal video container. This defines the order videos are decoded and, if necessary, the distributed split.
transform (Callable) –
This callable is evaluated on the clip output before the clip is returned. It can be used for user defined preprocessing and augmentations to the clips. The clip output is a dictionary with the following format:
- {
'video': <video_tensor>,
'label': <index_label>,
'video_index': <video_index>,
'clip_index': <clip_index>,
'aug_index': <aug_index> (the augmentation index, since augmentations might generate multiple views for one clip)
}
If transform is None, the raw clip output in the above format is returned unmodified.
video_path_prefix (str) – Path to root directory with the videos that are loaded in EncodedVideoDataset. All the video paths before loading are prefixed with this path.
decoder (str) – Defines which decoder is used to decode a video.
decode_audio (bool) –
- Return type
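The two data_path modes described above (a listing file versus a class-per-subdirectory layout) can be illustrated with a small standard-library sketch. It is not the LabeledVideoPaths implementation: read_labeled_paths is a hypothetical helper, the line format "<video path> <integer label>" is an assumption, and so is assigning labels by sorted subdirectory name.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple

LabeledPath = Tuple[str, Dict[str, int]]


def read_labeled_paths(data_path: Path, video_path_prefix: str = "") -> List[LabeledPath]:
    """Hypothetical reader returning (video path, {"label": index}) pairs."""
    pairs: List[Tuple[str, int]] = []
    if data_path.is_file():
        # Assumed line format: "<video path> <integer label>".
        for line in data_path.read_text().splitlines():
            if not line.strip():
                continue  # skip blank lines
            video_path, label = line.rsplit(None, 1)
            pairs.append((video_path, int(label)))
    elif data_path.is_dir():
        # Assumption: each subdirectory is a class, labeled by sorted name order.
        class_dirs = sorted(p for p in data_path.iterdir() if p.is_dir())
        for index, class_dir in enumerate(class_dirs):
            for video in sorted(class_dir.iterdir()):
                if video.is_file():
                    pairs.append((str(video), index))
    else:
        raise FileNotFoundError(f"{data_path} is neither a file nor a directory")
    # Every path is joined onto the prefix before loading.
    return [(str(Path(video_path_prefix) / path), {"label": label}) for path, label in pairs]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    listing = root / "train.txt"
    listing.write_text("cat/a.mp4 0\ndog/b.mp4 1\n")
    from_file = read_labeled_paths(listing, video_path_prefix="videos")

    for name in ("cat", "dog"):
        (root / "videos" / name).mkdir(parents=True)
        (root / "videos" / name / "clip.mp4").touch()
    from_dir = read_labeled_paths(root / "videos")

print(from_file)
print([label for _, label in from_dir])
```

The result has the same shape as the labeled_video_paths argument of EncodedVideoDataset: a list of (path, label dictionary) tuples.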
pytorchvideo.data.encoded_video_pyav¶
-
class
pytorchvideo.data.encoded_video_pyav.EncodedVideoPyAV(file, video_name=None, decode_audio=True)[source]¶ EncodedVideoPyAV is an abstraction for accessing clips from an encoded video using PyAV as the decoding backend. It supports selective decoding when header information is available.
-
property
name¶ Returns: name: the name of the stored video if set.
-
property
duration¶ Returns: duration: the video’s duration/end-time in seconds.
-
get_clip(start_sec, end_sec)[source]¶ Retrieves frames from the encoded video at the specified start and end times in seconds (the video always starts at 0 seconds).
- Parameters
- Returns
clip_data – A dictionary mapping the entries at "video" and "audio" to tensors.
"video": A tensor of the clip's RGB frames with shape: (channel, time, height, width). The frames are of type torch.float32 and in the range [0 - 255].
"audio": A tensor of the clip's audio samples with shape: (samples). The samples are of type torch.float32 and in the range [0 - 255].
Returns None if no video or audio is found within the time range.
- Return type
Dict[str, Optional[torch.Tensor]]
pytorchvideo.data.encoded_video_torchvision¶
-
class
pytorchvideo.data.encoded_video_torchvision.EncodedVideoTorchVision(file, video_name=None, decode_audio=True)[source]¶ EncodedVideoTorchVision is an abstraction for accessing clips from an encoded video using the Torchvision video reading API (torch.ops.video_reader.read_video_from_memory) as the decoding backend.
-
property
name¶ Returns: name: the name of the stored video if set.
-
property
duration¶ Returns: duration: the video’s duration/end-time in seconds.
-
get_clip(start_sec, end_sec)[source]¶ Retrieves frames from the encoded video at the specified start and end times in seconds (the video always starts at 0 seconds).
- Parameters
- Returns
clip_data – A dictionary mapping the entries at "video" and "audio" to tensors.
"video": A tensor of the clip's RGB frames with shape: (channel, time, height, width). The frames are of type torch.float32 and in the range [0 - 255].
"audio": A tensor of the clip's audio samples with shape: (samples). The samples are of type torch.float32 and in the range [0 - 255].
Returns None if no video or audio is found within the time range.
- Return type
Dict[str, Optional[torch.Tensor]]
pytorchvideo.data.encoded_video¶
-
pytorchvideo.data.encoded_video.select_video_class(decoder)[source]¶ Selects the class for accessing clips based on the provided decoder string.
- Parameters
decoder (str) – Defines which decoder is used to decode a video.
- Return type
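Choosing a class from a decoder string is commonly done with a lookup table. The sketch below shows that pattern only and is not the library's implementation: PyAVBackend, TorchVisionBackend and select_backend are hypothetical stand-ins, and while 'pyav' appears as the default decoder in the signatures on this page, the 'torchvision' key is an assumption.

```python
from typing import Dict, Type


class PyAVBackend:
    """Hypothetical stand-in for a PyAV-based video class."""


class TorchVisionBackend:
    """Hypothetical stand-in for a Torchvision-based video class."""


# 'pyav' is the default decoder string on this page; 'torchvision' is assumed.
_DECODERS: Dict[str, Type] = {
    "pyav": PyAVBackend,
    "torchvision": TorchVisionBackend,
}


def select_backend(decoder: str) -> Type:
    """Return the class registered for the given decoder string."""
    try:
        return _DECODERS[decoder]
    except KeyError:
        known = ", ".join(sorted(_DECODERS))
        raise ValueError(f"Unknown decoder {decoder!r}; expected one of: {known}") from None


backend = select_backend("pyav")
print(backend.__name__)
```

Failing loudly on an unknown string surfaces a typo in the decoder argument at construction time instead of at the first decode.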
-
class
pytorchvideo.data.encoded_video.EncodedVideo(file, video_name=None, decode_audio=True, decoder='pyav')[source]¶ EncodedVideo is an abstraction for accessing clips from an encoded video. It supports selective decoding when header information is available.
-
classmethod
from_path(file_path, decode_audio=True, decoder='pyav')[source]¶ Fetches the given video path using PathManager (allowing remote URIs to be fetched) and constructs the EncodedVideo object.
-
property
name¶ Returns: name: the name of the stored video if set.
-
property
duration¶ Returns: duration: the video’s duration/end-time in seconds.
-
get_clip(start_sec, end_sec)[source]¶ Retrieves frames from the encoded video at the specified start and end times in seconds (the video always starts at 0 seconds).
- Parameters
- Returns
clip_data – A dictionary mapping the entries at "video" and "audio" to tensors.
"video": A tensor of the clip's RGB frames with shape: (channel, time, height, width). The frames are of type torch.float32 and in the range [0 - 255].
"audio": A tensor of the clip's audio samples with shape: (samples). The samples are of type torch.float32 and in the range [0 - 255].
Returns None if no video or audio is found within the time range.
- Return type
Dict[str, Optional[torch.Tensor]]
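Since get_clip takes a start and end time in seconds and the video always starts at 0, walking a whole video reduces to generating (start_sec, end_sec) windows up to the duration. The helper below is a hypothetical, standard-library sketch of that arithmetic; clip_windows is not a library function, and the choice to shorten the last window to fit the duration is an assumption.

```python
from typing import List, Tuple


def clip_windows(duration: float, clip_duration: float) -> List[Tuple[float, float]]:
    """Split [0, duration] into consecutive windows of at most clip_duration seconds."""
    if clip_duration <= 0:
        raise ValueError("clip_duration must be positive")
    windows: List[Tuple[float, float]] = []
    index = 0
    while True:
        start = index * clip_duration
        if start >= duration:
            break  # past the end of the video
        # The final window is shortened so it never exceeds the duration.
        windows.append((start, min(start + clip_duration, duration)))
        index += 1
    return windows


windows = clip_windows(duration=5.0, clip_duration=2.0)
print(windows)
```

Each pair could then be passed as get_clip(start_sec, end_sec), with the duration property supplying the first argument. The returned dictionary should still be checked for None entries, as noted above.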