pytorchvideo.data.labeled_video_paths

class pytorchvideo.data.labeled_video_paths.LabeledVideoPaths(paths_and_labels, path_prefix='')[source]

LabeledVideoPaths contains pairs of video path and integer index label.

classmethod from_path(data_path)[source]

Factory function that creates a LabeledVideoPaths object depending on the path type:

  • If it is a directory path, it uses the LabeledVideoPaths.from_directory function.

  • If it is a file, it uses the LabeledVideoPaths.from_csv function.

Parameters

data_path (str) – The path to the file or directory to be read.

Return type

pytorchvideo.data.labeled_video_paths.LabeledVideoPaths

classmethod from_csv(file_path)[source]

Factory function that creates a LabeledVideoPaths object by reading a file with the following format:

<path> <integer_label>
…
<path> <integer_label>

Parameters

file_path (str) – The path to the file to be read.

Return type

pytorchvideo.data.labeled_video_paths.LabeledVideoPaths
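
As a rough illustration of the file format described above, the following sketch parses "<path> <integer_label>" lines into (path, label) pairs using only the standard library. The helper name parse_labeled_paths and the rule of splitting on the last run of whitespace are assumptions made for this example; this is not the library's implementation.

```python
from typing import List, Tuple


def parse_labeled_paths(text: str) -> List[Tuple[str, int]]:
    """Parse lines of the form '<path> <integer_label>' (hypothetical helper)."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last run of whitespace so paths containing spaces survive.
        path, label = line.rsplit(None, 1)
        pairs.append((path, int(label)))
    return pairs


example = "videos/a.mp4 0\nvideos/b.mp4 1\n"
pairs = parse_labeled_paths(example)
```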

classmethod from_directory(dir_path)[source]

Factory function that creates a LabeledVideoPaths object by parsing the structure of the given directory’s subdirectories into the classification labels. It expects the directory format to be the following:

dir_path/<class_name>/<video_name>.mp4

Classes are indexed alphabetically, from 0 to the number of classes minus one.

E.g.

dir_path/class_x/xxx.ext
dir_path/class_x/xxy.ext
dir_path/class_x/xxz.ext
dir_path/class_y/123.ext
dir_path/class_y/nsdf3.ext
dir_path/class_y/asd932_.ext

This would produce two classes, labeled 0 and 1, with 3 video paths associated with each.

Parameters

dir_path (str) – Root directory containing the video class directories.

Return type

pytorchvideo.data.labeled_video_paths.LabeledVideoPaths
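
The alphabetical class indexing described above can be sketched with the standard library alone. The function name index_class_directories is hypothetical, and the sketch assumes every subdirectory of the root is a class and every file inside it is a video; the library may filter or order entries differently.

```python
import os
import tempfile
from typing import List, Tuple


def index_class_directories(dir_path: str) -> List[Tuple[str, int]]:
    """Map dir_path/<class_name>/<video_name> files to (path, class_index) pairs."""
    class_names = sorted(
        name for name in os.listdir(dir_path)
        if os.path.isdir(os.path.join(dir_path, name))
    )
    # Alphabetical position of the class name becomes its integer label.
    class_to_index = {name: i for i, name in enumerate(class_names)}
    pairs = []
    for name in class_names:
        class_dir = os.path.join(dir_path, name)
        for video_name in sorted(os.listdir(class_dir)):
            pairs.append((os.path.join(class_dir, video_name), class_to_index[name]))
    return pairs


# Recreate the example layout in a temporary directory (empty placeholder files).
layout = {
    "class_x": ["xxx.ext", "xxy.ext", "xxz.ext"],
    "class_y": ["123.ext", "nsdf3.ext", "asd932_.ext"],
}
with tempfile.TemporaryDirectory() as root:
    for class_name, file_names in layout.items():
        os.makedirs(os.path.join(root, class_name))
        for file_name in file_names:
            open(os.path.join(root, class_name, file_name), "w").close()
    labels = [label for _, label in index_class_directories(root)]
```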

__init__(paths_and_labels, path_prefix='')[source]
Parameters
  • paths_and_labels (List[Tuple[str, Optional[int]]]) – a list of tuples containing the video path and integer label.

Return type

None

__getitem__(index)[source]
Parameters

index (int) – the path and label index.

Returns

The path and label tuple for the given index.

Return type

Tuple[str, int]

__len__()[source]
Returns

The number of video paths and label pairs.

Return type

int

pytorchvideo.data.frame_video

class pytorchvideo.data.frame_video.FrameVideo(duration, fps, video_frame_to_path_fn=None, video_frame_paths=None, multithreaded_io=False)[source]

FrameVideo is an abstraction for accessing clips, based on their start and end times, from a video in which each frame is stored as an image. PathManager is used for frame image reading, allowing non-local URIs to be used.

__init__(duration, fps, video_frame_to_path_fn=None, video_frame_paths=None, multithreaded_io=False)[source]
Parameters
  • duration (float) – the duration of the video in seconds.

  • fps (float) – the target fps for the video. This is needed to link the frames to a second timestamp in the video.

  • video_frame_to_path_fn (Callable[[int], str]) – a function that maps from a frame index integer to the file path where the frame is located.

  • video_frame_paths (List[str]) – a list of frame paths, one for each frame index of the video.

  • multithreaded_io (bool) – controls whether parallelizable I/O operations are performed across multiple threads.

Return type

None

classmethod from_frame_paths(video_frame_paths, fps=30.0, multithreaded_io=False)[source]
Parameters
  • video_frame_paths (List[str]) – a list of paths to each frame in the video.

  • fps (float) – the target fps for the video. This is needed to link the frames to a second timestamp in the video.

  • multithreaded_io (bool) – controls whether parallelizable I/O operations are performed across multiple threads.

property duration

Returns: duration: the video’s duration/end-time in seconds.

get_clip(start_sec, end_sec, frame_filter=None)[source]

Retrieves frames from the stored video at the specified start and end times in seconds (the video always starts at 0 seconds). Given that PathManager may be fetching the frames from network storage, to handle transient errors, frame reading is retried N times.

Parameters
  • start_sec (float) – the clip start time in seconds

  • end_sec (float) – the clip end time in seconds

  • frame_filter (Optional[Callable[List[int], List[int]]]) – function to subsample frames in a clip before loading. If None, no subsampling is performed.

Returns

clip_data

A dictionary with the following entries:

"video": A tensor of the clip's RGB frames with shape (channel, time, height, width). The frames are of type torch.float32 and in the range [0, 255]. An exception is raised if the images cannot be loaded.

"frame_indices": A list of indices for each frame relative to all frames in the video.

Returns None if no frames are found.

Return type

Dict[str, Optional[torch.Tensor]]
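
To illustrate the frame_filter parameter, here is a minimal sketch of a callable that takes a list of frame indices and returns a subsampled list. The factory name every_nth_frame is hypothetical; any function with the same list-in, list-out shape should fit the documented signature.

```python
from typing import Callable, List


def every_nth_frame(n: int) -> Callable[[List[int]], List[int]]:
    """Return a filter that keeps every n-th frame index (hypothetical helper)."""
    def frame_filter(frame_indices: List[int]) -> List[int]:
        # Slicing preserves order and always keeps the first index.
        return frame_indices[::n]
    return frame_filter


keep_every_second = every_nth_frame(2)
subsampled = keep_every_second([10, 11, 12, 13, 14])
```

A filter like keep_every_second could then be passed as the frame_filter argument so that only the selected frames are loaded.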

pytorchvideo.data.clip_sampling

class pytorchvideo.data.clip_sampling.ClipInfo(clip_start_sec, clip_end_sec, clip_index, aug_index, is_last_clip)[source]
Named-tuple for clip information with:

  • clip_start_sec (float): clip start time.

  • clip_end_sec (float): clip end time.

  • clip_index (int): clip index in the video.

  • aug_index (int): augmentation index for the clip. Different augmentation methods might generate multiple views for the same clip.

  • is_last_clip (bool): a bool specifying whether there are more clips to be sampled from the video.

property clip_start_sec

Alias for field number 0

property clip_end_sec

Alias for field number 1

property clip_index

Alias for field number 2

property aug_index

Alias for field number 3

property is_last_clip

Alias for field number 4
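
The "alias for field number" wording reflects how named tuples work: each field is reachable both by name and by position. The stand-in class below, ClipInfoSketch, is defined only for this example and mirrors the documented field order; it is not the library's class.

```python
from typing import NamedTuple


class ClipInfoSketch(NamedTuple):
    """Stand-in with the same field order as the documented ClipInfo."""
    clip_start_sec: float
    clip_end_sec: float
    clip_index: int
    aug_index: int
    is_last_clip: bool


info = ClipInfoSketch(0.0, 2.0, 0, 0, False)
# Positional unpacking follows the field order.
start, end, *_ = info
```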

class pytorchvideo.data.clip_sampling.ClipSampler(clip_duration)[source]

Interface for clip samplers, which take a video time and the previously sampled clip time and return a named-tuple ClipInfo.

pytorchvideo.data.clip_sampling.make_clip_sampler(sampling_type, *args)[source]

Constructs the clip samplers found in this module from the given arguments.

Parameters
  • sampling_type (str) – chooses the clip sampler to return. It has two options:

      • uniform: constructs and returns UniformClipSampler

      • random: constructs and returns RandomClipSampler

  • *args – the args to pass to the chosen clip sampler constructor

Return type

pytorchvideo.data.clip_sampling.ClipSampler

class pytorchvideo.data.clip_sampling.UniformClipSampler(clip_duration)[source]

Evenly splits the video into clips of size clip_duration.

__call__(last_clip_time, video_duration)[source]
Parameters
  • last_clip_time (float) – the last clip end time sampled from this video. This should be 0.0 if the video hasn't had clips sampled yet.

  • video_duration (float) – the duration of the video that's being sampled, in seconds.

Returns

A named-tuple ClipInfo that includes the clip information (clip_start_time, clip_end_time, clip_index, aug_index, is_last_clip), where the times are in seconds and is_last_clip is False when there is still more time in the video to be sampled.

Return type

pytorchvideo.data.clip_sampling.ClipInfo
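
One way to picture uniform sampling is as back-to-back clips, each starting where the previous one ended. The sketch below is a standalone approximation, not the library's code: the function name uniform_clips is hypothetical, and the rule that a clip is the last one when another full clip would no longer fit is an assumption.

```python
from typing import List, Tuple


def uniform_clips(clip_duration: float, video_duration: float) -> List[Tuple[float, float, int, bool]]:
    """List (start_sec, end_sec, clip_index, is_last_clip) for back-to-back clips."""
    if clip_duration <= 0:
        raise ValueError("clip_duration must be positive")
    clips = []
    last_clip_time = 0.0  # 0.0 because nothing has been sampled yet
    clip_index = 0
    while True:
        start = last_clip_time
        end = start + clip_duration
        # Assumed rule: this is the last clip when another full clip would not fit.
        is_last = end + clip_duration > video_duration
        clips.append((start, end, clip_index, is_last))
        if is_last:
            break
        last_clip_time = end
        clip_index += 1
    return clips


clips = uniform_clips(clip_duration=2.0, video_duration=5.0)
```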

class pytorchvideo.data.clip_sampling.RandomClipSampler(clip_duration)[source]

Randomly samples a clip of size clip_duration from the video.

__call__(last_clip_time, video_duration)[source]
Parameters
  • last_clip_time (float) – Not used for RandomClipSampler.

  • video_duration (float) – the duration (in seconds) of the video that's being sampled.

Returns

A named-tuple ClipInfo that includes the clip information (clip_start_time, clip_end_time, clip_index, aug_index, is_last_clip). The times are in seconds. clip_index, aug_index and is_last_clip are always 0, 0 and True, respectively.

Return type

pytorchvideo.data.clip_sampling.ClipInfo
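
A minimal sketch of random clip sampling, assuming the start time is drawn uniformly from the range that still leaves room for a full clip. The function name random_clip and the explicit rng argument are illustrative choices, not part of the documented API.

```python
import random
from typing import Tuple


def random_clip(clip_duration: float, video_duration: float, rng: random.Random) -> Tuple[float, float]:
    """Return (start_sec, end_sec) for one randomly placed clip."""
    # Latest start that still leaves room for a full clip (0.0 for short videos).
    max_start = max(video_duration - clip_duration, 0.0)
    start = rng.uniform(0.0, max_start)
    return start, start + clip_duration


start, end = random_clip(2.0, 10.0, random.Random(0))
```

Passing a seeded random.Random instance keeps the example reproducible; with a video shorter than the clip, the sketch always starts at 0.0.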

class pytorchvideo.data.clip_sampling.ConstantClipsPerVideoSampler(clip_duration, clips_per_video, augs_per_clip=1)[source]

Evenly splits the video into clips_per_video increments and samples clips of size clip_duration at these increments.

__call__(last_clip_time, video_duration)[source]
Parameters
  • last_clip_time (float) – Not used for ConstantClipsPerVideoSampler.

  • video_duration (float) – the duration (in seconds) of the video that's being sampled.

Returns

A named-tuple ClipInfo that includes the clip information (clip_start_time, clip_end_time, clip_index, aug_index, is_last_clip). The times are in seconds. is_last_clip is True after clips_per_video clips have been sampled or the end of the video is reached.

Return type

pytorchvideo.data.clip_sampling.ClipInfo
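
The "evenly spaced increments" idea can be sketched as follows. This assumes the increment is the latest possible clip start divided by clips_per_video; the function name constant_clip_starts is hypothetical and the library's exact spacing rule may differ.

```python
from typing import List


def constant_clip_starts(clip_duration: float, clips_per_video: int, video_duration: float) -> List[float]:
    """Return evenly spaced clip start times (assumed spacing rule)."""
    # Latest start that still leaves room for a full clip.
    max_start = max(video_duration - clip_duration, 0.0)
    step = max_start / clips_per_video
    return [step * i for i in range(clips_per_video)]


starts = constant_clip_starts(clip_duration=2.0, clips_per_video=4, video_duration=10.0)
```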

pytorchvideo.data.video

class pytorchvideo.data.video.Video(file, video_name=None, decode_audio=True)[source]

Video provides an interface to access clips from a video container.

classmethod from_path(file_path, decode_audio=True)[source]

Fetches the given video path using PathManager (allowing remote URIs to be fetched) and constructs the EncodedVideo object.

Parameters
  • file_path (str) – a PathManager file-path.

  • decode_audio (bool) –

abstract property duration

Returns: duration of the video in seconds

abstract get_clip(start_sec, end_sec)[source]

Retrieves frames from the internal video at the specified start and end times in seconds (the video always starts at 0 seconds).

Parameters
  • start_sec (float) – the clip start time in seconds

  • end_sec (float) – the clip end time in seconds

Returns

video_data_dictionary

A dictionary mapping strings to tensors of the clip's underlying data.

Return type

Dict[str, Optional[torch.Tensor]]

abstract __init__(file, video_name=None, decode_audio=True)[source]
Parameters
  • file (BinaryIO) – a file-like object (e.g. io.BytesIO or io.StringIO) that contains the encoded video.

  • video_name (Optional[str]) –

  • decode_audio (bool) –

Return type

None

pytorchvideo.data.utils

pytorchvideo.data.utils.thwc_to_cthw(data)[source]

Permute tensor from (time, height, width, channel) to (channel, time, height, width).

Parameters

data (torch.Tensor) –

Return type

torch.Tensor
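
The function name implies moving the channel axis to the front while keeping time, height and width in order. The sketch below shows only the effect on a shape tuple, in pure Python; the helper permute_shape is hypothetical. On a real tensor, the same axis order could be passed to torch.Tensor.permute.

```python
from typing import Tuple

# Axis order that moves (time, height, width, channel) to (channel, time, height, width).
THWC_TO_CTHW = (3, 0, 1, 2)


def permute_shape(shape: Tuple[int, ...], order: Tuple[int, ...]) -> Tuple[int, ...]:
    """Return the shape after reordering axes, without touching any data."""
    return tuple(shape[axis] for axis in order)


thwc_shape = (8, 224, 320, 3)  # 8 frames, 224x320 pixels, 3 channels
cthw_shape = permute_shape(thwc_shape, THWC_TO_CTHW)
```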

pytorchvideo.data.utils.secs_to_pts(time_in_seconds, time_base, start_pts)[source]

Converts a time (in seconds) to the given time base and start_pts offset presentation time.

Returns

pts (float) – The time in the given time base.

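Parameters
  • time_in_seconds – the time in seconds to convert.

  • time_base – the time base to convert into.

  • start_pts – the start_pts offset to apply.

Return type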

float

pytorchvideo.data.utils.pts_to_secs(time_in_seconds, time_base, start_pts)[source]

Converts a presentation time with the given time base and start_pts offset to seconds.

Returns

time_in_seconds (float) – The corresponding time in seconds.

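Parameters
  • time_in_seconds – the presentation time to convert, expressed in the given time base.

  • time_base – the time base of the presentation time.

  • start_pts – the start_pts offset of the presentation time.

Return type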

float
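
The relationship between seconds and presentation timestamps can be sketched as a linear mapping. The two functions below are standalone approximations with hypothetical names; they assume no rounding or truncation, which the library's versions may apply.

```python
def secs_to_pts_sketch(time_in_seconds: float, time_base: float, start_pts: float) -> float:
    """Convert seconds to a presentation timestamp (assumed linear mapping)."""
    # Scale seconds into time-base units, then shift by the stream's starting pts.
    return time_in_seconds / time_base + start_pts


def pts_to_secs_sketch(pts: float, time_base: float, start_pts: float) -> float:
    """Convert a presentation timestamp back to seconds."""
    # Undo the offset first, then scale time-base units back to seconds.
    return (pts - start_pts) * time_base


time_base = 1 / 1024  # seconds per tick; a power of two keeps the arithmetic exact
pts = secs_to_pts_sketch(2.5, time_base, start_pts=512)
secs = pts_to_secs_sketch(pts, time_base, start_pts=512)
```

Under these assumptions the two conversions are inverses of each other, so a round trip returns the original time.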

class pytorchvideo.data.utils.MultiProcessSampler(*args, **kwds)[source]

MultiProcessSampler splits sample indices from a PyTorch Sampler evenly across workers spawned by a PyTorch DataLoader.

__iter__()[source]
Returns

Iterator for underlying PyTorch Sampler indices split by worker id.
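
Splitting indices evenly across workers can be pictured with the sketch below. It assumes contiguous chunks whose sizes differ by at most one; the function name split_for_worker is hypothetical, and the library's actual partitioning may differ.

```python
from typing import List, Sequence


def split_for_worker(indices: Sequence[int], num_workers: int, worker_id: int) -> List[int]:
    """Return the contiguous chunk of indices assigned to one worker (assumed scheme)."""
    base, extra = divmod(len(indices), num_workers)
    # The first `extra` workers each take one additional index.
    start = worker_id * base + min(worker_id, extra)
    end = start + base + (1 if worker_id < extra else 0)
    return list(indices[start:end])


indices = list(range(10))
chunks = [split_for_worker(indices, num_workers=3, worker_id=w) for w in range(3)]
```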

pytorchvideo.data.utils.optional_threaded_foreach(target, args_iterable, multithreaded)[source]

Applies the ‘target’ function to each tuple of args in ‘args_iterable’. If ‘multithreaded’ is True, a thread is spawned for each function application.

Parameters
  • target (Callable) – A function that takes as input the parameters in each args_iterable Tuple.

  • args_iterable (Iterable[Tuple]) – An iterable of the tuples each containing a set of parameters to pass to target.

  • multithreaded (bool) – Whether or not the target applications are parallelized by thread.
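
The behavior described above can be approximated with the standard threading module. The function threaded_foreach_sketch is a hypothetical stand-in written for this example, assuming one thread per argument tuple and a join before returning.

```python
import threading
from typing import Callable, Dict, Iterable, Tuple


def threaded_foreach_sketch(target: Callable, args_iterable: Iterable[Tuple], multithreaded: bool) -> None:
    """Apply target to each args tuple, optionally with one thread per call."""
    if multithreaded:
        threads = [threading.Thread(target=target, args=args) for args in args_iterable]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()  # wait for every application to finish
    else:
        for args in args_iterable:
            target(*args)


results: Dict[str, int] = {}
lock = threading.Lock()


def store_square(key: str, value: int) -> None:
    with lock:  # guard the shared dict against concurrent writes
        results[key] = value * value


threaded_foreach_sketch(store_square, [("a", 2), ("b", 3)], multithreaded=True)
```

Because the target returns nothing to the caller, results are collected through shared state, which is why the example guards the dictionary with a lock.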

class pytorchvideo.data.utils.DataclassFieldCaster[source]

Class to allow subclasses wrapped in @dataclass to automatically cast fields to their relevant type by default.

Also allows for an arbitrary initialization function to be applied for a given field.

static complex_initialized_dataclass_field(field_initializer, **kwargs)[source]

Allows for the setting of a function to be called on the named parameter associated with a field during initialization, after __init__() completes.

Parameters
  • field_initializer (Callable) – The function to be called on the field

  • **kwargs – To be passed downstream to the dataclasses.field method

Returns

(dataclasses.Field) that contains the field_initializer and kwargs info

Return type

dataclasses.Field
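
The general idea of casting dataclass fields to their annotated types can be sketched with __post_init__. The mixin below, CastingMixin, is a hypothetical simplification that only handles plain types such as int, float and str; it does not reproduce the library class or its field initializer mechanism.

```python
from dataclasses import dataclass, fields


class CastingMixin:
    """Hypothetical mixin: cast each field to its annotated type after __init__."""

    def __post_init__(self) -> None:
        for field in fields(self):
            value = getattr(self, field.name)
            # Only handles simple callable types such as int, float, or str.
            if isinstance(field.type, type) and not isinstance(value, field.type):
                setattr(self, field.name, field.type(value))


@dataclass
class VideoRow(CastingMixin):
    video_id: str
    label: int
    duration: float


# Values read from a CSV arrive as strings and are cast on construction.
row = VideoRow("vid_001", "3", "12.5")
```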

pytorchvideo.data.utils.load_dataclass_dict_from_csv(input_csv_file_path, dataclass_class, dict_key_field, list_per_key=False)[source]
Parameters
  • input_csv_file_path (str) – File path of the csv to read from

  • dataclass_class (type) – The dataclass to read each row into.

  • dict_key_field (str) – The field of ‘dataclass_class’ to use as the dictionary key.

  • list_per_key (bool) – If the output data structure contains a list of dataclass objects per key, rather than a single unique dataclass object.

Returns

Dict[Any, Union[Any, List[Any]]] mapping from the dataclass value at attr = dict_key_field to either:

if ‘list_per_key’, a list of all dataclass objects that have equal values at attr = dict_key_field, equal to the key

if not ‘list_per_key’, the unique dataclass object for which the value at attr = dict_key_field is equal to the key

Raises
  • AssertionError – if not ‘list_per_key’ and there are dataclass objects with equal values at attr = dict_key_field

Return type

Dict[Any, Union[Any, List[Any]]]
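
The two output shapes described above can be illustrated with a small standalone sketch. It reads CSV text from memory rather than a file path, assumes a header row whose column names match the dataclass fields, and uses the hypothetical names Row and rows_to_dict; it is not the library's implementation.

```python
import csv
import io
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Row:
    video_id: str
    label: str


def rows_to_dict(csv_text: str, dataclass_class: type, dict_key_field: str,
                 list_per_key: bool = False) -> Dict[Any, Any]:
    """Build a dict of dataclass objects keyed by one of their fields."""
    out: Dict[Any, Any] = {}
    for record in csv.DictReader(io.StringIO(csv_text)):
        obj = dataclass_class(**record)
        key = getattr(obj, dict_key_field)
        if list_per_key:
            out.setdefault(key, []).append(obj)
        else:
            # Mirrors the documented AssertionError for duplicate keys.
            assert key not in out, f"duplicate key: {key}"
            out[key] = obj
    return out


csv_text = "video_id,label\nv1,cat\nv1,dog\nv2,cat\n"
grouped = rows_to_dict(csv_text, Row, "video_id", list_per_key=True)
```

With list_per_key left as False, the same input would trip the assertion because the key v1 appears twice.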

pytorchvideo.data.utils.save_dataclass_objs_to_headered_csv(dataclass_objs, file_name)[source]

Saves a list of @dataclass objects to the specified csv file.

Parameters
  • dataclass_objs (List[Any]) – A list of @dataclass objects to be saved.

  • file_name (str) – file_name to save csv data to.

Return type

None
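
A headered CSV can be produced from dataclass objects with csv.DictWriter, as sketched below. The example writes to an in-memory buffer instead of a file, assumes a non-empty list of objects of the same dataclass, and uses the hypothetical names Row and dataclass_objs_to_csv_text.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Any, List


@dataclass
class Row:
    video_id: str
    label: int


def dataclass_objs_to_csv_text(dataclass_objs: List[Any]) -> str:
    """Serialize dataclass objects to CSV text with a header row."""
    buffer = io.StringIO()
    # Field names of the first object define the header and column order.
    names = [field.name for field in fields(dataclass_objs[0])]
    writer = csv.DictWriter(buffer, fieldnames=names, lineterminator="\n")
    writer.writeheader()
    for obj in dataclass_objs:
        writer.writerow(asdict(obj))
    return buffer.getvalue()


csv_text = dataclass_objs_to_csv_text([Row("v1", 0), Row("v2", 1)])
```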