pytorchvideo.data.labeled_video_paths¶

class pytorchvideo.data.labeled_video_paths.LabeledVideoPaths(paths_and_labels, path_prefix='')[source]¶

LabeledVideoPaths contains pairs of video path and integer index label.

classmethod from_path(data_path)[source]¶

Factory function that creates a LabeledVideoPaths object depending on the path type:

- If it is a directory path, it uses the LabeledVideoPaths.from_directory function.
- If it is a file, it uses the LabeledVideoPaths.from_csv function.

Parameters: data_path (str) – The path to the file or directory to be read.
Return type: pytorchvideo.data.labeled_video_paths.LabeledVideoPaths

classmethod from_csv(file_path)[source]¶

Factory function that creates a LabeledVideoPaths object by reading a file with the following format:

<path> <integer_label>
…
<path> <integer_label>

Parameters: file_path (str) – The path to the file to be read.
Return type: pytorchvideo.data.labeled_video_paths.LabeledVideoPaths
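A minimal sketch of parsing the file format above, assuming one whitespace-separated pair per line. `parse_labeled_paths` is a hypothetical helper written for illustration; it is not part of PyTorchVideo and may differ from what from_csv does internally.

```python
from typing import List, Tuple


def parse_labeled_paths(text: str) -> List[Tuple[str, int]]:
    """Parse '<path> <integer_label>' lines into (path, label) tuples."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last run of whitespace so that paths may contain spaces.
        path, label = line.rsplit(None, 1)
        pairs.append((path, int(label)))
    return pairs


example = "videos/a.mp4 0\nvideos/b.mp4 1\n"
pairs = parse_labeled_paths(example)
```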

classmethod from_directory(dir_path)[source]¶

Factory function that creates a LabeledVideoPaths object by parsing the structure of the given directory’s subdirectories into the classification labels. It expects the directory format to be the following:

dir_path/<class_name>/<video_name>.mp4

Classes are indexed alphabetically, from 0 to the number of classes minus 1.

E.g.:

dir_path/class_x/xxx.ext
dir_path/class_x/xxy.ext
dir_path/class_x/xxz.ext
dir_path/class_y/123.ext
dir_path/class_y/nsdf3.ext
dir_path/class_y/asd932_.ext

Would produce two classes labeled 0 and 1, with 3 video paths associated with each.

Parameters: dir_path (str) – Root directory containing the video class directories.
Return type: pytorchvideo.data.labeled_video_paths.LabeledVideoPaths
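The alphabetical labeling rule described for from_directory can be sketched with the standard library alone. `label_paths_from_directory` is a hypothetical stand-in, not the library function, and the temporary directory layout is made up for the example.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def label_paths_from_directory(dir_path: str) -> List[Tuple[str, int]]:
    """Assign labels 0..N-1 to class subdirectories in alphabetical order."""
    root = Path(dir_path)
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    class_to_idx = {name: idx for idx, name in enumerate(class_names)}
    pairs = []
    for name in class_names:
        for video in sorted((root / name).iterdir()):
            pairs.append((str(video), class_to_idx[name]))
    return pairs


# Build a tiny example tree: two classes, three empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ["class_x/xxx.ext", "class_x/xxy.ext", "class_y/123.ext"]:
        target = Path(tmp) / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    labeled = label_paths_from_directory(tmp)

labels = [label for _, label in labeled]
```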

__init__(paths_and_labels, path_prefix='')[source]¶

Parameters

paths_and_labels (List[Tuple[str, Optional[int]]]) – a list of tuples containing the video path and integer label.

Return type

None

__getitem__(index)[source]¶

Parameters: index (int) – the path and label index.
Returns: The path and label tuple for the given index.
Return type: Tuple[str, int]

__len__()[source]¶

Returns: The number of video paths and label pairs.
Return type: int

pytorchvideo.data.frame_video¶

class pytorchvideo.data.frame_video.FrameVideo(duration, fps, video_frame_to_path_fn=None, video_frame_paths=None, multithreaded_io=False)[source]¶

FrameVideo is an abstraction for accessing clips based on their start and end time for a video where each frame is stored as an image. PathManager is used for frame image reading, allowing non-local URIs to be used.

__init__(duration, fps, video_frame_to_path_fn=None, video_frame_paths=None, multithreaded_io=False)[source]¶

Parameters

duration (float) – the duration of the video in seconds.
fps (float) – the target fps for the video. This is needed to link the frames to a second timestamp in the video.
video_frame_to_path_fn (Callable[[int], str]) – a function that maps from a frame index integer to the file path where the frame is located.
video_frame_paths (List[str]) – List of frame paths for each index of a video.
multithreaded_io (bool) – controls whether parallelizable io operations are performed across multiple threads.

Return type

None

classmethod from_frame_paths(video_frame_paths, fps=30.0, multithreaded_io=False)[source]¶

Parameters

video_frame_paths (List[str]) – a list of paths to each frame in the video.
fps (float) – the target fps for the video. This is needed to link the frames to a second timestamp in the video.
multithreaded_io (bool) – controls whether parallelizable io operations are performed across multiple threads.

property duration¶: Returns: duration: the video’s duration/end-time in seconds.

get_clip(start_sec, end_sec, frame_filter=None)[source]¶

Retrieves frames from the stored video at the specified start and end times in seconds (the video always starts at 0 seconds). Given that PathManager may be fetching the frames from network storage, to handle transient errors, frame reading is retried N times.

Parameters

start_sec (float) – the clip start time in seconds
end_sec (float) – the clip end time in seconds
frame_filter (Optional[Callable[[List[int]], List[int]]]) – function to subsample frames in a clip before loading. If None, no subsampling is performed.

Returns

clip_data –

A dictionary with the following keys:

"video": A tensor of the clip’s RGB frames with shape (channel, time, height, width). The frames are of type torch.float32 and in the range [0, 255]. Raises an exception if unable to load images.

"frame_indices": A list of indices for each frame relative to all frames in the video.

Returns None if no frames are found.

Return type

Dict[str, Optional[torch.Tensor]]
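To illustrate how start_sec, end_sec, fps and frame_filter interact, here is a minimal sketch of selecting frame indices for a clip. It assumes frame i is shown at time i / fps and that the clip covers the half-open interval [start_sec, end_sec); the library's exact rounding may differ. `clip_frame_indices` and `every_other` are hypothetical names.

```python
import math
from typing import Callable, List, Optional


def clip_frame_indices(
    start_sec: float,
    end_sec: float,
    fps: float,
    num_frames: int,
    frame_filter: Optional[Callable[[List[int]], List[int]]] = None,
) -> Optional[List[int]]:
    # Assumed mapping: frame i is shown at time i / fps.
    start = math.ceil(start_sec * fps)
    end = min(math.ceil(end_sec * fps), num_frames)
    indices = list(range(start, end))
    if frame_filter is not None:
        # Subsample before any image would be loaded.
        indices = frame_filter(indices)
    return indices or None  # None when no frames fall inside the clip


def every_other(indices: List[int]) -> List[int]:
    """Example frame_filter: keep every second frame."""
    return indices[::2]


selected = clip_frame_indices(1.0, 2.0, fps=4.0, num_frames=20, frame_filter=every_other)
```

A frame_filter of this shape lets a caller halve the number of images read from storage without changing the clip boundaries.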

pytorchvideo.data.clip_sampling¶

class pytorchvideo.data.clip_sampling.ClipInfo(clip_start_sec, clip_end_sec, clip_index, aug_index, is_last_clip)[source]¶

Named-tuple for clip information with:

clip_start_sec (float): clip start time.
clip_end_sec (float): clip end time.
clip_index (int): clip index in the video.
aug_index (int): augmentation index for the clip. Different augmentation methods might generate multiple views for the same clip.
is_last_clip (bool): a bool specifying whether there are more clips to be sampled from the video.

property clip_start_sec¶: Alias for field number 0

property clip_end_sec¶: Alias for field number 1

property clip_index¶: Alias for field number 2

property aug_index¶: Alias for field number 3

property is_last_clip¶: Alias for field number 4

class pytorchvideo.data.clip_sampling.ClipSampler(clip_duration)[source]¶: Interface for clip samplers, which take a video time and the previously sampled clip time, and return a named-tuple ClipInfo.

pytorchvideo.data.clip_sampling.make_clip_sampler(sampling_type, *args)[source]¶

Constructs the clip samplers found in this module from the given arguments.

Parameters

sampling_type (str) – chooses the clip sampler to return. It has two options:

uniform: constructs and returns UniformClipSampler

random: constructs and returns RandomClipSampler

*args – the args to pass to the chosen clip sampler constructor

Return type

pytorchvideo.data.clip_sampling.ClipSampler

class pytorchvideo.data.clip_sampling.UniformClipSampler(clip_duration)[source]¶

Evenly splits the video into clips of size clip_duration.

__call__(last_clip_time, video_duration)[source]¶

Parameters

last_clip_time (float) – the last clip end time sampled from this video. This should be 0.0 if the video hasn’t had clips sampled yet.
video_duration (float) – the duration of the video that’s being sampled in seconds

Returns

a named-tuple ClipInfo –

includes the clip information of (clip_start_sec, clip_end_sec, clip_index, aug_index, is_last_clip), where the times are in seconds and is_last_clip is False when there is still more time in the video to be sampled.

Return type

pytorchvideo.data.clip_sampling.ClipInfo
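The calling pattern for a uniform sampler can be sketched as a loop that feeds each clip's end time back in as last_clip_time. The code is a standalone illustration: `ClipInfo` is redefined locally, `uniform_clip` is a hypothetical function, and the rule used for is_last_clip (no further full clip fits) is an assumption rather than confirmed library behavior.

```python
from typing import List, NamedTuple


class ClipInfo(NamedTuple):
    clip_start_sec: float
    clip_end_sec: float
    clip_index: int
    aug_index: int
    is_last_clip: bool


def uniform_clip(
    last_clip_time: float, video_duration: float, clip_duration: float, clip_index: int
) -> ClipInfo:
    start = last_clip_time
    end = start + clip_duration
    # Assumed rule: this is the last clip when another full clip would not fit.
    is_last = end + clip_duration > video_duration
    return ClipInfo(start, end, clip_index, 0, is_last)


clips: List[ClipInfo] = []
last_end, index = 0.0, 0
while True:
    info = uniform_clip(last_end, video_duration=5.0, clip_duration=2.0, clip_index=index)
    clips.append(info)
    if info.is_last_clip:
        break
    last_end, index = info.clip_end_sec, index + 1
```

With a 5-second video and 2-second clips, this sketch yields two clips and leaves the final second unsampled.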

class pytorchvideo.data.clip_sampling.RandomClipSampler(clip_duration)[source]¶

Randomly samples a clip of size clip_duration from the video.

__call__(last_clip_time, video_duration)[source]¶

Parameters

last_clip_time (float) – Not used for RandomClipSampler.
video_duration (float) – the duration (in seconds) for the video that’s being sampled

Returns

a named-tuple ClipInfo –

includes the clip information of (clip_start_sec, clip_end_sec, clip_index, aug_index, is_last_clip). The times are in seconds. clip_index, aug_index and is_last_clip are always 0, 0 and True, respectively.

Return type

pytorchvideo.data.clip_sampling.ClipInfo

class pytorchvideo.data.clip_sampling.ConstantClipsPerVideoSampler(clip_duration, clips_per_video, augs_per_clip=1)[source]¶

Evenly splits the video into clips_per_video increments and samples clips of size clip_duration at these increments.

__call__(last_clip_time, video_duration)[source]¶

Parameters

last_clip_time (float) – Not used for ConstantClipsPerVideoSampler.
video_duration (float) – the duration (in seconds) for the video that’s being sampled.

Returns

a named-tuple ClipInfo –

includes the clip information of (clip_start_sec, clip_end_sec, clip_index, aug_index, is_last_clip). The times are in seconds. is_last_clip is True after clips_per_video clips have been sampled or the end of the video is reached.

Return type

pytorchvideo.data.clip_sampling.ClipInfo
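A rough sketch of the "evenly spaced increments" idea, assuming clip starts are spread over the range in which a full clip still fits. `constant_clip_windows` is a hypothetical function and the spacing formula is an assumption, not taken from the library.

```python
from typing import List, Tuple


def constant_clip_windows(
    video_duration: float, clip_duration: float, clips_per_video: int
) -> List[Tuple[float, float]]:
    # Assumed spacing: starts are spread evenly over [0, video_duration - clip_duration).
    max_start = max(video_duration - clip_duration, 0.0)
    step = max_start / clips_per_video
    return [(i * step, i * step + clip_duration) for i in range(clips_per_video)]


windows = constant_clip_windows(video_duration=10.0, clip_duration=2.0, clips_per_video=4)
```

Unlike the uniform sampler, the number of clips here is fixed up front, so short and long videos contribute the same number of samples.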

pytorchvideo.data.video¶

class pytorchvideo.data.video.Video(file, video_name=None, decode_audio=True)[source]¶

Video provides an interface to access clips from a video container.

classmethod from_path(file_path, decode_audio=True)[source]¶

Fetches the given video path using PathManager (allowing remote URIs to be fetched) and constructs the EncodedVideo object.

Parameters

file_path (str) – a PathManager file-path.
decode_audio (bool) –

abstract property duration¶: Returns: duration of the video in seconds

abstract get_clip(start_sec, end_sec)[source]¶

Retrieves frames from the internal video at the specified start and end times in seconds (the video always starts at 0 seconds).

Parameters

start_sec (float) – the clip start time in seconds
end_sec (float) – the clip end time in seconds

Returns

video_data_dictionary –

A dictionary mapping strings to tensors of the clip’s underlying data.

Return type

Dict[str, Optional[torch.Tensor]]

abstract __init__(file, video_name=None, decode_audio=True)[source]¶

Parameters

file (BinaryIO) – a file-like object (e.g. io.BytesIO or io.StringIO) that contains the encoded video.
video_name (Optional[str]) –
decode_audio (bool) –

Return type

None

pytorchvideo.data.utils¶

pytorchvideo.data.utils.thwc_to_cthw(data)[source]¶

Permute tensor from (time, height, width, channel) to (channel, time, height, width).

Parameters: data (torch.Tensor) –
Return type: torch.Tensor
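The axis reordering implied by the name thwc_to_cthw can be checked on a plain shape tuple, with no tensor library required. `permuted_shape` and `THWC_TO_CTHW` are hypothetical names; on a torch.Tensor the same order would presumably be passed to `permute`.

```python
from typing import Tuple

# Axis order that moves the channel axis to the front: (T, H, W, C) -> (C, T, H, W).
THWC_TO_CTHW = (3, 0, 1, 2)


def permuted_shape(shape: Tuple[int, ...], order: Tuple[int, ...]) -> Tuple[int, ...]:
    """Return the shape that results from reordering axes by `order`."""
    return tuple(shape[axis] for axis in order)


thwc_shape = (8, 224, 320, 3)  # 8 frames, 224x320 pixels, 3 channels
cthw_shape = permuted_shape(thwc_shape, THWC_TO_CTHW)
```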

pytorchvideo.data.utils.secs_to_pts(time_in_seconds, time_base, start_pts)[source]¶

Converts a time (in seconds) to the given time base and start_pts offset presentation time.

Returns

pts (float) – The time in the given time base.

Parameters

time_in_seconds (float) –
time_base (float) –
start_pts (float) –

Return type

float

pytorchvideo.data.utils.pts_to_secs(time_in_seconds, time_base, start_pts)[source]¶

Converts a presentation time with the given time base and start_pts offset to seconds.

Returns

time_in_seconds (float) – The corresponding time in seconds.

Parameters

time_in_seconds (float) –
time_base (float) –
start_pts (float) –

Return type

float
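The two conversions are inverses of each other. The sketch below assumes the usual relationship pts = seconds / time_base + start_pts; the function names carry a `_sketch` suffix to mark them as illustrations rather than the library functions.

```python
import math


def secs_to_pts_sketch(time_in_seconds: float, time_base: float, start_pts: float) -> float:
    # Assumed formula: scale seconds into time-base ticks, then add the stream offset.
    if time_in_seconds == math.inf:
        return math.inf
    return time_in_seconds / time_base + start_pts


def pts_to_secs_sketch(pts: float, time_base: float, start_pts: float) -> float:
    # Inverse of the above: remove the offset, then scale ticks back to seconds.
    if pts == math.inf:
        return math.inf
    return (pts - start_pts) * time_base


time_base = 1 / 1024  # hypothetical stream time base: 1024 ticks per second
pts = secs_to_pts_sketch(2.5, time_base, start_pts=512)
secs = pts_to_secs_sketch(pts, time_base, start_pts=512)
```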

class pytorchvideo.data.utils.MultiProcessSampler(*args, **kwds)[source]¶

MultiProcessSampler splits sample indices from a PyTorch Sampler evenly across workers spawned by a PyTorch DataLoader.

__iter__()[source]¶

Returns: Iterator for underlying PyTorch Sampler indices split by worker id.
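One way to split sampler indices evenly across workers is to hand each worker a contiguous, near-equal chunk. This is only a sketch of the idea: `split_indices` is hypothetical, and whether PyTorchVideo partitions contiguously or in some other pattern is an assumption here.

```python
from typing import List


def split_indices(indices: List[int], num_workers: int, worker_id: int) -> List[int]:
    # Near-even contiguous chunks; the first `extra` workers get one more index.
    base, extra = divmod(len(indices), num_workers)
    start = worker_id * base + min(worker_id, extra)
    end = start + base + (1 if worker_id < extra else 0)
    return indices[start:end]


shards = [split_indices(list(range(10)), num_workers=3, worker_id=w) for w in range(3)]
```

Every index lands in exactly one shard, so no video is read twice when several DataLoader workers iterate in parallel.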

pytorchvideo.data.utils.optional_threaded_foreach(target, args_iterable, multithreaded)[source]¶

Applies the ‘target’ function to each tuple of args in ‘args_iterable’. If ‘multithreaded’ is true, a thread is spawned for each function application.

Parameters

target (Callable) – A function that takes as input the parameters in each args_iterable Tuple.
args_iterable (Iterable[Tuple]) – An iterable of the tuples each containing a set of parameters to pass to target.
multithreaded (bool) – Whether or not the target applications are parallelized by thread.
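The behavior described above can be approximated with the standard threading module. `threaded_foreach_sketch` and `store_square` are hypothetical names; the real function may differ in how it manages threads.

```python
import threading
from typing import Callable, Dict, Iterable, Tuple


def threaded_foreach_sketch(
    target: Callable, args_iterable: Iterable[Tuple], multithreaded: bool
) -> None:
    if multithreaded:
        # One thread per argument tuple, as the description suggests.
        threads = [threading.Thread(target=target, args=args) for args in args_iterable]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()  # wait for every application to finish
    else:
        for args in args_iterable:
            target(*args)


results: Dict[str, int] = {}


def store_square(key: str, value: int) -> None:
    results[key] = value * value


threaded_foreach_sketch(store_square, [("a", 2), ("b", 3)], multithreaded=True)
```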

class pytorchvideo.data.utils.DataclassFieldCaster[source]¶

Class to allow subclasses wrapped in @dataclass to automatically cast fields to their relevant type by default.

Also allows for an arbitrary initialization function to be applied for a given field.

static complex_initialized_dataclass_field(field_initializer, **kwargs)[source]¶

Allows for the setting of a function to be called on the named parameter associated with a field during initialization, after __init__() completes.

Parameters

field_initializer (Callable) – The function to be called on the field
**kwargs – To be passed downstream to the dataclasses.field method

Returns

(dataclasses.Field) that contains the field_initializer and kwargs info

Return type

dataclasses.Field
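The automatic casting described for DataclassFieldCaster can be imitated with a `__post_init__` hook that converts each field to its annotated type. `CastingMixin` and `FrameRecord` are hypothetical; this sketch only handles plain types such as int, float and str, and does not cover complex_initialized_dataclass_field.

```python
from dataclasses import dataclass, fields


class CastingMixin:
    """Hypothetical stand-in: casts each field to its annotated type after __init__."""

    def __post_init__(self) -> None:
        for f in fields(self):
            value = getattr(self, f.name)
            # Only plain class annotations (int, float, str, ...) are handled here.
            if isinstance(f.type, type) and not isinstance(value, f.type):
                setattr(self, f.name, f.type(value))


@dataclass
class FrameRecord(CastingMixin):
    video_id: str
    frame_number: int
    timestamp: float


# Values as they might arrive from a CSV reader: mostly strings.
record = FrameRecord(video_id=7, frame_number="12", timestamp="0.4")
```

Casting at construction time is convenient when rows come from a CSV file, where every value starts out as a string.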

pytorchvideo.data.utils.load_dataclass_dict_from_csv(input_csv_file_path, dataclass_class, dict_key_field, list_per_key=False)[source]¶

Parameters

input_csv_file_path (str) – File path of the csv to read from
dataclass_class (type) – The dataclass to read each row into.
dict_key_field (str) – The field of ‘dataclass_class’ to use as the dictionary key.
list_per_key (bool) – If the output data structure contains a list of dataclass objects per key, rather than a single unique dataclass object.

Returns

Dict[Any, Union[Any, List[Any]]] mapping from the dataclass value at attr = dict_key_field to either:

if ‘list_per_key’, a list of all dataclass objects that have equal values at attr = dict_key_field, equal to the key

if not ‘list_per_key’, the unique dataclass object for which the value at attr = dict_key_field is equal to the key

Raises

AssertionError – if not ‘list_per_key’ and there are dataclass objects with equal values at attr = dict_key_field

Return type

Dict[Any, Union[Any, List[Any]]]
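A minimal sketch of the two output shapes, reading from an in-memory string instead of a file. It assumes a comma-delimited CSV with a header row whose column names match the dataclass fields; `Label` and `load_dict_from_csv_text` are hypothetical.

```python
import csv
import io
from dataclasses import dataclass
from typing import Any, Dict, List, Union


@dataclass
class Label:
    video_id: str
    label: str


def load_dict_from_csv_text(
    text: str, dataclass_class: type, dict_key_field: str, list_per_key: bool = False
) -> Dict[Any, Union[Any, List[Any]]]:
    out: Dict[Any, Union[Any, List[Any]]] = {}
    for row in csv.DictReader(io.StringIO(text)):
        obj = dataclass_class(**row)
        key = getattr(obj, dict_key_field)
        if list_per_key:
            out.setdefault(key, []).append(obj)
        else:
            # Mirrors the documented AssertionError on duplicate keys.
            assert key not in out, f"duplicate key: {key}"
            out[key] = obj
    return out


csv_text = "video_id,label\nv1,run\nv1,jump\nv2,walk\n"
grouped = load_dict_from_csv_text(csv_text, Label, "video_id", list_per_key=True)
```

Calling the same function with list_per_key=False on this input would raise AssertionError, because v1 appears twice.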

pytorchvideo.data.utils.save_dataclass_objs_to_headered_csv(dataclass_objs, file_name)[source]¶

Saves a list of @dataclass objects to the specified csv file.

Parameters

dataclass_objs (List[Any]) – A list of @dataclass objects to be saved.
file_name (str) – file_name to save csv data to.

Return type

None
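Writing dataclass objects to a CSV with a header row can be sketched with csv.DictWriter. The example writes to a string buffer rather than to file_name, and assumes the header consists of the dataclass field names; `Label` and `dataclasses_to_csv_text` are hypothetical.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Any, List


@dataclass
class Label:
    video_id: str
    label: str


def dataclasses_to_csv_text(dataclass_objs: List[Any]) -> str:
    buffer = io.StringIO()
    # Assumed header: the field names of the first object, in declaration order.
    fieldnames = [f.name for f in fields(dataclass_objs[0])]
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for obj in dataclass_objs:
        writer.writerow(asdict(obj))
    return buffer.getvalue()


csv_text = dataclasses_to_csv_text([Label("v1", "run"), Label("v2", "walk")])
```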