pytorchvideo.transforms.transforms

class pytorchvideo.transforms.transforms.ApplyTransformToKey(key, transform)

Applies transform to the value stored under key in a dictionary input; all other entries are passed through unchanged.

Parameters

key (str) – the dictionary key the transform is applied to
transform (callable) – the transform that is applied

Example

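>>> from pytorchvideo import transforms
>>> from pytorchvideo.transforms import UniformTemporalSubsample
>>> num_video_samples = 8  # number of frames to keep (illustrative value)
>>> transforms.ApplyTransformToKey(
...     key='video',
...     transform=UniformTemporalSubsample(num_video_samples),
... )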
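The dictionary-key behavior can be sketched in plain PyTorch. ApplyTransformToKeySketch is a hypothetical name, and this is a minimal illustration assuming the transform overwrites the value under key and returns the same dictionary; it is not the library's source.

```python
from typing import Callable, Dict

import torch


class ApplyTransformToKeySketch(torch.nn.Module):
    """Hypothetical stand-in: apply `transform` to x[key] and leave other keys alone."""

    def __init__(self, key: str, transform: Callable):
        super().__init__()
        self._key = key
        self._transform = transform

    def forward(self, x: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
        # Overwrite only the selected entry; the same dict object is returned.
        x[self._key] = self._transform(x[self._key])
        return x


sample = {"video": torch.ones(3, 8, 4, 4), "label": torch.tensor(2)}
out = ApplyTransformToKeySketch("video", lambda v: v * 2)(sample)
print(out["video"].max().item(), out["label"].item())  # 2.0 2
```

Because the wrapper only touches one key, a dataset sample can carry labels and metadata alongside the video without every transform needing to know about them.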

class pytorchvideo.transforms.transforms.UniformTemporalSubsample(num_samples)

nn.Module wrapper for pytorchvideo.transforms.functional.uniform_temporal_subsample.

class pytorchvideo.transforms.transforms.ShortSideScale(size)

nn.Module wrapper for pytorchvideo.transforms.functional.short_side_scale.

class pytorchvideo.transforms.transforms.RandomShortSideScale(min_size, max_size)

nn.Module wrapper for pytorchvideo.transforms.functional.short_side_scale. The size parameter is chosen randomly from the range [min_size, max_size].

class pytorchvideo.transforms.transforms.UniformCropVideo(size, video_key='video', aug_index_key='aug_index')

nn.Module wrapper for pytorchvideo.transforms.functional.uniform_crop.

pytorchvideo.transforms.functional

pytorchvideo.transforms.functional.uniform_temporal_subsample(x, num_samples, temporal_dim=1)

Uniformly subsamples num_samples indices from the temporal dimension of the video. When num_samples is larger than the size of the temporal dimension, frames are sampled by nearest-neighbor interpolation, so some frames are repeated.

Parameters

x (torch.Tensor) – A video tensor with more than one dimension, of any torch dtype (e.g. int, long, float, complex).
num_samples (int) – The number of equispaced samples to select.
temporal_dim (int) – The dimension along which to perform temporal subsampling.

Returns

An x-like Tensor with subsampled temporal dimension.

Return type

torch.Tensor
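A minimal sketch of the subsampling, assuming evenly spaced float positions from torch.linspace are truncated to integer frame indices and gathered with torch.index_select. The function name uniform_temporal_subsample_sketch is hypothetical, and the library's exact rounding may differ.

```python
import torch


def uniform_temporal_subsample_sketch(
    x: torch.Tensor, num_samples: int, temporal_dim: int = 1
) -> torch.Tensor:
    """Pick `num_samples` evenly spaced frames along `temporal_dim`."""
    t = x.shape[temporal_dim]
    assert num_samples > 0 and t > 0
    # Evenly spaced float positions in [0, t - 1], truncated to integer indices.
    # When num_samples > t, neighboring positions map to the same frame (repeats).
    indices = torch.linspace(0, t - 1, num_samples).clamp(0, t - 1).long()
    return torch.index_select(x, temporal_dim, indices)


video = torch.arange(10).view(1, 10, 1, 1)  # (C=1, T=10, H=1, W=1); frame i holds value i
print(uniform_temporal_subsample_sketch(video, 4).flatten().tolist())  # [0, 3, 6, 9]
```

Asking for 5 samples from a 3-frame clip with this sketch gives frame indices [0, 0, 1, 1, 2], which is the repetition the description above refers to.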

pytorchvideo.transforms.functional.short_side_scale(x, size, interpolation='bilinear', backend='pytorch')

Determines the shorter spatial dimension of the video (i.e. width or height) and scales it to the given size. To maintain the aspect ratio, the longer side is scaled by the same factor.

Parameters

x (torch.Tensor) – A video tensor of shape (C, T, H, W) and type torch.float32.
size (int) – The size the shorter side is scaled to.
interpolation (str) – Algorithm used for resampling; options: ‘nearest’ | ‘linear’ | ‘bilinear’ | ‘bicubic’ | ‘trilinear’ | ‘area’.
backend (str) – Backend used to perform interpolation. Options are pytorch (the default) and opencv. Note that OpenCV and PyTorch behave differently for linear interpolation in some versions; see https://discuss.pytorch.org/t/pytorch-linear-interpolation-is-different-from-pil-opencv/71181

Returns

An x-like Tensor with scaled spatial dims.

Return type

torch.Tensor
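The size arithmetic and the pytorch backend can be sketched as follows. This hypothetical short_side_scale_sketch assumes the new long side is floored to an integer and that resizing is done with torch.nn.functional.interpolate, treating C as the batch dimension and T as channels; the opencv backend is not covered.

```python
import math

import torch
import torch.nn.functional as F


def short_side_scale_sketch(
    x: torch.Tensor, size: int, interpolation: str = "bilinear"
) -> torch.Tensor:
    """Scale the shorter of H and W in a (C, T, H, W) float video to `size`."""
    assert x.dim() == 4
    _, _, h, w = x.shape
    if w < h:
        new_h, new_w = int(math.floor(float(h) / w * size)), size
    else:
        new_h, new_w = size, int(math.floor(float(w) / h * size))
    # align_corners may only be passed for interpolating modes such as bilinear/bicubic.
    align_corners = False if interpolation in ("bilinear", "bicubic") else None
    # interpolate reads (C, T, H, W) as (N, C, H, W), so only H and W are resized.
    return F.interpolate(
        x, size=(new_h, new_w), mode=interpolation, align_corners=align_corners
    )


video = torch.rand(3, 8, 240, 320)  # (C, T, H, W)
print(tuple(short_side_scale_sketch(video, 112).shape))  # (3, 8, 112, 149)
```

For the 240x320 example, the short side (H) becomes 112 and the long side becomes floor(320 / 240 * 112) = floor(149.33) = 149.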

pytorchvideo.transforms.functional.repeat_temporal_frames_subsample(frames, frame_ratios, temporal_dim=1)

Prepares the output as a list of tensors subsampled from the input frames. Each tensor holds its own copy of the subsampled frames and corresponds to one pathway.

Parameters

frames (tensor) – Frames of images sampled from the video. Expected to be a torch tensor of any dtype (e.g. int, long, float, complex) with more than one dimension.
frame_ratios (tuple) – The temporal downsampling ratio for each pathway.
temporal_dim (int) – The temporal dimension of frames.

Returns

frame_list (tuple) – The subsampled tensors, one per pathway.

Return type

Tuple[torch.Tensor]
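A sketch of the per-pathway subsampling, assuming each pathway keeps T // ratio evenly spaced frames, where T is the length of the temporal dimension. The function name and that rule are assumptions for illustration. torch.index_select returns a new tensor, so each pathway holds its own copy of its frames.

```python
from typing import List, Tuple

import torch


def repeat_temporal_frames_subsample_sketch(
    frames: torch.Tensor, frame_ratios: Tuple[int, ...], temporal_dim: int = 1
) -> List[torch.Tensor]:
    """Return one temporally subsampled copy of `frames` per entry in `frame_ratios`."""
    t = frames.shape[temporal_dim]
    pathways = []
    for ratio in frame_ratios:
        # Assumed rule: keep t // ratio evenly spaced frames for this pathway.
        indices = torch.linspace(0, t - 1, t // ratio).clamp(0, t - 1).long()
        # index_select allocates a new tensor, so each pathway owns its frames.
        pathways.append(torch.index_select(frames, temporal_dim, indices))
    return pathways


frames = torch.rand(3, 32, 8, 8)  # (C, T, H, W)
sparse, dense = repeat_temporal_frames_subsample_sketch(frames, (8, 1))
print(sparse.shape[1], dense.shape[1])  # 4 32
```

With frame_ratios=(8, 1) and 32 input frames, this sketch yields a 4-frame pathway and a full 32-frame pathway.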

pytorchvideo.transforms.functional.convert_to_one_hot(targets, num_class)

This function converts target class indices to one-hot vectors, given the number of classes.

Parameters

targets (torch.Tensor) – The target class indices.
num_class (int) – The number of classes.

Return type

torch.Tensor
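A minimal stand-in using torch.nn.functional.one_hot, assuming targets is a 1-D tensor of integer class indices. The helper name convert_to_one_hot_sketch is hypothetical, and the library's own implementation may expect a different targets shape or return a different dtype.

```python
import torch
import torch.nn.functional as F


def convert_to_one_hot_sketch(targets: torch.Tensor, num_class: int) -> torch.Tensor:
    """Map a 1-D tensor of class indices to an (N, num_class) one-hot tensor."""
    # Every index must be a valid class id.
    assert torch.max(targets).item() < num_class, "class index must be < num_class"
    return F.one_hot(targets.long(), num_classes=num_class)


print(convert_to_one_hot_sketch(torch.tensor([0, 2, 1]), 3).tolist())
# [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
```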

pytorchvideo.transforms.functional.uniform_crop(frames, size, spatial_idx=1)

Performs uniform spatial sampling on the frames based on the three-crop setting: if the width is larger than the height, take the left, center, or right crop; if the height is larger than the width, take the top, center, or bottom crop.

Parameters

frames (tensor) – A video tensor of shape (C, T, H, W) to perform uniform crop.
size (int) – Desired height and width of the cropped frames.
spatial_idx (int) – 0, 1, or 2 for left, center, and right crop if width is larger than height. Or 0, 1, or 2 for top, center, and bottom crop if height is larger than width.

Returns

cropped (tensor) – A cropped video tensor of shape (C, T, size, size).

Return type

torch.Tensor
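A sketch of the three-crop offset logic, assuming the center crop uses ceil((dim - size) / 2) as its offset and the edge crops start at 0 or at dim - size. The name uniform_crop_sketch is hypothetical, and this is an illustration rather than the library's source.

```python
import math

import torch


def uniform_crop_sketch(frames: torch.Tensor, size: int, spatial_idx: int = 1) -> torch.Tensor:
    """Take one of three square crops from a (C, T, H, W) video."""
    assert spatial_idx in (0, 1, 2)
    height, width = frames.shape[2], frames.shape[3]
    # Center offsets by default (spatial_idx == 1).
    y_offset = int(math.ceil((height - size) / 2))
    x_offset = int(math.ceil((width - size) / 2))
    if height > width:
        # Portrait: 0 = top, 2 = bottom.
        if spatial_idx == 0:
            y_offset = 0
        elif spatial_idx == 2:
            y_offset = height - size
    else:
        # Landscape (or square): 0 = left, 2 = right.
        if spatial_idx == 0:
            x_offset = 0
        elif spatial_idx == 2:
            x_offset = width - size
    return frames[:, :, y_offset : y_offset + size, x_offset : x_offset + size]


# Each pixel stores its column index, so a crop's first row shows which columns were kept.
frames = torch.arange(10).view(1, 1, 1, 10).repeat(3, 8, 4, 1)  # (C=3, T=8, H=4, W=10)
print([uniform_crop_sketch(frames, 4, i)[0, 0, 0].tolist() for i in range(3)])
# [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

Averaging predictions over spatial_idx 0, 1, and 2 is the usual way such three-crop sampling is used at test time.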