pytorchvideo.transforms.transforms
class pytorchvideo.transforms.transforms.ApplyTransformToKey(key, transform)
Applies transform to the value stored under key in a dictionary input.
- Parameters
key (str) – the dictionary key the transform is applied to
transform (callable) – the transform that is applied
Example
>>> transforms.ApplyTransformToKey(
...     key='video',
...     transform=UniformTemporalSubsample(num_video_samples),
... )
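The class applies transform to one entry of a sample dictionary and leaves the other entries alone. A minimal sketch of that behavior, assuming the transform's output simply replaces the value at key (ApplyToKeySketch is a hypothetical name, not the library class):

```python
from typing import Any, Callable, Dict

import torch
from torch import nn


class ApplyToKeySketch(nn.Module):
    """Hypothetical stand-in for ApplyTransformToKey: transforms only x[key]."""

    def __init__(self, key: str, transform: Callable[[Any], Any]):
        super().__init__()
        self._key = key
        self._transform = transform

    def forward(self, x: Dict[str, Any]) -> Dict[str, Any]:
        # Replace the selected entry in place; all other keys pass through unchanged.
        x[self._key] = self._transform(x[self._key])
        return x


sample = {"video": torch.zeros(3, 8, 4, 4), "label": 5}
out = ApplyToKeySketch(key="video", transform=lambda v: v + 1.0)(sample)
```

Note that this sketch mutates and returns the same dictionary it receives.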
class pytorchvideo.transforms.transforms.UniformTemporalSubsample(num_samples)
nn.Module wrapper for pytorchvideo.transforms.functional.uniform_temporal_subsample.
class pytorchvideo.transforms.transforms.ShortSideScale(size)
nn.Module wrapper for pytorchvideo.transforms.functional.short_side_scale.
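The two classes above wrap functional transforms as nn.Module objects so they compose like any other module. A minimal sketch of that wrapper pattern, assuming the wrapper stores its constructor argument and forwards it to the functional in forward(); TemporalSubsampleWrapperSketch and subsample_fn are hypothetical names, and subsample_fn is a simplified stand-in rather than the library's code:

```python
import torch
from torch import nn


def subsample_fn(x: torch.Tensor, num_samples: int, temporal_dim: int = 1) -> torch.Tensor:
    # Hypothetical stand-in for a functional transform: pick evenly spaced frames.
    t = x.shape[temporal_dim]
    idx = torch.linspace(0, t - 1, num_samples).long()
    return torch.index_select(x, temporal_dim, idx)


class TemporalSubsampleWrapperSketch(nn.Module):
    """Hypothetical nn.Module wrapper: configuration in __init__, work in forward()."""

    def __init__(self, num_samples: int):
        super().__init__()
        self._num_samples = num_samples

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Delegate to the functional with the stored argument.
        return subsample_fn(x, self._num_samples)


# As an nn.Module, the wrapper composes with other modules, e.g. in nn.Sequential.
pipeline = nn.Sequential(TemporalSubsampleWrapperSketch(num_samples=8))
video = torch.rand(3, 32, 16, 16)  # (C, T, H, W)
out = pipeline(video)
```

ShortSideScale follows the same pattern, storing size and forwarding it to short_side_scale.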
pytorchvideo.transforms.functional
pytorchvideo.transforms.functional.uniform_temporal_subsample(x, num_samples, temporal_dim=1)
Uniformly subsamples num_samples indices from the temporal dimension of the video. When num_samples is larger than the size of the video's temporal dimension, frames are sampled using nearest-neighbor interpolation, so some frames are repeated.
- Parameters
x (torch.Tensor) – A video tensor with more than one dimension, of any torch dtype (int, long, float, complex, etc.).
num_samples (int) – The number of equispaced samples to select.
temporal_dim (int) – The temporal dimension along which to subsample.
- Returns
An x-like Tensor with subsampled temporal dimension.
- Return type
torch.Tensor
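The subsampling described above can be sketched as follows, assuming evenly spaced float indices from torch.linspace that are truncated to integers; uniform_temporal_subsample_sketch is a hypothetical name. When num_samples exceeds the number of frames, indices repeat, which duplicates frames.

```python
import torch


def uniform_temporal_subsample_sketch(
    x: torch.Tensor, num_samples: int, temporal_dim: int = 1
) -> torch.Tensor:
    """Hypothetical sketch: select num_samples evenly spaced frames along temporal_dim."""
    t = x.shape[temporal_dim]
    assert num_samples > 0 and t > 0
    # Evenly spaced float positions from the first to the last frame, inclusive.
    indices = torch.linspace(0, t - 1, num_samples)
    # Clamp guards against floating-point drift; .long() truncates toward zero.
    # If num_samples > t, some indices repeat, so frames are duplicated.
    indices = torch.clamp(indices, 0, t - 1).long()
    return torch.index_select(x, temporal_dim, indices)


clip = torch.rand(3, 16, 8, 8)  # (C, T, H, W)
subsampled = uniform_temporal_subsample_sketch(clip, 4)
```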
pytorchvideo.transforms.functional.short_side_scale(x, size, interpolation='bilinear', backend='pytorch')
Determines the shorter spatial dimension of the video (i.e., width or height) and scales it to the given size. To maintain the aspect ratio, the longer side is scaled proportionally.
- Parameters
x (torch.Tensor) – A video tensor of shape (C, T, H, W) and type torch.float32.
size (int) – The size the shorter side is scaled to.
interpolation (str) – Algorithm used for upsampling. Options: ‘nearest’ | ‘linear’ | ‘bilinear’ | ‘bicubic’ | ‘trilinear’ | ‘area’.
backend (str) – Backend used to perform interpolation. Options are ‘pytorch’ (default) and ‘opencv’. Note that in some versions OpenCV and PyTorch produce different results for linear interpolation; see https://discuss.pytorch.org/t/pytorch-linear-interpolation-is-different-from-pil-opencv/71181
- Returns
An x-like Tensor with scaled spatial dims.
- Return type
torch.Tensor
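A minimal sketch of the pytorch-backend path, assuming the long side's new length is floored and the resize is done with torch.nn.functional.interpolate; short_side_scale_sketch is hypothetical and the opencv backend is not covered. Because the input is 4-D, only the 4-D-compatible modes (‘nearest’, ‘bilinear’, ‘bicubic’, ‘area’) work in this sketch.

```python
import math

import torch
import torch.nn.functional as F


def short_side_scale_sketch(
    x: torch.Tensor, size: int, interpolation: str = "bilinear"
) -> torch.Tensor:
    """Hypothetical short-side scaling for a (C, T, H, W) float32 video."""
    assert x.dim() == 4 and x.dtype == torch.float32
    _, _, h, w = x.shape
    if w < h:
        # Width is the short side: it becomes `size`, height keeps the aspect ratio.
        new_h, new_w = int(math.floor(h / w * size)), size
    else:
        new_h, new_w = size, int(math.floor(w / h * size))
    # interpolate() treats dim 0 as batch and dim 1 as channels, so each (c, t)
    # frame is resized independently over its last two (spatial) dims.
    # align_corners may only be set for the linear-family modes.
    align = False if interpolation in ("bilinear", "bicubic") else None
    return F.interpolate(x, size=(new_h, new_w), mode=interpolation, align_corners=align)


landscape = short_side_scale_sketch(torch.rand(3, 4, 100, 200), 50)
portrait = short_side_scale_sketch(torch.rand(3, 4, 200, 100), 50)
```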
pytorchvideo.transforms.functional.repeat_temporal_frames_subsample(frames, frame_ratios, temporal_dim=1)
Prepares the output as a list of tensors subsampled from the input frames. Each tensor maintains its own copy of the subsampled frames and corresponds to a unique pathway.
- Parameters
- Returns
frame_list (tuple) – The output tensors, one per pathway.
- Return type
Tuple[torch.Tensor]
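The description leaves the per-pathway frame count implicit. One plausible reading, assumed here and not confirmed as the library's exact rule, is a SlowFast-style scheme where a ratio r keeps T // r uniformly spaced frames; repeat_temporal_frames_subsample_sketch is a hypothetical name.

```python
from typing import Sequence, Tuple

import torch


def repeat_temporal_frames_subsample_sketch(
    frames: torch.Tensor, frame_ratios: Sequence[int], temporal_dim: int = 1
) -> Tuple[torch.Tensor, ...]:
    """Hypothetical multi-pathway subsampling: one tensor per ratio."""
    t = frames.shape[temporal_dim]
    outputs = []
    for ratio in frame_ratios:
        # Assumption: ratio r keeps T // r evenly spaced frames.
        num_samples = t // ratio
        assert num_samples > 0, "ratio larger than the number of frames"
        idx = torch.linspace(0, t - 1, num_samples).long().clamp(0, t - 1)
        # index_select copies, so each pathway owns its own tensor.
        outputs.append(torch.index_select(frames, temporal_dim, idx))
    return tuple(outputs)


frames = torch.zeros(3, 32, 8, 8)  # (C, T, H, W)
slow, fast = repeat_temporal_frames_subsample_sketch(frames, (4, 1))
```

With ratios (4, 1), a 32-frame clip yields an 8-frame pathway and a 32-frame pathway.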
pytorchvideo.transforms.functional.convert_to_one_hot(targets, num_class)
This function converts target class indices to one-hot vectors, given the number of classes.
- Parameters
targets (torch.Tensor) – Target class indices.
num_class (int) – The total number of classes.
- Return type
torch.Tensor
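For a 1-D tensor of class indices, the conversion can be sketched with torch.nn.functional.one_hot. This is a swapped-in call that the library may not use internally, and convert_to_one_hot_sketch is a hypothetical name.

```python
import torch
import torch.nn.functional as F


def convert_to_one_hot_sketch(targets: torch.Tensor, num_class: int) -> torch.Tensor:
    """Hypothetical: (N,) class indices -> (N, num_class) one-hot int64 tensor."""
    assert targets.dim() == 1, "expects a 1-D tensor of class indices"
    assert int(targets.max()) < num_class, "class index must be < num_class"
    return F.one_hot(targets.long(), num_classes=num_class)


one_hot = convert_to_one_hot_sketch(torch.tensor([0, 2, 1]), 3)
```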
pytorchvideo.transforms.functional.uniform_crop(frames, size, spatial_idx=1)
Performs uniform spatial sampling on the frames based on the three-crop setting. If the width is larger than the height, takes the left, center, or right crop; if the height is larger than the width, takes the top, center, or bottom crop. The crop is selected by spatial_idx.
- Parameters
frames (tensor) – A video tensor of shape (C, T, H, W) to crop.
size (int) – Desired height and width of the crop.
spatial_idx (int) – 0, 1, or 2 for the left, center, or right crop if the width is larger than the height; 0, 1, or 2 for the top, center, or bottom crop if the height is larger than the width.
- Returns
cropped (tensor) – A cropped video tensor of shape (C, T, size, size).
- Return type
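The three-crop selection can be sketched with plain slicing, assuming the center-crop offset is rounded up when the margin is odd; uniform_crop_sketch is a hypothetical name.

```python
import math

import torch


def uniform_crop_sketch(frames: torch.Tensor, size: int, spatial_idx: int = 1) -> torch.Tensor:
    """Hypothetical three-crop for a (C, T, H, W) tensor; returns (C, T, size, size)."""
    assert spatial_idx in (0, 1, 2)
    height, width = frames.shape[2], frames.shape[3]
    assert size <= height and size <= width
    # Default: center crop along both spatial axes.
    y_offset = int(math.ceil((height - size) / 2))
    x_offset = int(math.ceil((width - size) / 2))
    if height > width:
        # Portrait: move the crop along the height (top / center / bottom).
        if spatial_idx == 0:
            y_offset = 0
        elif spatial_idx == 2:
            y_offset = height - size
    else:
        # Landscape (or square): move the crop along the width (left / center / right).
        if spatial_idx == 0:
            x_offset = 0
        elif spatial_idx == 2:
            x_offset = width - size
    return frames[:, :, y_offset : y_offset + size, x_offset : x_offset + size]


# Each row holds the column index 0..9, which makes the crop position easy to see.
frames = torch.arange(10).repeat(4, 1).view(1, 1, 4, 10)
left = uniform_crop_sketch(frames, 4, spatial_idx=0)
```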