
pytorchvideo.data.domsev

class pytorchvideo.data.domsev.ActivityData(video_id, start_time, stop_time, start_frame, stop_frame, activity_id, activity_name)

Class representing a contiguous activity video segment from the DoMSEV dataset.
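To make the fields concrete, here is a minimal stand-in written as a Python dataclass with the same constructor arguments. It is not the library's class: the field types, and the assumption that times are in seconds while frames are frame indices, are inferred from the parameter names.

```python
from dataclasses import dataclass


@dataclass
class ActivityData:
    # Stand-in mirroring the documented constructor arguments; the field types
    # are assumptions, not taken from the library.
    video_id: str
    start_time: float  # assumed: segment start, in seconds
    stop_time: float  # assumed: segment stop, in seconds
    start_frame: int  # assumed: first frame index of the segment
    stop_frame: int  # assumed: last frame index of the segment
    activity_id: int
    activity_name: str


# Hypothetical segment: frames 45 to 120 at 30 fps span 1.5 s to 4.0 s.
segment = ActivityData(
    video_id="example_video",
    start_time=1.5,
    stop_time=4.0,
    start_frame=45,
    stop_frame=120,
    activity_id=0,
    activity_name="example_activity",
)
print(segment.stop_time - segment.start_time)  # 2.5
```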

pytorchvideo.data.domsev.seconds_to_frame_index(time_in_seconds, fps, zero_indexed=True)

Converts a point in time (in seconds) within a video clip to the corresponding frame index, rounding down, based on the specified frame rate.

Parameters
  • time_in_seconds (float) – The point in time within the video.

  • fps (int) – The frame rate (frames per second) of the video.

  • zero_indexed (Optional[bool]) – Whether the returned frame index should be zero-indexed (True, the default) or one-indexed (False).

Returns

(int) The corresponding frame index, rounded down to the nearest integer.

Return type

int
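A minimal, standalone sketch of the documented behavior, assuming the index is computed as floor(time_in_seconds * fps) and that one-indexing simply adds 1. This is a re-implementation for illustration, not the library's source.

```python
import math


def seconds_to_frame_index(time_in_seconds: float, fps: int, zero_indexed: bool = True) -> int:
    # Assumption: the frame index is floor(time * fps), i.e. rounding down.
    frame_index = math.floor(time_in_seconds * fps)
    # One-indexed frames are shifted by one relative to zero-indexed frames.
    if not zero_indexed:
        frame_index += 1
    return frame_index


# At 30 fps, 1.5 s falls in zero-indexed frame 45 (one-indexed frame 46).
print(seconds_to_frame_index(1.5, 30))  # 45
print(seconds_to_frame_index(1.5, 30, zero_indexed=False))  # 46
print(seconds_to_frame_index(0.999, 30))  # 29 (29.97 rounds down)
```

Rounding down means every instant inside a frame's time span maps to that frame, rather than to whichever frame boundary happens to be closer.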

pytorchvideo.data.domsev.frame_index_to_seconds(frame_index, fps, zero_indexed=True)

Converts a frame index within a video clip to the corresponding point in time (in seconds) within the video, based on a specified frame rate.

Parameters
  • frame_index (int) – The index of the frame within the video.

  • fps (int) – The frame rate (frames per second) of the video.

  • zero_indexed (Optional[bool]) – Whether the specified frame index is zero-indexed (True, the default) or one-indexed (False).

Returns

(float) The point in time within the video.

Return type

float
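The inverse conversion can be sketched the same way, assuming a one-indexed frame is first shifted to zero-indexed and then divided by the frame rate. Under that assumption the result is the timestamp at which the frame begins. Again, this is an illustrative re-implementation, not the library's source.

```python
def frame_index_to_seconds(frame_index: int, fps: int, zero_indexed: bool = True) -> float:
    # Assumption: convert a one-indexed frame to zero-indexed before dividing.
    if not zero_indexed:
        frame_index -= 1
    # Zero-indexed frame i starts at i / fps seconds.
    return frame_index / fps


print(frame_index_to_seconds(45, 30))  # 1.5
print(frame_index_to_seconds(46, 30, zero_indexed=False))  # 1.5
```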

pytorchvideo.data.domsev.get_overlap_for_time_range_pair(t1_start, t1_stop, t2_start, t2_stop)

Calculates the overlap between two time ranges, if one exists.

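Parameters
  • t1_start (float) – The start time of the first time range.

  • t1_stop (float) – The stop time of the first time range.

  • t2_start (float) – The start time of the second time range.

  • t2_stop (float) – The stop time of the second time range.

Returns

(Optional[Tuple]) A tuple of <overlap_start_time, overlap_stop_time> if an overlap is found, or None otherwise.

Return type

Optional[Tuple[float, float]]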
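A minimal sketch of the interval-overlap check, written as a standalone re-implementation rather than the library's code. It assumes both ranges are closed intervals, so two ranges that merely touch yield a zero-length overlap instead of None; the documentation above does not specify that edge case.

```python
from typing import Optional, Tuple


def get_overlap_for_time_range_pair(
    t1_start: float, t1_stop: float, t2_start: float, t2_stop: float
) -> Optional[Tuple[float, float]]:
    # Two ranges overlap when each one starts no later than the other stops.
    # Assumption: closed intervals, so touching ranges count as overlapping.
    if t1_start <= t2_stop and t2_start <= t1_stop:
        # The overlap starts at the later start and ends at the earlier stop.
        return (max(t1_start, t2_start), min(t1_stop, t2_stop))
    return None


print(get_overlap_for_time_range_pair(0.0, 5.0, 3.0, 8.0))  # (3.0, 5.0)
print(get_overlap_for_time_range_pair(0.0, 2.0, 3.0, 8.0))  # None
```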

class pytorchvideo.data.domsev.DomsevDataset(*args, **kwds)

Egocentric activity classification video dataset for DoMSEV, stored as encoded videos with frame-level labels. <https://www.verlab.dcc.ufmg.br/semantic-hyperlapse/cvpr2018-dataset/>

This dataset handles the loading, decoding, and configurable clip sampling for the videos.

__getitem__(index)

Samples a video clip associated with the given index.

Parameters

index (int) – The index of the video clip.

Returns

A video clip with the following format if transform is None:

{
  'video_id': <str>,
  'video': <video_tensor>,
  'audio': <audio_tensor>,
  'activities': <activities_tensor>,
  'start_time': <float>,
  'stop_time': <float>
}

Otherwise, the transform defines the clip output.

Return type

Dict[str, Any]
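When transform is None, each item is a plain dict, so downstream code reads it by key. A minimal sketch of a custom transform, assuming the transform is a callable that receives this dict and returns the new clip output; `to_video_label_pair` is a hypothetical name, and the string placeholders stand in for the real tensors so the example runs without decoding any video.

```python
from typing import Any, Dict, Tuple


def to_video_label_pair(clip: Dict[str, Any]) -> Tuple[Any, Any]:
    # Hypothetical transform: keep only the decoded video and its activity labels,
    # dropping video_id, audio, start_time, and stop_time.
    return clip["video"], clip["activities"]


# Placeholder sample in the documented default format; real values would be tensors.
sample = {
    "video_id": "example_video",
    "video": "<video_tensor>",
    "audio": "<audio_tensor>",
    "activities": "<activities_tensor>",
    "start_time": 0.0,
    "stop_time": 2.0,
}
video, activities = to_video_label_pair(sample)
print(video, activities)  # <video_tensor> <activities_tensor>
```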

__len__()

Returns

The number of video clips in the dataset.

Return type

int