pytorchvideo.data.domsev

class pytorchvideo.data.domsev.ActivityData(video_id, start_time, stop_time, start_frame, stop_frame, activity_id, activity_name)

Class representing a contiguous activity video segment from the DoMSEV dataset.
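ActivityData is described here only by its constructor fields. As a mental model, a minimal sketch is a plain dataclass holding those fields. The field types (str, float, int) and the comments on each field are assumptions inferred from the parameter names, not taken from the library.

```python
from dataclasses import dataclass


@dataclass
class ActivityData:
    """Hypothetical stand-in mirroring the documented constructor fields."""

    video_id: str       # assumed: identifier of the source video
    start_time: float   # assumed: segment start, in seconds
    stop_time: float    # assumed: segment end, in seconds
    start_frame: int    # assumed: first frame index of the segment
    stop_frame: int     # assumed: last frame index of the segment
    activity_id: int    # assumed: integer class label
    activity_name: str  # assumed: human-readable class name


# Illustrative values only; not taken from the DoMSEV annotations.
segment = ActivityData("video_0", 2.0, 5.5, 60, 165, 3, "walking")
```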

pytorchvideo.data.domsev.seconds_to_frame_index(time_in_seconds, fps, zero_indexed=True)

Converts a point in time (in seconds) within a video clip to its closest frame index (rounding down), based on a specified frame rate.

Parameters

time_in_seconds (float) – The point in time within the video.
fps (int) – The frame rate (frames per second) of the video.
zero_indexed (Optional[bool]) – Whether the returned frame should be zero-indexed (if True) or one-indexed (if False).

Returns

(int) The frame index, rounded down to the nearest integer.

Return type

int
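The conversion described above amounts to multiplying by the frame rate and taking the floor. The following is a minimal sketch assuming exactly that behavior, plus a +1 shift for one-indexed output; it is an illustration, not the library's source.

```python
import math


def seconds_to_frame_index(time_in_seconds: float, fps: int, zero_indexed: bool = True) -> int:
    # Scale seconds to frames, then round down (floor) as the description states.
    frame_index = math.floor(time_in_seconds * fps)
    # Shift by one when the caller asks for one-indexed frames.
    if not zero_indexed:
        frame_index += 1
    return frame_index
```

For example, at 30 fps the time 2.5 s maps to frame 75 (zero-indexed) or 76 (one-indexed), and 1.99 s rounds down to frame 59.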

pytorchvideo.data.domsev.frame_index_to_seconds(frame_index, fps, zero_indexed=True)

Converts a frame index within a video clip to the corresponding point in time (in seconds) within the video, based on a specified frame rate.

Parameters

frame_index (int) – The index of the frame within the video.
fps (int) – The frame rate (frames per second) of the video.
zero_indexed (Optional[bool]) – Whether the specified frame is zero-indexed (if True) or one-indexed (if False).

Returns

(float) The point in time within the video.

Return type

float
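The inverse direction can be sketched as dividing the frame index by the frame rate, after first converting a one-indexed frame to zero-indexed. This assumes frame i (zero-indexed) starts at i / fps seconds; it is a sketch, not the library's source.

```python
def frame_index_to_seconds(frame_index: int, fps: int, zero_indexed: bool = True) -> float:
    # Convert a one-indexed frame to zero-indexed before dividing.
    if not zero_indexed:
        frame_index -= 1
    # Under the stated assumption, frame i starts at i / fps seconds.
    return frame_index / fps
```

At 30 fps, zero-indexed frame 75 and one-indexed frame 76 both map to 2.5 s.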

pytorchvideo.data.domsev.get_overlap_for_time_range_pair(t1_start, t1_stop, t2_start, t2_stop)

Calculates the overlap between two time ranges, if one exists.

Returns

(Optional[Tuple]) A tuple of <overlap_start_time, overlap_stop_time> if an overlap is found, or None otherwise.

Parameters

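t1_start (float) – Start time of the first range.
t1_stop (float) – Stop time of the first range.
t2_start (float) – Start time of the second range.
t2_stop (float) – Stop time of the second range.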

Return type

Optional[Tuple[float, float]]
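Two ranges overlap exactly when each one starts no later than the other stops; the overlap then runs from the later start to the earlier stop. The sketch below assumes closed ranges, so ranges that merely touch yield a zero-length overlap. The library's boundary handling may differ.

```python
from typing import Optional, Tuple


def get_overlap_for_time_range_pair(
    t1_start: float, t1_stop: float, t2_start: float, t2_stop: float
) -> Optional[Tuple[float, float]]:
    # Closed ranges intersect iff each starts no later than the other stops.
    if t1_start <= t2_stop and t2_start <= t1_stop:
        # Overlap spans from the later start to the earlier stop.
        return max(t1_start, t2_start), min(t1_stop, t2_stop)
    return None
```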

class pytorchvideo.data.domsev.DomsevDataset(*args, **kwds)

Egocentric activity classification video dataset for DoMSEV, stored as encoded videos with frame-level labels. <https://www.verlab.dcc.ufmg.br/semantic-hyperlapse/cvpr2018-dataset/>

This dataset handles the loading, decoding, and configurable clip sampling for the videos.

__getitem__(index)

Samples a video clip associated with the given index.

Parameters

index (int) – The index of the video clip.

Returns

A video clip with the following format if transform is None:

    {
        'video_id': <str>,
        'video': <video_tensor>,
        'audio': <audio_tensor>,
        'activities': <activities_tensor>,
        'start_time': <float>,
        'stop_time': <float>
    }

Otherwise, the transform defines the clip output.

Return type

Dict[str, Any]
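The 'activities' entry carries the frame-level labels for the sampled clip. The sketch below shows one way such labels could be derived, and it rests on several assumptions. Activities are time segments given as (start_time, stop_time, activity_id) tuples. Each frame of the clip [start_time, stop_time) takes the id of the segment covering it. Uncovered frames get a background id of -1. The helper clip_frame_labels and this tuple layout are hypothetical; this is not the library's implementation, and it returns a plain list rather than a tensor.

```python
import math
from typing import List, Tuple

# Hypothetical segment layout: (start_time, stop_time, activity_id), times in seconds.
Segment = Tuple[float, float, int]


def clip_frame_labels(
    segments: List[Segment],
    clip_start: float,
    clip_stop: float,
    fps: int,
    background_id: int = -1,
) -> List[int]:
    """Hypothetical helper: one activity id per frame of [clip_start, clip_stop)."""
    first = math.floor(clip_start * fps)  # first frame of the clip
    last = math.floor(clip_stop * fps)    # exclusive end frame
    labels = [background_id] * (last - first)
    for seg_start, seg_stop, activity_id in segments:
        # Skip segments that do not intersect the clip.
        if seg_start > clip_stop or clip_start > seg_stop:
            continue
        overlap_start = max(seg_start, clip_start)
        overlap_stop = min(seg_stop, clip_stop)
        # Label every frame whose start falls inside the overlap.
        for frame in range(math.floor(overlap_start * fps), math.floor(overlap_stop * fps)):
            if first <= frame < last:
                labels[frame - first] = activity_id
    return labels
```

With segments [(0.0, 1.0, 7), (1.0, 3.0, 2)], a clip from 0.5 s to 1.5 s at 4 fps covers frames 2 to 5 and yields [7, 7, 2, 2].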

__len__()

Returns

The number of video clips in the dataset.

Return type

int