Model Zoo and Benchmarks

PyTorchVideo provides reference implementations of a large number of video understanding approaches. This document also provides benchmarks that evaluate the supported models on different datasets using a standard evaluation setup. All models can be downloaded from the provided links.

Kinetics-400

arch     | depth | pretrain | frame length x sample rate | top 1 | top 5 | Flops (G) | Params (M) | Model
C2D      | R50   | -        | 8x8                        | 71.46 | 89.68 | 25.89     | 24.33      | link
I3D      | R50   | -        | 8x8                        | 73.27 | 90.70 | 37.53     | 28.04      | link
Slow     | R50   | -        | 4x16                       | 72.40 | 90.18 | 27.55     | 32.45      | link
Slow     | R50   | -        | 8x8                        | 74.58 | 91.63 | 54.52     | 32.45      | link
SlowFast | R50   | -        | 4x16                       | 75.34 | 91.89 | 36.69     | 34.48      | link
SlowFast | R50   | -        | 8x8                        | 76.94 | 92.69 | 65.71     | 34.57      | link
SlowFast | R101  | -        | 8x8                        | 77.90 | 93.27 | 127.20    | 62.83      | link
CSN      | R101  | -        | 32x2                       | 77.00 | 92.90 | 75.62     | 22.21      | link
R(2+1)D  | R50   | -        | 16x4                       | 76.01 | 92.23 | 76.45     | 28.11      | link
X3D      | XS    | -        | 4x12                       | 69.12 | 88.63 | 0.91      | 3.79       | link
X3D      | S     | -        | 13x6                       | 73.33 | 91.27 | 2.96      | 3.79       | link
X3D      | M     | -        | 16x5                       | 75.94 | 92.72 | 6.72      | 3.79       | link
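
The "frame length x sample rate" column can be read as "T frames, taking every S-th frame". As a rough illustration, the sketch below converts such a spec into the number of raw video frames a clip covers and the approximate duration. It assumes the span is simply T times S and that the video runs at 30 fps; neither assumption is stated in the table, and `clip_span` is a hypothetical helper, not part of PyTorchVideo.

```python
from typing import Tuple


def clip_span(spec: str, fps: float = 30.0) -> Tuple[int, float]:
    """Return (raw frames covered, seconds covered) for a 'TxS' spec.

    Assumes the clip covers T * S raw frames and that the source video
    plays at `fps` frames per second (30 is an assumed default).
    """
    frames_str, rate_str = spec.lower().split("x")
    num_frames, sample_rate = int(frames_str), int(rate_str)
    raw_frames = num_frames * sample_rate
    return raw_frames, raw_frames / fps


# Walk through a few of the specs that appear in the table above.
for spec in ("8x8", "4x16", "32x2", "16x5"):
    raw, seconds = clip_span(spec)
    print(f"{spec}: {raw} raw frames, about {seconds:.2f} s at 30 fps")
```

Under these assumptions, 8x8 and 4x16 cover the same 64 raw frames; the 8x8 variant just samples that window twice as densely, which is consistent with its higher FLOPs in the table.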

Something-Something V2

arch     | depth | pretrain     | frame length x sample rate | top 1 | top 5 | Flops (G) | Params (M) | Model
Slow     | R50   | Kinetics 400 | 8x8                        | 60.04 | 85.19 | 55.10     | 31.96      | link
SlowFast | R50   | Kinetics 400 | 8x8                        | 61.68 | 86.92 | 66.60     | 34.04      | link

Charades

arch     | depth | pretrain     | frame length x sample rate | mAP   | Flops (G) | Params (M) | Model
Slow     | R50   | Kinetics 400 | 8x8                        | 34.72 | 55.10     | 31.96      | link
SlowFast | R50   | Kinetics 400 | 8x8                        | 37.24 | 66.60     | 34.00      | link

Using the PyTorchVideo model zoo

There are several ways to use the PyTorchVideo model zoo.

  • The models are integrated into TorchHub, so they can be loaded through TorchHub with or without pre-trained weights. We also provide a tutorial that walks through loading models from TorchHub and running inference.

  • PyTorchVideo models and datasets are also supported in PySlowFast. You can use the PySlowFast workflow to train or test them.

  • You can also use PyTorch Lightning to build a training/test pipeline for PyTorchVideo models and datasets. Please check this tutorial for more information.
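
As a minimal sketch of the TorchHub route from the list above: the repository string "facebookresearch/pytorchvideo" and the entry-point name "slow_r50" are assumptions based on the project's public GitHub repository, not names given on this page, and both helper functions are hypothetical. Calling `load_zoo_model` needs `torch` installed and network access, because TorchHub fetches the repository code (and the weights when `pretrained=True`).

```python
HUB_REPO = "facebookresearch/pytorchvideo"  # assumed GitHub repo used by TorchHub


def load_zoo_model(name: str = "slow_r50", pretrained: bool = True):
    """Load a model-zoo architecture through TorchHub and switch it to eval mode.

    `name` is assumed to match an entry point exposed by the repository.
    """
    import torch  # imported lazily so the helpers can be defined without torch

    model = torch.hub.load(HUB_REPO, name, pretrained=pretrained)
    return model.eval()


def dummy_clip_shape(num_frames: int = 8, crop_size: int = 256, batch: int = 1):
    """Shape of a (batch, channels, time, height, width) RGB clip tensor."""
    return (batch, 3, num_frames, crop_size, crop_size)
```

A typical call would be `model = load_zoo_model("slow_r50")` followed by `model(torch.randn(*dummy_clip_shape()))` as a smoke test. The 8-frame, 256-pixel defaults are illustrative only; the expected input layout differs between architectures (SlowFast, for example, takes two pathways), so check the tutorial for the model you load.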

PyTorchVideo Accelerator Model Zoo

The Accelerator model zoo provides a set of models that run efficiently on a target device, along with pretrained checkpoints. To learn how to build a model, load a checkpoint, and deploy it, please refer to "Use PyTorchVideo/Accelerator Model Zoo".

Efficient Models for mobile CPU

All top-1/top-5 accuracies are measured with 10-clip evaluation. Latency is benchmarked on a Samsung S8 phone with a 1 s input clip.

model  | model builder                                                             | top 1 | top 5 | latency (ms) | params (M) | checkpoint
X3D_XS | models.accelerator.mobile_cpu.efficient_x3d.EfficientX3d(expansion="XS") | 68.5  | 88.0  | 233          | 3.8        | link
X3D_S  | models.accelerator.mobile_cpu.efficient_x3d.EfficientX3d(expansion="S")  | 73.0  | 90.6  | 764          | 3.8        | link
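
The 10-clip evaluation mentioned above can be pictured as scoring several clips from one video and averaging the results. The sketch below is one plausible reading, assuming clips are spaced evenly along the video and per-clip class scores are averaged; the exact sampling and any spatial cropping used for the reported numbers are not specified here. `uniform_clip_starts`, `multi_clip_scores`, and the `score_clip` callback are all hypothetical names.

```python
from typing import Callable, List, Sequence


def uniform_clip_starts(num_video_frames: int, clip_len: int, num_clips: int = 10) -> List[int]:
    """Start indices of `num_clips` clips spaced evenly over the video."""
    last_start = max(num_video_frames - clip_len, 0)
    if num_clips == 1:
        return [last_start // 2]  # single clip: take it from the middle
    return [(i * last_start) // (num_clips - 1) for i in range(num_clips)]


def multi_clip_scores(
    frames: Sequence,
    clip_len: int,
    score_clip: Callable[[Sequence], Sequence[float]],
    num_clips: int = 10,
) -> List[float]:
    """Average per-class scores over evenly spaced clips of one video."""
    starts = uniform_clip_starts(len(frames), clip_len, num_clips)
    per_clip = [score_clip(frames[s:s + clip_len]) for s in starts]
    num_classes = len(per_clip[0])
    return [
        sum(scores[c] for scores in per_clip) / len(per_clip)
        for c in range(num_classes)
    ]
```

In practice `score_clip` would wrap a model forward pass that returns class probabilities for one clip; the top-1 prediction for the video is then the index of the largest averaged score.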