Model Zoo and Benchmarks¶

PyTorchVideo provides reference implementation of a large number of video understanding approaches. In this document, we also provide comprehensive benchmarks to evaluate the supported models on different datasets using standard evaluation setup. All the models can be downloaded from the provided links.

Kinetics-400¶

arch	depth	pretrain	frame length x sample rate	top 1	top 5	Flops (G)	Params (M)	Model
C2D	R50	-	8x8	71.46	89.68	25.89	24.33	link
I3D	R50	-	8x8	73.27	90.70	37.53	28.04	link
Slow	R50	-	4x16	72.40	90.18	27.55	32.45	link
Slow	R50	-	8x8	74.58	91.63	54.52	32.45	link
SlowFast	R50	-	4x16	75.34	91.89	36.69	34.48	link
SlowFast	R50	-	8x8	76.94	92.69	65.71	34.57	link
SlowFast	R101	-	8x8	77.90	93.27	127.20	62.83	link
CSN	R101	-	32x2	77.00	92.90	75.62	22.21	link
R(2+1)D	R50	-	16x4	76.01	92.23	76.45	28.11	link
X3D	XS	-	4x12	69.12	88.63	0.91	3.79	link
X3D	S	-	13x6	73.33	91.27	2.96	3.79	link
X3D	M	-	16x5	75.94	92.72	6.72	3.79	link

Something-Something V2¶

arch	depth	pretrain	frame length x sample rate	top 1	top 5	Flops (G)	Params (M)	Model
Slow	R50	Kinetics 400	8x8	60.04	85.19	55.10	31.96	link
SlowFast	R50	Kinetics 400	8x8	61.68	86.92	66.60	34.04	link

Charades¶

arch	depth	pretrain	frame length x sample rate	MAP	Flops (G)	Params (M)	Model
Slow	R50	Kinetics 400	8x8	34.72	55.10	31.96	link
SlowFast	R50	Kinetics 400	8x8	37.24	66.60	34.00	link

Using PytorchVideo model zoo¶

We provide several different ways to use PyTorchVideo model zoo.

The models have been integrated into TorchHub, so could be loaded with TorchHub with or without pre-trained models. Additionally, we provide a tutorial which goes over the steps needed to load models from TorchHub and perform inference.
PyTorchVideo models/datasets are also supported in PySlowFast. You can use PySlowFast workflow to train or test PyTorchVideo models/datasets.
You can also use PyTorch Lightning to build training/test pipeline for PyTorchVideo models and datasets. Please check this tutorial for more information.

Notes:

The above benchmarks are conducted by PySlowFast workflow using PyTorchVideo datasets and models.
For more details on the data preparation, you can refer to PyTorchVideo Data Preparation.

PytorchVideo Accelerator Model Zoo¶

Accelerator model zoo provides a set of efficient models on target device with pretrained checkpoints. To learn more about how to build model, load checkpoint and deploy, please refer to Use PytorchVideo/Accelerator Model Zoo.

Efficient Models for mobile CPU All top1/top5 accuracies are measured with 10-clip evaluation. Latency is benchmarked on Samsung S8 phone with 1s input clip length.

model	model builder	top 1	top 5	latency (ms)	params (M)	checkpoint
X3D_XS	models. accelerator. mobile_cpu. efficient_x3d. EfficientX3d (expansion="XS")	68.5	88.0	233	3.8	link
X3D_S	models. accelerator. mobile_cpu. efficient_x3d. EfficientX3d (expansion="S")	73.0	90.6	764	3.8	link