3D ResNet

  Example Usage

  Imports

  Load the model:
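A minimal sketch of the load step. It assumes the model is published under the Torch Hub repository `facebookresearch/pytorchvideo` with the entry point `slow_r50`; neither name is spelled out on this page, so treat both as assumptions. `load_slow_r50` is a hypothetical helper, and `torch` is imported inside it so that defining the helper requires neither PyTorch nor network access.

```python
# Assumed Torch Hub coordinates; adjust them if your checkpoint lives elsewhere.
HUB_REPO = "facebookresearch/pytorchvideo"
MODEL_NAME = "slow_r50"


def load_slow_r50(pretrained: bool = True):
    """Return the Slow R50 network from Torch Hub.

    The first call downloads the repository and, when `pretrained` is True,
    the pretrained weights, so it needs network access.
    """
    import torch  # deferred: only required when the model is actually loaded

    return torch.hub.load(HUB_REPO, MODEL_NAME, pretrained=pretrained)
```

Calling `model = load_slow_r50()` then yields the network used in the rest of this example.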

  Import remaining functions:
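One possible set of imports for the remaining steps, assuming `pytorchvideo` and `torchvision` are installed. `torchvision.transforms._transforms_video` is a private module and may be missing in some releases, so the sketch guards the imports and records the outcome in a hypothetical flag, `HAVE_VIDEO_DEPS`.

```python
import json            # parses the class-name file
import urllib.request  # downloads the class-name file and the example video

try:
    from pytorchvideo.data.encoded_video import EncodedVideo
    from pytorchvideo.transforms import (
        ApplyTransformToKey,
        ShortSideScale,
        UniformTemporalSubsample,
    )
    from torchvision.transforms import Compose, Lambda
    # Private torchvision module: present in many releases, but not guaranteed.
    from torchvision.transforms._transforms_video import (
        CenterCropVideo,
        NormalizeVideo,
    )
    HAVE_VIDEO_DEPS = True
except ImportError as exc:
    # Report what is missing instead of failing at import time.
    HAVE_VIDEO_DEPS = False
    print(f"Video dependencies unavailable: {exc}")
```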

  Setup

  Set the model to eval mode and move it to the desired device.
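As a sketch, the two calls can be wrapped in a small helper. `prepare_for_inference` is a hypothetical name; it relies only on the `eval()` and `to()` methods that every `torch.nn.Module` provides, both of which return the module itself.

```python
def prepare_for_inference(model, device: str = "cpu"):
    """Put `model` in eval mode and move it to `device`; return the model."""
    model = model.eval()     # eval mode: dropout off, batch norm uses running stats
    return model.to(device)  # e.g. "cpu" or "cuda"
```

For example, `model = prepare_for_inference(model, "cuda")` selects a GPU when one is available, and the default keeps everything on the CPU.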

  Download the id-to-label mapping for the Kinetics 400 dataset, on which the Torch Hub models were trained. The mapping is used to look up category label names from the predicted class ids.
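A hedged sketch of this step. The URL is an assumption (this page does not give it), as is the file layout: a JSON object mapping each label name to its integer class id, with stray double quotes in some names. `invert_classnames` and `load_id_to_classname` are hypothetical helper names.

```python
import json
import urllib.request

# Assumed location of the class-name file; replace it if the file has moved.
LABELS_URL = (
    "https://dl.fbaipublicfiles.com/pyslowfast/dataset/class_names/"
    "kinetics_classnames.json"
)


def invert_classnames(classnames: dict) -> dict:
    """Turn {label name: class id} into {class id: label name}.

    Stray double quotes are stripped from the names (an assumption about
    how the file is written).
    """
    return {int(idx): str(name).replace('"', "") for name, idx in classnames.items()}


def load_id_to_classname(url: str = LABELS_URL,
                         filename: str = "kinetics_classnames.json") -> dict:
    """Download the JSON file to `filename` and build the id-to-name mapping."""
    urllib.request.urlretrieve(url, filename)
    with open(filename, "r") as f:
        return invert_classnames(json.load(f))
```

With the mapping in hand, a predicted class id `i` is translated with `kinetics_id_to_classname[i]`, where `kinetics_id_to_classname = load_id_to_classname()`.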

  Define input transform
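A sketch of one possible transform. Only the sampling values (8 frames at a sampling rate of 8) come from the 8x8 setting described on this page. The resize and crop sizes, normalization statistics, and source frame rate are assumptions and should be checked against the checkpoint you use. `build_transform` is a hypothetical helper that defers its imports until it is called.

```python
# Sampling follows the 8x8 setting; the remaining values are assumptions.
num_frames = 8                # frames fed to the model
sampling_rate = 8             # temporal stride between sampled frames
frames_per_second = 30        # assumed frame rate of the source video
side_size = 256               # assumed short-side length after scaling
crop_size = 256               # assumed square center-crop size
mean = [0.45, 0.45, 0.45]     # assumed per-channel mean
std = [0.225, 0.225, 0.225]   # assumed per-channel standard deviation

# Seconds of video needed to cover num_frames frames at the given stride.
clip_duration = (num_frames * sampling_rate) / frames_per_second


def build_transform():
    """Compose the preprocessing applied to the "video" entry of a clip dict."""
    # Deferred imports: only needed once the transform is actually built.
    from pytorchvideo.transforms import (
        ApplyTransformToKey,
        ShortSideScale,
        UniformTemporalSubsample,
    )
    from torchvision.transforms import Compose, Lambda
    from torchvision.transforms._transforms_video import (
        CenterCropVideo,
        NormalizeVideo,
    )

    return ApplyTransformToKey(
        key="video",
        transform=Compose([
            UniformTemporalSubsample(num_frames),  # keep evenly spaced frames
            Lambda(lambda x: x / 255.0),           # scale pixel values to [0, 1]
            NormalizeVideo(mean, std),             # per-channel normalization
            ShortSideScale(size=side_size),        # resize the shorter side
            CenterCropVideo(crop_size=(crop_size, crop_size)),
        ]),
    )
```

With these values, `clip_duration` is 64 / 30, a little over two seconds of video per clip.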

  Run Inference

  Download an example video.
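A minimal sketch of the download. The URL points at a sample clip assumed to be hosted alongside the PyTorchVideo project; it is not given on this page, and any short video file can be substituted. `download_video` is a hypothetical helper that skips the download when the file is already present.

```python
import os
import urllib.request

# Assumed sample clip; any short video file can be substituted.
VIDEO_URL = "https://dl.fbaipublicfiles.com/pytorchvideo/projects/archery.mp4"


def download_video(url: str = VIDEO_URL, path: str = "archery.mp4") -> str:
    """Download `url` to `path` unless the file already exists; return `path`."""
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)
    return path
```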

  Load the video and transform it to the input format required by the model.
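This step could be sketched as follows, assuming the video is read with PyTorchVideo's `EncodedVideo` and that the transform operates on the `"video"` entry of the clip dictionary. `open_video` and `extract_clip` are hypothetical helper names, and the start time and clip length are parameters you choose.

```python
def open_video(video_path: str):
    """Open a video file with PyTorchVideo's EncodedVideo reader."""
    from pytorchvideo.data.encoded_video import EncodedVideo  # deferred import

    return EncodedVideo.from_path(video_path)


def extract_clip(video, transform, start_sec: float, clip_duration: float,
                 device: str = "cpu"):
    """Cut a clip of `clip_duration` seconds from `video` and preprocess it."""
    end_sec = start_sec + clip_duration
    # get_clip returns a dict whose "video" entry holds the decoded frames.
    video_data = video.get_clip(start_sec=start_sec, end_sec=end_sec)
    video_data = transform(video_data)      # preprocess the "video" entry
    return video_data["video"].to(device)   # tensor on the model's device
```

A typical call is `inputs = extract_clip(open_video("archery.mp4"), transform, 0, clip_duration)`, where the file name, `transform`, and `clip_duration` come from your own setup.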

  Get Predictions
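A sketch of turning the model output into label names, assuming the model returns raw class scores of shape `(batch, num_classes)` and that a softmax converts them to probabilities. `top_k_labels` is a hypothetical helper; it uses only the `softmax` and `topk` methods of `torch.Tensor`.

```python
def top_k_labels(logits, id_to_classname: dict, k: int = 5) -> list:
    """Return the k most probable label names for the first item in the batch.

    `logits` is expected to be a torch.Tensor of shape (batch, num_classes).
    """
    probs = logits.softmax(dim=1)  # class scores -> probabilities
    top = probs.topk(k=k, dim=1)   # (values, indices), sorted in descending order
    return [id_to_classname[int(i)] for i in top.indices[0]]
```

The model expects a batch dimension, so a single clip tensor `inputs` is passed as `preds = model(inputs[None, ...])` before calling `top_k_labels(preds, kinetics_id_to_classname)`. Here `kinetics_id_to_classname` stands for whatever id-to-name dictionary you built for the dataset.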

  Model Description

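  The model architecture is based on [1]. The pretrained weights were trained on the Kinetics dataset using the 8x8 setting (frame length x sample rate).

  | arch | depth | frame length x sample rate | top 1 | top 5 | FLOPs (G) | Params (M) |
  | --- | --- | --- | --- | --- | --- | --- |
  | Slow | R50 | 8x8 | 74.58 | 91.63 | 54.52 | 32.45 |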

  References

  [1] Christoph Feichtenhofer et al., “SlowFast Networks for Video Recognition”, https://arxiv.org/pdf/1812.03982.pdf