
2. Reconstruct a cat from a single video

Previously, we reconstructed a rigid body (a car). In this example, we show how to reconstruct a deformable object (a cat!).

Get pre-processed data

First, download and extract the pre-processed data:

bash scripts/download_unzip.sh "https://www.dropbox.com/s/mb7zgk73oomix4s/cat-pikachu-0.zip"

To use custom videos, see the preprocessing tutorial.

Training

To optimize the dynamic neural fields:

# Args: training script, gpu id, input args
bash scripts/train.sh lab4d/train.py 0 --seqname cat-pikachu-0 --logname fg-skel --fg_motion skel-quad

The difference from the previous example is that we model the object motion with a skeleton-based deformation field, instead of treating it as a rigid body.

You may choose fg_motion from one of the following motion fields (a minimal skinning sketch follows the list):
  • rigid: rigid motion field (i.e., root body motion only, no deformation)

  • dense: dense motion field (similar to D-NeRF)

  • bob: bag-of-bones motion field (neural blend skinning in BANMo)

  • skel-human/quad: human or quadruped skeleton motion field (in RAC)

  • comp_skel-human/quad_dense: composed motion field (with skeleton-based deformation and soft deformation in RAC)
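To make the skinning idea concrete, here is a minimal sketch of the blend-skinning warp behind the bob and skel fields: each canonical 3D point is deformed by a weighted blend of per-bone rigid transforms (as in BANMo-style neural blend skinning). This is an illustrative sketch, not lab4d's implementation; the array shapes and names are assumptions.

import numpy as np

def blend_skinning(points, bone_transforms, skin_weights):
    """Warp canonical points by a weighted blend of per-bone rigid transforms.

    points:          (N, 3) points in the canonical (rest) space
    bone_transforms: (B, 4, 4) rigid transform of each bone at time t
    skin_weights:    (N, B) per-point skinning weights, rows summing to 1
                     (in neural blend skinning these come from a network)
    """
    pts_h = np.concatenate([points, np.ones((len(points), 1))], axis=1)  # (N, 4)
    per_bone = np.einsum("bij,nj->bni", bone_transforms, pts_h)          # (B, N, 4)
    blended = np.einsum("nb,bni->ni", skin_weights, per_bone)            # (N, 4)
    return blended[:, :3]

In these terms, a rigid field corresponds to a single bone (the root body transform), a dense field instead predicts a per-point offset directly, and the composed field applies the skeleton warp first and then a small soft deformation on top.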

Note

The optimization uses about 13 GB of GPU memory and takes around 21 minutes on a 3090 GPU. You may find the full list of flags in lab4d/config.py. To get higher quality, train for more iterations by adding --num_rounds 120.

To run on a machine with less GPU memory, you may reduce --imgs_per_gpu.
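For example, to train longer with a smaller per-GPU batch (the value 64 below is an illustrative choice, not a tested recommendation):

# longer schedule, smaller batch to fit in less GPU memory
bash scripts/train.sh lab4d/train.py 0 --seqname cat-pikachu-0 --logname fg-skel --fg_motion skel-quad --num_rounds 120 --imgs_per_gpu 64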

Visualization during training

Please use TensorBoard to monitor losses and intermediate renderings.

Here we show the final bone locations (first) and the camera transformations and geometry (second).

Rendering after training

After training, we can check the reconstruction quality by rendering the reference view and novel views. Pre-trained checkpoints are provided here.

To render reference views of the input video, run the following (where $logname is the log directory name, cat-pikachu-0-fg-skel in this example):

# reference view
python lab4d/render.py --flagfile=logdir/$logname/opts.log --load_suffix latest --render_res 256

Note

Some frames are skipped during preprocessing (by static-frame filtering). Those filtered frames are not used for training and are not rendered here.

To render novel views, run:

# turntable views, --viewpoint rot-elevation-angles --freeze_id frame-id-to-freeze
python lab4d/render.py --flagfile=logdir/$logname/opts.log --load_suffix latest --viewpoint rot-0-360 --render_res 256 --freeze_id 50

Note

The freeze_id is set to 50 to freeze time at the 50th frame while rotating the camera around the object.
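Conceptually, rot-0-360 sweeps the camera azimuth from 0 to 360 degrees around the object while the scene is held at the frozen frame. Below is a minimal sketch of such a turntable trajectory; it is illustrative only (lab4d builds its camera trajectories internally), and the radius and conventions are assumptions.

import numpy as np

def turntable_poses(num_views=36, radius=3.0, elevation_deg=0.0):
    """Camera-to-world poses for an azimuth sweep around the origin."""
    poses = []
    elev = np.deg2rad(elevation_deg)
    for azim in np.linspace(0, 2 * np.pi, num_views, endpoint=False):
        # Camera position on a circle around the object
        eye = radius * np.array(
            [np.cos(azim) * np.cos(elev), np.sin(elev), np.sin(azim) * np.cos(elev)]
        )
        # Look-at rotation: the forward axis points at the origin
        fwd = -eye / np.linalg.norm(eye)
        right = np.cross([0.0, 1.0, 0.0], fwd)
        right /= np.linalg.norm(right)
        up = np.cross(fwd, right)
        pose = np.eye(4)
        pose[:3, :3] = np.stack([right, up, fwd], axis=1)
        pose[:3, 3] = eye
        poses.append(pose)
    return poses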

To render a video of the proxy geometry and cameras over training iterations, run:

python scripts/render_intermediate.py --testdir logdir/$logname/

Exporting meshes and motion parameters after training

To export meshes and motion parameters, run:

python lab4d/export.py --flagfile=logdir/$logname/opts.log --load_suffix latest
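The export writes meshes and motion parameters into the log directory. As a quick sanity check you can load a mesh with a standard library such as trimesh; the path below is hypothetical, so inspect logdir/$logname/ for the actual export layout and filenames.

import trimesh

# Hypothetical path; check the exporter's output directory for real filenames
mesh = trimesh.load("logdir/cat-pikachu-0-fg-skel/export_0000/fg-mesh.obj")
print(mesh.vertices.shape, mesh.faces.shape)
mesh.show()  # opens an interactive viewer if a backend is available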

Reconstruct the total scene

Now that we have reconstructed the cat, can we put it in the scene? To do so, we train compositional neural fields with a foreground and a background component. Run the following to load the pre-trained foreground field and train the composed fields:

# Args: training script, gpu id, input args
bash scripts/train.sh lab4d/train.py 0 --seqname cat-pikachu-0 --logname comp-comp-s2 --field_type comp --fg_motion comp_skel-quad_dense --data_prefix full --num_rounds 120 --load_path logdir/cat-pikachu-0-fg-skel/ckpt_latest.pth

Note

The field_type is changed to comp to compose the background field with the foreground field during differentiable rendering.

The fg_motion is changed to comp_skel-quad_dense to use the composed warping field (with skeleton-based deformation and soft deformation) for the foreground object.

To reconstruct the background, the data_prefix is changed to full to load the full frames instead of frames cropped around the object.
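As a rough picture of what composing the two fields means: samples from the foreground and background fields are merged along each ray and alpha-composited front to back. The sketch below is illustrative of that idea, not lab4d's renderer; shapes and names are assumptions.

import numpy as np

def composite_ray(z_fg, rgb_fg, alpha_fg, z_bg, rgb_bg, alpha_bg):
    """Alpha-composite foreground and background samples along one ray.

    z_*: (S,) sample depths; rgb_*: (S, 3) colors; alpha_*: (S,) opacities
    """
    # Merge samples from both fields and sort them by depth
    z = np.concatenate([z_fg, z_bg])
    rgb = np.concatenate([rgb_fg, rgb_bg])
    alpha = np.concatenate([alpha_fg, alpha_bg])
    order = np.argsort(z)
    rgb, alpha = rgb[order], alpha[order]
    # Front-to-back compositing: weight = opacity * accumulated transmittance
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = alpha * trans
    return (weights[:, None] * rgb).sum(axis=0)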

Note

We load the pretrained foreground model logdir/cat-pikachu-0-fg-skel/ckpt_latest.pth to initialize the optimization.

The optimization of 120 rounds (24k iterations) takes around 3.5 hours on a 3090 GPU.

To render videos from the bird’s eye view:

# bird's eye view, elevation angle=20 degree
python lab4d/render.py --flagfile=logdir/cat-pikachu-0-comp-comp-s2/opts.log --load_suffix latest --render_res 256 --viewpoint bev-20

Visit other tutorials.

