1. Reconstruct an arbitrary instance
========================================

In the first tutorial, we show how to reconstruct an arbitrary instance from a single video, taking `car-turnaround-2` as an example.

.. raw:: html

   <!-- embedded media omitted -->
.. note::

   To reconstruct a complete shape, the video should contain sufficiently diverse viewpoints of the object.

Download preprocessed data
---------------------------------------

To download the preprocessed data, run::

  bash scripts/download_unzip.sh "https://www.dropbox.com/scl/fi/5wfbc692qhpejhyo8u9r0/car-turnaround-2.zip?rlkey=riq060i3wm5raynxryf8g2hcw&dl=0"

This will download and unzip the preprocessed data to `database/processed/$type-of-processed-data/Full-Resolution/car-turnaround-2-0000/`. To use custom videos, see the `preprocessing tutorial `_.

.. note::

   The preprocessed data is stored with the following structure under `database/processed/`:

   - JPEGImages/Full-Resolution/$seqname/%05d.jpg

     - stores the raw rgb images (after flow filtering that removes static frames)

   - Annotations/Full-Resolution/$seqname/%05d.{npy,jpg}

     - .npy stores an image array with instance ids, saved as np.int8.

       - If there is no detection, all pixel values are set to -1
       - value 0: background
       - values 1...127: instance ids. Currently only 1 instance per video is supported (id=1).

     - .jpg is for visualization purposes

   - Features/Full-Resolution/$seqname/{densepose-%02d.npy, dinov2-%02d.npy}

     - stores pixel features of segmented objects, either from `DensePose-CSE `_ (for humans or quadruped animals) or `DINOv2 `_ (for generic objects).

   - Flow{FW,BW}_%02d/Full-Resolution/$seqname/%05d.npy

     - stores forward / backward optical flow and their uncertainty from `VCNPlus `_.

   - Depth/Full-Resolution/$seqname/%05d.npy

     - stores depth maps estimated by `ZoeDepth `_

   - Cameras/Full-Resolution/$seqname/%02d.npy

     - world-to-camera transformations (00.npy) and object-to-camera transformations (01.npy).
     - We use the OpenCV coordinate convention, defined as follows:

       - x: right
       - y: down
       - z: forward

The metadata file at `database/configs/$seqname.config` is used to load the dataset. It also stores the initial camera intrinsics (obtained by the heuristic `focal length = max(h, w)`).

Visualize preprocessed data
---------------------------------------

Before training, let's check the accuracy of the pseudo ground-truth.

Instance segmentation and tracking
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A visualization of instance segmentation and tracking can be found at `Annotations/Full-Resolution/$seqname/vis.mp4`:

.. raw:: html

   <!-- embedded media omitted -->
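You can also inspect the annotations programmatically. The following is a minimal sketch (not part of Lab4D) that loads one frame and its instance mask and writes an overlay image; it assumes frame `00000` exists, that the mask resolution matches the image, and that the values follow the -1/0/1 encoding described above::

  import numpy as np
  from PIL import Image

  seqname = "car-turnaround-2-0000"
  frame = 0  # assumed frame index; adjust to your sequence
  rgb = np.asarray(Image.open(
      f"database/processed/JPEGImages/Full-Resolution/{seqname}/{frame:05d}.jpg"))
  mask = np.load(
      f"database/processed/Annotations/Full-Resolution/{seqname}/{frame:05d}.npy")

  # Expect values in {-1 (no detection), 0 (background), 1 (instance)}
  print(rgb.shape, mask.shape, np.unique(mask))

  # Dim everything that is not the instance (id=1) and save an overlay
  overlay = rgb.copy()
  overlay[mask != 1] //= 3
  Image.fromarray(overlay).save("mask_overlay.jpg")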
Optical flow
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Visualizations of optical flow can be found at `Flow{FW,BW}_%02d/Full-Resolution/$seqname/visflow-%05d.jpg`. Color indicates the flow direction and length indicates the flow magnitude. Empty regions are where the flow is uncertain.

.. raw:: html

   <!-- embedded media omitted -->
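For reference, below is a rough sketch of how such a flow colorization can be produced with OpenCV. This is not the VCNPlus visualization code; it assumes the `.npy` file stores an array whose first two channels are the x/y flow, and the `FlowFW_01` / `00000` path components are placeholders (the actual layout and frame-gap suffix may differ)::

  import numpy as np
  import cv2

  seqname = "car-turnaround-2-0000"
  # Assumed path; adjust the flow directory and frame index to your data
  flow = np.load(f"database/processed/FlowFW_01/Full-Resolution/{seqname}/00000.npy")
  u, v = flow[..., 0].astype(np.float32), flow[..., 1].astype(np.float32)

  mag, ang = cv2.cartToPolar(u, v)
  hsv = np.zeros((*u.shape, 3), dtype=np.uint8)
  hsv[..., 0] = (ang * 180 / np.pi / 2).astype(np.uint8)  # hue encodes direction
  hsv[..., 1] = 255
  hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)  # brightness encodes magnitude
  cv2.imwrite("flow_vis.jpg", cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR))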
World / object to camera transformations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Visualizations of the world-to-camera and object-to-camera transformations can be found at `Cameras/Full-Resolution/$seqname/*.obj`. To visualize the .obj files, use the vscode-3d-preview extension for VS Code, or download them locally and open them with MeshLab. Below we show the sparsely-annotated transformations (first, `...canonical-prealign.obj`) and the full transformations of all frames (second, `...canonical.obj`):

.. raw:: html

   <!-- embedded media omitted -->
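The transformations can also be inspected directly in Python. The sketch below assumes each `.npy` file stores one rigid transform per frame as an (N, 4, 4) array in the OpenCV convention; verify the shapes on your data before relying on this::

  import numpy as np

  seqname = "car-turnaround-2-0000"
  world_to_cam = np.load(f"database/processed/Cameras/Full-Resolution/{seqname}/00.npy")
  object_to_cam = np.load(f"database/processed/Cameras/Full-Resolution/{seqname}/01.npy")
  print(world_to_cam.shape, object_to_cam.shape)  # expected: (num_frames, 4, 4)

  # Camera center of frame 0 in world coordinates (invert the world-to-camera transform)
  T = world_to_cam[0]
  R, t = T[:3, :3], T[:3, 3]
  print(-R.T @ t)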
.. note::

   We assume the OpenCV coordinate convention in the above visualizations. Each camera is represented by three axes: x (red, right), y (green, down), z (blue, forward). The object-to-camera transformations are roughly annotated in 12 frames, then refined and propagated to all 120 frames using flow and monocular depth.

Model training
---------------------------------------

In this stage, we use the pseudo ground-truth from the previous steps to train dynamic neural fields. The camera transformations are used to initialize the model. The other data, including rgb, segmentation, flow, and depth, are used to supervise the model. Run::

  # Args: training script, gpu id, args for training script
  bash scripts/train.sh lab4d/train.py 0 --seqname car-turnaround-2 --logname fg-rigid --fg_motion rigid

.. note::

   The optimization takes around 14 minutes on a 3090. You may find the full list of flags at `lab4d/config.py `_. By default we train for 20 rounds (each round contains 200 iterations), which gives good reconstruction quality and is used for development purposes. To get higher quality, train for more iterations by adding `--num_rounds 120`. The rendering results on this page assume 120 rounds, which takes 1.5 hours.

Visualization during training
---------------------------------------

- We use tensorboard to monitor losses and visualize intermediate renderings. Tensorboard logs are saved at `logdir/$logname`. To use tensorboard in VS Code, press `shift+cmd+p` and select "Launch TensorBoard".
- Camera transformations and a low-res proxy geometry are saved at `logdir/$logname/...proxy.obj`. We show the final proxy geometry and cameras below:

.. raw:: html

   <!-- embedded media omitted -->
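If you prefer to inspect the saved proxy meshes without a GUI, one option is `trimesh` (an extra dependency, not required by the tutorial). The sketch below simply globs for proxy `.obj` files; the log-directory name shown is a placeholder and depends on your `--logname`::

  import glob
  import trimesh

  logdir = "logdir/fg-rigid"  # placeholder; match it to your actual log directory
  proxy_paths = sorted(glob.glob(f"{logdir}/*proxy.obj"))
  print(f"found {len(proxy_paths)} proxy meshes")

  # Load the most recent proxy mesh and print basic statistics
  mesh = trimesh.load(proxy_paths[-1], force="mesh")
  print(mesh.vertices.shape, mesh.faces.shape, mesh.bounds)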
Rendering after training
---------------------------------------

After training, we can check the reconstruction quality by rendering the reference view and novel views. Pre-trained checkpoints are provided `here `_.

To render the reference view, run::

  # reference view
  python lab4d/render.py --flagfile=logdir/$logname/opts.log --load_suffix latest --render_res 256

.. raw:: html

   <!-- embedded media omitted -->
On the left we show the rgb rendering, and on the right we show the dense correspondence (the same color indicates the same canonical surface point).

To render novel views, run::

  # turntable views, --viewpoint rot-elevation-angles
  python lab4d/render.py --flagfile=logdir/$logname/opts.log --load_suffix latest --viewpoint rot-0-360 --render_res 256

  # bird's-eye views, --viewpoint bev-elevation
  python lab4d/render.py --flagfile=logdir/$logname/opts.log --load_suffix latest --viewpoint bev-90 --render_res 256

.. raw:: html

   <!-- embedded media omitted -->
.. note::

   Rendering the above video at 256x256 takes ~40s on a 3090 (~0.4s/frame). The default rendering resolution is set to 128x128 for fast rendering.

To render a video of the proxy geometry and cameras over the training iterations, run::

  python lab4d/render_intermediate.py --testdir logdir/$logname/

.. raw:: html

   <!-- embedded media omitted -->
Exporting meshes and motion parameters after training
-----------------------------------------------------------

To export meshes and motion parameters, run::

  python lab4d/export.py --flagfile=logdir/$logname/opts.log --load_suffix latest

.. raw:: html

   <!-- embedded media omitted -->
Visit other `tutorials `_.