Q&A
Installation
Conda/mamba is not able to resolve conflicts when installing packages.
Possible cause: The base conda environment is not clean. See the discussion in this thread.
Fix: Remove the packages in the base environment that cause the conflict.
Data pre-processing
My gradio app got stuck at the loading screen.
Potential fix: Kill the running VS Code processes, then re-run the preprocessing code.
Model training
How to change hyperparameters when using more videos (or video frames)?
Increase pixels_per_image and imgs_per_gpu, and use more GPUs. The number of sampled rays (pixels) per minibatch is computed as (number of GPUs) x imgs_per_gpu x pixels_per_image. Also see the note here.
Training on more than 50 videos might cause the following OS error:
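As a quick sanity check on the formula above, the minibatch size can be computed by hand before launching a run. The snippet below is only a sketch: the function name `rays_per_minibatch` and the example values are hypothetical and are not defaults taken from the training scripts.

```python
def rays_per_minibatch(num_gpus: int, imgs_per_gpu: int, pixels_per_image: int) -> int:
    """Number of rays (pixels) sampled per minibatch across all GPUs."""
    return num_gpus * imgs_per_gpu * pixels_per_image


# Illustrative values only; substitute the settings you actually train with.
small_run = rays_per_minibatch(num_gpus=1, imgs_per_gpu=128, pixels_per_image=16)
large_run = rays_per_minibatch(num_gpus=2, imgs_per_gpu=128, pixels_per_image=32)
print(small_run, large_run)
```

Doubling any one of the three factors doubles the number of rays per minibatch, so the three settings can be traded off against GPU memory and GPU count.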
[Errno 24] Too many open files
To check the current file limit, run:
ulimit -S -n
To increase the open file limit to 4096, run:
ulimit -S -n 4096
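If changing the shell limit is inconvenient, the same soft limit can in principle be inspected and raised from inside a Python process with the standard-library `resource` module (Unix only). This is a minimal sketch, not part of the training scripts; the helper name `raise_open_file_limit` is hypothetical, and the change only affects the current process and its children.

```python
import resource


def raise_open_file_limit(target: int = 4096) -> int:
    """Raise the soft open-file limit to `target` if it is lower; return the soft limit in effect."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    # The soft limit cannot exceed the hard limit unless the hard limit is unlimited.
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)

    # Nothing to do if the soft limit is already unlimited or high enough.
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft

    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return target


new_soft = raise_open_file_limit(4096)
print("soft open-file limit:", new_soft)
```

The helper never lowers an existing limit, and it leaves the hard limit untouched, so it does not require elevated privileges.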
Multi-GPU training hangs but single-GPU training works fine.
Run the training script with NCCL_P2P_DISABLE=1 bash scripts/train.sh … to disable direct GPU-to-GPU (P2P) communication. See the discussion here.