NeRF.cpp

C++ / LibTorch implementation of NeRF. Given RGB images of a static scene with known camera intrinsics and per-image extrinsics, fits a SIREN-trunk radiance field with Fourier positional encoding and renders RGB and depth from arbitrary novel viewpoints.

License: BSD-3-Clause · C++20 · LibTorch · CUDA

Figures: RGB orbit and depth orbit.

The radiance field as it forms over training: the object rotates while the reconstruction sharpens from the noisy SIREN initialisation to the final render.

Training progression

Pipeline

  1. Ray generation. For each pixel the camera-space direction ((x − W/2)/f, −(y − H/2)/f, −1) is rotated into world space by the extrinsics; the camera origin is the translation column.
  2. Sampling. Three strategies are supported at train and eval time: UNIFORM; STRATIFIED (NeRF §4: partition the interval into bins and draw one jittered sample per bin); and PROPOSAL (hierarchical, as in NeRF §5.2: a coarse pass defines a per-ray PDF from which extra points are importance-sampled via inverse transform, concentrating samples near surfaces). For a fair comparison, every strategy evaluates the full radiance field at 192 depths per ray: single-pass samplers use 192 samples (n_samples + n_importance), and the proposal path merges 64 coarse and 128 importance samples. The coarse pass uses a lightweight density-only proposal head (SirenNeRF::proposal_sigma): Fourier-encoded points (with a stop-gradient) feed a small trunk and σ head, with no view directions and no RGB, so guiding importance sampling is much cheaper than re-running the full model. An interlevel loss (Mip-NeRF 360) trains the coarse proposal histogram to upper-bound the main network's final weight distribution, even though the two histograms use different bin locations (as in the paper).
  3. Encoding + MLP. Inputs use NeRF-style Fourier positional encoding (L = 10 for xyz, L = 4 for view directions). See Architecture for layer counts.
  4. Volume render. Discretised quadrature of σ along the ray gives per-sample α and transmittance; their product gives the weights used to composite RGB and depth, and the result is alpha-composited onto a configurable background.
  5. Loss. Pseudo-Huber photometric loss (δ = 0.1) plus the interlevel loss when training with PROPOSAL. Ray-batched training draws 8192 rays per step pooled across all training images (for PROPOSAL and STRATIFIED).
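Steps 2 and 4 above can be sketched in plain C++ without LibTorch. This is a minimal illustration, not the repository's code: the names `stratified_depths`, `composite`, and `Rendered` are hypothetical, and closing the last interval at `t_far` is an assumption made here.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

using Vec3 = std::array<float, 3>;

// Step 2 (STRATIFIED): split [t_near, t_far] into n_bins equal bins and draw
// one uniformly jittered depth inside each bin. The result is sorted.
std::vector<float> stratified_depths(float t_near, float t_far,
                                     std::size_t n_bins, std::mt19937& rng) {
    std::uniform_real_distribution<float> jitter(0.0f, 1.0f);
    const float bin = (t_far - t_near) / static_cast<float>(n_bins);
    std::vector<float> t(n_bins);
    for (std::size_t i = 0; i < n_bins; ++i) {
        t[i] = t_near + (static_cast<float>(i) + jitter(rng)) * bin;
    }
    return t;
}

struct Rendered {
    Vec3 rgb{};            // composited colour, background included
    float depth = 0.0f;    // weight-averaged sample depth (not normalised)
    float opacity = 0.0f;  // sum of weights; 1 - opacity goes to the background
};

// Step 4: alpha_i = 1 - exp(-sigma_i * delta_i), T_i = prod_{j<i}(1 - alpha_j),
// w_i = T_i * alpha_i. Assumption: the last interval ends at t_far.
Rendered composite(const std::vector<float>& t, const std::vector<float>& sigma,
                   const std::vector<Vec3>& rgb, float t_far,
                   const Vec3& background) {
    Rendered out;
    float transmittance = 1.0f;
    for (std::size_t i = 0; i < t.size(); ++i) {
        const float next = (i + 1 < t.size()) ? t[i + 1] : t_far;
        const float alpha = 1.0f - std::exp(-sigma[i] * (next - t[i]));
        const float w = transmittance * alpha;
        for (int c = 0; c < 3; ++c) out.rgb[c] += w * rgb[i][c];
        out.depth += w * t[i];
        out.opacity += w;
        transmittance *= 1.0f - alpha;
    }
    for (int c = 0; c < 3; ++c) {
        out.rgb[c] += (1.0f - out.opacity) * background[c];
    }
    return out;
}
```

A ray that hits nothing (all σ = 0) returns the background colour with zero opacity; a single very dense sample returns that sample's colour and depth.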

Architecture

Default widths and depths are set in SirenNeRF (include/siren_nerf.h, src/nerf.cpp). Fourier positional encoding (L = 10 for xyz, L = 4 for view directions) is applied before the SIREN trunks. View directions use the same encoding scheme as positions and feed only the RGB head.
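As a rough illustration of the encoding, the sketch below follows the formula from the NeRF paper, γ(p) = (sin(2⁰πp), cos(2⁰πp), …, sin(2^(L−1)πp), cos(2^(L−1)πp)), applied to each coordinate. The function name `fourier_encode` is hypothetical, and whether this repository includes the factor π or appends the raw input is not stated here, so treat both as assumptions.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Encode each coordinate with num_freqs sin/cos pairs at frequencies
// pi * 2^k, k = 0 .. num_freqs - 1. Output length: x.size() * 2 * num_freqs.
std::vector<float> fourier_encode(const std::vector<float>& x, int num_freqs) {
    constexpr float kPi = 3.14159265358979323846f;
    std::vector<float> out;
    out.reserve(x.size() * 2 * static_cast<std::size_t>(num_freqs));
    for (float v : x) {
        float freq = kPi;
        for (int k = 0; k < num_freqs; ++k) {
            out.push_back(std::sin(freq * v));
            out.push_back(std::cos(freq * v));
            freq *= 2.0f;  // next octave
        }
    }
    return out;
}
```

Under these assumptions a 3D position with L = 10 becomes a 60-dimensional vector and a 3D view direction with L = 4 becomes a 24-dimensional one.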

| Component | Width | SIREN layers | Other layers | Output |
| --- | --- | --- | --- | --- |
| Main trunk (nerf_net_) | 128 | 6 (1 input + 5 hidden) | | features |
| Main σ head | 128 → 1 | | linear + softplus | density |
| Main RGB head | 128 + view enc → 128 | 2 | linear + sigmoid | RGB |
| Proposal trunk (prop_net_) | 128 | 3 (1 input + 2 hidden) | | features |
| Proposal σ head | 128 → 1 | | linear + softplus | density |

Main network total: 6 SIREN layers in the position trunk (view-independent σ), plus 2 SIREN layers in the RGB branch after view concat (8 SIREN layers on the RGB path), with separate linear density/RGB heads. The proposal path uses xyz Fourier encoding only (stop-gradient).
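For readers unfamiliar with SIREN, each SIREN layer is a linear map followed by a sine activation, y = sin(ω₀(Wx + b)). The sketch below shows one such layer on plain vectors. It is not the LibTorch module used in this repository: `siren_layer` is a hypothetical name, and ω₀ = 30 is the default from the SIREN paper, assumed here because the repository's value is not given above.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One SIREN layer: y = sin(omega0 * (W x + b)).
// W holds one row per output unit, each row of length x.size().
std::vector<float> siren_layer(const std::vector<std::vector<float>>& W,
                               const std::vector<float>& b,
                               const std::vector<float>& x,
                               float omega0 = 30.0f) {  // assumed default
    std::vector<float> y(W.size());
    for (std::size_t o = 0; o < W.size(); ++o) {
        float acc = b[o];
        for (std::size_t i = 0; i < x.size(); ++i) {
            acc += W[o][i] * x[i];
        }
        y[o] = std::sin(omega0 * acc);
    }
    return y;
}
```

Stacking six such layers at width 128 would correspond to the main trunk in the table above; the σ and RGB heads then apply ordinary linear layers with softplus and sigmoid.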

Proposal network total: 3 SIREN layers + 1 linear σ head; no RGB, no view input. Trained only via the interlevel loss, not photometric loss.

Training defaults (src/main.cpp): 160×160 images, TrainSampler::PROPOSAL, 64 coarse + 128 importance samples, 8192 rays/step, AdamW (lr = 5e-4, weight decay = 1e-2), warmup_iters = 0, interlevel weight 1.0, pseudo-Huber δ = 0.1, 10000 iterations.
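For reference, the pseudo-Huber loss with scale δ is L_δ(r) = δ²(√(1 + (r/δ)²) − 1) for a residual r. A minimal sketch, averaged over all elements, is shown below; `pseudo_huber` is a hypothetical name and the mean reduction is an assumption about how the repository aggregates the loss.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Mean pseudo-Huber loss between predicted and target values.
// Behaves like r^2 / 2 for |r| << delta and like delta * |r| for |r| >> delta.
float pseudo_huber(const std::vector<float>& pred,
                   const std::vector<float>& target, float delta = 0.1f) {
    if (pred.empty()) return 0.0f;
    double sum = 0.0;
    for (std::size_t i = 0; i < pred.size(); ++i) {
        const double r = (static_cast<double>(pred[i]) - target[i]) / delta;
        sum += static_cast<double>(delta) * delta * (std::sqrt(1.0 + r * r) - 1.0);
    }
    return static_cast<float>(sum / static_cast<double>(pred.size()));
}
```

With δ = 0.1 and colours in [0, 1], small errors are penalised quadratically while large outlier errors grow only linearly.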

Results

Trained and evaluated on the NeRF synthetic lego and ship scenes (held-out test split: every 8th view). Training uses 160×160 images, 10000 AdamW iterations (lr = 5e-4, weight decay = 1e-2), ray-batched PROPOSAL sampling (64 coarse + 128 importance → 192 main-model evals/ray), pseudo-Huber loss (δ = 0.1), on an NVIDIA RTX A6000. Metrics are averaged over the 13 test views at the final evaluation step.
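An every-8th-view hold-out can be sketched as below. The name `split_every_nth` is hypothetical, and starting the held-out set at index 0 is an assumption; the repository's loader may choose the offset differently.

```cpp
#include <cstddef>
#include <vector>

struct Split {
    std::vector<std::size_t> train;
    std::vector<std::size_t> test;
};

// Hold out every `stride`-th view (indices 0, stride, 2 * stride, ...).
Split split_every_nth(std::size_t n_views, std::size_t stride = 8) {
    Split split;
    for (std::size_t i = 0; i < n_views; ++i) {
        if (i % stride == 0) {
            split.test.push_back(i);
        } else {
            split.train.push_back(i);
        }
    }
    return split;
}
```

Assuming a pool of 100 views, this yields 13 held-out views (indices 0, 8, …, 96) and 87 training views, which is consistent with the 13 test views reported above.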

Quality (PSNR / SSIM)

| Scene | PSNR ↑ | SSIM ↑ | Output dir |
| --- | --- | --- | --- |
| lego | 25.05 | 0.901 | output_lego_160_10k |
| ship | 25.03 | 0.767 | output_ship_160_10k |

At 10k iterations, lego reaches 0.901 SSIM (+0.004 vs. the 6k-iteration run at 160×160) and 25.05 dB PSNR (+0.26 dB). Ship holds at about 25.0 dB PSNR; its SSIM is slightly below that of earlier 6k-iteration ship runs, which is common for this scene at this resolution.
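To make the PSNR column concrete: for images with values in [0, 1], PSNR = 10·log₁₀(1 / MSE). The sketch below computes it for flat pixel arrays; `psnr` is a hypothetical helper, not a function from this repository, and it assumes both inputs have the same non-zero length.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// PSNR in dB for two images with values in [0, 1], stored as flat arrays.
float psnr(const std::vector<float>& pred, const std::vector<float>& target) {
    double mse = 0.0;
    for (std::size_t i = 0; i < pred.size(); ++i) {
        const double diff = static_cast<double>(pred[i]) - target[i];
        mse += diff * diff;
    }
    mse /= static_cast<double>(pred.size());
    if (mse == 0.0) return std::numeric_limits<float>::infinity();
    return static_cast<float>(10.0 * std::log10(1.0 / mse));
}
```

By this definition, 25.05 dB corresponds to a mean squared error of roughly 3.1 × 10⁻³ per channel value.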

Inference efficiency (coarse pass)

When rendering with the hierarchical PROPOSAL strategy, the coarse pass can reuse the full radiance field (Option A) or the cheap proposal head (Option B). Timing one deterministic 120×120 test view, averaged over 10 runs on the RTX A6000 (lego, trained with proposal):

| Coarse pass | ms / view ↓ | vs Option A |
| --- | --- | --- |
| full model (Option A) | 167 | |
| proposal head (Option B) | 136 | −19 % |

The proposal head cuts coarse-pass latency by ~19 % because it skips view-dependent RGB and uses a narrow density-only trunk. Total hierarchical render time still includes a full 192-sample fine pass; the win is making the coarse PDF step cheap enough to use at both train and eval time without a large quality penalty.
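The coarse PDF step turns per-bin weights from the coarse pass into new sample depths by inverse-transform sampling: build a CDF over the bins, draw u ~ U[0, 1), and invert the CDF with linear interpolation inside the selected bin. The sketch below is a plain C++ illustration with hypothetical names (`sample_pdf`, `edges`, `weights`); it assumes the coarse pass provides bin edges, which may differ from how this repository represents intervals.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// edges: n + 1 sorted bin edges; weights: n non-negative bin weights.
// Returns n_samples sorted depths distributed according to the weights.
std::vector<float> sample_pdf(const std::vector<float>& edges,
                              const std::vector<float>& weights,
                              std::size_t n_samples, std::mt19937& rng,
                              float eps = 1e-5f) {
    const std::size_t n = weights.size();
    float total = 0.0f;
    for (float w : weights) total += w + eps;  // eps keeps every bin reachable
    std::vector<float> cdf(n + 1, 0.0f);
    for (std::size_t i = 0; i < n; ++i) {
        cdf[i + 1] = cdf[i] + (weights[i] + eps) / total;
    }

    std::uniform_real_distribution<float> uniform(0.0f, 1.0f);
    std::vector<float> samples;
    samples.reserve(n_samples);
    for (std::size_t s = 0; s < n_samples; ++s) {
        const float u = uniform(rng);
        // Index of the first CDF entry above u; clamped against rounding near 1.
        const auto it = std::upper_bound(cdf.begin(), cdf.end(), u);
        const std::size_t hi = std::clamp<std::size_t>(
            static_cast<std::size_t>(it - cdf.begin()), 1, n);
        const std::size_t lo = hi - 1;
        const float width = cdf[hi] - cdf[lo];
        const float frac =
            width > 0.0f ? std::clamp((u - cdf[lo]) / width, 0.0f, 1.0f) : 0.5f;
        samples.push_back(edges[lo] + frac * (edges[hi] - edges[lo]));
    }
    std::sort(samples.begin(), samples.end());
    return samples;
}
```

With the defaults described above, the 128 depths returned by such a routine would be merged with the 64 coarse depths and sorted before the 192-sample fine pass.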

Build

Requires a C++20 compiler, CMake ≥ 3.20, LibTorch (with CUDA if GPU training is wanted), and nlohmann_json.

mkdir build && cd build
cmake ..
make

Run

./NeRF.cpp <data_path> <output_path>

After training, assemble README GIFs (orbit + training progress):

bash make_gifs.sh output_lego_160_10k 30 10000 100 5 output/

load_dataset reads <data_path>/transforms.json — a shared camera_angle_x (FoV) and a transform_matrix (4×4 camera-to-world) per frame — plus the referenced image files. Scenes from the NeRF synthetic dataset (e.g. lego, ship) work directly.
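The two fields map onto ray generation roughly as follows. In the NeRF synthetic format, camera_angle_x is the horizontal field of view in radians, so the focal length in pixels is f = 0.5·W / tan(0.5·camera_angle_x). The sketch below combines that with the camera-space direction from step 1 of the pipeline. The names `focal_from_fov`, `pixel_ray`, `Ray`, and `Mat4` are hypothetical, and the returned direction is left unnormalised, which may differ from the repository.

```cpp
#include <array>
#include <cmath>

using Vec3 = std::array<float, 3>;
using Mat4 = std::array<std::array<float, 4>, 4>;  // row-major camera-to-world

struct Ray {
    Vec3 origin;
    Vec3 dir;
};

// Focal length in pixels from the horizontal field of view (radians).
float focal_from_fov(float camera_angle_x, int width) {
    return 0.5f * static_cast<float>(width) / std::tan(0.5f * camera_angle_x);
}

// World-space ray through pixel (x, y): rotate the camera-space direction
// ((x - W/2)/f, -(y - H/2)/f, -1) by the upper-left 3x3 block of c2w and
// take the translation column as the origin.
Ray pixel_ray(const Mat4& c2w, float x, float y, int width, int height,
              float focal) {
    const Vec3 d = {(x - 0.5f * static_cast<float>(width)) / focal,
                    -(y - 0.5f * static_cast<float>(height)) / focal, -1.0f};
    Ray ray{};
    for (int i = 0; i < 3; ++i) {
        ray.dir[i] = c2w[i][0] * d[0] + c2w[i][1] * d[1] + c2w[i][2] * d[2];
        ray.origin[i] = c2w[i][3];
    }
    return ray;
}
```

With an identity rotation, the ray through the image centre points along −z from the camera position, matching the camera convention described in the pipeline.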

License

BSD 3-Clause.
