3D Gaussian Splatting

TL;DR

Introduced by Kerbl, Kopanas, Leimkühler, and Drettakis at Inria in '3D Gaussian Splatting for Real-Time Radiance Field Rendering' (SIGGRAPH 2023).
Represents a scene as millions of explicit 3D anisotropic Gaussians — each with position, covariance, opacity, and view-dependent colour stored as spherical harmonics.
Renders by projecting each Gaussian to screen space and alpha-blending in depth order, achieving genuine real-time frame rates on consumer GPUs.
Matched or exceeded NeRF quality while rendering roughly 100× faster, and quickly became the default real-time radiance-field representation.

Why It Was a Step Change

NeRF and its successors made photorealistic novel-view synthesis possible but never solved real-time rendering. The fastest NeRF variants (Instant-NGP, Mip-NeRF 360 distillations) could hit interactive frame rates at low resolution on expensive hardware, but volumetric ray marching, which queries the network at many samples along every pixel's ray, kept the fundamental compute cost high.

3D Gaussian Splatting replaced the implicit MLP with an explicit, optimisable point cloud in which every point is an anisotropic 3D Gaussian. Rendering becomes a sort-and-rasterise problem: project the Gaussians to screen, sort by depth per tile, and alpha-blend front-to-back. These are the kinds of operations GPUs were designed for. The result is genuine real-time rendering at NeRF-equivalent or better quality, on a single consumer GPU.
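The projection step can be illustrated with a small NumPy sketch. It assumes a pinhole camera and the standard local affine approximation, in which the 3D covariance Σ maps to a 2×2 screen-space covariance J W Σ Wᵀ Jᵀ. The function name, the argument layout and the example camera are all hypothetical choices made for illustration, not taken from the reference implementation.

```python
import numpy as np

def project_covariance(cov3d, mean_world, W, t, fx, fy):
    """Project a 3D covariance to a 2x2 screen-space covariance.

    W (3x3) and t (3,) are an assumed world-to-camera rotation and
    translation; fx and fy are focal lengths in pixels.
    """
    x, y, z = W @ mean_world + t  # Gaussian centre in camera space
    # Jacobian of the perspective map (x, y, z) -> (fx*x/z, fy*y/z),
    # evaluated at the Gaussian centre.
    J = np.array([
        [fx / z, 0.0, -fx * x / z**2],
        [0.0, fy / z, -fy * y / z**2],
    ])
    return J @ W @ cov3d @ W.T @ J.T

# An isotropic Gaussian (variance 0.01) two units in front of the camera.
cov2d = project_covariance(
    cov3d=np.eye(3) * 0.01,
    mean_world=np.zeros(3),
    W=np.eye(3),
    t=np.array([0.0, 0.0, 2.0]),
    fx=500.0,
    fy=500.0,
)
print(cov2d)  # diagonal entries are (fx / z)^2 * 0.01 = 625
```

Because the Jacobian depends on depth, the same 3D Gaussian covers fewer pixels as it moves away from the camera, which is what makes the screen-space footprint cheap to bound per tile.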

What a Splat Stores

A typical reconstructed scene contains 1-5 million Gaussians. Storage is several hundred megabytes uncompressed; compressed PLY and structured formats have brought this down significantly.

Position μ — 3D centre of the Gaussian.
Covariance Σ — anisotropic shape, parameterised via a rotation quaternion and a scale triplet to keep Σ positive-semi-definite.
Opacity α — scalar.
Colour c — view-dependent, encoded as spherical harmonics (typically up to degree 3, giving 16 SH coefficients per channel).
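As a minimal sketch of the covariance parameterisation listed above, the snippet below builds Σ = R S Sᵀ Rᵀ from a quaternion and a scale triplet. Writing Σ as a matrix times its own transpose is what guarantees it stays positive-semi-definite however the optimiser moves the raw parameters. The helper names and the (w, x, y, z) quaternion ordering are assumptions for this example.

```python
import numpy as np

def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)  # normalise so R is a pure rotation
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

def build_covariance(q, scale):
    """Sigma = (R S)(R S)^T, symmetric and positive-semi-definite by construction."""
    R = quat_to_rotmat(np.asarray(q, dtype=float))
    M = R @ np.diag(scale)
    return M @ M.T

cov = build_covariance([0.9, 0.1, 0.3, 0.2], [0.5, 0.1, 0.02])
eigvals = np.linalg.eigvalsh(cov)  # ascending order
print(eigvals)  # the squared scales: 0.0004, 0.01, 0.25
```

The eigenvalues of Σ are the squared scales regardless of the rotation, so the scale triplet directly controls the extent of the Gaussian along its three principal axes.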

Training

Training optimises the Gaussian parameters by gradient descent (Adam in the reference implementation) against a per-pixel rendering loss that combines L1 and D-SSIM terms. The standout training-time mechanism is adaptive density control: Gaussians with large view-space positional gradients are densified, by cloning when they are small (under-reconstructed regions) and by splitting when they are large (over-reconstructed regions), while Gaussians that are nearly transparent or excessively large are pruned. The point cloud grows and shrinks during training, ending at a count tuned to the scene's complexity.
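The density-control decision can be sketched as three boolean masks over the Gaussians. This is a simplified illustration only: the function name is hypothetical, the threshold values are illustrative defaults rather than tuned settings, and the size-based pruning is omitted. A real trainer would also accumulate gradients over many iterations before deciding.

```python
import numpy as np

def density_control(grad_norm, max_scale, opacity,
                    grad_thresh=2e-4, size_thresh=0.01, min_opacity=0.005):
    """Return boolean masks (clone, split, prune), one entry per Gaussian.

    grad_norm: accumulated view-space positional gradient magnitude
    max_scale: largest of the three scale components
    opacity:   current opacity in [0, 1]
    All thresholds are illustrative assumptions.
    """
    needs_densify = grad_norm > grad_thresh
    clone = needs_densify & (max_scale <= size_thresh)  # small: duplicate it
    split = needs_densify & (max_scale > size_thresh)   # large: break it up
    prune = opacity < min_opacity                       # nearly transparent
    return clone, split, prune

grad_norm = np.array([5e-4, 5e-4, 1e-5, 1e-5])
max_scale = np.array([0.005, 0.05, 0.005, 0.05])
opacity = np.array([0.8, 0.6, 0.001, 0.9])

clone, split, prune = density_control(grad_norm, max_scale, opacity)
print(clone, split, prune)
```

In this toy input the first Gaussian is cloned, the second is split, the third is pruned and the fourth is left alone.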

Training time on a single GPU is typically minutes to tens of minutes per scene, roughly two orders of magnitude faster than vanilla NeRF's 1-2 days and comparable to Instant-NGP.

Rendering

The original implementation ships custom CUDA kernels for sorting and rasterisation. On an L40S or H100, 1080p rendering at 100+ FPS for a million-Gaussian scene is straightforward. Mobile and web ports (mobile-splatting, splat.js variants) have brought rendering to phones and browsers, though typically with reduced quality.

For each frame:
  1. Project every 3D Gaussian to a 2D screen-space Gaussian.
  2. Evaluate each Gaussian's spherical harmonics at the current view
     direction to get its view-dependent colour.
  3. Tile the screen (e.g. 16x16 pixel tiles).
  4. Bucket Gaussians by the tiles they overlap and sort by depth
     within each tile.
  5. For each pixel in each tile, alpha-blend the sorted Gaussians
     front-to-back until the accumulated opacity saturates.
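The blending step of the loop above can be sketched for a single pixel. The snippet assumes the Gaussians are already projected, coloured and sorted front to back. The function name, the 0.99 alpha clamp and the 1e-4 transmittance cut-off are illustrative assumptions. A real rasteriser runs this per tile on the GPU rather than in a Python loop.

```python
import numpy as np

def blend_pixel(pixel, means2d, inv_covs2d, opacities, colours, t_min=1e-4):
    """Front-to-back alpha blending of depth-sorted 2D Gaussians at one pixel."""
    colour = np.zeros(3)
    transmittance = 1.0  # fraction of light still passing through
    for mu, inv_cov, opacity, c in zip(means2d, inv_covs2d, opacities, colours):
        d = pixel - mu
        # Opacity is attenuated by the Gaussian falloff at this pixel.
        alpha = min(0.99, opacity * np.exp(-0.5 * d @ inv_cov @ d))
        colour += transmittance * alpha * c
        transmittance *= 1.0 - alpha
        if transmittance < t_min:  # pixel is effectively opaque: stop early
            break
    return colour, transmittance

pixel = np.array([10.0, 10.0])
means2d = [np.array([10.0, 10.0]), np.array([10.0, 10.0])]
inv_covs2d = [np.eye(2), np.eye(2)]
opacities = [0.5, 0.5]
colours = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]  # red in front of green

colour, transmittance = blend_pixel(pixel, means2d, inv_covs2d, opacities, colours)
print(colour, transmittance)  # [0.5, 0.25, 0.0] and 0.25
```

The early exit is what keeps the cost per pixel bounded: once the transmittance is negligible, Gaussians further back cannot change the result visibly.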

Active Research Directions

Compression — neural compression and SH coefficient quantisation have cut scene size from gigabytes to tens of megabytes with minor quality loss.
Dynamic scenes — 4D Gaussian Splatting variants extend the representation to time-varying scenes for video.
Editing — text-driven and mask-driven Gaussian editing (GaussianEditor and follow-ups) is a fast-moving area.
Sparse-input — making Splatting work with 3-10 input views rather than 50-200 is a frontier with multiple competing approaches in 2025-2026.
Semantic Gaussians — embedding feature vectors (DINOv2, CLIP) into each splat for language-driven scene queries.

For digital twin and inspection workloads on Yobitel infrastructure, Splatfacto (Nerfstudio's Gaussian Splatting implementation) on L40S is the recommended baseline. Storage and bandwidth are typically the deployment bottleneck, not training compute.

Practical Comparison with NeRF

Property	NeRF (vanilla)	Gaussian Splatting
Representation	Implicit MLP	Explicit point cloud of Gaussians
Training time	1-2 days/scene	Minutes to tens of minutes
Render speed	Seconds per frame	Real-time (100+ FPS)
Scene size	Tens of MB (MLP weights)	Hundreds of MB (uncompressed)
View-dependent colour	Direction input to MLP	Per-Gaussian spherical harmonics
Editing	Hard (implicit)	Easier (explicit primitives)
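The Gaussian Splatting scene-size figure can be sanity-checked with back-of-envelope arithmetic. The sketch assumes every attribute is stored as a 32-bit float and counts only position, scale, rotation quaternion, opacity and degree-3 spherical harmonics. Real file formats may add fields or padding, so treat the result as a lower bound. The constant and function names are hypothetical.

```python
# Per-Gaussian attributes, assuming float32 storage and degree-3 SH
# (16 coefficients for each of 3 colour channels).
FLOATS_PER_GAUSSIAN = 3 + 3 + 4 + 1 + 3 * 16  # position, scale, quaternion, opacity, SH
BYTES_PER_FLOAT = 4

def scene_size_mb(num_gaussians):
    """Uncompressed size in megabytes (1 MB = 10^6 bytes)."""
    return num_gaussians * FLOATS_PER_GAUSSIAN * BYTES_PER_FLOAT / 1e6

sizes = {n: scene_size_mb(n) for n in (1_000_000, 5_000_000)}
print(sizes)  # 236.0 MB for 1M Gaussians, 1180.0 MB for 5M
```

Under these assumptions the spherical harmonics account for 48 of the 59 floats per Gaussian, which is why SH quantisation is such an effective compression target.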

References

3D Gaussian Splatting for Real-Time Radiance Field Rendering (Kerbl et al., 2023) · Inria project page
Reference implementation · GitHub
Nerfstudio Splatfacto · Nerfstudio Docs

