pix2pixHD vs gaussian-splatting

pix2pixHD	Project	gaussian-splatting
6	Mentions	7
6,521	Stars	11,266
0.9%	Growth	10.8%
0.0	Activity	8.9
11 months ago	Latest Commit	23 days ago
Python	Language	Python
GNU General Public License v3.0 or later	License	GNU General Public License v3.0 or later

The number of mentions is the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

pix2pixHD

Posts with mentions or reviews of pix2pixHD. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-02-04.

How do I run more than 200 epochs in training a Pix2PixHD model?
1 project | /r/MLQuestions | 11 Aug 2022
NVIDIA DLSS Now Available in Over 150 Games, Including Dying Light 2 Stay Human, Sifu and Phantasy Star Online 2 New Genesis
1 project | /r/Games | 14 Feb 2022

Well, maybe not, considering things like pix2pix can generate detail from just solid shapes and colors.
Image to hand drawn
4 projects | /r/artificial | 4 Feb 2022

Sources: U2Net, ArtLine, Pix2PixHD, APDrawingGAN
[P] I made FaceShop! Instance segmentation + CGAN for editing faces (badly)
3 projects | /r/MachineLearning | 25 Sep 2021

Pix2PixHD (from DeepSIM)

Uses a mix of instance segmentation (BiSeNet) and conditional GAN, and is heavily inspired by the Pix2PixHD and DeepSIM papers. Will have more details when I wake up!
How to access a class object when I use torch.nn.DataParallel()?
1 project | /r/pytorch | 12 Mar 2021

I used Pix2PixHD implementation in GitHub if you want to see the full code.

gaussian-splatting

Posts with mentions or reviews of gaussian-splatting. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-01-24.

Show HN: Gaussian Splat renderer in VR with Unity
2 projects | news.ycombinator.com | 24 Jan 2024

Chris' post doesn't really give much background info, so here's what's going on here and why it's awesome.
Real-time 3D rendering has historically been based on rasterisation of polygons. This has brought us a long way and has a lot of advantages, but making photorealistic scenes takes a lot of work from the artist. You can scan real objects with techniques like photogrammetry and then convert the scans to high-poly meshes, but photogrammetry rigs are pro-level tools, and the assets won't render at real-time speeds. Unreal 5 introduced Nanite, which is a very advanced LoD algorithm, and that helps a lot, but again, we seem to be hitting the limits of what can be done with polygon-based rendering.
3D Gaussian Splats is a new AI-based technique that lets you render, in real time, photorealistic 3D scenes that were captured with only a few photos taken using normal cameras. It replaces polygon-based rendering with radiance fields.
https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/
3DGS uses several advanced techniques:
1. A 3D point cloud is estimated by using "structure from motion" techniques.
2. The points are turned into "3D gaussians", which are sort of floating blobs of light where each one has a position, an opacity, a covariance matrix and a colour defined using "spherical harmonics" (no, me neither). They're ellipsoids, so they can be thought of as spheres that are stretched and rotated.
3. Rendering is done by projecting the 3D Gaussians to the 2D screen (into "splats"), sorting them so transparency works, and then rasterizing them on the fly using custom shaders.
The neural network isn't actually used at rendering time, so GPUs can render the scene nice and fast.
In terms of what it can do the technique might be similar to Unreal's Nanite. Both are designed for static scenes. Whilst 3D Gaussians can be moved around on the fly, so the scene can be changed in principle, none of the existing animation, game engines or artwork packages know what to do without polygons. But this sort of thing could be used to rapidly create VR worlds based on only videos taken from different angles, which seems useful.
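
The "sorted so transparency works" part of step 3 can be illustrated with a minimal sketch of front-to-back alpha blending at a single pixel. This is not the renderer's actual code: the `composite` function and the `(depth, rgb, alpha)` tuple layout are assumptions made up for illustration, and each splat's alpha is taken as given, whereas a real renderer would also scale it by the 2D Gaussian's falloff at that pixel.

```python
def composite(splats):
    """Blend splats covering one pixel, nearest first.

    Each splat is a hypothetical (depth, rgb, alpha) tuple. Sorting by depth
    matters because every splat is dimmed by the opacity of those in front.
    """
    colour = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light still passing through
    for _depth, rgb, alpha in sorted(splats, key=lambda s: s[0]):
        weight = alpha * transmittance
        colour = [c + weight * channel for c, channel in zip(colour, rgb)]
        transmittance *= 1.0 - alpha
    return colour, transmittance


# A half-transparent red splat in front of an opaque blue one.
pixel, remaining = composite([
    (2.0, (0.0, 0.0, 1.0), 1.0),  # far, blue, opaque
    (1.0, (1.0, 0.0, 0.0), 0.5),  # near, red, 50% opaque
])
print(pixel, remaining)
```

Blending in the wrong order would let the opaque blue splat hide the red one in front of it, which is why the sort is needed before rasterizing.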
Bad accuracy after model training, Can someone help me ?
1 project | /r/GaussianSplatting | 7 Dec 2023

The repo I used is: https://github.com/graphdeco-inria/gaussian-splatting
The initial work to get the gaussian-splatting training code working on AMD/ROCm has been done
1 project | /r/GaussianSplatting | 19 Oct 2023
Future Tech SD VR, 3D modeling, Movies, and Video Game Creation (Paper / Videos included)
1 project | /r/StableDiffusion | 10 Oct 2023
Show HN: Real-Time 3D Gaussian Splatting in WebGL
4 projects | news.ycombinator.com | 11 Sep 2023

Really cool, I am also working on a port of gaussian-splatting [0] but to WebGPU.
Like all the other implementations I have seen so far, this also makes the same mistake when projecting the ellipsoids under a perspective projection: first you calculate the covariance in 3D and then project that to 2D [1]. This approach only works with parallel / orthographic projections, and applying it to perspective projections leads to incorrect results. That is because perspective projections have three additional effects:
- Parallax movements (that is, the ellipsoid moves parallel to the view plane) change the shape of the projected ellipse. E.g. a sphere only appears circular when in the center of the view; once it moves to the edges it becomes stretched into an ellipse. This effect is manually counterbalanced by this matrix, I believe [2].
- Rotating an ellipsoid can change the position it appears at, or in other words creates additional translation. This effect is zero if the ellipsoid has one of its three axes pointing straight at the view (parallel to the normal of the view plane). But if it is rotated 45°, then the tip of the ellipsoid that is closer to the view plane becomes larger through the perspective while the other end becomes smaller. Put together, this slightly shifts the center of the appearance away from the projected center of the ellipsoid.
- Conic sections can result not only in ellipses but also in parabolas and hyperbolas. This however is an edge case that only happens when the ellipsoid intersects with the view plane, and it can probably be ignored as one would clip away such ellipsoids anyway.
The last two effects are not accounted for in these calculations in any of the implementations I have seen so far. What would be correct to do instead? Do not calculate the 3D covariance. Instead calculate the bounding cone around the ellipsoid which has its vertex at the camera position (perspective origin). Then intersect that with the view plane and the resulting conic section is guaranteed to be the correct contour of the perspective projection of the ellipsoid.
[0]: https://github.com/graphdeco-inria/gaussian-splatting
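
The bounding-cone idea in the comment above can be checked numerically in a deliberately simplified case. The sketch below is an illustration under stated assumptions, not code from any of the implementations mentioned: it uses a sphere instead of a general ellipsoid, works in the 2D xz-slice through the camera, places the camera at the origin looking down +z, and puts the view plane at z = 1. The function names are hypothetical.

```python
import math


def sphere_silhouette_x(cx, cz, r):
    """Exact x-extent on the view plane z = 1 of a sphere centred at (cx, 0, cz).

    The two tangent rays of the bounding cone make angles theta - phi and
    theta + phi with the view axis; intersecting them with z = 1 gives the
    left and right edges of the projected contour.
    """
    dist = math.hypot(cx, cz)
    if dist <= r:
        raise ValueError("camera is inside the sphere")
    theta = math.atan2(cx, cz)  # direction from the camera to the centre
    phi = math.asin(r / dist)   # half-angle of the bounding cone
    if theta + phi >= math.pi / 2 or theta - phi <= -math.pi / 2:
        raise ValueError("contour is not an ellipse (sphere crosses the camera plane)")
    return math.tan(theta - phi), math.tan(theta + phi)


def centre_shift(cx, cz, r):
    """Offset between the contour's midpoint and the projected sphere centre."""
    left, right = sphere_silhouette_x(cx, cz, r)
    return (left + right) / 2 - cx / cz


for cx in (0.0, 3.0):
    left, right = sphere_silhouette_x(cx, 5.0, 1.0)
    print(f"cx={cx}: width={right - left:.4f}, shift={centre_shift(cx, 5.0, 1.0):.4f}")
```

In this setup the on-axis sphere projects symmetrically around its projected centre, while the off-axis sphere at the same depth projects wider and with its midpoint pushed away from the projected centre, which is the kind of discrepancy the comment describes.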

What are some alternatives?

When comparing pix2pixHD and gaussian-splatting you can also consider the following projects:

pytorch-CycleGAN-and-pix2pix - Image-to-Image Translation in PyTorch

awesome-colab-notebooks - Collection of google colaboratory notebooks for fast and easy experiments

splat - WebGL 3D Gaussian Splat Viewer

sofgan - [TOG 2022] SofGAN: A Portrait Image Generator with Dynamic Styling

face-parsing.PyTorch - Using modified BiSeNet for face parsing in PyTorch

generative-inpainting-pytorch - A PyTorch reimplementation for paper Generative Image Inpainting with Contextual Attention (https://arxiv.org/abs/1801.07892)

contrastive-unpaired-translation - Contrastive unpaired image-to-image translation, faster and lighter training than cyclegan (ECCV 2020, in PyTorch)

CycleGAN - Software that can generate photos from paintings, turn horses into zebras, perform style transfer, and more.

StyleSwin - [CVPR 2022] StyleSwin: Transformer-based GAN for High-resolution Image Generation

Im2Vec - [CVPR 2021 Oral] Im2Vec Synthesizing Vector Graphics without Vector Supervision

DeblurGANv2 - [ICCV 2019] "DeblurGAN-v2: Deblurring (Orders-of-Magnitude) Faster and Better" by Orest Kupyn, Tetiana Martyniuk, Junru Wu, Zhangyang Wang