-
Not a dumb question! This first version is still mainly targeted at people who are already in this area, and at generating some excitement. I do hope to make it more accessible though!
The inputs are (1) images and (2) a camera pose for each image. The usual way to get poses for your images is https://github.com/colmap/colmap.
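If it helps to see what a "pose" is concretely: COLMAP produces a sparse model you can export as text (e.g. with its model_converter command), and the per-image poses live in images.txt. A rough Python sketch of reading them out, assuming the documented two-lines-per-image text layout (treat this as a sketch, not production code):

    # Rough sketch: read world-to-camera poses from COLMAP's text export (images.txt).
    # Assumes the documented layout: comment lines start with '#', then two lines
    # per image, the first of which is:
    #   IMAGE_ID QW QX QY QZ TX TY TZ CAMERA_ID NAME
    def load_poses(images_txt_path):
        poses = {}
        with open(images_txt_path) as f:
            lines = [line for line in f if not line.startswith("#")]
        for pose_line in lines[0::2]:  # the line in between lists 2D feature points
            fields = pose_line.split()
            qw, qx, qy, qz = map(float, fields[1:5])  # rotation, as a quaternion
            tx, ty, tz = map(float, fields[5:8])      # translation
            name = fields[9]
            poses[name] = ((qw, qx, qy, qz), (tx, ty, tz))
        return poses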
The output is a 3D model. Specifically a "Gaussian Splat", which is a sort of fuzzy point cloud. There are some tools out there to view & edit these (besides Brush), e.g. https://playcanvas.com/supersplat/editor.
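To give a feel for "fuzzy point cloud": each point is roughly a colored, semi-transparent ellipsoid. A sketch of the attributes one point carries, loosely mirroring the fields in the usual Gaussian splat .ply exports (names simplified here for readability):

    # Sketch of one "fuzzy point" in a Gaussian splat. Not any tool's actual
    # data structure, just what each point conceptually stores.
    from dataclasses import dataclass

    @dataclass
    class Gaussian:
        position: tuple[float, float, float]         # center of the blob in 3D
        scale: tuple[float, float, float]            # per-axis size of the ellipsoid
        rotation: tuple[float, float, float, float]  # orientation, as a quaternion
        opacity: float                               # how see-through the blob is
        sh_coeffs: list[float]                       # view-dependent color (spherical harmonics)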
-
The input to this is two things: images, and camera poses. The camera poses tell you where each camera was in 3D space (and some of its properties, like its field of view).
The training takes this information and builds a 3D model out of it that visually matches all your photos.
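The optimization itself is gradient descent on an image loss: render the splats from each known camera pose, compare to the photo, nudge the splat parameters, repeat. Here is a toy 2D analogue of that loop (fitting blobs to a single image with PyTorch); this isn't Brush's actual code, just the shape of the idea:

    # Toy 2D analogue of splat training: fit N fuzzy blobs to one target image
    # by gradient descent. Real splatting renders 3D Gaussians from each camera
    # pose and depth-sorts/composites them; this just sums isotropic 2D blobs.
    import torch

    H, W, N = 32, 32, 100
    target = torch.rand(H, W, 3)  # stand-in for one of your photos

    pos = torch.rand(N, 2, requires_grad=True)               # blob centers in [0, 1]^2
    log_scale = torch.full((N,), -2.5, requires_grad=True)   # blob sizes (log-space)
    color = torch.rand(N, 3, requires_grad=True)             # blob colors

    ys, xs = torch.meshgrid(torch.linspace(0, 1, H), torch.linspace(0, 1, W), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1)  # (H, W, 2) pixel coordinates

    def render():
        d2 = ((grid[None] - pos[:, None, None]) ** 2).sum(-1)         # (N, H, W)
        weight = torch.exp(-d2 / (2 * torch.exp(log_scale)[:, None, None] ** 2))
        return (weight[..., None] * color[:, None, None, :]).sum(0)   # (H, W, 3)

    opt = torch.optim.Adam([pos, log_scale, color], lr=1e-2)
    for step in range(500):
        opt.zero_grad()
        loss = ((render() - target) ** 2).mean()  # "does the render match the photo?"
        loss.backward()
        opt.step()

The real thing also splits, clones, and prunes splats as it trains, and runs the loss over all your photos rather than one, but the loop has the same shape.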
Sadly, COLMAP can still be quite expensive and a hassle: on the order of half an hour, as opposed to seconds. There are modern alternatives like https://lpanaf.github.io/eccv24_glomap/, or even deep-learning-based systems like https://github.com/naver/dust3r.
This is definitely still a big blocker to adoption. The goal is to get to a more all-in-one system. The splatting optimization can also help align cameras, as long as they don't start out entirely random, so any system that can quickly provide a good "initial guess" will help here. At least for mobile devices, initialization from ARCore / ARKit poses should be enough.
Keep an eye out :)