Demucs [1], one of the leading/SOTA systems, has an experimental 6-source model, `htdemucs_6s`, which adds piano and guitar:
>We are also releasing an experimental 6 sources model, that adds a guitar and piano source. Quick testing seems to show okay quality for guitar, but a lot of bleeding and artifacts for the piano source.
I believe Audioshake [2] (a company in the space) is doing guitar separation as well.
1: https://github.com/facebookresearch/demucs
OK, so, tangentially related: I once tried taking small chunks of songs generated by SampleRNN and stitching together the ones that sounded the most similar.
The script [1] uses Essentia's Chromaprinter [2] to "grade" the similarity of audio tracks and combines the ones with the closest chromaprints.
I have a track on SoundCloud that uses the above technique (mashing together short generated clips by their chromagrams), trained on Cannibal Corpse [3].
1: https://github.com/sevagh/1000sharks.xyz/blob/master/sampler...
2: https://essentia.upf.edu/reference/std_Chromaprinter.html
3: https://soundcloud.com/user-167126026/1000sharks-domainal-sk...
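Roughly, the idea can be sketched like this (not my actual script — the real features come from Essentia's Chromaprinter, and the toy vectors, function names, and greedy chaining here are my own simplification):

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two equal-length, nonzero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def stitch_order(clips):
    """Greedily chain clips: start from the first clip, then repeatedly
    append the unused clip whose feature vector is most similar to the
    current tail. `clips` maps clip name -> feature vector."""
    names = list(clips)
    order = [names[0]]
    remaining = set(names[1:])
    while remaining:
        tail = clips[order[-1]]
        best = max(remaining, key=lambda n: cosine_sim(tail, clips[n]))
        order.append(best)
        remaining.remove(best)
    return order
```

A greedy nearest-neighbour chain like this is the simplest possible policy; you could also do global matching, but for stitching a few dozen generated clips the greedy pass was good enough for me.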
Interesting, I attempted to do the same as you but stopped just shy of BPM matching.
However, I did get sound similarity working using an audio tagging neural net [1]. I chopped off the first and last 15 seconds of every song in my collection and ran them all through this analysis, which produces a ~520-dimensional vector. I then targeted specific endings I wanted to match and used Euclidean distance to find the closest matching song beginning.
YMMV, but I thought it actually worked pretty well; I just never got around to automating the BPM matching. I can try to look for my old script if you're interested :)
[1] https://github.com/fschmid56/EfficientAT
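The matching step itself is tiny. A minimal sketch (assuming the embeddings already exist — in my case ~520-dim vectors from EfficientAT; the low-dimensional vectors and function names here are just for illustration):

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def best_transition(ending_vec, beginnings):
    """Given the embedding of one song's ending, return the name of the
    song whose beginning embedding is closest in Euclidean distance.
    `beginnings` maps song name -> embedding of its opening seconds."""
    return min(beginnings, key=lambda name: euclidean(ending_vec, beginnings[name]))
```

With a precomputed dict of beginning embeddings, finding the next song is one `best_transition` call per ending.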
Good question! So, I wasn't even thinking about WASM to begin with. When I saw llama.cpp and whisper.cpp on the front page of HN, I found the idea exciting: instead of treating neural networks as magic, I wanted to copy the ggml approach of parsing the PyTorch weights file myself and rewriting the inference code in a lower-level language than Python (or, more accurately, than PyTorch, since it does so much matrix heavy lifting for you automatically, e.g. broadcasting and reshaping).
That's when I wrote umx.cpp [1] (which is what this site is based on).
On an unrelated project, a friend of mine mentioned WASM, and as I looked into it a bit more, compiling umx.cpp to WASM seemed like a great idea, since I only use Eigen (a header-only library that depends only on the C++ standard library).
1: https://github.com/sevagh/umx.cpp
I tried to use it, but I hit the same issues as others in the thread.
I have tried many sources and methods over the years and settled on Spleeter [0]. It works well even for 10+ minute songs, across varying styles from flamenco to heavy metal.
[0] https://github.com/deezer/spleeter
* Post-processing step (bigger impact)
I tried to tackle the post-processing step in my C++ code (which would win ~1 dB in quality across all targets) but it's too tricky for now [2]. Maybe some other day.
1: https://github.com/sevagh/free-music-demixer/blob/main/examp...
2: https://github.com/sigsep/open-unmix-pytorch/blob/master/ope...
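For context, Open-Unmix's post-processing is a multichannel Wiener filter over the source spectrograms. The core idea is simpler than the full EM procedure: redistribute the mixture according to each source's estimated power. A single-channel, single-bin simplification (function name and setup are mine, not Open-Unmix's API):

```python
def soft_masks(magnitudes):
    """Given each source's estimated magnitude at one time-frequency bin,
    return Wiener-style soft masks (power ratios) that sum to 1.
    Multiplying the mixture bin by each mask redistributes its energy
    among the sources in proportion to their estimated power."""
    powers = [m * m for m in magnitudes]
    total = sum(powers)
    return [p / total for p in powers]
```

The real post-processing iterates this over all bins and channels, with covariance estimates and EM refinement, which is exactly the part that's tricky to port to C++.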