Another interesting technology that came up last year is Differentiable Digital Signal Processing (DDSP) and the things you can do with it. Basically, the authors implemented a library of synthesizers in TensorFlow and used them to train an autoencoder on monophonic audio: the waveform gets encoded into timbre, melody, and loudness, which can be decoded into synthesizer control sequences that in turn sound like the original waveform. This (timbre, melody, loudness) decomposition is a pretty useful abstraction that other work has picked up as well, but so far it only works for monophonic audio.
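To make the decoding side concrete, here is a minimal sketch (not the DDSP library's actual API) of how frame-level controls like pitch and loudness can drive a harmonic synthesizer: the fundamental frequency is integrated into a phase, harmonics of that phase are summed with fixed weights (a stand-in for "timbre"), and the result is scaled by loudness. All names and parameters here are illustrative assumptions.

```python
import numpy as np

def harmonic_synth(f0, amplitude, harmonic_weights, sample_rate=16000):
    """Render audio from per-sample controls by summing sine harmonics.

    f0:               fundamental frequency per sample (Hz) -- the "melody"
    amplitude:        overall level per sample -- the "loudness"
    harmonic_weights: relative level of each harmonic -- a crude "timbre"
    """
    n_harmonics = len(harmonic_weights)
    # Integrate instantaneous frequency to get the running phase.
    phase = 2 * np.pi * np.cumsum(f0) / sample_rate
    # Stack sinusoids at integer multiples of the fundamental.
    harmonics = np.stack([np.sin((k + 1) * phase) for k in range(n_harmonics)])
    # Mix harmonics by timbre weights, then apply the loudness envelope.
    return amplitude * (harmonic_weights @ harmonics)

# One second of a 440 Hz tone that fades out, with a hypothetical timbre.
n = 16000
f0 = np.full(n, 440.0)
amplitude = np.linspace(1.0, 0.0, n)
weights = np.array([1.0, 0.5, 0.25])
audio = harmonic_synth(f0, amplitude, weights)
```

The point of the real system is that this whole synthesis chain is differentiable, so gradients can flow from the output waveform back into the network that predicts the controls.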
Generating audio is in general still a pretty hard topic, and I am not aware of any project that directly targets mashups of songs. The best music-generating model I know of is OpenAI Jukebox, which can be conditioned on an artist and lyrics, or on nothing at all. It uses VQ-VAEs to encode waveforms as sequences of discrete tokens, similar to words in language, and then uses transformers like GPT-3 to generate those tokens, which can finally be decoded back to audio.
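The core trick in the VQ-VAE step is vector quantization: each continuous latent vector from the encoder is snapped to its nearest entry in a learned codebook, and the entry's index becomes the discrete token the transformer models. A minimal sketch of that lookup (with made-up codebook and latent sizes, not Jukebox's actual configuration):

```python
import numpy as np

def vector_quantize(latents, codebook):
    """Replace each latent vector with its nearest codebook entry.

    Returns the discrete token ids (what the transformer is trained on)
    and the quantized vectors (what the decoder turns back into audio).
    """
    # Squared distance from every latent to every codebook vector.
    d = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    tokens = d.argmin(axis=1)
    return tokens, codebook[tokens]

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))   # hypothetical 8-entry codebook
latents = rng.normal(size=(16, 4))   # encoder outputs for 16 audio frames
tokens, quantized = vector_quantize(latents, codebook)
```

Once audio is reduced to such token sequences, generating music becomes next-token prediction, the same problem language models already solve.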