Get timestamps for .wav partials


Resemblyzer

4 2,596 3.4 Python

A python package to analyze and compare voices with deep learning

I want to modify Resemblyzer's speaker diarization script so that it cuts out the parts of the audio where a specific speaker isn't present. The graph generated by the original demo looks right, but the timestamps at which my version cuts the audio are off. I came to this conclusion because, when I printed all the timestamp information, my 22-minute video came out as only 1036 seconds long. The variable I'm indexing time by also appears to be a collection of "wave partials as a list of slices", according to the function that generates its value. Furthermore, the function I was modifying to get the time said that the intervals were non-reliable. This is a problem because, as you will see in my code below, when I cut the video with ffmpeg I treat these timestamps as if they mapped one-to-one onto the video:

```python
import numpy as np
from resemblyzer import preprocess_wav, VoiceEncoder, sampling_rate
from demo_utils import *
from pathlib import Path
from os import listdir, system
from os.path import join


def Diarization(path, file, segments):
    wav_fpath = Path(join(path, file))
    wav = preprocess_wav(wav_fpath)

    # Reference audio for each known speaker, given as [start, end] in seconds.
    speaker_names = ["Peter"]
    speaker_wavs = [wav[int(s[0] * sampling_rate):int(s[1] * sampling_rate)] for s in segments]

    encoder = VoiceEncoder("cpu")
    print("Running the continuous embedding on cpu, this might take a while...")
    _, cont_embeds, wav_splits = encoder.embed_utterance(wav, return_partials=True, rate=16)
    speaker_embeds = [encoder.embed_utterance(speaker_wav) for speaker_wav in speaker_wavs]
    similarity_dict = {name: cont_embeds @ speaker_embed
                       for name, speaker_embed in zip(speaker_names, speaker_embeds)}

    # Midpoint of each partial, in seconds, relative to the preprocessed wav.
    times = [((s.start + s.stop) / 2) / sampling_rate for s in wav_splits]

    keep = True
    cutTimes = [[times[0], times[len(wav_splits) - 1]]]
    #similar = open("similarities.txt", "w+")
    for i in range(len(wav_splits)):
        similarities = [s[i] for s in similarity_dict.values()]
        best = np.argmax(similarities)
        name, similarity = list(similarity_dict.keys())[best], similarities[best]
        #similar.write(f"{times[i]} - {similarity}\n")
        if similarity > 0.65:
            # Speaker is present: open a new interval if we were cutting.
            if not keep:
                cutTimes.append([times[i], times[len(wav_splits) - 1]])
                keep = True
        else:
            # Speaker is absent: close the interval that is currently open.
            if keep:
                cutTimes[len(cutTimes) - 1][1] = times[i]
                keep = False
    #similar.close()

    # Build the ffmpeg aselect expression from the kept intervals.
    cutCommand = ""
    for num, seg in enumerate(cutTimes):
        if num == 0:
            cutCommand += f"between(t,{seg[0]},{seg[1]})"
            continue
        cutCommand += f"+between(t,{seg[0]},{seg[1]})"

    addMe = "Cut - "
    print(f"ffmpeg -i \"{join(path, file)}\" -af \"aselect='{cutCommand}',asetpts=N/SR/TB\" \"{join(path, addMe+file)}\"")
    system(f"ffmpeg -y -i \"{join(path, file)}\" -af \"aselect='{cutCommand}',asetpts=N/SR/TB\" \"{join(path, addMe+file)}\"")


path = r'C:\Users\mlfre\OneDrive\Desktop\Resemblyzer\Resemblyzer-master\audio_data'
for file in listdir(path):
    #if ".mp3" in file or ".wav" in file or ".mp4" in file:
    if file == "peter.mp3":
        segments = [[12, 21]]
        Diarization(path, file, segments)
```

Since the graph's values were accurate in real time, if I could just get the time intervals to be accurate in real time as well, I would be golden. Unfortunately, I do not know how to translate from iterating over a list of wav partials (as slices) to positions in time within the wave file.
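A minimal sketch of the conversion the post asks about, under stated assumptions: each partial is a Python `slice` of sample indices, and dividing an index by the sampling rate gives seconds. The 16000 Hz rate, the example slices, and the helper names `slice_times`, `keep_intervals`, and `aselect_expr` are hypothetical and not part of Resemblyzer. The resulting times are relative to the array that was embedded, so if `preprocess_wav` shortens the audio (for example by trimming long silences, which the post does not confirm), they would not line up with the original file.

```python
from typing import List, Sequence, Tuple

SAMPLING_RATE = 16000  # assumed value; use the rate of the array you embedded


def slice_times(s: slice, sr: int = SAMPLING_RATE) -> Tuple[float, float, float]:
    """Return (start, midpoint, stop) of a sample-index slice, in seconds."""
    start = s.start / sr
    stop = s.stop / sr
    return start, (start + stop) / 2, stop


def keep_intervals(times: Sequence[float],
                   keep_flags: Sequence[bool]) -> List[Tuple[float, float]]:
    """Merge runs of kept timestamps into (start, end) intervals."""
    intervals = []
    run_start = None
    for t, keep in zip(times, keep_flags):
        if keep and run_start is None:
            run_start = t                      # a kept run begins here
        elif not keep and run_start is not None:
            intervals.append((run_start, t))   # the run ends at the first dropped point
            run_start = None
    if run_start is not None:
        intervals.append((run_start, times[-1]))  # run lasts until the final timestamp
    return intervals


def aselect_expr(intervals: Sequence[Tuple[float, float]]) -> str:
    """Join intervals into an ffmpeg aselect expression of between() terms."""
    return "+".join(f"between(t,{a},{b})" for a, b in intervals)


if __name__ == "__main__":
    # Hypothetical partials: 1.6 s windows that advance by 1000 samples.
    splits = [slice(0, 25600), slice(1000, 26600), slice(2000, 27600)]
    for s in splits:
        print(slice_times(s))
    print(aselect_expr(keep_intervals([0.0, 1.0, 2.0, 3.0, 4.0],
                                      [True, True, False, True, True])))
```

Comparing `len(wav) / sampling_rate` with the duration ffmpeg reports for the source file would show whether the embedded audio and the original file share the same timeline before any cut points are applied.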





This page summarizes the projects mentioned and recommended in the original post on /r/learnpython. Post date: 10 Sep 2021
