P.808
This is an open-source implementation of the ITU P.808 standard for "Subjective evaluation of speech quality with a crowdsourcing approach" (see https://www.itu.int/rec/T-REC-P.808/en). It uses Amazon Mechanical Turk as the crowdsourcing platform. It includes implementations for Absolute Category Rating (ACR), Degradation Category Rating (DCR), and Comparison Category Rating (CCR).
It looks like the Rust library uses `tract-onnx` for inference: https://github.com/Rikorose/DeepFilterNet/blob/2a84d2a1750a5... I wonder whether Python for research, training in big data centers, and Rust at the edge for efficient inference will become the standard split. C++ currently has the larger inference community (e.g. ggml), but Rust crates as building blocks for AI applications are a joy to use.
Frankly, what I hear is very similar to the results of classic spectral denoising, even down to the characteristic artifacts (on Linux there's Noise Repellent [1] for advanced spectral denoising; there are also plenty of commercial spectral processors).
The demonstration could use more random background noises to set it apart from spectral processors, and more varied vocabulary to set it apart from the RNNoise-based suppressor [2], which tends to suppress breath and parts of sibilants, making the sound unnatural. Latency also matters - is it as low as with the RNNoise-based plugin? And what about CPU load?
[1] https://github.com/lucianodato/noise-repellent
[2] https://github.com/werman/noise-suppression-for-voice
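For reference, the "classic spectral denoising" mentioned above is typically spectral subtraction: estimate the noise magnitude spectrum from a noise-only segment, subtract it from each frame's magnitude, and resynthesize with the noisy phase. This is a minimal NumPy sketch, not the method used by any of the tools linked here; the frame sizes, spectral floor, and noise-estimation scheme are illustrative assumptions:

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    # Hann-windowed short-time Fourier transform, frames along axis 0
    win = np.hanning(n_fft)
    frames = [win * x[i:i + n_fft] for i in range(0, len(x) - n_fft, hop)]
    return np.array([np.fft.rfft(f) for f in frames])

def istft(X, n_fft=512, hop=128):
    # Weighted overlap-add resynthesis with window-power normalization
    win = np.hanning(n_fft)
    out = np.zeros(hop * (len(X) - 1) + n_fft)
    norm = np.zeros_like(out)
    for i, spec in enumerate(X):
        out[i * hop:i * hop + n_fft] += win * np.fft.irfft(spec, n_fft)
        norm[i * hop:i * hop + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-8)

def spectral_subtract(noisy, noise_frames=12, floor=0.02):
    """Denoise by subtracting a noise magnitude estimate taken from the
    first `noise_frames` frames (assumed noise-only)."""
    X = stft(noisy)
    noise_mag = np.abs(X[:noise_frames]).mean(axis=0)  # noise floor estimate
    mag, phase = np.abs(X), np.angle(X)
    # Spectral floor keeps a small residual to limit "musical noise" artifacts
    clean_mag = np.maximum(mag - noise_mag, floor * mag)
    return istft(clean_mag * np.exp(1j * phase))
```

The "characteristic artifacts" come from exactly this kind of processing: bins that dip below the noise estimate get clamped, producing the warbly "musical noise" that a spectral floor only partially hides.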
The gold standard is the ITU-T P.808 subjective test: https://github.com/microsoft/P.808