-
ECAPA-TDNN
Unofficial reimplementation of ECAPA-TDNN for speaker recognition (EER=0.86 for Vox1_O when train only in Vox2)
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
-
NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
For face detection, we used the RetinaFace model with a MobileNet backbone from the InsightFace project. This model outputs four coordinates for each detected face on an image as well as 5 facial landmarks. The fact that images captured at different angles or with different optics can change the proportions of the face due to distortion. This may cause the model to struggle identifying the person.
The previous model with Jasper architecture was not able to verify the recordings of the same person taken from different microphones. So we solved this problem by using ECAPA-TDNN architecture, which was trained on VoxCeleb2 dataset from the SpeechBrain framework which did a better job at verifying employees.
The final security grain was added with speech-to-text anti-spoofing built on QuartzNet from the Nemo framework. This model provides a decent quality user experience and is suitable for real-time scenarios. To measure how close what the person says to what the system expects, requires calculation of the Levenshtein distance between them.