Voice-to-text Telegram bot

This page summarizes the projects mentioned and recommended in the original post on dev.to.

  • v2t

  • So that'd be it. Overall, we ended up with a nice, tidy piece of code that is actually useful for many of you out there. That's the exact opposite of most of the enterprise code many software engineers write at their 9-to-5 jobs, but let's not judge them. If you find any of this confusing, or just want to jump straight to the code, the final version is on my GitHub: https://github.com/sonac/v2t

  • Poetry

    Python packaging and dependency management made easy

  • We're going to build a Telegram bot that patiently listens to a voice message someone sent you and returns it as beautiful, pure text, the kind we all love so much. Since this is a speech recognition task, that means neural networks (there are, of course, APIs available, but we don't want to send our private messages to someone like Google). And if it's neural networks, we think Python. Admittedly, we won't train our own network, so we could do this in most other programming languages too, but Python just feels like the more natural tool for the job. We'll start by initializing a new project (I'll be using Poetry as the virtual environment and dependency management tool).

  • FFmpeg

    Mirror of https://git.ffmpeg.org/ffmpeg.git

  • Again, we'll define it inside a class to keep things organized. The class takes one positional parameter at instantiation: the name of the model we're going to use for speech recognition (all models are available here: https://alphacephei.com/vosk/models). Instantiating the class also loads the model, which takes some time, so be patient during app startup; it won't be as immediate as before. Our class will contain two methods, one private (well, kind of private, since it's Python) and one public. The first is responsible for concatenating pieces of transcribed text, and the second accepts the path to the file containing the audio to transcribe. Here we'll use the subprocess library to launch ffmpeg and convert the audio to WAV format. If you don't have ffmpeg on your system, you can install it with your package manager. To read more about ffmpeg, go to https://ffmpeg.org.

  • telegram-bot-api

    Telegram Bot API server (by tdlib)

  • We'll create a class Bot, which will be the core of our program (unsurprising, considering we're writing a Telegram bot). When it's instantiated, it reads your Telegram bot's token (the original post links to instructions on registering a bot) and builds a basic bot app. Along with this, we're going to add two async methods to our class that serve as message handlers (functions invoked when a specific kind of message is received): one for video notes and one for audio. Last but not least, there's a start method, which is responsible for registering the message handlers and actually starting the bot. Finally, since this is our main.py file, we'll add a couple of lines at the end to create the bot and start the app when the script is run.
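The transcription class described in the FFmpeg entry above could look roughly like the following. This is a minimal sketch, not the code from the v2t repository: it assumes the `vosk` Python package (whose `Model` accepts a `model_name` argument in recent releases), ffmpeg available on the `PATH`, and 16 kHz mono audio. The names `Transcriber`, `build_ffmpeg_command`, `_join_results` and `transcribe` are hypothetical.

```python
import json
import subprocess
import tempfile
import wave
from pathlib import Path

SAMPLE_RATE = 16000  # assumption: 16 kHz mono audio, as commonly used with Vosk models


def build_ffmpeg_command(src: str, dst: str, sample_rate: int = SAMPLE_RATE) -> list[str]:
    """Build an ffmpeg command converting any input (voice note, video note) to mono 16-bit WAV."""
    return [
        "ffmpeg", "-y", "-loglevel", "quiet",
        "-i", src,
        "-vn",                    # drop the video stream of video notes
        "-ar", str(sample_rate),  # resample to the recognizer's rate
        "-ac", "1",               # mono
        "-acodec", "pcm_s16le",   # 16-bit little-endian PCM
        dst,
    ]


class Transcriber:
    def __init__(self, model_name: str) -> None:
        # Imported lazily so the helpers in this module work without vosk installed.
        from vosk import Model

        # Loading (and possibly downloading) the model is slow; it happens once, at startup.
        self.model = Model(model_name=model_name)

    @staticmethod
    def _join_results(results: list[str]) -> str:
        """Concatenate the "text" field of each JSON result returned by the recognizer."""
        parts = (json.loads(r).get("text", "") for r in results)
        return " ".join(p for p in parts if p)

    def transcribe(self, path: str) -> str:
        from vosk import KaldiRecognizer

        with tempfile.TemporaryDirectory() as tmp:
            wav_path = str(Path(tmp) / "audio.wav")
            # Convert the input to WAV with ffmpeg; check=True raises if ffmpeg fails.
            subprocess.run(build_ffmpeg_command(path, wav_path), check=True)

            recognizer = KaldiRecognizer(self.model, SAMPLE_RATE)
            results = []
            with wave.open(wav_path, "rb") as wf:
                # Feed the audio in chunks; a completed utterance yields a JSON result.
                while data := wf.readframes(4000):
                    if recognizer.AcceptWaveform(data):
                        results.append(recognizer.Result())
            results.append(recognizer.FinalResult())  # flush whatever is left
        return self._join_results(results)
```

Keeping the command builder and the result joiner as plain functions means they can be checked without a model or ffmpeg installed; only `transcribe` touches the external tools.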
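The Bot class described in the last entry above might be sketched as follows. The post excerpt doesn't name the bot library, so this assumes python-telegram-bot v20 or later (`ApplicationBuilder`, `MessageHandler`, `filters`, `run_polling`). The `TELEGRAM_BOT_TOKEN` environment variable, `read_token`, and the injected `transcribe` callable are hypothetical. Voice messages are matched with `filters.VOICE`; `filters.AUDIO` would match uploaded audio files instead.

```python
import asyncio
import os
import tempfile
from collections.abc import Callable
from pathlib import Path


def read_token(env_var: str = "TELEGRAM_BOT_TOKEN") -> str:
    """Read the bot token from an environment variable (hypothetical name)."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set {env_var} to your Telegram bot token")
    return token


class Bot:
    def __init__(self, token: str, transcribe: Callable[[str], str]) -> None:
        # Imported lazily so read_token() works without python-telegram-bot installed.
        from telegram.ext import ApplicationBuilder

        self.app = ApplicationBuilder().token(token).build()
        self.transcribe = transcribe  # hypothetical: takes a file path, returns text

    async def _reply_with_text(self, update, media) -> None:
        # Download the voice message or video note, transcribe it, and reply with the text.
        tg_file = await media.get_file()
        with tempfile.TemporaryDirectory() as tmp:
            path = await tg_file.download_to_drive(Path(tmp) / "input")
            # Run the blocking recognizer in a worker thread so the event loop stays free.
            text = await asyncio.to_thread(self.transcribe, str(path))
        await update.message.reply_text(text or "(no speech recognized)")

    async def on_video_note(self, update, context) -> None:
        await self._reply_with_text(update, update.message.video_note)

    async def on_voice(self, update, context) -> None:
        await self._reply_with_text(update, update.message.voice)

    def start(self) -> None:
        from telegram.ext import MessageHandler, filters

        # Register one handler per message type, then poll Telegram until interrupted.
        self.app.add_handler(MessageHandler(filters.VIDEO_NOTE, self.on_video_note))
        self.app.add_handler(MessageHandler(filters.VOICE, self.on_voice))
        self.app.run_polling()


def main() -> None:
    # Placeholder transcriber; swap in a real speech-recognition function.
    Bot(read_token(), transcribe=lambda path: f"(transcript of {path})").start()
```

In main.py you would then call `main()` under the usual `if __name__ == "__main__":` guard. Injecting `transcribe` keeps the bot independent of the speech-recognition code, which also makes the handlers easier to test in isolation.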
