The difficulties of transcribing tone. Or, what's the goal of transcribing IPA with Machine Learning?

This page summarizes the projects mentioned and recommended in the original post on /r/linguistics.

  • this-word-does-not-exist

    This Word Does Not Exist

  • I'm a software engineer by profession and occasionally have reason to play with so-called Machine Learning (ML). I think the best showcase of what's possible nowadays is the "This X Does Not Exist" fashion for generating novel members of arbitrary categories, say human faces or even English words. Imagine a word that seemingly possesses all the natural characteristics of a word but does not actually exist, for example: trichurid. ML can produce an endless supply of these.

  • fairseq

    Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

  • It would seem ML is well placed to handle the world of phonemes (fundamental categories) and phones (permutations on those categories). Indeed, Facebook has a mature project and a set of pre-trained ML models for something similar, if not identical: wav2vec (v2.0). If it's not identical, then I think it'd be trivial to achieve. Wav2vec is trained to map the spoken word of a language to that language's particular writing system (see here for a specific example). However, we already have plenty of software that can convert writing systems to IPA. Whilst all that does connect a lot of dots, it's not exactly what I think the goal should be.

  • stylegan2-pytorch

    Simplest working implementation of Stylegan2, state of the art generative adversarial network, in Pytorch. Enabling everyone to experience disentanglement

  • epitran

    A tool for transcribing orthographic text as IPA (International Phonetic Alphabet)

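
To make the "writing system to IPA" step in the post concrete, here is a minimal sketch of rule-based grapheme-to-IPA conversion using greedy longest-match lookup. The `G2P` table and the `to_ipa` function are hypothetical and loosely Spanish-like, invented purely for illustration; they are not epitran's data or API, and a real tool would need far richer, context-sensitive rules.

```python
# Toy grapheme-to-IPA table (hypothetical, loosely Spanish-like).
# This is NOT epitran's mapping data; it only illustrates the idea.
G2P = {
    "ch": "tʃ",
    "ll": "ʝ",
    "rr": "r",
    "qu": "k",
    "ñ": "ɲ",
    "j": "x",
    "v": "b",
    "z": "s",
    "h": "",   # silent letter
    "c": "k",
    "r": "ɾ",
    "y": "ʝ",
}


def to_ipa(word: str) -> str:
    """Greedy longest-match transliteration.

    At each position, try the longest grapheme in the table first;
    letters with no rule are passed through unchanged.
    """
    word = word.lower()
    max_len = max(len(g) for g in G2P)
    out = []
    i = 0
    while i < len(word):
        for size in range(max_len, 0, -1):
            chunk = word[i:i + size]
            if len(chunk) == size and chunk in G2P:
                out.append(G2P[chunk])
                i += size
                break
        else:
            # No rule matched at this position: keep the letter as-is.
            out.append(word[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    for w in ["mucho", "perro", "hola", "queso"]:
        print(w, "->", to_ipa(w))
```

A lookup like this works from spelling alone, which is why it says nothing about what a speaker actually produced; that gap is the one the post points at when it says chaining speech-to-text with text-to-IPA is not quite the goal.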
