I used to keep track of the state of machine translation some years back.
I think the way you measure the success of an automated translation is edit distance, i.e. how many manual edits a translated text needs before it reaches some acceptable state. That's somewhat subjective, but it is possible to construct a benchmark that allows for multiple correct results.
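To make the edit-distance idea concrete, here is a minimal sketch, assuming word-level Levenshtein distance and taking the minimum over several acceptable reference translations. The function names and the sample sentences are made up for illustration; real benchmarks use more refined metrics and tokenization.

```python
def edit_distance(hyp, ref):
    """Levenshtein distance between two token sequences."""
    # prev[j] = distance between the hypothesis tokens seen so far and ref[:j]
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(
                prev[j] + 1,         # delete a hypothesis token
                curr[j - 1] + 1,     # insert a reference token
                prev[j - 1] + cost,  # substitute, or keep a match
            ))
        prev = curr
    return prev[-1]


def best_edit_distance(hypothesis, references):
    """Smallest word-level distance to any of the accepted references."""
    hyp = hypothesis.split()
    return min(edit_distance(hyp, ref.split()) for ref in references)


if __name__ == "__main__":
    # Hypothetical machine output and two acceptable human translations.
    machine_output = "the cat sits on the mat"
    accepted = [
        "the cat sat on the mat",
        "a cat is sitting on the mat",
    ]
    print(best_edit_distance(machine_output, accepted))
```

Taking the minimum over references is one simple way to allow for multiple correct results: the output is only penalized for its distance to the closest acceptable translation.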
The best resources I knew back then were:
VISL's CG-3 self-reported a competitively low edit distance compared to Google Translate: https://visl.sdu.dk/constraint_grammar.html -- the abstraction unfortunately requires rather deep knowledge of each particular language's grammar. It is a convincing argument that in order to beat Google Translate, you want less fuzzy machine learning and more structural analysis. But you also need a PhD in computational linguistics and deep knowledge of each language.
Apertium has an open-source pipeline: https://apertium.org/ -- this seems to be much more of an open-source approach, with quality similar to Google Translate (although I don't know if it's better or worse; probably slightly worse in most cases, and with slightly lower coverage).