bergamot-translator
Cross-platform C++ library focusing on optimized machine translation on consumer-grade devices.
After that, you would use Tesseract to OCR the pages. Tesseract is an open-source, multiplatform OCR engine. If the typeface is nonstandard, you would have to train the recognition engine on your own data.
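As a rough sketch of the OCR step, the script below runs the `tesseract` command-line tool over a folder of page images. It assumes the pages have already been saved as PNG files and that `tesseract` is on your PATH. The function names, folder names, and the `*.png` pattern are made up for illustration, and the `-l` language code has to match a language pack you actually have installed.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Return the argument list for one Tesseract CLI call.

    Tesseract writes its result to `<output_base>.txt`.
    """
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_pages(image_dir, out_dir, lang="eng", pattern="*.png"):
    """OCR every matching page image in `image_dir`, in sorted filename order.

    Returns the list of text files Tesseract is expected to have written.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    text_files = []
    for image in sorted(Path(image_dir).glob(pattern)):
        output_base = out_dir / image.stem
        subprocess.run(build_tesseract_command(image, output_base, lang), check=True)
        text_files.append(Path(str(output_base) + ".txt"))
    return text_files
```

Something like `ocr_pages("scans", "ocr_text", lang="deu")` would then leave one text file per page in `ocr_text`. If you trained the engine on a nonstandard typeface, you would pass the name of your trained model as `lang` instead.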
Then you can run the OCR text through Google, DeepL, or one of the other commercial services for a first-pass translation. They all sell API access to their engines for bulk translation. Or you can use an open-source engine like the new Bergamot engine.
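For bulk translation you will probably want to send the text in pieces rather than as one huge request. Here is a minimal sketch that groups paragraphs into chunks and passes each one to a `translate` function. That function is a placeholder you would write yourself around whichever service or local engine you pick; it is not any real API. The 4000-character default is likewise just an assumption, so check the actual limit of the service you use.

```python
def split_into_chunks(text, max_chars=4000):
    """Group paragraphs into chunks of at most `max_chars` characters.

    Paragraphs are separated by blank lines. A single paragraph longer
    than `max_chars` is kept whole as its own chunk.
    """
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        candidate = paragraph if not current else current + "\n\n" + paragraph
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


def translate_document(text, translate, max_chars=4000):
    """Translate `text` chunk by chunk.

    `translate` is a hypothetical callable (str -> str) that wraps the
    commercial API or open-source engine of your choice.
    """
    return "\n\n".join(translate(chunk) for chunk in split_into_chunks(text, max_chars))
```

Splitting on paragraph boundaries keeps sentences intact, which matters because the engines translate better with full sentences as context.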