-
doctr
docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
-
ocrmac
A python wrapper to extract text from images on a mac system. Uses the vision framework from Apple.
-
Scout Monitoring
Free Django app performance insights with Scout Monitoring. Get Scout setup in minutes, and let us sweat the small stuff. A couple lines in settings.py is all you need to start monitoring your apps. Sign up for our free tier today.
-
aichat
All-in-one AI CLI tool that integrates 20+ AI platforms, including OpenAI, Azure-OpenAI, Gemini, Claude, Mistral, Cohere, VertexAI, Bedrock, Ollama, Ernie, Qianwen, Deepseek...
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
Tesseract is widely known to be "meh" at this point.
If you look at RAG frameworks as one example they'll typically use/support a variety of implementations. Tesseract is almost always supported but it's rarely ideal with projects like Unstructured[0] and DocTR[1] being preferred. By leveraging more-or-less SOTA vision models[2][3] they embarrass Tesseract.
I haven't compared them to the Apple Vision framework but they're absolutely better than Tesseract and potentially even Apple Vision.
[0] - https://github.com/Unstructured-IO/unstructured-inference
[1] - https://github.com/mindee/doctr
[2] - https://github.com/mindee/doctr#models-architectures
[3] - https://github.com/Unstructured-IO/unstructured-inference#mo...
Nice post, OP! I was super impressed with the Apple's vision framework. I used it on a personal project involving the OCRing of tens of thousands of spreadsheet screenshots and ingesting them into a postgres database.
I used a combination of RHetTbull's vision.py (for the actual implementation) [1] + ocrmac (for experimentation) [2] and was pleasantly surprised by the performance on my i7 6700k hackintosh.
I wouldn't call myself a programmer but I can generally troubleshoot anything if given enough time, but it did cost time.
[1]: https://gist.github.com/RhetTbull/1c34fc07c95733642cffcd1ac5...
[2]: https://github.com/straussmaximilian/ocrmac
Tesseract is widely known to be "meh" at this point.
If you look at RAG frameworks as one example they'll typically use/support a variety of implementations. Tesseract is almost always supported but it's rarely ideal with projects like Unstructured[0] and DocTR[1] being preferred. By leveraging more-or-less SOTA vision models[2][3] they embarrass Tesseract.
I haven't compared them to the Apple Vision framework but they're absolutely better than Tesseract and potentially even Apple Vision.
[0] - https://github.com/Unstructured-IO/unstructured-inference
[1] - https://github.com/mindee/doctr
[2] - https://github.com/mindee/doctr#models-architectures
[3] - https://github.com/Unstructured-IO/unstructured-inference#mo...
use LLMs (gpt-4-vision or LLaVA) with aichat
`aichat -f tmp/test.png -- output only text in the image`
https://github.com/sigoden/aichat
I had good repeated success extracting tables from PDFs using Camelot (Python, https://github.com/camelot-dev/camelot)