-
langchain
Discontinued ⚡ Building applications with LLMs through composability ⚡ [Moved to: https://github.com/langchain-ai/langchain] (by hwchase17)
-
donut
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
This seems amazing. Has anyone here actually tried to use this stuff? I am earnestly trying to create a website that can code simple applications for users, or at least for highly intelligent technical users or programmers. In particular, it would be great to have an alternative to relying 100% on OpenAI for code. Or, for example, a model that can "see" web page layout and also output markup. I may have to experiment with some of these, maybe this one: https://github.com/microsoft/unilm/tree/master/markuplm
LangChain has been a good project in this area: https://github.com/hwchase17/langchain
Adept.ai is rumoured to have a model that understands screens and web pages, but it is still in a pre-release stage.
You can try the Donut model (https://arxiv.org/abs/2111.15664), which takes an image and directly decodes answers to questions about it, without OCR. It can be fine-tuned.
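For anyone who wants to try it, here is a rough sketch of document question answering with Donut. It assumes the Hugging Face `transformers` port and the `naver-clova-ix/donut-base-finetuned-docvqa` checkpoint, neither of which is mentioned above, so treat both as assumptions to check against the model card. The helper names (`build_prompt`, `extract_answer`, `ask_document`) are made up for illustration.

```python
import re

# Assumed checkpoint name on the Hugging Face Hub; verify before relying on it.
CHECKPOINT = "naver-clova-ix/donut-base-finetuned-docvqa"


def build_prompt(question: str) -> str:
    """Build the decoder prompt used by the DocVQA fine-tuned checkpoint."""
    return f"<s_docvqa><s_question>{question}</s_question><s_answer>"


def extract_answer(sequence: str):
    """Return the text between <s_answer> tags, or None if no answer is present."""
    match = re.search(r"<s_answer>(.*?)</s_answer>", sequence, flags=re.DOTALL)
    return match.group(1).strip() if match else None


def ask_document(image_path: str, question: str):
    """Run the model on one image. Needs torch, transformers, and Pillow installed."""
    # Heavy imports are kept local so the helpers above work without them.
    import torch
    from PIL import Image
    from transformers import DonutProcessor, VisionEncoderDecoderModel

    processor = DonutProcessor.from_pretrained(CHECKPOINT)
    model = VisionEncoderDecoderModel.from_pretrained(CHECKPOINT)
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)

    # The image goes straight to the vision encoder; there is no OCR step.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(image, return_tensors="pt").pixel_values

    # The question is passed as a prompt that the decoder continues.
    decoder_input_ids = processor.tokenizer(
        build_prompt(question), add_special_tokens=False, return_tensors="pt"
    ).input_ids

    outputs = model.generate(
        pixel_values.to(device),
        decoder_input_ids=decoder_input_ids.to(device),
        max_length=model.decoder.config.max_position_embeddings,
        pad_token_id=processor.tokenizer.pad_token_id,
        eos_token_id=processor.tokenizer.eos_token_id,
        use_cache=True,
    )

    sequence = processor.batch_decode(outputs)[0]
    # Drop end-of-sequence and padding tokens before parsing the tagged output.
    sequence = sequence.replace(processor.tokenizer.eos_token, "")
    sequence = sequence.replace(processor.tokenizer.pad_token, "")
    return extract_answer(sequence)
```

Calling `ask_document("invoice.png", "What is the total amount?")` (the file name is a placeholder) downloads the checkpoint on first use, so expect a large download and slow CPU inference.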