[D] Is there a good ai model for image-to-text where the images are diagrams and screenshots of interfaces?

pix2struct

5 563 4.4 Python

Here are a few resources you could start with. [Pix2Struct by Google Research](https://github.com/google-research/pix2struct) might be a valuable tool, although it will most likely need some fine-tuning to fit your use case. You can also find fine-tuned models on HuggingFace by searching for 'pix2struct'. Another option worth considering is [Donut](https://github.com/clovaai/donut). Like Pix2Struct, it will likely need fine-tuning to meet your requirements. Tesseract OCR is another alternative, particularly for handling text. It's primarily designed for pages of text (think books), but with some tweaking and specific flags it can process tables as well as chunks of text in regions of a screenshot. That's a bit too much tweaking for my taste. I'm also in search of OCR tools for UI and chart screenshots, so please share if you find something else.
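For anyone who wants to try one of the fine-tuned Pix2Struct models before committing to fine-tuning, here is a minimal sketch using the Hugging Face `transformers` classes `Pix2StructProcessor` and `Pix2StructForConditionalGeneration`. It assumes `transformers`, `torch`, and `Pillow` are installed. The checkpoint names, the `CHECKPOINTS` table, and the helpers `checkpoint_for` and `describe_image` are my own choices for illustration, not something from the comment above, so check the model cards on the Hub before relying on them.

```python
"""Sketch: describe a UI screenshot or chart with a fine-tuned Pix2Struct model."""
from typing import Optional

# Assumed Hub checkpoint names (illustrative picks; verify on the model cards).
CHECKPOINTS = {
    "screenshot": "google/pix2struct-screen2words-base",  # short UI screen summaries
    "chart": "google/pix2struct-chartqa-base",            # question answering on charts
}


def checkpoint_for(kind: str) -> str:
    """Return the assumed checkpoint name for an image kind."""
    if kind not in CHECKPOINTS:
        raise ValueError(f"unknown image kind: {kind!r}")
    return CHECKPOINTS[kind]


def describe_image(path: str, kind: str = "screenshot",
                   question: Optional[str] = None) -> str:
    """Run one image through the chosen checkpoint and return the decoded text."""
    # Heavy imports are deferred so the helper above works without them installed.
    from PIL import Image
    from transformers import (
        Pix2StructForConditionalGeneration,
        Pix2StructProcessor,
    )

    name = checkpoint_for(kind)
    processor = Pix2StructProcessor.from_pretrained(name)
    model = Pix2StructForConditionalGeneration.from_pretrained(name)
    image = Image.open(path).convert("RGB")

    if kind == "chart":
        # Question-answering checkpoints need the question passed as `text`.
        if not question:
            raise ValueError("chart checkpoints need a question")
        inputs = processor(images=image, text=question, return_tensors="pt")
    else:
        inputs = processor(images=image, return_tensors="pt")

    output_ids = model.generate(**inputs, max_new_tokens=64)
    return processor.decode(output_ids[0], skip_special_tokens=True)
```

Calling `describe_image("settings_page.png")` (a hypothetical file name) downloads the checkpoint on first use and returns a short description. Expect the results on your own screenshots to be rough without fine-tuning, as noted above.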

donut

19 5,447 2.1 Python

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts

Ask HN: Why are all OCR outputs so raw?
2 projects | news.ycombinator.com | 15 Nov 2023

New to ML, looking for some GPU and learning material info
1 project | /r/learnmachinelearning | 2 Aug 2023

How to Automate Document Extraction from Insurance Documents
1 project | /r/learnmachinelearning | 13 Jun 2023

Any way to convert my handwritten diary to searchable PDFs?
2 projects | /r/linuxquestions | 27 May 2023

Donut: OCR-Free Document Understanding Transformer
1 project | /r/patient_hackernews | 29 May 2023

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning
Post date: 7 Jul 2023
