donut
openpilot
| | donut | openpilot |
|---|---|---|
| Mentions | 19 | 839 |
| Stars | 5,312 | 47,602 |
| Growth | 2.0% | 0.7% |
| Activity | 3.6 | 10.0 |
| Latest commit | 6 months ago | 1 day ago |
| Language | Python | Python |
| License | MIT License | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
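The growth column can be read as an ordinary month-over-month percentage change. Below is a minimal Python sketch. It assumes growth is computed on total star counts sampled one month apart; the page does not give its exact formula, and `mom_growth` is a hypothetical helper.

```python
def mom_growth(stars_last_month: int, stars_this_month: int) -> float:
    """Month-over-month growth in stars, as a percentage.

    Assumption: growth = (this month - last month) / last month * 100.
    """
    if stars_last_month <= 0:
        raise ValueError("previous star count must be positive")
    # Multiply first so whole-number inputs divide cleanly.
    return 100.0 * (stars_this_month - stars_last_month) / stars_last_month
```

For example, a hypothetical repository that goes from 5,000 to 5,100 stars over a month shows 2.0% growth.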
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
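The page does not publish how Activity is calculated. The sketch below shows one hypothetical way to get the described behavior. Each commit's weight halves every `half_life_days` (an assumed 30 days). The resulting score is then mapped to a 0 to 10 scale by percentile rank, so a project scoring above 90% of tracked projects lands at 9.0. Both function names and the decay scheme are illustrative assumptions.

```python
from bisect import bisect_left
from typing import List


def weighted_commits(commit_ages_days: List[float], half_life_days: float = 30.0) -> float:
    """Recency-weighted commit count: a commit's weight halves every half_life_days."""
    return sum(0.5 ** (age / half_life_days) for age in commit_ages_days)


def activity_score(project_score: float, all_scores: List[float]) -> float:
    """Map a raw score to 0-10 by percentile rank among all tracked projects.

    A project whose score exceeds 90% of tracked scores gets 9.0.
    """
    if not all_scores:
        raise ValueError("need at least one tracked project")
    ranked = sorted(all_scores)
    below = bisect_left(ranked, project_score)  # count of strictly smaller scores
    return round(10.0 * below / len(ranked), 1)
```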
donut
-
Ask HN: Why are all OCR outputs so raw?
Maybe this is better? https://github.com/clovaai/donut
I'm not sure
-
Show HN: BetterOCR combines and corrects multiple OCR engines with an LLM
Yup! But I'm still exploring options (any recommendations are welcome!). Here are some candidates I'm considering:
- https://github.com/mindee/doctr
- https://github.com/open-mmlab/mmocr
- https://github.com/PaddlePaddle/PaddleOCR (honestly I don't know Mandarin so I'm a bit stuck)
- https://github.com/clovaai/donut - While it's primarily an "OCR-free document understanding transformer," I think it's worth experimenting with. I believe I can sort this out by letting the LLM reason through it multiple times (although this will impact performance)
- Yesterday I got a suggestion to consider https://github.com/kakaobrain/pororo - I don't think development is still active, but the results are pretty great on Korean text
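BetterOCR's actual prompt is not shown here. The helper below is a hypothetical sketch of the general idea of combining several engines with an LLM: it assembles a single prompt that contains every engine's reading of the same image. Sending the prompt to a model is left out, and the engine names in the example are placeholders.

```python
from typing import Dict, Optional


def build_correction_prompt(ocr_outputs: Dict[str, str], language_hint: Optional[str] = None) -> str:
    """Assemble one prompt asking an LLM to merge and correct several OCR readings.

    ocr_outputs maps an engine name (hypothetical labels) to that engine's raw text.
    """
    lines = [
        "Several OCR engines read the same image. Their outputs may disagree.",
        "Combine them into the single most likely text. Fix obvious OCR errors,",
        "but do not add content that none of the engines produced.",
    ]
    if language_hint:
        lines.append(f"The text is expected to be in {language_hint}.")
    # One labeled section per engine, in the order the caller supplied them.
    for engine, text in ocr_outputs.items():
        lines.append(f"[{engine}]")
        lines.append(text.strip())
    # The model is expected to continue after this final label.
    lines.append("[corrected]")
    return "\n".join(lines)
```

For example, `build_correction_prompt({"engine_a": "He11o world", "engine_b": "Hello wor1d"}, "English")` gives the model two partly wrong readings that together point at "Hello world".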
-
New to ML, looking for some GPU and learning material info
I am also interested in experimenting with something like Donut (https://github.com/clovaai/donut), but I have never seen anything on its VRAM requirements. Does anyone know what the VRAM requirements for a model like this tend to be, or whether there are newer, better models for document parsing?
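The thread gives no VRAM figures. A common back-of-the-envelope estimate multiplies the parameter count by the number of bytes stored per parameter. The sketch below assumes 2 bytes per parameter for fp16/bf16 inference. For full-precision training with the Adam optimizer it assumes about 16 bytes per parameter (weights, gradients, and two moment buffers). Activations, batch size, and framework overhead are ignored, so real usage will be higher. `estimate_vram_gb` is a hypothetical helper and is not part of Donut.

```python
def estimate_vram_gb(num_params: float, bytes_per_param: int = 2, training: bool = False) -> float:
    """Rough lower bound on GPU memory, in GiB, for weights (plus optimizer state if training).

    Inference: num_params * bytes_per_param (2 for fp16/bf16, 4 for fp32).
    Training (assumed fp32 + Adam): ~16 bytes/param = 4 weights + 4 gradients
    + 8 for Adam's two moment buffers. Activations are NOT included.
    """
    per_param = 16 if training else bytes_per_param
    return num_params * per_param / 1024**3
```

For a hypothetical 200-million-parameter model, this gives about 0.37 GB of fp16 weights for inference. Training needs about 3 GB of weights plus optimizer state, before activations are counted.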
-
[D] Is there a good AI model for image-to-text where the images are diagrams and screenshots of interfaces?
Here are a few useful resources you could start with: [Pix2Struct by Google Research](https://github.com/google-research/pix2struct) might be a valuable tool, although it will most likely need some fine-tuning to fit your specifics. You can also find some fine-tuned models on Hugging Face by searching 'pix2struct'. Another option worth considering is [Donut](https://github.com/clovaai/donut); like Pix2Struct, it will likely need fine-tuning to meet your requirements. Tesseract OCR is another alternative, particularly for handling text. It's primarily designed for pages of text (think books), but with some tweaking and specific flags it can process tables as well as text chunks in regions of a screenshot. That's a bit too much tweaking for my taste. I'm also searching for OCR tools for UI and chart screenshots, so please share if you find something else.
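The comment does not say which flags it means. One relevant Tesseract option is the page segmentation mode (`--psm`, Tesseract 4 and later), which changes how the engine splits an image into text regions. Below is a minimal Python sketch. It assumes the `tesseract` binary is installed and on `PATH`. The region names and the `tesseract_command`/`run_tesseract` helpers are hypothetical.

```python
import subprocess
from typing import List

# Hypothetical mapping from region type to Tesseract page segmentation mode.
PSM_FOR_REGION = {
    "page": 3,     # fully automatic page segmentation (the default)
    "block": 6,    # assume a single uniform block of text, e.g. a cropped paragraph
    "line": 7,     # treat the image as a single text line, e.g. a button label
    "sparse": 11,  # find as much text as possible in no particular order
}


def tesseract_command(image_path: str, region: str = "page", lang: str = "eng") -> List[str]:
    """Build a tesseract CLI invocation that prints recognized text to stdout."""
    psm = PSM_FOR_REGION[region]
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]


def run_tesseract(image_path: str, region: str = "page") -> str:
    """Run the tesseract binary and return its text output (raises if it fails)."""
    result = subprocess.run(tesseract_command(image_path, region),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

One way to apply this to UI screenshots is to crop the screenshot into regions first, then run each crop with a matching mode.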
- How to Automate Document Extraction from Insurance Documents
- FLaNK Stack Weekly 29 May 2023
- Donut: OCR-Free Document Understanding Transformer
openpilot
-
Tinygrad: Hacked 4090 driver to enable P2P
Yes, but he spent several years on self-driving cars (https://comma.ai), which, while interesting, is also a space with a lot of players. So it's not the same as seeing him back to doing stuff that's a little more out there, especially as it pertains to IP.
-
Imitation Learning
We have a product for sale: https://comma.ai
We raised $18.1M and have made $28M in lifetime revenue to date.
Where are you getting your narrative?
-
Driverless cars immune from traffic tickets in California under current laws
What about comma? https://comma.ai/ Seems like our old friend geohot built exactly what you want.
Positive HN discussion: https://news.ycombinator.com/item?id=36927971
-
No USS?
The issue was that the front camera on the windshield couldn’t see under the hood. You misunderstand how easy it is to solve for depth and distance with AI without requiring stereo cameras. Read https://github.com/commaai/openpilot
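openpilot itself estimates depth with learned neural networks, which the comment does not detail. A simpler, swapped-in illustration shows why a single camera can recover distance at all. The classical pinhole-camera relation Z = f * H / h gives range from an object's height in pixels, provided its real height is assumed. The function name and values below are hypothetical.

```python
def distance_from_height(focal_length_px: float, real_height_m: float, pixel_height: float) -> float:
    """Pinhole-camera range estimate: Z = f * H / h.

    focal_length_px: camera focal length expressed in pixels
    real_height_m:   assumed real-world height of the object (a size prior)
    pixel_height:    height of the object's bounding box in the image, in pixels
    """
    if pixel_height <= 0:
        raise ValueError("pixel_height must be positive")
    return focal_length_px * real_height_m / pixel_height
```

In this simple model, halving the object's height in pixels doubles the estimated distance. A known or assumed object size therefore lets one camera stand in for a stereo pair.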
-
What car should I get for Seattle city and some ski/hike driving? Or not get a car at all?
Nice to have: I want to get a self-driving add-on, which supports some cars better than others. Not a must, but high up on my nice-to-have list.
- I need some help understanding video uploads.
-
I am nearing the end of my 2020 Kona lease and have an appointment at a dealer tomorrow. I had some questions about leasing an Ioniq 6; hopefully someone can help me out.
EDIT: I probably should have added that I currently have the base model of the Kona, the lowest trim available, and I am looking for something similar in the Ioniq 6. My understanding is that it's fully compatible with the comma.ai device, so I am not planning on getting the better onboard driving system. Unfortunately, the Kona I got was not compatible with that device.
-
Tesla: Security Vulnerabilities
I wonder how bad this is compared to the competition. https://comma.ai allows you to add self-driving features to a large number of non-Tesla cars so, if we’re including physical firmware hacks as a threat vector, I’d bet tons of alternative cars (new enough Honda Odysseys, Toyota Siennas, etc: probably anything with adaptive cruise control and lane following) have the same sort of potential vulnerability.
- 2024 Highlander has Toyota Security Key now
-
Cruise co-founder and CEO Kyle Vogt resigns
Not sure, but from the first article from 4 years ago:
>Last month, we had 1,209 cars drive a little over 1,000,000 miles
Let's say they've had zero growth since then, so 48,000,000 conservatively?
Actually, from their website [1]:
>100+ million miles driven and 10k users.
[1]: https://comma.ai
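The zero-growth extrapolation in the comment works out as follows, taking four years as 48 months:

```latex
1{,}000{,}000\ \frac{\text{miles}}{\text{month}} \times (4 \times 12)\ \text{months} = 48{,}000{,}000\ \text{miles}
```

The website figure of 100+ million miles is more than double this floor.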
What are some alternatives?
PaddleOCR - Awesome multilingual OCR toolkits based on PaddlePaddle (a practical, ultra-lightweight OCR system that supports recognition of 80+ languages, provides data annotation and synthesis tools, and supports training and deployment on server, mobile, embedded, and IoT devices)
sunnypilot - sunnypilot is a fork of comma.ai's openpilot, an open source driver assistance system. sunnypilot offers the user a unique driving experience for over 290 supported car makes and models with modified behaviors of driving assist engagements. sunnypilot complies with comma.ai's safety rules as accurately as possible.
image-to-sound-python- - A Python project for converting an image into audible sound using OCR and speech synthesis
opendbc - democratize access to car decoder rings
qlora - QLoRA: Efficient Finetuning of Quantized LLMs
carla - Open-source simulator for autonomous driving research.
CascadeTabNet - This repository contains the code and implementation details of the CascadeTabNet paper "CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents"
dragonpilot - dragonpilot - An open-source driver assistance system based on openpilot
Multi-Type-TD-TSR - Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition:
label-studio - Label Studio is a multi-type data labeling and annotation tool with standardized output format
deepdoctection - A Repo For Document AI
netron - Visualizer for neural network, deep learning and machine learning models