seeV vs surya
Metric | seeV | surya
---|---|---
Mentions | 1 | 16
Stars | 26 | 16,789
Growth | - | 6.8%
Activity | 7.5 | 9.7
Last commit | 9 months ago | 5 days ago
Language | Swift | Python
License | - | GNU General Public License v3.0 only
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
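To make the growth figure concrete, here is a minimal sketch that backs out an approximate star count from one month earlier. It assumes growth is computed as (current - previous) / previous, which the page does not spell out; the function name is hypothetical and the inputs are surya's figures from the table above.

```python
def stars_one_month_ago(current_stars: int, mom_growth_pct: float) -> float:
    """Estimate last month's star count from the current count and growth.

    Assumption: growth % = (current - previous) / previous * 100.
    """
    return current_stars / (1 + mom_growth_pct / 100)


# surya: 16,789 stars with 6.8% month-over-month growth.
previous = stars_one_month_ago(16_789, 6.8)
gained = 16_789 - previous
print(f"about {previous:.0f} stars a month ago, about {gained:.0f} gained since")
```

Under that assumption, a 6.8% monthly growth rate on a project of this size corresponds to roughly a thousand new stars in the last month.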
seeV
-
A Picture Is Worth 170 Tokens: How Does GPT-4o Encode Images?
I also wrote a Swift CLI that wraps over the Vision framework: https://github.com/nexuist/seev
Text extraction is included (including the ability to specify custom words not found in the dictionary) but there are also utilities for face detection, classification, etc.
surya
-
Ask HN: What is the best method for turning a scanned book as a PDF into text?
I have tried a bunch of things. This is what worked best for me: Surya [0]. It can run fully local on your laptop. I also tried EasyOCR [1], which is also quite good. I haven't tried this myself, but I will look at Paddle [2] if the previous two don't float your boat.
All of these are OSS, and you don't need to pay a dime to anyone.
[0]: https://github.com/VikParuchuri/surya
[1]: https://github.com/JaidedAI/EasyOCR
[2]: https://github.com/PaddlePaddle/Paddle
-
Show HN: Kreuzberg – Modern async Python library for document text extraction
pypdfium2 is a great choice and a solid piece of software!
You might want to look into https://github.com/VikParuchuri/surya as an alternative to tesseract. Yes, it's associated with a commercial company, but as long as you aren't a company with $5M in ARR or $5M in funding, it's free to use.
-
Nvidia-Ingest: Multi-modal data extraction
Surya is a great open source toolkit for table parsing, layout analysis and OCR: https://github.com/VikParuchuri/surya
-
Ask HN: Who is hiring? (January 2025)
Datalab | NYC | Full-time | Software Engineer and Head of Business Ops | $250k-$350k + 1.5-3% equity | https://www.datalab.to
A significant % of useful data is locked away in tough-to-parse formats like PDFs. We build tools to extract it, like https://github.com/VikParuchuri/surya (15k Github stars), and https://github.com/VikParuchuri/marker (19k stars). We also run an inference API and product.
We do meaningful research (we’ve trained several SoTA models), ship product, and contribute to open source. We’re hiring for 2 roles to help us scale:
Senior fullstack software engineer
- work across our open source repos, inference api, and frontend product
-
Show HN: Lessons learned from a big OCR project
I’ve used Surya (https://github.com/VikParuchuri/surya) before. It is very good (on par with Google Vision, potentially better layout analysis), but yours is a challenging use case. I wonder if it would be useful.
-
Show HN: LLM Aided OCR (Correcting Tesseract OCR Errors with LLMs)
Hi, I'm the author of surya (https://github.com/VikParuchuri/surya) - working on improving speed and accuracy now. Happy to collaborate if you have specific page types it's not working on. For modern/clean documents it benchmarks very similarly to Google Cloud, but working on supporting older documents better now.
-
Decoding OCR: A Comprehensive Guide
For a deeper dive into Surya-OCR, an advanced OCR system, enthusiasts and developers can explore its extensive components on GitHub. This open-source project is readily accessible for those eager to understand its mechanics or contribute to its evolution. Visit Surya-OCR on GitHub to explore the documentation, source code, and more.
- From GPT-4 to AGI: Counting the OOMs
- Ask HN: How to OCR a PDF and preserve whitespace?
-
A Picture Is Worth 170 Tokens: How Does GPT-4o Encode Images?
Check out https://github.com/mindee/doctr or https://github.com/VikParuchuri/surya for something practical.
A multimodal LLM would of course blow it all out of the water, so some Llama 3-like model is probably SOTA in terms of what you can run yourself, something like https://huggingface.co/blog/idefics2
What are some alternatives?
magic - Scanner for decks of cards with bar codes printed on card edges
llama_cloud_services - Knowledge Agents and Management in the Cloud
unilm - Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
unstract - No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents
LlamaChat - Chat with your favourite LLaMA models in a native macOS app
marker - Convert PDF to markdown + JSON quickly with high accuracy