madbg VS PaddleOCR

Compare madbg vs PaddleOCR and see what their differences are.

PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices) (by PaddlePaddle)
              madbg              PaddleOCR
Mentions      4                  65
Stars         246                42,471
Growth        -                  2.7%
Activity      4.4                9.1
Last commit   about 1 year ago   2 days ago
Language      Python             Python
License       MIT License        Apache License 2.0
The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
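
As a rough illustration of the Growth metric described above, month-over-month growth can be read as the percentage change between two monthly star snapshots. The sketch below assumes that reading; the function name and the star counts are hypothetical and are not taken from this page or from any tracker's actual implementation.

```python
def month_over_month_growth(stars_prev: int, stars_now: int) -> float:
    """Return the percentage change in stars between two monthly snapshots."""
    if stars_prev <= 0:
        # Growth is undefined without a positive baseline.
        raise ValueError("previous star count must be positive")
    return (stars_now - stars_prev) / stars_prev * 100.0


# Hypothetical snapshots chosen for illustration only.
print(f"{month_over_month_growth(1000, 1027):.1f}%")  # prints 2.7%
```

A project with no earlier snapshot has no defined growth under this reading, which would be one possible explanation for a "-" in the Growth row.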

madbg

Posts with mentions or reviews of madbg. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-05-13.

PaddleOCR

Posts with mentions or reviews of PaddleOCR. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-08-09.
  • Show HN: LLM Aided OCR (Correcting Tesseract OCR Errors with LLMs)
    17 projects | news.ycombinator.com | 9 Aug 2024
    Was this by any chance Paddle OCR https://github.com/PaddlePaddle/PaddleOCR
  • OCR Solutions Uncovered: How to Choose the Best for Different Use Cases
    2 projects | dev.to | 1 Aug 2024
    Budget Constraints: For users with limited budgets, open-source options like Tesseract OCR or PaddleOCR provide good solutions that can be customized to meet specific business needs. Additionally, consider Klippa or API4AI OCR for affordable yet reliable OCR services that work out-of-the-box.
  • Ask HN: What are you using to parse PDFs for RAG?
    16 projects | news.ycombinator.com | 30 Jul 2024
  • PDF Hell and Practical RAG Applications
    5 projects | dev.to | 1 Jul 2024
    Paddle OCR
  • Ask HN: I have many PDFs – what is the best local way to leverage AI for search?
    10 projects | news.ycombinator.com | 30 May 2024
    If you want to run locally you can look into this https://github.com/PaddlePaddle/PaddleOCR

    https://andrejusb.blogspot.com/2024/03/optimizing-receipt-pr...

    But I suggest that you just skip that and use gpt-4o. They aren't actually going to steal your data.

    Sort through it to find anything with a credit card number or anything ahead of time.

    Or you could look into InternVL..

    Or a combination of PaddleOCR first and then use a strong LLM via API, like gpt-4o or llama3 70b via together.ai

    If you truly must do it locally, then if you have two 3090s or 4090s it might work out. Otherwise the LLMs may not be smart enough to give good results.

    Leaving out the details of your hardware makes it impossible to give good advice about running locally, other than that it's not really necessary.

  • Leveraging GPT-4 for PDF Data Extraction: A Comprehensive Guide
    5 projects | dev.to | 27 Dec 2023
    PyTesseract Module, EasyOCR Module, PaddlePaddle OCR
  • What is the best repo for hand written text recognition?
    1 project | /r/computervision | 11 Dec 2023
    My default recommendation for OCR is https://github.com/PaddlePaddle/PaddleOCR but most of the examples there are not handwritten - so I'm not sure how well it'll handle it this time.
  • Ask HN: Best way to perform complex OCR task in 2023?
    1 project | news.ycombinator.com | 5 Dec 2023
    Other than EasyOCR and Tesseract, PaddleOCR (https://github.com/PaddlePaddle/PaddleOCR) is probably the most well known open-source OCR solution.

    What are you planning to do with the text after detecting / recognizing it? How fast does the detection / recognition need to be in order to be useful?

  • Show HN: BetterOCR combines and corrects multiple OCR engines with an LLM
    8 projects | news.ycombinator.com | 28 Oct 2023
    Yup! But I'm still exploring options. (any recommendations would be welcomed!) Here are some candidates I'm considering:

    - https://github.com/mindee/doctr

    - https://github.com/open-mmlab/mmocr

    - https://github.com/PaddlePaddle/PaddleOCR (honestly I don't know Mandarin so I'm a bit stuck)

    - https://github.com/clovaai/donut - While it's primarily an "OCR-free document understanding transformer," I think it's worth experimenting with. Think I can sort this out by letting the LLM reason through it multiple times (although this will impact performance)

    - yesterday got a suggestion to consider https://github.com/kakaobrain/pororo - I don't think development is still active but the results are pretty great on Korean text

  • How would you go about driving contextual data from images?
    3 projects | /r/LangChain | 4 Jul 2023
    For images with text, if you want to do visual QA, document classification, or table/key information extraction, check out https://huggingface.co/blog/document-ai https://github.com/philschmid/document-ai-transformers https://github.com/google-research/pix2struct https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6/ppstructure/README.md

What are some alternatives?

When comparing madbg and PaddleOCR you can also consider the following projects:

scapy - Scapy: the Python-based interactive packet manipulation program & library.

EasyOCR - Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.

telminal - A Terminal in Telegram!

tesseract-ocr - Tesseract Open Source OCR Engine (main repository)

img2cmap - Create colormaps from images

mmocr - OpenMMLab Text Detection, Recognition and Understanding Toolbox

gdb-dashboard - Modular visual interface for GDB in Python

Tesseract.js - Pure Javascript OCR for more than 100 Languages 📖🎉🖥

ScoutSuite - Multi-Cloud Security Auditing Tool

OCRmyPDF - OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

pymg - pymg is a CLI that can interpret Python files by the Python interpreter and display the error message in a more readable way if an exception occurs.

keras-ocr - A packaged and flexible version of the CRAFT text detector and Keras CRNN recognition model.
