invoice2data
pyod
Metric | invoice2data | pyod
---|---|---
Mentions | 2 | 7
Stars | 1,699 | 7,962
Growth | 1.6% | -
Activity | 6.7 | 7.5
Latest commit | 8 days ago | 4 days ago
Language | Python | Python
License | MIT License | BSD 2-clause "Simplified" License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
invoice2data
- Utilize OpenAI API to extract information from PDF files
  Using regex to match patterns in the text after converting the PDF to plain text. Examples include invoice2data and traprange-invoice. However, this method requires knowing the format of the data fields in advance.
- Base64.ai - Extract text, data, photos and more from all types of docs
  It's not really working. Tried 2 English PDF invoices. Normal format. One came back empty, the other only had the amount right.
  I'm assuming they only trained on some specific documents (passport of country X, etc.) and all others don't work.
  If someone processes the same document all the time, then my invoice2data project may work better and is open source. It's based on regex, rather than machine learning: https://github.com/invoice-x/invoice2data
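The regex approach described in the mentions above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not invoice2data's actual template format or API: the sample text, the field names, and the `extract_fields` and `normalize` helpers are all hypothetical, and the patterns assume the invoice layout is known in advance.

```python
import re
from datetime import datetime

# Hypothetical invoice text, as it might look after PDF-to-text conversion.
SAMPLE_TEXT = """\
ACME Corp
Invoice Number: INV-2023-0042
Date: 2023-05-17
Total Due: EUR 1,234.50
"""

# Hypothetical field patterns; each one captures exactly one group.
FIELD_PATTERNS = {
    "invoice_number": r"Invoice Number:\s*(\S+)",
    "date": r"Date:\s*(\d{4}-\d{2}-\d{2})",
    "amount": r"Total Due:\s*[A-Z]{3}\s*([\d,]+\.\d{2})",
}


def extract_fields(text, patterns=FIELD_PATTERNS):
    """Return field name -> captured string, or None when a pattern fails."""
    result = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, text)
        result[name] = match.group(1) if match else None
    return result


def normalize(fields):
    """Convert the raw strings into typed values where a match was found."""
    out = dict(fields)
    if out.get("date"):
        out["date"] = datetime.strptime(out["date"], "%Y-%m-%d").date()
    if out.get("amount"):
        # Strip thousands separators before converting to a number.
        out["amount"] = float(out["amount"].replace(",", ""))
    return out


fields = normalize(extract_fields(SAMPLE_TEXT))
print(fields)
```

A field whose pattern does not match comes back as `None`, which shows the limitation noted above: a document with a different layout needs its own set of patterns.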
pyod
- A Comprehensive Guide for Building Rag-Based LLM Applications
  This is a feature in many commercial products already, as well as open source libraries like PyOD. https://github.com/yzhao062/pyod
- Analyze defects and errors in the created images
  PyOD
- Multivariate Outlier Detection in Python
  Check out the algorithms and documentation in this toolkit. It'll give you a list of methods to read up on to understand their mechanisms. https://github.com/yzhao062/pyod
- Pyod - A Comprehensive and Scalable Python Library for Outlier Detection
- Predictive Maintenance and Anomaly Detection Resources
- [D] Unsupervised Outlier Detection - Advise Requested
  The source code and documentation of PyOD are the best survey of OOD. Besides, normalizing flows and VQ-VAE are also feasible.
- PyOD: ~50 anomaly detection algorithms in one framework.
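To give a sense of what an unsupervised outlier detector computes, here is a minimal distance-based sketch in plain Python. It does not use PyOD or mirror its API; it only illustrates the k-nearest-neighbour idea, one of the detector families PyOD includes. The function name `knn_outlier_scores` and the sample points are hypothetical.

```python
import math


def knn_outlier_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbour.

    Larger scores mean the point sits further from its neighbours and is
    therefore more likely to be an outlier.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Four points in a tight cluster and one far away (made-up data).
points = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (1.2, 1.1), (8.0, 8.0)]
scores = knn_outlier_scores(points, k=2)
most_anomalous = max(range(len(points)), key=scores.__getitem__)
print(most_anomalous, scores)
```

This brute-force version compares every pair of points, so it only suits small datasets; a library implementation would normally use an index structure and offer many alternative scoring methods.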
What are some alternatives?
OCRmyPDF - OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
tods - TODS: An Automated Time-series Outlier Detection System
EasyOCR - Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic and etc.
isolation-forest - A Spark/Scala implementation of the isolation forest unsupervised outlier detection algorithm.
DeepSpeech - DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
alibi-detect - Algorithms for outlier, adversarial and drift detection
silero-models - Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple
pycaret - An open-source, low-code machine learning library in Python
gensim - Topic Modelling for Humans
anomalib - An anomaly detection library comprising state-of-the-art algorithms and features such as experiment management, hyper-parameter optimization, and edge inference.
orange - Orange: Interactive data analysis
stumpy - STUMPY is a powerful and scalable Python library for modern time series analysis