tabula vs Cuba
Metric | tabula | Cuba
---|---|---
Mentions | 11 | 5
Stars | 6,521 | 1,433
Growth | 1.1% | -
Activity | 2.8 | 5.4
Last commit | 21 days ago | 3 months ago
Language | CSS | Ruby
License | MIT License | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
tabula
- Automated reading of PDFs
- How To: Extract Table From Image In Python (OpenCV & OCR)
- Ruby
  Another option would be JRuby. I routinely use an application called Tabula, which is built using JRuby and compiles to a Jar file. This, of course, requires Java on the target machine, but you can ship the Jar file and it will work. It's often easier to rely on a working Java environment than on a working Ruby environment, especially on Windows.
- I am looking to automate a process at work...
- Self Hosted Roundup #19
  Idk if it has been suggested yet, but tabulapdf is a self-hosted solution for extracting tables from PDFs.
- Alternative to tabula.technology
- Text extraction from pdf, word and PPT
  For table extraction from PDFs, have a look at Tabula and Camelot, two open-source projects. They work well with clean tables, and both the Tabula Python binding and Camelot let you export directly to a pandas DataFrame. Otherwise, the AWS Textract API is very efficient at extracting tables from PDFs, regardless of how clean or messy they are.
- hello everyone someone can help me to resolve this problem please. i want to extract this unstructured data from pdf file to excel file
  No idea if it will work for you, but there is a git project that seems to do what you want https://github.com/tabulapdf/tabula
- Why is the point of having so many implementation of Ruby?
- Pdfsandwich
  While trying to find a specific project I recalled, I encountered this list of projects which might be of interest: https://github.com/tstanislawek/awesome-document-understandi...
  The project I had in mind was similar to this one, but I can't remember its name right now: https://github.com/tabulapdf/tabula
  However, if you're looking for an ML-based, invoice-specific project, it looks like the other reply to your comment might be more useful.
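The comment above about exporting tables straight to a pandas DataFrame can be sketched roughly as follows. This is a minimal illustration, assuming the tabula-py binding is installed along with a Java runtime (tabula-py wraps tabula-java). The helper names `extract_tables` and `save_tables_as_csv` are hypothetical and not part of any library.

```python
from typing import Any, List


def extract_tables(pdf_path: str, pages: str = "all") -> List[Any]:
    """Return the tables tabula-py detects in a PDF as pandas DataFrames."""
    # Imported lazily so this module still loads on machines where
    # tabula-py (and the Java runtime it needs) is not installed.
    import tabula

    # With multiple_tables=True, read_pdf returns a list of DataFrames,
    # one per detected table.
    return tabula.read_pdf(pdf_path, pages=pages, multiple_tables=True)


def save_tables_as_csv(tables: List[Any], prefix: str) -> List[str]:
    """Write each table to '<prefix>_<n>.csv' and return the file names."""
    written = []
    for index, table in enumerate(tables, start=1):
        name = f"{prefix}_{index}.csv"
        # index=False leaves the DataFrame's row index out of the CSV.
        table.to_csv(name, index=False)
        written.append(name)
    return written
```

A call such as `save_tables_as_csv(extract_tables("report.pdf"), "report")` would then write one CSV per detected table, where `report.pdf` stands in for your own file. As the comment notes, results tend to be best on clean, well-ruled tables.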
Cuba
- Web Frameworks actively maintained in 2023?
  Cuba (cuba.is)
- What Would It Take for Roda to Win?
  To anyone here who enjoys and values the things Roda is good at: I would recommend you also take a look at Cuba, the project Roda is forked from.
- Soveran/cuba: Rum based microframework for web development
- 16 Best Ruby Frameworks For Web Development
  Cuba is a microframework for developing web applications in Ruby. It is inspired by Rum, and the official website defines Cuba as “a tiny but powerful mapper for Rack applications,” making it one of the best Ruby frameworks. The GitHub page is a practical guide if you are looking to start development in Cuba.
- Tabula: Liberate Data From PDF Tables [jRuby]
  Ties together a Cuba web app, the tabula-java library, and launch4j to provide a platform executable.
What are some alternatives?
Apache PDFBox - Mirror of Apache PDFBox
Ruby on Rails - Ruby on Rails
obsidian-notion-like-tables - Your premiere tool for creating and managing tabular data in Obsidian.md
Sinatra - Classy web-development dressed in a DSL (official / canonical repo)
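awesome-english-ebooks - Free downloads of English-language magazines such as The Economist (with audio), The New Yorker, The Guardian, Wired, and The Atlantic; supports epub, mobi, and pdf formats; updated weekly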
Roda - Routing Tree Web Toolkit
ripgrep-all - rga: ripgrep, but also search in PDFs, E-Books, Office documents, zip, tar.gz, etc.
Hanami - The web, with simplicity.
OCRmyPDF - OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
Syro - Simple router for web applications
laravel-report-generator - Rapidly Generate Simple Pdf, CSV, & Excel Report Package on Laravel
Ramaze - Ramaze is a simple, light and modular open-source web application framework written in Ruby.