|2 days ago||13 days ago|
|BSD 3-clause "New" or "Revised" License||MIT License|
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
QuestPDF 2021.10 - a new version of the open-source, MIT-licensed, C# library for generating PDF documents with fluent API, now with extended text capabilities. Please help me make it popular :)
8 projects | reddit.com/r/csharp | 6 Oct 2021
I’d recommend WeasyPrint (.NET Core wrapper) instead of wkhtmltopdf. It supports CSS Paged Media, which is pretty much required for everything but the simplest of HTML2PDF conversions.
Is there a way to publish a PDF report in a Jenkins job?
1 project | reddit.com/r/jenkinsci | 16 Sep 2021
You can convert HTML to PDF using https://github.com/Kozea/WeasyPrint
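For a concrete picture of that suggestion, here is a minimal sketch of an HTML-to-PDF step using WeasyPrint's Python API (`HTML(string=...).write_pdf(...)`). The helper names `build_report_html` and `write_pdf`, the page size, and the footer text are illustrative assumptions, not anything from the thread. The `@page` rule is a small example of the CSS Paged Media support mentioned in the earlier comment.

```python
from html import escape

# CSS Paged Media rules: page size, margins, and a page-number footer.
# The values here are assumptions chosen for illustration.
PAGE_CSS = """
@page {
    size: A4;
    margin: 2cm;
    @bottom-center { content: "Page " counter(page) " of " counter(pages); }
}
h1 { font-family: sans-serif; }
"""


def build_report_html(title, lines):
    """Return a small HTML document; title and lines are HTML-escaped."""
    items = "\n".join(f"<li>{escape(line)}</li>" for line in lines)
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><meta charset='utf-8'><style>{PAGE_CSS}</style></head>\n"
        f"<body><h1>{escape(title)}</h1><ul>\n{items}\n</ul></body></html>"
    )


def write_pdf(html_text, output_path):
    """Render an HTML string to a PDF file with WeasyPrint."""
    # Imported lazily so the HTML helper above works without WeasyPrint installed.
    from weasyprint import HTML

    HTML(string=html_text).write_pdf(output_path)
```

Calling `write_pdf(build_report_html("Build report", ["step one", "step two"]), "report.pdf")` would produce the file, assuming WeasyPrint and its system dependencies are installed. A CI job could then archive the resulting PDF as a build artifact.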
Beautiful PDFs from HTML
13 projects | news.ycombinator.com | 4 Apr 2021
Yeah, in the Python world there's WeasyPrint for PDF out in the wild as well. It's quite slick, but it's a harder sell because of Python, which corporate types seem to think is bad hacker central.
WeasyPrint - The awesome document factory
1 project | reddit.com/r/programming | 27 Mar 2021
1 project | reddit.com/r/coolgithubprojects | 27 Mar 2021
1 project | reddit.com/r/opensource | 27 Mar 2021
WeasyPrint – Convert web documents to PDF
1 project | news.ycombinator.com | 27 Mar 2021
wkhtmltopdf - Convert HTML to PDF
3 projects | reddit.com/r/commandline | 26 Mar 2021
Another free CLI tool to consider is WeasyPrint (GitHub).
Need help with indexing pdf tables in python
2 projects | reddit.com/r/programminghelp | 12 Dec 2021
exporting handwritten dataset as text, export it and use it as a csv
3 projects | reddit.com/r/RemarkableTablet | 16 Sep 2021
Yeah, I’m pretty sure the Remarkable OCR is not up to these kinds of tasks unfortunately. If you know some coding you could write something that’d likely work well in Python using for ex. this for receiving the mail attachment and this for converting the PDF to CSV. This is in case you’d write your data as a table on the Remarkable, which I guess is preferable to writing something like (0.5, 8.4, -0.3). If you’d rather do it that way, there are other more suitable OCR tools like this one. The checkbox use-case in the comment above would also be possible by modifying this approach. DM if you’d like to discuss further work.
Camelot: PDF Table Extraction for Humans
1 project | news.ycombinator.com | 4 Aug 2021
Show HN: I made a tool to convert images of tables to CSV
4 projects | news.ycombinator.com | 9 Mar 2021
I've had success using camelot-py (https://camelot-py.readthedocs.io) to extract tabular data from PDFs (for images, I use imagemagick to convert those to PDF). If your table has borders, the default method (lattice) works quite well. For non-bordered tables there is the 'stream' option, but it usually requires a bit more preprocessing to get usable results.
4 projects | news.ycombinator.com | 9 Mar 2021
Looks like it's a bit in-progress: https://github.com/camelot-dev/camelot/pull/209
"Update docs" isn't checked, and that's what I was going on.
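The lattice and stream methods described in the camelot-py comment above can be sketched roughly as follows. This is a minimal illustration, assuming `camelot.read_pdf` with its `pages` and `flavor` arguments. The helper name `extract_tables`, the fallback rule (use stream only when lattice finds nothing), and the injectable `read_pdf` parameter are assumptions made for this example, not part of Camelot.

```python
def extract_tables(pdf_path, pages="1", read_pdf=None):
    """Try the 'lattice' flavor first, then fall back to 'stream'.

    Returns a (flavor, tables) pair. `read_pdf` is a hypothetical hook
    that defaults to camelot.read_pdf and lets the fallback logic be
    exercised without a real PDF.
    """
    if read_pdf is None:
        import camelot  # lazy import: only needed for real extraction

        read_pdf = camelot.read_pdf

    # Lattice relies on ruling lines, so it suits tables with borders.
    flavor = "lattice"
    tables = read_pdf(pdf_path, pages=pages, flavor=flavor)

    if len(tables) == 0:
        # Stream guesses columns from whitespace; results often need cleanup.
        flavor = "stream"
        tables = read_pdf(pdf_path, pages=pages, flavor=flavor)

    return flavor, tables
```

With Camelot installed, each returned table exposes a pandas DataFrame as `tables[0].df` and can be written out with `tables[0].to_csv("table.csv")`. The fallback rule here is deliberately simple; a real pipeline might compare the two flavors' output instead of stopping at the first non-empty result.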
What are some alternatives?
PyPDF2 - A utility to read and write PDFs with Python
PDFMiner - Python PDF Parser (Not actively maintained). Check out pdfminer.six.
pdftabextract - A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.
pymorphy2 - Morphological analyzer / inflection engine for Russian and Ukrainian languages.
MathJax - Beautiful and accessible math in all browsers
borb - borb is a library for reading, creating and manipulating PDF files in python.
image-table-ocr - Turn images of tables into CSV data. Detect tables from images and run OCR on the cells.