pdfplumber
sioyek
Metric | pdfplumber | sioyek
---|---|---
Mentions | 29 | 88
Stars | 5,603 | 5,859
Growth | - | -
Activity | 8.2 | 5.5
Latest commit | 16 days ago | 6 days ago
Language | Python | C
License | MIT License | GNU General Public License v3.0 only
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
pdfplumber
- Running OCR against PDFs and images directly in the browser
- Google Scholar PDF Reader
- [pdfplumber](https://github.com/jsvine/pdfplumber)
- Parsing dates with PDFminer
- How to Extract Data from Tables in a Public Record PDF
  I recently published a story based on data analysis I did of a report I obtained from the Department of Behavioral Health and Developmental Services in VA. I wanted to share a quick walkthrough of how I extracted the data from tables in a PDF using a Python module called pdfplumber. I also uploaded a video to YouTube if you prefer that.
- Code to extract text from pdf to excel
  I've been working with pdfplumber, which is built atop pdfminer.six. It allows one to break the page up into sections and extract text from them in turn, which may help keep columns separated better.
- I need to parse unstructured tables from a pdf into a json, what can I do
  You could try pdfplumber
- Advanced PDF to Excel with documents and example code
  I'm not sure if there is a way to reliably detect bold characters: https://github.com/jsvine/pdfplumber/issues/724
- how do I automate extracting data from two pdfs and input into an excel sheet according to an order number
  pdfplumber is also pretty good. It can help segment text a bit better than pdfminer can alone.
- Extracting particular things from pdf program?
  To handle machine-generated PDFs, one possible package is pdfplumber.
- Convert PDF to text for parsing
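Two of the patterns mentioned in the posts above (splitting a page into sections to keep columns apart, and turning an extracted table into JSON) can be sketched roughly as follows. This is a minimal illustration, not code from any of the linked posts: `column_bboxes`, `table_to_records`, and `extract_page` are hypothetical helper names, equal-width columns and a header in the first table row are assumptions, and only `pdfplumber.open`, `page.bbox`, `page.crop`, `page.extract_text`, and `page.extract_table` are pdfplumber calls.

```python
import json


def column_bboxes(page_bbox, n_columns):
    """Split a page bounding box (x0, top, x1, bottom) into equal-width vertical strips."""
    x0, top, x1, bottom = page_bbox
    width = (x1 - x0) / n_columns
    # Reuse x1 for the final edge so rounding never pushes a strip past the page.
    edges = [x0 + i * width for i in range(n_columns)] + [x1]
    return [(edges[i], top, edges[i + 1], bottom) for i in range(n_columns)]


def table_to_records(rows):
    """Turn a table (list of rows, first row assumed to be the header) into a list of dicts."""
    if not rows:
        return []
    header = [(cell or "").strip() for cell in rows[0]]
    records = []
    for row in rows[1:]:
        cells = [(cell or "").strip() for cell in row]
        if not any(cells):
            continue  # skip rows that are entirely empty
        records.append(dict(zip(header, cells)))
    return records


def extract_page(pdf_path, page_index=0, n_columns=2):
    """Hypothetical wrapper: per-column text plus the first table found on one page."""
    import pdfplumber  # third-party: pip install pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[page_index]
        columns = [
            page.crop(bbox).extract_text() or ""
            for bbox in column_bboxes(page.bbox, n_columns)
        ]
        # extract_table() returns None when no table is detected on the page.
        records = table_to_records(page.extract_table() or [])
    return {"columns": columns, "records": records}


if __name__ == "__main__":
    # Made-up rows shaped like extract_table() output (strings or None per cell).
    sample = [["Facility", "Beds"], ["A", "12"], [None, None], ["B", None]]
    print(json.dumps(table_to_records(sample), indent=2))
```

Real documents rarely have evenly spaced columns, so in practice the bounding boxes would likely be chosen by inspecting the page rather than computed this way.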
sioyek
- Google Scholar PDF Reader
  Sioyek is a PDF viewer designed exactly for reading research papers and textbooks: https://github.com/ahrm/sioyek.
- ArXiv now offers papers in HTML format
- Ask HN: What apps have you created for your own use?
  Sioyek: a PDF viewer optimized for reading research papers and textbooks. https://github.com/ahrm/sioyek
  It has a lot of niche features, but my favorite is the ability to preview or jump to references even when they are not linked in the PDF file.
- Sioyek is a PDF viewer with a focus on textbooks and research papers
- SumatraPDF Reader
  I implore all developers of PDF readers to implement sioyek's overview feature[0]. When you hover on a cross-referenced entry, it opens a little preview window with the contents of the reference. It is an absolute game-changer for reading textbooks and technical papers; I cannot overstate its utility.
  [0] https://github.com/ahrm/sioyek#overview
- Vimtex: sioyek is not executable. Any idea how to solve this. I'm on wsl2 Ubuntu.
- Sioyek PDF Viewer on Asahi Linux
  Has anyone been able to run the sioyek PDF viewer (https://github.com/ahrm/sioyek) on Asahi? I've tinkered around a little, but the max I've gotten to is a black window with the executable complaining about an inability to compile certain shaders. Would this be due to the current OpenGL in mesa-asahi-edge (and thus can't be solved until we get more recent OpenGL version support) or is there some way to finagle around this?
- PDF Viewer that Compiles LaTeX Notes?
  Maybe chat to the dev of https://github.com/ahrm/sioyek about adding this as a feature. It's probably the closest pdf viewer I can think of that might do something like this in the future.
- What software would you like to see ported?
- Good PDF reader/annotation for research
What are some alternatives?
PDFMiner - Python PDF Parser (Not actively maintained). Check out pdfminer.six.
zathura - Document viewer
PyPDF2 - A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
zotero - Zotero is a free, easy-to-use tool to help you collect, organize, annotate, cite, and share your research sources.
OCRmyPDF - OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
sumatrapdf - SumatraPDF reader
pdfminer.six - Community maintained fork of pdfminer - we fathom PDF
org-ref - org-mode modules for citations, cross-references, bibliographies in org-mode and useful bibtex tools to go with it.
py-pdf-parser - A Python tool to help extracting information from structured PDFs.
libharu - libharu - free PDF library
PyMuPDF - PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
clawPDF - Open Source Virtual (Network) Printer for Windows that allows you to create PDFs, OCR text, and print images, with advanced features usually available only in enterprise solutions.