pdfcpu
rod
Metric | pdfcpu | rod
---|---|---
Mentions | 30 | 20
Stars | 6,236 | 4,784
Growth | 1.6% | 2.3%
Activity | 9.1 | 7.9
Last commit | 8 days ago | 16 days ago
Language | Go | Go
License | Apache License 2.0 | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
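The growth figure described above can be read as a simple percentage change between two monthly star counts. The following is a minimal sketch under that assumption; the function name `monthOverMonthGrowth` and the sample numbers are hypothetical and are not taken from the site's own implementation or from the table.

```go
package main

import (
	"errors"
	"fmt"
)

// monthOverMonthGrowth returns the percentage change in star count between
// two monthly snapshots. This is a hypothetical helper that assumes growth
// is a plain percentage change; the site's actual formula is not documented here.
func monthOverMonthGrowth(prevStars, currStars int) (float64, error) {
	if prevStars <= 0 {
		return 0, errors.New("previous star count must be positive")
	}
	return float64(currStars-prevStars) / float64(prevStars) * 100, nil
}

func main() {
	// Illustrative numbers only: 1000 stars last month, 1016 this month.
	growth, err := monthOverMonthGrowth(1000, 1016)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%.1f%%\n", growth) // prints "1.6%"
}
```

The helper rejects a non-positive previous count because a percentage change is undefined when the baseline is zero.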
pdfcpu
- Show HN: A PDF Processing CLI/API Written in Go
- Show HN
- Making a PDF that's larger than Germany
  Slightly tangential: if you are hacking on PDFs, manually or otherwise, this is an incredibly useful tool: https://pdfcpu.io/ (not the author, just a user)
- Stirling-PDF: local web application to perform various operations on PDFs
  A really nice, stand-alone command line tool is pdfcpu.
  https://github.com/pdfcpu/pdfcpu
- pdfcpu v0.6.0 out! - pdfcpu.io
  Check it out => https://github.com/pdfcpu/pdfcpu/releases/tag/v0.6.0
- Marker: Convert PDF to Markdown quickly with high accuracy
  I can report that the closest I've come before is with PDFMiner (https://pypi.org/project/pdfminer/) for Python. The benefit of this one is that it retains styling information, so that italics and the like can be retained, at least with some post-processing (I think one might need to convert certain CSS classes to actual HTML tags).
  The other option I have started looking into is the PDFCPU library for Go. It is a bit more low-level than PDFMiner, but one gets out very well-structured info that, it seems, might be possible to post-process quite well for one's particular use case and PDF layouts: https://github.com/pdfcpu/pdfcpu
  I have also now tried the Marker tool in the OT, and it seems to do a reasonable job. It did intermingle some columns though, at least in some tricky cases, such as when there was a round-shaped image in between the two columns. One note is that Marker doesn't seem to retain styling like italics.
- PDFcpu snippet for read text of PDF file?
  Of course, the best way would be to solve it via the API without the CLI. But this doesn't seem to work. https://github.com/pdfcpu/pdfcpu/issues/122
- How do you split PDFs? I have several here that I need to break up into parts
- Do you know any library to make pdf in golang?
- Pdfcpu: A Go PDF Processor
rod
- Need help authenticating to Okta programmatically.
  I have tried the following. 1. Logging in to Okta via the browser programmatically using go-rod. I managed to do so successfully, but Slack fails to load: it stays stuck on Slack's browser loader screen. 2. Authenticating via the Okta RESTful API. So far, I have managed to authenticate using {{domain}}/api/v1/authn, and then to complete MFA via the verify endpoint {{domain}}/api/v1/authn/factors/{{factorID}}/verify, which returns a sessionToken. From here, I can successfully create a sessionCookie, which has proven quite useless to me. Perhaps I am doing it wrong.
- Library to convert HTML to pdf in Golang
- Web scraping with Go
- Best option for browser automation
- I’m messed up with Go libraries
  I usually find libraries by googling them or searching awesome go on GitHub. For selenium/puppeteer, I've always found go-rod useful and easy in every way. Sometimes I also Google "X in Nodejs for Golang".
- Go for web scraping
  I recently tried out https://github.com/go-rod/rod. I think it's based on chromedp (so, Chrome dev tools and headless browser) but it also has code to download and run a supported version of Chrome that doesn't interfere with your local browser.
- Reducing web scraping time with concurrency - Go
- VHS: CLI Home Video Recorder
  One of the dependencies is `rod`[0], which is a web scraping/automation library, and I believe requires a browser to work. I don't know what they're using it for though, as I haven't looked at the code (and I'm not familiar with Go anyway).
  [0]: https://github.com/go-rod/rod
- Thoughts on Go headless browser tools for testing and scraping?
  I don't have personal experience, but https://github.com/go-rod/rod is far more active than chromedp
- Project with a Web scraper GO binary
What are some alternatives?
gopdf - A simple library for generating PDF written in Go lang
playwright-go - Playwright for Go a browser automation library to control Chromium, Firefox and WebKit with a single API.
go-wkhtmltopdf - Go bindings for wkhtmltopdf and high-level HTML to PDF conversion interface
chromedp - A faster, simpler way to drive browsers supporting the Chrome DevTools Protocol.
qpdf - QPDF: A content-preserving PDF document transformer
colly - Elegant Scraper and Crawler Framework for Golang
merge2pdf - Merge Image and PDF files (optionally with selective pages) with lossless quality
WebDumper - A tool for scraping, dumping and unpacking (webpacked) javascript source files.
markpdf - Watermark PDF files using image or text
realize - Realize is the #1 Golang Task Runner which enhance your workflow by automating the most common tasks and using the best performing Golang live reloading.
ngrok - Unified ingress for developers
gotests - Automatically generate Go test boilerplate from your source code.