pdf-highlights
pypandoc
| | pdf-highlights | pypandoc |
|---|---|---|
| Mentions | 1 | 5 |
| Stars | 22 | 821 |
| Growth | - | - |
| Activity | 10.0 | 6.8 |
| Last commit | over 1 year ago | 22 days ago |
| Language | Python | Python |
| License | GNU General Public License v3.0 or later | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
- Web Scraping in Python – The Complete Guide
I recently used [0] Playwright for Python and [1] pypandoc to build a scraper that fetches a webpage and turns the content into sane markdown so that it can be passed into an AI coding chat [2].
They are both very gentle dependencies to add to a project. Both packages contain built-in or scriptable methods to install their underlying platform-specific binary dependencies. This means you don't need to ask end users to use some complex, platform-specific package manager to install Playwright and pandoc.
Playwright lets you scrape pages that rely on JavaScript. Pandoc is great at turning HTML into sensible markdown. Below is an excerpt of the OpenAI pricing docs [3] that have been scraped to markdown [4] in this manner.
[0] https://playwright.dev/python/docs/intro
[1] https://github.com/JessicaTegner/pypandoc
[2] https://github.com/paul-gauthier/aider
[3] https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turb...
[4] https://gist.githubusercontent.com/paul-gauthier/95a1434a28d...
## GPT-4 and GPT-4 Turbo
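The workflow described in the comment above could be sketched roughly as follows. This is a minimal illustration and not the commenter's actual code: the function names `ensure_pandoc`, `fetch_rendered_html`, and `html_to_markdown` are hypothetical, and the `convert` parameter is an assumption added so the conversion step can be exercised without pandoc installed. It assumes Playwright's browser binaries have already been installed (for example with the `playwright install chromium` command).

```python
from typing import Callable, Optional


def ensure_pandoc() -> None:
    """Download a pandoc binary through pypandoc if none can be found."""
    import pypandoc  # imported lazily so this module loads without it

    try:
        pypandoc.get_pandoc_version()
    except OSError:
        # pypandoc raises OSError when it cannot locate pandoc.
        pypandoc.download_pandoc()


def fetch_rendered_html(url: str) -> str:
    """Load a page in headless Chromium and return the HTML after scripts ran."""
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            page.goto(url)
            return page.content()
        finally:
            browser.close()


def html_to_markdown(html: str, convert: Optional[Callable[..., str]] = None) -> str:
    """Convert an HTML string to markdown, using pypandoc unless told otherwise."""
    if convert is None:
        import pypandoc

        convert = pypandoc.convert_text
    return convert(html, "markdown", format="html")
```

A typical call chain would be `ensure_pandoc()` once at startup, then `html_to_markdown(fetch_rendered_html(url))` for each page. Passing the converter as a parameter is only a convenience for testing; calling `pypandoc.convert_text` directly works just as well.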
- Converting multiple docx to multiple txt files
Use Pypandoc
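As a rough sketch of what that suggestion might look like in practice, the helper below converts every `.docx` file in a folder to plain text. The function name `convert_docx_folder` and the injectable `convert` parameter are hypothetical, added for illustration; by default it calls `pypandoc.convert_file` with pandoc's `plain` output format, which assumes pandoc is installed or was downloaded via pypandoc.

```python
from pathlib import Path
from typing import Callable, List, Optional, Union


def convert_docx_folder(
    src_dir: Union[str, Path],
    out_dir: Union[str, Path],
    convert: Optional[Callable[..., str]] = None,
) -> List[Path]:
    """Convert each .docx in src_dir to a .txt file of the same name in out_dir."""
    if convert is None:
        import pypandoc  # imported lazily; requires a pandoc binary at run time

        convert = pypandoc.convert_file

    src, out = Path(src_dir), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    written = []
    for docx in sorted(src.glob("*.docx")):
        target = out / (docx.stem + ".txt")
        # "plain" is pandoc's plain-text writer; outputfile makes pandoc
        # write the result to disk instead of returning it as a string.
        convert(str(docx), "plain", outputfile=str(target))
        written.append(target)
    return written
```

For example, `convert_docx_folder("reports", "reports_txt")` would write one `.txt` file per `.docx` found in `reports`, where both folder names are placeholders.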
What are some alternatives?
kobuddy - Kobo database backup and parser: extracts notes, highlights, reading progress and more
taffy - A high performance rust-powered UI layout library
fpdf2 - Simple PDF generation for Python
sniffnet - Comfortably monitor your Internet traffic 🕵️‍♂️
remarks - Extract annotations (highlights and scribbles) from PDF, EPUB, and notebooks marked with reMarkable tablets. Export to Markdown, PDF, PNG, SVG
formbricks - Open Source Survey Platform
KoHighlights - KOHighlights is a utility for viewing KOReader's highlights and/or export them to simple text, html, csv or markdown files.
nuxt - The Intuitive Vue Framework.
chatgpt-history-export-to-md - A script to effortlessly extract your entire ChatGPT data export from JSON files to nicely-formatted markdown files.
trpc - 🧙‍♀️ Move Fast and Break Nothing. End-to-end typesafe APIs made easy.
imdown - imdown (pronounce "I'm down") can be used to collect images from a directory tree and put them into a markdown file for markdown to compile to another format using [pandoc](https://pandoc.org/).
responsively-app - A modified web browser that helps in responsive web development. A web developer's must have dev-tool.