pup
org-web-tools
Metric | pup | org-web-tools
---|---|---
Mentions | 52 | 14
Stars | 8,000 | 617
Growth | - | -
Activity | 0.0 | 7.5
Last commit | about 1 month ago | 3 months ago
Language | HTML | Emacs Lisp
License | MIT License | GNU General Public License v3.0 only
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
pup
-
script to download some notes
And lnk=$(curl -s https://www.selfstudys.com$url | grep "PDFFlip" | cut -d '"' -f 6) to lnk=$(curl -s https://www.selfstudys.com$url | pup "div#PDFF attr{source}"). Here pup prints the content of the source attribute from the div tag with id PDFF. I don't know that much about HTML and CSS, so this is what I came up with, but I am sure you can also select by class and build a list of sub-URLs from the matches. Check out the video from bugswriter on pup, or read the docs on GitHub for more info: https://github.com/ericchiang/pup
-
What monitoring tool do you use or recommend?
jq is pretty amazing. If you are comfortable with its jQuery-like CSS selector syntax, then I should also mention a couple of similar CLI utilities that apply it to HTML: htmlq and pup.
-
Creating a data scraper as a beginner?
Regex is not a great tool for parsing web pages. Open up a browser dev tools window and select a bit of the page. Right click > copy... XPath expression or CSS selector. A proper web scraping tool will accept either of those. No muss, no fuss. You can even use simple command line tools: xpath or pup
- December 5, 2022: FLiP Stack Weekly
-
Show HN: A tool like jq, but for parsing HTML
This is HTML to JSON, written in Rust, and there's also pup[1] which I found out about just the other day on HN[2] which uses a very similar syntax (CSS selectors) but outputs HTML and is written in Go.
I can see room for both, though it would be interesting to have a more detailed comparison to go on (e.g. types of HTML, speed, etc.).
[1] https://github.com/ericchiang/pup
[2] https://news.ycombinator.com/item?id=33805732
- Pup: Parsing HTML at the command line
-
pup: Parsing HTML at the Command Line
It looks like the project became inactive for a bit and there are alternatives such as htmlq, etc. https://github.com/ericchiang/pup/issues/150
-
Converting field before delimiter to uppercase and how to replace with multiple newlines
Another tool worth mentioning is pup: it can produce JSON output, which means you can pipe it to jq.
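
Several of the posts above describe the same pattern: select an element with a CSS selector such as `div#PDFF`, read one of its attributes, and pass structured output on to a JSON tool. The following is a rough standard-library Python sketch of that idea, not pup itself and not pup's implementation. The sample markup, the file paths in it, and the names `AttrCollector` and `extract_attribute` are all hypothetical and exist only for illustration.

```python
import json
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Collect one attribute from every element matching a tag name and id."""

    def __init__(self, tag, element_id, attribute):
        super().__init__()
        self.tag = tag
        self.element_id = element_id
        self.attribute = attribute
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs with lowercased names.
        attr_map = dict(attrs)
        if tag == self.tag and attr_map.get("id") == self.element_id:
            value = attr_map.get(self.attribute)
            if value is not None:
                self.values.append(value)


def extract_attribute(html, tag, element_id, attribute):
    """Return the attribute values of all <tag id=element_id> elements."""
    parser = AttrCollector(tag, element_id, attribute)
    parser.feed(html)
    parser.close()
    return parser.values


# Hypothetical markup, loosely modelled on the div#PDFF example quoted above.
SAMPLE_HTML = """
<html><body>
  <div id="PDFF" source="/files/notes-chapter-1.pdf"></div>
  <div id="other" source="/files/ignored.pdf"></div>
</body></html>
"""

if __name__ == "__main__":
    links = extract_attribute(SAMPLE_HTML, "div", "PDFF", "source")
    # Emit JSON so the result can be handed to a JSON-aware tool.
    print(json.dumps(links))
```

Unlike the `grep` and `cut` pipeline, this approach does not depend on how the attributes happen to be ordered or quoted on a line, which is the main reason the posts recommend a parser over text matching.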
org-web-tools
- org-web-tools: View, capture, and archive Web pages in Org-mode
-
Converting a web page to Org mode to include in my notes
There is also org-web-tools, which uses pandoc to convert HTML to Org mode. You can also use pandoc in scripts.
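
As a hedged illustration of using pandoc from a script, the sketch below wraps the pandoc command line in Python. It assumes the `pandoc` executable is installed and on `PATH`; the function names and the file names in the usage note are hypothetical, and this is not how org-web-tools itself invokes pandoc.

```python
import shutil
import subprocess


def build_pandoc_command(html_path, org_path):
    """Return the argument list for converting an HTML file to an Org file."""
    # -f: input format, -t: output format, -o: output file.
    return ["pandoc", "-f", "html", "-t", "org", "-o", org_path, html_path]


def convert_html_to_org(html_path, org_path):
    """Run pandoc on html_path and write the Org result to org_path."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc executable not found on PATH")
    # check=True raises CalledProcessError if pandoc exits with an error.
    subprocess.run(build_pandoc_command(html_path, org_path), check=True)
```

Calling `convert_html_to_org("page.html", "page.org")` on a previously saved page would then produce an Org file that can be included in notes.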
- Anybody here that isn’t a developer or has a degree in CS?
-
Introducing Captee, an app to wrap a link in Org Mode or Markdown from the macOS Share Menu
There is also another great alphapapa package, org-web-tools, for those who only want org-mode format or don’t use macOS.
-
Why not use Obsidian and/or Logseq instead of OrgRoam?
[org-web-tools] https://github.com/alphapapa/org-web-tools
- The sublime Joy of Emacs / Org Mode
-
How do you save / archive web pages for references in notes?
You can use https://github.com/alphapapa/org-web-tools. It can save a web page as an Org file and has some extra cool functionality.
-
Is it possible to use org-mode as a filing cabinet too?
Not sure what you mean by "filing cabinet." Org does have file attachments. See also https://github.com/alphapapa/org-web-tools for archiving Web pages with Org.
-
How to Use org mode for Lecture Notes (CS and Engineering)
Also, not exactly related to your question, but you may find it useful: See https://github.com/alphapapa/org-web-tools, which also makes it easy to attach Web page archives.
-
Org capture in Nyxt: taking notes while browsing
Press c l to choose my commonplace-book link-capture template, which uses org-web-tools to insert an Org link with the Web page's title as the description:
What are some alternatives?
htmlq - Like jq, but for HTML.
organice - An implementation of Org mode without the dependency of Emacs - built for mobile and desktop browsers
xidel - Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.
org-roam - Rudimentary Roam replica with Org-mode
gron - Make JSON greppable!
org-web - org-mode on the web, built with React, optimized for mobile, synced with Dropbox and Google Drive
yq - Command-line YAML, XML, TOML processor - jq wrapper for YAML/XML/TOML documents
org-cliplink - Insert org-mode links from clipboard
cascadia - Go cascadia package command line CSS selector
nyxt - Nyxt - the hacker's browser.
ddgr - DuckDuckGo from the terminal
org-noter - Emacs document annotator, using Org-mode