parser
toltec
| | parser | toltec |
|---|---|---|
| Mentions | 12 | 66 |
| Stars | 5,254 | 660 |
| Growth | 1.8% | 2.6% |
| Activity | 1.1 | 5.4 |
| Last commit | 6 months ago | 11 days ago |
| Language | JavaScript | Shell |
| License | Apache License 2.0 | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
parser
-
Show HN: I made a tool to clean and convert any webpage to Markdown
Thorough scraping is challenging, especially in an environment where you don't have (or want) a JavaScript runtime.
For content extraction, I found the approach the Postlight library takes quite neat. It scores individual HTML nodes based on some heuristics (text length, link density, CSS classes). It then selects the nodes with the highest score. [1] I ported it to Swift for a personal read-later app.
[1] https://github.com/postlight/parser
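The scoring idea described in the comment above can be sketched roughly as follows. This is a minimal illustration, not Postlight's actual algorithm: the hint word lists, the weight of 25, the candidate tags, and the names `TreeBuilder`, `score`, and `best_node` are all assumptions made for the example. It uses only Python's standard `html.parser`, so no JavaScript runtime is involved.

```python
import re
from html.parser import HTMLParser

# Hypothetical hint lists for illustration; not taken from Postlight's code.
POSITIVE = re.compile(r"article|content|post|entry", re.I)
NEGATIVE = re.compile(r"nav|sidebar|footer|comment|menu", re.I)
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}
CANDIDATE_TAGS = {"article", "div", "main", "section"}


class Node:
    def __init__(self, tag, attrs, parent=None):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.text_len = 0  # characters of text anywhere below this node
        self.link_len = 0  # the part of that text that sits inside an <a>


class TreeBuilder(HTMLParser):
    """Builds a bare-bones element tree and tallies text per node."""

    def __init__(self):
        super().__init__()
        self.root = Node("root", [])
        self.current = self.root
        self.nodes = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:  # void elements never hold text
            return
        node = Node(tag, attrs, self.current)
        self.nodes.append(node)
        self.current = node

    def handle_endtag(self, tag):
        # Walk up to the matching open tag; ignore stray closing tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        n = len(data.strip())
        if n == 0:
            return
        # Collect the current node and all of its ancestors.
        chain = []
        node = self.current
        while node is not None:
            chain.append(node)
            node = node.parent
        in_link = any(x.tag == "a" for x in chain)
        for x in chain:
            x.text_len += n
            if in_link:
                x.link_len += n


def score(node):
    """Combine text length, link density, and class-name hints."""
    if node.text_len == 0:
        return 0.0
    link_density = node.link_len / node.text_len
    plain_text = node.text_len - node.link_len
    hints = node.attrs.get("class") or ""
    bonus = 0
    if POSITIVE.search(hints):
        bonus += 25
    if NEGATIVE.search(hints):
        bonus -= 25
    return plain_text * (1 - link_density) + bonus


def best_node(html):
    """Return the highest-scoring candidate node, or None if there is none."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    candidates = [n for n in builder.nodes if n.tag in CANDIDATE_TAGS]
    return max(candidates, key=score, default=None)
```

Calling `best_node(page_html)` on a page with a link-heavy navigation block and a text-heavy article block should favor the article block, since link text is discounted twice (once as non-plain text, once through the density factor). A real extractor would also skip `<script>` and `<style>` content and propagate paragraph scores to parent nodes, which this sketch leaves out.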
-
Trouble Building Chrome Extension to Get News Article Content
I've been working on an enhanced reader mode extension for the last few months. I found that Mercury Reader's parser tool is useful for extracting content. If that's not exactly what you're looking for, readability is another good option. It's a library used inside Firefox's reader mode that you can use in any project.
-
What Are The Coolest Virtual Machines You Currently Run 24/7?
I currently have it turned off while I search for better sources, but I have a VM that runs a custom cron script that combines a custom RSS reader, podfox, mercury-parser, and coqui-ai to generate audio podcasts from RSS news feeds. I should probably clean it up and release the script/setup process. With a few tweaks and some AI text-to-speech and a little machine learning audio processing you can get a really good podcast experience from text posts.
-
Extracting Text button no longer works
It looks like Relay could be updated to convert it locally though, since the parser that it uses appears to be open source.
-
Which are some open-source Chrome extensions you want to use on Firefox?
https://github.com/postlight/mercury-parser The only one I need, shit's too good
-
API for getting news fulltext
An alternative would be to extract the plain text from the article's page with either some "readability" API or a library like Mercury Parser: https://github.com/postlight/mercury-parser
-
How does Firefox's Reader View work?
I haven't directly compared them, but I have also found mercury parser (https://github.com/postlight/mercury-parser) to be very reliable.
Since it turns a website into very plain (X)HTML it's fairly easy to use it to make a browsing proxy or automatically produce epub files for e-readers, which is what I do.
-
Build your self-hosted Evernote
Make sure that at the end of the process you have the node and npm executables installed - the http.webpage integration uses the Mercury Parser API to convert web pages to Markdown.
-
Reading from the web offline and distraction-free
Good luck! Those HTML issues you're coming across are tough and so varied across the web!
I was working with Mercury Parser (pluggable parsing for different sites) in the past.
https://github.com/postlight/mercury-parser
- The most underused browser feature
toltec
-
Notes on My Remarkable Tablet
3.x support will come to toltec. I've been blocked by stuff outside of my control a couple of times, including things happening in my life that I won't get into.
You can see the current progress here: https://github.com/toltec-dev/toltec/issues/820
As for the comment on the kernel change, that was actually an ask by someone in the community: https://github.com/reMarkable/linux/issues/8
-
The ReMarkable Streaming Tool v2: Elevating Remote Work Efficiency
I love seeing work in this space! I made a collaborative whiteboard app for the reMarkable a while ago: https://github.com/fenollp/reMarkable-tools
It is packaged in the homebrew Toltec repo https://toltec-dev.org/
- What are you doing with community projects?
-
remarkable hacks
Remember to read the warning on Toltec home page:
-
Training room Remarkable
- https://toltec-dev.org/
- What operating system does the Remarkable 2 use?
- Is it just me or did the ebook reader function get ruined several updates back?
-
Remarkable 1 purchase
Do you ever plan to put your own tools and stuff on it? If so, I would recommend staying on 2.15 so you can use https://toltec-dev.org/. Also, the newest version 3 software forces infinite scroll and a lot of people absolutely hate it. I happily stay on 2.10. You can change versions as well, unofficially. Not sure if using the cloud still works with that; lots of us have cut that out entirely.
-
Neofetch, for ReMarkable
Definitely start by installing toltec if your device is on version <=2.15.1.1189, https://toltec-dev.org
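The version condition in the comment above needs a numeric comparison per component rather than a plain string comparison (as strings, "2.9" would sort after "2.15"). A minimal sketch, assuming the OS version is already available as a dotted string; the names `parse_version` and `is_supported` are hypothetical, and the ceiling is simply the value quoted in the comment, which may be out of date:

```python
# Ceiling quoted in the comment above; check the Toltec site for the current value.
MAX_SUPPORTED = "2.15.1.1189"


def parse_version(version):
    """Turn a dotted version such as '2.15.1.1189' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def is_supported(version, max_supported=MAX_SUPPORTED):
    """True when `version` is at or below the supported ceiling."""
    return parse_version(version) <= parse_version(max_supported)
```

For example, `is_supported("2.9.0.0")` is true and `is_supported("3.0.0.0")` is false. A version string with non-numeric parts raises `ValueError`, which seems preferable to guessing.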
-
Toltec for V3
Still waiting on ddvk-hacks and rm2fb. Only an updated rm2fb package is pending: https://github.com/toltec-dev/toltec/pull/656
What are some alternatives?
readability - A standalone version of the readability lib
remarkable-hacks - additional functionality via binary patching
hn-search - Hacker News Search
awesome-reMarkable - A curated list of projects related to the reMarkable tablet
Just-Read - A customizable read mode web extension.
remarkable2-framebuffer - remarkable2 framebuffer reversing
FParsec - A parser combinator library for F#
draft-reMarkable - A launcher for the reMarkable tablet, which wraps around the standard interface.
tidy-html5 - The granddaddy of HTML tools, with support for modern standards
koreader - An ebook reader application supporting PDF, DjVu, EPUB, FB2 and many more formats, running on Cervantes, Kindle, Kobo, PocketBook and Android devices
rdrview - Firefox Reader View as a command line tool
remarkable-keywriter