Readability4J
Crate
| | Readability4J | Crate |
|---|---|---|
| Mentions | 3 | 6 |
| Stars | 135 | 3,957 |
| Growth | - | 0.5% |
| Activity | 4.3 | 9.9 |
| Last commit | over 2 years ago | 5 days ago |
| Language | HTML | Java |
| License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Readability4J
- Creating an advanced search engine with PostgreSQL
Depending on the type of content, one might want to look into using Readability (the browser's reader view) to parse the webpage. It will give you all the useful info without the junk. Then you can put it in the DB as needed.
https://github.com/mozilla/readability
Btw, Readability is also available in a few other languages, like Kotlin:
https://github.com/dankito/Readability4J
- How does Firefox's Reader View work?
My Hacker News client HACK for iOS and Android has a browser with a reader mode. On iOS I was able to use the reader mode feature provided by SFSafariViewController, but that wasn't available on Android.
So I had to read a ton about this. I ended up using a heavily modified Kotlin version of Readability:
https://github.com/dankito/Readability4J
https://play.google.com/store/apps/details?id=com.pranapps.h...
https://apps.apple.com/us/app/id1464477788
- Show HN: Instantly Listen to Any URL
Not sure about OP, but I just implemented this in my Hacker News Android client (thanks for the idea, OP).
This is how I implemented it. I had already built an article "reader mode" by heavily customizing the Kotlin port of Mozilla's Readability:
https://github.com/dankito/Readability4J
Then I pass the text to Android's TextToSpeech library and it works very well:
fun trySpeaking(str:String){
Crate
- FLaNK AI - 01 April 2024
- Creating an advanced search engine with PostgreSQL
I'm wondering if CrateDB [https://github.com/crate/crate] could fit your use case.
It's a relational SQL database which aims for compatibility with PostgreSQL. Internally it uses Lucene for storage and, as such, can offer full-text functionality, which is exposed via MATCH.
- Distributed query execution in CrateDB: What you need to know
A logical execution plan does not take into account the information about data distribution. CrateDB is a distributed database and data is sharded: a table can be split into many parts - so-called shards. Shards can be independently replicated and moved from one node to another. The number of shards a table can have is specified at the time the table is created.
- Parser generators vs. handwritten parsers: surveying major languages in 2021
- Querying time series data with SQL: examples
PS: If you liked this post... We'd really appreciate a ⭐️ on GitHub!
- What is CrateDB? 🤔 FAQ
But there's nothing better than trying things by yourself... So Download CrateDB, experiment, and tell us what you think! 😁
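The distributed query execution excerpt above notes that a table's shard count is fixed when the table is created. A minimal Java sketch of why a fixed count matters, assuming rows are assigned to shards by hashing a routing key modulo the shard count. This is a common scheme used here for illustration only, not necessarily CrateDB's exact routing function, and `ShardRoutingSketch` and `shardFor` are hypothetical names:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShardRoutingSketch {

    // Hypothetical helper (not a CrateDB API): maps a routing key to a
    // shard index in the range [0, numShards).
    static int shardFor(String routingKey, int numShards) {
        // floorMod keeps the result non-negative even if hashCode() is negative.
        return Math.floorMod(routingKey.hashCode(), numShards);
    }

    public static void main(String[] args) {
        int numShards = 4; // assumed to be fixed at table creation time

        // Count how many of the sample keys land on each shard.
        Map<Integer, Integer> rowsPerShard = new TreeMap<>();
        for (String key : List.of("sensor-1", "sensor-2", "sensor-3", "sensor-4", "sensor-5")) {
            rowsPerShard.merge(shardFor(key, numShards), 1, Integer::sum);
        }
        System.out.println("rows per shard: " + rowsPerShard);
    }
}
```

Under this assumption the same key always maps to the same shard, so a shard can be moved or replicated as a unit without touching the others, while changing the shard count would change where existing keys belong.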
What are some alternatives?
go-readability - Go package that cleans a HTML page for better readability.
Presto - The official home of the Presto distributed SQL query engine for big data
article-extractor - To extract main article from given URL with Node.js
OrientDB - OrientDB is the most versatile DBMS supporting Graph, Document, Reactive, Full-Text and Geospatial models in one Multi-Model product. OrientDB can run distributed (Multi-Master), supports SQL, ACID Transactions, Full-Text indexing and Reactive Queries.
Just-Read - A customizable read mode web extension.
MapDB - MapDB provides concurrent Maps, Sets and Queues backed by disk storage or off-heap-memory. It is a fast and easy to use embedded Java database engine.
percollate - A command-line tool to turn web pages into readable PDF, EPUB, HTML, or Markdown docs.
jOOQ - jOOQ is the best way to write SQL in Java
web-clipper - For Notion, OneNote, Bear, Yuque, Joplin. Clip anything to anywhere
Flyway - Flyway by Redgate • Database Migrations Made Easy.
unclutter - A modern reader mode and article library for your browser.
sql2o - sql2o is a small library, which makes it easy to convert the result of your sql-statements into objects. No resultset hacking required. Kind of like an orm, but without the sql-generation capabilities. Supports named parameters.