awesome-semantic-web vs selectolax
Metric | awesome-semantic-web | selectolax |
---|---|---|
Mentions | 5 | 6 |
Stars | 1,319 | 970 |
Growth | 1.2% | - |
Activity | 6.1 | 7.7 |
Last commit | 12 days ago | about 2 months ago |
Language | - | Cython |
License | Creative Commons Zero v1.0 Universal | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
awesome-semantic-web
- GitHub – GSA/code-gov: An informative repo for all Code.gov repos
  https://github.com/semantalytics/awesome-semantic-web#csvw
  A GitHub Action would run regularly, fetch each code.json, save each to a git repo, and then upsert each into a SQLite database to be published with e.g. datasette or datasette-lite.
- Super-Structured Data: Rethinking the Schema
- Python Tools for the Semantic Web, an Overview
  Have you taken a look at https://github.com/semantalytics/awesome-semantic-web#python? It would be great to further this list along, given its breadth and age.
- Looking for software
  You might find some of what you need here: https://github.com/semantalytics/awesome-semantic-web
- A Review of the Semantic Web Field
  https://github.com/semantalytics/awesome-semantic-web#progra...
  Why are you spreading FUD?
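The code.json workflow quoted in the first mention above (fetch each code.json, then upsert it into a SQLite database) could be sketched roughly as follows. This is only an illustration of the upsert step using Python's standard `sqlite3` module; the table name `code_json`, its columns, and the helper `upsert_code_json` are hypothetical and do not come from the quoted post, and fetching the files and publishing with datasette are left out.

```python
import json
import sqlite3


def upsert_code_json(conn: sqlite3.Connection, agency: str, document: dict) -> None:
    """Insert or replace one agency's code.json document.

    The schema (one row per agency, JSON stored as text) is an assumption
    made for this sketch, not something specified in the original post.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS code_json ("
        "agency TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    # ON CONFLICT ... DO UPDATE is SQLite's upsert syntax (SQLite 3.24+).
    conn.execute(
        "INSERT INTO code_json (agency, body) VALUES (?, ?) "
        "ON CONFLICT(agency) DO UPDATE SET body = excluded.body",
        (agency, json.dumps(document, sort_keys=True)),
    )
    conn.commit()
```

Calling the helper twice with the same `agency` key leaves a single row holding the latest document, which is the behaviour a scheduled job would need in order to be re-run safely.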
selectolax
- GitHub – GSA/code-gov: An informative repo for all Code.gov repos
  https://github.com/rushter/selectolax#simple-benchmark
  Apache Nutch is a Java-based web crawler which supports e.g. CommonCrawl (which backs various foundational LLMs): https://en.wikipedia.org/wiki/Apache_Nutch#Search_engines_bu... But extruct extracts more types of metadata and data than Nutch, AFAIU: https://github.com/scrapinghub/extruct
  datasette-graphql adds a GraphQL HTTP API to a SQLite database:
- 8 Most Popular Python HTML Web Scraping Packages with Benchmarks
  selectolax
- High performance code in Python
- Web Scraping with Python: Everything you need to know to get started (2022)
  try this... https://github.com/rushter/selectolax
- The State of Web Scraping in 2021
  Lazyweb link: https://github.com/rushter/selectolax
  although I don't follow the need to have what appears to be two completely separate HTML parsing C libraries as dependencies; seeing this in the readme for Modest gives me the shivers because lxml has _seen some shit_
  > Modest is a fast HTML renderer implemented as a pure C99 library with no outside dependencies.
  although its other dep seems much more cognizant about the HTML5 standard, for whatever that's worth: https://github.com/lexbor/lexbor#lexbor
  ---
  > It looks like the author of the article just googled some libraries for each language and didn't research the topic
  Heh, oh, new to the Internet, are you? :-D
- Show HN: Fast HTML5 parser for Python with multiple backends
What are some alternatives?
clojure-graph-resources - A curated list of Clojure resources for dealing with graph-like data.
lxml - The lxml XML toolkit for Python
zed - A novel data lake based on super-structured data
lexbor - Lexbor is development of an open source HTML Renderer library. https://lexbor.com
EasierRDF - Making RDF easy enough for most developers
html5lib - Standards-compliant library for parsing and serializing HTML documents and fragments in Python
lv2 - The LV2 audio plugin specification
pyppeteer - Headless chrome/chromium automation library (unofficial port of puppeteer)
trifid - Lightweight Linked Data Server and Proxy
pyquery - A jquery-like library for python
awesome-knowledge-management - A curated list of amazingly awesome articles, people, applications, software libraries and projects related to the knowledge management space
gazpacho - 🥫 The simple, fast, and modern web scraping library