Selectolax Alternatives
Similar projects and alternatives to selectolax
-
Playwright
Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API.
-
html5lib
Standards-compliant library for parsing and serializing HTML documents and fragments in Python
-
utls
Fork of the Go standard TLS library, providing low-level access to the ClientHello for mimicry purposes.
-
bleach
Bleach is an allowed-list-based HTML sanitizing library that escapes or strips markup and attributes
selectolax reviews and mentions
-
GitHub – GSA/code-gov: An informative repo for all Code.gov repos
https://github.com/rushter/selectolax#simple-benchmark
Apache Nutch is a Java-based web crawler which supports e.g. CommonCrawl (which backs various foundational LLMs): https://en.wikipedia.org/wiki/Apache_Nutch#Search_engines_bu... But extruct extracts more types of metadata and data than Nutch AFAIU: https://github.com/scrapinghub/extruct
datasette-graphql adds a GraphQL HTTP API to a SQLite database:
-
8 Most Popular Python HTML Web Scraping Packages with Benchmarks
selectolax
- High performance code in Python
-
Web Scraping with Python: Everything you need to know to get started (2022)
try this... https://github.com/rushter/selectolax
-
The State of Web Scraping in 2021
Lazyweb link: https://github.com/rushter/selectolax
although I don't follow the need to have what appears to be two completely separate HTML parsing C libraries as dependencies; seeing this in the readme for Modest gives me the shivers because lxml has _seen some shit_
> Modest is a fast HTML renderer implemented as a pure C99 library with no outside dependencies.
although its other dep seems much more cognizant about the HTML5 standard, for whatever that's worth: https://github.com/lexbor/lexbor#lexbor
---
> It looks like the author of the article just googled some libraries for each language and didn't research the topic
Heh, oh, new to the Internet, are you? :-D
- Show HN: Fast HTML5 parser for Python with multiple backends
Stats
rushter/selectolax is an open source project licensed under the MIT License, which is an OSI-approved license.
The primary programming language of selectolax is Cython.