selectolax vs datasette-lite

Project	selectolax	datasette-lite
Mentions	6	10
Stars	970	309
Growth	-	-
Activity	7.7	5.4
Latest Commit	about 2 months ago	about 1 month ago
Language	Cython	HTML
License	MIT License	Apache License 2.0

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

selectolax

Posts with mentions or reviews of selectolax. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-09-09.

GitHub – GSA/code-gov: An informative repo for all Code.gov repos
12 projects | news.ycombinator.com | 9 Sep 2023

https://github.com/rushter/selectolax#simple-benchmark )
(Apache Nutch is a Java-based web crawler which supports e.g. CommonCrawl (which backs various foundational LLMs)) https://en.wikipedia.org/wiki/Apache_Nutch#Search_engines_bu... . But extruct extracts more types of metadata and data than Nutch AFAIU: https://github.com/scrapinghub/extruct )
datasette-graphql adds a GraphQL HTTP API to a SQLite database:
8 Most Popular Python HTML Web Scraping Packages with Benchmarks
4 projects | dev.to | 1 Feb 2023

selectolax
High performance code in Python
1 project | /r/Python | 8 Jul 2022
Web Scraping with Python: Everything you need to know to get started (2022)
1 project | /r/Python | 16 May 2022

try this... https://github.com/rushter/selectolax
The State of Web Scraping in 2021
9 projects | news.ycombinator.com | 11 Oct 2021

Lazyweb link: https://github.com/rushter/selectolax
although I don't follow the need to have what appears to be two completely separate HTML parsing C libraries as dependencies; seeing this in the readme for Modest gives me the shivers because lxml has _seen some shit_
> Modest is a fast HTML renderer implemented as a pure C99 library with no outside dependencies.
although its other dep seems much more cognizant about the HTML5 standard, for whatever that's worth: https://github.com/lexbor/lexbor#lexbor
---
> It looks like the author of the article just googled some libraries for each language and didn't research the topic
Heh, oh, new to the Internet, are you? :-D
Show HN: Fast HTML5 parser for Python with multiple backends
1 project | news.ycombinator.com | 22 Aug 2021

datasette-lite

Posts with mentions or reviews of datasette-lite. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-04-09.

Sqlime: Online SQLite Playground
5 projects | news.ycombinator.com | 9 Apr 2024

Also see: https://github.com/simonw/datasette-lite
Use SQL Without Databases
2 projects | news.ycombinator.com | 27 Dec 2023
GitHub – GSA/code-gov: An informative repo for all Code.gov repos
12 projects | news.ycombinator.com | 9 Sep 2023

https://github.com/simonw/datasette-lite :
> You can use this tool to open any SQLite database file that is hosted online and served with a `access-control-allow-origin: *` CORS header. Files served by GitHub Pages automatically include this header, as do database files that have been published online using `datasette publish`.
> [...] You can paste in the "raw" URL to a file, but Datasette Lite also has a shortcut: if you paste in the URL to a page on GitHub or a Gist it will automatically convert it to the "raw" URL for you
> To load a Parquet file, pass a URL to `?parquet=`
> [...] https://lite.datasette.io/?parquet=https://github.com/Terada...
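The links quoted above can be built programmatically. The following is a minimal sketch, not part of Datasette Lite itself: the helper name `datasette_lite_url` is hypothetical, the `?parquet=` parameter comes from the quote, and the `?url=` parameter for SQLite files is an assumption based on the `?url=` links shared elsewhere in these threads.

```python
from urllib.parse import urlencode

# Hosted Datasette Lite instance, as it appears in the quoted link.
BASE = "https://lite.datasette.io/"


def datasette_lite_url(file_url: str, kind: str = "url") -> str:
    """Return a Datasette Lite link that loads a remote file.

    kind="url" targets a SQLite database (assumed parameter name),
    kind="parquet" targets a Parquet file (parameter named in the quote).
    """
    if kind not in {"url", "parquet"}:
        raise ValueError(f"unsupported kind: {kind!r}")
    # urlencode percent-encodes the nested URL so its own "?" and "&"
    # are not mistaken for parameters of the outer URL.
    return BASE + "?" + urlencode({kind: file_url})


# example.com is a placeholder host, not a real dataset.
print(datasette_lite_url("https://example.com/data.db"))
print(datasette_lite_url("https://example.com/data.parquet", kind="parquet"))
```

The remote file still has to be served with the CORS header described in the quote, otherwise the browser will refuse to fetch it.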
There are various *-to-sqlite utilities that load data into a SQLite database for use with e.g. datasette. E.g. Pandas with `dtype_backend='arrow'` saves to Parquet.
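As a rough illustration of what a *-to-sqlite utility does, here is a standard-library sketch that loads CSV text into a SQLite table. It is not the code of any of those utilities; the function name `csv_to_sqlite` and the table name `projects` are made up for the example, and the sample rows reuse the star counts from the comparison table on this page.

```python
import csv
import io
import sqlite3


def quote_ident(name: str) -> str:
    """Quote an SQL identifier so names with spaces or quotes are accepted."""
    return '"' + name.replace('"', '""') + '"'


def csv_to_sqlite(csv_text: str, conn: sqlite3.Connection, table: str) -> int:
    """Load CSV text with a header row into a new table; return rows inserted."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    cols = ", ".join(quote_ident(c) for c in header)
    placeholders = ", ".join("?" for _ in header)
    # Columns are created without types, so values are stored as text.
    conn.execute(f"CREATE TABLE {quote_ident(table)} ({cols})")
    rows = list(reader)
    conn.executemany(
        f"INSERT INTO {quote_ident(table)} VALUES ({placeholders})", rows
    )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
n = csv_to_sqlite(
    "name,stars\nselectolax,970\ndatasette-lite,309\n", conn, "projects"
)
print(n, conn.execute("SELECT name FROM projects ORDER BY name").fetchall())
```

Writing to a file path instead of `:memory:` would produce a database file that could then be published and opened with Datasette.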
datasette plugins are written in Python and/or JS w/ pluggy:
[SQLlite] Is there any online SQL editor I can host on my website? Maybe something in JS or php
4 projects | /r/SQL | 20 Dec 2022

Datasette Lite might be even better for this - you can construct URLs that link directly to examples: https://github.com/simonw/datasette-lite
SQLite WASM Official
9 projects | news.ycombinator.com | 28 Oct 2022

There are some amazing things for SQLite in the browser especially if you're looking for ways to host queryable data for cheap.
I have a hacked up POC experimental version of datasette-lite to be able to look at multi-GB databases at https://github.com/simonw/datasette-lite/pull/49. It uses a hacked up chunked lazyFile implementation from emscripten and others to grab pages from Cloudflare R2.
It's a test with California's unclaimed property records (https://www.sco.ca.gov/upd_download_property_records.html), a 28GB database, searching up that guy who owns Twitter: https://datasette-lite-lab.mindflakes.com/index.html?url=htt...
I think there may be a space for super-large multi-GB files served from static storage being accessible from SQLite as well. Another one would be this 43GB SQLite database with full-text search of Wikipedia: http://static.wiki/ . Hearing there's official support for this is awesome and I hope they also might add some provisions for those sticking with POSIX/Emscripten as well.
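To make the idea of grabbing individual pages from static storage concrete, here is a small sketch of how a SQLite page number maps to an HTTP `Range` header. It is not the code from the PR above and assumes the lazy loading is done with byte-range requests; the helper names are hypothetical. The only SQLite facts it relies on are the 16-byte magic string and the big-endian page size stored at byte offset 16 of the file header.

```python
import os
import sqlite3
import struct
import tempfile


def page_size_from_header(header: bytes) -> int:
    """Read the page size from the first bytes of a SQLite database file."""
    if header[:16] != b"SQLite format 3\x00":
        raise ValueError("not a SQLite 3 database")
    (size,) = struct.unpack(">H", header[16:18])
    # The stored value 1 is the special encoding for 65536-byte pages.
    return 65536 if size == 1 else size


def range_for_page(page_no: int, page_size: int) -> str:
    """HTTP Range header value covering one 1-indexed database page."""
    start = (page_no - 1) * page_size
    return f"bytes={start}-{start + page_size - 1}"


# Build a small throwaway database so there is a real header to read.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE t (x)")
conn.commit()
conn.close()

with open(path, "rb") as f:
    size = page_size_from_header(f.read(100))

print(size, range_for_page(1, size), range_for_page(3, size))
```

A query that touches only a few index and table pages needs only a few such requests, which is what makes multi-GB files on static hosting workable.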
Hosting SQLite Databases on GitHub Pages
2 projects | news.ycombinator.com | 12 Oct 2022

I grafted the enhanced lazyFile implementation of this to datasette-lite relatively recently. Threw in an 18GB CSV from
https://www.sco.ca.gov/upd_download_property_records.html
into an FTS5 SQLite database which came out to about 28GB after processing:
POC, non-merging Draft PR for the hack:
https://github.com/simonw/datasette-lite/pull/49
You can run queries through it if you URL hack into it and just get to the query dialog; browsing is kind of a dud at the moment since datasette runs a count(*) which downloads everything.
Learn Postgres at the Playground
9 projects | news.ycombinator.com | 17 Aug 2022
A SQLite extension for reading large files line-by-line
8 projects | news.ycombinator.com | 30 Jul 2022

Oh wow! I wonder how hard it would be to load that module into https://github.com/simonw/datasette-lite
This Week in Python
5 projects | dev.to | 6 May 2022

datasette-lite – Datasette running in your browser using WebAssembly and Pyodide
Datasette Lite: a server-side Python web application running in a browser
5 projects | news.ycombinator.com | 4 May 2022

I have an open issue for that here: https://github.com/simonw/datasette-lite/issues/28
My initial hunch is that this will be really difficult - probably require a fork of something like https://github.com/coleifer/pysqlite3 then compiled for WebAssembly.
I'm confident it's feasible, but I don't have the skills to figure it out myself.

What are some alternatives?

When comparing selectolax and datasette-lite you can also consider the following projects:

lxml - The lxml XML toolkit for Python

pyscript - Try PyScript: https://pyscript.com Examples: https://tinyurl.com/pyscript-examples Community: https://discord.gg/HxvBtukrg2

lexbor - Lexbor is development of an open source HTML Renderer library. https://lexbor.com

sqlite-plus - The ultimate set of SQLite extensions

html5lib - Standards-compliant library for parsing and serializing HTML documents and fragments in Python

file-system-access - Expose the file system on the user’s device, so Web apps can interoperate with the user’s native applications.

pyppeteer - Headless chrome/chromium automation library (unofficial port of puppeteer)

datastation - App to easily query, script, and visualize data from every database, file, and API.

pyquery - A jquery-like library for python

pyodide - Pyodide is a Python distribution for the browser and Node.js based on WebAssembly

gazpacho - 🥫 The simple, fast, and modern web scraping library

mergestat-lite - Query git repositories with SQL. Generate reports, perform status checks, analyze codebases. 🔍 📊
