datasette
duckdb
| | datasette | duckdb |
|---|---|---|
| Mentions | 187 | 52 |
| Stars | 8,934 | 16,576 |
| Growth | - | 10.7% |
| Activity | 9.3 | 10.0 |
| Last commit | 1 day ago | 5 days ago |
| Language | Python | C++ |
| License | Apache License 2.0 | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
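As a rough illustration of the Growth figure, the sketch below assumes it is the relative change in star count between two monthly snapshots. The page does not state the exact formula, and the function name and snapshot values are hypothetical.

```python
def month_over_month_growth(previous_stars: int, current_stars: int) -> float:
    """Relative change in stars between two monthly snapshots, as a percentage.

    Assumed definition: the page does not spell out how Growth is computed.
    """
    if previous_stars <= 0:
        raise ValueError("previous_stars must be positive")
    return (current_stars - previous_stars) / previous_stars * 100


# Hypothetical snapshot values, chosen only for illustration.
growth = month_over_month_growth(1000, 1107)
print(f"{growth:.1f}%")  # prints "10.7%"
```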
datasette
- Ask HN: High quality Python scripts or small libraries to learn from
Simon Willison's GitHub would be a great place to get started imo: https://github.com/simonw/datasette
- Show HN: TextQuery – Query and Visualize Your CSV Data in Minutes
- Little Data: How do we query personal data? (2013)
I'm a fan of simonw's datasette/dogsheep ecosystem: https://datasette.io/
- LaTeX and Neovim for technical note-taking
I use Anki the exact same way. After a lifetime of learning I have accepted that I will never read over anything I write for myself voluntarily - so my two options are:
1. Write an article so good I can publish it and look it over myself later on. I did this last year with https://andrew-quinn.me/fzf/, for example.
2. Create Anki cards out of the material. Use the builtin Card Browser or even https://datasette.io/ on the underlying SQLite database in a pinch to search for my notes any time I have to.
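The kind of search described in option 2 comes down to a plain SQL query against the SQLite file. Below is a minimal sketch using Python's standard `sqlite3` module; the `notes` table, its columns, and the sample rows are hypothetical stand-ins and do not reflect Anki's real schema.

```python
import sqlite3

# Hypothetical, simplified notes database held in memory.
# A real Anki collection uses a different, richer schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, front TEXT, back TEXT)")
conn.executemany(
    "INSERT INTO notes (front, back) VALUES (?, ?)",
    [
        ("fzf", "Command-line fuzzy finder"),
        ("SQLite", "Embedded SQL database engine"),
        ("Datasette", "Tool for exploring and publishing SQLite data"),
    ],
)


def search_notes(conn, term):
    """Return (front, back) rows whose front or back contains the term."""
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT front, back FROM notes "
        "WHERE front LIKE ? OR back LIKE ? ORDER BY id",
        (pattern, pattern),
    )
    return rows.fetchall()


# SQLite's LIKE is case-insensitive for ASCII text by default.
results = search_notes(conn, "sqlite")
for front, back in results:
    print(f"{front}: {back}")
```

Datasette offers the same kind of query through a web interface: running `datasette path/to/file.db` serves an SQLite file for browsing and ad hoc SQL.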
- Daily Price Tracking for Trader Joes
Were you aware of, or tempted by https://datasette.io/ for creating your solution?
- SQLite-Web: Web-based SQLite database browser written in Python
- Ask HN: What two software products should have a kid?
Browsing HN, GitHub and the like we get to see a huge variety of software products and code bases.
I often see products and think: if product X got together with Y, it would be pretty cool, kind of like if they had a kid together.
Not too literally, but more on the conceptual level - my level of programming is low.
E.g. Just some....
- pocketbase.io & datasette (+ with some more charting) [https://pocketbase.io, https://datasette.io]
- Ask HN: Looking for a project to volunteer on? (February 2024)
You might like the Datasette project: https://datasette.io/
I don't think they are desperate for contributions but it's a welcoming environment and a fun project to hack on. You'll learn a lot just from reading the source and the incredibly informative PRs. The creator is a really talented developer with a great blog which shows up on the HN front page often.
- Stuff I Learned during Hanukkah of Data 2023
Last year I worked through the challenges using VisiData, Datasette, and Pandas. I walked through my thought process and solutions in a series of posts.
- What We Watched: A Netflix Engagement Report – About Netflix
> uploads of boring raw excel data and receive a nice UI
https://datasette.io/
duckdb
- 🪄 DuckDB sql hack : get things SORTED w/ constraint CHECK
- DuckDB: Move to push-based execution model (2021)
- DuckDB performance improvements with the latest release
I'm not sure if the fix is reassuring or not: https://github.com/duckdb/duckdb/pull/9411/files
- Building a Distributed Data Warehouse Without Data Lakes
It's an interesting question!
The problem is that the data is spread everywhere - no choice about that. So with that in mind, how do you query that data? Today, the idea is that you HAVE to put it into a central location. With tools like Bacalhau[1] and DuckDB [2], you no longer have to - a single query can be sharded amongst all your data - EFFECTIVELY giving you a lot of what you want from a data lake.
It's not a replacement, but if you can do a few of these items WITHOUT moving the data, you will be able to see really significant cost and time savings.
[1] https://github.com/bacalhau-project/bacalhau
[2] https://github.com/duckdb/duckdb
- DuckDB 0.9.0
- Push or Pull, is this a question?
[4] Switch to Push-Based Execution Model by Mytherin · Pull Request #2393 · duckdb/duckdb (github.com)
- Show HN: Hydra 1.0 – open-source column-oriented Postgres
It depends on your query, obviously.
In general, I did very deep benchmarking of pg, clickhouse and duckdb, and I sure didn't make stupid mistakes like this: https://news.ycombinator.com/item?id=36990831
My dataset has 50B rows and 2 TB of data. I think columnar dbs are very overhyped, and I chose pg because:
- pg performance is acceptable, maybe 2-3x slower than clickhouse and duckdb on some queries, if pg is configured correctly and run on compressed storage
- clickhouse and duckdb start falling apart very fast because they specialize in a very narrow type of query: https://github.com/ClickHouse/ClickHouse/issues/47520 https://github.com/ClickHouse/ClickHouse/issues/47521 https://github.com/duckdb/duckdb/discussions/6696
- 🦆 Effortless Data Quality w/duckdb on GitHub ♾️
This action installs the duckdb version given as input.
- Using SQL inside Python pipelines with Duckdb, Glaredb (and others?)
Duckdb: https://github.com/duckdb/duckdb - seems pretty popular, been keeping an eye on this for close to a year now.
- CSV or Parquet File Format
The Parquet-Go library is very complex, and I have not yet managed to use it successfully, so I asked whether DuckDB can provide an API: https://github.com/duckdb/duckdb/issues/7776
What are some alternatives?
nocodb - 🔥 🔥 🔥 Open Source Airtable Alternative
ClickHouse - ClickHouse® is a free analytics DBMS for big data
sql.js-httpvfs - Hosting read-only SQLite databases on static file hosters like Github Pages
sqlite-worker - A simple, and persistent, SQLite database for Web and Workers.
litestream - Streaming replication for SQLite.
octosql - OctoSQL is a query tool that allows you to join, analyse and transform data from multiple databases and file formats using SQL.
Sequel-Ace - MySQL/MariaDB database management for macOS
metabase-clickhouse-driver - ClickHouse database driver for the Metabase business intelligence front-end
beekeeper-studio - Modern and easy to use SQL client for MySQL, Postgres, SQLite, SQL Server, and more. Linux, MacOS, and Windows.
datafusion - Apache DataFusion SQL Query Engine
Redash - Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
LevelDB - LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.