sqlite-utils VS simdjson

Compare sqlite-utils vs simdjson and see how they differ.

simdjson

Parsing gigabytes of JSON per second: used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks (by simdjson)
                sqlite-utils          simdjson
Mentions        35                    65
Stars           1,510                 18,386
Growth          -                     1.3%
Activity        8.1                   9.2
Latest commit   20 days ago           3 days ago
Language        Python                C++
License         Apache License 2.0    Apache License 2.0

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

sqlite-utils

Posts with mentions or reviews of sqlite-utils. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-04-19.
  • Ask HN: High quality Python scripts or small libraries to learn from
    12 projects | news.ycombinator.com | 19 Apr 2024
    https://github.com/simonw/sqlite-utils

    So, his code might not be a good place to find best patterns (for ex, I don't think they are fully typed), but his repos are very pragmatic, and his development process is super insightful (well documented PRs for personal repos!). Best part, he blogs about every non-trivial update, so you get all the context!

  • Why you should probably be using SQLite
    8 projects | news.ycombinator.com | 27 Oct 2023
    Sounds like your problem is with SQLAlchemy, not with SQLite.

    My https://sqlite-utils.datasette.io library might be a better fit for you. It's a much thinner abstraction than SQLAlchemy.

  • Welcome to Datasette Cloud
    6 projects | news.ycombinator.com | 20 Aug 2023
    There are a few things you can do here.

    SQLite is great at JSON - so I often dump JSON structures in a TEXT column and query them using https://www.sqlite.org/json1.html

    I also have plugins for running jq() functions directly in SQL queries - https://datasette.io/plugins/datasette-jq and https://github.com/simonw/sqlite-utils-jq

    I've been trying to drive the cost of turning semi-structured data into structured SQL queries down as much as possible with https://sqlite-utils.datasette.io - see this tutorial for more: https://datasette.io/tutorials/clean-data

    This is also an area that I'm starting to explore with LLMs. I love the idea that you could take a bunch of messy data, tell Datasette Cloud "I want this imported into a table with this schema"... and it does that.

    I have a prototype of this working now, I hope to turn it into an open source plugin (and Datasette Cloud feature) pretty soon. It's using this trick: https://til.simonwillison.net/gpt3/openai-python-functions-d...
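
The "JSON in a TEXT column" approach from the post above can be sketched with Python's standard `sqlite3` module. This is a minimal illustration, assuming the linked SQLite build includes the JSON functions (most recent builds do). The `events` table, the `payload` column, the sample records, and the `high_scorers` helper are all hypothetical names chosen for the example.

```python
import json
import sqlite3

# In-memory database with one TEXT column holding raw JSON documents.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")

# Hypothetical sample records, stored as serialized JSON strings.
records = [
    {"user": "alice", "tags": ["a", "b"], "meta": {"score": 3}},
    {"user": "bob", "tags": [], "meta": {"score": 7}},
]
conn.executemany(
    "INSERT INTO events (payload) VALUES (?)",
    [(json.dumps(record),) for record in records],
)


def high_scorers(connection, threshold):
    """Return (user, score) pairs whose nested meta.score exceeds threshold."""
    query = (
        "SELECT json_extract(payload, '$.user'), "
        "       json_extract(payload, '$.meta.score') "
        "FROM events "
        "WHERE json_extract(payload, '$.meta.score') > ? "
        "ORDER BY id"
    )
    return connection.execute(query, (threshold,)).fetchall()


rows = high_scorers(conn, 5)
```

`json_extract` takes a path such as `$.meta.score`, so nested fields can be filtered without first flattening the document into columns.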

  • SQLite Functions for Working with JSON
    10 projects | news.ycombinator.com | 10 Aug 2023
    I've baked a ton of different SQLite tricks - including things like full-text indexing support and advanced alter table methods - into my sqlite-utils CLI tool and Python library: https://sqlite-utils.datasette.io

    My Datasette project provides tools for exploring, analyzing and publishing SQLite databases, plus ways to expose them via a JSON API: https://datasette.io

    I've also written a ton of stuff about SQLite on my two blogs:

    - https://simonwillison.net/tags/sqlite/

    - https://til.simonwillison.net/sqlite

  • Show HN: Trogon – An automatic TUI for command line apps
    11 projects | news.ycombinator.com | 21 May 2023
    This is really fun. I have an experimental branch of my sqlite-utils CLI tool (which has dozens of sub-commands) running with this now and it really did only take 4 lines of code - I'm treating Trogon as an optional dependency because people using my package as a Python library rather than a CLI tool may not want the extra installed components:

    https://github.com/simonw/sqlite-utils/commit/ec12b780d5dcd6...

    There's an animated GIF demo of the result here: https://github.com/simonw/sqlite-utils/issues/545#issuecomme...

  • I'm sure I'm being stupid.. Copying data from an API and making a database
    2 projects | /r/Database | 19 Jan 2023
    My project https://datasette.io/ is ideal for this kind of thing. You can use https://sqlite-utils.datasette.io/ to load JSON data into a SQLite database, then publish it with Datasette.
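
To show roughly what "load JSON data into a SQLite database" involves, here is a standard-library-only sketch. It is not the sqlite-utils API, which automates this step; it only illustrates the idea of deriving columns from the JSON keys. The `load_json_rows` helper, the `people` table, and the sample data are hypothetical, and the sketch assumes the keys come from a trusted source because they are used as column names.

```python
import json
import sqlite3


def load_json_rows(conn, table, rows):
    """Create `table` with one column per JSON key and insert every row."""
    # Collect column names in first-seen order across all objects.
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)

    quoted = ", ".join(f'"{name}"' for name in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({quoted})')

    def to_sql_value(value):
        # Nested lists/objects are stored as JSON text; scalars pass through.
        return json.dumps(value) if isinstance(value, (dict, list)) else value

    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f'INSERT INTO "{table}" ({quoted}) VALUES ({placeholders})',
        [tuple(to_sql_value(row.get(name)) for name in columns) for row in rows],
    )
    conn.commit()
    return columns


# Hypothetical API response: a JSON array of objects with uneven keys.
api_response = json.loads(
    '[{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace", "langs": ["COBOL"]}]'
)
conn = sqlite3.connect(":memory:")
columns = load_json_rows(conn, "people", api_response)
stored = conn.execute('SELECT * FROM "people" ORDER BY id').fetchall()
```

Keys missing from an object become NULL in that row, which is why the first record has no value in the `langs` column.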
  • Just: A Command Runner
    27 projects | news.ycombinator.com | 9 Jan 2023
    I've been using this for about six months now and I absolutely love it.

    Make never stuck for me - I couldn't quite get it to fit inside my head.

    Just has the exact set of features I want.

    Here's one example of one of my Justfiles: https://github.com/simonw/sqlite-utils/blob/fc221f9b62ed8624... - documented here: https://sqlite-utils.datasette.io/en/stable/contributing.htm...

    I also wrote about using Just with Django in this TIL: https://til.simonwillison.net/django/just-with-django

  • Ask HN: What Do You Use for a Personal Database
    4 projects | news.ycombinator.com | 16 Nov 2022
    SQLite with the open source toolchain I've been building over the past five years:

    https://datasette.io as the interface for running queries against (and visualizing) my data.

    https://sqlite-utils.datasette.io/ as a set of tools for creating and modifying my databases (inserting JSON or CSV data, enabling full-text search)

    https://dogsheep.github.io as a suite of tools for importing my personal data - see also this talk I gave about that project: https://simonwillison.net/2020/Nov/14/personal-data-warehous...

  • The Perfect Commit
    1 project | /r/programming | 30 Oct 2022
    Here's an example: https://github.com/simonw/sqlite-utils/pull/468
    3 projects | news.ycombinator.com | 29 Oct 2022
    > After identifying about 7 commits (with pretty basic/useless messages, and no PR link!), I then had to find the corresponding PRs based on timestamps, and search the PR history for PRs merged around those timestamps.

    Not sure if this would save any time, but it is possible to search PRs by commit. For example, say git blame led me to this commit: https://github.com/simonw/sqlite-utils/commit/129141572f249e...

    I could have found PR #373 via this search: https://github.com/simonw/sqlite-utils/pulls?q=bb16f52681b6d...

    > I thus treat PRs as ephemeral

    I think I see what you're saying but as others have pointed out, sometimes you want to add screenshots etc to the context, and you can't capture this kind of info in commit messages. So then you have two choices: issues or PRs.

    > Then any review comments are preferably not addressed directly in the PR

    I would think that sometimes you really do want to have a back and forth conversation in the PR, rather than just a "make this change" -> "ok done" type of feedback loop.

    I view the PR as a decent place for all of this because it's basically a commit of commits, capturing the related changes/conversation/context all in a single place at the point of merge.

simdjson

Posts with mentions or reviews of simdjson. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-04-20.
  • Tips on adding JSON output to your command line utility. (2021)
    2 projects | news.ycombinator.com | 20 Apr 2024
    It's also supported by simdjson [0] (which has a lot of language bindings [1]):

    > Multithreaded processing of gigantic Newline-Delimited JSON (ndjson) and related formats at 3.5 GB/s

    [0] https://simdjson.org/

    [1] https://github.com/simdjson/simdjson?tab=readme-ov-file#bind...
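
For readers unfamiliar with the newline-delimited JSON (ndjson) format mentioned in the post above, the following sketch shows its structure: one complete JSON document per line. It uses Python's standard `json` module rather than simdjson, so it says nothing about simdjson's API or its throughput. The `iter_ndjson` helper and the sample input are hypothetical.

```python
import io
import json


def iter_ndjson(stream):
    """Yield one parsed object per non-empty line of an ndjson stream."""
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            # Report which line was malformed instead of failing silently.
            raise ValueError(f"line {line_number}: invalid JSON ({exc.msg})") from exc


# Hypothetical input: two records separated by a blank line.
sample = io.StringIO('{"id": 1, "ok": true}\n\n{"id": 2, "ok": false}\n')
records = list(iter_ndjson(sample))
```

Because each line stands alone, a command line utility can emit records as it produces them and a consumer can process them without loading the whole output first.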

  • 1BRC Merykitty's Magic SWAR: 8 Lines of Code Explained in 3k Words
    4 projects | news.ycombinator.com | 9 Mar 2024
  • Training great LLMs from ground zero in the wilderness as a startup
    3 projects | news.ycombinator.com | 6 Mar 2024
  • simdjson: Parsing Gigabytes of JSON per Second
    1 project | news.ycombinator.com | 23 Jan 2024
  • Use any web browser as GUI, with Zig in the back end and HTML5 in the front end
    17 projects | news.ycombinator.com | 1 Jan 2024
    String parsing is negligible compared to the speed of the DOM which is glacially slow: https://news.ycombinator.com/item?id=38835920

    Come on, people, make an effort to learn how insanely fast computers are, and how insanely inefficient our software is.

    String parsing can be done at gigabytes per second: https://github.com/simdjson/simdjson If you think that is the slowest operation in the browser, please find some resources that talk about what is actually happening in the browser?

  • Cray-1 performance vs. modern CPUs
    4 projects | news.ycombinator.com | 25 Dec 2023
    Thanks for all the detailed information! That answers a bunch of my questions and the implementation of strlen is nice.

    The instruction I was thinking of is pshufb. An example ‘weird’ use can be found for detecting white space in simdjson: https://github.com/simdjson/simdjson/blob/24b44309fb52c3e2c5...

    This works as follows:

    1. Observe that each ascii whitespace character ends with a different nibble.

    2. Make some vector of 16 bytes which has the white space character whose final nibble is the index of the byte, or some other character with a different final nibble from the byte (eg first element is space =0x20, next could be eg 0xff but not 0xf1 as that ends in the same nibble as index)

    3. For each block where you want to find white space, compute pcmpeqb(pshufb(whitespace, input), input). The rules of pshufb mean (a) non-ascii (ie bit 7 set) characters go to 0 so will compare false, (b) other characters are replaced with an element of whitespace according to their last nibble so will compare equal only if they are that whitespace character.

    I’m not sure how easy it would be to do such tricks with vgather.vv. In particular, the length of the input doesn’t matter (could be longer) but the length of white space must be 16 bytes. I’m not sure how the whole vlen stuff interacts with tricks like this where you (a) require certain fixed lengths and (b) may have different lengths for tables and input vectors. (and indeed there might just be better ways, eg you could imagine an operation with a 256-bit register where you permute some vector of bytes by sign-extending the nth bit of the 256-bit register into the result where the input byte is n).
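
The three steps in the post above can be emulated byte by byte in Python to check the logic. This is a sketch, not simdjson's code: `pshufb` and `pcmpeqb` here are plain Python stand-ins for the SSE instructions, the filler values in the table are one valid choice among many, and all function names are hypothetical. The whitespace set is assumed to be space, tab, line feed, and carriage return.

```python
WHITESPACE = b" \t\n\r"  # 0x20, 0x09, 0x0A, 0x0D: four distinct low nibbles


def build_table():
    """Step 2: 16-byte table indexed by low nibble."""
    # Filler bytes must NOT end in the nibble of their own index, otherwise
    # a non-whitespace input byte could compare equal to its table entry.
    table = [0xFF] * 16
    table[15] = 0xF0  # 0xFF ends in nibble 0xF, so index 15 needs another filler
    for ch in WHITESPACE:
        table[ch & 0x0F] = ch
    return bytes(table)


def pshufb(table, indices):
    """Emulate PSHUFB: bit 7 set gives 0, else table[low nibble of index]."""
    return bytes(0 if i & 0x80 else table[i & 0x0F] for i in indices)


def pcmpeqb(a, b):
    """Emulate PCMPEQB: 0xFF where bytes are equal, 0x00 elsewhere."""
    return bytes(0xFF if x == y else 0x00 for x, y in zip(a, b))


TABLE = build_table()


def whitespace_positions(block):
    """Step 3: positions where pcmpeqb(pshufb(table, input), input) is set."""
    mask = pcmpeqb(pshufb(TABLE, block), block)
    return [i for i, flag in enumerate(mask) if flag]


positions = whitespace_positions(b'{ "a":\t1,\r\n"b" }')
```

Running `whitespace_positions(bytes(range(256)))` is a quick way to confirm that only the four whitespace bytes are flagged, including for inputs with bit 7 set.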

  • Codebases to read
    5 projects | /r/cpp | 5 Dec 2023
    Additionally, if you like low level stuff, check out libfmt (https://github.com/fmtlib/fmt) - not a big project, not difficult to understand. Or something like simdjson (https://github.com/simdjson/simdjson).
  • Simdjson: Parsing Gigabytes of JSON per Second
    1 project | news.ycombinator.com | 30 Nov 2023
  • Building a high performance JSON parser
    19 projects | news.ycombinator.com | 5 Nov 2023
    Everything you said is totally reasonable. I'm a big fan of napkin math and theoretical upper bounds on performance.

    simdjson (https://github.com/simdjson/simdjson) claims to fully parse JSON on the order of 3 GB/sec. Which is faster than OP's Go whitespace parsing! These tests are running on different hardware so it's not apples-to-apples.

    The phrase "cannot go faster than this" is just begging for a "well ackshully". Which I hate to do. But the fact that there is an existence proof of Problem A running faster in C++ SIMD than OP's Problem B in scalar Go is quite interesting and worth calling out imho. But I admit it doesn't change the rest of the post.

  • New package : lspce - a simple LSP Client for Emacs
    4 projects | /r/emacs | 30 Jun 2023
    I have same question as /u/JDRiverRun : how do you deal with JSON, do you parse json on Rust side or on Emacs side. I see that you are requiring json.el in your lspce.el, but I haven't looked through entire file carefully. If you parse on Rust side, do you use simdjson (there are at least two Rust bindings to it)? If yes, what are your impressions, experiences compared to more "standard" json library?

What are some alternatives?

When comparing sqlite-utils and simdjson you can also consider the following projects:

sqlmodel - SQL databases in Python, designed for simplicity, compatibility, and robustness.

RapidJSON - A fast JSON parser/generator for C++ with both SAX/DOM style API

sqliteviz - Instant offline SQL-powered data visualisation in your browser

jsoniter - jsoniter (json-iterator) is fast and flexible JSON parser available in Java and Go

ImportExcel - PowerShell module to import/export Excel spreadsheets, without Excel

json - JSON for Modern C++

octosql - OctoSQL is a query tool that allows you to join, analyse and transform data from multiple databases and file formats using SQL.

json-schema-validator - JSON schema validator for JSON for Modern C++

q - q - Run SQL directly on delimited files and multi-file sqlite databases

JsonCpp - A C++ library for interacting with JSON.

Scoop - A command-line installer for Windows.

json - A C++11 library for parsing and serializing JSON to and from a DOM container in memory.