Run SQL on CSV, Parquet, JSON, Arrow, Unix Pipes and Google Sheet

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • roapi

    Create full-fledged APIs for slowly moving datasets without writing a single line of code.

  • py-spy

    Sampling profiler for Python programs

  • It is pretty cool. py-spy has also been doing this for a few years.

    https://github.com/benfred/py-spy

  • blaze

    Blazing-fast query execution engine speaks Apache Spark language and has Arrow-DataFusion at its core. (by kwai)

  • DataFusion outperforms Spark by a large margin. It is on par with Photon; see the benchmark at https://github.com/blaze-init/blaze.

    Oh, okay... https://lnav.org is a log file viewer for the terminal that integrates with SQLite so you can use SQL to query your log files.

  • dsq

    Commandline tool for running SQL queries against JSON, CSV, Excel, Parquet, and more.

  • I am currently evaluating dsq and its partner desktop app DataStation. AIUI, the developer of DataStation realised that it would be useful to extract the underlying pieces into a standalone CLI, so they both support the same range of sources.

    dsq CLI - https://github.com/multiprocessio/dsq

  • dasel

    Select, put and delete data from JSON, TOML, YAML, XML and CSV files with a single tool. Supports conversion between formats and can be used as a Go package.

  • octosql

    OctoSQL is a query tool that allows you to join, analyse and transform data from multiple databases and file formats using SQL.

  • xsv

    A fast CSV command line toolkit written in Rust.

  • zsv

    zsv+lib: world's fastest (simd) CSV parser, bare metal or wasm, with an extensible CLI for SQL querying, format conversion and more
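
For readers new to the idea these tools share, here is a minimal sketch of running SQL against CSV data. It uses only Python's standard `csv` and `sqlite3` modules and is not how any of the projects above is implemented. The helper name `query_csv`, the table name `data`, and the sample rows are all made up for illustration.

```python
import csv
import io
import sqlite3


def query_csv(csv_text, sql, table="data"):
    """Load CSV text into an in-memory SQLite table and run one SQL query.

    Hypothetical helper for illustration only; every column is stored as
    text, so numeric comparisons need an explicit CAST in the query.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # first row supplies the column names
    conn = sqlite3.connect(":memory:")
    try:
        # Quote identifiers so headers with spaces or keywords still work.
        cols = ", ".join('"{}"'.format(c.replace('"', '""')) for c in header)
        conn.execute('CREATE TABLE "{}" ({})'.format(table, cols))
        placeholders = ", ".join("?" for _ in header)
        conn.executemany(
            'INSERT INTO "{}" VALUES ({})'.format(table, placeholders), reader
        )
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


# Made-up sample data, not taken from any of the projects listed above.
SAMPLE = "city,population\nOslo,700000\nBergen,285000\nTrondheim,210000\n"

rows = query_csv(
    SAMPLE,
    "SELECT city FROM data "
    "WHERE CAST(population AS INTEGER) > 250000 ORDER BY city",
)
print(rows)
```

The dedicated tools in the list add what this sketch leaves out, such as type inference, more input formats, and much faster parsing.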

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
