Run SQL on CSV, Parquet, JSON, Arrow, Unix Pipes and Google Sheet

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  1. roapi

    Create full-fledged APIs for slowly moving datasets without writing a single line of code.

  2. CodeRabbit

    CodeRabbit: AI Code Reviews for Developers. Revolutionize your code reviews with AI. CodeRabbit offers PR summaries, code walkthroughs, 1-click suggestions, and AST-based analysis. Boost productivity and code quality across all major languages with each PR.

  3. py-spy

    Sampling profiler for Python programs

    It is pretty cool. py-spy has also been doing this for a few years.

    https://github.com/benfred/py-spy

  4. blaze

    Blazing-fast query execution engine that speaks the Apache Spark language and has Arrow-DataFusion at its core. (by kwai)

    DataFusion outperforms Spark by a large margin. It is on par with Photon; see the benchmark at https://github.com/blaze-init/blaze.

  5. dsq

    Command-line tool for running SQL queries against JSON, CSV, Excel, Parquet, and more.

    I am currently evaluating dsq and its partner desktop app, DataStation. As I understand it, the developer of DataStation realised that it would be useful to extract the underlying pieces into a standalone CLI, so both tools support the same range of sources.

    dsq CLI - https://github.com/multiprocessio/dsq

  6. dasel

    Select, put and delete data from JSON, TOML, YAML, XML and CSV files with a single tool. Supports conversion between formats and can be used as a Go package.

  7. octosql

    OctoSQL is a query tool that allows you to join, analyse and transform data from multiple databases and file formats using SQL.

  8. Nutrient

    Nutrient – The #1 PDF SDK Library, trusted by 10K+ developers. Other PDF SDKs promise a lot - then break. Laggy scrolling, poor mobile UX, tons of bugs, and lack of support cost you endless frustrations. Nutrient’s SDK handles billion-page workloads - so you don’t have to debug PDFs. Used by ~1 billion end users in more than 150 different countries.

  9. xsv

    A fast CSV command line toolkit written in Rust.

  10. zsv

    zsv+lib: tabular data Swiss Army knife CLI + world's fastest (SIMD) CSV parser

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts

  • How to use SQL to directly query files

    5 projects | dev.to | 2 Feb 2022
  • Show HN: I built jq-like scriptable tool to query CSV and JSON with SQLite

    3 projects | news.ycombinator.com | 24 Feb 2024
  • Analyzing multi-gigabyte JSON files locally

    14 projects | news.ycombinator.com | 18 Mar 2023
  • Tool to interact with CSV

    9 projects | /r/commandline | 27 Feb 2023
  • Yq is a portable yq: command-line YAML, JSON, XML, CSV and properties processor

    11 projects | news.ycombinator.com | 4 Feb 2023
