go-duckdb VS db-benchmark

Compare go-duckdb and db-benchmark to see how they differ.

go-duckdb

go-duckdb provides a database/sql driver for the DuckDB database engine. (by marcboeker)

db-benchmark

reproducible benchmark of database-like ops (by h2oai)
                go-duckdb       db-benchmark
Mentions        4               91
Stars           488             319
Growth          -               0.9%
Activity        8.2             0.0
Last commit     19 days ago     10 months ago
Language        Go              R
License         MIT License     Mozilla Public License 2.0
The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

go-duckdb

Posts with mentions or reviews of go-duckdb. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-05-27.
  • Embeddable Database for Go which have Date/Time type
    1 project | /r/golang | 28 Nov 2022
    DuckDB also has date functions and Go bindings
  • Range Joins in DuckDB
    2 projects | news.ycombinator.com | 27 May 2022
    I've been beating my head trying to get duckdb to statically link into a Go program (I'm neither an expert with cgo nor ld). If anyone else has been able to do this I'd love to see your build steps.

    https://github.com/marcboeker/go-duckdb produces a non-static binary by default.

  • Friendlier SQL with DuckDB
    8 projects | news.ycombinator.com | 12 May 2022
    Here is a solved Github Issue related to CGO for the Go bindings! If you have another issue, please feel free to post it on their Github page!

    https://github.com/marcboeker/go-duckdb/issues/4

  • Dsq: Commandline tool for running SQL queries against JSON, CSV, Parquet, etc.
    5 projects | news.ycombinator.com | 11 Jan 2022
    Yeah frankly the q benchmark isn't the best even though dsq compares favorably in it. It isn't well documented and exercises a very limited amount of functionality and isn't very rigorous from what I can see. That said, the caching q does is likely very solid (and not something dsq does).

    The biggest risk I think with octosql (and cube2222 is here somewhere to disagree with me if I'm wrong) is that they have their own entire SQL engine whereas textql, q and dsq use SQLite. But q is also in Python whereas textql, octosql, and dsq are in Go.

    In the next few weeks I'll be posting some benchmarks that I hope are a little fairer (or at least well-documented and reproducible). Though of course it would be appropriate to have independent benchmarks too since I now have a dog in the fight.

    On a tangent, once the go-duckdb binding [0] matures I'd love to offer duckdb as an alternative engine flag within dsq (and DataStation). Would be neat to see.

    [0] https://github.com/marcboeker/go-duckdb

db-benchmark

Posts with mentions or reviews of db-benchmark. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-01-08.
  • Database-Like Ops Benchmark
    1 project | news.ycombinator.com | 28 Jan 2024
  • Polars
    11 projects | news.ycombinator.com | 8 Jan 2024
    Real-world performance is complicated since data science covers a lot of use cases.

    If you're just reading a small CSV to do analysis on it, then there will be no human-perceptible difference between Polars and Pandas. If you're reading a larger CSV with 100k rows, there still won't be much of a perceptible difference.

    Per this (old) benchmark, there are differences once you get into 500MB+ territory: https://h2oai.github.io/db-benchmark/

  • DuckDB performance improvements with the latest release
    8 projects | news.ycombinator.com | 6 Nov 2023
    I do think it was important for duckdb to put out a new version of the results as the earlier version of that benchmark [1] went dormant with a very old version of duckdb with very bad performance, especially against polars.

    [1] https://h2oai.github.io/db-benchmark/

  • Show HN: SimSIMD vs. SciPy: How AVX-512 and SVE make SIMD cleaner and ML faster
    16 projects | news.ycombinator.com | 7 Oct 2023
    https://news.ycombinator.com/item?id=33270638 :

    > Apache Ballista and Polars do Apache Arrow and SIMD.

    > The Polars homepage links to the "Database-like ops benchmark" of {Polars, data.table, DataFrames.jl, ClickHouse, cuDF, spark, (py)datatable, dplyr, pandas, dask, Arrow, DuckDB, Modin,} but not yet PostgresML? https://h2oai.github.io/db-benchmark/ *

    LLM -> Vector database: https://en.wikipedia.org/wiki/Vector_database

    /? inurl:awesome site:github.com "vector database"

  • Pandas vs. Julia – cheat sheet and comparison
    7 projects | news.ycombinator.com | 17 May 2023
    I agree with your conclusion but want to add that switching from Julia may not make sense either.

    According to these benchmarks: https://h2oai.github.io/db-benchmark/, DF.jl is the fastest library for some things, data.table for others, polars for others. Which is fastest depends on the query and whether it takes advantage of the features/properties of each.

    For what it's worth, data.table is my favourite to use and I believe it has the nicest ergonomics of the three I spoke about.

  • Any faster Python alternatives?
    6 projects | /r/learnprogramming | 12 Apr 2023
    Same. Numba does wonders for me in most scenarios. Yesterday I discovered pola-rs, and it looks like I will add it to the stack. Its API is similar to pandas. Have a look at the benchmarks of cuDF, spark, dask, pandas compared to it: Benchmarks
  • Pandas 2.0 (with pyarrow) vs Pandas 1.3 - Performance comparison
    1 project | /r/datascience | 8 Apr 2023
    The syntax has similarities with dplyr in terms of the way you chain operations, and it's around an order of magnitude faster than pandas and dplyr (there's a nice benchmark here). It's also more memory-efficient and can handle larger-than-memory datasets via streaming if needed.
  • Pandas v2.0 Released
    5 projects | news.ycombinator.com | 3 Apr 2023
    If interested in benchmarks comparing different dataframe implementations, here is one:

    https://h2oai.github.io/db-benchmark/

  • Database-like ops benchmark
    1 project | /r/dataengineering | 16 Feb 2023
  • Python "programmers" when I show them how much faster their naive code runs when translated to C++ (this is a joke, I love python)
    2 projects | /r/ProgrammerHumor | 17 Jan 2023
    Bad examples. Both numpy and pandas are notoriously un-optimized packages, losing handily to pretty much all their competitors (R, Julia, kdb+, vaex, polars). See https://h2oai.github.io/db-benchmark/ for a partial comparison.

What are some alternatives?

When comparing go-duckdb and db-benchmark you can also consider the following projects:

dsq - Commandline tool for running SQL queries against JSON, CSV, Excel, Parquet, and more.

polars - Dataframes powered by a multithreaded, vectorized query engine, written in Rust

textql - Execute SQL against structured text like CSV or TSV

arrow-datafusion - Apache DataFusion SQL Query Engine

roapi - Create full-fledged APIs for slowly moving datasets without writing a single line of code.

Apache Arrow - Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing

better-sqlite3 - The fastest and simplest library for SQLite3 in Node.js. [Moved to: https://github.com/WiseLibs/better-sqlite3]

databend - Data, Analytics & AI. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://databend.com

postgres_scanner

DataFramesMeta.jl - Metaprogramming tools for DataFrames

q - Run SQL directly on delimited files and multi-file sqlite databases

sktime - A unified framework for machine learning with time series