duckdf VS tidyquery

Compare duckdf and tidyquery and see how they differ.

duckdf

🦆 SQL for R dataframes, with ducks (by phillc73)

tidyquery

Query R data frames with SQL (by ianmcook)
                duckdf                                  tidyquery
Mentions        3                                       2
Stars           41                                      167
Growth          -                                       -
Activity        0.0                                     0.0
Last commit     4 months ago                            over 1 year ago
Language        R                                       R
License         GNU General Public License v3.0 only    Apache License 2.0

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

duckdf

Posts with mentions or reviews of duckdf. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-02-10.
  • DuckDB – in-process SQL OLAP database management system
    4 projects | news.ycombinator.com | 10 Feb 2023
    Quite a while ago, when duckdb was just a duckling, I wrote an R package that supported direct manipulation of R dataframes using SQL.[1] duckdb was the engine for this.

    The approach was never as fast as data.table but did approach the speed of dplyr for more complex queries.

    Life had other things in store for me and I haven’t touched this library for a while now.

    At the time there was no Julia connector for duckdb, but now that there is, I’d like to try this approach in that language.

    [1] https://github.com/phillc73/duckdf

  • ClickHouse as an alternative to Elasticsearch for log storage and analysis
    13 projects | news.ycombinator.com | 2 Mar 2021
    Yeah, I agree sqldf is quite slow. Fair point.

    As you've seen, duckdb registers an "R data frame as a virtual table." I'm not sure what they mean by "yet" either.

    Of course it is possible to write an R dataframe to an on-disk duckdb table, if that's what you want to do.

    There are some simple benchmarks at the bottom of the duckdf README[1]. Essentially, I found that for basic SQL SELECT queries dplyr is quicker, but for much more complex queries the duckdf/duckdb combination performs better.

    If you really want speed, of course, just use data.table.

    [1] https://github.com/phillc73/duckdf

  • Julia 1.6: what has changed since Julia 1.0?
    9 projects | news.ycombinator.com | 14 Feb 2021
    That's a really good point that I'd not really thought about. I'd never really considered the difference between calling just functions versus macros.

    Thinking about Query.jl and DataFramesMeta.jl, and I am for sure not an expert in either, I can't specifically speak to your `head` example, but other base functions can be combined with macros. For example, see the LINQ examples from DataFramesMeta.jl[1] where `mean` is being used. Or again the LINQ style examples in Query.jl[2], where `descending` is used in the first example, or `length` later in the Grouping examples.

    Is that the kind of thing you meant?

    For whatever reason, with the way my brain is wired, the LINQ style of query just works for me. I have never directly used LINQ, but do have some SQL experience. In fact, I wrote some dinky little wrapper functions[3] around duckdb[4] so I could directly query R dataframes and datatables with SQL using that backend, rather than sqldf[5].

    [1] https://juliadata.github.io/DataFramesMeta.jl/stable/#@linq-...

    [2] https://www.queryverse.org/Query.jl/stable/linqquerycommands...

    [3] https://github.com/phillc73/duckdf

    [4] https://duckdb.org/

    [5] https://cran.r-project.org/web/packages/sqldf/index.html

tidyquery

Posts with mentions or reviews of tidyquery. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2021-03-02.
  • Can "dplyr" code automatically be converted to SQL code?
    1 project | /r/rstats | 15 Sep 2021
    tidyquery
  • ClickHouse as an alternative to Elasticsearch for log storage and analysis
    13 projects | news.ycombinator.com | 2 Mar 2021
    > SQL is a perfect language for analytics.

    Slightly off topic, but I strongly agree with this statement and wonder why the languages used for a lot of data science work (R, Python) don't have such a strong focus on SQL.

    It might just be my brain, but SQL makes so much logical sense as a query language and, with small variances, is used to directly query so many databases.

    In R, why learn the data.table (OK, speed) or dplyr paradigms, when SQL can be easily applied directly to dataframes? There are libraries to support this like sqldf[1], tidyquery[2] and duckdf[3] (author). And I'm sure the situation is similar in Python.

    This is not a post against great libraries like data.table and dplyr, which I do use from time to time. It's more of a question about why SQL is not more popular as the query language du jour for data science.

    [1] https://cran.r-project.org/web/packages/sqldf/index.html

    [2] https://github.com/ianmcook/tidyquery

    [3] https://github.com/phillc73/duckdf

What are some alternatives?

When comparing duckdf and tidyquery you can also consider the following projects:

Typesense - Open Source alternative to Algolia + Pinecone and an Easier-to-Use alternative to ElasticSearch ⚡ 🔍 ✨ Fast, typo tolerant, in-memory fuzzy Search Engine for building delightful search experiences

clickhousedb_fdw - PostgreSQL's Foreign Data Wrapper For ClickHouse

julia - The Julia Programming Language

meilisearch-js-plugins - The search client to use Meilisearch with InstantSearch.

loki - Like Prometheus, but for logs.

tidyquant - Bringing financial analysis to the tidyverse

Makie.jl - Interactive data visualizations and plotting in Julia

tidyverse - Easily install and load packages from the tidyverse

MeiliSearch - A lightning-fast search API that fits effortlessly into your apps, websites, and workflow

tidylog - Tidylog provides feedback about dplyr and tidyr operations. It provides wrapper functions for the most common functions, such as filter, mutate, select, and group_by, and provides detailed output for joins.