R SQL

Open-source R projects categorized as SQL

Top 3 R SQL Projects

  • GitHub repo tidyexplain

    🤹‍♀ Animations of tidyverse verbs using R, the tidyverse, and gganimate

    Project mention: All You Need To Know About Merging (Joining) Datasets in R | reddit.com/r/rstats | 2021-02-03
  • GitHub repo tidyquery

    Query R data frames with SQL

    Project mention: ClickHouse as an alternative to Elasticsearch for log storage and analysis | news.ycombinator.com | 2021-03-02

    > SQL is a perfect language for analytics.

    Slightly off topic, but I strongly agree with this statement and wonder why the languages used for a lot of data science work (R, Python) don't have such a strong focus on SQL.

    It might just be my brain, but SQL makes so much logical sense as a query language and, with small variances, is used to directly query so many databases.

    In R, why learn the data.table (OK, speed) or dplyr paradigms, when SQL can be easily applied directly to data frames? There are libraries to support this, like sqldf[1], tidyquery[2] and duckdf[3] (of which I am the author). And I'm sure the situation is similar in Python.

    This is not a post against great libraries like data.table and dplyr, which I do use from time to time. It's more of a question about why SQL is not more popular as the query language du jour for data science.

    [1] https://cran.r-project.org/web/packages/sqldf/index.html

    [2] https://github.com/ianmcook/tidyquery

    [3] https://github.com/phillc73/duckdf
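
As a rough illustration of the comment above, here is a minimal sketch of running SQL directly against an R data frame with sqldf, one of the three libraries it names. It assumes the sqldf package is installed and uses R's built-in mtcars data set; the query and column aliases are illustrative choices, not taken from the original post.

```r
# Assumption: sqldf is installed, e.g. via install.packages("sqldf")
library(sqldf)

# sqldf looks up `mtcars` as a data frame in the calling environment
# and runs the query against it (SQLite is its default backend).
result <- sqldf("
  SELECT cyl, COUNT(*) AS n, AVG(mpg) AS avg_mpg
  FROM mtcars
  WHERE hp > 100
  GROUP BY cyl
  ORDER BY cyl
")

print(result)
```

The result is an ordinary data frame, so it can be passed on to dplyr, data.table, or base R code as usual.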

  • GitHub repo duckdf

    🦆 SQL for R dataframes, with ducks

    Project mention: ClickHouse as an alternative to Elasticsearch for log storage and analysis | news.ycombinator.com | 2021-03-02

    Yeah, I agree sqldf is quite slow. Fair point.

    As you've seen, duckdb registers an "R data frame as a virtual table." I'm not sure what they mean by "yet" either.

    Of course it is possible to write an R dataframe to an on-disk duckdb table, if that's what you want to do.

    There are some simple benchmarks on the bottom of the duckdf README[1]. Essentially I found for basic SQL SELECT queries, dplyr is quicker, but for much more complex queries, the duckdf/duckdb combination performs better.

    If you really want speed of course, just use data.table.

    [1] https://github.com/phillc73/duckdf
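
The two options mentioned in the comment above (registering a data frame as a virtual table, or writing it to an on-disk table) can be sketched with the DBI and duckdb packages directly. This is a minimal sketch, assuming both packages are installed; the table names `mtcars_view` and `mtcars_tbl` and the temporary file path are hypothetical, and it does not show duckdf's own interface.

```r
# Assumption: DBI and duckdb are installed
library(DBI)
library(duckdb)

# Option 1: register the data frame as a virtual table (no copy is made).
con <- dbConnect(duckdb::duckdb())  # in-memory database
duckdb::duckdb_register(con, "mtcars_view", mtcars)
by_cyl <- dbGetQuery(
  con,
  "SELECT cyl, AVG(mpg) AS avg_mpg FROM mtcars_view GROUP BY cyl ORDER BY cyl"
)
duckdb::duckdb_unregister(con, "mtcars_view")
dbDisconnect(con, shutdown = TRUE)

# Option 2: write the data frame to an on-disk DuckDB table.
db_path <- tempfile(fileext = ".duckdb")  # hypothetical location
disk_con <- dbConnect(duckdb::duckdb(), dbdir = db_path)
dbWriteTable(disk_con, "mtcars_tbl", mtcars)
n_rows <- dbGetQuery(disk_con, "SELECT COUNT(*) AS n FROM mtcars_tbl")$n
dbDisconnect(disk_con, shutdown = TRUE)

print(by_cyl)
print(n_rows)
```

Registration suits ad hoc queries on data already in memory, while the on-disk table persists between sessions at the cost of copying the data.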

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2021-03-02.

Index

What are some of the best open-source SQL projects in R? This list will help you:

#  Project      Stars
1  tidyexplain  592
2  tidyquery    144
3  duckdf       21