Top 3 R SQL Projects
🤹♀ Animations of tidyverse verbs using R, the tidyverse, and gganimateProject mention: All You Need To Know About Merging (Joining) Datasets in R | reddit.com/r/rstats | 2021-02-03
Query R data frames with SQLProject mention: ClickHouse as an alternative to Elasticsearch for log storage and analysis | news.ycombinator.com | 2021-03-02
> SQL is a perfect language for analytics.
Slightly off topic, but I strongly agree with this statement and wonder why the languages used for a lot of data science work (R, Python) don't have such a strong focus on SQL.
It might just be my brain, but SQL makes so much logical sense as a query language and, with small variances, is used to directly query so many databases.
In R, why learn the data.tables (OK, speed) or dplyr paradigms, when SQL can be easily applied directly to dataframes? There are libraries to support this like sqldf, tidyquery and duckdf (author). And I'm sure the situation is similar in Python.
This is not a post against great libraries like data.table and dplyr, which I do use from time to time. It's more of a question about why SQL is not more popular as the query language de jour for data science.
Scout APM: A developer's best friend. Try free for 14-days. Scout APM uses tracing logic that ties bottlenecks to source code so you know the exact line of code causing performance issues and can get back to building a great product faster.
🦆 SQL for R dataframes, with ducksProject mention: ClickHouse as an alternative to Elasticsearch for log storage and analysis | news.ycombinator.com | 2021-03-02
Yeah, I agree sqldf is quite slow. Fair point.
As you've seen, duckdb registers an "R data frame as a virtual table." I'm not sure what they mean by "yet" either.
Of course it is possible to write an R dataframe to an on-disk duckdb table, if that's what you want to do.
There are some simple benchmarks on the bottom of the duckdf README. Essentially I found for basic SQL SELECT queries, dplyr is quicker, but for much more complex queries, the duckdf/duckdb combination performs better.
If you really want speed of course, just use data.table.