| | tidyquery | duckdf |
| --- | --- | --- |
| Mentions | 2 | 3 |
| Stars | 167 | 44 |
| Growth | - | - |
| Activity | 0.0 | 0.0 |
| Last commit | about 2 years ago | about 1 year ago |
| Language | R | R |
| License | Apache License 2.0 | GNU General Public License v3.0 only |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Can "dplyr" code automatically be converted to SQL code?
ClickHouse as an alternative to Elasticsearch for log storage and analysis
> SQL is a perfect language for analytics.
Slightly off topic, but I strongly agree with this statement and wonder why the languages used for a lot of data science work (R, Python) don't have such a strong focus on SQL.
It might just be my brain, but SQL makes so much logical sense as a query language and, with small variations, is used to directly query so many databases.
In R, why learn the data.table (OK, speed) or dplyr paradigms when SQL can easily be applied directly to data frames? There are libraries that support this, such as sqldf[1], tidyquery[2] and duckdf[3] (I am the author of the last one). And I'm sure the situation is similar in Python.
This is not a post against great libraries like data.table and dplyr, which I do use from time to time. It's more of a question about why SQL is not more popular as the query language du jour for data science.
[1] https://cran.r-project.org/web/packages/sqldf/index.html
[2] https://github.com/ianmcook/tidyquery
[3] https://github.com/phillc73/duckdf
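As a rough illustration of the idea in the comment above: sqldf, by default, copies the data frame into a temporary SQLite database, runs the SQL there, and hands back the result. The sketch below mimics that mechanism in Python using only the standard-library `sqlite3` module. The table name `flowers`, the sample rows, and the helper `query_rows` are all hypothetical and chosen for illustration; this is not the API of sqldf, tidyquery, or duckdf.

```python
import sqlite3

# Hypothetical sample rows standing in for a data frame (not from the original post).
rows = [
    ("setosa", 5.1),
    ("setosa", 4.9),
    ("virginica", 6.3),
    ("virginica", 5.8),
]


def query_rows(sql, rows):
    """Copy rows into a throwaway in-memory SQLite table called `flowers`, then run `sql` on it."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE flowers (species TEXT, sepal_length REAL)")
        con.executemany("INSERT INTO flowers VALUES (?, ?)", rows)
        return con.execute(sql).fetchall()
    finally:
        # The database only lives as long as the connection, so nothing is left behind.
        con.close()


# A grouped aggregate written as plain SQL instead of a dplyr/data.table-style chain.
result = query_rows(
    "SELECT species, COUNT(*) AS n, MAX(sepal_length) AS longest "
    "FROM flowers GROUP BY species ORDER BY species",
    rows,
)
print(result)
```

The copy into a temporary database is the main cost of this style, which is presumably why engines that can query the in-memory data in place are attractive for larger tables.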
DuckDB – in-process SQL OLAP database management system
Quite a while ago, when duckdb was just a duckling, I wrote an R package that supported direct manipulation of R dataframes using SQL.[1] duckdb was the engine for this.
This was never as fast as data.table, but it did come close to the speed of dplyr for more complex queries.
Life had other things in store for me and I haven’t touched this library for a while now.
At the time there was no Julia connector for duckdb, but now that there is, I’d like to try this approach in that language.
[1] https://github.com/phillc73/duckdf
ClickHouse as an alternative to Elasticsearch for log storage and analysis
Yeah, I agree sqldf is quite slow. Fair point.
As you've seen, duckdb registers an "R data frame as a virtual table." I'm not sure what they mean by "yet" either.
Of course it is possible to write an R dataframe to an on-disk duckdb table, if that's what you want to do.
There are some simple benchmarks at the bottom of the duckdf README[1]. Essentially, I found that dplyr is quicker for basic SQL SELECT queries, but the duckdf/duckdb combination performs better for much more complex queries.
If you really want speed, of course, just use data.table.
[1] https://github.com/phillc73/duckdf
Julia 1.6: what has changed since Julia 1.0?
That's a really good point that I hadn't thought about. I'd never considered the difference between calling plain functions and calling macros.
I am certainly not an expert in either Query.jl or DataFramesMeta.jl, and I can't speak specifically to your `head` example, but other base functions can be combined with macros. For example, see the LINQ examples from DataFramesMeta.jl[1], where `mean` is used. Or the LINQ-style examples in Query.jl[2], where `descending` is used in the first example and `length` appears later in the Grouping examples.
Is that the kind of thing you meant?
For whatever reason, with the way my brain is wired, the LINQ style of query just works for me. I have never used LINQ directly, but I do have some SQL experience. In fact, I wrote some dinky little wrapper functions[3] around duckdb[4] so I could directly query R data frames and data.tables with SQL using that backend, rather than sqldf[5].
[1] https://juliadata.github.io/DataFramesMeta.jl/stable/#@linq-...
[2] https://www.queryverse.org/Query.jl/stable/linqquerycommands...
[3] https://github.com/phillc73/duckdf
[4] https://duckdb.org/
[5] https://cran.r-project.org/web/packages/sqldf/index.html
What are some alternatives?
tidylog - Tidylog provides feedback about dplyr and tidyr operations. It provides wrapper functions for the most common functions, such as filter, mutate, select, and group_by, and provides detailed output for joins.
Makie.jl - Interactive data visualizations and plotting in Julia
tidyverse - Easily install and load packages from the tidyverse
julia - The Julia Programming Language
janitor - simple tools for data cleaning in R
meilisearch-js-plugins - The search client to use Meilisearch with InstantSearch.
cloki-go-legacy - Clickhouse Loki API in GO (WIP)
clickhousedb_fdw - PostgreSQL's Foreign Data Wrapper For ClickHouse
tidyquant - Bringing financial analysis to the tidyverse
loki - Like Prometheus, but for logs.
MeiliSearch - A lightning-fast search engine API bringing AI-powered hybrid search to your sites and applications.