Tidier.jl vs dtplyr

Tidier.jl

Meta-package for data analysis in Julia, modeled after the R tidyverse. (by TidierOrg)

dtplyr

Data table backend for dplyr (by tidyverse)

Topics: R, dplyr, Datatable

dtplyr.tidyverse.org

Project          Tidier.jl      dtplyr
Mentions         5              24
Stars            492            655
Growth           4.7%           -0.2%
Activity         8.5            7.5
Latest Commit    7 days ago     3 months ago
Language         Julia          R
License          MIT License    GNU General Public License v3.0 or later

Mentions - the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub.
Growth - month-over-month growth in stars.
Activity - a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones. For example, an activity of 9.0 indicates that a project is among the top 10% of the most actively developed projects that we are tracking.

Tidier.jl

Posts with mentions or reviews of Tidier.jl. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-12-27.

Tidier.jl: Meta-package for data analysis in Julia, modeled after R tidyverse
1 project | news.ycombinator.com | 15 Feb 2024
Julia 1.10 Released
15 projects | news.ycombinator.com | 27 Dec 2023

By the way, there has been a pretty nice effort to reimplement the tidyverse in Julia with https://github.com/TidierOrg/Tidier.jl, and it seems to be quite nice to work with, at least if you were missing that from R.
Pandas vs. Julia – cheat sheet and comparison
7 projects | news.ycombinator.com | 17 May 2023

Indeed, DataFrames.jl isn't and won't be the fastest way to do many things. It makes a lot of trade-offs, giving up performance for flexibility. The columns of the dataframe can be any indexable array, so while most examples use 64-bit floating point numbers, strings, and categorical arrays, the nice thing about DataFrames.jl is that arbitrary-precision floats, pointers to binaries, etc. are all fine inside of a DataFrame without any modification. This is in contrast to things like the datatypes Pandas allows (https://pbpython.com/pandas_dtypes.html). I'm quite impressed by the DataFrames.jl developers given how they've kept it dynamic yet seem to have achieved pretty good performance. Most of it is smart use of function barriers to avoid the dynamism in the core algorithms. But from that knowledge it's very clear that systems should be able to exist that outperform it even with the same algorithms, in some cases just by tens of nanoseconds, but in theory that bump is always there.
In the Julia world the one which optimizes to be fully non-dynamic is TypedTables (https://github.com/JuliaData/TypedTables.jl) where all column types are known at compile time, removing the dynamic dispatch overhead. But in Julia the minor performance gain of using TypedTables vs the major flexibility loss is the reason why you pretty much never hear about it. Probably not even worth mentioning but it's a fun tidbit.
> For what it's worth, data.table is my favourite to use and I believe it has the nicest ergonomics of the three I spoke about.
I would be interested to hear what about the ergonomics of data.table you find useful. If there are some ideas that would be helpful for DataFrames.jl to learn from data.table directly, I'd be happy to share them with the devs. Generally, when I hear about R, people talk about the tidyverse. Tidier (https://github.com/TidierOrg/Tidier.jl) is making some big strides in bringing a tidy syntax to Julia, and I hear that it has had some rapid adoption and happy users, so there are some ongoing efforts to use the learnings of R APIs, but I'm not sure if someone is looking directly at the data.table parts.
Tidyverse 2.0.0
9 projects | news.ycombinator.com | 9 Apr 2023

“Tidier.jl is a 100% Julia implementation of the R tidyverse mini-language in Julia.”
https://github.com/TidierOrg/Tidier.jl
What's Julia's biggest weakness?
7 projects | /r/Julia | 18 Mar 2023

A recent package, Tidier.jl, is coming from an R package developer: https://github.com/kdpsingh/Tidier.jl

dtplyr

Posts with mentions or reviews of dtplyr. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-04-09.

Tidyverse 2.0.0
9 projects | news.ycombinator.com | 9 Apr 2023

Can’t say I’ve used it, but isn’t that what dtplyr is supposed to provide?
https://dtplyr.tidyverse.org/
Error when trying to use dtplyr::lazy_dt, "invalid argument to unary operator"
1 project | /r/Rlanguage | 7 Apr 2023

# I am trying to follow the example at https://dtplyr.tidyverse.org/
Millions of rows
1 project | /r/rprogramming | 6 Apr 2023

FYI the developer of tidytable has been developing dtplyr for the Tidyverse. You might like that too!
fuzzyjoin - "Error in which(m) : argument to 'which' is not logical"
2 projects | /r/rstats | 6 Apr 2023

If you need speed, you should consider using dtplyr (or tidytable), or even dbplyr with duckdb.
Best alternative to Pandas 2023?
3 projects | /r/datascience | 13 Jan 2023

https://dtplyr.tidyverse.org/ ?
R Dialects Broke Me
2 projects | /r/Rlanguage | 4 Jan 2023

If you want data.table speed but with dplyr/tidy syntax, then dtplyr is a good package to have handy. Personally I love R, and choose R + NodeJS as my go-tos for everything I do, and use Python only when I have to.
Merging csv from environment.
2 projects | /r/RStudio | 8 Oct 2022

Also, that dataset is quite big, and the "base" Tidyverse will be excessively slow. You should supplement the "base" Tidyverse packages (i.e. dplyr and tidyr) with either dtplyr or dbplyr (+ duckDB). I'd suggest starting with dtplyr, which should handle 10M+ rows fine.
mutate ( ) function is only working in code chunk I run it in. It does not change the column in my data frame other than in that one code chunk.
1 project | /r/RStudio | 30 Sep 2022

If you want, there's a "substitute" for dplyr called dtplyr (also part of the Tidyverse), which "translates" your dplyr/tidyr code into data.table behind the scenes, and allows you to make your modifications apply directly to the original dataset by default:
R process taking over 2 hours to run suddenly
1 project | /r/Rlanguage | 14 Aug 2022

Install the dtplyr package and change your code to:
DS student here: why use R over Python?
2 projects | /r/datascience | 13 Jul 2022

Get the best of both worlds (tidyverse + data.tables) with dtplyr, a data.table backend for dplyr.
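
Several of the posts above describe dtplyr as translating dplyr-style code into data.table calls behind the scenes. As a rough illustration of that lazy-translation idea only, here is a toy sketch in Python. It is not dtplyr's implementation or API: the `LazyTable` class, its method names, and the exact query text it produces are all hypothetical, chosen to loosely resemble data.table's `DT[i, j, by]` form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LazyTable:
    """Hypothetical toy class: records verbs instead of running them."""

    name: str
    steps: tuple = ()

    def filter(self, condition: str) -> "LazyTable":
        # Nothing is computed here; the step is only remembered.
        return LazyTable(self.name, self.steps + (("filter", condition, ""),))

    def summarise(self, expression: str, by: str = "") -> "LazyTable":
        return LazyTable(self.name, self.steps + (("summarise", expression, by),))

    def show_query(self) -> str:
        """Render the recorded steps as data.table-style query text."""
        query = self.name
        for verb, expression, by in self.steps:
            if verb == "filter":
                # Row filters go in the `i` slot: DT[condition]
                query += f"[{expression}]"
            else:
                # Summaries go in the `j` slot, with optional grouping.
                by_clause = f", keyby = .({by})" if by else ""
                query += f"[, .({expression}){by_clause}]"
        return query


lazy = LazyTable("DT").filter("wt < 5").summarise("mpg = mean(mpg)", by="cyl")
print(lazy.show_query())
```

The point of the sketch is the design choice: because each verb only appends a step, the whole pipeline can be turned into a single backend query at the end, rather than materialising an intermediate table after every verb.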

What are some alternatives?

When comparing Tidier.jl and dtplyr you can also consider the following projects:

Julia-DataFrames-Tutorial - A tutorial on Julia DataFrames package

tidytable - Tidy interface to 'data.table'

polars - Dataframes powered by a multithreaded, vectorized query engine, written in Rust

py-shiny - Shiny for Python

tidypolars - Tidy interface to polars

DataFramesMeta.jl - Metaprogramming tools for DataFrames

vaex - Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀

julia - The Julia Programming Language

Datamancer - A dataframe library with a dplyr like API

db-benchmark - reproducible benchmark of database-like ops

explorer - Series (one-dimensional) and dataframes (two-dimensional) for fast and elegant data exploration in Elixir