FromFile.jl
duckdf
| | FromFile.jl | duckdf |
|---|---|---|
| Mentions | 6 | 3 |
| Stars | 131 | 41 |
| Growth | - | - |
| Activity | 1.5 | 0.0 |
| Last commit | about 1 year ago | 4 months ago |
| Language | Julia | R |
| License | MIT License | GNU General Public License v3.0 only |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
FromFile.jl
- A Programming language ideal for Scientific Sustainability and Reproducibility?
On `include`: you might like FromFile.jl as an alternative.
- Modules in Julia
- How to import an own module from the current directory?
For this and other oddities with Julia's include/import system (and especially as you're coming from Python), I'd recommend FromFile as a readable way to approach things.
- Why not Julia?
You might like FromFile.jl.
- Problems with nested `include`s and solutions?
However, if you prefer a Python-like experience, check out FromFile.jl.
- Julia 1.6: what has changed since Julia 1.0?
I'm not using modules. I usually start with one file with a demo or similarly named function that is called if the file is called as an entry point (like if __name__ == '__main__', except Julia makes it even worse).
I tend to refactor code out of there to separate files, and then somehow import it. An ugly way is include, and I've tried Revise.jl with includet.
But I think the least ugly approach is the @from macro from here: https://github.com/Roger-luo/FromFile.jl
Judging from some opinions in bug trackers, this is probably gonna get totally shunned by core devs and they'll keep on bikeshedding about the import stuff forever.
With this setup I have about 400 lines of code in three files. It compiles for 15 seconds. After every single change, and actually without any changes too.
I think performance-wise this should be equivalent to using modules, while saving some pointless ceremony.
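For readers unfamiliar with the Python idiom the comment refers to, here is a minimal sketch of the entry-point pattern. The `demo` name follows the comment; its body and the `main` helper are purely illustrative assumptions.

```python
def demo():
    """Illustrative stand-in for the commenter's demo function."""
    return "demo ran"


def main():
    # Called only when this file is executed directly as a script.
    print(demo())


# The guard is false when the file is imported as a module,
# so importing it does not trigger the demo.
if __name__ == "__main__":
    main()
```

Code that is later refactored into other files can import `demo` without side effects, which is the behaviour the commenter is trying to reproduce in Julia.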
duckdf
- DuckDB – in-process SQL OLAP database management system
Quite a while ago, when duckdb was just a duckling, I wrote an R package that supported direct manipulation of R dataframes using SQL.[1] duckdb was the engine for this.
The approach was never as fast as data.table but did approach the speed of dplyr for more complex queries.
Life had other things in store for me and I haven’t touched this library for a while now.
At the time there was no Julia connector for duckdb, but now that there is, I’d like to try this approach in that language.
[1] https://github.com/phillc73/duckdf
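The general idea of running SQL directly against in-memory tabular data can be sketched in Python. This is not duckdf's API and does not use duckdb: it swaps in the standard-library `sqlite3` module as a stand-in engine, and the table name `df`, the columns, and the rows are all made up for illustration.

```python
import sqlite3

# Hypothetical rows standing in for a data frame with columns (grp, value).
rows = [("a", 1.0), ("a", 3.0), ("b", 10.0)]


def query_rows(rows, sql):
    """Load rows into an in-memory table named `df` and run a SQL query on it."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE df (grp TEXT, value REAL)")
        con.executemany("INSERT INTO df VALUES (?, ?)", rows)
        return con.execute(sql).fetchall()
    finally:
        con.close()


# Group-wise mean, expressed in SQL rather than a data frame API.
result = query_rows(
    rows, "SELECT grp, AVG(value) FROM df GROUP BY grp ORDER BY grp"
)
```

The sketch copies the rows into the engine before querying, whereas the comments here describe duckdb registering a data frame as a virtual table.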
- ClickHouse as an alternative to Elasticsearch for log storage and analysis
Yeah, I agree sqldf is quite slow. Fair point.
As you've seen, duckdb registers an "R data frame as a virtual table." I'm not sure what they mean by "yet" either.
Of course it is possible to write an R dataframe to an on-disk duckdb table, if that's what you want to do.
There are some simple benchmarks at the bottom of the duckdf README[1]. Essentially, I found that for basic SQL SELECT queries dplyr is quicker, but for much more complex queries the duckdf/duckdb combination performs better.
If you really want speed, of course, just use data.table.
[1] https://github.com/phillc73/duckdf
- Julia 1.6: what has changed since Julia 1.0?
That's a really good point that I'd not really thought about. I'd never really considered the difference between calling just functions versus macros.
Thinking about Query.jl and DataFramesMeta.jl (and I am for sure not an expert in either), I can't speak specifically to your `head` example, but other base functions can be combined with macros. For example, see the LINQ examples from DataFramesMeta.jl[1], where `mean` is being used. Or again the LINQ-style examples in Query.jl[2], where `descending` is used in the first example, or `length` later in the Grouping examples.
Is that the kind of thing you meant?
For whatever reason, with the way my brain is wired, the LINQ style of query just works for me. I have never directly used LINQ, but do have some SQL experience. In fact, I wrote some dinky little wrapper functions[3] around duckdb[4] so I could directly query R dataframes and datatables with SQL using that backend, rather than sqldf[5].
[1] https://juliadata.github.io/DataFramesMeta.jl/stable/#@linq-...
[2] https://www.queryverse.org/Query.jl/stable/linqquerycommands...
[3] https://github.com/phillc73/duckdf
[4] https://duckdb.org/
[5] https://cran.r-project.org/web/packages/sqldf/index.html
What are some alternatives?
julia - The Julia Programming Language
tidyquery - Query R data frames with SQL
DaemonMode.jl - Client-Daemon workflow to run faster scripts in Julia
Typesense - Open Source alternative to Algolia + Pinecone and an Easier-to-Use alternative to ElasticSearch ⚡ 🔍 ✨ Fast, typo tolerant, in-memory fuzzy Search Engine for building delightful search experiences
JET.jl - An experimental code analyzer for Julia. No need for additional type annotations.
DataFramesMeta.jl - Metaprogramming tools for DataFrames
loki - Like Prometheus, but for logs.
SymbolicRegression.jl - Distributed High-Performance Symbolic Regression in Julia
Makie.jl - Interactive data visualizations and plotting in Julia
TwoBasedIndexing.jl - Two-based indexing
MeiliSearch - A lightning-fast search API that fits effortlessly into your apps, websites, and workflow