DaemonMode.jl
db-benchmark
Metric | DaemonMode.jl | db-benchmark
---|---|---
Mentions | 22 | 91
Stars | 269 | 319
Growth | - | 0.9%
Activity | 4.7 | 0.0
Last commit | 4 months ago | 10 months ago
Language | Julia | R
License | MIT License | Mozilla Public License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
DaemonMode.jl
- Potential of the Julia programming language for high energy physics computing
That's for an entry point; you can search for `Base.@main` to see a short summary of it. Later it will be callable with `juliax` and `juliac`, i.e. `~juliax test.jl` in the shell.
DynamicalSystems looks like a heavy project. I don't think you can do much more on your own. There have been recent features in 1.10 that let you use just the portion you need (as a weak dependency), and there is PrecompileTools.jl, but these are on your side.
You can also look into https://github.com/dmolina/DaemonMode.jl to run a Julia process in the background and do your work in the shell without startup time until the standalone binaries are there.
- Julia 1.9.0 lives up to its promise
> If I were to use e.g. Rust with polars, load time would be virtually none.
Because you're compiling...
And if you need to do the same in Julia, you should also precompile or use some other method such as https://github.com/dmolina/DaemonMode.jl (their demo shows loading a database, with subsequent loads after the first one taking roughly 0.2% of the time of the first).
- Administrative Scripting with Julia
- GNU Octave 8.1
- Ask HN: Why is Julia so underrated?
Well, not nicely certainly, but:
https://github.com/dmolina/DaemonMode.jl
> portable
Neither is Python - it just relies on universal availability. Over time…
- Is Julia suitable today as a scripting language?
You can get around a lot of these problems with DaemonMode.jl though.
- Julia performance, startup.jl, and sysimages
You might want DaemonMode.jl
- Can I execute code in Julia REPL if I'm connected to a remote server?
https://github.com/dmolina/DaemonMode.jl can possibly help in the future. Leaving it here so that people know this is planned.
- Ask HN: Why hasn't the Deep Learning community embraced Julia yet?
- Compile for faster execution?
If you strongly prefer to run scripts though, then you can use the package https://github.com/dmolina/DaemonMode.jl in order to re-use a Julia session between multiple scripts, saving you recompilation time.
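The posts above describe reusing one warm Julia session across several script runs, and one of them quotes later loads at roughly 0.2% of the first. The arithmetic behind that saving can be sketched as follows. This is a rough illustration in Python, not a measurement: the 10-second first run is a hypothetical figure, the function names are made up for this sketch, and the 0.2% fraction is simply the number quoted in the comment above.

```python
def total_time(n_runs: int, first_run_s: float, warm_fraction: float) -> float:
    """Total wall time when only the first run pays the full startup cost.

    Every later run is assumed to cost `warm_fraction` of the first run,
    which is the situation a persistent background session aims for.
    """
    if n_runs <= 0:
        return 0.0
    warm_run_s = first_run_s * warm_fraction
    return first_run_s + (n_runs - 1) * warm_run_s


def speedup(n_runs: int, first_run_s: float, warm_fraction: float) -> float:
    """Ratio of 'every run is cold' to 'only the first run is cold'."""
    cold_total = n_runs * first_run_s
    return cold_total / total_time(n_runs, first_run_s, warm_fraction)


# Hypothetical first-run cost in seconds (an assumption for illustration).
FIRST_RUN_S = 10.0
# Fraction quoted in the post above: later loads take roughly 0.2% of the first.
WARM_FRACTION = 0.002

if __name__ == "__main__":
    for n in (1, 10, 100):
        warm = total_time(n, FIRST_RUN_S, WARM_FRACTION)
        ratio = speedup(n, FIRST_RUN_S, WARM_FRACTION)
        print(f"{n:>3} runs: {warm:.2f} s with a warm session, {ratio:.1f}x faster than all-cold")
```

Under these assumptions a single run gains nothing, and the benefit grows with the number of scripts run against the same session, which matches the advice to use a daemon when you repeatedly run scripts.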
db-benchmark
- Database-Like Ops Benchmark
- Polars
Real-world performance is complicated since data science covers a lot of use cases.
If you're just reading a small CSV to do analysis on it, then there will be no human-perceptible difference between Polars and Pandas. If you're reading a larger CSV with 100k rows, there still won't be much of a perceptible difference.
Per this (old) benchmark, there are differences once you get into 500MB+ territory: https://h2oai.github.io/db-benchmark/
- DuckDB performance improvements with the latest release
I do think it was important for DuckDB to put out a new version of the results, as the earlier version of that benchmark [1] went dormant with a very old version of DuckDB that performed very badly, especially against Polars.
[1] https://h2oai.github.io/db-benchmark/
- Show HN: SimSIMD vs. SciPy: How AVX-512 and SVE make SIMD cleaner and ML faster
https://news.ycombinator.com/item?id=33270638 :
> Apache Ballista and Polars do Apache Arrow and SIMD.
> The Polars homepage links to the "Database-like ops benchmark" of {Polars, data.table, DataFrames.jl, ClickHouse, cuDF, spark, (py)datatable, dplyr, pandas, dask, Arrow, DuckDB, Modin,} but not yet PostgresML? https://h2oai.github.io/db-benchmark/ *
LLM -> Vector database: https://en.wikipedia.org/wiki/Vector_database
/? inurl:awesome site:github.com "vector database"
- Pandas vs. Julia – cheat sheet and comparison
I agree with your conclusion but want to add that switching from Julia may not make sense either.
According to these benchmarks: https://h2oai.github.io/db-benchmark/, DF.jl is the fastest library for some things, data.table for others, polars for others. Which is fastest depends on the query and whether it takes advantage of the features/properties of each.
For what it's worth, data.table is my favourite to use and I believe it has the nicest ergonomics of the three I spoke about.
- Any faster Python alternatives?
Same. Numba does wonders for me in most scenarios. Yesterday I discovered pola-rs, and it looks like I will add it to the stack. Its API is similar to that of pandas. Have a look at the benchmarks of cuDF, spark, dask, pandas compared to it: Benchmarks
- Pandas 2.0 (with pyarrow) vs Pandas 1.3 - Performance comparison
The syntax has similarities with dplyr in terms of the way you chain operations, and it’s around an order of magnitude faster than pandas and dplyr (there’s a nice benchmark here). It’s also more memory-efficient and can handle larger-than-memory datasets via streaming if needed.
- Pandas v2.0 Released
If interested in benchmarks comparing different dataframe implementations, here is one:
https://h2oai.github.io/db-benchmark/
- Database-like ops benchmark
- Python "programmers" when I show them how much faster their naive code runs when translated to C++ (this is a joke, I love python)
Bad examples. Both numpy and pandas are notoriously un-optimized packages, losing handily to pretty much all their competitors (R, Julia, kdb+, vaex, polars). See https://h2oai.github.io/db-benchmark/ for a partial comparison.
What are some alternatives?
julia - The Julia Programming Language
polars - Dataframes powered by a multithreaded, vectorized query engine, written in Rust
Makie.jl - Interactive data visualizations and plotting in Julia
datafusion - Apache DataFusion SQL Query Engine
HTTP.jl - HTTP for Julia
Apache Arrow - Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
FromFile.jl - Julia enhancement proposal (Julep) for implicit per file module in Julia
databend - Data, Analytics & AI. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://databend.com
julia-numpy-fortran-test - Comparing Julia vs Numpy vs Fortran for performance and code simplicity
DataFramesMeta.jl - Metaprogramming tools for DataFrames
sktime - A unified framework for machine learning with time series