|2 months ago||8 days ago|
|Mozilla Public License 2.0||MIT License|
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
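The site does not publish the formula behind the activity number, so the following is only a minimal sketch of how such a score could be built. The exponential decay, the 30-day half-life, and both function names are assumptions made for illustration. Recent commits get higher weight, and the raw value is then ranked against all tracked projects on a 0-10 scale, so 9.0 means the project outranks 90% of them.

```python
from datetime import date


def recency_weighted_activity(commit_dates, today, half_life_days=30.0):
    """Sum commit weights, where a commit's weight halves every
    `half_life_days` (hypothetical decay, not the site's real formula)."""
    return sum(0.5 ** ((today - d).days / half_life_days) for d in commit_dates)


def activity_score(raw, all_raw):
    """Scale a raw activity value to 0-10 by the share of tracked
    projects with a strictly lower raw value (hypothetical ranking)."""
    below = sum(1 for other in all_raw if other < raw)
    return 10.0 * below / len(all_raw)


# Ten tracked projects with distinct raw activity values 1..10.
tracked = list(range(1, 11))

# 9 of 10 projects rank lower than the most active one, so its score
# is 9.0, which places it in the top 10%.
top_score = activity_score(10, tracked)
```

Under these assumptions a commit made today counts as 1.0 and a commit made 30 days ago counts as 0.5, which is one simple way to make recent work dominate the number.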
Rust and what it needs to gain space in computation-oriented applications
7 projects | reddit.com/r/rust | 24 Nov 2021
You should check out polars, datafusion, influxdb iox and databend, all written in native Rust and powered by the Apache Arrow format. Polars in particular is pretty damn fast and has bindings for Python.
Database-Like Ops Benchmark
1 project | news.ycombinator.com | 20 Nov 2021
A better dtypes for pandas dataframes pulled from Postgres
1 project | reddit.com/r/datascience | 14 Nov 2021
Here is a good comparison: https://h2oai.github.io/db-benchmark/
Introducing tidypolars - a Python data frame package with syntax familiar to R tidyverse users
4 projects | reddit.com/r/datascience | 10 Nov 2021
The biggest difference with this one is that it's built on top of the polars package, which is probably the fastest data frame manipulation library out there. All of the other dplyr-to-python packages are built on top of pandas (which is very slow in comparison).
Introducing tidypolars - a Python data frame package for R tidyverse users
9 projects | reddit.com/r/rstats | 10 Nov 2021
I think having a basic understanding of pandas, given how broadly it's used, is beneficial. That being said, polars seems to be matching or beating data.table in performance, so I think it'd be very worth it to take it up. Wes McKinney, creator of pandas, has been quite vocal about architecture flaws of pandas -- which is why he's been working on the Arrow project. polars is based on Arrow, so in principle it's kinda like pandas 2.0 (adopting the changes that Wes proposed).
9 projects | reddit.com/r/rstats | 10 Nov 2021
tidypolars uses the polars package as a backend, which might be the fastest data frame manipulation library out there. (Faster even than R's data.table, which has been the king of speed for many years.)
Your perfect program/language for experience studies?
1 project | reddit.com/r/actuary | 4 Nov 2021
Julia has ExperienceStudies.jl to help with exposure calculations and MortalityTables.jl for mortality rate data. It also performs very well in data science benchmarks: https://h2oai.github.io/db-benchmark/
Comparing SQLite, DuckDB and Arrow
5 projects | news.ycombinator.com | 27 Oct 2021
This benchmark is more comprehensive for this type of analytical work:
1 project | reddit.com/r/datascience | 23 Oct 2021
Data too big to fit in memory can be handled in R too, using SparkR. I agree the documentation for something like PySpark is better, though. For data that fits in memory, data.table in R beats pandas. It loses to Polars (implemented in Rust, with Python bindings), but that is not in much use yet as it's new: https://github.com/h2oai/db-benchmark.
Turning database into a searchable dashboard?
3 projects | reddit.com/r/datascience | 21 Oct 2021
Is rust good for mathematical computing?
4 projects | reddit.com/r/rust | 16 Nov 2021
*: This is possible in Julia using PackageCompiler.jl but it ships the entire runtime, so big binaries, and the process isn't too smooth yet. In theory, you're definitely capable of compiling into small binaries that don't embed the runtime or at least a big chunk of it, but nobody has worked on this enough yet
Compile for faster execution?
4 projects | reddit.com/r/Julia | 10 Oct 2021
Simply importing a library takes 30-40 seconds?
1 project | reddit.com/r/Julia | 31 Aug 2021
Why Julia's multiple dispatch is so great, explained with Pokemons
2 projects | news.ycombinator.com | 20 Jul 2021
Julia is fairly fast: since its type system _only_ does dynamic/runtime typing, the JIT is optimized towards that. You'll experience some minor startup lag, typically due to the initial JIT'ing of any newly used functions. However, this has largely been remedied with a compiler backend that completely precomputes this behavior. https://julialang.github.io/PackageCompiler.jl/dev/
[D] fast.ai's Jeremy Howard on Why Python is not the future of machine learning - Gradient Dissent Clip
1 project | reddit.com/r/MachineLearning | 15 Jun 2021
Python has one performance advantage over other VM/JIT-based languages (including Java): its very short startup time, which is well suited to horizontal scaling. Otherwise your point is fair. In fact, Julia is better designed for AOT compilation, which means that eventually it will be much better suited for production than pure dockerized Python.
Why not Julia?
11 projects | reddit.com/r/Julia | 1 May 2021
Livebook: A collaborative and interactive code notebook for Elixir
6 projects | news.ycombinator.com | 18 Apr 2021
I'm considering Rust, Go, or Julia for my next language and I'd like to hear your thoughts on these
12 projects | reddit.com/r/rust | 16 Apr 2021
Package load times were cut by roughly a factor of two, in my experience, but that doesn't bring the initialization overhead down to the point where it's usable as a standalone microservice. Your best options at this point are https://github.com/dmolina/DaemonMode.jl, which keeps a Julia process running in the background using a client/server model, or https://github.com/JuliaLang/PackageCompiler.jl, which allows for ~zero-overhead package loading (at the cost of some up-front complexity).
Julia 1.6 Highlights
9 projects | news.ycombinator.com | 25 Mar 2021
We call these "system images" and you can generate them with [PackageCompiler](https://github.com/JuliaLang/PackageCompiler.jl). Unfortunately, it's still a little cumbersome to create them, but this is something that we're improving from release to release. One possible future is where an environment can be "baked", such that when you start Julia pointing to that environment (via `--project`) it loads all the packages more or less instantaneously.
The downside is that generating system images can be quite slow, so we're still working on ways to generate them incrementally. In any case, if you're inspired to work on this kind of stuff, it's definitely something the entire community is interested in!
Data Science in Julia for Hackers
3 projects | news.ycombinator.com | 19 Mar 2021
I think they're referring to the JIT compilation time. There are some ways to mitigate it, such as creating an image using PackageCompiler.jl, but it's definitely a noticeable issue IME. https://julialang.github.io/PackageCompiler.jl/dev/
What are some alternatives?
arrow-datafusion - Apache Arrow DataFusion and Ballista query engines
polars - Fast multi-threaded DataFrame library in Rust and Python
julia - The Julia Programming Language
LuaJIT - Mirror of the LuaJIT git repository
ModelingToolkit.jl - A modeling framework for automatically parallelized scientific machine learning (SciML) in Julia. A computer algebra system for integrated symbolics for physics-informed machine learning and automated transformations of differential equations
DataFramesMeta.jl - Metaprogramming tools for DataFrames
StaticCompiler.jl - Compiles Julia code to a standalone library (experimental)
Revise.jl - Automatically update function definitions in a running Julia session
Genie.jl - The highly productive Julia web framework
AlgebraOfGraphics.jl - Combine ingredients for a plot