> Now let's get a sane concurrency story
This is in very active development[1]! And seems like the Core Team is not totally against the idea[2].
[1] https://github.com/colesbury/nogil
[2] https://pyfound.blogspot.com/2022/05/the-2022-python-languag...
-
They aren't, which is why the statement doesn't make sense haha. The difference is really just that tuples of isbits types can be stack allocated (and tuples of isbits types are isbits, so they can nest, etc.), and so in some cases with sufficiently small amounts of data, creating stack allocated objects is much faster than creating heap allocated objects.
But if you compare concretely typed heap-allocated arrays in Julia and C, there's no real performance difference (Steven Johnson has a nice notebook displaying this https://scpo-compecon.github.io/CoursePack/Html/languages-be..., and if you look at the work on LoopVectorization.jl it makes it really clear that the only major difference is that C++ tends to know how to prove non-aliasing (ivdep) in a few more places, at least right now). So the real question is: did you actually want to use a heap-allocated object?
I think this really catches people new to performance optimization off guard, since in other dynamic languages you don't have the control to "know" that things will be placed on the stack, so you generally heap allocate everything (Python even heap allocates numbers), which is one of the main reasons for the performance difference against C. Julia gives you the tools to write code that doesn't heap allocate objects, but also makes it easy to heap allocate them (because if you had to `malloc` and `free` everywhere... well, you'd definitely lose the "like Python" feel and be much closer to C++ or Rust in terms of "ease of use", which would defeat the purpose for many cases).
But if you come from a higher-level language, there's this magical, bizarre land of "things that don't allocate" (on the heap), and so you learn "oh, I got 30x faster than Python, but then someone on a forum showed me how to do 100x better by not making arrays?", which is somewhat obvious from a C++ mindset but much less obvious coming from a higher-level language.
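To make the "Python even heap allocates numbers" aside concrete, here's a tiny stdlib-only sketch: in CPython every integer is a full heap object with header overhead, not a bare machine word.

```python
import sys

# In CPython, even a small int is a heap-allocated object with a header,
# so simple arithmetic churns through allocations.
small = sys.getsizeof(1)        # ~28 bytes on a typical 64-bit build
big = sys.getsizeof(10 ** 100)  # grows with the number of digits stored
print(small, big)
```

This is exactly the overhead that stack-allocated isbits tuples in Julia (or plain values in C) avoid.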
And FWIW, this is probably the biggest performance issue newcomers run into, and I think one of the things that needs a solution for Julia to go mainstream. Thankfully, there are already prototype PRs well underway: for example, https://github.com/JuliaLang/julia/pull/43573 automatically stack-allocates small arrays for which certain escape properties can be proven, https://github.com/JuliaLang/julia/pull/43777 is looking to hoist allocations out of loops so that even when they are required they are minimized in loop contexts automatically, etc.
The biggest impediment is that EscapeAnalysis.jl is not in Base, and it requires JET.jl, which is also not in Base, so both need to be made "Base friendly" to join the standard library before the compiler can start to rely on their analysis (which will be nice, because JET.jl can do things like throw more statically-deducible compile-time errors, another thing people ask for with Julia). There are a few people working really hard on this, so it is a current issue, but it's a known one with prototyped solutions and a direction for getting them into the shipped compiler.
When that's all said and done, of course "good programming" will still help the compiler in some cases, but in most cases people shouldn't have to worry about stack vs heap (it's purposefully not part of the user API in Julia and is considered a compiler detail for exactly this reason, so it's non-breaking for the compiler to change where objects live and improve performance over time).
-
I see your point, but it directly conflicts with the effort many people put into producing extremely fast libraries for specific purposes, such as web frameworks (which are benchmarked extensively), ORMs, and things like JSON and date parsing, as seen in the excellent ciso8601 [1], for example.
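For the date-parsing case specifically, the stdlib now has a one-call ISO 8601 parser with the same shape of API that ciso8601 provides as a C extension (a stdlib-only sketch; `ciso8601.parse_datetime` is the drop-in fast equivalent):

```python
from datetime import datetime

# Stdlib ISO 8601 parsing; ciso8601 exposes a similar single-call API
# (ciso8601.parse_datetime) implemented in C for speed.
dt = datetime.fromisoformat("2014-12-05T12:30:45")
print(dt.year, dt.month, dt.second)  # 2014 12 45
```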
-
https://github.com/mypyc/mypyc
> Mypyc compiles Python modules to C extensions. It uses standard Python type hints to generate fast code. Mypyc uses mypy to perform type checking and type inference.
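A minimal sketch of the kind of module mypyc targets: fully type-annotated Python that also runs unchanged under plain CPython (the compile itself is a separate build step; the annotations below are just ordinary type hints, nothing mypyc-specific).

```python
# A fully type-annotated function like this is what mypyc can compile
# to a C extension; the same source runs unchanged under CPython.
def total(xs: list[int]) -> int:
    s = 0
    for x in xs:
        s += x
    return s

print(total([1, 2, 3]))  # 6
```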
-
The goal with faster-cpython is small compounding improvements with each point release[0]. So in the end it should add up to much more than a tiny improvement.
[0] https://github.com/markshannon/faster-cpython/blob/master/pl...
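The compounding math is easy to underestimate. Taking roughly 1.5x per release as an illustrative figure (my reading of the linked plan's target, so treat the numbers as an assumption), four releases multiply out to about 5x:

```python
# Hypothetical per-release speedups compound multiplicatively.
per_release = 1.5            # assumed ~1.5x faster each release
releases = 4
overall = per_release ** releases
print(overall)               # 5.0625, i.e. roughly 5x overall
```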
-
Hold mine :D https://github.com/anchpop/genomics_viz/blob/master/genomics...
That's one expression because it used to be part of a giant comprehension, but I moved it into a function for a bit more readability. I'm considering moving it back just for kicks though.
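For what it's worth, the refactor pattern being described (all names and data below are made up for illustration) is just pulling the dense expression body out into a named helper:

```python
# Hypothetical example: a dense comprehension body extracted into a
# named function for readability.
def label(gene: str, start: int, end: int) -> str:
    return f"{gene}:{start}-{end}"

rows = [("BRCA1", 10, 99), ("TP53", 5, 50)]
names = [label(g, s, e) for g, s, e in rows]
print(names)  # ['BRCA1:10-99', 'TP53:5-50']
```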
-
I use a beautiful hack in the Cosmopolitan Libc codebase (x86 only) where we rewrite NOPs into function calls at runtime for all locking operations as soon as clone() is called: https://github.com/jart/cosmopolitan/blob/5df3e4e7a898d223ce... The big ugly macro that makes it work is here: https://github.com/jart/cosmopolitan/blob/master/libc/intrin... An example of how it's used is here: https://github.com/jart/cosmopolitan/blob/5df3e4e7a898d223ce... What it means is that things like malloc() go 3x faster if you're not actually using threads. The tradeoff is that it's highly architecture-specific and requires self-modifying code.
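A rough high-level analogue of the idea (not self-modifying machine code, and every name here is hypothetical): locking stays a no-op until the program actually spawns a thread, at which point a real lock is swapped in.

```python
import threading

# Hypothetical sketch: lock operations cost nothing until threads exist,
# mirroring the NOP-to-call rewrite at a much higher level.
class LazyLock:
    def __init__(self):
        self._real = None          # no real lock until threads exist

    def upgrade(self):
        # Call once when the first thread is spawned (the clone() moment
        # in the original trick).
        self._real = threading.Lock()

    def __enter__(self):
        if self._real is not None:
            self._real.acquire()
        return self

    def __exit__(self, *exc):
        if self._real is not None:
            self._real.release()
        return False

lock = LazyLock()
with lock:        # free: no atomic operations yet
    pass
lock.upgrade()    # threads exist now; use the real lock
with lock:
    pass
print("ok")
```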
-
> My mistake in retrospect was using small arrays as part of a struct, which being immutable got replaced at each time step with a new struct requiring new arrays to be allocated and initialized. I would not have done that in c++, but julia puts my brain in matlab mode.
I see. Yes, it's an interesting design space where Julia makes both heap and stack allocations easy enough, so sometimes you just reach for the heap as you would in MATLAB mode. Hopefully Prem and Shuhei's work lands soon enough to stack allocate small non-escaping arrays so that users don't need to think about this.
> Alignment I'd assumed, but padding the struct instead of the tuple did nothing, so probably extra work to clear a piece of an simd load. Any insight on why avx availability didn't help would be appreciated. I did verify some avx instructions were in the asm it generated, so it knew, it just didn't use.
The major differences at this point seem to come down to GCC (g++) vs LLVM and proofs of aliasing. LLVM's auto-vectorizer isn't that great, and it's less reliable at proving that two arrays don't alias. For the first part, people have improved the loop analysis from the Julia side (https://github.com/JuliaSIMD/LoopVectorization.jl); forcing SIMD onto LLVM can help it make the right choices. But for the second part, you do need `@simd ivdep for ...` (or LoopVectorization.jl) to match some C++ examples.
This is hopefully one of the things that JET.jl and the other new analysis passes can help with, along with the new effects system (see https://github.com/JuliaLang/julia/pull/43852; this is a pretty huge new compiler feature in v1.8, but right now it's manually specified, and it will take time before things like https://github.com/JuliaLang/julia/pull/44822 land and start to make it more pervasive). When that's all together, LLVM will have more ammo for proving things more effectively (pun intended).
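To see why those aliasing proofs matter at all, here's a deliberately tiny illustration (in Python rather than Julia): when the read and the write touch the same storage, each iteration depends on the previous write, so the loop cannot be reordered or vectorized; with a separate source buffer it can.

```python
# Aliased: each step reads the element the previous step just wrote,
# a loop-carried dependence that forbids vectorization.
a = [1, 1, 1, 1]
for i in range(1, len(a)):
    a[i] += a[i - 1]       # running prefix sum
print(a)                   # [1, 2, 3, 4]

# Non-aliased: reads come from an independent copy, so iterations are
# order-independent and a compiler could process them in parallel.
b = [1, 1, 1, 1]
src = b[:]                 # separate buffer, provably no overlap
for i in range(1, len(b)):
    b[i] += src[i - 1]
print(b)                   # [1, 2, 2, 2]
```

`ivdep` is essentially the programmer promising the compiler that the loop is in the second category.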