-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
-
zig
General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
-
ixy-languages
A high-speed network driver written in C, Rust, C++, Go, C#, Java, OCaml, Haskell, Swift, Javascript, and Python
-
SaaSHub
SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives
-
ripgrep
ripgrep recursively searches directories for a regex pattern while respecting your gitignore
-
regex
An implementation of regular expressions for Rust. This implementation uses finite automata and guarantees linear time matching on all inputs.
-
SaaSHub
SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives
There is undefined behavior in Rust affecting real-world code, including Tokio's scheduler, and code produced by async fn definitions. UnsafeCell doesn't solve the problem. There's more discussion at https://gist.github.com/Darksonn/1567538f56af1a8038ecc3c664a....
Bug report at https://github.com/rust-lang/rust/issues/63818.
Reddit threads at (older) https://www.reddit.com/r/rust/comments/l4roqk/a_fix_for_the_... and (newer) https://www.reddit.com/r/rust/comments/lxw6cl/update_to_llvm....
Somewhat related HN thread at https://news.ycombinator.com/item?id=26406989.
FTR, there are some efforts to integrate GCC & Rust:
https://github.com/antoyo/rustc_codegen_gcc
> computed goto
I did a deep dive into this topic lately when exploring whether to add a language feature to zig for this purpose. I found that, although finnicky, LLVM is able to generate the desired machine code if you give it a simple enough while loop continue expression. So I think it's reasonable to not have a computed goto language feature.
More details here, with lots of fun godbolt links: https://github.com/ziglang/zig/issues/8220
To practise Rust, I rewrote my small C99 library in it [1]. Performance is more or less the same, I only had to use unchecked array access in one small hot loop (details in README.md). I haven't ported multithreading yet, but I expect Rust's Rayon parallel iterators will likewise be comparable to OpenMP.
[1] https://github.com/GreatAttractor/libskry_r
https://users.rust-lang.org/t/link-the-rust-standard-library... and https://github.com/johnthagen/min-sized-rust
I'd be interested in any up-to-date trick to do better than this.
We've implemented network drivers in C and Rust and did a performance comparison. Interestingly, the C-to-Rust-transpiled code ended up being faster than the original C implementation: https://github.com/ixy-languages/ixy-languages/blob/master/R...
https://github.com/emmericp/ixy/blob/0e00605be4153b06df06184...
Looks like you're compiling C code with -O2. Does Rust build set -O3 on clang? Did you try -O3 with C? I know it's not guaranteed to be faster, just curious.
No you don't. I've written multiple programs that load things instantly off the file system via memory maps. See the fst crate[1], for example, which is designed to work with memory maps.
Rust "works badly with memory mapped files" doesn't mean, "Rust can't use memory mapped files." It means, "it is difficult to reconcile Rust's safety story with memory maps." ripgrep for example uses memory maps because they are faster sometimes, and its safety contract[2] is a bit strained. But it works.
[1] - https://github.com/BurntSushi/fst/
[2] - https://docs.rs/grep-searcher/0.1.7/grep_searcher/struct.Mma...
I’ve been using smartstrings, which is both excellent and maintained. https://github.com/bodil/smartstring
You don't have to retain objects in an internal linked list when the refcount drops to zero, but Python does.
Type-specific free lists:
* https://github.com/python/cpython/blob/master/Objects/floato...
* https://github.com/python/cpython/blob/master/Objects/tupleo...
And just wrapping malloc in general; there's no refcounting reason for this, they just assume system malloc is slow (which might be true, for glibc) and wrap it in the default build configuration:
https://github.com/python/cpython/blob/master/Objects/obmall...
So many layers of wrapping malloc, just because system allocators were slow in 2000. Defeats free() poisoning and ASAN. obmalloc can be disabled by turning off PYMALLOC, but that doesn't disable the per-type freelists IIRC. And PYMALLOC is enabled by default.
Why would you guess about how much C or Rust code that ripgrep contains when you could very quickly look? https://github.com/BurntSushi/ripgrep
> Languages like Idris and Agda are different because sometimes code isn’t executed at all. A proof may depend on knowing that some code will terminate without running it.
Yes. They are rather different in other respects as well. Though you can produce executable code from Idris and Agda, of course.
> With respect to deadlocks, there’s little practical difference between an infinite loop and a loop that holds the lock for a very long time.
Yes, that's true. Though as a practical matter, I have heard that it's much harder to produce the latter by accident, even though only the former is forbidden.
For perhaps a more practical example, have a look at https://dhall-lang.org/ which also terminates, but doesn't have nearly as much involved proving.
It couldn't figure it out from looking through ripgrep's website: does ripgrep support intersection and complement of expressions? Like eg https://github.com/google/redgrep does.
Regular languages are closed under those operations after all.
I've made some attempts, but nothing production grade.
About large character classes: how are those harder than in approaches? If you build any FSM you have to deal with those, don't you?
One way to handle them that works well when the characters in your classes are mostly next to each other unicode, is to express your state transition function as an 'interval map'
What I mean is that eg a hash table or an array lets you build representations of mathematical functions that map points to values.
You want something that can model a step function.
You can either roll your own, or write something around a sorted-map data structure.
Eg in C++ you'd base the whole thing around https://en.cppreference.com/w/cpp/container/map/upper_bound (or https://hackage.haskell.org/package/containers-0.4.0.0/docs/... in Haskell.)
The keys in your sorted map are the 'edges' of your characters classes (eg where they start and end).
Does that make sense? Or am I misunderstanding the problem?
> I personally always get stuck at how to handle things like captures [...]
Let me think about that one for a while. Some Googling suggests https://github.com/elfsternberg/barre though
> About large character classes: how are those harder than in approaches? If you build any FSM you have to deal with those, don't you?
I mean specifically in the context of derivatives. IIRC, the formulation used in Turon's paper wasn't amenable to large classes.
Yes, interval sets work great: https://github.com/rust-lang/regex/blob/master/regex-syntax/...
This is why I asked if a production grade regex engine based on derivatives exists. Because I want to see how the engineering is actually done.
> What do you want your capture groups to do? Do you eg just want to return pointers to where you captured them (if any)?
Look at any production grade regex engine. It will implement captures. It should do what they do.
> I have an inkling that something inspired by https://en.wikipedia.org/wiki/Viterbi_algorithm might work.
Nothing about Viterbi is fast, in my experience implementing it in the past. :-)
> https://github.com/google/redgrep/blob/main/parser.yy mentions something about capture, but not sure if that has anything to do with capture groups.
It looks like it does, and in particular see: https://github.com/google/redgrep/blob/6b9d5b02753c4ece17e2f...
But that's only for parsing the regex itself. I don't see any match APIs that utilize them. I wouldn't expect to either, because you can't implement capturing inside a DFA. (You need a tagged DFA, which is a strictly more powerful thing. But in that case, the DFA size explodes. See the re2c project and their associated papers.)
If I'm remembering correctly, I think the problem with derivatives is that they jump straight to a DFA. You can't do that a production regex engine because a DFA's worst case size is exponential in the size of the regex.