src
Read-only git conversion of OpenBSD's official CVS src repository. Pull requests not accepted - send diffs to the tech@ mailing list.
-
sqlx
🧰 The Rust SQL Toolkit. An async, pure Rust SQL crate featuring compile-time checked queries without a DSL. Supports PostgreSQL, MySQL, and SQLite. (by launchbadge)
Interesting. Looking at this repo, they have
Rust -> Ruby -> Java -> Golang
https://github.com/mariomka/regex-benchmark
Though it appears the numbers are two years old or so, and only for 3 specific regexes.
Totally agreed: almost all users (me/GoAWK included) want performance and don't care nearly as much about simplicity under the hood. Simplicity of implementation is of value for educational purposes, but we could easily have a small, simple 3rd party package for that. Go's regexp package is kinda too complex for a simple educational demonstration and too simple to be fast. :-)
I actually tried BurntSushi's https://github.com/BurntSushi/rure-go (bindings to Rust's regex engine) with GoAWK and it made regex handling 4-5x as fast for many regexes, despite the CGo overhead. However, rure-go (and CGo in general) is a bit painful to build, so I'm not going to use that. Maybe I'll create a branch for speed freaks who want it.
I've also thought of using https://gitlab.com/cznic/ccgo to convert Mawk's fast regex engine to Go source and see how that performs. Maybe on the next rainy day...
There's a lot of room for improvement on the compiler and library end. RE2 and Hyperscan demonstrate the ceiling here.
The NFA simulator is heavily optimized for readability, which means lots of recursion, and fewer special cases outside of small regexps. The compiler also doesn't perform certain optimizations like vectorization and emitting jump tables, which might be useful here.
There isn't a metaprogramming facility to generate equivalent Go code like in re2go: https://re2c.org/manual/manual_go.html. The best we can do is pre-declare the regexps globally so that they are compiled only once, but we still have to run the interpreter.
Moreover, thus far, a DFA matcher is out of the picture, as discussed here: https://github.com/golang/go/issues/11646.
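To illustrate the "pre-declare the regexps globally" point, here is a minimal sketch. The names `wordRe` and `countWords` are made up for this example; only `regexp.MustCompile` and `FindAllString` come from the standard library.

```go
package main

import (
	"fmt"
	"regexp"
)

// wordRe is compiled once, at package initialization, instead of on every call.
// The pattern itself is just an illustration.
var wordRe = regexp.MustCompile(`\w+`)

// countWords reuses the precompiled regexp. The matching engine still
// interprets the compiled program at run time; only compilation is saved.
func countWords(line string) int {
	return len(wordRe.FindAllString(line, -1))
}

func main() {
	fmt.Println(countWords("the quick brown fox"))
}
```

This only removes the repeated compile cost. The per-match cost of the interpreter is unchanged.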
Would using a library like Hyperscan improve Go's regex performance?
Reference: https://www.hyperscan.io/
For something like awk, I think you'd look at the pattern before compiling it, then create your own matcher behind an abstract Matcher interface that regexp also implements.
It's C, but openbsd grep does something like this because libc regex is super slow. Look for fastcomp on https://github.com/openbsd/src/blob/master/usr.bin/grep/util... It's not super sophisticated, but enough to beat the full regex engine.
In the Go code where I did this, it was a little different, with a static pattern. Something like "(\w+) apple" to find all apple adjectives or whatever, but the regexp wasted so much time matching words that weren't followed by "apple". A quick scan for "apple" to eliminate impossible matches made it faster. This depends more on knowing the regex and the corpus, so it's probably less relevant for awk.
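Something along these lines, as a sketch rather than the original code: `appleRe` and `appleAdjectives` are names made up for this example, and the pattern is the one quoted above.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// appleRe is the static pattern from the comment above.
var appleRe = regexp.MustCompile(`(\w+) apple`)

// appleAdjectives returns the word before each " apple" in line.
// The cheap substring scan rejects lines that cannot possibly match,
// so the regexp engine only runs on candidate lines.
func appleAdjectives(line string) []string {
	if !strings.Contains(line, "apple") {
		return nil
	}
	var out []string
	for _, m := range appleRe.FindAllStringSubmatch(line, -1) {
		out = append(out, m[1])
	}
	return out
}

func main() {
	fmt.Println(appleAdjectives("a red apple and a green apple"))
	fmt.Println(appleAdjectives("no fruit here"))
}
```

The prefilter pays off only when most lines lack the literal. If nearly every line contains "apple", it is just an extra pass over the input.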
I have done some optimisations in Go regex recently; I have a talk coming up on Saturday:
https://fosdem.org/2022/schedule/event/go_finite_automata/
This repo collects all the changes so you can try them out: https://github.com/grafana/regexp/tree/speedup#readme
That's not how software works. Millions-per-second of what? That's a serious load for any stack and depends on far more than just the language.
And it can also be done by other stacks, as clearly seen in the TechEmpower benchmarks, which run standardized test suites. Here's .NET and Java doing 7 million HTTP requests/second and maxing out the network card throughput: https://www.techempower.com/benchmarks/#section=data-r20&hw=...
FWIW for SQL, sqlc[1] is probably the nicest SQL layer I've used in any language.
[1] https://github.com/kyleconroy/sqlc
I wrote a REST API / boilerplate-reducing codegen tool for this exact reason for the Go Fiber framework. It might help.
https://github.com/tompston/gomakeme
Why don't you try a pure-Go implementation? Should have enough features implemented for basic use
https://github.com/cvilsmeier/sqinn-go
The only popular crate that I can think of that still requires nightly is Rocket[0]; the 0.5 release does not, but the lack of maintenance means that it has been three years since 0.4 and months since the last update. In that time Warp, Axum, Tide, Actix and many more frameworks that are all on stable have eaten its lunch. As for long compile times, I agree: it has gotten heaps better but is still far from Go (nor will it ever get that good), but for most of my projects incremental debug builds are in the ballpark of ~2 seconds, which is good enough for me, and `cargo check` is close to instant. Release builds, yeah, they are slow.
[0]: https://github.com/SergioBenitez/Rocket
In Haskell: https://hackage.haskell.org/package/esqueleto
Either it analyzes the given SQL to determine the in/out types of each SQL query, or it calls the database describe feature at compile-time.