> A single IR with multiple passes is a good way to build a compiler
https://mlir.llvm.org/ is largely claiming the opposite. Most passes are more naturally not "a -> a" but "a -> b". Algorithms and data structures work hand in hand, and it is very nice to produce "evidence" for what was done in the output data structure.
This is why https://cakeml.org/, which "can't cheat" with partial functions, has so many IRs!
Using just a single IR was historically done for cost-control, the idea being that having many IRs was a disaster in repetitive boilerplate. MLIR seeks to solve that exact problem!
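To make the "a -> b" point concrete, here is a minimal sketch in Python. All names (Surface, Core, desugar, evaluate) are made up for illustration and are not taken from MLIR or CakeML. The pass maps one IR to a different one, and the output IR simply has no subtraction node, so the result type itself is the "evidence" that desugaring happened.

```python
from dataclasses import dataclass
from typing import Union


# Source IR ("a"): still contains subtraction as surface sugar.
@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Add:
    left: "Surface"
    right: "Surface"


@dataclass(frozen=True)
class Sub:
    left: "Surface"
    right: "Surface"


Surface = Union[Num, Add, Sub]


# Target IR ("b"): there is no Sub node, so a later pass cannot
# encounter one and needs no "impossible" case for it.
@dataclass(frozen=True)
class CNum:
    value: int


@dataclass(frozen=True)
class CAdd:
    left: "Core"
    right: "Core"


@dataclass(frozen=True)
class CNeg:
    operand: "Core"


Core = Union[CNum, CAdd, CNeg]


def desugar(e: Surface) -> Core:
    """An "a -> b" pass: rewrites x - y as x + (-y) into a new IR."""
    if isinstance(e, Num):
        return CNum(e.value)
    if isinstance(e, Add):
        return CAdd(desugar(e.left), desugar(e.right))
    if isinstance(e, Sub):
        return CAdd(desugar(e.left), CNeg(desugar(e.right)))
    raise TypeError(f"not a Surface node: {e!r}")


def evaluate(e: Core) -> int:
    """A consumer of the target IR; it only has to handle Core nodes."""
    if isinstance(e, CNum):
        return e.value
    if isinstance(e, CAdd):
        return evaluate(e.left) + evaluate(e.right)
    if isinstance(e, CNeg):
        return -evaluate(e.operand)
    raise TypeError(f"not a Core node: {e!r}")
```

With a single IR, evaluate would need a branch for Sub that "cannot happen" after desugaring, which is the kind of partial function mentioned above. The cost is the duplicated node definitions, which is the boilerplate problem mentioned above.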
One of the most surprising things I learned about "clang" was how relatively poor the "libClang" capabilities are.
I wanted to write a codegen tool that would auto-generate bindings for C++ code, and it turns out that "libTooling" is the only reasonable way to get access to the proper info you need from C++.
Another alternative is "libClangSharp", from Tanner Gooding who works on C# at Microsoft.
https://github.com/dotnet/ClangSharp
Have you seen https://github.com/RosettaCommons/binder ?
Python aside, having gone down this rabbithole, and still not infrequently revisiting said rabbithole, I don't believe using *clang like this is a winning strategy. Because of the number of corner cases there are in e.g. C++17, you will end up reimplementing effectively all of the "middle-end" (the parts that lower to LLVM) for your target language. At that point you're not building bindings anymore but a whole-ass transpiler. Binder fails to be complete in this way.
My current theory is to try to "synthesize" bindings from the LLVM IR (a much smaller representational surface). Problems abound here too (ABI).
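A rough sketch of what that could look like, in Python with ctypes. This is not an existing tool: the type table, the regex, and the function names are assumptions made up for illustration, and it handles only one-line scalar signatures in textual IR. It also shows why ABI is a problem: the IR has already been lowered, so an i32 carries no signedness and by-value structs may have been rewritten, which this sketch ignores.

```python
import ctypes
import re

# Hypothetical, deliberately tiny mapping from LLVM IR scalar types to ctypes.
# IR integers carry no signedness, so signed types are an arbitrary choice.
LLVM_TO_CTYPES = {
    "void": None,
    "i8": ctypes.c_int8,
    "i32": ctypes.c_int32,
    "i64": ctypes.c_int64,
    "float": ctypes.c_float,
    "double": ctypes.c_double,
    "ptr": ctypes.c_void_p,
}

# Matches simple lines such as: declare i32 @add(i32, i32)
DECL_RE = re.compile(
    r"(?:declare|define)\s+(?:dso_local\s+)?(\w+)\s+@([\w.$]+)\(([^)]*)\)"
)


def parse_signatures(ir_text):
    """Collect {name: (return_type, [param_types])} from textual LLVM IR."""
    sigs = {}
    for line in ir_text.splitlines():
        m = DECL_RE.match(line.strip())
        if not m:
            continue
        ret, name, params = m.groups()
        # The first token of each parameter is its type; attributes follow.
        param_types = [p.split()[0] for p in params.split(",") if p.strip()]
        known = ret in LLVM_TO_CTYPES and all(
            t in LLVM_TO_CTYPES and t != "void" for t in param_types
        )
        if not known:
            continue  # varargs, structs, vectors, etc. are skipped
        sigs[name] = (ret, param_types)
    return sigs


def to_ctypes(sig):
    """Translate one parsed signature into (restype, argtypes) for ctypes."""
    ret, params = sig
    return LLVM_TO_CTYPES[ret], [LLVM_TO_CTYPES[t] for t in params]


def bind(lib, sigs):
    """Apply the signatures to a library already loaded with ctypes.CDLL."""
    for name, sig in sigs.items():
        fn = getattr(lib, name)
        fn.restype, fn.argtypes = to_ctypes(sig)
```

For C++ the names in the IR are mangled, so a real version would also need demangling and some way to recover the class structure that lowering has erased.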