Metric | vast | psychec
---|---|---
Mentions | 2 | 4
Stars | 335 | 496
Growth | 1.5% | -
Activity | 9.9 | 7.3
Last commit | 7 days ago | 5 days ago
Language | C++ | C++
License | Apache License 2.0 | BSD 3-clause "New" or "Revised" License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
vast
-
Print(“lol”) doubled the speed of my Go function
Most languages target C or LLVM, and C and LLVM have fundamentally lossy compilation processes.
To get around this, you'd need a hodgepodge of precompiler directives, or take a completely different approach.
I found a cool project that uses a "Tower of IRs" that can reestablish source-to-binary provenance, which seems to me to be on the right track:
https://github.com/trailofbits/vast
I'd definitely like to see the compilation processes be more transparent and easy to work with.
-
Compilers and IRs: LLVM IR, SPIR-V, and MLIR
At Trail of Bits, we are creating a new compiler front/middle end for Clang called VAST [1]. It consumes Clang ASTs and creates a high-level, information-rich MLIR dialect. Then, we progressively lower it through various other dialects, eventually down to the LLVM dialect in MLIR, which can be translated directly to LLVM IR.
Our goals with this pipeline are to enable static analyses that can choose the right abstraction level(s) for their goals and, using provenance, cross abstraction levels to relate results back to source code.
Neither Clang ASTs nor LLVM IR alone meet our needs for static analysis. Clang ASTs are too verbose and lack explicit representations for implicit behaviours in C++. LLVM IR isn't really "one IR"; it's two IRs (LLVM proper, and metadata), where LLVM proper is an unspecified family of dialects (-O0, -O1, -O2, -O3, then all the arch-specific stuff). LLVM IR also isn't easy to relate to source, even in the presence of maximal debug information. The Clang codegen process does ABI-specific lowering, which takes high-level types/values and transforms them to be more amenable to storing in target-CPU locations (e.g. registers). This actively works against relating information across levels, something that we want to solve with intermediate MLIR dialects.
Beyond our static analysis goals, I think an MLIR-based setup will be a key enabler of library-aware compiler optimizations. Right now, library-aware optimizations are challenging because Clang ASTs are hard to mutate, and by the time things are in LLVM IR, the abstraction boundaries provided by libraries are broken down by optimizations (e.g. inlining, specialization, folding), forcing optimization passes to reckon with the mechanics of how libraries are implemented.
We're very excited about MLIR, and we're pushing full steam ahead with VAST. MLIR is a technology that we can use to fix a lot of issues in Clang/LLVM that hinder really good static analysis.
[1] https://github.com/trailofbits/vast
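The idea of crossing abstraction levels through provenance can be illustrated with a small sketch. This is not VAST's API: the `Op` class, the level names, and the operation strings below are all hypothetical, chosen only to show how each lowered operation might keep a link to the operation it came from, so that a result found at a low level can be walked back to a source location.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Op:
    """One operation at one abstraction level (hypothetical model)."""
    level: str                          # illustrative level name, not a real dialect
    text: str                           # illustrative textual form of the operation
    source_loc: Optional[str] = None    # only set on the highest-level operation
    lowered_from: Optional[Op] = None   # provenance link to the higher-level op


def lower(op: Op, level: str, text: str) -> Op:
    """Create a lower-level op that remembers which op it was derived from."""
    return Op(level=level, text=text, lowered_from=op)


def provenance(op: Op) -> list[Op]:
    """Return the chain of ops from `op` back up to the highest level."""
    chain = [op]
    while chain[-1].lowered_from is not None:
        chain.append(chain[-1].lowered_from)
    return chain


def source_location(op: Op) -> Optional[str]:
    """Follow provenance links to the top and report its source location."""
    return provenance(op)[-1].source_loc


# A three-level "tower" for a single addition (all names are made up).
high = Op("high", "add a, b (C int)", source_loc="example.c:3:5")
mid = lower(high, "intermediate", "add a, b (32-bit signed)")
low = lower(mid, "llvm-dialect", "add %0, %1 : i32")

print(source_location(low))
print([op.level for op in provenance(low)])
```

An analysis running on `low` could report its findings at `source_location(low)`, or stop at any intermediate entry of `provenance(low)` if that level carries the information it needs.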
psychec
-
The Jotai Benchmark Collection
We, at UFMG, have been working on a methodology to generate benchmarks in C. We have a working collection of benchmarks here with a bit more than 30K executable programs. Benchmarks are single functions mined from open-source repositories. We have designed a domain-specific language to generate inputs for them. We use psyche-c to infer missing types and declarations. We use kcc and AddressSanitizer to filter out as much undefined behavior as possible. We use CFGGrind to check input coverage and to count the number of instructions executed. These benchmarks can be used in many ways: to stress test compilers; to autotune predictive compilation tasks; to analyze the dynamic behavior of programs; to improve compiler optimizations; etc. We have a technical report here.
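To give a feel for what "inferring missing types and declarations" means for a function mined without its headers, here is a deliberately tiny sketch. It is not psyche-c's algorithm or interface: the function `infer_struct_stubs` is hypothetical, it only looks at `struct TAG *name` declarations and `name->field` accesses, and it assumes every field is an `int`, which a real tool would not do.

```python
import re
from collections import defaultdict


def infer_struct_stubs(source: str) -> str:
    """Emit placeholder struct declarations for a C snippet.

    Toy heuristic: map each pointer variable declared as `struct TAG *name`
    to its tag, then collect every field reached through `name->field`.
    All fields are declared as int (a simplifying assumption).
    """
    # variable name -> struct tag, taken from `struct TAG *name` declarations
    pointer_tags = {
        name: tag
        for tag, name in re.findall(r"struct\s+(\w+)\s*\*\s*(\w+)", source)
    }

    # struct tag -> field names, in order of first appearance
    fields = defaultdict(list)
    for var, field in re.findall(r"\b([A-Za-z_]\w*)\s*->\s*([A-Za-z_]\w*)", source):
        tag = pointer_tags.get(var)
        if tag is not None and field not in fields[tag]:
            fields[tag].append(field)

    decls = []
    for tag, names in fields.items():
        body = " ".join(f"int {name};" for name in names)
        decls.append(f"struct {tag} {{ {body} }};")
    return "\n".join(decls)


# A mined function whose struct definition was left behind in some header.
snippet = "int area(struct rect *r) { return r->w * r->h; }"
print(infer_struct_stubs(snippet))
```

Prepending the generated declaration to the snippet yields a translation unit that a C compiler can accept, which is the property the benchmark pipeline needs before it can run the later filtering steps.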
-
Getting AST of C source code programmatically!
Did you take a look at psyche-C? https://github.com/ltcmelo/psychec
- Psyche: A C front end for implementation of static analysis tools
- adding a C# Roslyn-like API as part of the rewrite of my C compiler frontend project
What are some alternatives?
clangir - A new (MLIR based) high-level IR for clang.
ImGuiColorTextEdit - Colorizing text editor for ImGui
GrayC - GrayC: Greybox Fuzzing of Compilers and Analysers for C
OpenMLDB - OpenMLDB is an open-source machine learning database that provides a feature platform computing consistent features for training and inference.
thorin2 - The Higher ORder INtermediate representation - next gen
ccache - ccache – a fast compiler cache
dfir-orc - Forensics artefact collection tool for systems running Microsoft Windows
timemory - Modular C++ Toolkit for Performance Analysis and Logging. Profiling API and Tools for C, C++, CUDA, Fortran, and Python. The C++ template API is essentially a framework for creating tools: it is designed to provide a unifying interface for recording various performance measurements alongside data logging and interfaces to other tools.
FFMpeg-Online - This repository catalogs a list of FFMpeg commands for different situations. By https://hotpot.ai.
color_coded - A vim plugin for libclang-based highlighting of C, C++, ObjC
exo - A process manager & log viewer for dev
jotai-benchmarks - Collection of executable benchmarks