criu vs rr

| | criu | rr |
|---|---|---|
| Mentions | 14 | 102 |
| Stars | 2,663 | 8,665 |
| Growth | 1.7% | 1.1% |
| Activity | 8.9 | 9.6 |
| Latest commit | 10 days ago | 4 days ago |
| Language | C | C++ |
| License | GNU General Public License v3.0 or later | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
criu
-
When "letting it crash" is not enough
Checkpoint/Restore, I feel, is a bigger concept than just saving state. At the zeroth level, it's a system that can correctly stop and serialize a running process (which, as criu https://github.com/checkpoint-restore/criu has shown, is a huge pain in the ass and still not perfect) in a way that can be initiated from within the process itself.
The first level, which is more work but easier, is to build or use a heavily constrained VM/language that you run from within your main application and that doesn't allow most of the hard problems to exist in the first place.
I can't find any ready-made tools for this that I wouldn't consider an endeavor in themselves.
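The constrained-VM approach described above can be sketched as a toy stack machine whose entire execution state (program, program counter, stack) is plain data, so checkpointing is just serializing that data and restoring is loading it back. Everything here (`VMState`, `step`, `checkpoint`, `restore`, the instruction set) is hypothetical and made up for illustration; it is not how CRIU works, and it sidesteps the hard problems (file descriptors, threads, kernel state) simply by not having them.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class VMState:
    # The VM's entire execution state is plain data, so it serializes trivially.
    program: list                               # list of [opcode, arg] pairs
    pc: int = 0                                 # index of the next instruction
    stack: list = field(default_factory=list)   # operand stack
    halted: bool = False


def step(s: VMState) -> None:
    """Execute one instruction of a hypothetical tiny stack machine."""
    op, arg = s.program[s.pc]
    s.pc += 1
    if op == "push":
        s.stack.append(arg)
    elif op in ("add", "mul"):
        b, a = s.stack.pop(), s.stack.pop()
        s.stack.append(a + b if op == "add" else a * b)
    elif op == "halt":
        s.halted = True
    else:
        raise ValueError(f"unknown opcode {op!r}")


def checkpoint(s: VMState) -> str:
    # Can be taken at any instruction boundary; no kernel state is involved.
    return json.dumps(asdict(s))


def restore(blob: str) -> VMState:
    return VMState(**json.loads(blob))


def run(s: VMState) -> VMState:
    while not s.halted:
        step(s)
    return s


# Usage: compute (2 + 3) * 4, checkpointing partway through.
prog = [["push", 2], ["push", 3], ["add", None], ["push", 4], ["mul", None], ["halt", None]]
s = VMState(program=prog)
step(s)
step(s)
blob = checkpoint(s)              # could be written to disk or sent to another machine
print(run(restore(blob)).stack)   # → [20]
```

The price of this design is that the guest language can only do what the VM exposes; that restriction is exactly what makes serialization easy.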
- CRIU – Checkpoint/restore Linux tasks
-
Live Switching Pods to another Node on Resource Limits
That being said, the Checkpoint/Restore In Userspace (CRIU) project has been around for a number of years and is the closest thing to what you are talking about: taking a Linux process on one machine and moving it to another. It is messy, but it can be done in some cases. There are folks looking at how to integrate CRIU with k8s, but it’s all research at this point.
- Criu: Checkpoint/Restore Functionality for Linux
- checkpoint-restore/criu: Checkpoint/Restore tool
- checkpoint-restore/criu: Linux Checkpoint/Restore tool
-
The intersection of shadow stacks and CRIU
I would love to make more use of CRIU. For example, I considered using CRIU for my Python preloaded logic (https://github.com/albertz/python-preloaded). Unfortunately, at that time CRIU had to be run with root access, which was not an option. However, I see that the PR has now been merged, so maybe it works now? (https://github.com/checkpoint-restore/criu/pull/1930)
There is also DMTCP (https://github.com/dmtcp/dmtcp/) but this might have other problems for my use case.
My solution was to use a fork server instead, which works almost equally well. There aren't really many downsides to this approach. It is also quite simple and fairly cross-platform (except for Windows).
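A fork server along the lines described above can be sketched in Python: pay the expensive startup cost once in a parent process, then `os.fork()` a child per job so each child starts from the already-initialized memory image (shared copy-on-write). This is a minimal POSIX-only sketch, not the actual python-preloaded code; `expensive_setup` and `handle_job` are hypothetical placeholders, and a real server would receive jobs over a socket rather than from a list.

```python
import json
import os


def expensive_setup() -> dict:
    # Hypothetical stand-in for slow imports or model loading, done once in the parent.
    return {"squares": {i: i * i for i in range(100_000)}}


def handle_job(state: dict, job: int) -> int:
    # Runs in a forked child, which sees the parent's preloaded state without rebuilding it.
    return state["squares"][job]


def fork_server(jobs):
    state = expensive_setup()              # paid exactly once
    results = []
    for job in jobs:
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:                       # child: memory is a copy-on-write clone
            code = 1
            try:
                os.close(r)
                with os.fdopen(w, "wb") as out:
                    out.write(json.dumps(handle_job(state, job)).encode())
                code = 0
            finally:
                os._exit(code)             # never return into the parent's loop
        os.close(w)                        # parent: read the child's answer
        with os.fdopen(r, "rb") as inp:
            results.append(json.loads(inp.read()))
        os.waitpid(pid, 0)
    return results


print(fork_server([3, 10]))  # → [9, 100]
```

Unlike a checkpoint/restore tool, this never serializes the process: the "snapshot" only lives in the parent's memory, which is why it needs no special privileges.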
-
Python Preloaded
CRIU currently needs root access for dump/restore. However, there is ongoing work to support a non-root option in https://github.com/checkpoint-restore/criu/pull/1930.
-
How to "freeze" a process to disk?
There have been multiple checkpointing attempts over the years. CRIU is the only one I know of that's still kicking. That's probably your best and only bet.
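With CRIU, freezing a process to disk comes down to a `criu dump` into an images directory and a later `criu restore` from it. The helpers below only build and run those command lines; they assume `criu` is on `PATH`, that the caller has the root access mentioned above, and that the target was started from a shell (hence `--shell-job`). The function names are hypothetical; check `criu --help` for the exact options your version supports.

```python
import subprocess
from pathlib import Path
from typing import List


def criu_dump_cmd(pid: int, images_dir: str, leave_running: bool = False) -> List[str]:
    # -t: root PID of the process tree to dump; -D: directory for the image files;
    # --shell-job: the target was launched from a terminal/shell session.
    cmd = ["criu", "dump", "-t", str(pid), "-D", images_dir, "--shell-job"]
    if leave_running:
        cmd.append("--leave-running")      # keep the original process alive after dumping
    return cmd


def criu_restore_cmd(images_dir: str) -> List[str]:
    return ["criu", "restore", "-D", images_dir, "--shell-job"]


def freeze(pid: int, images_dir: str) -> None:
    """Hypothetical helper: dump a process tree to disk (needs root and criu on PATH)."""
    Path(images_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(criu_dump_cmd(pid, images_dir), check=True)


def thaw(images_dir: str) -> None:
    """Hypothetical helper: restore the process tree saved by freeze()."""
    subprocess.run(criu_restore_cmd(images_dir), check=True)
```

For example, `freeze(12345, "/tmp/ckpt")` followed later by `thaw("/tmp/ckpt")`; without `leave_running=True`, the original process is killed once the dump completes.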
- I made a plugin to suspend games and apps similar to how consoles do (Deck Suspender)
rr
- rr: Lightweight Recording and Deterministic Debugging
-
Hermit is a hermetic and reproducible sandbox for running programs
I think this tool must share a lot of techniques and use cases with rr. I wonder how it compares in various aspects.
https://rr-project.org/
rr is "sold" as a "reversible debugger", but it obviously needs determinism for its record and replay to work, and AFAIK it employs similar techniques for system call interception and for serializing execution onto a single CPU. The reversible-debugging part is built on top of that by taking periodic snapshots and replaying from them, AFAIK. They package it behind a gdb-compatible interface.
Hermit also lists record/replay as a motivation, although it doesn't list reversible debugging in general.
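The record/replay-plus-snapshots idea described above can be illustrated with a toy: a deterministic "program" whose only nondeterminism is an input source (standing in for intercepted system calls). Recording logs every input; replay feeds the log back so execution is identical; periodic snapshots let a "reverse step" restore the nearest earlier snapshot and replay forward to the target step. All names are hypothetical, and this is a conceptual sketch, not rr's implementation.

```python
import copy
import random


class Program:
    """Toy deterministic program; its only nondeterminism is what read_input() returns."""

    def __init__(self):
        self.step_no = 0
        self.acc = 0

    def step(self, read_input):
        self.acc = (self.acc * 31 + read_input()) % 1_000_003   # e.g. a syscall result
        self.step_no += 1


def record(n_steps, snapshot_every=4, seed=None):
    """Run 'live', logging every input and taking periodic snapshots."""
    rng = random.Random(seed)          # stands in for the outside world
    log, snapshots = [], {}
    p = Program()

    def live_input():
        v = rng.randrange(1000)        # nondeterministic in real life...
        log.append(v)                  # ...so record it
        return v

    for _ in range(n_steps):
        if p.step_no % snapshot_every == 0:
            snapshots[p.step_no] = copy.deepcopy(p)
        p.step(live_input)
    return p, log, snapshots


def state_at(target_step, log, snapshots):
    """'Go back in time': restore the nearest earlier snapshot, then replay the log."""
    start = max(s for s in snapshots if s <= target_step)
    p = copy.deepcopy(snapshots[start])
    while p.step_no < target_step:
        recorded = log[p.step_no]      # input recorded for this step
        p.step(lambda v=recorded: v)
    return p


final, log, snaps = record(10, snapshot_every=4, seed=1)
print(state_at(10, log, snaps).acc == final.acc)   # → True
```

Snapshot frequency trades memory for how far a reverse step may have to replay; here at most `snapshot_every - 1` steps.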
- Rr: Lightweight Recording and Deterministic Debugging
-
Deep Bug
Interesting. Perhaps you can inspect the disassembly of the function in question when using Graal and HotSpot. It is likely related to that.
Another debugging technique we use for heisenbugs is to see if `rr` [1] can reproduce it. If it can, that's great, as it allows you to go back in time to debug what may have caused the bug. But `rr` is often not great for concurrency bugs, since it emulates a single-core machine. Then again, debugging a VM is generally a nightmare. What we desperately need is a debugger that can debug both the VM and the language running on top of it. Usually it's one or the other.
> In general I’d argue you haven’t fixed a bug unless you understand why it happened and why your fix worked, which makes this frustrating, since every indication is that the bug exists within proprietary code that is out of my reach.
Were you using Oracle GraalVM? GraalVM community edition is open source, so maybe it's worth checking if it is reproducible in that.
[1]: https://github.com/rr-debugger/rr
-
So you think you want to write a deterministic hypervisor?
https://rr-project.org/ had the same problem. They use the retired conditional branch counter instead of an instruction counter, and then single-step instructions until they reach the correct address.
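The technique described above (identify a point in execution by a retired-conditional-branch count, get close using the counter, then single-step to the exact address) can be simulated over a fake instruction trace. This is a hypothetical model for illustration only: the `Insn` trace, the addresses, and the `skid` parameter (how far short of the target to stop the free-running phase) are assumptions, and a real implementation has to deal with much more than an address match.

```python
from typing import List, NamedTuple


class Insn(NamedTuple):
    addr: int
    is_cond_branch: bool   # retiring one of these bumps the branch counter


def find_point(trace: List[Insn], target_rcb: int, target_addr: int, skid: int = 3) -> int:
    """Return the index i such that trace[i] is about to execute at the recorded point.

    A point is identified by (conditional branches retired so far, next address).
    """
    rcb = 0   # simulated retired-conditional-branch counter
    i = 0     # index of the next instruction to execute

    def retire_one() -> None:
        nonlocal rcb, i
        if i >= len(trace):
            raise ValueError("recorded point not found in trace")
        if trace[i].is_cond_branch:
            rcb += 1
        i += 1

    # Phase 1: run freely up to `skid` branches short of the target. A hardware
    # counter interrupt may arrive late, so stopping early avoids overshooting.
    while rcb < max(0, target_rcb - skid):
        retire_one()
    # Phase 2: single-step until both the branch count and the address match.
    while not (i < len(trace) and rcb == target_rcb and trace[i].addr == target_addr):
        retire_one()
    return i


# A 10-iteration loop: 0x10 add, 0x14 cmp, 0x18 jne (the conditional branch).
# Address 0x14 occurs 10 times; only the branch count says which occurrence we want.
trace = [Insn(0x10, False), Insn(0x14, False), Insn(0x18, True)] * 10
print(find_point(trace, 8, 0x14))  # → 25
```

An instruction counter would pin the point down directly, but the branch counter is the one that counts reliably on the hardware rr targets, hence the extra single-stepping phase.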
-
Is Something Bugging You?
That'll work great for your Distributed QSort Incorporated startup, where the only product is a sorting algorithm.
Formal software verification is very useful. But what can be usefully formalized is rather limited, and what can be formalized correctly in practice is even more limited. That means you need to restrict your scope to something sane and useful. As a result, in the real world running thousands of tests is practically useful. (Well, it depends on what those tests are; it's easy to write 1000s of tests that either test the same thing, or only test the things that will pass and not the things that would fail.) They are especially useful if running in a mode where the unexpected happens often, as it sounds like this system can do. (It's reminiscent of rr's chaos mode -- https://rr-project.org/ linking to https://robert.ocallahan.org/2016/02/introducing-rr-chaos-mo... )
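The value of "a mode where the unexpected happens often" can be illustrated by running the same concurrent code under many randomized interleavings. Below, two simulated threads (generators) perform a non-atomic read-modify-write on a shared counter, and a seeded random scheduler picks which one runs next; some schedules expose a lost update. This is only an analogy to rr's chaos mode, which randomizes real scheduling decisions; the scheduler and function names here are hypothetical.

```python
import random


def incrementer(shared, n):
    """Simulated thread: increments shared["x"] n times with a non-atomic read-modify-write."""
    for _ in range(n):
        tmp = shared["x"]        # read
        yield                    # the scheduler may switch threads here...
        shared["x"] = tmp + 1    # ...so this write can clobber another thread's update
        yield


def run_schedule(seed, n=3):
    """Run two simulated threads under a random, but seed-reproducible, schedule."""
    rng = random.Random(seed)
    shared = {"x": 0}
    threads = [incrementer(shared, n), incrementer(shared, n)]
    while threads:
        t = rng.choice(threads)  # randomized scheduling decision
        try:
            next(t)
        except StopIteration:
            threads.remove(t)
    return shared["x"]


def hunt(trials=200, n=3):
    """Return the first seed whose schedule loses an update, or None if none does."""
    for seed in range(trials):
        if run_schedule(seed, n) != 2 * n:
            return seed
    return None


bad = hunt()
if bad is not None:
    # Re-running with the same seed replays the exact failing interleaving.
    print(f"seed {bad}: expected 6, got {run_schedule(bad)}")
```

Seeding the scheduler is what turns a flaky failure into a reproducible one, which is the same property record/replay provides for real programs.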
-
When "letting it crash" is not enough
The approach of checkpointing computation so that it is resumable and restartable sounds similar to a time-traveling debugger, like rr or WinDbg:
https://rr-project.org/
https://learn.microsoft.com/windows-hardware/drivers/debugge...
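At the application level, checkpointing a computation so it is resumable and restartable can be as simple as periodically persisting the loop state and, on startup, resuming from the last checkpoint. A minimal sketch follows; the file name, JSON format, checkpoint interval, and the `crash_at` crash simulation are all assumptions for illustration. Writing to a temporary file and then calling `os.replace` keeps a crash mid-write from corrupting the previous checkpoint.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"i": 0, "total": 0}        # no checkpoint yet: fresh start


def save_checkpoint(path, state):
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())               # make sure the bytes reach the disk
    os.replace(tmp, path)                  # atomically swap in the new checkpoint


def sum_of_squares(n, path, every=100, crash_at=None):
    """Resumable loop; `crash_at` is a hypothetical knob that simulates a crash."""
    state = load_checkpoint(path)
    while state["i"] < n:
        if crash_at is not None and state["i"] == crash_at:
            raise RuntimeError("simulated crash")
        state["total"] += state["i"] ** 2
        state["i"] += 1
        if state["i"] % every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["total"]


# Usage: crash partway through, then restart and resume from the last checkpoint.
path = os.path.join(tempfile.mkdtemp(), "progress.json")
try:
    sum_of_squares(1000, path, crash_at=250)
except RuntimeError:
    pass                                   # the "process" died at i == 250
print(load_checkpoint(path)["i"])          # → 200
print(sum_of_squares(1000, path))          # → 332833500
```

Work done since the last checkpoint (here, iterations 200 to 249) is simply redone, so the interval trades I/O overhead against lost work.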
- When I got started I debugged using printf() today I debug with print()
- Rr: Record and Replay Debugger – Reverse Debugger
-
OpenBSD KDE Plasma Desktop
https://github.com/rr-debugger/rr?tab=readme-ov-file#system-...
What are some alternatives?
nyrna - Suspend games and applications.
CodeLLDB - A native debugger extension for VSCode based on LLDB
FitM - FitM, the Fuzzer in the Middle, can fuzz client and server binaries at the same time using userspace snapshot-fuzzing and network emulation. It's fast and comparably easy to set up.
rrweb - record and replay the web
Regshot-Advanced - This is a fork of Regshot (original found at https://sourceforge.net/projects/regshot/) with greatly enhanced functionality.
gef - GEF (GDB Enhanced Features) - a modern experience for GDB with advanced debugging capabilities for exploit devs & reverse engineers on Linux
fpart - Sort files and pack them into partitions
Module Linker - browse modules by clicking directly on "import" statements on GitHub
DashLoader - Launch at the speed of light.
nbdev - Create delightful software with Jupyter Notebooks
nginx-link-function - An NGINX module that provides dynamic linking to your application in server context and calls a function of your application from a location directive
clog-cli - Generate beautiful changelogs from your Git commit history