archive-program vs wuffs

archive-program

The GitHub Archive Program & Arctic Code Vault (by github)

wuffs

Wrangling Untrusted File Formats Safely (by google)

Parsing memory-safety programming-language Codec

Project	archive-program	wuffs
Mentions	8	84
Stars	2,995	4,022
Growth	-0.1%	1.9%
Activity	0.0	9.4
Latest Commit	2 months ago	16 days ago
Language	-	C
License	-	GNU General Public License v3.0 or later

The number of mentions indicates the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
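As a rough illustration of the Growth column, month-over-month growth can be read as the percentage change in star count between two monthly snapshots. The C sketch below assumes exactly that simple definition; LibHunt's precise method (snapshot dates, rounding) is not documented on this page, and the function name `star_growth_percent` is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: percentage change in stars between two monthly
   snapshots. Assumes a plain percentage change; LibHunt's exact method
   is not documented here. */
double star_growth_percent(long stars_prev_month, long stars_now)
{
    /* Growth is undefined without a non-zero baseline. */
    assert(stars_prev_month > 0);
    return 100.0 * (double)(stars_now - stars_prev_month)
                 / (double)stars_prev_month;
}
```

With hypothetical snapshots of 1,000 and then 1,019 stars, this returns 1.9, which would display as 1.9%.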

archive-program

Posts with mentions or reviews of archive-program. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-09-08.

Arctic Code Vault
2 projects | news.ycombinator.com | 8 Sep 2023
In practice, cool URLs can become inaccessible even if they don't change
1 project | news.ycombinator.com | 1 Jul 2023

If you ever end up in the distant future, go to Svalbard and look for the Arctic World Archive. They have microfilm copies of a huge amount of data. They have Wikipedia pages in microfilm format, so all you need is a magnifying glass to get started. You can then look for the GitHub Code Vault slides that explain how to restart technology from scratch and run the code in the git repository archives.
https://github.com/github/archive-program/blob/master/GUIDE....
https://github.com/github/archive-program/blob/master/TheTec...
https://arcticworldarchive.org/
Will historians thousands of years from now have a significantly harder time studying us because we no longer store any information on stone tablets? Like if the Sumerians stored the Epic of Gilgamesh on the latest SSD we would know a lot less.
2 projects | /r/AskHistorians | 1 Jan 2023

According to GitHub:
Checked C
14 projects | news.ycombinator.com | 21 Dec 2022

> But why not for instance use a build system in some "container"?
I am not sure how this helps.
> I think the project could "bother" contributors with something like that, couldn't it?
Which project?
> An embedded C developer I've talked with quite often on another forum, who IMHO is quite competent, said that Coverity is a poor tool that generates way too many false positives while at the same time overlooking glaring issues.
He likely violated a license agreement with Coverity, since no one is allowed to say anything comparing Coverity to anything else.
> He said that's mostly an issue with all open-source tools for static C analysis.
I have been filing bug reports.
> OTOH the commercial ones are usually very expensive, with a target market of critical things like aviation or safety systems in cars and military use, places where they spend billions on projects. Nothing there for the average company, and especially not for (frankly often underfunded) open-source projects.
So you understand my pain.
> CodeQL? It's mostly a semantic search-and-replace tool, as far as I know? Is it that helpful? (I had a look, but the projects I'm working on don't require it. One would just use the IDE. There's no need for super-large-scale refactorings across projects in our case.)
I have never heard of it being used that way. It is a static analyzer whose checks are written in the CodeQL language. However, it is very immature. When GitHub acquired it, they banished the less reliable checks to the extended-and-security suite, leaving it with only about 50 checks for C/C++ code. Those catch very little, although in the rare instances that they do catch things, the catches are somewhat amazing. Unfortunately, at least one of those checks provides technically correct, yet difficult to understand, explanations of the problem, so most developers would dismiss its reports as false positives despite them being correct:
https://github.com/github/codeql/issues/11744
There are probably more issues like that, but I have yet to see and report them.
> SonarCloud, hmm… This one I've used (for web development, though), but I'm not a fan of it. It bundles other "scanner" tools of varying quality and utility. At least for the languages I actively used it with, it was mostly about "style issues". And when it showed real errors, the IDE would show the same… (The question then is how this could be committed in the first place. But OK, some people just don't care. For them you need additional checks like SonarCloud, I guess.)
It is supposed to be able to integrate into GitHub's code scanning feature, so any newly detected issues are reported in the PR that generated them. Anyway, it is something that I am considering. I wanted to use it much sooner, but it required authorization to make changes to GitHub on my behalf, which made me cautious about how I try it. It is basically at the bottom of my todo list right now.
> Wouldn't it be easy to add at least this to the build by using some "build container"?
I do not understand your question. To use it, we need a few things:
1. To be able to show any newly introduced defect reports in the PR that generated them shortly after it was filed.
2. To be able to scan the kernel modules. Right now it cannot, due to a bad interaction between the build system and how compiler interposition is done. As of a few days ago, I have a bunch of local hacks that enable kernel module scans, but this needs more work.
> Well, that's why I think something equivalent to `-Wall -Werror` should be switched on before writing the first line of code, in any language.
OpenZFS has had that in place for more than a decade. I do not know precisely when it was first used (although I could look if anyone is particularly interested), but my guess is 2008 when ZFSOnLinux started. Perhaps it was done at Sun before then, but both events predate me. I became involved in 2012 and it is amazing to think that I am now considered one of the early OpenZFS contributors.
Interestingly, the earliest commits in the OpenZFS repository referencing static analysis are from 2009 (with the oldest commit being from 2008 when ZFSOnLinux started). Those commits are ports of changes from OpenSolaris based on defect reports made by Coverity. There would be no more commits mentioning static analysis until 2014 when I wrote patches fixing things reported by Clang's static analyzer. Coverity was (re)introduced in 2016.
As far as the current OpenZFS repository is concerned, knowledge of static analysis died with OpenSolaris and we lost an entire form of QA until we rediscovered it during attempts to improve QA years later.
> But I guess I will stay with engraving my data into solid rock. Proven for at least a hundred thousand years.
That method is no longer reliable due to acid rain. You would need to bury it in a tomb to protect it from acid rain. That has the pesky problem of the pointers being lost over time.
> At least someone needs to preserve the cat pictures and memes of our current human era for the cockroach people of the distant future. I'm not sure they will have a compatible Linux kernel and compiler available to build the ZFS drivers, or even punch card readers…
GitHub's code vault found a solution for that:
https://github.com/github/archive-program/blob/master/GUIDE....
I vaguely recall another effort trying to include the needed hardware in time capsules, but I could be misremembering.
Maybe a Weird Request.
1 project | /r/embedded | 6 Mar 2022

For the long(er) term, you could check out the GitHub Arctic Code Vault.
LTO Tape data storage for Linux nerds
4 projects | news.ycombinator.com | 27 Jan 2022
Arctic Code Vault Guide
1 project | news.ycombinator.com | 21 Feb 2021

wuffs

Posts with mentions or reviews of wuffs. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-02-04.

Still no love for JPEG XL: Browser maker love-in snubs next-gen image format
7 projects | news.ycombinator.com | 4 Feb 2024

Maybe this is what you are looking for:
https://github.com/google/wuffs
"Wuffs is a memory-safe programming language (and a standard library written in that language) for Wrangling Untrusted File Formats Safely."
4-year campaign backdoored iPhones using possibly the most advanced exploit
1 project | news.ycombinator.com | 27 Dec 2023

It could author its format parsers in https://github.com/google/wuffs, and make them BSD-like open source to maximize adoption.
An even bigger change: It could allow users to choose their iMessage client freely. Why not open up the protocol? I’m sure a security focused client would be popular and in the grand scheme of things easy to author.
Perhaps they could open up more of the OS and apps. Perhaps their claims about the security of users and the App Store are kind of BS.
Just about every Windows/Linux device vulnerable to new LogoFAIL firmware attack
4 projects | news.ycombinator.com | 6 Dec 2023

This is one of the reasons I'm a big fan of wuffs[0]: it specifically targets dealing with formats like pictures, safely, and the result drops into a C codebase, which makes the compat/migration story easy.
[0] https://github.com/google/wuffs
Google assigns a CVE for libwebp and gives it a 10.0 score
5 projects | news.ycombinator.com | 26 Sep 2023

There is already Huffman decoding, and some parts of the WebP algorithms, in https://github.com/google/wuffs (a language that finds missing bounds checks at compile time). Contrary to what one might expect, according to its README, this language allows writing more optimized code than C. WebP decoding is listed as a mid-term target in the roadmap.
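Since Huffman decoding comes up here, the sketch below shows the kind of input validation such code needs before it builds any tables. It assigns canonical Huffman codes from per-symbol code lengths using the scheme described in RFC 1951 (DEFLATE), section 3.2.2, and rejects lengths above 15 bits or sets of lengths that are over-subscribed (more codes than the bit budget allows). It is plain C written for illustration, not code taken from Wuffs or libwebp, and `canonical_codes` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BITS 15  /* longest code length allowed by DEFLATE */

/* Assign canonical Huffman codes from per-symbol code lengths
   (RFC 1951, section 3.2.2). A length of 0 means "symbol unused".
   Returns false instead of producing codes when the input is invalid. */
bool canonical_codes(const uint8_t *lengths, size_t n, uint16_t *codes)
{
    assert(n == 0 || (lengths != NULL && codes != NULL));
    size_t bl_count[MAX_BITS + 1] = {0};
    uint32_t next_code[MAX_BITS + 1] = {0};

    /* Count how many symbols use each code length. */
    for (size_t i = 0; i < n; i++) {
        if (lengths[i] > MAX_BITS)
            return false;              /* length out of range */
        bl_count[lengths[i]]++;
    }
    bl_count[0] = 0;

    /* Reject over-subscribed sets: at each length, no more codes may be
       used than the remaining code space allows. */
    size_t left = 1;
    for (int len = 1; len <= MAX_BITS; len++) {
        left <<= 1;
        if (bl_count[len] > left)
            return false;
        left -= bl_count[len];
    }

    /* Smallest code for each length (step 2 in the RFC). */
    uint32_t code = 0;
    for (int bits = 1; bits <= MAX_BITS; bits++) {
        code = (code + (uint32_t)bl_count[bits - 1]) << 1;
        next_code[bits] = code;
    }

    /* Hand out consecutive codes to symbols of the same length (step 3). */
    for (size_t i = 0; i < n; i++)
        codes[i] = lengths[i] ? (uint16_t)next_code[lengths[i]]++ : 0;
    return true;
}
```

For the RFC's own example lengths (3, 3, 3, 3, 3, 2, 4, 4), this yields the codes 010, 011, 100, 101, 110, 00, 1110 and 1111, while a set such as (1, 1, 1) is refused because three 1-bit codes cannot exist.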
The WebP 0day
6 projects | news.ycombinator.com | 21 Sep 2023

Specifically, since performance is crucial for this type of work, it should be written in WUFFS. WUFFS doesn't emit runtime bounds checks (as Java does, and as Rust would wherever it's unclear why an index should be in bounds); it just rejects programs where it can't see why the indexes are in bounds.
https://github.com/google/wuffs
You can explicitly write the same checks and meet this requirement, but chances are that, since you believe you're producing a high-performance piece of software which doesn't need checks, you'll instead be pulled up by the WUFFS tooling refusing your code, and discover that you got it wrong.
This is weaker than full-blown formal verification, but not for the purpose we care about in program safety, so it's a big improvement over humans writing LGTM.
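To make the bounds-check point concrete: Wuffs has its own syntax, which is not reproduced here, but the discipline it enforces can be sketched in plain C. Every index below is guarded by a check a reader (or a prover) can follow, and out-of-range reads are rejected rather than performed. The helper names `read_u16le` and `sum_u16le` are illustrative, not part of Wuffs or any other library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 16-bit value from buf[pos] and buf[pos + 1].
   The check is written as `len - pos < 2` after establishing
   `pos <= len`; the tempting `pos + 2 > len` can wrap around for a
   huge `pos` and wrongly pass. */
bool read_u16le(const uint8_t *buf, size_t len, size_t pos, uint16_t *out)
{
    assert(out != NULL);
    if (pos > len || len - pos < 2)
        return false;                  /* reject, don't read out of bounds */
    *out = (uint16_t)(buf[pos] | (buf[pos + 1] << 8));
    return true;
}

/* Sum every complete 16-bit word in buf; a trailing odd byte is ignored.
   The loop only advances after read_u16le has established pos + 2 <= len. */
uint32_t sum_u16le(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    uint16_t v;
    for (size_t pos = 0; read_u16le(buf, len, pos, &v); pos += 2)
        sum += v;
    return sum;
}
```

The design choice mirrors the comment above: instead of trusting that an index is in range, the code refuses to proceed unless the range is demonstrably valid, which is the property the Wuffs compiler insists on proving statically.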
What If OpenDocument Used SQLite?
8 projects | news.ycombinator.com | 18 Sep 2023

> parsing encoded files tends to introduce vulnerabilities
If we are talking about binary formats, there are now systematic solutions like https://github.com/google/wuffs that protect against vulnerabilities. But SQLite is not just a format; it's an evolving ecosystem with constantly added features. And the most prominent issue was not even in the core; it was in FTS3. What will SQLite add next? More JSON-related functions? Maybe BSON? That is useful, but it does not help in this situation.
Regarding traces, there are many forensics tools and even books about forensic analysis of SQLite databases. In a well-designed format, such tools should not exist in the first place. This is a hard requirement: if it requires rewriting the whole file, then so be it.
CVE-2023-4863: Heap buffer overflow in WebP (Chrome)
18 projects | news.ycombinator.com | 12 Sep 2023

I agree that Wuffs [1] would have been a very good alternative, if it could be made more general. AFAIK Wuffs is still very limited; in particular, it never allows dynamic allocation. Many formats, including those supported by Wuffs the library, need dynamic allocation, so Wuffs code has to be glued with unverified non-Wuffs code [2]. This only works with simpler formats.
[1] https://github.com/google/wuffs/blob/main/doc/wuffs-the-lang...
[2] https://github.com/google/wuffs/blob/main/doc/note/memory-sa...
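The glue pattern described above usually keeps allocation on the caller's side: the caller works out how much memory a decoder will need, allocates it, and hands a decoder that never allocates a fixed buffer. The C sketch below shows that pattern with overflow-checked size arithmetic. The function names and the width * height * bytes-per-pixel layout are assumptions for illustration; Wuffs' actual C API is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: bytes needed for a width x height image with
   bpp bytes per pixel. Returns false if the product would overflow
   size_t, instead of silently wrapping to a too-small size. */
bool pixbuf_len(uint32_t width, uint32_t height, uint32_t bpp, size_t *out)
{
    assert(out != NULL);
    size_t w = width, h = height, b = bpp;
    if (w != 0 && h > SIZE_MAX / w)
        return false;
    size_t wh = w * h;
    if (b != 0 && wh > SIZE_MAX / b)
        return false;
    *out = wh * b;
    return true;
}

/* Hypothetical glue: the unverified allocation happens here, in the
   caller; an allocation-free decoder would only ever see buf and len. */
uint8_t *alloc_pixbuf(uint32_t width, uint32_t height, uint32_t bpp,
                      size_t *len_out)
{
    assert(len_out != NULL);
    size_t len;
    if (!pixbuf_len(width, height, bpp, &len) || len == 0)
        return NULL;
    uint8_t *buf = malloc(len);
    if (buf != NULL)
        *len_out = len;
    return buf;
}
```

The size computation is the part of such glue that is easiest to get wrong: header-supplied dimensions are untrusted input, so a wrapped multiplication would produce a small buffer that a later write overruns.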
NSO Group iPhone Zero-Click, Zero-Day Exploit Captured in the Wild
3 projects | news.ycombinator.com | 7 Sep 2023

There are efforts to do that, notably https://github.com/google/wuffs
RLBox is another interesting option that lets you sandbox C/C++ code.
I think the main reason is that security is one of those things that people don't care about until it is too late to change. They get to the point of having a fast PDF library in C++ that has all the features. Then they realise that they should have written it in a safer language but by that point it means a complete rewrite.
It's the same reason not enough people use Bazel: by the time most people realise they need it, they've already implemented a huge build system using Make or whatever.
Ask HN: Wuffs Examples for Text Files?
1 project | news.ycombinator.com | 22 May 2023

I finally have time to try out wuffs (https://github.com/google/wuffs), which I first heard about here on HN. I want to develop a low-level tokenizer for SDF files, a small-molecule structure file format which started in the 1970s, with lots of, let's call it 'heritage'. Wuffs' ability to process near the data, with a coroutine-like interface, seems like a good fit.
I got the "hello-wuffs-c" example to work, which took some tinkering (see wuffs issue #24). That reads a single string and returns an unsigned int. Despite looking at the example implementations for json parsing, I can't figure out how to go from that example to something which handles multiple input buffer blocks, with string tokens that might straddle two buffers.
Nor could I find third-party examples of people using wuffs-the-language beyond basic experimentation for simple binary data. The handful of non-trivial examples I found only used wuffs-the-library, as a vendored component in a larger project.
The lack of wuffs-the-language use after several years seems a strong sign that I shouldn't look to wuffs for my project. Given that the 'workarounds' in #24 are still present after 3 years, it doesn't even seem to be that widely used internally at Google.
Does anyone here have experience to share, or pointers to related projects?
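One common way to handle tokens that straddle buffer boundaries is to keep a small amount of state between calls: the bytes of an unfinished token are buffered and completed when the next chunk arrives. The C sketch below assumes whitespace-separated tokens and a hypothetical fixed cap (`MAX_TOKEN`); it is not Wuffs code and does not model the SDF format, but it addresses the same multi-buffer question raised above. Over-long tokens are reported as errors rather than truncated or written past the buffer. The names `tokenizer_feed`, `tokenizer_finish` and `collect` are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_TOKEN 64  /* hypothetical cap on token length */

typedef void (*token_fn)(const char *tok, size_t len, void *ctx);

/* State carried between chunks: the start of a token that may continue
   in the next buffer. */
typedef struct {
    char   partial[MAX_TOKEN];
    size_t len;
    bool   error;   /* set once a token exceeds MAX_TOKEN */
} tokenizer;

static bool is_space(char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/* Emit the buffered token, if any, and start a new one. */
static void flush(tokenizer *t, token_fn emit, void *ctx)
{
    if (t->len > 0) {
        emit(t->partial, t->len, ctx);
        t->len = 0;
    }
}

/* Process one chunk. A token cut off at the end of the chunk stays in
   t->partial and is completed by the next call. Returns false on error. */
bool tokenizer_feed(tokenizer *t, const char *chunk, size_t n,
                    token_fn emit, void *ctx)
{
    for (size_t i = 0; i < n && !t->error; i++) {
        if (is_space(chunk[i])) {
            flush(t, emit, ctx);
        } else if (t->len < MAX_TOKEN) {
            t->partial[t->len++] = chunk[i];
        } else {
            t->error = true;   /* refuse rather than write past partial[] */
        }
    }
    assert(t->len <= MAX_TOKEN);
    return !t->error;
}

/* Call once at end of input so a final token without trailing
   whitespace is still emitted. */
bool tokenizer_finish(tokenizer *t, token_fn emit, void *ctx)
{
    if (!t->error)
        flush(t, emit, ctx);
    return !t->error;
}

/* Example callback: joins tokens with single spaces into a fixed buffer. */
typedef struct {
    char   buf[256];
    size_t used;
} collector;

void collect(const char *tok, size_t len, void *ctx)
{
    collector *c = ctx;
    size_t sep = c->used > 0 ? 1 : 0;
    if (c->used + sep + len >= sizeof c->buf)
        return;                         /* drop rather than overflow */
    if (sep)
        c->buf[c->used++] = ' ';
    memcpy(c->buf + c->used, tok, len);
    c->used += len;
    c->buf[c->used] = '\0';
}
```

Feeding `"hel"`, `"lo wor"` and `"ld"` as three separate chunks and then calling `tokenizer_finish` delivers the tokens `hello` and `world` to the callback, even though both were split across chunk boundaries.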
FaaS in Go with WASM, WASI and Rust
5 projects | news.ycombinator.com | 7 May 2023

Here's an off-topic answer.
Depends on what you want your toy language to do and what sort of runtime support you'd like to lean on.
JVM is pretty good for a lot of script-y languages, though it does impose the overhead of having a JVM around. It provides GC, threads, reflection, and consistent semantics, plus tons of tools, libraries, and support.
WebAssembly is constrained (for running-in-a-browser safety reasons) but then you get to run your code in a browser, or as a service, etc, and Other People are working hard on the problem of getting your WA to go fast. That used to be a big reason for using JVM, but it turns out that Security Is Darn Hard.
I have used C in the (distant) past as an IL, and that works up to a point; implementing garbage collection can be a pain if that's a thing that you want. C compilers have had a lot of work on them over the years, and you also have access to some low-level stuff, so if you were, e.g., trying to come up with a little language that had super-good performance, C might be a good choice. (See also [Wuffs](https://github.com/google/wuffs), by Nigel Tao et al. at Google.)
A suggestion, if you do target C -- don't work too hard to find isomorphisms between C's data structures and YourToyLang's data structures. Back around 1990, I did my C-generating compiler for Modula-3, and a friend at Xerox PARC used C as a target for Cedar Mesa. Hans used it in a lower-level way (I was mapping between M-3 records and C structs, for example; Hans was not), and the lower-level way worked better -- i.e., I chose poorly. It worked, but lower-level worked better.
If you are targeting a higher-level language, Rust and Go both seem like interesting options to me. Both have the disadvantage that they are still changing slightly but you get interesting "services" from the underlying VM -- for Rust, the borrow checker, plus libraries, for Go, reflection, goroutines, and the GC, plus libraries.
Rust should get you slightly higher performance, but I'd worry that you couldn't hide the existence of the borrow checker from your toy language, especially if you wanted to interact with Rust libraries from YTL. If you wanted to learn something vaguely publishable/wider-interesting, that question right there ("can I compile a TL to Rust, touch the Rust libraries, and not expose the borrow checker? No+what-I-tried/Yes+this-worked") is not bad.
I have a minor conflict of interest suggesting Go; I work on Go, usually on the compiler, and machine-generated code makes great test data. But regarded as a VM, I am a little puzzled why it hasn't seen wider use, because the GC is great (for lower allocation rates than Java, however; the JVM GC has higher throughput efficiency, but Go has tagless objects, interior pointer support, and tiny pause times. Go-the-language makes it pretty easy to allocate less.) Things Go-as-a-VM currently lacks:
- tail call elimination (JVM same)

What are some alternatives?

When comparing archive-program and wuffs you can also consider the following projects:

ltfs - Reference implementation of the LTFS format Spec for stand alone tape drive

png-decoder - A pure-Rust, no_std compatible PNG decoder

noplate - generic data structures

stb - stb single-file public domain libraries for C/C++

CodeHawk-C - CodeHawk C Analyzer: sound static analysis of memory safety (undefined behavior)

csharplang - The official repo for the design of the C# programming language

ikos - Static analyzer for C/C++ based on the theory of Abstract Interpretation.

image-png - PNG decoding and encoding library in pure Rust

codeql - CodeQL: the libraries and queries that power security researchers around the world, as well as code scanning in GitHub Advanced Security

highway - Performance-portable, length-agnostic SIMD with runtime dispatch

c2nim - c2nim is a tool to translate Ansi C code to Nim. The output is human-readable Nim code that is meant to be tweaked by hand before and after the translation process.

kandria - A post-apocalyptic actionRPG. Now on Steam!