squashfs-tools-ng
llvm-project
Metric | squashfs-tools-ng | llvm-project
---|---|---
Mentions | 7 | 349
Stars | 187 | 25,563
Growth | - | 4.0%
Activity | 8.0 | 10.0
Latest commit | about 1 month ago | 5 days ago
Language | C | C++
License | GNU General Public License v3.0 or later | Apache License 2.0 with LLVM Exceptions
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
squashfs-tools-ng
-
C Strings and my slow descent to madness
... except that that is also subtly broken.
It works if you write multiple UTF-8 code units in one go, but breaks if you send them in several writes, or if you use the ANSI API (the functions with the A suffix). Guess what the Windows implementation of stdio (printf and friends) does.
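If you control the write calls yourself, one way to avoid splitting a code point across writes is to hold back a trailing, incomplete UTF-8 sequence until the rest of it arrives. This is a minimal sketch that assumes valid UTF-8 input; the function names are hypothetical, and it does nothing about stdio implementations that split the output on their own.

```c
#include <stddef.h>

/* Length of the UTF-8 sequence introduced by a lead byte. */
static size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80)
        return 1;               /* 0xxxxxxx: ASCII */
    if ((lead & 0xE0) == 0xC0)
        return 2;               /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0)
        return 3;               /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0)
        return 4;               /* 11110xxx */
    return 1;                   /* invalid lead byte: pass it through */
}

/* Number of leading bytes of buf that form complete UTF-8 sequences.
   Anything after that is a cut-off code point that should be kept back
   and prepended to the next chunk instead of being written now. */
size_t utf8_complete_prefix(const unsigned char *buf, size_t len)
{
    size_t back = 0;

    /* step back over up to 3 trailing continuation bytes (10xxxxxx) */
    while (back < len && back < 3 && (buf[len - 1 - back] & 0xC0) == 0x80)
        back++;

    if (back == len)
        return len;             /* no lead byte found: not valid UTF-8 */

    size_t lead = len - 1 - back;
    if (utf8_seq_len(buf[lead]) > back + 1)
        return lead;            /* last code point is incomplete */
    return len;
}
```

A write wrapper would call this on each chunk, write the first `n` bytes in one call, and carry the remaining (at most three) bytes over to the next call.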
I already had some fun with this: https://github.com/AgentD/squashfs-tools-ng/issues/96#issuec...
And we didn't even discuss command line argument passing yet :-)
I tried to test it with the only two languages I know besides English: German and Mandarin, the latter specifically because it requires multi-byte characters to work. Getting this to work at all in a Windows terminal on an existing German Windows 7 installation was an adventure of its own.
Turns out, trying to write language-agnostic command line applications on Windows is a PITA.
-
Getting the maximum of your C compiler, for security
IIRC -fanalyzer is a fairly recent addition to gcc. Has it become reasonably usable yet?
I recall getting a bit excited when I first read about it, but the results I got were rather bizarre (e.g. every single function that allocated memory and returned a pointer to it was flagged as leaking memory).
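For reference, this is the kind of ownership-transferring function meant above (a hypothetical example, not taken from any of the linked code bases). The caller is responsible for freeing the result, so a correct analysis should only report a leak where some caller drops the pointer, not in the function itself. Compiling it with `gcc -fanalyzer -c` shows what a given gcc version reports; I don't know whether current versions still flag it.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of s. Ownership passes to the caller,
   who must free() it, so returning the pointer is not a leak. */
char *copy_string(const char *s)
{
    size_t len = strlen(s) + 1;     /* include the terminating NUL */
    char *out = malloc(len);

    if (out != NULL)
        memcpy(out, s, len);
    return out;                     /* NULL if the allocation failed */
}
```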
I did the fun exercise myself once of riffling through the gcc man page, cobbling together warning flags and massaging them into autoconf[1][2].
There is a very handy m4 script in the util-linux source for testing supported warning flags[3].
[1] https://git.infradead.org/mtd-utils.git/blob/HEAD:/configure...
[2] https://github.com/AgentD/squashfs-tools-ng/blob/master/conf...
[3] https://github.com/karelzak/util-linux/blob/master/m4/compil...
-
Squashfs turning 20, Squashfs tools 4.5 released
> Honestly I think you could be a little more respectful of the project that inspired yours.
I do. I had a lot of great "Huh? That's clever!" moments while reverse engineering the format and formed a mental image of a clearly brilliant programmer who managed to squeeze the last bits out of some data structures using really clever tricks that I myself probably wouldn't have come up with. During that time I gained a lot of respect for the project and the author.
Also, please don't forget: the whole project is the filesystem; the tools are just a part of that. I care about this project, which is why I decided to start this effort in the first place, and I explicitly did not advertise it as a replacement, but as an augmentation (see [2]).
> I'd be angry too ... Definitely understandable.
Yes, I agree! And I can understand why, in the heat of the moment, you might write something angry and threatening. But certainly not if you've had a few weeks' time to calm down and think things over.
> And you plagiarized part of his readme.
https://github.com/plougher/squashfs-tools/blob/master/RELEA...
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...
https://github.com/AgentD/squashfs-tools-ng/blob/master/READ...
Oh yes? Which part?
> ... calling it spaghetti code (which isn't immediately verifiable)
Here you go, have fun: https://github.com/plougher/squashfs-tools/blob/master/squas...
However, I cannot blame anyone here. I totally get how these things happen and have witnessed it in action myself:
You write a simple tool supporting a larger project. It's written by the seat of your pants without much planning, since it's not big and does one simple job. Then it gets used in production, requirements eventually change, and other people pile on patches while trying to keep each diff small so it stays reviewable. The tool receives maybe a little less care than the actual project it supports. Nobody bothers to overhaul it or write documentation because, hey, it works, and any large change might risk breaking things.
Even if nobody is to blame for it, the end result is still the same: an undocumented mess that is hard to wrap your head around if you aren't the original author, who is the only one with the bigger picture.
I tried for roughly a week to pull the code (there are some more files than this, and some of the interdependencies are nasty) apart into stacked utility libraries and a pure command line parsing front end, in the hope of maybe getting this upstream once it was done. I gave up and decided that at this point I understood enough about the format to start afresh and not touch what I believed to be an unmaintained mess.
-
The Byte Order Fiasco
FWIW, various BSDs have a header that contains "beXXtoh", "leXXtoh", "htobeXX" and "htoleXX", where XX is a number of bits (16, 32, 64).
That header is also available on Linux, but glibc (and compatible libraries) ship it under a different name.
See: man 3 endian (https://linux.die.net/man/3/endian)
Of course it gets a bit hairier if the code is also supposed to run on other systems.
MacOS instead has OSSwapHostToLittleIntXX, OSSwapLittleToHostIntXX, OSSwapHostToBigIntXX and OSSwapBigToHostIntXX.
I'm not sure if Windows has something similar, or if it even supports running on big endian machines (if you know, please tell).
My solution for achieving some portability currently entails cobbling together a "compat.h" header that defines macros for the MacOS functions and including the right headers. Something like this:
https://github.com/AgentD/squashfs-tools-ng/blob/master/incl...
This is usually my go-to solution for working with low-level on-disk or on-the-wire binary data structures that demand a specific endianness. In C I use "load/store" style functions that memcpy the data from a buffer into a struct instance and do the endian swapping. The copying is also necessary because the struct in the buffer may not be properly aligned.
In C++, all of this can of course be neatly stowed away in a special class with overloaded operators that transparently takes care of everything. After compilation it "decays" into a single integer and exactly the same code as the C version, but IMO it is somewhat cleaner to read and adds much-needed type safety.
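A minimal sketch of the C load/store style described above, assuming a hypothetical little-endian on-disk header (`struct disk_header` and all helper names are made up for illustration). Instead of the platform macros, the byte conversion here is done with shifts on the raw bytes, which gives the right result on any host and sidesteps the header-portability question; compilers often recognize this pattern and emit a plain load or byte swap, but that is worth checking for your target.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical on-disk header; every field is stored little endian. */
struct disk_header {
    uint32_t magic;
    uint16_t version;
    uint16_t flags;
};
/* memcpy'ing the whole struct only works if it has no padding. */
_Static_assert(sizeof(struct disk_header) == 8, "unexpected padding");

/* Reinterpret the raw bytes of v (as copied from disk) as little endian. */
static uint32_t le32_to_host(uint32_t v)
{
    unsigned char b[4];
    memcpy(b, &v, sizeof(b));
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

static uint16_t le16_to_host(uint16_t v)
{
    unsigned char b[2];
    memcpy(b, &v, sizeof(b));
    return (uint16_t)(b[0] | (b[1] << 8));
}

/* Produce a value whose in-memory bytes are the little-endian encoding of v. */
static uint32_t host_to_le32(uint32_t v)
{
    unsigned char b[4] = {
        (unsigned char)(v & 0xFF), (unsigned char)((v >> 8) & 0xFF),
        (unsigned char)((v >> 16) & 0xFF), (unsigned char)((v >> 24) & 0xFF)
    };
    uint32_t out;
    memcpy(&out, b, sizeof(out));
    return out;
}

static uint16_t host_to_le16(uint16_t v)
{
    unsigned char b[2] = { (unsigned char)(v & 0xFF), (unsigned char)(v >> 8) };
    uint16_t out;
    memcpy(&out, b, sizeof(out));
    return out;
}

/* buf may be unaligned, so copy instead of casting the pointer. */
void disk_header_load(struct disk_header *hdr, const void *buf)
{
    memcpy(hdr, buf, sizeof(*hdr));
    hdr->magic = le32_to_host(hdr->magic);
    hdr->version = le16_to_host(hdr->version);
    hdr->flags = le16_to_host(hdr->flags);
}

void disk_header_store(void *buf, const struct disk_header *hdr)
{
    struct disk_header tmp;

    tmp.magic = host_to_le32(hdr->magic);
    tmp.version = host_to_le16(hdr->version);
    tmp.flags = host_to_le16(hdr->flags);
    memcpy(buf, &tmp, sizeof(tmp));
}
```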
-
Tar is an ill-specified format
I once foolishly thought I'd write a tar parser, because "how hard can it be?"[1].
I simply tried to follow the tar(5) man page[2], and got a reference test set from another website posted previously on HN[3].
Along the way I discovered that NetBSD pax apparently cannot handle the PAX format[3], and my parser inadvertently uncovered that git-archive was doing the checksums wrong, which nobody had noticed because other tar parsers were more lax about it[4].
As the article describes (as does the man page), tar is actually a really simple format, but there are just so many variants to choose from.
Turns out, if you strive for maximum compatibility, it's easiest to stick to what GNU tar does. If you think about it, IMO in many ways the GNU project ended up doing "embrace, extend, extinguish" with Unix.
[1] https://github.com/AgentD/squashfs-tools-ng/tree/master/lib/...
[2] https://www.freebsd.org/cgi/man.cgi?query=tar&sektion=5
[3] https://mgorny.pl/articles/portability-of-tar-features.html
[4] https://www.spinics.net/lists/git/msg363049.html
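The checksum part of the format is small enough to sketch. Under the POSIX ustar rules (also described in tar(5)), the header checksum is the simple sum of all 512 header bytes taken as unsigned values, with the 8-byte checksum field at offset 148 counted as if it were filled with ASCII spaces, and it is stored as an octal number. The function names below are hypothetical, and the octal parsing is deliberately lenient (leading spaces, then digits terminated by NUL or space).

```c
#include <stddef.h>

#define TAR_HEADER_SIZE   512
#define TAR_CHKSUM_OFFSET 148   /* chksum field of the ustar header */
#define TAR_CHKSUM_LEN    8

/* Simple sum of all header bytes as unsigned values; the checksum
   field itself is counted as if it held eight ASCII spaces. */
unsigned int tar_header_checksum(const unsigned char *hdr)
{
    unsigned int sum = 0;

    for (size_t i = 0; i < TAR_HEADER_SIZE; i++) {
        if (i >= TAR_CHKSUM_OFFSET && i < TAR_CHKSUM_OFFSET + TAR_CHKSUM_LEN)
            sum += ' ';
        else
            sum += hdr[i];
    }
    return sum;
}

/* Parse the stored octal checksum and compare it to the computed one. */
int tar_checksum_ok(const unsigned char *hdr)
{
    const unsigned char *field = hdr + TAR_CHKSUM_OFFSET;
    unsigned int stored = 0;
    size_t i = 0;

    while (i < TAR_CHKSUM_LEN && field[i] == ' ')
        i++;                                    /* skip leading spaces */
    if (i == TAR_CHKSUM_LEN || field[i] < '0' || field[i] > '7')
        return 0;                               /* no digits at all */
    while (i < TAR_CHKSUM_LEN && field[i] >= '0' && field[i] <= '7')
        stored = stored * 8 + (unsigned int)(field[i++] - '0');
    if (i < TAR_CHKSUM_LEN && field[i] != '\0' && field[i] != ' ')
        return 0;                               /* garbage after the digits */
    return stored == tar_header_checksum(hdr);
}
```

A stricter reader rejects any header where these values disagree, which is exactly the kind of check that lets a wrong checksum go unnoticed when every other reader is lax.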
-
LZ4, an Extremely Fast Compression Algorithm
A while ago I did some simplistic SquashFS pack/unpack benchmarks[1][2]. I was primarily interested in looking at the behavior of my thread-pool based packer, but as a side effect I got a comparison of compressor speed & ratios over the various available compressors for my Debian test image.
I must say that LZ4 definitely stands out for both compression and decompression speed, while still being able to cut the data size in half, making it probably quite suitable for live filesystems and network protocols. Also particularly interesting was comparing Zstd and LZ4[3]: the former is substantially slower, but achieves a compression ratio somewhere between zlib and xz while beating both in time (in my benchmark at least).
[1] https://github.com/AgentD/squashfs-tools-ng/blob/master/doc/...
[2] https://github.com/AgentD/squashfs-tools-ng/blob/master/doc/...
[3] https://github.com/AgentD/squashfs-tools-ng/blob/master/doc/...
llvm-project
-
Ask HN: Which books/resources to understand modern Assembler?
"Computer Architecture: A Quantitative Approach" and/or books on more specific designs (MIPS, ARM, etc.) can be found in the Morgan Kaufmann Series in Computer Architecture and Design.
"Getting Started with LLVM Core Libraries: Get to Grips With LLVM Essentials and Use the Core Libraries to Build Advanced Tools"
"The Architecture of Open Source Applications (Volume 1) : LLVM" https://aosabook.org/en/v1/llvm.html
"Tourist Guide to LLVM source code" : https://blog.regehr.org/archives/1453
llvm home page : https://llvm.org/
llvm tutorial : https://llvm.org/docs/tutorial/
llvm reference : https://llvm.org/docs/LangRef.html
learn by examples : C source code to 'llvm' bitcode : https://stackoverflow.com/questions/9148890/how-to-make-clan...
-
Flang-new: How to force arrays to be allocated on the heap?
See
https://github.com/llvm/llvm-project/issues/88344
https://fortran-lang.discourse.group/t/flang-new-how-to-forc...
- The LLVM Compiler Infrastructure
-
Programming from Top to Bottom - Parsing
You can never mistake a type_declaration for an identifier, otherwise the program will not work. Aside from that constraint, you are free to name them whatever you like; there is no single standard, and each parser has its own naming conventions, unless you are planning to use something like LLVM. If you are interested, you can see examples of naming in different language parsers in the AST Explorer.
-
Look ma, I wrote a new JIT compiler for PostgreSQL
> There is one way to make the LLVM JIT compiler more usable, but I fear it’s going to take years to be implemented: being able to cache and reuse compiled queries.
Actually, it has been implemented in LLVM for years :) https://github.com/llvm/llvm-project/commit/a98546ebcd2a692e...
-
C++ Safety, in Context
> It's true, this was a CVE in Rust and not a CVE in C++, but only because C++ doesn't regard the issue as a problem at all. The problem definitely exists in C++, but it's not acknowledged as a problem, let alone fixed.
Can you find a link that substantiates your claim? You're throwing out some heavy accusations here that don't seem to match reality at all.
Case in point, this was fixed in both major C++ libraries:
https://github.com/gcc-mirror/gcc/commit/ebf6175464768983a2d...
https://github.com/llvm/llvm-project/commit/4f67a909902d8ab9...
So which C++ community refused to regard this as an issue and refused to fix it? Where is your supporting evidence for your claims?
-
Clang accepts MSVC arguments and targets Windows if its binary is named clang-cl
For everyone else looking for the magic in this almost 7k-line monster, look at line 6610 [1].
[1] https://github.com/llvm/llvm-project/blob/8ec28af8eaff5acd0d...
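The general technique, choosing behavior from the name the binary was invoked as (`argv[0]`), fits in a few lines. This is a simplified, hypothetical sketch and not clang's code: the enum and function names are made up, and clang's real driver recognizes more program names and target prefixes than this.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical driver modes for this sketch. */
enum driver_mode { MODE_GCC_LIKE, MODE_CL_LIKE };

/* Pick a mode from the invocation name: strip any directory part
   (either separator) and a trailing ".exe", then check whether the
   remaining name ends in "clang-cl". */
enum driver_mode mode_from_argv0(const char *argv0)
{
    static const char suffix[] = "clang-cl";
    const size_t suffix_len = sizeof(suffix) - 1;
    const char *name = argv0;

    for (const char *p = argv0; *p != '\0'; p++) {
        if (*p == '/' || *p == '\\')
            name = p + 1;
    }

    size_t len = strlen(name);
    if (len >= 4 && strcmp(name + len - 4, ".exe") == 0)
        len -= 4;   /* ignore the Windows executable suffix */

    if (len >= suffix_len &&
        strncmp(name + len - suffix_len, suffix, suffix_len) == 0)
        return MODE_CL_LIKE;
    return MODE_GCC_LIKE;
}
```

The same binary can then be installed once and exposed under several names via copies or symlinks, with `main` calling something like `mode_from_argv0(argv[0])` before parsing any options.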
-
Rewrite the VP9 codec library in Rust
Through value tracking. It's actually LLVM that does this (GCC probably does it as well), so in theory explicit bounds checks in regular C code would also be removed by the compiler.
How it works exactly I don't know, and apparently it's so complex that it requires over 9000 lines of C++ to express:
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Anal...
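As a hypothetical illustration (the function is made up) of a check that this kind of analysis can make redundant: inside the loop `i < limit` and `limit <= len`, so `i >= len` can never be true. A compiler that tracks these value ranges can typically delete the branch and the `abort()` call; whether a particular compiler does so at a particular optimization level is best checked in the generated assembly.

```c
#include <stddef.h>
#include <stdlib.h>

/* Sums the first n elements of buf, clamped to its length len. */
long sum_prefix(const int *buf, size_t len, size_t n)
{
    size_t limit = n < len ? n : len;   /* limit <= len */
    long total = 0;

    for (size_t i = 0; i < limit; i++) {
        /* Explicit bounds check. Since i < limit <= len, this branch is
           never taken, so an optimizer may prove it dead and remove it. */
        if (i >= len)
            abort();
        total += buf[i];
    }
    return total;
}
```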
-
Fortran 2023
https://github.com/llvm/llvm-project/blob/main/flang/docs/F2...
-
MiniScript Ports
• Go • Rust • Lua • pure C (sans C++) • 6502 assembly • WebAssembly • compiler backends, like LLVM or Cranelift
What are some alternatives?
squashfs-tools - tools to create and extract Squashfs filesystems
zig - General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
7-Zip-zstd - 7-Zip with support for Brotli, Fast-LZMA2, Lizard, LZ4, LZ5 and Zstandard
Lark - Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
dracut - dracut the event driven initramfs infrastructure
gcc
zfs - OpenZFS on Linux and FreeBSD
SDL - Simple Directmedia Layer
genext2fs - genext2fs - ext2 filesystem generator for embedded systems
cosmopolitan - build-once run-anywhere c library
zstd - Zstandard - Fast real-time compression algorithm
windmill - Open-source developer platform to turn scripts into workflows and UIs. Fastest workflow engine (5x vs Airflow). Open-source alternative to Airplane and Retool.