blog
jbig2dec
blog | jbig2dec | |
---|---|---|
2 | 2 | |
0 | 35 | |
- | - | |
5.3 | 2.8 | |
about 1 month ago | about 2 months ago | |
C | | |
- | GNU General Public License v3.0 or later | |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
blog
- Ask HN: What rabbit hole(s) did you dive into recently?
Reverse engineering Android apps. I wrote a bit about it in [0]. Over the weekend I also started on another one. It's interesting to see how these apps behave.
[0] https://github.com/benhamad/blog/blob/main/2024-04-12-dramal...
- Reverse engineering an illegal IPTV application on the Google Play Store
jbig2dec
- Ask HN: What rabbit hole(s) did you dive into recently?
> The worst offender (so far) is the JBIG2 format (several major libraries, including jbig2dec), a very popular format that gets EXTREMELY high compression ratios on bilevel images of types typical to scanned pdfs. But: it's also a format that's pretty slow to decompress—not something you want in a UI loop, like a PDF reader is! And, there's no way around that—if you look at the hot loop, which is arithmetic coding, it's a mess of highly branchy code that's purely serial and cannot be thread- nor SIMD- parallelized.
Looking at the jbig2dec code, there appears to be some room for improvement. If my observations are correct, each segment has its own arithmetic decoder state, and thus can be decoded in its own thread. The main reader loop[1] is basically a state machine which attempts to load each segment in sequence[2], but it should not need to. The file has segment headers, which contain the segments' offsets and sizes. It should be possible to first read these headers, then spawn N threads to decode N segments in parallel. Obviously, you don't want the threads competing for the file resource, so you could load each segment into its own buffer first, or mmap the whole file into memory.
[1]:https://github.com/ArtifexSoftware/jbig2dec/blob/master/jbig...
[2]:https://github.com/ArtifexSoftware/jbig2dec/blob/master/jbig...
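A rough sketch of the "read the headers first, then fan out" idea, in Python rather than C to keep it short. Everything here is an assumption made for illustration: the container layout (a count followed by an offset/length table) is a toy format and not the real JBIG2 segment header syntax, and `decode_segment` uses `zlib.decompress` as a stand-in for an actual segment decoder. It also ignores the fact that real JBIG2 segments can refer to other segments (a text region that uses a symbol dictionary, for example), so a real scheduler would have to decode those dependencies first.

```python
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor

# Toy container layout (hypothetical, NOT the JBIG2 file format):
#   uint32 segment count, then one (uint32 offset, uint32 length) per segment,
#   then the segment payloads back to back.
HEADER = struct.Struct(">I")
ENTRY = struct.Struct(">II")


def pack_container(payloads):
    """Build a toy container so the example has something to decode."""
    blobs = [zlib.compress(p) for p in payloads]
    offset = HEADER.size + ENTRY.size * len(blobs)
    table = bytearray(HEADER.pack(len(blobs)))
    for blob in blobs:
        table += ENTRY.pack(offset, len(blob))
        offset += len(blob)
    return bytes(table) + b"".join(blobs)


def read_segment_table(data):
    """Pass 1: read only the header table, no payload is touched."""
    (count,) = HEADER.unpack_from(data, 0)
    return [ENTRY.unpack_from(data, HEADER.size + i * ENTRY.size)
            for i in range(count)]


def decode_segment(buf):
    """Stand-in decoder; a real one would run the arithmetic decoder here."""
    return zlib.decompress(buf)


def decode_all(data, workers=4):
    """Pass 2: hand each segment its own zero-copy slice and decode in parallel."""
    view = memoryview(data)  # plays the role of the mmap'd file
    slices = [view[off:off + length] for off, length in read_segment_table(data)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(decode_segment, slices))  # results keep segment order
```

Each worker only ever sees its own slice, so nothing contends for a shared file position. Whether this buys anything in practice depends on how many independent segments a page really has; a page stored as one big generic region would still decode on a single thread.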
- MuPDF WASM Viewer Demo
I still haven't found a tolerably fast PDF reader, and I'm permanently miserable with that file format. The example in OP works great, but that's only an "easy, modern" PDF made up of text. There's still nothing adequate (mupdf/mutool included) for the common case of scanned-page PDFs.
The root problem isn't an easy performance fix: it's that a very popular PDF image compression format, JBIG2 [0], is, unlike modern formats, slow in decompression as well as compression. Here's the decompress hot loop [1,2] from libjbig2dec.so, which MuPDF calls out to. It isn't thread- or SIMD-parallelized, and I suspect that it isn't possible at all. There's just no easy way forwards in the near future, other than "buy faster CPUs".
[0] https://en.wikipedia.org/wiki/JBIG2
[1] https://github.com/ArtifexSoftware/jbig2dec/blob/master/jbig...
[2] https://github.com/ArtifexSoftware/jbig2dec/blob/master/jbig...
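To make the "purely serial" point concrete, here is a small adaptive binary arithmetic coder. It is an LZMA-style range coder, swapped in because it is compact, and not the MQ coder that JBIG2 actually specifies; the 2-bit context built from previously decoded bits is likewise just an illustrative stand-in for JBIG2's pixel-neighbourhood templates. The shape of the decode loop is the part that matters: every iteration needs `code`, `rng`, `probs` and `ctx` as left behind by the previous one, so bit k+1 cannot start until bit k is known.

```python
PROB_BITS = 11             # probabilities are 11-bit fixed point
PROB_INIT = 1 << 10        # start every context at p(bit == 0) = 0.5
MOVE_BITS = 5              # adaptation rate
TOP = 1 << 24              # renormalise once range falls below this
MASK32 = 0xFFFFFFFF


def adapt(probs, ctx, bit):
    """Nudge the context's estimate of p(0) towards the bit just coded."""
    if bit == 0:
        probs[ctx] += ((1 << PROB_BITS) - probs[ctx]) >> MOVE_BITS
    else:
        probs[ctx] -= probs[ctx] >> MOVE_BITS


class Encoder:
    def __init__(self):
        self.low, self.range = 0, MASK32
        self.cache, self.cache_size = 0, 1
        self.out = bytearray()

    def _shift_low(self):
        # Emit the top byte of low, pushing any carry into the pending bytes.
        if self.low < 0xFF000000 or self.low > MASK32:
            carry, byte = self.low >> 32, self.cache
            for _ in range(self.cache_size):
                self.out.append((byte + carry) & 0xFF)
                byte = 0xFF
            self.cache_size = 0
            self.cache = (self.low >> 24) & 0xFF
        self.cache_size += 1
        self.low = (self.low & 0x00FFFFFF) << 8

    def encode(self, probs, ctx, bit):
        bound = (self.range >> PROB_BITS) * probs[ctx]
        if bit == 0:
            self.range = bound
        else:
            self.low += bound
            self.range -= bound
        adapt(probs, ctx, bit)
        while self.range < TOP:
            self.range = (self.range << 8) & MASK32
            self._shift_low()

    def finish(self):
        for _ in range(5):
            self._shift_low()
        return bytes(self.out)


def encode_bits(bits, order=2):
    probs, enc, ctx = [PROB_INIT] * (1 << order), Encoder(), 0
    for bit in bits:
        enc.encode(probs, ctx, bit)
        ctx = ((ctx << 1) | bit) & ((1 << order) - 1)
    return enc.finish()


def decode_bits(data, count, order=2):
    probs = [PROB_INIT] * (1 << order)
    stream = iter(data)
    code, rng, ctx, out = 0, MASK32, 0, []
    for _ in range(5):                       # prime the code register
        code = ((code << 8) | next(stream, 0)) & MASK32
    for _ in range(count):
        # Reads rng, code, probs[ctx] and ctx written by the previous iteration.
        bound = (rng >> PROB_BITS) * probs[ctx]
        bit = 0 if code < bound else 1       # data-dependent branch
        if bit == 0:
            rng = bound
        else:
            code -= bound
            rng -= bound
        adapt(probs, ctx, bit)
        while rng < TOP:                     # data-dependent byte fetch
            rng = (rng << 8) & MASK32
            code = ((code << 8) | next(stream, 0)) & MASK32
        ctx = ((ctx << 1) | bit) & ((1 << order) - 1)
        out.append(bit)
    return out
```

A round trip such as `decode_bits(encode_bits(bits), len(bits))` gives back the original list. There is no way to jump into the middle of the byte stream and start decoding, because the position in the stream and the probability table at that point are themselves outputs of everything decoded before it.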