file-format

Top 23 file-format Open-Source Projects

  • Kaitai Struct

    Kaitai Struct: declarative language to generate binary data parsers in C++ / C# / Go / Java / JavaScript / Lua / Nim / Perl / PHP / Python / Ruby

  • Project mention: Reverse-engineering an encrypted IoT protocol | news.ycombinator.com | 2024-02-14
  • PSD.rb

    Parse Photoshop files in Ruby with ease

  • xit

    A plain-text file format for todos and check lists

  • Project mention: My productivity app is a never-ending .txt file | news.ycombinator.com | 2024-02-19

    I use the same system but with highlighting/formatting of https://xit.jotaen.net

    I even learned how to create a plugin for IntelliJ IDEA and built one for highlighting this format (I love IDEA's hotkeys and workflow).

  • AudioFile

    A simple C++ library for reading and writing audio files.

  • kaitai_struct_formats

    Kaitai Struct: library of binary file formats (.ksy)

  • Project mention: Magika: AI powered fast and efficient file type identification | news.ycombinator.com | 2024-02-15
  • bindata

    BinData - Reading and Writing Binary Data in Ruby

  • klog

    Command line tool for time tracking in a human-readable, plain-text file format. (by jotaen)

  • Project mention: Ask HN: What apps have you created for your own use? | news.ycombinator.com | 2023-12-12

    I came up with a file format for time-tracking, which lets me store the data in plain-text files in a human-friendly notation. I also built a corresponding CLI tool for evaluating the files on the terminal.

    I’ve been using it almost daily for the past couple of years, and so far it has served me quite well.

    Project site / docs: https://klog.jotaen.net

    File spec: https://github.com/jotaen/klog/blob/main/Specification.md

  • rres

    A simple and easy-to-use file-format to package resources

  • matio

    MATLAB MAT File I/O Library

  • uproot5

    ROOT I/O in pure Python and NumPy.

  • Project mention: Potential of the Julia programming language for high energy physics computing | news.ycombinator.com | 2023-12-04

    > I wasn't proposing ROOT to be reimplemented in JS. That was what the GP attributed to me.

    Sorry for assuming that. I really felt the pain of imagining two things I hate so much combined (JS + ROOT).

    > "Laypeople" may also think that code is optimized to the last cycle in something like HEP simulations. It's made fast enough and the optimization is nowhere near the level of e.g. graphics heavy games.

    I understand that other areas may have more sophisticated optimizations, but that does not change much within the HEP community. And the code is optimized not only for simulations but for other things too; it is not a single-problem optimization.

    > Real-time usage like high frequency large data collection will probably never happen on the "single language". But I'd guess ROOT is not used at that level either? Also at least last time I checked, ROOT is moving to Python (probably not for the hottest loops of the simulation though).

    I did not mean to suggest that ROOT is used for online processing (in HEP terms). That is usually handled by optimized, compiled C++ code. My point is that you will probably never use JS or any interpreted language (or, to be pessimistic, anything other than C++) for that. At the end of the day, ROOT is much closer to C++ than to anything else, so the learning curve is not steep if you already know some C++.

    > Also at least last time I checked, ROOT is moving to Python (probably not for the hottest loops of the simulation though).

    I think you mean PyROOT [1]? This is the official Python interface to ROOT. It provides a set of Python bindings to the ROOT C++ libraries, allowing Python scripts to interact directly with ROOT classes and methods as if they were native Python. But that does not represent a rewrite. It does make things easier for end users doing analysis, while remaining efficient in terms of performance, especially for operations that are heavily optimized in ROOT.

    There is also uproot [2], which is a pure-Python reader and writer of ROOT files. It is not part of the official ROOT project and does not depend on the ROOT libraries. Instead, uproot re-implements ROOT's I/O functionality in Python. However, it does not provide an interface to the full range of ROOT functionality. It is particularly useful for integrating ROOT data into a Python-based data analysis pipeline, where libraries such as NumPy, SciPy, Matplotlib, and Pandas are used.

    > Off-topic: C++ interpretation like done in ROOT seems like a really bad idea.)

    I agree with you. But to be fair, the purpose of ROOT is interactive data analysis. Over the decades a lot of things got added, many experiments maintained their own soft forks, and things got messy very quickly. As a result, there is not much momentum to fix problems and introduce improvements.

    [1] https://root.cern/manual/python/

    [2] https://github.com/scikit-hep/uproot5

  • libzim

    Reference implementation of the ZIM specification

  • Project mention: WikiReader | news.ycombinator.com | 2023-12-03

    I meant the Kiwix dump (https://download.kiwix.org/zim/wikipedia_en_all_nopic.zim – careful, 60GB!).

    At first glance, the Wikimedia XML dump does not look substantially different from what Kiwix/ZIM does with compressed HTML: They're both compressed (bz2 for the Wikimedia dump, zstd or LZMA for Kiwix/ZIM), and both compress multiple files at once, so inter-file redundancy should hopefully be significantly reduced.

    HTML seems a bit more verbose than the Mediawiki syntax (plus the XML header for each article), but I'd be surprised if that actually accounted for a 3x difference in size.

    Then again, Kiwix seems to have experimented with shared-dictionary brotli compression, which supposedly yields a >2x improvement: https://github.com/openzim/libzim/issues/144

    I wonder if their current zstd implementation also uses shared dictionaries. If not, that might just be the reason: If ZIM compression chunks are much smaller than the bz2 streams of the Wikimedia dumps, there would still be a lot of redundancy between chunks.
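The point about small chunks and shared dictionaries can be illustrated with a minimal sketch. It uses zlib's preset-dictionary feature from the Python standard library as a stand-in; it is not zstd, brotli, or anything taken from libzim. The sample HTML, the dictionary contents, and the helper names `compress_chunk` and `decompress_chunk` are all hypothetical and chosen only for illustration.

```python
import zlib

# Hypothetical boilerplate that every small chunk repeats.
SHARED_DICT = (
    b'<!DOCTYPE html><html><head><meta charset="utf-8">'
    b'<link rel="stylesheet" href="style.css"></head>'
    b'<body><div class="article">'
)


def compress_chunk(chunk: bytes, shared_dict: bytes = b"") -> bytes:
    """Compress one chunk independently, optionally priming zlib with a dictionary."""
    if shared_dict:
        comp = zlib.compressobj(level=9, zdict=shared_dict)
    else:
        comp = zlib.compressobj(level=9)
    return comp.compress(chunk) + comp.flush()


def decompress_chunk(blob: bytes, shared_dict: bytes = b"") -> bytes:
    """Reverse compress_chunk; the same dictionary must be supplied."""
    if shared_dict:
        decomp = zlib.decompressobj(zdict=shared_dict)
    else:
        decomp = zlib.decompressobj()
    return decomp.decompress(blob) + decomp.flush()


# Each chunk is small, so on its own it cannot exploit redundancy with the others.
chunks = [
    SHARED_DICT + title + b"</div></body></html>"
    for title in (b"Alpha", b"Beta", b"Gamma", b"Delta")
]

plain_size = sum(len(compress_chunk(c)) for c in chunks)
dict_size = sum(len(compress_chunk(c, SHARED_DICT)) for c in chunks)
print("independent chunks:", plain_size, "bytes")
print("with shared dictionary:", dict_size, "bytes")
```

Each chunk stays independently decompressible, which is what a random-access archive needs, but the boilerplate no longer has to be encoded again inside every chunk.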

  • NTRGhidra

    A Nintendo DS binary loader for Ghidra

  • Project mention: Using an Ai like N-bref to speed up decompilation? | /r/REGames | 2023-12-06

    I'm not horribly familiar with how the whole "AI" thing works, but I'd imagine you'd have to have a dataset trained on the ARM9 architecture for it to work. You might be better off starting blind with NTRGhidra and just poking around to see how it works.

  • matroska-specification

    Matroska specification.

  • cgif

    GIF encoder written in C

  • lines-are-rusty

    Rust File API for the reMarkable tablet

  • odict

    A blazingly-fast, offline-first format and toolchain for lexical data 📖

  • fit2gpx

    A simple Python library for converting .FIT files to .GPX files. It also includes tools to convert Strava data downloads in bulk to GPX.

  • RSV-Specification

    Rows of String Values (RSV Data Format) Specification - A Simple Binary Alternative to CSV

  • Project mention: Show HN: Comma Separated Values (CSV) to Unicode Separated Values (USV) | news.ycombinator.com | 2024-03-12

    A similar concept that is (IMHO) much nicer: RSV

    It doesn't need any escaping or quoting: a field just has to be valid UTF-8.

    The trick is that the delimiters are bytes that are invalid UTF-8.

    The spec fits on a napkin, parsing is trivial, you can jump to the middle of a doc and find the nearest row, etc.

    Main downside is you need an editor/viewer that can handle it.

    https://github.com/Stenway/RSV-Specification
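The delimiter trick described above can be sketched in a few lines. This is a minimal illustration rather than a reference implementation: the three byte values are quoted from memory of the linked specification and should be checked against it, and the function names `encode_rsv` and `decode_rsv` are hypothetical.

```python
# Assumed delimiter bytes (verify against the RSV specification).
# None of them can occur inside valid UTF-8, so no escaping is needed.
VALUE_END = b"\xff"   # terminates every value
NULL_VALUE = b"\xfe"  # stands for a null (missing) value
ROW_END = b"\xfd"     # terminates every row


def encode_rsv(rows):
    """Encode a list of rows (lists of str or None) into RSV bytes."""
    out = bytearray()
    for row in rows:
        for value in row:
            out += NULL_VALUE if value is None else value.encode("utf-8")
            out += VALUE_END
        out += ROW_END
    return bytes(out)


def decode_rsv(data):
    """Decode RSV bytes back into a list of rows."""
    if data and not data.endswith(ROW_END):
        raise ValueError("RSV data must end with a row terminator")
    rows = []
    # Every row ends with ROW_END, so the final split element is always empty.
    for raw_row in data.split(ROW_END)[:-1]:
        values = []
        # Likewise, every value ends with VALUE_END.
        for raw in raw_row.split(VALUE_END)[:-1]:
            values.append(None if raw == NULL_VALUE else raw.decode("utf-8"))
        rows.append(values)
    return rows


rows = [["Hello", "🌎", None, ""], ["a,b", "line\nbreak"], []]
assert decode_rsv(encode_rsv(rows)) == rows
```

Commas, quotes, and newlines pass through untouched because a plain byte split is enough to find field and row boundaries.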

  • vach

    A simple archiving format, designed for storing assets in compact and secure containers

  • GuitarGame_ChartFormats

    A repository of documentation for chart files of guitar- or band-related rhythm games such as Guitar Hero and Rock Band

  • Project mention: Rhythm Games Resources | /r/gamedesign | 2023-06-19

    Some starting points:

      * Guitar Hero Charting Formats
      * osu! File Docs

  • mimesniffer

    A MIME type sniffer for Go.

  • JAPM

    Just Another PBO Manager: An Arma3 PBO Manager

  • hPDB

    PDB parser in Haskell

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source file-format projects? This list will help you:

 #  Project                   Stars
 1  Kaitai Struct             3,828
 2  PSD.rb                    3,123
 3  xit                       1,029
 4  AudioFile                   890
 5  kaitai_struct_formats       682
 6  bindata                     572
 7  klog                        515
 8  rres                        326
 9  matio                       324
10  uproot5                     217
11  libzim                      158
12  NTRGhidra                   150
13  matroska-specification      120
14  cgif                        101
15  lines-are-rusty              78
16  odict                        78
17  fit2gpx                      76
18  RSV-Specification            56
19  vach                         51
20  GuitarGame_ChartFormats      38
21  mimesniffer                  32
22  JAPM                         27
23  hPDB                         20
