Kaitai Struct
Camelot
Metric | Kaitai Struct | Camelot |
---|---|---|
Mentions | 44 | 10 |
Stars | 3,828 | 2,631 |
Growth | 1.6% | 3.1% |
Activity | 7.5 | 6.9 |
Last commit | 8 days ago | 21 days ago |
Language | Shell | Python |
License | GPL-3.0-or-later | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Kaitai Struct
- Reverse-engineering an encrypted IoT protocol
- Parsing an Undocumented File Format
- ImHex [2], which has a pattern language [3] which allows parsing, and it seems more powerful than what Kaitai offers. I stumbled upon some limitations with it but it was still useful.
[1]: https://kaitai.io/
- Kaitai Struct – a declarative language used to describe binary data structures
- HTTPie Desktop: cross-platform API testing client for humans
Beautiful. Didn't know something like this exists. Reminds me of Kaitai [0]
[0]. https://kaitai.io/
- Hacking the LG Monitor's EDID
An EDID override like this would be helpful for macOS as well, where the monitors swapping around after standby is a real annoyance [0] [1]
EDID rewrites are 99% of the time blocked by the monitor firmware: https://notes.alinpanaitiu.com/Decoding-monitor-EDID-on-macO...
By the way, one tool that helped me navigate the EDID dump was Kaitai Struct [2]. It shows the hex view side by side with the EDID structure and highlights the hex values in real time as you navigate the structure. Unfortunately, it doesn't support the extension blocks that the author needs [3].
[0] https://notes.alinpanaitiu.com/Weird-monitor-bugs
[1] https://forums.macrumors.com/threads/external-displays-swapp...
[2] https://kaitai.io/
[3] https://github.com/kaitai-io/edid.ksy
- Kaitai Struct: new way to develop parsers for binary structures
- Fq: Jq for Binary Formats
Kaitai Struct might be a good choice for that: https://kaitai.io/
- Ingesting, parsing and making sense of device log data
For binary log formats, there's the excellent Kaitai Struct framework, which makes it very easy to generate parsers from a declarative schema.
- What is this tool? More info in comments
kaitai
- Visual Programming with Elixir: Learning to Write Binary Parsers (2019)
https://kaitai.io/
Worth a look if you are writing binary parsers.
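To make the "parser from a declarative schema" idea mentioned in the comments above concrete, here is a minimal Python sketch using only the standard `struct` module. It is not Kaitai Struct's API and not `.ksy` syntax; the record layout, `RECORD_SCHEMA`, and `parse_record` are made-up names for illustration only.

```python
import struct

# Hypothetical schema for a made-up binary log record (not a real format).
# Each entry is (field name, struct format code).
RECORD_SCHEMA = [
    ("magic", "4s"),     # 4-byte signature
    ("version", "B"),    # unsigned 8-bit integer
    ("flags", "B"),      # unsigned 8-bit integer
    ("length", "H"),     # unsigned 16-bit integer
    ("timestamp", "I"),  # unsigned 32-bit integer
]


def parse_record(data, schema=RECORD_SCHEMA, byte_order="<"):
    """Decode one fixed-size record described by `schema` into a dict."""
    fmt = byte_order + "".join(code for _, code in schema)
    size = struct.calcsize(fmt)
    if len(data) < size:
        raise ValueError(f"need {size} bytes, got {len(data)}")
    values = struct.unpack(fmt, data[:size])
    return dict(zip((name for name, _ in schema), values))


# Build a sample record in little-endian order and decode it again.
sample = struct.pack("<4sBBHI", b"LOG1", 2, 0, 16, 1700000000)
record = parse_record(sample)
print(record)
```

The schema is plain data, so changing the format means editing the list rather than the parsing code. Tools like Kaitai Struct take this much further, generating parsers for several target languages from one description.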
Camelot
- Show HN: How do you OCR on a Mac using the CLI or just Python for free
I had good repeated success extracting tables from PDFs using Camelot (Python, https://github.com/camelot-dev/camelot)
- How to query the table part of PDF
I found this today and it is working well https://github.com/camelot-dev/camelot
- Camelot: DeprecationError: PdfFileReader is deprecated and was removed in PyPDF2 3.0.0. Use PdfReader instead.
Here is the corresponding bug report on GitHub: https://github.com/camelot-dev/camelot/issues/339
- HELP! Data Prep from PDF file having Tabular Data?
Try https://github.com/camelot-dev/camelot. It seems like it should work for your case.
- Camelot VS ExtractTable-py - a user suggested alternative
2 projects | 2 Feb 2022
- Need help with indexing pdf tables in python
- exporting handwritten dataset as text, export it and use it as a csv
Yeah, I’m pretty sure the Remarkable OCR is not up to these kinds of tasks unfortunately. If you know some coding you could write something that’d likely work well in Python using for ex. this for receiving the mail attachment and this for converting the PDF to CSV. This is in case you’d write your data as a table on the Remarkable, which I guess is preferable to writing something like (0.5, 8.4, -0.3). If you’d rather do it that way, there are other more suitable OCR tools like this one. The checkbox use-case in the comment above would also be possible by modifying this approach. DM if you’d like to discuss further work.
- Camelot: PDF Table Extraction for Humans
- Show HN: I made a tool to convert images of tables to CSV
Looks like it's a bit in-progress: https://github.com/camelot-dev/camelot/pull/209
"Update docs" isn't checked, and that's what I was going on.
What are some alternatives?
Protobuf - Protocol Buffers - Google's data interchange format
image-table-ocr - Turn images of tables into CSV data. Detect tables from images and run OCR on the cells.
csvkit - A suite of utilities for converting to and working with CSV, the king of tabular file formats.
PyPDF2 - A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
tablib - Python Module for Tabular Datasets in XLS, CSV, JSON, YAML, &c.
PDFMiner - Python PDF Parser (Not actively maintained). Check out pdfminer.six.
pdftabextract - A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.
rizin - UNIX-like reverse engineering framework and command-line toolset.
pytesseract - A Python wrapper for Google Tesseract
PyYAML
WeasyPrint - The awesome document factory
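
For the PDF-to-CSV workflow that several of the Camelot comments above describe, a minimal sketch might look like the following. It assumes Camelot is installed and that the PDF contains text-based (not scanned) tables; `csv_name` and `pdf_tables_to_csv` are hypothetical helper names, and the output naming scheme is an arbitrary choice.

```python
from pathlib import Path


def csv_name(pdf_path, index):
    """Hypothetical naming helper: report.pdf, table 0 -> report-table-0.csv."""
    return f"{Path(pdf_path).stem}-table-{index}.csv"


def pdf_tables_to_csv(pdf_path, out_dir=".", pages="1"):
    """Extract every table Camelot finds on `pages` and write one CSV per table."""
    import camelot  # third-party; imported lazily so this module loads without it

    tables = camelot.read_pdf(str(pdf_path), pages=pages)
    written = []
    for i, table in enumerate(tables):
        target = Path(out_dir) / csv_name(pdf_path, i)
        table.to_csv(str(target))  # each extracted table can export itself as CSV
        written.append(target)
    return written
```

Calling `pdf_tables_to_csv("report.pdf", pages="1-3")` would write one CSV file per detected table on those pages. Each table is also available as a pandas DataFrame through `table.df` if you want to clean the data before exporting.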