parquet-format
mdBook
| | parquet-format | mdBook |
|---|---|---|
| Mentions | 4 | 101 |
| Stars | 1,655 | 16,802 |
| Growth | 1.8% | 2.2% |
| Activity | 7.2 | 8.6 |
| Latest commit | 5 days ago | 9 days ago |
| Language | Thrift | Rust |
| License | Apache License 2.0 | Mozilla Public License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
parquet-format
- Summing columns in remote Parquet files using DuckDB
Right, there's all sorts of metadata and often stats included in any parquet file: https://github.com/apache/parquet-format#file-format
The offsets of said metadata are well-defined (i.e. in the footer), so for S3 / blob storage, as long as you can efficiently request a range of bytes, you can pull the metadata without having to read all the data.
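The footer lookup described above can be sketched with two ranged reads. Per the linked file-format description, a Parquet file ends with the Thrift-encoded FileMetaData, then a 4-byte little-endian length of that metadata, then the 4-byte magic `PAR1`. This is a minimal sketch, not DuckDB's implementation: `read_parquet_footer` and the `read_range(offset, length)` callback are hypothetical names, and decoding the returned Thrift bytes is left to a real Parquet library.

```python
import struct
from typing import Callable

MAGIC = b"PAR1"
TRAILER_LEN = 8  # 4-byte little-endian metadata length + 4-byte "PAR1" magic


def read_parquet_footer(file_size: int, read_range: Callable[[int, int], bytes]) -> bytes:
    """Return the raw (still Thrift-encoded) FileMetaData bytes using two ranged reads.

    read_range(offset, length) is a hypothetical callback, e.g. wrapping an
    HTTP/S3 request with a byte-range header.
    """
    if file_size < len(MAGIC) + TRAILER_LEN:
        raise ValueError("file too small to be Parquet")
    # Request 1: the last 8 bytes hold the metadata length and the trailing magic.
    trailer = read_range(file_size - TRAILER_LEN, TRAILER_LEN)
    if trailer[4:] != MAGIC:
        raise ValueError("missing trailing PAR1 magic")
    (metadata_len,) = struct.unpack("<I", trailer[:4])
    # Request 2: the metadata block sits immediately before the trailer.
    start = file_size - TRAILER_LEN - metadata_len
    if start < len(MAGIC):
        raise ValueError("metadata length points before the leading PAR1 magic")
    return read_range(start, metadata_len)
```

With S3, `read_range` could wrap a GetObject request carrying an HTTP `Range: bytes=start-end` header (the end offset there is inclusive, so it would be `offset + length - 1`). Only the footer bytes travel over the network, regardless of how large the column data is.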
- FLaNK Stack for 4th of July
- I have a question related to Parquet files and AWS Glue
As I read here (https://github.com/apache/parquet-format/blob/master/LogicalTypes.md), they are stored as integers, and these integers represent the number of days (for Date) or the number of milliseconds, microseconds, or nanoseconds (for DateTime) since 1970-01-01. That matches what I see in the Parquet files written by our ETL tool from an internal database to S3: all Date/DateTime columns are integers, which means that in the Glue job I have to convert these integers back to Date/DateTime values before I can transform them. But when Parquet files are written by Spark, the columns come back as Date/DateTime (or, more precisely, Timestamp) rather than integers (I checked by reading these files again in another Glue job), and that confuses me.
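The by-hand conversion the poster describes can be sketched as below, assuming the raw values follow the DATE (int32 days) and TIMESTAMP (int64 MILLIS/MICROS/NANOS) encodings from the linked LogicalTypes.md and that the unit is known out of band. The function names are hypothetical. In Parquet both cases are physically integers; a reader presents them as dates or timestamps only when the column carries the logical-type annotation, which may be the difference between the two writers here (an unverified guess).

```python
from datetime import date, datetime, timedelta

UNIX_EPOCH_DATE = date(1970, 1, 1)
UNIX_EPOCH = datetime(1970, 1, 1)

# Raw units per second for each TIMESTAMP unit named in LogicalTypes.md.
UNITS_PER_SECOND = {"MILLIS": 1_000, "MICROS": 1_000_000, "NANOS": 1_000_000_000}


def decode_date(days: int) -> date:
    """DATE: an int32 counting days since 1970-01-01."""
    return UNIX_EPOCH_DATE + timedelta(days=days)


def decode_timestamp(value: int, unit: str) -> datetime:
    """TIMESTAMP: an int64 counting MILLIS/MICROS/NANOS since 1970-01-01T00:00:00.

    Returns a naive datetime. NANOS are floored to whole microseconds because
    Python's datetime cannot represent finer resolution.
    """
    micros = value * 1_000_000 // UNITS_PER_SECOND[unit]
    return UNIX_EPOCH + timedelta(microseconds=micros)
```

Inside a Glue (Spark) job, Spark's built-in date and timestamp functions would usually be preferable to a Python function like this; the sketch only makes the arithmetic explicit.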
- Parquet: More than just “Turbo CSV”
Date is confusing with a timezone (UTC or otherwise) and the doco makes no such suggestion.
The Parquet datatypes documentation is pretty clear that there is a flag isAdjustedToUTC to define if the timestamp should be interpreted as having Instant semantics or Local semantics.
https://github.com/apache/parquet-format/blob/master/Logical...
Still no option to include a TZ offset in the data (so the same datum can be interpreted with both Local and Instant semantics) but not bad really.
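The difference between the two semantics can be sketched as follows. This assumes a TIMESTAMP with unit MICROS (chosen for brevity), and `decode_timestamp_micros` is a hypothetical helper. Under both semantics the integer counts from 1970-01-01T00:00:00; with isAdjustedToUTC = true that origin is the UTC epoch, so the value is an instant that can be shown in any zone, while with isAdjustedToUTC = false it is a zone-less wall-clock reading. As the comment notes, neither mode stores the writer's original offset.

```python
from datetime import datetime, timedelta, timezone


def decode_timestamp_micros(value: int, is_adjusted_to_utc: bool) -> datetime:
    """Decode a raw TIMESTAMP(MICROS) int64, honoring the isAdjustedToUTC flag."""
    # Both semantics count from 1970-01-01T00:00:00; they differ in what that origin means.
    wall_clock = datetime(1970, 1, 1) + timedelta(microseconds=value)
    if is_adjusted_to_utc:
        # Instant semantics: anchor to UTC so the value can be converted to any zone.
        return wall_clock.replace(tzinfo=timezone.utc)
    # Local semantics: a wall-clock reading; converting it would require guessing a zone.
    return wall_clock


if __name__ == "__main__":
    noon = 12 * 3600 * 1_000_000  # 12:00:00 on 1970-01-01
    plus_ten = timezone(timedelta(hours=10))  # fixed offset, for illustration only
    print(decode_timestamp_micros(noon, True).astimezone(plus_ten))  # same instant, shown at +10:00
    print(decode_timestamp_micros(noon, False))  # stays 12:00 with no zone attached
```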
mdBook
- Everything Curl
- Doks – Build a Docs Site
- Ask HN: How do you organize software documentation at work?
I'm responsible for a number of Java products. I try to provide high-quality Javadoc for all public library interfaces, library user's guides where appropriate, and development guides for applications. The latter two take the form of MDBook documents (https://rust-lang.github.io/mdBook/), with the document source living in the GitHub repo so that it's tied to the particular software release in a natural way.
- Outline: Self hostable, realtime, Markdown compatible knowledge base
My org has used mdBook: https://rust-lang.github.io/mdBook/ (That link is itself a rendered mdBook, so that'll give you an idea of the feature set.)
(While it's definitely a Rust "thing", if you just have a set of .md files, all you need is a "SUMMARY.md" (which contains the ToC) and a small config file; i.e., you don't have to have any Rust code to use it, and it works fine without. We document a large, mostly non-Rust codebase with it.)
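The two files the commenter mentions can be sketched as below. The hypothetical `scaffold_book` helper writes a `SUMMARY.md` that lists every Markdown file in `src/` as a chapter, plus a minimal `book.toml` with a `[book]` table; in practice `mdbook init` creates both, and `mdbook build` renders the result. Deriving chapter titles from file names is an assumption made for illustration.

```python
import json
from pathlib import Path


def scaffold_book(root: Path, title: str) -> None:
    """Write src/SUMMARY.md and book.toml for the .md files already in root/src."""
    src = root / "src"
    chapters = sorted(p for p in src.glob("*.md") if p.name != "SUMMARY.md")
    lines = ["# Summary", ""]
    for chapter in chapters:
        # Derive a link label from the file name, e.g. "getting-started.md" -> "Getting started".
        label = chapter.stem.replace("-", " ").replace("_", " ").capitalize()
        lines.append(f"- [{label}](./{chapter.name})")
    (src / "SUMMARY.md").write_text("\n".join(lines) + "\n", encoding="utf-8")
    # json.dumps escapes quotes and backslashes the way TOML basic strings expect.
    config = f'[book]\ntitle = {json.dumps(title, ensure_ascii=False)}\nsrc = "src"\n'
    (root / "book.toml").write_text(config, encoding="utf-8")
```

Chapter order here is simply alphabetical by file name; a real `SUMMARY.md` is usually edited by hand to control ordering and nesting.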
- Ask HN: Best tools for self-authoring books in 2023?
If you want the lowest-friction, open source, easily extensible tool for turning Markdown into Web, Kindle, PDF, etc., I highly recommend mdBook (https://github.com/rust-lang/mdBook). It’s written in Rust, but you don’t have to know any Rust to use it. And theming is all CSS, for which there are many good (free) themes.
- Early performance results from the prototype CHERI ARM Morello microarchitecture
- FLaNK Stack for 4th of July
- MdBook – A command line tool to create books with Markdown
- MdBook – Create book from Markdown files. Like GitBook but implemented in Rust
What are some alternatives?
rapidgzip - Gzip Decompression and Random Access for Modern Multi-Core Machines
gitbook - The open source frontend for GitBook doc sites
xgen - Salesforce open-source LLMs with 8k sequence length.
MkDocs - Project documentation with Markdown.
wizmap - Explore and interpret large embeddings in your browser with interactive visualization! 📍
Wiki.js - Wiki.js | A modern and powerful wiki app built on Node.js
FastSAM - Fast Segment Anything
bookdown - Authoring Books and Technical Documents with R Markdown
background-removal-js - Remove backgrounds from images directly in the browser environment with ease and no additional costs or privacy concerns. Explore an interactive demo.
obsidian-releases - Community plugins list, theme list, and releases of Obsidian.
graphic-walker - An open source alternative to Tableau. Embeddable visual analytic
Docusaurus - Easy to maintain open source documentation websites.