-
usv
Unicode Separated Values (USV) data markup for units, records, groups, files, streaming, and more.
-
extism-csv-to-usv
Run the `csv-to-usv` Rust crate library function from any of 15+ Extism supported languages: https://github.com/extism/extism
-
RSV-Specification
Rows of String Values (RSV Data Format) Specification - A Simple Binary Alternative to CSV
They cover the reasoning for using the control picture characters instead of the control characters in the FAQ:
"We tried using the control characters, and also tried configuring various editors to show the control characters by rendering the control picture characters.
First, we encountered many difficulties with editor configurations, attempting to make each editor treat the invisible zero-width characters by rendering with the visible letter-width characters.
Second, we encountered problems with copy/paste functionality, where it often didn't work because the editor implementations and terminal implementations copied visible letter-width characters, not the underlying invisible zero-width characters.
Third, users were unable to distinguish between the rendered control picture characters (e.g. the editor saw ASCII 31 and rendered Unicode Unit Separator) versus the control picture characters being in the data content (e.g. someone actually typed Unicode Unit Separator into the data content)."
- https://github.com/SixArm/usv/tree/main/doc/faq#why-use-cont...
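To make the distinction in the FAQ concrete, here is a minimal Python sketch contrasting the invisible ASCII control character with the visible control picture character, followed by a simplified parser. It assumes each unit ends with the Unicode Unit Separator (U+241F) and each record ends with the Unicode Record Separator (U+241E). The helper names `split_terminated` and `parse_records` are hypothetical, and escapes, groups, and files are ignored, so treat this as an illustration rather than a conforming USV parser.

```python
# Invisible ASCII control character (code point 31) vs. visible control pictures.
ASCII_UNIT_SEPARATOR = "\x1f"        # zero-width in most editors
PICTURE_UNIT_SEPARATOR = "\u241f"    # renders as a visible glyph
PICTURE_RECORD_SEPARATOR = "\u241e"  # renders as a visible glyph


def split_terminated(text: str, terminator: str) -> list[str]:
    """Split text whose items each END with `terminator` (hypothetical helper)."""
    parts = text.split(terminator)
    # A trailing terminator leaves one empty string at the end; drop it.
    if parts and parts[-1] == "":
        parts.pop()
    return parts


def parse_records(text: str) -> list[list[str]]:
    """Parse records of units. Assumption: no escapes, groups, or files."""
    return [
        split_terminated(record, PICTURE_UNIT_SEPARATOR)
        for record in split_terminated(text, PICTURE_RECORD_SEPARATOR)
    ]


US, RS = PICTURE_UNIT_SEPARATOR, PICTURE_RECORD_SEPARATOR
sample = "a" + US + "b" + US + RS + "c" + US + "d" + US + RS
records = parse_records(sample)
```

One trade-off this makes visible: the control picture takes three bytes in UTF-8, while the ASCII control character takes one.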
CSV is great because Excel can import it. Excel can't import USV, though, so at that point why use USV when you can use JSON?
https://github.com/tyleradams/json-toolkit/
If you would like to run csv-to-usv from 15+ languages (not only Rust!) then check out this demo I made, converting the library to an Extism plugin function: https://github.com/extism/extism-csv-to-usv
Here's a snippet that runs it in your browser:
// Simple example to run this in your browser! But will work in Go, PHP, Ruby, Java, Python, etc...
const extism = await import("https://esm.sh/@extism/extism");
A similar concept that is (IMHO) much nicer: RSV
It doesn't need any escaping or quoting: a field just has to be valid UTF-8.
The trick is that the delimiters are bytes that are invalid UTF-8.
The spec fits on a napkin, parsing is trivial, you can jump to the middle of a doc and find the nearest row, etc.
Main downside is you need an editor/viewer that can handle it.
https://github.com/Stenway/RSV-Specification
I wrote one of the most popular translators for MLLP, which converts it to HTTP [1].
---
P.S. Ironically, HL7 messages have something literally called a "field separator" but don't use the field separator character; they usually use a vertical bar.
[1] https://github.com/rivethealth/mllp-http
Funnily enough, I published a Python library two days ago that uses emojis to indicate where certain non-msgpackable builtin types have been forced into msgpackable objects: https://github.com/umarbutler/persist-cache/blob/main/src/pe...
is used for tuples, for sets, for frozen sets, for pickles, for bytes and for bytearrays.
I thought it was pretty ingenious but clearly I’m not the only one to think of it.