easyjevko.js
spyql
easyjevko.js | spyql | |
---|---|---|
7 | 23 | |
0 | 902 | |
- | - | |
10.0 | 0.0 | |
over 1 year ago | over 1 year ago | |
JavaScript | Jupyter Notebook | |
MIT License | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
easyjevko.js
- Jc – JSONifies the output of many CLI tools
-
Jevko: a minimal general-purpose syntax
Short answer: in https://github.com/jevko/easyjevko.js a thing like [ my text ] is converted to a JS string " my text " -- all whitespace is preserved.
Responding to some points I left off here https://news.ycombinator.com/item?id=33336789
I guess the main one is this:
> If your audience is people like me, I think it would probably be worthwhile for you to spend some time up front describing the intended semantics of a data model, as I've attempted above, rather than leaving people to infer it from the grammar. (Maybe OCaml is not a good way to explain it, though.) You might also want to specify that leading and trailing whitespace in prefixes is not significant, though it is in the suffix ("body"); this would enable people to format their name-value pairs readably without corrupting the data. As far as I can tell, this addendum wouldn't interfere with any of your existing uses for Jevko, though in some cases it would simplify their implementations.
You're right, things should be explained more clearly (TODO). Especially the exact role of Jevko and treatment of whitespace. I'll try to improve that.
Here is a sketch of an explanation.
Plain Jevko is meant to be a low-level syntactic layer.
It takes care of turning a unicode sequence into a tree.
On this level, all whitespace is preserved in the tree.
To represent key-value pairs and other data, you most likely want another layer above Jevko -- this would be a Jevko-based format, such as queryjevko (somewhat explained below) or, a very similar one, easyjevko, implemented and very lightly documented here: https://github.com/jevko/easyjevko.js
Or you could have a markup format, such as https://github.com/jevko/markup-experiments#asttoxml5
This format layer defines certain restrictions which may make a subset of Jevkos invalid in it.
It also specifies how to interpret the valid Jevkos. This includes the treatment of whitespace, e.g. that a leading or trailing whitespace in prefixes is insignificant, but conditionally significant in suffixes, etc.
Different formats will define different restrictions and interpretations.
For example:
# queryjevko
queryjevko is a format which uses (a variant of) Jevko as a syntax. Only a subset of Jevko is valid queryjevko.
> I think this is a more useful level of abstraction, and it's more or less the level used by, for example, queryjevko.js's jevkoToJs, although that erroneously uses () instead of [].
The `()` are used on purpose -- queryjevko is meant to be used in URL query strings and be readable. If square brackets were used, things like JS' encodeURIComponent would escape them, making the string unreadable. Using `()` solves that. "~" is used instead of "`" for the same reason. So technically we are dealing not with a spec-compliant Jevko, but a trivial variant of it. Maybe I should write a meta-spec which allows one to pick the three special characters before instantiating itself into a spec. Anyway the parser implementation is configurable in that regard, so I simply configure it to use "~()" instead of "`[]".
> (Also, contrary to your assertion above that this is an example of "leaving [Jevko's data model] as-is", it forgets the order of the name-value pairs as well as I guess all but one of any duplicate set of fields with the same name and also the possibility that there could be both fields and a body.)
I meant [whitespace] rather than [Jevko's data model].
Again, queryjevko is a format which uses Jevko as an underlying syntax. It specifies how syntax trees are converted to JS values, by restricting the range of valid Jevkos. It also specifies conversion in the opposite direction, likewise placing restrictions on JS values that can be safely converted to queryjevko.
The order of name-value pairs happens to get preserved (because of the way JS works), but that's not necessarily relevant. If I were to write a cross-language spec for queryjevko, I'd probably specify that this shouldn't be relied upon.
Duplicate fields and Jevkos with both fields and a non-whitespace body will produce an error when converting Jevko->JS.
I hope this clarifies things somewhat.
Lastly, I'll respond to this for completeness:
> (By the way, if you want to attribute your JSON example for copyright reasons, you need to attribute it to its author or authors, not to the Wikipedia, which is just the site they posted it on.)
According to this:
https://en.wikipedia.org/wiki/Wikipedia:Reusing_Wikipedia_co...
there are 3 options, one of them being what I did, which is to include a link.
I think that's all.
Have a good one!
spyql
-
Fq: Jq for Binary Formats
I prefer a SQL-like format. It’s not as complete but it cover most of the day-to-day use cases. Take a look at https://github.com/dcmoura/spyql (I am the author). Congrats on fq!
-
Command-line data analytics made easy with SPyQL
SPyQL documentation: spyql.readthedocs.io
-
This Week In Python
spyql – Query data on the command line with SQL-like SELECTs powered by Python expressions
- Command-line data analytics made easy
-
Jc – JSONifies the output of many CLI tools
This is great!
I am the author of SPyQL [1]. Combining JC with SPyQL you can easily query the json output and run python commands on top of it from the command-line :-) You can do aggregations and so forth in a much simpler and intuitive way than with jq.
I just wrote a blogpost [2] that illustrates it. It is more focused on CSV, but the commands would be the same if you were working with JSON.
[1] https://github.com/dcmoura/spyql
- The fastest command-line tools for querying large JSON datasets
-
Working with more than 10gb csv
You can import the data into a PostgreSQL/MySQL/SQLite/... database and then query the database. However, even with the right choice of indexes, it might take a while to run queries on a table with hundreds of millions of records. You can easily import your data to these databases with SpyQL: $ spyql "SELECT * FROM csv TO sql(table=my_table_name) | sqlite3 my.db" (you would need to create the table my_table_name before running the command).
-
ClickHouse Cloud is now in Public Beta
https://github.com/dcmoura/spyql/blob/master/notebooks/json_...
And ClickHouse looks like a normal relational database - there is no need for multiple components for different tiers (like in Druid), no need for manual partitioning into "daily", "hourly" tables (like you do in Spark and Bigquery), no need for lambda architecture... It's refreshing how something can be both simple and fast.
- A SQLite extension for reading large files line-by-line
-
I want to convert a large JSON file into Tabular Format.
I thought this library was pretty nifty for json. It's also relatively fast compared to most json parsers: https://github.com/dcmoura/spyql
What are some alternatives?
easyjevko.lua - An Easy Jevko library for Lua.
prql - PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement
yapl - YAml Programming Language
malloy - Malloy is an experimental language for describing data relationships and transformations.
binary-experiments - Experiments with various binary formats based on Jevko.
tresql - Shorthand SQL/JDBC wrapper language, providing nested results as JSON and more
community - Features Jevko-related things created by various authors
Preql - An interpreted relational query language that compiles to SQL.
prosto - Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby
pxi - 🧚 pxi (pixie) is a small, fast, and magical command-line data processor similar to jq, mlr, and awk.
partiql-lang-kotlin - PartiQL libraries and tools in Kotlin.
octosql - OctoSQL is a query tool that allows you to join, analyse and transform data from multiple databases and file formats using SQL.