DataGristle
xsv
DataGristle | xsv | |
---|---|---|
5 | 64 | |
137 | 10,089 | |
- | - | |
0.0 | 0.0 | |
3 months ago | 2 months ago | |
Python | Rust | |
GNU General Public License v3.0 or later | The Unlicense |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
DataGristle
- What are your weekend side projects?
- Instant data model from 1000s of unique files?
- Using Hashing to detect data changes in ELT
-
How do you sort a CSV file with several million rows?
DataGristle: this one contains some more unusual csv utilities, and what's in master includes the ability to sort by field names rather than offsets: https://github.com/kenfar/DataGristle
-
Open source contributions for a Data Engineer?
DataGristle by u/kenfar who influenced many of us in this sub.
xsv
-
Show HN: TextQuery – Query and Visualize Your CSV Data in Minutes
I realize it's not really that comparable since these tools don't support SQL, but a more fully functioned CLI tool is - https://github.com/BurntSushi/xsv
They are both fairly good
- Qsv: Efficient CSV CLI Toolkit
-
Joining CSV Data Without SQL: An IP Geolocation Use Case
I have done some similar, simpler data wrangling with xsv (https://github.com/BurntSushi/xsv) and jq. It could process my 800M rows in a couple of minutes (plus the time to read it out from the database =)
-
Qsv: CSVs sliced, diced and analyzed (fork of xsv)
xsv, which seems to be why qsv was created.
[1] https://github.com/BurntSushi/xsv/issues/267
-
I wrote this iCalendar (.ics) command-line utility to turn common calendar exports into more broadly compatible CSV files.
CSV utilities (still haven't pick a favorite one...): https://github.com/harelba/q https://github.com/BurntSushi/xsv https://github.com/wireservice/csvkit https://github.com/johnkerl/miller
- Icsp – Command-line iCalendar (.ics) to CSV parser
-
ripgrep is faster than {grep, ag, git grep, ucg, pt, sift}
$ git remote -v origin [email protected]:rust-lang/rust (fetch) origin [email protected]:rust-lang/rust (push) $ git rev-parse HEAD 3b0d4813ab461ec81eab8980bb884691c97c5a35 $ time grep -ri burntsushi ./ ./src/tools/cargotest/main.rs: repo: "https://github.com/BurntSushi/ripgrep", ./src/tools/cargotest/main.rs: repo: "https://github.com/BurntSushi/xsv", grep: ./target/debug/incremental/cargotest-2dvu4f2km9e91/s-gactj3ma2j-1b10l4z-2l60ur55ixe6n/query-cache.bin: binary file matches grep: ./target/debug/incremental/cargotest-38cpmhhbdgdyq/s-gactj3luwq-1o12vgp-t61hd8qdyp7t/query-cache.bin: binary file matches grep: ./target/debug/incremental/cargotest-17632op6djxne/s-gawuq5468i-1h69nfw-4gm0s8yhhiun/query-cache.bin: binary file matches grep: ./target/debug/incremental/cargotest-2trm4kt5yom3r/s-gawuq53qqg-bjiezj-lo0gha8ign8w/query-cache.bin: binary file matches grep: ./target/debug/deps/libregex_automata-c74a6d9fd0abd77b.rmeta: binary file matches grep: ./target/debug/deps/libsame_file-a0e0363a2985455d.rlib: binary file matches grep: ./target/debug/deps/libsame_file-a0e0363a2985455d.rmeta: binary file matches grep: ./target/debug/deps/libsame_file-7251d8d3586a319b.rmeta: binary file matches grep: ./build/x86_64-unknown-linux-gnu/stage0-sysroot/lib/rustlib/x86_64-unknown-linux-gnu/lib/libaho_corasick-999a08e2b700420d.rlib: binary file matches grep: ./build/x86_64-unknown-linux-gnu/stage0-sysroot/lib/rustlib/x86_64-unknown-linux-gnu/lib/libregex_automata-0d168be5d25b3ac5.rlib: binary file matches grep: ./build/x86_64-unknown-linux-gnu/stage0-tools/x86_64-unknown-linux-gnu/release/deps/libregex_automata-7d6bec0156f15da1.rlib: binary file matches grep: ./build/x86_64-unknown-linux-gnu/stage0-tools/x86_64-unknown-linux-gnu/release/deps/libregex_automata-7d6bec0156f15da1.rmeta: binary file matches grep: ./build/x86_64-unknown-linux-gnu/stage0-tools/x86_64-unknown-linux-gnu/release/deps/libaho_corasick-07dee4514b87d99b.rmeta: binary file matches grep: ./build/x86_64-unknown-linux-gnu/stage0-tools/x86_64-unknown-linux-gnu/release/deps/libaho_corasick-07dee4514b87d99b.rlib: binary file matches grep: ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/release/deps/libaho_corasick-999a08e2b700420d.rlib: binary file matches grep: ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/release/deps/libaho_corasick-999a08e2b700420d.rmeta: binary file matches grep: ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/release/deps/libregex_automata-0d168be5d25b3ac5.rlib: binary file matches grep: ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/release/deps/libregex_automata-0d168be5d25b3ac5.rmeta: binary file matches grep: ./build/bootstrap/debug/deps/libaho_corasick-992e1ba08ef83436.rmeta: binary file matches grep: ./build/bootstrap/debug/deps/libignore-54d41239d2761852.rmeta: binary file matches grep: ./build/bootstrap/debug/deps/libsame_file-9a5e3ddd89cfe599.rlib: binary file matches grep: ./build/bootstrap/debug/deps/libregex_automata-8e700951c9869a66.rlib: binary file matches grep: ./build/bootstrap/debug/deps/libignore-54d41239d2761852.rlib: binary file matches grep: ./build/bootstrap/debug/deps/libaho_corasick-992e1ba08ef83436.rlib: binary file matches grep: ./build/bootstrap/debug/deps/libregex_automata-8e700951c9869a66.rmeta: binary file matches grep: ./build/bootstrap/debug/deps/libsame_file-9a5e3ddd89cfe599.rmeta: binary file matches real 16.683 user 15.793 sys 0.878 maxmem 8 MB faults 0
-
Any Linux admins willing to try Pygrep?
Unrelated, are you the same burntsushi that wrote xsv?
-
Analyzing multi-gigabyte JSON files locally
If it could be tabular in nature, maybe convert to sqlite3 so you can make use of indexing, or CSV to make use of high-performance tools like xsv or zsv (the latter of which I'm an author).
https://github.com/BurntSushi/xsv
https://github.com/liquidaty/zsv/blob/main/docs/csv_json_sql...
-
What monitoring tool do you use or recommend?
Oh and there's rad cli shit out there for CSV files too, like xsv
What are some alternatives?
Skytrax-Data-Warehouse - A full data warehouse infrastructure with ETL pipelines running inside docker on Apache Airflow for data orchestration, AWS Redshift for cloud data warehouse and Metabase to serve the needs of data visualizations such as analytical dashboards.
csvtk - A cross-platform, efficient and practical CSV/TSV toolkit in Golang
soda-sql - Data profiling, testing, and monitoring for SQL accessible data.
miller - Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
Prefect - The easiest way to build, run, and monitor data pipelines at scale.
ripgrep - ripgrep recursively searches directories for a regex pattern while respecting your gitignore
sqlfluff - A modular SQL linter and auto-formatter with support for multiple dialects and templated code.
Servo - Servo, the embeddable, independent, memory-safe, modular, parallel web rendering engine
didact-engine - The REST API and execution engine for the Didact Platform.
Fractalide - Reusable Reproducible Composable Software
spark-rapids - Spark RAPIDS plugin - accelerate Apache Spark with GPUs
svgcleaner - svgcleaner could help you to clean up your SVG files from the unnecessary data.