bu
czkawka
bu | czkawka | |
---|---|---|
16 | 361 | |
52 | 17,595 | |
- | - | |
9.2 | 7.7 | |
6 days ago | 7 days ago | |
Nim | Rust | |
MIT License | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
bu
-
Nim
I think Nim is great for small CLIs. Some examples are over at: https://github.com/c-blake/bu . To quantify "small", using tools themselves in bu/ (and Zsh *):
wc -l --total=never **.nim|cols 1|cstats ms q.05 q.95
-
fdupes: Identify or Delete Duplicate Files
200 lines of Nim [1] seems to run about 9X faster than the 8000 lines of C in fdupes on a little test dir I have. If you need C, I think jdupes [2] is faster as @TacticalCoder points out a couple of times here. In my testing, `dups` is usually faster than `jdupes`, though.
[1] https://github.com/c-blake/bu/blob/main/dups.nim
[2] https://github.com/jbruchon/jdupes
-
Things I've learned about building CLI tools in Python
You better off with using a compiled language.
If you interested in a language that's compiled, fast, but as easy and pleasant as Python - I'd recommend you take a look at [Nim](https://nim-lang.org).
And to prove what Nim's capable of - here's a cool repo with 100+ cli apps someone wrote in Nim: [c-blake/bu](https://github.com/c-blake/bu)
-
Removing Garbage Collection from the Rust Language (2013)
20 milliseconds? On my 7 year old Linux box, this little Nim program https://github.com/c-blake/bu/blob/main/wsz.nim runs to completion in 275 microseconds when fully statically linked with musl libc on Linux. That's with a stripped environment (with `env -i`). It takes more like 318 microseconds with my usual 54 environment variables. The program only does about 17 system calls, though.
Additionally, https://github.com/c-blake/cligen makes decent CLI tools a real breeze. If you like some of Go's qualities but the language seems too limited, you might like Nim: https://nim-lang.org. I generally find getting good performance much less of a challenge with Nim, but Nim is undeniably less well known with a smaller ecosystem and less corporate backing.
-
The Awk book’s 60-line version of Make
Often whole program generation in a prog.lang (& ecosystem!) that you already know can substitute for a new prog.lang. Python even has eval. You may be interested in: https://github.com/c-blake/bu/blob/main/doc/rp.md
You can actually get pretty far depending upon boundaries with the always implicit command-option language (when launched from the shell language, anyway). For example, Ben's example can be adapted to:
rp -m^\[A-Za-z\] 'echo nr," ",s[1]'
-
Learn GNU Awk with hundreds of examples and exercises
You might consider: https://github.com/c-blake/bu/blob/main/doc/cols.md
That's in Nim, though that may not be much a barrier. (There may also be other tools in bu/ of interest.)
-
GNU Parallel, where have you been all my life?
This sounds like a job for what standard C calls "popen". You can do `import posix; for line in popen("ls", "r"): echo line` in Nim, though you obviously need to replace `echo line` with other desired processing and learn how to do that.
You might also want to consider `rp` which is a program generator-compiler-runner along the lines of `awk` but with all the code just Nim snippets interpolated into a program template: https://github.com/c-blake/bu/blob/main/doc/rp.md . E.g.:
ls -l | rp -pimport\ stats -bvar\ r:RunningStat -wnf\>4 r.push\ 4.f -eecho\ r
-
The Bipolar Lisp Programmer
Nim is terse yet general and can be made even more so with effort. E.g., You can gin up a little framework that is even more terse than awk yet statically typed and trivially convertible to run much faster like https://github.com/c-blake/bu/blob/main/doc/rp.md
You can statically introspect code to then generate related/translated ASTs to create nearly frictionless helper facilities like https://github.com/c-blake/cligen .
You can do all of this without any real run-time speed sacrifices, depending upon the level of effort you put in / your expertise. Since it generates C/C++ or Javascript you get all the abilities of backend compilers almost out of the box, like profile-guided-optimization or for JS JIT compilation.
-
Ask HN: Why did Nim not catch-on like wild fire as Rust did?
I don't know about all your other questions, but the https://github.com/c-blake/cligen CLI framework seems much lower effort / ceremony than even Rust's `argh` and is just about as old as `clap` (both started 8 years ago in 2015).
There are over 50 CLI utilities at https://github.com/c-blake/bu, many of which do something novel rather than just "re-doing ls/find/cat with a twist". While they are really more an "ls/ps construction toolkits" with some default configs to get people going, I think https://github.com/c-blake/lc and https://github.com/c-blake/procs are nicer than Rust alternatives. I mention these since you seem interested in such tools.
-
Self Hosted SaaS Alternatives
You are welcome. Thanks are too rarely offered. :-)
You may also be interested in word stemming ( such as used by snowball stemmer in https://github.com/c-blake/nimsearch ) or other NLP techniques, but I don't know how internationalized/multi-lingual that stuff is, but conceptually you might want "series of stemmed words" to be the content fragments of interest.
Similarity scores have many applications. Weights on graph of cancelled downloads ranked by size might be one. :)
Of course, for your specific "truncation" problem, you might also be able to just do an edit distance against the much smaller filenames and compare data prefixes in files or use a SHA256 of a content-based first slice. ( There are edit distance algos in Nim in https://github.com/c-blake/cligen/blob/master/cligen/textUt.... as well as in https://github.com/c-blake/suggest ).
Or, you could do a little program like ndup/sh/ndup to create a "mirrored file tree" of such content-based slices then you could use any true duplicate-file finder (like https://github.com/c-blake/bu/blob/main/dups.nim) on the little signature system to identify duplicates and go from path suffixes in those clusters back to the main filesystem. Of course, a single KV store within one or two files would be more efficient than thousands of tiny files. There are many possibilities.
czkawka
- Is there software to compress large but similar files?
- Merge three separate partial libraries from external USB drives
-
Tools to deduplicate files
https://github.com/qarmin/czkawka by far the best of anything iv tried
-
fdupes: Identify or Delete Duplicate Files
I've used Czkawka (https://github.com/qarmin/czkawka) because it does Lanczos-based image duplicate detection, which makes it more practical for me.
-
AllDup suddenly taking forever to process/delete selections
Maybe it's a setting you made or the files, not sure. You can try another software czkawka to see if you get better results with it.
-
Is there a file duplicate finder that works with animated jpegxl-gif?
For static images i used https://github.com/qarmin/czkawka and it works well enough. I think. But when i used it on a folder with gifs and their jxl conversions, it shows nothing. SURELY this could not be user error, rrrright?
-
PhotoPrism: Browse Your Life in Pictures
I used to use DupeGuru which has some photo-specific dupe detection where you can fuzzy match image dupes based on content: https://dupeguru.voltaicideas.net/
But I switched over to czkawka, which has a better interface for comparing files, and seems to be a bit faster: https://github.com/qarmin/czkawka
Unfortunately, neither of these are integrated into Photoprism, so you still have to do some file management outside the database before importing.
I also haven't used Photoprism extensively yet (I think it's running on one of my boxes, but I haven't gotten around to setting it up), but I did find that it wasn't really built for file-based libraries. It's a little more heavyweight, but my research shows that Nextcloud Memories might be a better choice for me (it's not the first-party Nextcloud photos app, but another one put together by the community): https://apps.nextcloud.com/apps/memories
-
Please don't post like 20 similar images to the art sites?
Czkawka can do this.
-
I'm amazed how I find anything & why I have so many dupes!
There's always the well-respected tool, Czkawka. Or, of the CLI is your thing, jdupes is a good option.
- I saw a post regarding crate to delete similar files
What are some alternatives?
NimForUE - Nim plugin for UE5 with native performance, hot reloading and full interop that sits between C++ and Blueprints. This allows you to do common UE workflows like for example to extend any UE class in Nim and extending it again in Blueprint if you wish so without restarting the editor. The final aim is to be able to do in Nim what you can do in C++
dupeguru - Find duplicate files
Nim - Nim is a statically typed compiled systems programming language. It combines successful concepts from mature languages like Python, Ada and Modula. Its design focuses on efficiency, expressiveness, and elegance (in that order of priority).
jdupes - A powerful duplicate file finder and an enhanced fork of 'fdupes'.
ordiri
fdupes - FDUPES is a program for identifying or deleting duplicate files residing within specified directories.
OffensiveNim - My experiments in weaponizing Nim (https://nim-lang.org/)
AntiDupl - A program to search similar and defect pictures on the disk
awesome-selfhosted - A list of Free Software network services and web applications which can be hosted on your own servers
PhotoPrism - AI-Powered Photos App for the Decentralized Web 🌈💎✨
core - OPNsense GUI, API and systems backend
darktable - darktable is an open source photography workflow application and raw developer