dud
rupy
| | dud | rupy |
|---|---|---|
| Mentions | 14 | 31 |
| Stars | 166 | 136 |
| Growth | - | - |
| Activity | 6.0 | 1.1 |
| Last commit | 5 days ago | about 1 year ago |
| Language | Go | Java |
| License | BSD 3-clause "New" or "Revised" License | - |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
dud
-
Ask HN: How do your ML teams version datasets and models?
I've used DVC in the past and generally liked its approach. That said, I wholeheartedly agree that it's clunky. It does a lot of things implicitly, which can make it hard to reason about. It was also extremely slow for medium-sized datasets (low 10s of GBs).
In response, I created a command-line tool that addresses these issues[0]. To reduce the comparison to an analogy: Dud : DVC :: Flask : Django.
[0]: https://github.com/kevin-hanselman/dud
-
Oxen.ai - Blazing Fast Unstructured Data Version Control, built in Rust
There is also https://github.com/kevin-hanselman/dud
- Data Version Control
-
Tup – an instrumenting file-based build system
I very much agree with you about DVC's feature creep. The other issue I have with it is speed. DVC has left me scratching my head at its sluggishness many times. Because of these factors, I've been working on an alternative that focuses on simplicity and speed[0]. My tool is often five to ten times faster than DVC[1]. I'd love to hear what you think.
[0]: https://github.com/kevin-hanselman/dud
[1]: https://kevin-hanselman.github.io/dud/benchmarks/
-
Non-Obvious Docker Uses
I don't know about replacing Make with Docker, but I use the two together to good effect. One of my favorite hacks is adding a 'docker-%' rule in my Makefile to run make commands in a Docker image[1]. It's a bit mind-bending, and there are a few gotchas, but it works surprisingly well for simple rules.
[1]: https://github.com/kevin-hanselman/dud/blob/e98de8fcdf7ad564...
-
Git-annex – Managing large files with Git
Thanks for sharing your experience. It's non-trivial and surprising behavior like this that drove me to build a custom system[0] myself. When I started researching version control tools for large files, I remember feeling like git-annex and Git LFS were awkwardly bolted onto Git; Git simply wasn't designed for large files. Then I found DVC[1], and its approach rang true for me. However, after using DVC for a year or so, I grew tired of DVC's many puzzling behaviors (most of which are outlined in the README at [0]). In the end, I built the tool I wanted for the job -- one that is exceptionally simple and fast.
[0]: https://github.com/kevin-hanselman/dud
- Alternative to Git LFS or DVC
- Show HN: A small and simple alternative to Git LFS or DVC
- Dud: a lightweight tool for versioning data alongside source code and building data pipelines.
- Dud: a tool for versioning data alongside source code. A faster and simpler alternative to DVC.
rupy
-
Considerations for a long-running Raspberry Pi
I have been running a Raspberry 2 cluster for 10 years: http://host.rupy.se
A few weeks back the first SD card to fail got so corrupted it failed to reboot!
My key learning: use oversized cards, because the writes are spread over more cells and the card wears out more slowly!
I'm going from 32GB to 256/512/1024!
-
What Kind of Asynchronous Is Right for You?
How this article does not mention SSE, comet or chunking escapes me.
What does their definition of event-driven really look like in practice?
Nobody has a clue.
Here is the ideal event driven system, it's async-to-async: https://github.com/tinspin/rupy/wiki/Fuse
The example is not working because I had to shut down the services for multiple reasons, but the high level of it is that you use 4 (potentially different) threads to do one request/response middle man transaction.
That way you have _zero_ io-wait or idling. I'm surprised nobody has copied this approach since I invented it 10 years ago. I understand why, though: your entire chain needs to be async, which means rewriting everything, and that is a big risk when it's hard to debug.
But if you succeed you can build something with 10x the perf/watt of all other implementations. That is going to be important when interest rates go higher and crash our entire industry.
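To make the "no thread ever waits on I/O" idea concrete, here is a minimal sketch of a non-blocking middle-man request. It is not rupy's implementation: it swaps the four-thread design described above for a single `asyncio` event loop, and `backend`, `middle_man`, and `serve` are hypothetical names invented for illustration. The sleep stands in for network latency.

```python
import asyncio


async def backend(query: str) -> str:
    # Hypothetical upstream service; the sleep stands in for network latency.
    await asyncio.sleep(0.01)
    return query.upper()


async def middle_man(query: str) -> str:
    # Inbound request -> outbound request -> outbound response -> inbound response.
    # While the upstream call is pending, the event loop serves other requests.
    reply = await backend(query)
    return f"reply:{reply}"


async def serve(queries):
    # Run all transactions concurrently; gather keeps results in input order.
    return await asyncio.gather(*(middle_man(q) for q in queries))


def run_demo():
    return asyncio.run(serve(["a", "b", "c"]))
```

The three transactions overlap instead of queuing behind each other, which is the property the comment is after; the debugging cost it mentions comes from the whole chain having to be written in this style.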
-
An unknown Swedish startup’s €3B bid to build a green rival to AWS
The hardware is peaking.
So software is where you can make the difference: http://host.rupy.se
- Sandstorm: Open-source platform for self-hosting web app
-
You Want Modules, Not Microservices
I think we're all confused over the definition. Also one might understand what all the proponents are talking about better if they think about this more as a process and not some technological solution:
https://github.com/tinspin/rupy/wiki/Process
My only input is that you want your code to run on many machines; in fact, you want it to run identically on every machine you need in order to deliver, and preferably more. Scale vertically and horizontally at the same time, so that your services only call localhost, but in many separate places.
This in turn mandates a distributed database. And later you discover it has to be capable of async-to-async = no blocking ever anywhere in the whole solution.
I do this by hot-deploying my applications asynchronously to all servers in the cluster. This is what a cluster node looks like in practice (the name next to Host: is the node): http://host.rupy.se (click "api & metrics" to see the services).
With this you get not only scalability but also redundancy, and development stays at live-coding speed.
-
I wish my web server were in the corner of my room
I have hosted my own web server, both physically and code-wise, since 2014.
It's on a Raspberry 2 cluster:
http://host.rupy.se
Since 2016 I have also had my own database, coded from scratch:
http://root.rupy.se
We need to implement HTTP/1.1 with less bloat. A non-blocking web server in C that can share memory between threads is probably the most interesting project for humans right now. Is anyone working on that?
-
Ask HN: Free and open source distributed database written in C++ or C
I have one in Java: https://github.com/tinspin/rupy
Here are the 2000 lines of code of the entire database: http://root.rupy.se/code?path=/Root.java
And here you can try it out: http://root.rupy.se
-
Dokku – Free Heroku Alternative
The smallest PaaS you have ever seen is one order of magnitude larger than mine: https://github.com/tinspin/rupy
And I bet you the same goes for performance, if not two!
-
Server-Sent Events: the alternative to WebSockets you should be using
The data is here: http://fuse.rupy.se/about.html
Under Performance. Per watt, the fuse/rupy platform completely crushes all competition for two reasons:
- Event-driven protocol design, averaging about 4 messages/player/second (which means you cannot do spraying or headshots, for example, which I consider another feature from a game-design standpoint).
- Java's memory model with atomic concurrency which needs a VM and GC (C++ copied that memory model in C++11, but it failed completely because they lack both VM and GC, but that model is still to this day the one C++ uses), you can read more about this here: https://github.com/tinspin/rupy/wiki
You can argue those points are bad arguments, but if you look at performance per watt with some consideration for developer friendliness, I'm pretty sure in 100 years we will still be coding minimalist JavaSE on the server and vanilla C (compiled with C++ compiler) on the client.
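For readers unfamiliar with the wire format behind the Server-Sent Events thread above, here is a small sketch of how one message is framed on a `text/event-stream` response. It illustrates the standard SSE framing only and says nothing about how fuse/rupy encodes its messages; `format_sse` and the sample payload are hypothetical.

```python
from typing import Optional


def format_sse(data: str, event: Optional[str] = None) -> str:
    # One SSE message: an optional "event:" field, one "data:" field per
    # line of payload, and a blank line that terminates the message.
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    for chunk in data.split("\n"):
        lines.append(f"data: {chunk}")
    return "\n".join(lines) + "\n\n"


# Hypothetical game update sent as a named event.
message = format_sse("x=1\ny=2", event="move")
```

A server pushes such messages over a single long-lived HTTP response, so a low message rate like the one quoted above costs very little per connection.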
- Jodd β The Unbearable Lightness of Java
What are some alternatives?
dvc - ML Experiments and Data Management with Git
huproxy
scalar - Scalar: A set of tools and extensions for Git to allow very large monorepos to run on Git without a virtualization layer
cmdg - Command line Gmail client
docker-merge - Docker images as git repositories, so you can merge them.
Nullboard - Nullboard is a minimalist kanban board, focused on compactness and readability.
Task - A task runner / simpler Make alternative written in Go
cakephp-swagger-bake - Automatically generate OpenAPI, Swagger, and Redoc documentation from your existing CakePHP code.
oxen-release - Lightning fast data version control system for structured and unstructured machine learning datasets. We aim to make versioning datasets as easy as versioning code.
dbmate - :rocket: A lightweight, framework-agnostic database migration tool.
pachyderm - Data-Centric Pipelines and Data Versioning
Aerospike - Aerospike Database Server – flash-optimized, in-memory, nosql database