ideas4
minisketch
ideas4 | minisketch | |
---|---|---|
26 | 10 | |
89 | 301 | |
- | - | |
4.6 | 6.3 | |
7 months ago | 25 days ago | |
C++ | ||
- | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
ideas4
-
WTF is going on with R7RS Large?
https://github.com/samsquire/ideas4#334-knowledgegraph-progr...
-
Async rust – are we doing it all wrong?
How would you do control flow and scheduling and parallelism and async efficiently with this code?
`db.save()`, `download()` are IO intensive whereas `document.query("a")` and `parse` is CPU intensive.
I think its work diagram looks like this: https://github.com/samsquire/dream-programming-language/blob...
I've tried to design a multithreaded architecture that is scalable which combines lightweight threads + thread pools for work + control threads for IO epoll or liburing loops:
Here's the high level diagram:
https://github.com/samsquire/ideas5/blob/main/NonblockingRun...
The secret is modelling control flow as a data flow problem and having a simple but efficient scheduler.
I wrote about schedulers here and binpacking work into time:
https://github.com/samsquire/ideas4#196-binpacking-work-into...
I also have a 1:M:N lightweight thread scheduler/multiplexer:
https://github.com/samsquire/preemptible-thread
-
It Took Me a Decade to Find the Perfect Personal Website Stack – Ghost+Fathom
My blogging/journalling setup is simple.
I just use GitHub. I just rely on the default repository view on GitHub.com
I create a README.md and add markdown headings to the bottom or to the top (bottom if its a journal, top if it's a blog) and then when I get to 100-800 I create a new repository and repeat.
https://github.com/samsquire/ideas (2013)
https://github.com/samsquire/ideas4
https://github.com/samsquire/ideas3
https://github.com/samsquire/ideas2
-
Ask HN: Could you show your personal blog here?
Thanks for posting this Ask HN question.
I journal ideas and thoughts about computers and software. I am interested in software architecture, parallelism, async, coroutines, database internals, programming language implementation, software design and the web.
https://github.com/samsquire/ideas (2013)
https://github.com/samsquire/ideas2
https://github.com/samsquire/ideas3
https://github.com/samsquire/ideas4 <-- this is recent but needs editing
https://github.com/samsquire/ideas5 <-- this is what I'm working on now
https://github.com/samsquire/startups
https://github.com/samsquire/blog <-- thoughts I want to write about, but incomplete
I use README.md on GitHub and create a heading at the bottom for each entry. I use Typora on Windows or the GitHub web interface to edit.
-
Our Plan for Python 3.13
My deep interest is multithreaded code. For a software engineer working on business software, I'm not sure if they should be spending too much time debugging multithreaded bugs because they are operating at the wrong level of abstraction from my perspective for business operations.
I'm looking for an approach to writing concurrent code with parallelism that is elegant and easy to understand and hard to introduce bugs. This requires alternative programming approaches and in my perspective, alternative notations.
One such design uses monotonic state machines which can only move in one direction. I've designed a syntax and written a parser and very toy runtime for the notation.
https://github.com/samsquire/ideas5#56-stateful-circle-progr...
https://github.com/samsquire/ideas4#558-state-machine-formul...
The idea is inspired by LMAX Disruptor and queuing systems.
-
io_uring support for libuv – 8x increase in throughput
This is really good. Thank you!
I've been studying how to create an asynchronous runtime that works across threads. My goal: neither CPU and IO bound work slow down event loops.
I've only written two Rust programs but in Rust you presumably you can use Rayon (CPU scheduling) and Tokio (IO scheduling)
I wrote about using the LMAX Disruptor ringbuffer pattern between threads.
https://github.com/samsquire/ideas4#51-rewrite-synchronous-c...
I am designing a state machine formulation syntax that is thread safe and parallelises effectively. It looks like EBNF syntax or a bash pipeline. Parallel steps go in curly brackets. There is an implied interthread ringbuffer between pipes.
states = state1 | {state1a state1b state1c} {state2a state2b state2d} | state3
-
What Is Type-Level Programming?
This is very interesting and could lead to some futuristic programming technology.
I kind of want to plot the state space of a program to see all available states.
In my exploration of distributed systems, microservices and multithreaded systems, it is extremely helpful to try and see what potential states the system can be in. Global and local reasoning of these kinds of software is rather difficult.
I've written about value tracing but I've not heard of treating values as types. I would love to be able to see the trajectory of a value through different states.
https://github.com/samsquire/ideas4#571-value-calculus-varia...
I've never written a TLA+ specification and I'm a complete beginner to this space but I've been trying to understand the dining philosophers one. TLA+ Toolbox is aware of discrete states in the state space, which is absolutely awesome. Types can inform us about future possible valid states.
I began writing a visualisation of memory and animated the movement of memory around to try reveal patterns.
https://replit.com/@Chronological/ProgrammingRTS#index.html
If we see types or values as positions, we can create animations of the state space unfolding in front of us. This is the dream.
-
Late Architecture with Functional Programming
Great comment!
>I think late architecture is orthogonal to functional, imperative
Absolutely. From a truly architectural view, procedural, functional, and method-oriented (current OO) are really only variations on the call/return architectural style. Good and sometimes important distinctions, but not really that far apart. They are very much about computing, results from inputs. That is an appropriate architecture for fewer and fewer programs.
See Why Architecture Oriented Programming matters
https://blog.metaobject.com/2019/02/why-architecture-oriente...
and
Can Programmers Escape the Gentle Tyranny of call/return?
https://2020.programming-conference.org/details/salon-2020-p...
> its solution is higher level than even functional programming
Yes. Well, functional actually gets most of its utility from being lower level as far as paradigms go (less powerful). But yes.
> and more abstract
No. Well, yes, if expressed with current programming languages. But that's part of the problem set, not part of the solution set. We should be able to express our architectures less abstractly, more concretely, but for that we need linguistic support. Which is why I am working on that:
http://objective.st
> I want software architecture to be cheap and easy to change without breaking any existing behaviours. I don't know much research on this subject.
There was quite a bit of research at CMU, for example on packaging mismatch. Famous paper Architectural Mismatch, Why Reuse is so hard, and the 10 year follow up in 2009: Architectural Mismatch: Why Reuse is Still So Hard
https://repository.upenn.edu/cgi/viewcontent.cgi?article=107...*
Not much has changed since.
> https://github.com/samsquire/ideas4
> https://devops-pipeline.com
Will check those out. Dataflow is definitely a big part of it, with the extension of dataflow constraints (make, spreadsheets, "FRP"/"Rx"). But so is in-process REST with Storage Combinators!
And breaking down barriers between scripting and "real" programming.
-
Service Mesh Use Cases
Thanks for this.
I have never deployed a server mesh or used one but I am designing something similar at the code layer. It is designed to route between server components. That is, at the architecture between threads in a multithreaded system.
The problem I want to solve is that I want architecture to be trivially easy to change with minimal code changes. This is the promise and allure of enterprise service buses and messaging queues.
I have managed RabbitMQ and I didn't enjoy it.
If I want a system that can scale up and down and that multiples of any system object can be introduced or removed without drastic rewrites.
I would like to decouple bottleneck from code and turn it into runtime configuration.
My understanding of things such as Traefik and istio is that they are frustrating to set up.
Specifically I am working on designing interthread communication patterns for multithreaded software.
How do you design an architecture that is easy to change, scales and is flexible?
I am thinking of a message routing definition format that is extremely flexible and allows any topology to be created.
https://github.com/samsquire/ideas4#526-multiplexing-setting...
I think there is application of the same pattern to the network layer too.
Each communication event has associated with it an environment of keyvalues that look similar to this:
petsserver1
-
Release engineering is exhausting so here's cargo-dist
Thanks for remembering me :-)
I would like things to run locally by default and then deployed to the cloud where they run.
Should be easier to debug problems if I can get the code to my machine and investigate issues with tools that my computer has such as "strace", "perf" and debug logging that I liberally apply to the build script.
In production we would have log aggregation and log search (such as ELK stack) and it is a good habit to get into the perspective of debugging production via tooling.
But CICD feels before that tooling in the pipeline. You could wire up your CICD to log to ELK but I would prefer local deployable software.
I think my focus on automating things means I want to be capable of seeing how the thing works without relying on a deployed black box in the cloud and using assumptions of how it works rather than direct investigation.
One of my journal entries is almost a lamentation of all the things that need to be done to release and use software.
This is that entry:
https://github.com/samsquire/ideas4#5-permanent-softwareplat...
I wonder if software could be deployed more like a URL that has all the information to configure a virtual machine. Docker over URL or something.
minisketch
-
Invertible Bloom Lookup Tables with Less Randomness and Memory
Anyone interested in IBLT with low failure probablity should also be aware of pinsketch and, particularly, our implementation of it: minisketch ( https://github.com/sipa/minisketch/ ).
Our implementation communicates a difference of N b-bit entries with exactly N*b bits with 100% success. The cost for this communications efficiency and reliability is that the decoder takes CPU time quadratic in N, instead of IBLT's linear decoder. However, when N is usually small, if the implementation is fast this can be fine -- especially since you wouldn't normally want to use set recon unless you were communications limited.
Pinsketches and iblt can also be combined-- one can use pinsketches as the cells of an iblt and one can also use a small pinsketch to improve the failure rate of an iblt (since when a correctly sized IBLT fails, it's usually just due to a single undecodable cycle).
- Minisketch: an optimized library for BCH-based set reconciliation
-
Peer-to-Peer Encrypted Messaging
Since the protocol appears to use adhoc synchronization, the authors might be interested in https://github.com/sipa/minisketch/ which is a library that implements a data structure (pinsketch) that allows two parties to synchronize their sets of m b-bit elements which differ by c entries using only b*c bits. A naive protocol would use m*b bits instead, which is potentially much larger.
I'd guess that under normal usage the message densities probably don't justify such efficient means-- we developed this library for use in bitcoin targeting rates on the order of a dozen new messages per second and where every participant has many peers with potentially differing sets--, but it's still probably worth being aware of. The pinsketch is always equal or more efficient than a naive approach, but may not be worth the complexity.
The somewhat better known IBLT data structure has constant overheads that make it less efficient than even naive synchronization until the set differences are fairly large (particular when the element hashes are small); so some applications that evaluated and eschewed IBLT might find pinsketch applicable.
-
Ask HN: What are some 'cool' but obscure data structures you know about?
I love the set reconciliation structures like the IBLT (Iterative Bloom Lookup Table) and BCH set digests like minisketch.
https://github.com/sipa/minisketch
Lets say you have a set of a billion items. Someone else has mostly the same set but they differ by 10 items. These let you exchange messages that would fit in one UDP packet to reconcile the sets.
-
Here is how Ethereum COULD scale without increasing centralisation and without depending on layer two's.
Sipa is working on a better version of that for a while. The technical term is a "set reconciliation protocol", but Bitcoin Core been doing a more basic version of this for a while. Note that the "BCH" there isn't the same as Bcash
-
ish: Sketches for Zig
I'd also have to say that Zig is a pretty neat library for this. In order to implement PBS I needed the MiniSketch-library (written in C/C++) and I'll have to say that integrating with it has been a breeze. Some fiddling in build.zig so that I can avoid Makefile, and after that everything has worked amazingly.
-
The Pinecone Overlay Network
Networks that need to constrain themselves to limited typologies to avoid traffic magnification do so at the expense of robustness, especially against active attackers that grind their identifiers to gain privileged positions.
Maybe this is a space where efficient reconciliation ( https://github.com/sipa/minisketch/ ) could help-- certainly if the goal were to flood messages to participants reconciliation can give almost optimal communication without compromising robustness.
- Is it any easier to find A, B such that sha256(A) ^ sha256(B) = sha256(C)?
What are some alternatives?
preemptible-thread - How to preempt threads in user space
wormhole-william-mobile - End-to-end encrypted file transfer for Android and iOS. A Magic Wormhole Mobile client.
ideas2 - Another 85+ Ideas for Computing https://samsquire.github.io/ideas2/
ctrie-java - Java implementation of a concurrent trie
wg-async - Working group dedicated to improving the foundations of Async I/O in Rust
t-digest - A new data structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means
ideas - a hundred ideas for computing - a record of ideas - https://samsquire.github.io/ideas/
tries-T9-Prediction - Its artificial intelligence algorithm of T9 mobile
saddle-data-graph - where does it come from, where does it go?
sdsl-lite - Succinct Data Structure Library 2.0
periphery - A tool to identify unused code in Swift projects.
ann-benchmarks - Benchmarks of approximate nearest neighbor libraries in Python