awesome-sre
aperture
Metric | awesome-sre | aperture
---|---|---
Mentions | 12 | 28
Stars | 11,526 | 590
Growth | - | 1.7%
Activity | 0.0 | 9.8
Last commit | 5 months ago | 5 days ago
Language | - | Go
License | Creative Commons Zero v1.0 Universal | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
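The exact formulas behind these numbers are not given on this page. The sketch below is one plausible reading only, under stated assumptions: growth is a plain month-over-month percentage, recency weighting is an exponential decay with a hypothetical 30-day half-life, and the 0 to 10 activity score is a percentile rank among tracked projects. All function names and sample values are hypothetical.

```python
def growth_pct(stars_now, stars_month_ago):
    """Month-over-month star growth, as a percentage."""
    return 100.0 * (stars_now - stars_month_ago) / stars_month_ago


def weighted_commits(commit_ages_days, half_life_days=30.0):
    """Sum of per-commit weights where newer commits count more.

    Assumption: exponential decay, so a commit that is half_life_days
    old contributes half as much as a commit made today.
    """
    return sum(0.5 ** (age / half_life_days) for age in commit_ages_days)


def activity_score(project_raw, all_raw):
    """Map a raw activity value to a 0..10 score by percentile rank.

    Assumption: a score of 9.0 means the project is more active than
    90% of tracked projects, i.e. it sits in the top 10%.
    """
    below = sum(1 for raw in all_raw if raw < project_raw)
    return round(10.0 * below / len(all_raw), 1)
```

With ten hypothetical tracked projects whose raw values are 0 through 9, the most active one scores `activity_score(9, list(range(10)))`, which is 9.0 under this reading.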
awesome-sre
- 24 GitHub repos with 372M views that you can't miss out as a software engineer
A curated list of Site Reliability and Production Engineering resources: https://github.com/dastergon/awesome-sre
- 5 GitHub Projects to Help You Become a Better DevOps Engineer ⚡
5. Awesome Site Reliability Engineering
- New to DevOps: recommendations
- SRE tools?
I found this “awesome” repo https://github.com/dastergon/awesome-sre that links to this other “awesome” repo about tools specifically that may be of interest https://github.com/SquadcastHub/awesome-sre-tools
- Splunk just for monitoring? Better / cheaper solution?
- A curated list of SRE resources
- Awesome SRE
- Best learning path for SRE/DevOps?
- Learn from the best - a curated list of 30+ articles on how top companies like Uber, Netflix are thinking about observability, monitoring and site reliability
- From SysAdmin to SRE
aperture
- Defcon: Meta's system for preventing overload with graceful feature degradation
Anyone interested in load shedding and graceful degradation with request prioritization should check out the Aperture OSS project.
https://github.com/fluxninja/aperture
- Queues Don't Fix Overload
I agree that queues can be a problem, especially when misconfigured. But some amount of queuing is necessary to absorb short spikes in demand relative to capacity. Queues also help re-order requests based on criticality, which isn't possible with a queue size of zero: in that case we have to immediately drop a request or admit it without considering its priority.
I think it is beneficial to rethink how we tune queues. Instead of setting a queue size, we should tune the maximum permissible latency in the queue, which is what a request timeout actually is. That way, you stay within the acceptable response time SLA while keeping only the serviceable requests in the queue.
Aperture, an open-source load management platform, took this approach. Each request specifies a timeout for which it is willing to stay in the queue. A weighted fair queuing scheduler then allocates the capacity (a request quota or a maximum number of in-flight requests) across requests based on the priority and tokens (request heaviness) of each request.
Read more about the WFQ scheduler in Aperture: https://docs.fluxninja.com/concepts/scheduler
Link to Aperture's GitHub: https://github.com/fluxninja/aperture
Would love to hear your thoughts on our approach!
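The queueing idea in the comment above (each request carries its own queue timeout, and capacity is shared by priority and token cost) can be sketched as follows. This is a simplified, self-clocked variant of weighted fair queuing written for illustration, not Aperture's scheduler or API: the class name `TimeoutWFQ`, its methods, and the explicit `now` argument are all hypothetical.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class _Entry:
    finish: float                      # virtual finish tag; smaller is served first
    seq: int                           # tie-breaker that preserves arrival order
    request_id: str = field(compare=False)
    deadline: float = field(compare=False)


class TimeoutWFQ:
    """Queue bounded by per-request timeouts rather than by a fixed size."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()
        self._virtual_time = 0.0
        self._last_finish = {}         # last finish tag per priority level

    def enqueue(self, request_id, priority, tokens, timeout, now):
        # Heavier requests (more tokens) get later tags; higher priority
        # divides the cost, so those requests get earlier tags.
        start = max(self._virtual_time, self._last_finish.get(priority, 0.0))
        finish = start + tokens / priority
        self._last_finish[priority] = finish
        entry = _Entry(finish, next(self._seq), request_id, now + timeout)
        heapq.heappush(self._heap, entry)

    def dequeue(self, now):
        """Return the next request to admit, or None if nothing is serviceable."""
        while self._heap:
            entry = heapq.heappop(self._heap)
            if entry.deadline < now:
                continue               # waited longer than its timeout: drop it
            self._virtual_time = entry.finish
            return entry.request_id
        return None
```

In this sketch a request with priority 4 and 1 token gets a finish tag of 0.25, so it is admitted ahead of priority-1 requests that arrived earlier, while any request whose timeout has elapsed is discarded at dequeue time instead of being served late.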
- Kelsey Hightower's Twitter Spaces on Rate Limits & Flow Control
For those keen to dive deeper, I highly recommend exploring both the Twitter Space and Aperture:
[Twitter Spaces]: https://twitter.com/kelseyhightower/status/1689355284802629633?s=20
[GitHub repo]: https://github.com/fluxninja/aperture
- Graceful Behavior at Capacity
Very interesting blog post! Our team has been working intensively in this area for the last couple of years - flow control, load shedding, controllability (PID control), and so on.
We have open-sourced our work at https://github.com/fluxninja/aperture
We would love feedback from folks reading this blog post!
Disclaimer: I am one of the co-authors of the Aperture project. There are several interesting ideas we have built into this project and I will be happy to dive into the technical details as well.
- Why Adaptive Rate Limiting Is a Game-Changer
It's a blog on an open-source project that precisely tells you how to implement adaptive rate limiting.
Just click around a bit:
- https://github.com/fluxninja/aperture
- https://docs.fluxninja.com/use-cases/adaptive-service-protec...
Note: I am one of the authors of this project.
- Show HN: Review GitHub PRs with AI/LLMs
At the time of writing, the first sample image on that page is this:
https://coderabbit.ai/assets/section-1-f9a48066.png
which recommends adding a "maxIterations" counter to the "for len(executedComponents) ..." loop here:
https://github.com/fluxninja/aperture/blob/26e00ea818c7c28da...
HOWEVER
- the review has failed to notice the logic using "numExecutedBefore" (around line 377) that already prevents the specific bug it is suggesting a fix for
- the suggested change decrements "maxIterations" inside the "for ... range circuit.components {" loop which means it isn't counting iterations, it's counting components
This kind of suggestion is particularly nasty because it's unlikely that the test suite populates enough components to hit "maxIterations" - so an inattentive reader could accept it, get a green build, and then deploy a production bug!
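The two objections in the comment above can be illustrated with a small sketch. This is a hypothetical Python reconstruction based only on the names quoted in the comment (`maxIterations`, `numExecutedBefore`, a loop over components); it is not the Aperture source, and the dependency model is assumed for illustration.

```python
def run_until_stable(components, max_iterations):
    """Execute components whose dependencies are met, pass by pass.

    components is a list of (name, set_of_dependency_names) pairs.
    The cap is checked once per outer pass, so it counts iterations.
    """
    executed = set()
    iterations = 0
    while len(executed) < len(components):
        if iterations >= max_iterations:
            raise RuntimeError("max_iterations reached")
        iterations += 1
        num_executed_before = len(executed)
        for name, deps in components:
            if name not in executed and deps <= executed:
                executed.add(name)
        # This progress check alone already stops dependency loops.
        if len(executed) == num_executed_before:
            raise ValueError("no progress in a full pass: dependency loop")
    return iterations


def run_with_misplaced_counter(components, max_iterations):
    """Same loop, but the cap is decremented inside the inner loop."""
    executed = set()
    passes = 0
    while len(executed) < len(components):
        passes += 1
        num_executed_before = len(executed)
        for name, deps in components:
            max_iterations -= 1        # misplaced: counts component visits
            if max_iterations < 0:
                raise RuntimeError("max_iterations reached")
            if name not in executed and deps <= executed:
                executed.add(name)
        if len(executed) == num_executed_before:
            raise ValueError("no progress in a full pass: dependency loop")
    return passes
```

With a single component both versions finish in one pass, which is how a small test suite could stay green. With a three-component chain that needs three passes, the first version completes under a cap of 5, while the misplaced counter runs out on the sixth component visit and fails a valid graph.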
- June 25th, 2023 Deno Deploy Postmortem
They need an adaptive protection system like Aperture[0] to mitigate overloads.
[0]: https://github.com/fluxninja/aperture
- Jsonnet – The Data Templating Language
It’s customized to our policy spec. But you can learn from this and adapt it to your spec.
https://github.com/fluxninja/aperture/blob/main/scripts/json...
- Show HN: Aperture – Unified Reliability Management for Microservices
- Failure Mitigation for Microservices: An Intro to Aperture
What are some alternatives?
howtheysre - A curated collection of publicly available resources on how technology and tech-savvy organizations around the world practice Site Reliability Engineering (SRE)
rules_jsonnet - Jsonnet rules for Bazel
awesome-chaos-engineering - A curated list of Chaos Engineering resources.
slo-exporter - Slo-exporter computes standardized SLI and SLO metrics based on events coming from various data sources.
awesome-scalability - The Patterns of Scalable, Reliable, and Performant Large-Scale Systems
awesome-sre-tools - A curated list of Site Reliability and Production Engineering Tools
devops-exercises - Linux, Jenkins, AWS, SRE, Prometheus, Docker, Python, Ansible, Git, Kubernetes, Terraform, OpenStack, SQL, NoSQL, Azure, GCP, DNS, Elastic, Network, Virtualization. DevOps Interview Questions
now-boltwall - Vercel lambda deployment for a Nodejs Lightning-powered Paywall
awesome-interview-questions - :octocat: A curated awesome list of lists of interview questions. Feel free to contribute! :mortar_board:
ai-pr-reviewer - AI-based Pull Request Summarizer and Reviewer with Chat Capabilities.
jaeger-ui - Web UI for Jaeger
etleneum - the centralized smart contract platform