|3 days ago||3 days ago|
|Apache License 2.0||Apache License 2.0|
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
So many sparks on K8s... Could anyone give me a few explanations please ?
1 project | reddit.com/r/apachespark | 27 Oct 2021
Spark-Operator (as I understand, Spark-Operator integrates well with K8s while vanilla Spark integration with K8s seems extremely complex to maintain)
"Running Apache Spark on EKS Fargate"
1 project | dev.to | 14 Aug 2021
Spark on K8s Operator is a project from Google that allows submitting Spark applications to a Kubernetes cluster using the SparkApplication CustomResourceDefinition. It uses a mutating admission webhook to modify the pod spec and add features not officially supported by spark-submit.
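To make the SparkApplication idea concrete, here is a minimal sketch that builds such a custom resource as a plain Python dict and prints it as JSON. The field names follow the operator's published v1beta2 examples as best I recall them, so check them against the CRD installed in your cluster. The image name, service account, and resource sizes are hypothetical placeholders.

```python
import json


def spark_application(name, image, main_file, namespace="default"):
    """Build a SparkApplication custom resource as a plain dict.

    Assumption: field names mirror the operator's v1beta2 examples;
    verify them against the CRD version you actually run.
    """
    return {
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "Python",
            "mode": "cluster",
            "image": image,  # hypothetical image name, see below
            "mainApplicationFile": main_file,
            "sparkVersion": "3.1.1",
            # "spark" is a hypothetical service account name
            "driver": {"cores": 1, "memory": "512m", "serviceAccount": "spark"},
            "executor": {"cores": 1, "instances": 2, "memory": "512m"},
        },
    }


manifest = spark_application(
    "pyspark-pi",
    "example.registry/spark-py:3.1.1",
    "local:///opt/spark/examples/src/main/python/pi.py",
)
print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied with kubectl, since Kubernetes accepts JSON manifests as well as YAML.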
My Journey With Spark On Kubernetes... In Python (1/3)
4 projects | dev.to | 12 Apr 2021
In this section, you use Helm to deploy the Kubernetes Operator for Apache Spark from the incubator Chart repository. Helm is a package manager you can use to configure and deploy Kubernetes apps.
My Journey With Spark On Kubernetes... In Python (2/3)
2 projects | dev.to | 12 Apr 2021
Additional details of how SparkApplications are run can be found in the design documentation.
Gopher Gold #14 - Wed Oct 07 2020
22 projects | dev.to | 7 Oct 2020
GoogleCloudPlatform/spark-on-k8s-operator (Go): Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
The unbearable fussiness of the smart home
8 projects | news.ycombinator.com | 24 Nov 2021
> [...] that feed into a prometheus -> cortex store, so I can then map them on Grafana.
I had to Google because I've never heard of any of those. Did I find the right ones?
Mine is much more primitive. My indoor temperature monitor is an ESP8266 that uploads the temperature to a simple PHP page that saves it in an sqlite DB. A cron job runs a Perl script every few minutes that extracts the data for the last hour, 3 hours, 12 hours, 48 hours, and since the beginning of time and uses gnuplot to produce PNG graphs. There's a static page on my server that displays those graphs.
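For anyone reading along, the "extract the last N hours" step might look roughly like the sketch below. It uses Python in place of the Perl script, and the table name `readings` and the columns `ts` and `temp_c` are made up for illustration, since the actual schema is not given. Plotting is left out.

```python
import sqlite3
import time

# Windows in hours; None means "since the beginning of time".
WINDOWS_HOURS = [1, 3, 12, 48, None]


def readings_since(conn, hours, now=None):
    """Return (ts, temp_c) rows newer than `hours` ago, oldest first."""
    now = time.time() if now is None else now
    if hours is None:
        cur = conn.execute("SELECT ts, temp_c FROM readings ORDER BY ts")
    else:
        cur = conn.execute(
            "SELECT ts, temp_c FROM readings WHERE ts >= ? ORDER BY ts",
            (now - hours * 3600,),
        )
    return cur.fetchall()


# Demo with an in-memory DB and a hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (ts REAL NOT NULL, temp_c REAL NOT NULL)")
now = 1_700_000_000.0
samples = [(now - h * 3600, 20.0 + h * 0.1) for h in (0.5, 2, 10, 40, 100)]
conn.executemany("INSERT INTO readings VALUES (?, ?)", samples)

counts = {h: len(readings_since(conn, h, now)) for h in WINDOWS_HOURS}
print(counts)
```

Each result set could then be written out as a two-column text file for gnuplot, one file per window.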
My outdoor temperature monitor uses a cheap AcuRite 433 MHz indoor/outdoor thermometer I bought. I have an RPi with an RTL-SDR attached spying on the communications between the AcuRite sensor outside and the AcuRite display inside using rtl_433. A script looks at the rtl_433 and finds the AcuRite sensor data and puts it in an sqlite DB. I haven't yet gotten around to making something to graph it.
The nice thing about that approach is that it was also easy to add support for other 433 MHz wireless sensors near me, such as the AcuRite fridge/freezer thermometer I have. I can also see a few assorted sensors of neighbors (temperature, humidity, soil moisture, tire pressure, wind speed, wind direction, rain, and a few other random things). If I wanted to it would be easy to add them to the DB.
When I made a wireless tipping rain gauge recently, I used a 433 MHz transmitter module and added a decoder to rtl_433 that understands my data stream format. That gets my data into the rtl_433 output. No need to futz around with 433 MHz receiver modules, which appear to be a pain in the ass. An ATTiny85 counts the tips and runs the transmitter. The ATTiny85, the transmitter module, a battery holder, an RJ11 socket because the rain gauge has an RJ11 connector, a board to put those things on, and a small waterproof case is pretty much the complete parts list.
I think I'm going to standardize on this general approach. For things that do not have WiFi and only need to report data, I'll use 433 MHz modules and custom decoders for rtl_433 on the RPi. For things that do have WiFi, such as any future ESP projects I do, they will just use WiFi to talk to the RPi. If anything needs to get sent outside of my LAN, the RPi will handle it.
The RPi is also currently controlling a space heater in my living room, getting connection data from my cable modem periodically and recording that in an sqlite DB, and serving a simple web page that lets me quickly change inputs and volume on my Denon receiver, so I'm already pretty much committed to keeping it running all the time.
Decoders can be specified in a simple text file. Here's the one for my rain gauge as an example:
Processing large datasets from mongodb in realtime
1 project | reddit.com/r/golang | 30 Jul 2021
Not a lot to go on in your post, but you might find some inspiration from this project (written in golang) which handles huge data sets (metrics). https://cortexmetrics.io/
How are you tracking your SLA's/SLO
2 projects | reddit.com/r/sre | 3 Apr 2021
Thanos or Cortex.
2 projects | reddit.com/r/networking | 22 Mar 2021
You will probably want to look at Cortex; it's designed to be a multi-tenant database. You can either build it up yourself or use the Grafana hosted version.
Sizing Considerations for Prometheus
3 projects | reddit.com/r/PrometheusMonitoring | 12 Mar 2021
But yes, Cortex is primarily Grafana Labs :)
Launch HN: Opstrace (YC S19) – open-source Datadog
11 projects | news.ycombinator.com | 1 Feb 2021
(3) Transparency and predictability of costs—you pay your cloud provider for the storage/network/compute for running Opstrace and can take advantage of any credits/discounts you negotiate with them. We are incentivized to help you understand exactly where you are spending money because you pay us for the value you get from our product with per-user pricing. (For more about costs, see our recent blog post here: https://opstrace.com/blog/pulling-cost-curtain-back). (4) It should be REAL Open Source with the Apache License, Version 2.0.
To get started, you install Opstrace into your AWS or GCP account with one command: `opstrace create`. This installs Opstrace in your account, creates a domain name and sets up authentication for you for free. Once logged in you can create tenants that each contain APIs for Prometheus, Fluentd/Loki and more. Each tenant has a Grafana instance you can use. A tenant can be used to logically separate domains, for example, things like prod, test, staging or teams. Whatever you prefer.
At the heart of Opstrace runs a Cortex (https://github.com/cortexproject/cortex) cluster to provide the above-mentioned scalable Prometheus API, and a Loki (https://github.com/grafana/loki) cluster for the logs. We front those with authenticated endpoints (all public in our repo). All the data ends up stored only in S3 thanks to the amazing work of the developers on those projects.
An "open source Datadog" requires more than just metrics and logs. We are actively working on a new UI for managing, querying and visualizing your data and many more features, like automatic ingestion of logs/metrics from cloud services (CloudWatch/Stackdriver), Datadog compatible API endpoints to ease migrations and side by side comparisons and synthetics (e.g. Pingdom). You can follow along on our public roadmap: https://opstrace.com/docs/references/roadmap.
We will always be open source, and we make money by charging a per-user subscription for our commercial version which will contain fine-grained authz, bring-your-own OIDC and custom domains.
We’d love to hear what your perspective is. What are your experiences related to the problems discussed here? Are you all happy with the tools you’re using today?
11 projects | news.ycombinator.com | 1 Feb 2021
Thanks for bringing this topic to this thread. I'm a physicist by heart and education myself and observe that in the software/observability industry we like to collect data much more than we're interested in properly processing and interpreting it.
> Understanding the distribution of your data (rather than just averages) is arguably the most important feature you want from a monitoring dashboard, so the weak support for quantiles is very limiting.
So much yes! It's a relief to see that we have people here in this thread (and industry) who understand this :-).
People that have a deep background and experience in experimentation, measurement, and quantification rightfully have to see the nature of the data distribution first before they feel in any way OK about proceeding with aggregates.
Parent commenter knows this, but for people reading along: using aggregates (such as mean, standard deviation, standard error, quantiles, ...) implies dropping information. Going from the full distribution to a simplified representation naturally implies that what we talk about is a lossy transformation of data. Of course, one wants to be smart about _which_ information to drop. It should be intuitive that one can only be smart about this choice when having knowledge about the underlying distribution. Often, data is not normally distributed, not Poisson-distributed, but instead somewhat uniquely distributed based on the use case -- in a way that deserves brief characterization (a quick look is often enough!); which then allows for making informed decisions about which aggregate parameters to look at -- and which pieces of information are fine to drop.
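A toy illustration of that lossiness, with made-up latency numbers: the two samples below have exactly the same mean, yet very different tails. The nearest-rank percentile helper is just one simple convention among several.

```python
import statistics

# Hypothetical latencies in milliseconds.
steady = [100] * 100               # every request takes 100 ms
spiky = [50] * 95 + [1050] * 5     # mostly fast, with a slow tail


def p99(values):
    """99th percentile by the nearest-rank method."""
    ordered = sorted(values)
    rank = (99 * len(ordered) + 99) // 100  # ceil(0.99 * n) in integer math
    return ordered[rank - 1]


for name, sample in (("steady", steady), ("spiky", spiky)):
    print(name, "mean =", statistics.mean(sample), "p99 =", p99(sample))
```

Looking only at the mean, the two services are indistinguishable; the tail is where they differ by an order of magnitude.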
> Histograms require manually specifying the distribution of your data, which is time-consuming, lossy, and can introduce significant error bands around your quantile estimates.
Yes! Great point. Honestly, I was a little bit shocked when I saw how this works in the Prometheus ecosystem. I happen to have an example for this I think: we (Opstrace) have contributed a tiny patch to Cortex where we changed the parameterization of a specific histogram metric, because the upper band was super broad, leading to a blind spot (a lack of resolution) in the range of values that was most interesting to us -- see https://github.com/cortexproject/cortex/issues/2530 and
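To illustrate the kind of blind spot a broad upper bucket creates, here is a small sketch of quantile estimation from cumulative bucket counts, using linear interpolation inside the bucket that contains the target rank (similar in spirit to what Prometheus's `histogram_quantile` does). The bucket bounds and counts are invented for the example and are not the ones from the Cortex patch.

```python
def bucket_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count), sorted by upper_bound,
    with finite bounds; the first bucket is assumed to start at 0.
    """
    total = buckets[-1][1]
    target = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, cum_count in buckets:
        if cum_count >= target:
            in_bucket = cum_count - lower_count
            if in_bucket == 0:
                return upper_bound
            # Assume observations are spread evenly within the bucket.
            fraction = (target - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, cum_count
    return buckets[-1][0]


# Hypothetical data: 900 requests finish within 1 s, 100 more within 2 s.
coarse = [(1.0, 900), (10.0, 1000)]             # one broad 1-10 s bucket
fine = [(1.0, 900), (2.0, 1000), (10.0, 1000)]  # extra boundary at 2 s

print("p95 with coarse buckets:", bucket_quantile(0.95, coarse))
print("p95 with finer buckets: ", bucket_quantile(0.95, fine))
```

With the broad 1-10 s bucket the estimate lands near 5.5 s, although no request in this made-up data took longer than 2 s; adding a single boundary at 2 s brings it down to about 1.5 s.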
Gopher Gold #14 - Wed Oct 07 2020
22 projects | dev.to | 7 Oct 2020
cortexproject/cortex (Go): A horizontally scalable, highly available, multi-tenant, long term Prometheus.
What are some alternatives?
thanos - Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.
loki - Like Prometheus, but for logs.
enhancements - Enhancements tracking repo for Kubernetes
volcano - A Cloud Native Batch System (Project under CNCF)
github-actions-runner-operator - K8S operator for scheduling github actions runner pods
helm-operator - Successor: https://github.com/fluxcd/helm-controller — The Flux Helm Operator, once upon a time a solution for declarative Helming.
kubebuilder - Kubebuilder - SDK for building Kubernetes APIs using CRDs
charts - ⚠️(OBSOLETE) Curated applications for Kubernetes
velero - Backup and migrate Kubernetes applications and their persistent volumes
mysql-operator - Asynchronous MySQL Replication on Kubernetes using Percona Server and Openark's Orchestrator.
windows_exporter - Prometheus exporter for Windows machines
opstrace - The Open Source Observability Distribution