spark-on-k8s-operator VS cortex

Compare spark-on-k8s-operator and cortex to see how they differ.

spark-on-k8s-operator

Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes. (by GoogleCloudPlatform)

cortex

A horizontally scalable, highly available, multi-tenant, long term Prometheus. (by cortexproject)
Metric           spark-on-k8s-operator   cortex
Mentions         5                       8
Stars            1,748                   4,447
Growth           2.7%                    1.5%
Activity         7.4                     9.5
Last commit      3 days ago              3 days ago
Language         Go                      Go
License          Apache License 2.0      Apache License 2.0
The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

spark-on-k8s-operator

Posts with mentions or reviews of spark-on-k8s-operator. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2021-04-12.

cortex

Posts with mentions or reviews of cortex. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2021-11-24.
  • The unbearable fussiness of the smart home
    8 projects | news.ycombinator.com | 24 Nov 2021
    > [...] that feed into a prometheus -> cortex store, so I can then map them on Grafana.

    I had to Google because I've never heard of any of those. Did I find the right ones?

    https://prometheus.io/

    https://cortexmetrics.io/

    https://grafana.com/

    Mine is much more primitive. My indoor temperature monitor is an ESP8266 that uploads the temperature to a simple PHP page that saves it in an sqlite DB. A cron job runs a Perl script every few minutes that extracts the data for the last hour, 3 hours, 12 hours, 48 hours, and since the beginning of time and uses gnuplot to produce PNG graphs. There's a static page on my server that displays those graphs.

    My outdoor temperature monitor uses a cheap AcuRite 433 MHz indoor/outdoor thermometer I bought. I have an RPi with an RTL-SDR attached spying on the communications between the AcuRite sensor outside and the AcuRite display inside using rtl_433. A script looks at the rtl_433 output, finds the AcuRite sensor data, and puts it in an sqlite DB. I haven't yet gotten around to making something to graph it.

    The nice thing about that approach is that it was also easy to add support for other 433 MHz wireless sensors near me, such as the AcuRite fridge/freezer thermometer I have. I can also see a few assorted sensors of neighbors (temperature, humidity, soil moisture, tire pressure, wind speed, wind direction, rain, and a few other random things). If I wanted to it would be easy to add them to the DB.

    When I made a wireless tipping rain gauge recently, I used a 433 MHz transmitter module [1] and added a decoder [2] to rtl_433 that understands my data stream format. That gets my data into the rtl_433 output. No need to futz around with 433 MHz receiver modules which appear to be a pain in the ass [3]. An ATTiny85 counts the tips and runs the transmitter. The ATTiny85, the transmitter module, a battery holder, an RJ11 socket because the rain gauge has an RJ11 connector, a board to put those things on [4], and a small waterproof case is pretty much the complete parts list.

    I think I'm going to standardize on this general approach. For things that do not have WiFi and only need to report data: 433 MHz modules and custom decoders for rtl_433 on the RPi. For things that do have WiFi, such as any future ESP projects I do, they will just use WiFi to talk to the RPi. If anything needs to get sent outside of my LAN the RPi will handle it.

    The RPi is also currently controlling a space heater in my living room, getting connection data from my cable modem periodically and recording that in an sqlite DB, and serving a simple web page that lets me quickly change inputs and volume on my Denon receiver, so I'm already pretty much committed to keeping it running all the time.

    [1] https://www.sparkfun.com/products/10534

    [2] Decoders can be specified in a simple text file. Here's the one for my rain gauge as an example:

      decoder {
  • Processing large datasets from mongodb in realtime
    1 project | reddit.com/r/golang | 30 Jul 2021
    Not a lot to go on in your post, but you might find some inspiration from this project (written in golang) which handles huge data sets (metrics). https://cortexmetrics.io/
  • How are you tracking your SLA's/SLO
    2 projects | reddit.com/r/sre | 3 Apr 2021
    Thanos or Cortex.
  • msp monitoring
    2 projects | reddit.com/r/networking | 22 Mar 2021
    You will probably want to look at Cortex, it's designed to be the multi-tenant database. You can either build it up yourself, or use the Grafana hosted version.
  • Sizing Considerations for Prometheus
    3 projects | reddit.com/r/PrometheusMonitoring | 12 Mar 2021
    But yes, Cortex is primarily Grafana Labs :)
  • Launch HN: Opstrace (YC S19) – open-source Datadog
    11 projects | news.ycombinator.com | 1 Feb 2021
    (3) Transparency and predictability of costs—you pay your cloud provider for the storage/network/compute for running Opstrace and can take advantage of any credits/discounts you negotiate with them. We are incentivized to help you understand exactly where you are spending money because you pay us for the value you get from our product with per-user pricing. (For more about costs, see our recent blog post here: https://opstrace.com/blog/pulling-cost-curtain-back). (4) It should be REAL Open Source with the Apache License, Version 2.0.

    To get started, you install Opstrace into your AWS or GCP account with one command: `opstrace create`. This installs Opstrace in your account, creates a domain name and sets up authentication for you for free. Once logged in you can create tenants that each contain APIs for Prometheus, Fluentd/Loki and more. Each tenant has a Grafana instance you can use. A tenant can be used to logically separate domains, for example, things like prod, test, staging or teams. Whatever you prefer.

    At the heart of Opstrace runs a Cortex (https://github.com/cortexproject/cortex) cluster to provide the above-mentioned scalable Prometheus API, and a Loki (https://github.com/grafana/loki) cluster for the logs. We front those with authenticated endpoints (all public in our repo). All the data ends up stored only in S3 thanks to the amazing work of the developers on those projects.

    An "open source Datadog" requires more than just metrics and logs. We are actively working on a new UI for managing, querying and visualizing your data and many more features, like automatic ingestion of logs/metrics from cloud services (CloudWatch/Stackdriver), Datadog compatible API endpoints to ease migrations and side by side comparisons and synthetics (e.g. Pingdom). You can follow along on our public roadmap: https://opstrace.com/docs/references/roadmap.

    We will always be open source, and we make money by charging a per-user subscription for our commercial version which will contain fine-grained authz, bring-your-own OIDC and custom domains.

    Check out our repo (https://github.com/opstrace/opstrace) and give it a spin (https://opstrace.com/docs/quickstart).

    We’d love to hear what your perspective is. What are your experiences related to the problems discussed here? Are you all happy with the tools you’re using today?

    11 projects | news.ycombinator.com | 1 Feb 2021
    Thanks for bringing this topic to this thread. I'm a physicist by heart and education myself and observe that in the software/observability industry we like to collect data much more than we're interested in properly processing and interpreting it.

    > Understanding the distribution of your data (rather than just averages) is arguably the most important feature you want from a monitoring dashboard, so the weak support for quantiles is very limiting.

    So much yes! It's a relief to see that we have people here in this thread (and industry) who understand this :-).

    People that have a deep background and experience in experimentation, measurement, and quantification rightfully have to see the nature of the data distribution first before they feel in any way OK about proceeding with aggregates.

    Parent commenter knows this, but for people reading along: using aggregates (such as mean, standard deviation, standard error, quantiles, ...) implies dropping information. Going from the full distribution to a simplified representation naturally implies that what we talk about is a lossy transformation of data. Of course, one wants to be smart about _which_ information to drop. It should be intuitive that one can only be smart about this choice when having knowledge about the underlying distribution. Often, data is not normally distributed, not Poisson-distributed, but instead somewhat uniquely distributed based on the use case -- in a way that deserves brief characterization (a quick look is often enough!); which then allows for making informed decisions about which aggregate parameters to look at -- and which pieces of information are fine to drop.

    > Histograms require manually specifying the distribution of your data, which is time-consuming, lossy, and can introduce significant error bands around your quantile estimates.

    Yes! Great point. Honestly, I was a little bit shocked when I saw how this works in the Prometheus ecosystem. I happen to have an example for this I think: we (Opstrace) have contributed a tiny patch to Cortex where we changed the parameterization of a specific histogram metric, because the upper band was super broad, leading to a blind spot (a lack of resolution) in the range of values that was most interesting to us -- see https://github.com/cortexproject/cortex/issues/2530 and

  • Gopher Gold #14 - Wed Oct 07 2020
    22 projects | dev.to | 7 Oct 2020
    cortexproject/cortex (Go): A horizontally scalable, highly available, multi-tenant, long term Prometheus.
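
The point raised in the Opstrace thread above, that quantile estimates from histograms depend heavily on how the bucket boundaries are chosen, can be illustrated with a small sketch. The code below is not Prometheus or Cortex code. It is a simplified, assumed model in which a quantile is estimated by linear interpolation inside the bucket that contains the target rank, and the sample data, bucket boundaries, and function names are all made up for illustration.

```python
import bisect
import math


def cumulative_counts(samples, upper_bounds):
    """Count how many samples fall at or below each bucket upper bound."""
    ordered = sorted(samples)
    return [bisect.bisect_right(ordered, ub) for ub in upper_bounds]


def estimate_quantile(q, upper_bounds, cumulative):
    """Estimate quantile q by linear interpolation within one bucket.

    Assumes the lowest bucket starts at 0 and the last bound is +inf.
    """
    rank = q * cumulative[-1]
    # First bucket whose cumulative count reaches the target rank.
    idx = bisect.bisect_left(cumulative, rank)
    upper = upper_bounds[idx]
    lower = upper_bounds[idx - 1] if idx > 0 else 0.0
    prev = cumulative[idx - 1] if idx > 0 else 0
    in_bucket = cumulative[idx] - prev
    if math.isinf(upper):
        # Nothing to interpolate towards; fall back to the last finite bound.
        return lower
    if in_bucket == 0:
        return upper
    return lower + (upper - lower) * (rank - prev) / in_bucket


def exact_quantile(q, samples):
    """Nearest-rank quantile computed from the raw samples."""
    ordered = sorted(samples)
    k = max(1, math.ceil(q * len(ordered)))
    return ordered[k - 1]


# Hypothetical latencies, evenly spread between 1.0 and about 9.9 seconds.
samples = [1.0 + 0.09 * i for i in range(100)]

# A very broad upper bucket versus buckets that match the data range.
coarse_bounds = [0.5, 60.0, math.inf]
fine_bounds = [0.5] + [float(b) for b in range(1, 11)] + [math.inf]

exact = exact_quantile(0.9, samples)
coarse = estimate_quantile(0.9, coarse_bounds, cumulative_counts(samples, coarse_bounds))
fine = estimate_quantile(0.9, fine_bounds, cumulative_counts(samples, fine_bounds))

print(f"exact p90:           {exact:.2f}")
print(f"coarse-bucket p90:   {coarse:.2f}")
print(f"fine-bucket p90:     {fine:.2f}")
```

With the broad bucket, every sample lands in one wide interval and the interpolated estimate drifts far from the true value. With boundaries that match the range of the data, the estimate stays close to it. That is the "blind spot" effect the commenter describes.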

What are some alternatives?

When comparing spark-on-k8s-operator and cortex you can also consider the following projects:

thanos - Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.

loki - Like Prometheus, but for logs.

enhancements - Enhancements tracking repo for Kubernetes

volcano - A Cloud Native Batch System (Project under CNCF)

github-actions-runner-operator - K8S operator for scheduling github actions runner pods

helm-operator - Successor: https://github.com/fluxcd/helm-controller — The Flux Helm Operator, once upon a time a solution for declarative Helming.

kubebuilder - Kubebuilder - SDK for building Kubernetes APIs using CRDs

charts - ⚠️(OBSOLETE) Curated applications for Kubernetes

velero - Backup and migrate Kubernetes applications and their persistent volumes

mysql-operator - Asynchronous MySQL Replication on Kubernetes using Percona Server and Openark's Orchestrator.

windows_exporter - Prometheus exporter for Windows machines

opstrace - The Open Source Observability Distribution