jitsu VS superset

Compare jitsu vs superset and see what their differences are.

jitsu

Jitsu is an open-source Segment alternative. Fully-scriptable data ingestion engine for modern data teams. Set-up a real-time data pipeline in minutes, not days (by jitsucom)
              jitsu          superset
Mentions      13             137
Stars         3,795          57,792
Growth        1.9%           2.4%
Activity      9.8            9.9
Last commit   7 days ago     5 days ago
Language      TypeScript     TypeScript
License       MIT License    Apache License 2.0
The number of mentions indicates the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

jitsu

Posts with mentions or reviews of jitsu. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-07-19.
  • Any examples of working activist, socialist, or community-organizing software?
    5 projects | /r/socialistprogrammers | 19 Jul 2022
  • Lesser Known Features of ClickHouse
    6 projects | news.ycombinator.com | 31 May 2022
    you may check: https://github.com/jitsucom/jitsu. "Jitsu is an open-source Segment alternative. Fully-scriptable data ingestion engine for modern data teams. Set-up a real-time data pipeline in minutes, not days"

    You can create an API endpoint and send that JSON to it. In the "destination" part, it can sync to ClickHouse (one of many choices, alongside Redshift and Snowflake) very quickly, and flatten the JSON into columns. If a new key is found in the JSON, it will create a new column in ClickHouse.
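
The flattening behaviour described in the comment above can be sketched roughly as follows. This is only an illustration in Python, not Jitsu's implementation: the `flatten` and `missing_columns` helpers, the underscore separator, and the in-memory column list standing in for a table schema are all assumptions made for the example.

```python
def flatten(event, parent="", sep="_"):
    """Turn a nested JSON object into a single-level {column: value} dict.

    The separator and naming scheme are assumptions for this sketch.
    """
    flat = {}
    for key, value in event.items():
        column = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            # Recurse into nested objects, prefixing with the parent key.
            flat.update(flatten(value, column, sep))
        else:
            flat[column] = value
    return flat


def missing_columns(table_columns, flat_event):
    """Columns that appear in the event but not yet in the table."""
    return [c for c in flat_event if c not in table_columns]


# Hypothetical table that already has two columns.
table_columns = ["event", "user_id"]

event = {"event": "click", "user": {"id": 7, "geo": {"city": "Paris"}}}
row = flatten(event)

# A real destination would issue a schema change here; the sketch
# only records the new column names.
added = missing_columns(table_columns, row)
table_columns.extend(added)
```

In this sketch the nested `user.geo.city` key becomes a `user_geo_city` column, which is added to the hypothetical schema because the table did not have it yet.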

  • Reference Data Stack for Data-Driven Startups
    8 projects | dev.to | 3 Mar 2022
    We also have telemetry set up on our Monosi product, which is collected through Snowplow. As with Airbyte, we chose Snowplow because of its open source offering and because of its scalable event ingestion framework. There are other open source options to consider, including Jitsu and RudderStack, or closed source options like Segment. Since we started building our product with just a CLI offering, we didn’t need a full CDP solution, so we chose Snowplow.
  • Data pipeline suggestions
    13 projects | /r/dataengineering | 4 Feb 2022
    Ingestion / Extraction: Airbyte, Singer, Jitsu
  • Where can I find free data engineering ( big data) projects online?
    14 projects | /r/dataengineering | 27 Jan 2022
    Ingestion / ETL: Airbyte, Singer, Jitsu
    Transformation: dbt
    Orchestration: Airflow, Dagster
    Testing: GreatExpectations
    Observability: Monosi
    Reverse ETL: Grouparoo, Castled
    Visualization: Lightdash, Superset
  • Ask HN: Good open source alternatives to Google Analytics?
    30 projects | news.ycombinator.com | 11 Jan 2022
  • Launch HN: Jitsu (YC S20) – Open-Source Segment Alternative
    7 projects | news.ycombinator.com | 4 Nov 2021
    Hey HN! Vlad here with Sergey, Ildar, and Kirill. We are building Jitsu, an open-source Segment alternative (https://github.com/jitsucom/jitsu, https://jitsu.com/). We help companies collect events from their apps, websites, and APIs and send them to databases.

    I've been doing data engineering for more than ten years (half of that time, I didn't know that it's called "data engineering"). Before Jitsu, I was a co-founder and CTO of GetIntent, an ad-tech startup. Although it was ad-tech (I'm sorry for that!), we also built a quite fascinating technology platform. We processed up to 1 million events per second at peak, and all those events needed to be stored somewhere.

    We churned through a few data warehouse platforms along the way. In 2013, we started with Hadoop's HDFS and a bunch of map-reduce jobs on top of it. Then, when we decided to allow our customers to run ad-hoc reports, we switched to BigQuery. BigQuery was great, but expensive—especially with some customers obsessively clicking the refresh button. Finally, in 2017 we migrated to self-hosted ClickHouse which in my opinion is still the best analytics database in the world.

    All that time, we spent a fair amount of effort to get data to the database. When you're dealing with millions of events per minute, running an INSERT statement per event won't work. What if the DB is down for maintenance? How can you be sure that all 50+ edge nodes are aware of recent DB schema changes? Also, did you know streaming data to BigQuery is costly while batching data is free?
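
The batching concern raised in the paragraph above can be illustrated with a small buffer that collects events and hands them to the database in groups. This is a minimal sketch under stated assumptions, not the approach Jitsu or GetIntent actually used: the `BatchBuffer` class, its `send_batch` callback, and the choice of `ConnectionError` to signal an unavailable database are all hypothetical.

```python
class BatchBuffer:
    """Collect events and send them in batches instead of one INSERT each."""

    def __init__(self, send_batch, max_size=1000):
        self._send_batch = send_batch  # hypothetical callable that writes a list of events
        self._max_size = max_size
        self._pending = []

    @property
    def pending(self):
        """Number of events waiting to be sent."""
        return len(self._pending)

    def add(self, event):
        self._pending.append(event)
        if len(self._pending) >= self._max_size:
            self.flush()

    def flush(self):
        """Send everything pending. Returns False if the batch had to be kept."""
        if not self._pending:
            return True
        batch, self._pending = self._pending, []
        try:
            self._send_batch(batch)
        except ConnectionError:
            # Database unavailable: keep the events so a later flush retries them.
            self._pending = batch + self._pending
            return False
        return True


# Usage: a list stands in for the database.
sent = []
buffer = BatchBuffer(sent.append, max_size=3)
for n in range(7):
    buffer.add({"n": n})
buffer.flush()  # send the remainder
```

With a batch size of 3, the seven events above reach the stand-in database as three writes rather than seven. A production version would also need a time-based flush and a bound on how much it keeps in memory while the database is down.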

    We tried different approaches: first, we would write local log files, sync them to HDFS, and load data to BQ (or ClickHouse) with map-reduce jobs. To improve data freshness, we ditched HDFS and started to send data in batches to the DB directly from edge servers. We experimented with Kafka, but it felt too complex for that task at the time.

    I always dreamed about a straightforward service, to which I'd throw JSON objects, and it would take care of the rest: queueing, retrying, updating database schema, etc.

    Then I discovered Segment. I liked it at first. It seemed very developer-friendly with a nice API and excellent documentation. But the pricing model and data delays (the event gets to DB in 12 hours after it has been sent to Segment) killed the whole idea. And it was not open-sourced. In my opinion, being open-source and self-hostable is a must for such a fundamental part of the architecture as data collection.

    I left GetIntent and got accepted to YC with a different idea for the Summer 2020 batch. The idea was to build a churn prevention and BI tool for online retailers. It didn't take off, but in the process we made a component to collect customers' app events and put them into a DB. We tried to hack a solution on top of the ELK stack, but I was frustrated with Elasticsearch’s lack of SQL support. Here I was, back to square one: there was still no good open-source event collection service, and we needed to build one once again.

    So we decided to focus solely on that problem. We ditched all the previous code, which was in Java, rewrote the data collection server in Go and hacked together what we called EventNative [1]. It was received very well, and we started to get users.

    Over the last 11 months, we've been busy building the UI, adding Connectors (to pull data from external APIs), polishing data warehouse support, adding JavaScript support to transform incoming data, and implementing dozens of other features.

    Now we're launching Jitsu, an open-source Segment alternative. With Jitsu, we make it easy to collect data and send it to databases (we support all major players: ClickHouse, Redshift, Snowflake, BigQuery and Postgres). We’re deployed in production at a large gaming publisher, an eSignature service, and many other great companies. We're going for an open-core model. So far we don't have paid features, but soon we'll have some, presumably around things like authorization and data masking. We also run Jitsu.Cloud [2], which you can buy if you don’t want to self-host.

    Give it a spin: https://github.com/jitsucom/jitsu.

    Thank you for reading this story - I hope it was interesting. I would love to read your feedback on Jitsu and answer questions!

    [1] https://news.ycombinator.com/item?id=24120325

    7 projects | news.ycombinator.com | 4 Nov 2021
    I’m just saying this is better:

    We are building Jitsu, (https://github.com/jitsucom/jitsu, https://jitsu.com/) We help companies collect events from their apps, websites, and APIs and send them to databases.

    Think of us as an open-source Segment alternative.

    7 projects | news.ycombinator.com | 4 Nov 2021
    Thanks! I take it this file is where I can get started to learn more:

    https://github.com/jitsucom/jitsu/blob/0aaa74b59eb9d8c885c80...

    I see that it instantiates an "AsyncLogger" - does the service wait until data is written to the log prior to returning success to the client?

    7 projects | news.ycombinator.com | 4 Nov 2021
    That's a good question. We're aiming to replace Kafka in some cases. People use Kafka in many ways, but the uses can be roughly divided into two buckets:

    - Kafka as a company-wide message bus: dozens of (micro)services send data there, and consumers listen for it. No service knows which other services will consume its data. For that case, we're not looking to replace Kafka; we're going to work alongside it. We have a PR about supporting Kafka as a destination [1] (Jitsu sends data to Kafka), and we will support Kafka as a source at some point (PRs are always welcome :))

    - Kafka used just as a transport between a web app and a DB. In that case Jitsu is a perfect replacement.

    [1] https://github.com/jitsucom/jitsu/pull/537

    P. S. The same applies to Kinesis too

superset

Posts with mentions or reviews of superset. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-02-26.
  • Apache Superset
    14 projects | news.ycombinator.com | 26 Feb 2024
    Had a very good experience with Superset.

    Superset allowed us to replace Tableau, and we're not looking back.

    It took me a while to figure out how to embed it into my app using the Superset Embedded SDK.

    Superset Embedded SDK - "Embedded SDK allows you to embed dashboards from Superset into your own app, using your app's authentication. Embedding is done by inserting an iframe, containing a Superset page, into the host application."

    https://github.com/apache/superset/tree/master/superset-embe...

    Superset is based on ECharts, a very high-quality and well-maintained chart library.

    https://echarts.apache.org/examples/en/#chart-type-linesG

    Community Roadmap

    https://github.com/apache/superset/projects?query=is%3Aopen

    Huge respect to Preset.io and its team for contributing to the project and keeping it in great shape.

    https://preset.io/blog/

    Superset's source code is very easy to read and understand, and as a result it's possible to implement some advanced caching techniques to reduce the load on charts.

    No BI is perfect.

    Watching Superset for years gives me confidence that the project will keep working as expected down the road, and that eventually some of its packages can be reused for all kinds of visualizations and data hacking.

    14 projects | news.ycombinator.com | 26 Feb 2024
    Superset is absolutely phenomenal. I really hope Microsoft eventually releases all of their customizations they made to it internally to the OS community someday.

    https://www.youtube.com/watch?v=RY0SSvSUkMA

    https://github.com/apache/superset/discussions/20094

    14 projects | news.ycombinator.com | 26 Feb 2024
  • A modern data stack for startups
    2 projects | news.ycombinator.com | 30 Dec 2023
    Do you have any thoughts on Superset? Did you consider it as a candidate?

    For anyone who doesn't know: https://superset.apache.org/

    (There's at least one service that offers managed Superset hosting if that's what you're looking for; it's easy to find so I won't link it here.)

    2 projects | news.ycombinator.com | 30 Dec 2023
    I recently ran a little shootout between Superset, Metabase, and Lightdash. All have nontrivial weaknesses but I ended up picking Lightdash.

    Superset is the best of them at _data visualization_, but I honestly found it almost useless for self-serve _BI_ by business users. This issue on how to do joins in Superset (with stalebot making a mess XD) is everything difficult about Superset for BI in a nutshell. https://github.com/apache/superset/issues/8645

    Metabase is pretty great and it's definitely the right choice for a startup looking to get low cost BI set up. It still has a very table centric view, but feels built for _BI_ rather than visualization alone.

    Lightdash has significant warts (YAML, pivoting being done in the frontend, no symmetric aggregates) but the Looker inspiration is obvious and it makes it easy to present _groups of tables_ to business users ready to rock. I liked Looker before Google acquired it. My business users are comfortable with star and snowflake schemas (not that they know those words) and it was easy to drop Lightdash on top of our existing data warehouse.

  • FLaNK Stack Weekly for 20 Nov 2023
    37 projects | dev.to | 20 Nov 2023
  • Yandex open sourced its BI tool DataLens
    4 projects | news.ycombinator.com | 26 Sep 2023
    Or like not being able to delete a user without running some SQL:

    https://github.com/apache/superset/issues/13345

    Almost instantly ran into this issue when setting up a test instance of Superset. And the issue has been around for years.

  • Apache Superset: Installing locally is easy using the makefile
    3 projects | dev.to | 20 Aug 2023
    Are you interested in trying out Superset, but you're intimidated by the local setup process? Worry not! Superset needs some initial setup to install locally, but I've got a streamlined way to get started - using the makefile! This file contains a set of scripts to simplify the setup process.
  • More public SQL-queryable databases?
    3 projects | /r/datasets | 10 Jul 2023
    Recently I discovered BigQuery public datasets - just over 200 datasets available for directly querying via SQL. I think this is a great thing! I can connect these directly to an analytics platform (we use Apache Superset, which uses Python SQLAlchemy under the hood), for example, and just start dashboarding.
  • Real-time data analytics with Apache Superset, Redpanda, and RisingWave
    3 projects | dev.to | 20 May 2023
    In today's fast-paced data-driven world, organizations must analyze data in real-time to make timely and informed decisions. Real-time data analytics enables businesses to gain valuable insights, respond to real-time events, and stay ahead of the competition. Also, the analytics engine must be capable of running analytical queries and returning results in real-time. In this article, we will explore how you can build a real-time data analytics solution using three open-source tools: Redpanda, a distributed streaming platform; Apache Superset, a data visualization and business intelligence platform; and RisingWave, a streaming database.

What are some alternatives?

When comparing jitsu and superset you can also consider the following projects:

streamlit - Streamlit — A faster way to build and share data apps.

jupyter-dash - OBSOLETE - Dash v2.11+ has Jupyter support built in!

Apache Hive - Apache Hive

lightdash - Open source BI for teams that move fast ⚡️

Metabase - The simplest, fastest way to get business intelligence and analytics to everyone in your company

django-project-template - The Django project template I use, for installation with django-admin.

react-admin - A frontend Framework for building data-driven applications running on top of REST/GraphQL APIs, using TypeScript, React and Material Design

airbyte - The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

nifi - Apache NiFi

Baserow - Open source no-code database and Airtable alternative. Create your own online database without technical experience. Performant with high volumes of data, can be self hosted and supports plugins

Airflow - Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

Snowplow - The enterprise-grade behavioral data engine (web, mobile, server-side, webhooks), running cloud-natively on AWS and GCP