Reference Data Stack for Data-Driven Startups

This page summarizes the projects mentioned and recommended in the original post on dev.to

Our great sponsors
  • SurveyJS - Open-Source JSON Form Builder to Create Dynamic Forms Right in Your App
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • WorkOS - The modern identity platform for B2B SaaS
  • Rudderstack

    Privacy and Security focused Segment-alternative, in Golang and React

    We also have telemetry set up on our Monosi product which is collected through Snowplow,. As with Airbyte, we chose Snowplow because of its open source offering and because of their scalable event ingestion framework. There are other open source options to consider including Jitsu and RudderStack or closed source options like Segment. Since we started building our product with just a CLI offering, we didn’t need a full CDP solution so we chose Snowplow.

  • PostgreSQL

    Mirror of the official PostgreSQL GIT repository. Note that this is just a *mirror* - we don't work with pull requests on github. To contribute, please see https://wiki.postgresql.org/wiki/Submitting_a_Patch

    Technologies include Snowplow, Airbyte, PostgreSQL, dbt, Snowflake, Metabase, and Monosi itself. All of it is hosted on AWS running in a VPC.

  • SurveyJS

    Open-Source JSON Form Builder to Create Dynamic Forms Right in Your App. With SurveyJS form UI libraries, you can build and style forms in a fully-integrated drag & drop form builder, render them in your JS app, and store form submission data in any backend, inc. PHP, ASP.NET Core, and Node.js.

  • grouparoo

    Discontinued 🦘 The Grouparoo Monorepo - open source customer data sync framework

    There are other tools that we will have to adopt in the future but haven’t yet due to lack of necessity. Specifically, one category that is popular in modern data stacks is Reverse ETL (Hightouch, Census, or Grouparoo). We currently don’t have a usecase for piping data back into 3rd party tools but it will definitely come up in the future.

  • superset

    Apache Superset is a Data Visualization and Data Exploration Platform

    To analyze the data stored in Snowflake and Postgres, we use Metabase. We chose Metabase because of it’s open source offering and easy to use interface. Other open source tools like Lightdash and Superset exist which we may add to the stack as our data team grows.

  • Snowplow

    The enterprise-grade behavioral data engine (web, mobile, server-side, webhooks), running cloud-natively on AWS and GCP

    We also have telemetry set up on our Monosi product which is collected through Snowplow,. As with Airbyte, we chose Snowplow because of its open source offering and because of their scalable event ingestion framework. There are other open source options to consider including Jitsu and RudderStack or closed source options like Segment. Since we started building our product with just a CLI offering, we didn’t need a full CDP solution so we chose Snowplow.

  • Metabase

    The simplest, fastest way to get business intelligence and analytics to everyone in your company :yum:

    To analyze the data stored in Snowflake and Postgres, we use Metabase. We chose Metabase because of it’s open source offering and easy to use interface. Other open source tools like Lightdash and Superset exist which we may add to the stack as our data team grows.

  • jitsu

    Jitsu is an open-source Segment alternative. Fully-scriptable data ingestion engine for modern data teams. Set-up a real-time data pipeline in minutes, not days

    We also have telemetry set up on our Monosi product which is collected through Snowplow,. As with Airbyte, we chose Snowplow because of its open source offering and because of their scalable event ingestion framework. There are other open source options to consider including Jitsu and RudderStack or closed source options like Segment. Since we started building our product with just a CLI offering, we didn’t need a full CDP solution so we chose Snowplow.

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

  • airbyte

    The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

    First, we have data coming in from various third party tools such as Google Analytics, MailChimp, Github, Slack, etc. For extraction we use Airbyte.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts