Open Source Analytics Stack: Bringing Control, Flexibility, and Data-Privacy to Your Analytics

This page summarizes the projects mentioned and recommended in the original post on dev.to.

  • GitHub repo Matomo

    Liberating Web Analytics. Matomo is the leading open alternative to Google Analytics that gives you full control over your data. Matomo lets you easily collect data from websites & apps and visualise this data and extract insights. Privacy is built-in. We love Pull Requests!

    Matomo is an open-source web analytics tool that positions itself as a Google Analytics alternative. It gives you insights into your website's visitors, marketing campaigns, and more, making it easier to optimize your strategy and the online experience of your visitors.
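
Matomo's trackers ultimately send HTTP requests to the server's `matomo.php` endpoint. The Python sketch below only builds such a page-view request using a handful of documented Tracking API parameters (`idsite`, `rec`, `url`, `action_name`); the host `analytics.example.com`, the site ID, and the page are placeholders, and a real setup would normally use the official tracker with more parameters.

```python
from urllib.parse import urlencode

# Placeholder self-hosted Matomo instance (assumption); replace with your own host.
MATOMO_TRACKER = "https://analytics.example.com/matomo.php"


def build_pageview_url(site_id: int, page_url: str, title: str) -> str:
    """Build a Matomo HTTP Tracking API request URL for a single page view."""
    params = {
        "idsite": site_id,     # numeric ID of the tracked website in Matomo
        "rec": 1,              # must be 1 for the request to be recorded
        "url": page_url,       # full URL of the page being viewed
        "action_name": title,  # page title shown in Matomo reports
    }
    return f"{MATOMO_TRACKER}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_pageview_url(1, "https://shop.example.com/pricing", "Pricing"))
```

To actually record the hit, you would send this URL with an HTTP GET (for example with `urllib.request.urlopen`) to your own Matomo instance.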

  • GitHub repo dbt

    dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications. [Moved to: https://github.com/dbt-labs/dbt-core]

    With the rise of cloud-based data warehouses, businesses can load all their raw data directly into the warehouse without transforming it first. This process is known as ELT (Extract, Load, Transform) and gives data and analytics teams the freedom to develop ad-hoc transformations for their particular needs. ELT became popular as the cloud's processing power and scale made it better suited to transforming data in place. dbt is a popular open-source tool for the transform step of ELT and helps businesses transform data in their warehouses more effectively. It pairs well with RudderStack's Cloud Extract ETL tool.
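
To make the ELT pattern concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in warehouse (an assumption for illustration; dbt itself is not involved). Raw rows are loaded untouched, and the transform then runs as SQL inside the database. In dbt, that transform would roughly correspond to a model: a `SELECT` statement that dbt materializes as a table or view in your warehouse. The table and column names are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for a cloud data warehouse.
conn = sqlite3.connect(":memory:")

# Extract + Load: raw rows land as-is, with amounts still stored as text in cents.
conn.execute("CREATE TABLE raw_orders (id TEXT, status TEXT, amount_cents TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [("1", "completed", "1999"), ("2", "cancelled", "500"), ("3", "completed", "2501")],
)

# Transform: runs inside the warehouse, similar in spirit to a dbt model's SELECT.
conn.execute(
    """
    CREATE TABLE orders AS
    SELECT CAST(id AS INTEGER) AS order_id,
           CAST(amount_cents AS INTEGER) / 100.0 AS amount_usd
    FROM raw_orders
    WHERE status = 'completed'
    """
)

rows = conn.execute("SELECT order_id, amount_usd FROM orders ORDER BY order_id").fetchall()
print(rows)  # [(1, 19.99), (3, 25.01)]
```

Because the raw table is kept, a different team can later write another transform over the same data without re-extracting it, which is the flexibility the paragraph above describes.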

  • GitHub repo PostgreSQL

    Mirror of the official PostgreSQL GIT repository. Note that this is just a *mirror* - we don't work with pull requests on github. To contribute, please see https://wiki.postgresql.org/wiki/Submitting_a_Patch

    Moreover, open-source warehouse tools can unlock additional insights from your data in real time and at lower cost. PostgreSQL is a popular example of an efficient, low-cost data warehousing solution. Another example is ClickHouse, an open-source, analytics-focused DBMS that can generate analytical reports from data in real time using SQL.

  • GitHub repo Apache Kafka

    Mirror of Apache Kafka

    With the growth of real-time data and event streams, use cases emerged that require access to real-time data, such as risk reporting in financial services or credit card fraud detection. Real-time streams can be handled with an event streaming platform like Apache Kafka. The idea is to direct streams of data from various sources into reliable queues (Kafka calls them topics), where the data can be automatically transformed, stored, analyzed, and reported on concurrently.
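
The sketch below is a toy, in-memory model of that idea and does not use Kafka's client APIs: records are appended to an append-only log (a stand-in for a single-partition topic), and each consumer tracks its own offset, so a fraud check and a reporting job can read the same stream independently. The `Topic` and `Consumer` classes and the spending threshold in `flag_suspicious` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Toy append-only log, loosely modeled on a single-partition Kafka topic."""
    records: list = field(default_factory=list)

    def append(self, record: dict) -> int:
        self.records.append(record)
        return len(self.records) - 1  # offset of the newly appended record


@dataclass
class Consumer:
    """Reads every record after its own offset, then advances that offset."""
    topic: Topic
    offset: int = 0

    def poll(self) -> list:
        batch = self.topic.records[self.offset:]
        self.offset += len(batch)  # "commit" everything just read
        return batch


def flag_suspicious(batch: list, limit: float = 1000.0) -> list:
    """Hypothetical fraud rule: flag cards whose total spend in the batch exceeds limit."""
    totals = defaultdict(float)
    for event in batch:
        totals[event["card"]] += event["amount"]
    return sorted(card for card, total in totals.items() if total > limit)


payments = Topic()
for card, amount in [("A", 200.0), ("B", 900.0), ("B", 300.0), ("A", 50.0)]:
    payments.append({"card": card, "amount": amount})

fraud_check = Consumer(payments)
reporting = Consumer(payments)  # independent consumer with its own offset
print(flag_suspicious(fraud_check.poll()))  # ['B']
print(len(reporting.poll()))  # 4
```

In real Kafka, topics are split into partitions and offsets are committed per consumer group, which is what lets independent applications process the same stream concurrently.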

  • GitHub repo superset

    Apache Superset is a Data Visualization and Data Exploration Platform

    Open-source BI platforms such as Metabase and Apache Superset can be deployed without heavy IT involvement. Metabase lets you build dashboards from the data in your warehouse without writing SQL; users with data engineering or data science skills can instead work in its more powerful notebook-style editor or write SQL directly. Similarly, Apache Superset helps businesses explore and visualize data, from simple line charts to detailed geospatial charts.

  • GitHub repo unomi

    Apache Unomi

    Turning to data ingestion, businesses increasingly rely on Customer Data Platforms (CDPs) that track, collect, and ingest data from multiple sources and systems into a single platform to build a unified customer view. Apache Unomi is a good example of an open-source CDP that ingests data and collects it in one place.
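
The core of a unified customer view is identity resolution: merging events from different systems into one profile per customer. Here is a minimal Python sketch of that idea, assuming every event carries an email address usable as the identity key. It is not Apache Unomi's API or data model, and the event fields are hypothetical.

```python
from collections import defaultdict

# Hypothetical events from three systems, each tagged with the customer's email.
events = [
    {"source": "web", "email": "ana@example.com", "type": "page_view", "page": "/pricing"},
    {"source": "crm", "email": "ana@example.com", "type": "update", "plan": "pro"},
    {"source": "mobile", "email": "ben@example.com", "type": "app_open"},
    {"source": "web", "email": "ANA@example.com", "type": "page_view", "page": "/docs"},
]


def build_profiles(events: list) -> dict:
    """Merge events into one profile per customer, keyed by normalized email."""
    profiles = defaultdict(lambda: {"sources": set(), "pages": [], "traits": {}})
    for e in events:
        # Simplified identity resolution: case-insensitive email match.
        profile = profiles[e["email"].strip().lower()]
        profile["sources"].add(e["source"])
        if e["type"] == "page_view":
            profile["pages"].append(e["page"])
        elif e["type"] == "update":
            profile["traits"]["plan"] = e["plan"]
    return dict(profiles)


profiles = build_profiles(events)
print(sorted(profiles["ana@example.com"]["sources"]))  # ['crm', 'web']
```

Real CDPs resolve identity across several keys (device IDs, login IDs, emails) and persist profiles; the single-key merge here only shows the shape of the problem.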

  • GitHub repo Snowplow

    The enterprise-grade behavioral data engine (web, mobile, server-side, webhooks), running cloud-natively on AWS and GCP

    However, the limitations of traditional CDPs, especially around connecting to best-of-breed customer tools and exposing data for use across an organization, have driven a new generation of tools. Solutions like Snowplow's data delivery platform and RudderStack's customer data platform for developers ingest data from a multitude of sources, apply in-stream transformations, and route the data either to your data warehouse (Snowplow) or to your warehouse plus your preferred customer tools for activation (RudderStack).
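
The ingest, transform, and route flow can be sketched in a few lines. This is a hypothetical illustration, not Snowplow's or RudderStack's API: each transform either returns a (possibly modified) event or `None` to drop it, and surviving events are fanned out to every configured destination. Plain Python lists stand in for the warehouse and a customer tool.

```python
from typing import Callable, Optional

Event = dict
Transform = Callable[[Event], Optional[Event]]


def drop_test_traffic(event: Event) -> Optional[Event]:
    """Hypothetical filter: discard events from internal test users."""
    return None if event.get("user_id", "").startswith("test_") else event


def mask_email(event: Event) -> Event:
    """Hypothetical privacy step: return a copy with the email redacted."""
    if "email" in event:
        event = {**event, "email": "<redacted>"}
    return event


def route(events: list, transforms: list, destinations: dict) -> None:
    """Apply transforms in order, then send each surviving event to every destination."""
    for event in events:
        for transform in transforms:
            event = transform(event)
            if event is None:  # a transform dropped the event
                break
        if event is None:
            continue
        for sink in destinations.values():
            sink.append(event)


warehouse, crm = [], []
incoming = [
    {"user_id": "u1", "event": "signup", "email": "u1@example.com"},
    {"user_id": "test_9", "event": "signup", "email": "qa@example.com"},
]
route(incoming, [drop_test_traffic, mask_email], {"warehouse": warehouse, "crm": crm})
print(warehouse)  # [{'user_id': 'u1', 'event': 'signup', 'email': '<redacted>'}]
```

Passing only a warehouse destination mirrors the warehouse-only model described for Snowplow; adding more destinations mirrors the warehouse-plus-tools model described for RudderStack.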

  • GitHub repo Rudderstack

    Privacy and Security focused Segment-alternative, in Golang and React

  • GitHub repo rudderstack-docs

    Documentation repository for RudderStack - the Customer Data Platform for Developers.

  • GitHub repo ClickHouse

    ClickHouse® is a free analytics DBMS for big data

  • GitHub repo PostHog

    🦔 PostHog provides open-source product analytics that you can self-host.

    Self-hosted PostHog is an open-source option for product analytics that can be integrated into your own infrastructure. It lets you analyze how customers interact with your product, examine user traffic, and find ways to improve user retention.
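
PostHog computes retention for you, but the underlying metric is simple enough to sketch: take the cohort of users first seen on a given day and measure what fraction returned within a later window. The activity data and the exact cohort and window definitions below are assumptions for illustration and may differ from how PostHog defines retention.

```python
from datetime import date

# Hypothetical activity log: user -> set of days on which the user was active.
activity = {
    "u1": {date(2024, 1, 1), date(2024, 1, 8)},
    "u2": {date(2024, 1, 1)},
    "u3": {date(2024, 1, 1), date(2024, 1, 9)},
    "u4": {date(2024, 1, 2), date(2024, 1, 8)},
}


def retention(activity: dict, cohort_day: date, window_start: date, window_end: date) -> float:
    """Share of users first seen on cohort_day who were active again in [window_start, window_end]."""
    cohort = [u for u, days in activity.items() if min(days) == cohort_day]
    if not cohort:
        return 0.0
    returned = [
        u for u in cohort
        if any(window_start <= d <= window_end for d in activity[u])
    ]
    return len(returned) / len(cohort)


# Week-1 retention for the Jan 1 cohort: u1 and u3 returned, u2 did not.
print(retention(activity, date(2024, 1, 1), date(2024, 1, 8), date(2024, 1, 14)))
```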

  • GitHub repo Metabase

    The simplest, fastest way to get business intelligence and analytics to everyone in your company

  • GitHub repo Countly

    Countly helps you get insights from your application. Available self-hosted or on private cloud.

    Countly is another open-source product analytics platform, designed primarily for marketing teams. It helps marketers track website activity such as transactions, campaigns, and the sources that brought visitors to the site. Countly also collects real-time mobile analytics metrics, such as active users, time spent in the app, and customer location, and presents them in a unified dashboard view.
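
Metrics such as active users and time spent in-app are aggregations over session data. The sketch below assumes hypothetical session records of `(user_id, start, end)` and computes, per day, the number of distinct active users and the average minutes in-app per active user. It is not Countly's SDK or its exact metric definitions.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical session records: (user_id, session_start, session_end).
sessions = [
    ("u1", datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 1, 9, 10)),
    ("u2", datetime(2024, 3, 1, 12, 0), datetime(2024, 3, 1, 12, 5)),
    ("u1", datetime(2024, 3, 1, 18, 0), datetime(2024, 3, 1, 18, 15)),
    ("u3", datetime(2024, 3, 2, 8, 0), datetime(2024, 3, 2, 8, 20)),
]


def daily_metrics(sessions: list) -> dict:
    """Return {day: (active_users, avg_minutes_in_app_per_active_user)}."""
    users = defaultdict(set)
    minutes = defaultdict(float)
    for user, start, end in sessions:
        day = start.date()  # simplifying assumption: a session counts for the day it starts
        users[day].add(user)
        minutes[day] += (end - start).total_seconds() / 60
    return {day: (len(users[day]), minutes[day] / len(users[day])) for day in users}


for day, (active, avg_min) in sorted(daily_metrics(sessions).items()):
    print(day, active, round(avg_min, 1))
```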

  • GitHub repo Apache Superset

    Apache Superset is a Data Visualization and Data Exploration Platform [Moved to: https://github.com/apache/superset]