|7 days ago||about 12 hours ago|
|MIT License||Apache License 2.0|
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
MDS Newsletter #12
1 project | reddit.com/r/ModernDataStack | 15 Dec 2021
2/ Featured tools this week - Transform and RudderStack
How To Event Stream From Your Gatsby Website Using Open Source RudderStack
3 projects | dev.to | 8 Dec 2021
RudderStack is an open-source Customer Data Pipeline that allows you to track and send real-time events from your web, mobile, and server-side sources to your entire customer data stack. Our primary repository - rudder-server - is open-sourced on GitHub.
Customer Data Pipelines Play a Key Role in Data Privacy
2 projects | dev.to | 1 Dec 2021
This post will explain how your customer data pipeline can help improve your data privacy and how to ensure your data privacy with RudderStack.
Open Source Analytics Stack: Bringing Control, Flexibility, and Data-Privacy to Your Analytics
15 projects | dev.to | 25 Nov 2021
However, limitations of traditional CDPs, especially around connecting to best-of-breed customer tooling and exposing data for use across an organization, have driven a new generation of non-CDPs. Solutions like Snowplow's (website, GitHub) data delivery platform and RudderStack's (website, GitHub) customer data platform for developers ingest data from a multitude of sources, apply in-stream transformations, and route data to your data warehouse, like Snowplow, or your warehouse plus your preferred customer tooling destinations for activation, like RudderStack.
RudderStack + Blendo: Better Together
1 project | dev.to | 25 Nov 2021
I learned many lessons from this journey - lessons that deserve a post of their own - but there's one lesson that I learned early on that stands out. In this blog, I talk about why we merged Blendo with RudderStack, building the team and working together to build a great product.
The Open Source Story - Open Sourcing RudderStack Blog and Docs
5 projects | dev.to | 18 Nov 2021
In fact, developers have already started contributing to our documentation. Recently, Benedikt from the Userlist team created the docs for the Userlist destination for RudderStack (see the pull request here). They also built the Userlist integration, submitted a pull request, and it is now live on our platform! This is the beauty of open source!
How to plan and implement a customer data tracking strategy for your Micro-SaaS
1 project | reddit.com/r/ShopifyAppDev | 16 Nov 2021
TLDR: general steps/starting point for setting up an app with RudderStack (or Segment) to track customer events
Developing a Custom Plugin using Flutter
5 projects | dev.to | 11 Nov 2021
As a part of our SDK roadmap at RudderStack, we wanted to develop a Flutter SDK. Our existing SDKs include features such as storing event details and persisting user details on the database, and much more. However, these features are already implemented in our Android and iOS SDKs.
Visualize Stripe Payments Data in Postgres using SQL
2 projects | dev.to | 2 Nov 2021
To load Stripe data into Postgres, you can use platforms such as Stitch Data and RudderStack. In this guide, we will use Stitch Data because it is a cheap and fast solution.
Dogfooding at RudderStack: Tracking Plans Part 1
1 project | dev.to | 21 Oct 2021
With your Tracking Plans in place, you can use the existing Data Governance APIs to evaluate your inbound events, payload samples, and metadata and compare them against your plans. You can also use the RudderTyper tool we're releasing alongside Tracking Plans. RudderTyper is a tool for generating strongly-typed RudderStack analytics library wrappers based on your published tracking plan specs, meaning your data will conform to your defined schema upon capture.
Ask HN: Free and open source distributed database written in C++ or C
12 projects | news.ycombinator.com | 16 May 2022
File format for large data with many columns
2 projects | reddit.com/r/Python | 15 May 2022
Try: Clickhouse - has a clickhouse-local executable, no dependencies. Just run and start querying. Linux only. https://clickhouse.com. You’ll need a driver for Python, I’ve had good experience with https://clickhouse-driver.readthedocs.io/en/latest/index.html
CIDR Mask Function Equivalents
1 project | reddit.com/r/Clickhouse | 11 May 2022
Attempting to query a data set and use a WHERE clause to return IP addresses that match a predefined subnet. There is an IPv4NumToStringClassC, but not an equivalent IPv4StringToNumClassC function. After much searching, I have not encountered a semi-reasonable way to perform this. Closest I found was - https://github.com/ClickHouse/ClickHouse/issues/247. Any suggestions?
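One way to think about the subnet match described above, sketched in Python rather than ClickHouse SQL: a CIDR block corresponds to an inclusive range of integer addresses, so membership reduces to a range comparison. The helper names `cidr_bounds` and `in_subnet` are hypothetical and only illustrate the idea; this is not a ClickHouse function and not the answer given in the linked issue.

```python
import ipaddress
from typing import Tuple


def cidr_bounds(cidr: str) -> Tuple[int, int]:
    """Return the inclusive (low, high) integer bounds of an IPv4 CIDR block."""
    # strict=False tolerates host bits being set, e.g. "192.168.1.7/24".
    network = ipaddress.ip_network(cidr, strict=False)
    return int(network.network_address), int(network.broadcast_address)


def in_subnet(ip: str, cidr: str) -> bool:
    """Check whether an IPv4 address falls inside the given CIDR block."""
    low, high = cidr_bounds(cidr)
    return low <= int(ipaddress.ip_address(ip)) <= high


if __name__ == "__main__":
    print(cidr_bounds("192.168.1.0/24"))
    print(in_subnet("192.168.1.42", "192.168.1.0/24"))
```

If the addresses are stored or can be converted to their numeric form, the two bounds could presumably be used in an ordinary `WHERE ip_num BETWEEN low AND high` filter, where `ip_num` is an assumed column name.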
What’s the Database Plus concept and what challenges can it solve?
5 projects | dev.to | 10 May 2022
Today, it is normal for enterprises to leverage diversified databases. In my market of expertise, China, in the Internet industry, MySQL together with data sharding middleware is the go-to architecture, with GreenPlum, HBase, Elasticsearch, Clickhouse and other big data ecosystems serving as auxiliary computing engines for analytical data. At the same time, some legacy systems (such as SQLServer legacy from .NET transformation, or Oracle legacy from outsourcing) can still be found in use. In the financial industry, Oracle or DB2 is still heavily used as the core transaction system. New business is migrating to MySQL or PostgreSQL. In addition to transactional databases, analytical databases are increasingly diversified as well.
Install ClickHouse Faster
1 project | news.ycombinator.com | 7 May 2022
Real-time Open Source Indexes: Databases, Headless CMSs and Static Site Generators
7 projects | dev.to | 4 May 2022
ClickHouse (349 active contributors). Originated in 2009 at Yandex and now developed by ClickHouse, Inc. (valued at $2B in 2021).
ArcticDB: A Database for Observability
3 projects | news.ycombinator.com | 4 May 2022
There is work already going on in ClickHouse community to support dynamic columns - https://github.com/ClickHouse/ClickHouse/pull/23932
Russian Tech Industry Faces ‘Brain Drain’ as Workers Flee
1 project | reddit.com/r/europe | 14 Apr 2022
https://clickhouse.com/ was created by Russian developers, originally at the Russian company Yandex.
Grafana Mimir – 1B active series TSDB
12 projects | news.ycombinator.com | 30 Mar 2022
> I can't find any other open source time series database except Mimir/Cortex which allows this much scale (clustering options in their open source version)
The following open source time series databases also can scale horizontally to many nodes:
- Thanos - https://github.com/thanos-io/thanos/
- M3 - https://github.com/m3db/m3
- Cluster version of VictoriaMetrics - https://docs.victoriametrics.com/Cluster-VictoriaMetrics.htm... (I'm CTO at VictoriaMetrics)
> Can we use Prometheus/Mimir as general purpose time series database?
This depends on what you mean by "general purpose time series database". Prometheus/Mimir are optimized for storing (timestamp, value) series where timestamp is a unix timestamp in milliseconds and value is a floating-point number. Each series has a name and can have an arbitrary set of additional (label=value) labels. Prometheus/Mimir aren't optimized for storing and processing series of other value types such as strings (aka logs) and complex data structures (aka events and traces).
So, if you need to store time series with floating-point values, then Prometheus/Mimir may be a good fit. Otherwise take a look at ClickHouse - it can efficiently store and process time series with values of arbitrary types.
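The data model described in the comment above can be pictured with a small Python sketch. This is only an illustration of the (timestamp, value) shape with a name and labels; the `Series` class and its fields are hypothetical and do not reflect how Prometheus or Mimir store data internally.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Series:
    """A named series with label pairs and (unix ms timestamp, float) samples."""
    name: str
    labels: Dict[str, str] = field(default_factory=dict)
    samples: List[Tuple[int, float]] = field(default_factory=list)

    def append(self, timestamp_ms: int, value: float) -> None:
        # Values are coerced to float; a log line or event payload would not fit.
        self.samples.append((timestamp_ms, float(value)))


if __name__ == "__main__":
    s = Series("http_requests_total", {"job": "api", "status": "200"})
    s.append(1648600000000, 42)
    print(s.samples)
    try:
        s.append(1648600001000, "connection reset by peer")  # a log line
    except ValueError:
        print("string values do not fit this model")
```

The failing second append mirrors the point made in the comment: values that are strings or complex structures need a store that handles arbitrary value types.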
What are some alternatives?
Trino - Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
VictoriaMetrics - VictoriaMetrics: fast, cost-effective monitoring solution and time series database
loki - Like Prometheus, but for logs.
arrow-datafusion - Apache Arrow DataFusion SQL Query Engine
TimescaleDB - An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
duckdb - DuckDB is an in-process SQL OLAP Database Management System
RocksDB - A library that provides an embeddable, persistent key-value store for fast storage.
Adminer - Database management in a single PHP file
PostgreSQL - Mirror of the official PostgreSQL GIT repository. Note that this is just a *mirror* - we don't work with pull requests on github. To contribute, please see https://wiki.postgresql.org/wiki/Submitting_a_Patch
TileDB - The Universal Storage Engine
QuestDB - An open source SQL database designed to process time series data, faster