Our great sponsors
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
-
WorkOS
The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.
EMQX is an open-source, highly scalable, and distributed MQTT messaging broker written in Erlang/OTP that can support millions of concurrent clients. As such, there is a need to persist and replicate various data among the cluster nodes. For example: MQTT topics and their subscribers, routing information, ACL rules, various configurations, and many more. Since its beginning, EMQX has used Mnesia as the database backend for such needs.
For deploying and running our cluster tests, we used AWS CDK, which allowed us to experiment with different instance types and numbers, and also trying out different development branches of EMQX. You can checkout our scripts in this Github repo. In our load generator nodes ("loadgens" for short), we used our emqtt-bench tool to generate the connection / publishing / subscribing traffic with various options. EMQX's Dashboard and Prometheus were used for monitoring the progress of the test and the instances' health.
In EMQX 5.0, we attempted to mitigate this issue in a new DB backend type called RLOG (as in replication log), which is implemented in Mria. Mria is an extension to the Mnesia database that helps it scale horizontally by defining two types of nodes: i) core nodes, which behave as usual Mnesia nodes and participate in write transactions; ii) replicant nodes, which do not take part in transactions and delegate those to core nodes, while keeping a read-only replica of the data locally. This helps to reduce the risk of split-brain scenarios and lessens the coordination needed for transactions, since fewer nodes participate in it, while keeping read-only data access fast, since data is available locally for reading in all nodes.
For deploying and running our cluster tests, we used AWS CDK, which allowed us to experiment with different instance types and numbers, and also trying out different development branches of EMQX. You can checkout our scripts in this Github repo. In our load generator nodes ("loadgens" for short), we used our emqtt-bench tool to generate the connection / publishing / subscribing traffic with various options. EMQX's Dashboard and Prometheus were used for monitoring the progress of the test and the instances' health.
For deploying and running our cluster tests, we used AWS CDK, which allowed us to experiment with different instance types and numbers, and also trying out different development branches of EMQX. You can checkout our scripts in this Github repo. In our load generator nodes ("loadgens" for short), we used our emqtt-bench tool to generate the connection / publishing / subscribing traffic with various options. EMQX's Dashboard and Prometheus were used for monitoring the progress of the test and the instances' health.
For deploying and running our cluster tests, we used AWS CDK, which allowed us to experiment with different instance types and numbers, and also trying out different development branches of EMQX. You can checkout our scripts in this Github repo. In our load generator nodes ("loadgens" for short), we used our emqtt-bench tool to generate the connection / publishing / subscribing traffic with various options. EMQX's Dashboard and Prometheus were used for monitoring the progress of the test and the instances' health.
In our initial tests with Mria, without going into too many details, the replication mechanism basically involved logging all transactions to a "phantom" Mnesia table, which was subscribed to by replicant nodes. This effectively generated a bit of network overhead between the core nodes because each transaction was essentially "duplicated". In our OTP fork, we added a new Mnesia module that allows us to capture all committed transaction logs more easily, removing the need for the "duplicate" writes and reducing network usage significantly, and allowing the cluster to sustain higher connection / transaction rates. While stressing the cluster further after those optimizations, we found new bottlenecks that prompted further performance tunings[4][5][6].