Migrating ClickHouse’ warm & cold data to object storage with JuiceFS

Our great sponsors

InfluxDB - Power Real-Time Data Analytics at Scale

WorkOS - The modern identity platform for B2B SaaS

SaaSHub - Software Alternatives and Reviews

Our great sponsors

juicefs

42 9,774 9.7 Go

JuiceFS is a distributed POSIX file system built on top of Redis and S3.

$ ls -l /var/lib/clickhouse/data// drwxr-xr-x 2 test test 64B Aug 8 13:46 202208_1_3_0 drwxr-xr-x 2 test test 64B Aug 8 13:46 202208_4_6_1 drwxr-xr-x 2 test test 64B Sep 8 13:46 202209_1_1_0 drwxr-xr-x 2 test test 64B Sep 8 13:46 202209_4_4_0 Enter fullscreen mode Exit fullscreen mode In the rightmost column of the above example, the name of each subdirectory is preceded by time, i.e., 202208, but 202208 is also the partition name, which can be defined by user but usually named by time.For example, the partition, 202208, will have two subdirectories (i.e., parts), and each partition usually consists of multiple parts. When writing data to ClickHouse, data will be written to memory first, and then persisted to disk according to the data structure in memory. If the data in a partition is too large, the partition will become many parts on the disk. ClickHouse doesn’t recommend creating too many parts under one table, it will also merge parts to reduce its total number. This is one of the reasons why it’s called the MergeTree engine.There is another example helping us to understand “part” in ClickHouse. There are many small files in the part, some of which are meta-information, such as index information, which facilitates lookup performance. $ ls -l /var/lib/clickhouse/data///202208_1_3_0 -rw-r--r-- 1 test test ?? Aug 8 14:06 ColumnA.bin -rw-r--r-- 1 test test ?? Aug 8 14:06 ColumnA.mrk -rw-r--r-- 1 test test ?? Aug 8 14:06 ColumnB.bin -rw-r--r-- 1 test test ?? Aug 8 14:06 ColumnB.mrk -rw-r--r-- 1 test test ?? Aug 8 14:06 checksums.txt -rw-r--r-- 1 test test ?? Aug 8 14:06 columns.txt -rw-r--r-- 1 test test ?? Aug 8 14:06 count.txt -rw-r--r-- 1 test test ?? Aug 8 14:06 minmax_ColumnC.idx -rw-r--r-- 1 test test ?? Aug 8 14:06 partition.dat -rw-r--r-- 1 test test ?? Aug 8 14:06 primary.idx Enter fullscreen mode Exit fullscreen mode The most right column of the above example, the files prefixed by Column are actual data files, which are relatively large compared to meta information. There are only two columns in this example, A and B, and a table may consist of many columns in actual uses. All these files, including meta and index information, will together help users to quickly jump between files or look up files. ClickHouse storage policy If you want to tier hot and cold data in ClickHouse, you will use a lifecycle policy similar to the one mentioned in ES, which is called Storage Policy in ClickHouse. Slightly different from ES, ClickHouse does not divide data into different stages, i.e., hot, warm, cold. Instead, ClickHouse provides some rules and configuration methods that require users to develop their own data tiering policy. Each ClickHouse node supports the simultaneous configuration of multiple disks, and the storage medium can be varied. For example, users usually configure a ClickHouse node with an SSD for better performance; for warm and cold data, users can store the data in a medium with a lower cost, such as a mechanical disk. Users of ClickHouse will not be aware of the underlying storage medium.Similar to ES, ClickHouse users need to create a storage policy based on data characteristics, such as the size of each subdirectory in part, the proportion of space left on the entire disk, etc. The execution of the storage policy is triggered when a certain data characteristic occurs. This policy will migrate one part from one disk to another. In ClickHouse, multiple disks configured in the same node have priority, and by default data will fall on the highest priority disk. This enables the transfer of the part from one storage medium to another.Data migration can be triggered manually through SQL commands in ClickHouse, such as MOVE PARTITION/PART, and users can also do function validation through these commands. Secondly there may be some cases where explicitly need to move a part from the current storage medium to another one by manual means. ClickHouse also supports time-based migration policy, which is independent of the storage policy. After data is written, ClickHouse triggers the migration of data on disk according to the time set by the TTL property of each table. For example, if the TTL is set to 7 days, ClickHouse will re-write the data in the table, which is older than 7 days, from the current disk (e.g. default SSD) to another lower priority disk (e.g. JuiceFS) What is JuiceFS？ JuiceFS is a high-performance, open-source distributed POSIX file system, which can be built on top of any object storage. For more details: https://github.com/juicedata/juicefs Integration of ClickHouse + JuiceFS Step 1: Mount the JuiceFS file system on all ClickHouse nodes. Any path would work because ClickHouse will have a configuration file to point to the mount point.Step 2: Modify the ClickHouse configuration to add a new JuiceFS disk. Add the JuiceFS file system mount point that you just mounted in ClickHouse so that ClickHouse can recognize this new disk.Step 3: Add a new storage policy and set the rules for sinking data. This storage policy will automatically sink data from the default disk to the specified store, such as JuiceFS, according to the user's rules.Step 4: Set the storage policy and TTL for a specific table. After the storage policy is set, you need to apply the policy to a table. In the pre-testing and validation phases, it is recommended to use a relatively large table, and if users want to achieve data sinking based on the time dimension, they need to set the TTL on the table at the same time. The whole sinking process is automatic, you can check the parts that are currently processing data migration and migration progress through ClickHouse's system table.Step 5: Manually move the part for validation. You can verify whether the current configuration or storage policy is in effect by manually executing the MOVE PARTITION command.As an example below, ClickHouse has a configuration item called storage_configuration, which contains disks configuration, in which JuiceFS is added as a disk and named "jfs" (他the name is arbitrary) and the mount point is the /jfs directory. /jfs default 1073741824 jfs 0.1 Enter fullscreen mode Exit fullscreen mode Further down are the policies configuration items, where a storage policy called hot_and_cold is defined, and the user needs to define some specific rules, such as prioritizing the volumes in order of hot first and then cold, with the data first falling to the first hot disk in the volumes and the default ClickHouse disk (usually the local SSD). The max_data_part_size_bytes configuration in volumes means that when the size of a part exceeds the set size, the storage policy will be triggered and the corresponding part will sink to the next volume, i.e. cold volume. In the above example, JuiceFS is the cold volume. The bottom move_ factor configuration means that ClickHouse will trigger the execution of the storage policy based on the portion of the remaining disk space. CREATE TABLE test ( d DateTime, ... ) ENGINE = MergeTree ... TTL d + INTERVAL 1 DAY TO DISK 'jfs' SETTINGS storage_policy = 'hot_and_cold'; Enter fullscreen mode Exit fullscreen mode As the above code snippet shows, you can set the storage_policy to the previously defined hot_and_cold storage policy in SETTINGS when you create a table or modify the schema of this table. The TTL in the second to last line of the above code is the time-based tiering rule mentioned above. In this example, we specify a column called d in the table, which is of type DateTime; with INTERVAL 1 DAY, that line of code presents that the data will be transferred to JuiceFS when new data is written in for more than one day. From JuiceFS/Juicedata.

InfluxDB

www.influxdata.com sponsored

Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project