Advice on backing up a Ceph cluster

This page summarizes the projects mentioned and recommended in the original post on /r/DataHoarder.

  • backy2

    backy2: Deduplicating block based backup software for ceph/rbd, image files and devices

  • I've been a DataHoarder for a while, but only a modest one at ~10TB or so. I finally have the space to set up a rack and some servers, and I am setting up a Ceph cluster with a ton of old disks I've accumulated over the years, totaling upwards of 20TB. I would still like to have an offsite, and preferably offline, backup of this data, but backing up 20+ TB to a single drive is obviously off the table.

    Is there any alternative to deploying another Ceph cluster offsite? I don't want to use cloud storage because of the cost, and I also very much prefer to keep all my data under my own physical control. I was looking at backy2 for actually extracting the data and writing it to a destination, but it doesn't seem to support idempotent writes (i.e. take one full object and place it on a single drive). I could theoretically combine drives via LVM, but without additional redundancy (I would probably use RAID 1 for that) losing one drive would be disastrous. I am trying to avoid adding redundancy to the backups, considering the main Ceph cluster will already hold 3 copies of the data.

    I am also wondering whether I should avoid using Ceph for the backups, since then all my eggs would be in the Ceph basket, so to speak. I would love some advice from folks with larger hoards on how you make backups. Thank you!
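
The requirement described in the post, placing each full object on exactly one drive so that losing a drive only loses the objects stored on it, can be illustrated with a small planning sketch. This is not how backy2 works; it is a generic first-fit-decreasing assignment, and the `Drive` class, the `assign_whole_objects` function, the drive capacities, and the object names and sizes are all hypothetical values chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field

TB = 10**12  # decimal terabyte, in bytes


@dataclass
class Drive:
    """A hypothetical standalone backup drive (no LVM, no RAID)."""
    name: str
    capacity: int  # bytes
    used: int = 0
    objects: list[str] = field(default_factory=list)

    @property
    def free(self) -> int:
        return self.capacity - self.used


def assign_whole_objects(sizes: dict[str, int], drives: list[Drive]) -> list[str]:
    """Place every object wholly on one drive using first-fit decreasing.

    Objects are considered largest first and put on the first drive with
    enough free space. Returns the names of objects that fit nowhere.
    """
    unplaced = []
    for name, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        for drive in drives:
            if size <= drive.free:
                drive.objects.append(name)
                drive.used += size
                break
        else:
            # No single drive can hold this object in one piece.
            unplaced.append(name)
    return unplaced


# Hypothetical inventory: 22 TB of loose drives, 20 TB of data to back up.
drives = [Drive("disk-a", 8 * TB), Drive("disk-b", 8 * TB), Drive("disk-c", 6 * TB)]
sizes = {
    "photos.img": 7 * TB,
    "media.img": 6 * TB,
    "vm-images.img": 5 * TB,
    "docs.img": 2 * TB,
}

unplaced = assign_whole_objects(sizes, drives)
for d in drives:
    print(d.name, d.objects, f"{d.free / TB:.0f} TB free")
print("unplaced:", unplaced)
```

First-fit decreasing is a heuristic, so it can leave an object unplaced even when a different arrangement would fit; the returned list makes that case visible instead of silently splitting an object across drives.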

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Related posts

  • PVE-based Ceph cluster build (II): Ceph storage pool build and basic performance testing

    1 project | /r/homelab | 13 Jan 2023
  • Backups for virtual servers on Ceph

    1 project | /r/ceph | 28 Jul 2021
  • Are small ceph clusters viable?

    3 projects | /r/ceph | 11 Jun 2023
  • Reddit downloader that works after API upgrade

    1 project | /r/DataHoarder | 10 Oct 2023
  • Starfield Xbox no deluxe edition support?

    1 project | /r/GeForceNOW | 15 Sep 2023