Kubernetes workloads are stateless by default. This lets you represent a lot of valid services (it's kinda like Heroku by default), but many times you'll need persistent storage, whether that's a filesystem, a relational database, or an object store. In my homelab cluster, I have two StorageClasses: NFS mounting folders from my NAS (via the nfs-subdir-external-provisioner), and Longhorn. I ended up going object-storage native for this project so that I didn't have to worry about exhausting the storage on my homelab machines.
I did some digging and found out about Yandex's csi-s3. It's a StorageClass implementation that uses S3 buckets via geesefs as its backing store instead of storage devices (rotational drives, SSD, EBS, etc.). Unlike a lot of other StorageClass implementations I've tried this year, csi-s3 was really really easy to install. All I had to do was apply the release with helmfile and it was up:
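The release looks something like this (a sketch: the repository URL and values keys come from the k8s-csi-s3 README, and the endpoint and credentials here are placeholders for your own Tigris values):

```yaml
# helmfile.yaml
repositories:
  - name: yandex-s3
    url: https://yandex-cloud.github.io/k8s-csi-s3/charts

releases:
  - name: csi-s3
    namespace: kube-system
    chart: yandex-s3/csi-s3
    values:
      - secret:
          # Placeholder credentials; use your own Tigris keypair.
          accessKey: tid_XXXXXXXX
          secretKey: tsec_XXXXXXXX
          # Tigris's S3-compatible endpoint.
          endpoint: https://fly.storage.tigris.dev
```

The chart creates a StorageClass named csi-s3 by default, along with a csi-s3-secret holding those credentials, both of which come into play below.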
Once that was done, I made a bucket (imaginatively named pvfm) and copied the data over with aws s3 sync. I probably could have gotten better performance out of rclone or s5cmd (or if I had copied the data over to my NAS with its 2.5 gigabit NIC and uploaded from there), but I started it, went to sleep, and when I woke up it was done. When I looked back over the logs, I noticed that the main reason it took so long was that a lot of the older recordings had many small files alongside them (.cue sheets listing when each track started and stopped in the DJ set). Tigris handles many small files efficiently, but aws s3 sync didn't properly recycle HTTP connections, so uploading a small file was way more costly than it probably should have been. Otherwise I was hitting the limits of the gigabit ethernet card in my shellbox. Sweet!
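The copy itself was a one-liner along these lines (the local path here is hypothetical; the endpoint flag points the aws CLI at Tigris instead of AWS):

```sh
# Mirror the local archive into the pvfm bucket on Tigris.
# ./pvfm-archive/ is a stand-in for the real source directory.
aws s3 sync ./pvfm-archive/ s3://pvfm/ \
  --endpoint-url https://fly.storage.tigris.dev
```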
csi-s3's dynamic provisioning makes a fresh bucket for every claim, so in order to create a PersistentVolumeClaim pointing at the existing pvfm bucket, I needed to make both a PersistentVolume and the PersistentVolumeClaim at the same time. This was fairly simple thanks to one of their static-provisioning examples, and I made a single config with both of them in it.
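Adapted from the static-provisioning example in the k8s-csi-s3 repo, the combined config looks roughly like this (the namespace and capacity are placeholders; the secret references point at the secret the Helm chart created):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvfm
spec:
  storageClassName: csi-s3
  capacity:
    storage: 500Gi            # informational; object storage has no fixed size
  accessModes:
    - ReadWriteMany
  claimRef:                   # pre-bind this PV to the claim below
    namespace: default
    name: pvfm
  csi:
    driver: ru.yandex.s3.csi
    volumeHandle: pvfm        # the existing bucket to mount
    controllerPublishSecretRef:
      name: csi-s3-secret     # created by the Helm chart
      namespace: kube-system
    nodeStageSecretRef:
      name: csi-s3-secret
      namespace: kube-system
    nodePublishSecretRef:
      name: csi-s3-secret
      namespace: kube-system
    volumeAttributes:
      mounter: geesefs
      capacity: 500Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvfm
  namespace: default
spec:
  storageClassName: csi-s3
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 500Gi
```

The claimRef is what keeps the provisioner from dynamically creating a new bucket: the claim binds to this exact PV, which mounts pvfm via geesefs.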
Applying the Deployment, Service, and Ingress went off without a hitch: cert-manager minted a new certificate and ExternalDNS set the DNS target for me. All that was left was to make sure it worked.
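For reference, the Ingress half looks roughly like this (the hostname, issuer name, and Service port are placeholders; ExternalDNS picks the host up from spec.rules on its own):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: pvfm
  annotations:
    # Assumed issuer name; cert-manager's ingress-shim mints the cert.
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  rules:
    - host: pvfm.example.com      # hypothetical hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: pvfm
                port:
                  number: 80
  tls:
    - hosts:
        - pvfm.example.com
      secretName: pvfm-tls        # cert-manager stores the minted cert here
```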