-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
-
fds
Fast Data Science, AKA fds, is a CLI for Data Scientists to version control data and code at once, by conveniently wrapping git and dvc
-
delta
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs (by delta-io)
At DagsHub, we’re integrated with DVC, which I love using. First and foremost, it’s open-source. It provides pipeline capabilities and supports many cloud providers for remote storage. Also, DVC acts as an extension to Git, which allows you to keep using the standard Git flow in your work. If you don’t want to use both tools, I recommend using FDS, an open-source tool that makes version control for machine learning fast & easy. It combines Git and DVC under one roof and takes care of code, data, and model versioning. (Bias alert: DagsHub developed FDS)
But as you can imagine, tracking each experiment with Git can become a hassle. We’d like to automate the logging process of each run. The same as for large file versioning, many tools emerged in recent years for experiment logging, such as W&B, MLflow, TensorBoard, and the list goes on. In this case, I believe that it doesn’t matter with which hammer you choose to hit the nail, as long as you punch it through.
At DagsHub, we’re integrated with DVC, which I love using. First and foremost, it’s open-source. It provides pipeline capabilities and supports many cloud providers for remote storage. Also, DVC acts as an extension to Git, which allows you to keep using the standard Git flow in your work. If you don’t want to use both tools, I recommend using FDS, an open-source tool that makes version control for machine learning fast & easy. It combines Git and DVC under one roof and takes care of code, data, and model versioning. (Bias alert: DagsHub developed FDS)
Git was designed for managing software development projects and for versioning text/code files. Therefore, Git doesn’t handle large files. Git released Git LFS (Large File System) to overcome large file versioning, which is better than Git, but fails when scaling. Also, both Git and Git LFS are not optimized for data science workflow. To overcome this challenge, many powerful tools emerged in recent years, such as DVC, Delta Lake, LakeFS, and more.
Git was designed for managing software development projects and for versioning text/code files. Therefore, Git doesn’t handle large files. Git released Git LFS (Large File System) to overcome large file versioning, which is better than Git, but fails when scaling. Also, both Git and Git LFS are not optimized for data science workflow. To overcome this challenge, many powerful tools emerged in recent years, such as DVC, Delta Lake, LakeFS, and more.
Git was designed for managing software development projects and for versioning text/code files. Therefore, Git doesn’t handle large files. Git released Git LFS (Large File System) to overcome large file versioning, which is better than Git, but fails when scaling. Also, both Git and Git LFS are not optimized for data science workflow. To overcome this challenge, many powerful tools emerged in recent years, such as DVC, Delta Lake, LakeFS, and more.