Versatile-data-kit Alternatives

Similar projects and alternatives to versatile-data-kit

missing-semester

374 4,679 6.8 CSS versatile-data-kit VS missing-semester

The Missing Semester of Your CS Education 📚
aws-cdk

263 11,121 9.9 TypeScript versatile-data-kit VS aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
Pulumi

178 19,630 9.9 Go versatile-data-kit VS Pulumi

Pulumi - Infrastructure as Code in any programming language. Build infrastructure intuitively on any cloud using familiar languages 🚀
Airflow

169 34,317 10.0 Python versatile-data-kit VS Airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
airbyte

139 13,821 10.0 Python versatile-data-kit VS airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
superset

137 58,576 9.9 TypeScript versatile-data-kit VS superset

Apache Superset is a Data Visualization and Data Exploration Platform
terraform-cdk

104 4,709 9.9 TypeScript versatile-data-kit VS terraform-cdk

Define infrastructure resources using programming constructs and provision them using HashiCorp Terraform
Apache Spark

101 38,249 10.0 Scala versatile-data-kit VS Apache Spark

Apache Spark - A unified analytics engine for large-scale data processing
dbt-core

86 8,842 9.7 Python versatile-data-kit VS dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
Mage

76 6,953 9.9 Python versatile-data-kit VS Mage

🧙 The modern replacement for Airflow. Mage is an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai
Rudderstack

83 3,910 9.8 Go versatile-data-kit VS Rudderstack

Privacy and Security focused Segment-alternative, in Golang and React
Benthos

76 7,516 9.6 Go versatile-data-kit VS Benthos

Fancy stream processing made operationally mundane
Apache Arrow

75 13,442 10.0 C++ versatile-data-kit VS Apache Arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
delta

69 6,847 9.8 Scala versatile-data-kit VS delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs (by delta-io)
dagster

46 10,114 10.0 Python versatile-data-kit VS dagster

An orchestration platform for the development, production, and observation of data assets.
Trino

44 9,519 10.0 Java versatile-data-kit VS Trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
sqlfluff

35 7,189 9.6 Python versatile-data-kit VS sqlfluff

A modular SQL linter and auto-formatter with support for multiple dialects and templated code.
Apache Hadoop

26 14,280 9.9 Java versatile-data-kit VS Apache Hadoop

Apache Hadoop
pinot

15 5,114 9.9 Java versatile-data-kit VS pinot

Apache Pinot - A realtime distributed OLAP datastore
Reddit-API-Pipeline

7 271 0.0 Python versatile-data-kit VS Reddit-API-Pipeline

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a better versatile-data-kit alternative or higher similarity.

versatile-data-kit reviews and mentions

Posts with mentions or reviews of versatile-data-kit. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-11-23.

Can we take a moment to appreciate how much of dataengineering is open source?
8 projects | /r/dataengineering | 23 Nov 2022

Free, Python+SQL ELT pipelines framework with orchestration functionality https://github.com/vmware/versatile-data-kit

8 projects | /r/dataengineering | 23 Nov 2022

If you wish to contribute, projects usually have good first issues: https://github.com/vmware/versatile-data-kit/labels/good%20first%20issue If you wish to learn, check out examples: https://github.com/vmware/versatile-data-kit/tree/main/examples
DE Open Source
2 projects | /r/dataengineering | 13 Nov 2022

Versatile Data Kit is a framework to build, run and manage your data pipelines with Python or SQL on any cloud: https://github.com/vmware/versatile-data-kit Here's a list of good first issues: https://github.com/vmware/versatile-data-kit/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22 Join our Slack channel to connect with our team: https://cloud-native.slack.com/archives/C033PSLKCPR
What is a personality type of a Data Engineer?
2 projects | /r/dataengineering | 26 Oct 2022

Okay, I will explain what I am doing and how I see the "fun" in the project. I work with an open-source framework for data engineers. The community members are developers and people who use the tool - DEs. Indeed, I am facilitating a monthly community meeting for everyone to meet and discuss important topics, but that's the only part that takes their direct time, and it's totally voluntary, so DEs usually don't join, but I'm glad that the developers are joining and participating. What I was having in mind is more of a design and promotion question. I have a vision for open source projects to have a feel of friendliness, and openness (fun) which I communicate through design and visuals that are part of the repo and information we share about the project. And, as I don't find long texts engaging, because I literally can't focus when I see a long description of, say, a GitHub repo, I have an internal struggle against very detailed descriptions. That said, I am having an internal wish to transform the project into something more like this: https://github.com/mage-ai/mage-ai Instead of this: https://github.com/vmware/versatile-data-kit But I'm questioning myself, and thinking that maybe it is better suited for DEs as it is.
Best Open source no-code ELT tool for startup
5 projects | /r/dataengineering | 29 Aug 2022

Open source, good for basic SQL and/or Python skills, extensible, and provides support in setup/adoption of the framework. https://github.com/vmware/versatile-data-kit I'm the community manager for this project. I built my first full ELT pipeline (tracking GitHub stats) with no previous experience in my first month, totally by myself. It covers the full data journey. Oh, and it has Airflow integration, so you can have a dashboard to see your jobs and dependencies, but with better/more intuitive scheduling.
I created a pipeline extracting Reddit data using Airflow, Docker, Terraform, S3, dbt, Redshift, and Google Data Studio
7 projects | /r/dataengineering | 25 Jun 2022

In order to simplify steps 1-5, I can bring another framework to your attention: Versatile Data Kit (entirely open-source), which allows you to create data jobs (be it ingestion, transformation, or publishing) with SQL/Python, runs on any cloud, and is also multi-tenant.
ELT of my own Strava data using the Strava API, MySQL, Python, S3, Redshift, and Airflow
2 projects | /r/dataengineering | 24 Jun 2022

I believe that you would not need to build the Docker image yourself. There are data engineering frameworks which allow you to build your data jobs yourself and take care of the containerisation of your pipeline. You can have a look at this ingest from REST API example. They would also allow you to schedule your data job using cron, while the data job itself can contain SQL & Python.
How-to-Guide: Contributing to Open Source
19 projects | /r/dataengineering | 11 Jun 2022
Has anyone "inherited" a pipeline/code/model that was so poorly written they wanted to quit their job?
2 projects | /r/datascience | 3 May 2022

I wouldn't stay there if they absolutely disagree with changing things, it would drain my energy and I'd just get sad and depressed, on the other hand, if you decide to go for it and try to untangle this mess, I think it would contribute to the confidence, but take some real patience and persistence. I'm a real automation geek, everything that can be automated should be. Maybe if you wish for advice, I would check out this open-source DataOps / automation tool here: https://github.com/vmware/versatile-data-kit maybe it helps, maybe not, whatever you do, good luck!
Python or Tool for Pipelines
2 projects | /r/dataengineering | 9 Dec 2021

I would recommend taking a look at Versatile Data Kit. It is an open-source tool that covers the full end-to-end cycle of data engineering with DataOps practices embedded, from ingesting data from a source system, through transformations (including implementation of some design patterns like Kimball), to publishing data (for reports, apps).

Stats

Basic versatile-data-kit repo stats

Mentions

Stars: 409

Activity: 9.7

Last Commit: about 10 hours ago

vmware/versatile-data-kit is an open source project licensed under Apache License 2.0 which is an OSI approved license.

The primary programming language of versatile-data-kit is Python.