versatile-data-kit
quadratic
 | versatile-data-kit | quadratic
---|---|---
Mentions | 52 | 9
Stars | 406 | 2,653
Growth | 3.4% | 6.3%
Activity | 9.8 | 10.0
Latest commit | 2 days ago | 6 days ago
Language | Python | TypeScript
License | Apache License 2.0 | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
versatile-data-kit
-
Can we take a moment to appreciate how much of dataengineering is open source?
Free, Python+SQL ELT pipelines framework with orchestration functionality https://github.com/vmware/versatile-data-kit
If you wish to contribute, projects usually have good first issues: https://github.com/vmware/versatile-data-kit/labels/good%20first%20issue If you wish to learn, check out examples: https://github.com/vmware/versatile-data-kit/tree/main/examples
-
DE Open Source
Versatile Data Kit is a framework to build, run and manage your data pipelines with Python or SQL on any cloud: https://github.com/vmware/versatile-data-kit Here's a list of good first issues: https://github.com/vmware/versatile-data-kit/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22 Join our Slack channel to connect with our team: https://cloud-native.slack.com/archives/C033PSLKCPR
-
What is a personality type of a Data Engineer?
Okay, I will explain what I am doing and how I see the "fun" in the project. I work with an open-source framework for data engineers. The community members are developers and people who use the tool - DEs. Indeed, I am facilitating a monthly community meeting for everyone to meet and discuss important topics, but that's the only part that takes their direct time, and it's totally voluntary, so DEs usually don't join, but I'm glad that the developers are joining and participating. What I was having in mind is more of a design and promotion question. I have a vision for open source projects to have a feel of friendliness, and openness (fun) which I communicate through design and visuals that are part of the repo and information we share about the project. And, as I don't find long texts engaging, because I literally can't focus when I see a long description of, say, a GitHub repo, I have an internal struggle against very detailed descriptions. That said, I am having an internal wish to transform the project into something more like this: https://github.com/mage-ai/mage-ai Instead of this: https://github.com/vmware/versatile-data-kit But I'm questioning myself, and thinking that maybe it is better suited for DEs as it is.
-
Best Open source no-code ELT tool for startup
Open source, good for basic SQL and/or Python skills, extensible, and provides support in setup/adoption of the framework. https://github.com/vmware/versatile-data-kit I'm the community manager for this project. I built my first full ELT pipeline (tracking GitHub stats) with no previous experience, in my first month, totally by myself. It covers the full data journey. Oh, and it has Airflow integration, so you can have a dashboard to see your jobs and dependencies, but it has better/more intuitive scheduling.
-
I created a pipeline extracting Reddit data using Airflow, Docker, Terraform, S3, dbt, Redshift, and Google Data Studio
To simplify steps 1-5, I can bring another framework to your attention: Versatile Data Kit (entirely open source), which lets you create data jobs (be it ingestion, transformation, or publishing) with SQL/Python, runs on any cloud, and is also multi-tenant.
-
ELT of my own Strava data using the Strava API, MySQL, Python, S3, Redshift, and Airflow
I believe that you would not need to build the Docker image yourself. There are data engineering frameworks that let you build your data jobs yourself and take care of the containerisation of your pipeline for you. You can have a look at this ingest from REST API example. They would also allow you to schedule your data job using cron, while the data job itself can contain SQL & Python.
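As a rough illustration of what such a data job step might look like, here is a minimal sketch, assuming the framework meant above is Versatile Data Kit, where a Python step exposes a `run(job_input)` function and rows are handed over with `job_input.send_object_for_ingestion`. The endpoint URL, the table name, the `ingest_rows` helper and the `RecordingJobInput` stand-in are all hypothetical and exist only for this example; the cron schedule would live in the job's configuration rather than in the step itself.

```python
import json
import urllib.request

API_URL = "https://example.com/api/activities"  # hypothetical endpoint
DESTINATION_TABLE = "activities"                # hypothetical table name


def fetch_json(url):
    """Download and decode a JSON array from a REST endpoint."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def ingest_rows(job_input, rows, table):
    """Send each row (a dict) for ingestion and return how many were sent."""
    count = 0
    for row in rows:
        job_input.send_object_for_ingestion(payload=row, destination_table=table)
        count += 1
    return count


def run(job_input):
    """Step entry point; the framework is assumed to pass job_input in."""
    ingest_rows(job_input, fetch_json(API_URL), DESTINATION_TABLE)


class RecordingJobInput:
    """Stand-in for local experiments: records payloads instead of ingesting."""

    def __init__(self):
        self.sent = []

    def send_object_for_ingestion(self, payload, destination_table):
        self.sent.append((destination_table, payload))
```

To try the ingestion logic without a network call or a running framework, pass a `RecordingJobInput()` and a small list of dicts to `ingest_rows` and inspect its `sent` attribute.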
- How-to-Guide: Contributing to Open Source
-
Has anyone "inherited" a pipeline/code/model that was so poorly written they wanted to quit their job?
I wouldn't stay there if they absolutely disagree with changing things; it would drain my energy and I'd just get sad and depressed. On the other hand, if you decide to go for it and try to untangle this mess, I think it would build your confidence, but it will take some real patience and persistence. I'm a real automation geek: everything that can be automated should be. If you wish for advice, I would check out this open-source DataOps / automation tool here: https://github.com/vmware/versatile-data-kit Maybe it helps, maybe not. Whatever you do, good luck!
-
Python or Tool for Pipelines
I would recommend taking a look at Versatile Data Kit. It is an open-source tool that covers the full end-to-end cycle of data engineering with DataOps practices embedded: ingesting data from a source system, transformations (including implementations of some design patterns like Kimball), and publishing data (for reports, apps).
quadratic
-
Quadratic – Open-Source Spreadsheet Is Now Multiplayer
Unlike other spreadsheets, Quadratic has an infinite canvas (like Figma). As a result, you can pinch and zoom to navigate large data sets, and everything renders smoothly at 60fps.
Today, we launched real-time Multiplayer on Product Hunt!
Quadratic is built using WebGL and Rust WASM. We built our multiplayer service from scratch in Rust to handle large amounts of data smoothly. For smooth rendering of a large grid of data, cells and text are rendered using low-level WebGL for performance. If you are interested in the technical details, check us out on GitHub (https://github.com/quadratichq/quadratic/)
What do you think? Can we make a spreadsheet that developers actually enjoy using?
-
suggestions for a free spreadsheet library (like excel or google spreadsheets)
Hey guys, as the title says, I am looking for some suggestions on useful free spreadsheet libraries, similar to Excel or Google Sheets. I saw Quadratic (https://github.com/quadratichq/quadratic), which looks really interesting but is in very early alpha, to the point that I can't even embed it in my own project yet.
-
Show HN: Quadratic – Open-Source Spreadsheet with Python, & AI (WASM and WebGL)
Hi, I am David Kircos, the founder of Quadratic (https://QuadraticHQ.com), an open-source spreadsheet application that supports Python, SQL (coming soon), AI Prompts, and classic Formulas.
Unlike other spreadsheets, Quadratic has an infinite canvas (like Figma). As a result, you can pinch and zoom to navigate large data sets, and everything renders smoothly at 60fps.
Our vision is to build a place where your team can collaborate on data analysis. You can write Python, AI Prompts, and Formulas in one spreadsheet feeding each other data and updating automatically.
Quadratic is built using WebGL and Rust WASM. To render a large grid of cells smoothly, we tile the spreadsheet, similar to Google Maps. If you are interested in the technical details, check us out on GitHub (https://github.com/quadratichq/quadratic/)
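The tiling idea can be illustrated with a small sketch. This is not Quadratic's actual code (which is Rust and TypeScript); it is a generic Python example, assuming fixed-size square tiles and a viewport given in the same pixel units. The function name `visible_tiles` and the default `tile_size` of 256 are made up for illustration. The point is that only the tiles intersecting the viewport need to be drawn, however large the sheet is.

```python
import math


def visible_tiles(view_x, view_y, view_w, view_h, tile_size=256):
    """Return (col, row) indices of the tiles that intersect the viewport.

    The viewport covers [view_x, view_x + view_w) horizontally and
    [view_y, view_y + view_h) vertically; tile (c, r) covers
    [c * tile_size, (c + 1) * tile_size) on each axis.
    """
    first_col = math.floor(view_x / tile_size)
    first_row = math.floor(view_y / tile_size)
    # The right and bottom edges are exclusive, hence ceil(...) - 1.
    last_col = math.ceil((view_x + view_w) / tile_size) - 1
    last_row = math.ceil((view_y + view_h) / tile_size) - 1
    return [
        (col, row)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]
```

A renderer built this way would typically cache each tile once it is drawn and, on pan or zoom, redraw only the tiles that newly enter the list.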
You can use AI to help you write Python and then run the code directly in Quadratic. Then, we feed the result back to the AI model so it can follow along, help you debug, and modify your existing code.
AI can also be used to directly generate data onto the sheet with prompts. It knows the context of what's on the sheet and how the data it's inserting fits in. Try it out.
SQL is coming soon... stay tuned!
Yes the bundle is huge, we have made no effort yet to optimize it. Feel free to create a PR :)
Here is how we manage cell dependencies https://github.com/quadratichq/quadratic/blob/main/src/grid/...
- TIL: The autocorrect feature in Excel, which converts certain combinations into dates, has mangled up to 30% of published papers, causing significant issues. As a result, at least 27 gene symbols have been forced to change to prevent further errors from occurring.
What are some alternatives?
data-engineering-zoomcamp - Free Data Engineering course!
astro-sdk - Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.
Mage - 🧙 The modern replacement for Airflow. Mage is an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai
pyramid-jsonapi - Auto-build JSON API from sqlalchemy models using the pyramid framework
hamilton - A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton
Reddit-API-Pipeline
mito - The mitosheet package, trymito.io, and other public Mito code.
dbt-data-reliability - dbt package that is part of Elementary, the dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
data-engineering-wiki - The best place to learn data engineering. Built and maintained by the data engineering community.
missing-semester - The Missing Semester of Your CS Education 📚
dbt-core - dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
AWS Data Wrangler - pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).