Top 23 Databricks Open-Source Projects
- Redash: Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
- optscale: FinOps and MLOps platform to run ML/AI and regular cloud workloads with optimal performance and cost.
- multiwoven: 🔥 Open Source Reverse ETL and Customer Data Platform (CDP). An open-source alternative to Hightouch, Census, and RudderStack.
- dbx: 🧱 Databricks CLI eXtensions, aka dbx, is a CLI tool for development and advanced Databricks workflows management.
- analytics-toolbox-core: A set of UDFs and Procedures to extend BigQuery, Snowflake, Redshift, Postgres and Databricks with Spatial Analytics capabilities.
- scalable-data-science: Scalable Data Science, course sets in big data using Apache Spark over Databricks and their mathematical, statistical and computational foundations using SageMath.
Project mention: Redash: Connect to data source, easily visualize, dashboard and share your data | news.ycombinator.com | 2024-03-20
Project mention: "[D]" Using data from Alpaca for a commercial version of a Open LLM | /r/MachineLearning | 2023-07-02
Recommend checking out https://github.com/tobymao/sqlglot if you are interested in this capability for other SQL dialects.
Tools like this are helpful for:
- Rendering SQL in a consistent way, e.g. for snapshot testing
One thing I wanted to add and call attention to is the importance of licensing in open models. This is often overlooked when we blindly accept the vague branding of models as “open”, but I am noticing that many open weight models are actually using encumbered proprietary licenses rather than standard open source licenses that are OSI approved (https://opensource.org/licenses). As an example, Databricks’s DBRX model has a proprietary license that forces adherence to their highly restrictive Acceptable Use Policy by referencing a live website hosting their AUP (https://github.com/databricks/dbrx/blob/main/LICENSE), which means as they change their AUP, you may be further restricted in the future. Meta’s Llama is similar (https://github.com/meta-llama/llama/blob/main/LICENSE). I’m not sure who can depend on these models given this flaw.
Project mention: Delta-rs – a Rust-based implementation of deltalake | news.ycombinator.com | 2024-04-08
Project mention: Profile and instrument ML experiments and optimize their performance expenses | news.ycombinator.com | 2023-09-27
Project mention: Multiwoven Reverse ETL (0.2.0) – Open-Source Alternative to Hightouch and Census | news.ycombinator.com | 2024-04-19
Multiwoven is now a leading Open Source Alternative to Hightouch, Census, and Rudderstack.
It's been a great journey so far, and we are excited to announce a major update to Multiwoven - our new release, Multiwoven 0.2.0, is now available!
Repo: https://github.com/Multiwoven/multiwoven
This release brings a host of new features, enhancements, and bug fixes to streamline data syncs and user experience.
From new connectors to advanced reporting dashboards, as a team, we have been working hard on these updates based on the feedback and requests from our customers and the community.
- 10+ new connectors added to Multiwoven, including
Project mention: Show HN: Synmetrix – Open-Source Platform for Data and Metrics Management | news.ycombinator.com | 2024-02-28
To build custom deployment scripts that go beyond declarative definitions, you are welcome to use https://github.com/databricks/databricks-sdk-py, https://github.com/databricks/databricks-sdk-jvm, and https://github.com/databricks/databricks-sdk-go.
Project mention: I can’t terraform my company’s Databricks environment and I’m going insane. | /r/dataengineering | 2023-06-20
Use the Databricks Terraform examples; the ones for external credentials and external locations in UC should help.
Project mention: Show HN: DataFlint, performance monitoring for Apache Spark | news.ycombinator.com | 2023-12-28
Databricks related posts
- Hello OLMo: A Open LLM
- Delta-rs – a Rust-based implementation of deltalake
- DBRX: A New Open LLM
- CI/CD for Databricks
- Databricks SDK for Python
- databricks/databricks-sdk-go: Databricks SDK for Go
- Official Python SDK for Databricks
Index
What are some of the best open-source Databricks projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | Redash | 24,917 |
2 | dolly | 10,784 |
3 | sqlglot | 5,441 |
4 | SynapseML | 4,967 |
5 | dbrx | 2,363 |
6 | spark | 1,997 |
7 | delta-rs | 1,820 |
8 | optscale | 969 |
9 | multiwoven | 617 |
10 | mlcraft | 467 |
11 | dbx | 433 |
12 | terraform-provider-databricks | 403 |
13 | databricks-sdk-py | 297 |
14 | nutter | 261 |
15 | analytics-toolbox-core | 185 |
16 | dbt-databricks | 180 |
17 | terraform-databricks-examples | 177 |
18 | scalable-data-science | 164 |
19 | stowage | 157 |
20 | spark | 123 |
21 | databricks-sdk-go | 43 |
22 | delta-go | 33 |
23 | databricks-sdk-java | 24 |