Top 11 Python Databricks Projects
-
Redash
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
-
optscale
FinOps and MLOps platform to run ML/AI and regular cloud workloads with optimal performance and cost.
-
dbx
🧱 Databricks CLI eXtensions (aka dbx) is a CLI tool for development and advanced Databricks workflow management.
-
xonai-dashboard
A Grafana-based application to assist Big Data infrastructure optimization initiatives where Spark applications are a dominant cost driver.
Project mention: Redash: Connect to data source, easily visualize, dashboard and share your data | news.ycombinator.com | 2024-03-20
Project mention: [D] Using data from Alpaca for a commercial version of an Open LLM | /r/MachineLearning | 2023-07-02
I recommend checking out https://github.com/tobymao/sqlglot if you are interested in this capability for other SQL dialects.
Tools like this are helpful for:
- Rendering SQL in a consistent way, e.g., for snapshot testing
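The snapshot-testing use case can be illustrated without any third-party dependency. This is a minimal sketch under stated assumptions: `normalize_sql`, `assert_matches_snapshot`, and the small keyword list are hypothetical helpers, not sqlglot's API. The sketch only collapses whitespace, upper-cases a few keywords, and copies quoted strings through unchanged. sqlglot itself parses SQL into a syntax tree and regenerates it, which also covers comments, nesting, and dialect differences that this token-level approach ignores.

```python
import re

# Hypothetical, deliberately small keyword list; a real formatter knows the full grammar.
KEYWORDS = {"select", "from", "where", "and", "or", "not", "as", "join", "on",
            "group", "by", "order", "limit", "in", "is", "null"}

# Token patterns, tried in order: quoted strings/identifiers (kept verbatim),
# multi-character operators, words/numbers, then any other single symbol.
TOKEN_RE = re.compile(
    r"'(?:[^']|'')*'"      # single-quoted string literal; '' is an escaped quote
    r'|"(?:[^"]|"")*"'     # double-quoted identifier
    r"|<=|>=|<>|!=|\|\|"   # multi-character operators
    r"|\w+"                # keywords, identifiers, numbers
    r"|[^\s\w]"            # any other single punctuation character
)

NO_SPACE_BEFORE = {",", ")", "."}
NO_SPACE_AFTER = {"(", "."}


def normalize_sql(sql: str) -> str:
    """Return a canonical one-line rendering of `sql` for snapshot comparisons.

    Whitespace is collapsed, known keywords are upper-cased, and quoted
    strings/identifiers are copied through unchanged.
    """
    parts = []
    prev = None
    for tok in TOKEN_RE.findall(sql):  # whitespace matches no pattern, so it is dropped
        if tok.lower() in KEYWORDS:
            tok = tok.upper()
        if prev is not None and tok not in NO_SPACE_BEFORE and prev not in NO_SPACE_AFTER:
            parts.append(" ")
        parts.append(tok)
        prev = tok
    return "".join(parts)


def assert_matches_snapshot(sql: str, snapshot: str) -> None:
    """Raise AssertionError if `sql` differs from the stored snapshot after normalization."""
    actual = normalize_sql(sql)
    if actual != snapshot:
        raise AssertionError(f"SQL changed:\n  expected: {snapshot}\n  actual:   {actual}")
```

Because formatting noise is removed before comparison, a snapshot only fails when the query itself changes, not when someone reflows whitespace or changes keyword casing.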
One thing I wanted to add and call attention to is the importance of licensing in open models. It is often overlooked when we blindly accept the vague branding of models as “open”, but I am noticing that many open-weight models actually use encumbered proprietary licenses rather than standard, OSI-approved open source licenses (https://opensource.org/licenses). For example, Databricks’s DBRX model has a proprietary license that forces adherence to their highly restrictive Acceptable Use Policy by referencing a live website hosting the AUP (https://github.com/databricks/dbrx/blob/main/LICENSE), which means that as they change the AUP, you may be further restricted in the future. Meta’s Llama license is similar (https://github.com/meta-llama/llama/blob/main/LICENSE). I’m not sure who can depend on these models given this flaw.
Project mention: Profile and instrument ML experiments and optimize their performance expenses | news.ycombinator.com | 2023-09-27
To build custom deployment scripts that go beyond declarative definitions, you are welcome to use https://github.com/databricks/databricks-sdk-py, https://github.com/databricks/databricks-sdk-jvm, and https://github.com/databricks/databricks-sdk-go.
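One kind of logic that goes beyond a declarative definition is an idempotent "create or update" step for a job. The sketch below is hypothetical (`plan_job_deployment` is not part of the SDK) and assumes that the Databricks SDK for Python's `WorkspaceClient` exposes `jobs.list()`, yielding objects with `job_id` and `settings.name`. The client is passed in, so the decision logic can be exercised offline with a fake object.

```python
from typing import Any, Optional, Tuple


def plan_job_deployment(w: Any, job_name: str) -> Tuple[str, Optional[int]]:
    """Decide whether a deployment script should create or update a Databricks job.

    `w` is assumed to be a databricks.sdk.WorkspaceClient (or any object with the
    same shape); only `w.jobs.list()` is used, and each listed job is assumed to
    expose `job_id` and `settings.name`.
    Returns ("create", None) or ("update", job_id).
    """
    matches = [
        job.job_id
        for job in w.jobs.list()
        if job.settings is not None and job.settings.name == job_name
    ]
    if not matches:
        return ("create", None)
    if len(matches) > 1:
        # Job names are not unique in a workspace, so refuse to guess which one to update.
        raise ValueError(f"{len(matches)} jobs are named {job_name!r}: {sorted(matches)}")
    return ("update", matches[0])
```

In a real script you would pass `WorkspaceClient()` from `databricks.sdk` (which picks up credentials from environment variables or a config profile) and then apply the plan with the SDK's job create or reset calls; check the SDK reference for those methods' exact parameters.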
Project mention: Curious if anyone has adopted a stack to do raw data ingestion in Databricks? | /r/dataengineering | 2023-04-25
Our current data infra looks a little something like this:
1. Airbyte deployed on EKS for supported data connectors. I’m using the alpha Databricks connector to load directly into Unity Catalog.
1a. S3 bucket for raw landing-zone storage if we cannot load directly into Databricks managed tables.
2. Orchestration, storage, and transformations are in Databricks. We call out to the Airbyte API in the EKS cluster to keep all orchestration inside Databricks.
2a. databricks-dbt for transformations and cleaning.
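Calling out to Airbyte from a Databricks task could look roughly like the following. This is a hedged sketch: it assumes Airbyte's OSS Config API endpoint `POST /api/v1/connections/sync` with a `connectionId` JSON body (newer Airbyte releases also offer a public API with a different path and payload), and the in-cluster URL and helper names are hypothetical. Only the standard library is used, so it can run in a notebook or job task without extra packages; add authentication headers if your Airbyte deployment requires them.

```python
import json
import urllib.request
from typing import Any, Dict

# Hypothetical in-cluster address of the Airbyte server running on EKS.
AIRBYTE_URL = "http://airbyte-server.airbyte.svc.cluster.local:8001"


def build_sync_request(base_url: str, connection_id: str) -> urllib.request.Request:
    """Build (without sending) a request asking Airbyte to sync one connection.

    Assumes the OSS Config API endpoint POST /api/v1/connections/sync.
    """
    body = json.dumps({"connectionId": connection_id}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/connections/sync",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def trigger_sync(base_url: str, connection_id: str, timeout: float = 30.0) -> Dict[str, Any]:
    """Send the sync request and return the decoded JSON response.

    urlopen raises urllib.error.HTTPError on a non-2xx response, which makes the
    calling Databricks task fail visibly instead of silently skipping ingestion.
    """
    req = build_sync_request(base_url, connection_id)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Splitting request construction from sending keeps the URL and payload testable without a live Airbyte server. If downstream dbt tasks must wait for the sync to finish, you would additionally need to poll Airbyte for the job's status.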
Project mention: Show HN: Open sourcing a Big Data monitoring tool | news.ycombinator.com | 2024-03-29
Python Databricks related posts
- Hello OLMo: An Open LLM
- DBRX: A New Open LLM
- Databricks SDK for Python
- Official Python SDK for Databricks
- How much object orientation do you use in your projects? Bonus points for integration and unit tests
- How/where do you define your Databricks jobs, tasks, and workflows?
- Any suggestions for building a dbt project on Databricks?
Index
What are some of the best open-source Databricks projects in Python? This list will help you:
Rank | Project | Stars
---|---|---
1 | Redash | 24,917
2 | dolly | 10,784
3 | sqlglot | 5,441
4 | dbrx | 2,363
5 | optscale | 969
6 | dbx | 433
7 | databricks-sdk-py | 297
8 | nutter | 261
9 | dbt-databricks | 180
10 | xonai-dashboard | 10
11 | fastdbfs | 4