Python Databricks

Open-source Python projects categorized as Databricks

Top 11 Python Databricks Projects

  • Redash

    Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.

    Project mention: Redash: Connect to data source, easily visualize, dashboard and share your data | news.ycombinator.com | 2024-03-20
  • dolly

    Databricks’ Dolly, a large language model trained on the Databricks Machine Learning Platform

    Project mention: "[D]" Using data from Alpaca for a commercial version of a Open LLM | /r/MachineLearning | 2023-07-02
  • sqlglot

    Python SQL Parser and Transpiler

    Project mention: Transpile Any SQL to PostgreSQL Dialect | news.ycombinator.com | 2024-03-18

    Recommend checking out https://github.com/tobymao/sqlglot if you are interested in this capability for other SQL dialects

    Tools like this are helpful for:

    - Rendering SQL in a consistent way, eg for snapshot testing

  • dbrx

    Code examples and resources for DBRX, a large language model developed by Databricks

    Project mention: Hello OLMo: A Open LLM | news.ycombinator.com | 2024-04-08

    One thing I wanted to add and call attention to is the importance of licensing in open models. This is often overlooked when we blindly accept the vague branding of models as “open”, but I am noticing that many open weight models are actually using encumbered proprietary licenses rather than standard open source licenses that are OSI approved (https://opensource.org/licenses). As an example, Databricks’s DBRX model has a proprietary license that forces adherence to their highly restrictive Acceptable Use Policy by referencing a live website hosting their AUP (https://github.com/databricks/dbrx/blob/main/LICENSE), which means as they change their AUP, you may be further restricted in the future. Meta’s Llama is similar (https://github.com/meta-llama/llama/blob/main/LICENSE). I’m not sure who can depend on these models given this flaw.

  • optscale

    FinOps and MLOps platform to run ML/AI and regular cloud workloads with optimal performance and cost.

    Project mention: Profile and instrument ML experiments and optimize their performance expenses | news.ycombinator.com | 2023-09-27
  • dbx

    🧱 Databricks CLI eXtensions - aka dbx is a CLI tool for development and advanced Databricks workflows management.

  • databricks-sdk-py

    Databricks SDK for Python (Beta)

    Project mention: CI/CD for Databricks | /r/dataengineering | 2023-07-11

    To build custom deployment scripts, that go beyond declarative definitions, you are welcome to use https://github.com/databricks/databricks-sdk-py, https://github.com/databricks/databricks-sdk-jvm, and https://github.com/databricks/databricks-sdk-go.

  • nutter

    Testing framework for Databricks notebooks

  • dbt-databricks

    A dbt adapter for Databricks.

    Project mention: Curious if anyone has adopted a stack to do raw data ingestion in Databricks? | /r/dataengineering | 2023-04-25

    Our current data infra looks a little something like this:

    1. Airbyte deployed on EKS for supported data connectors. I’m using the alpha Databricks connector to load directly into Unity Catalog.
    1a. S3 bucket for raw landing zone storage if we cannot directly load into Databricks Managed Tables.
    2. Orchestration, storage, and transformations are in Databricks. Calling out to the Airbyte api in the EKS cluster to keep all orchestrations inside Databricks.
    2a. databricks-dbt for transformations & cleaning.

  • xonai-dashboard

    A Grafana-based application to assist Big Data infrastructure optimization initiatives where Spark applications are a dominant cost driver

    Project mention: Show HN: Open sourcing a Big Data monitoring tool | news.ycombinator.com | 2024-03-29
  • fastdbfs

    fastdbfs - An interactive command line client for Databricks DBFS.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source Databricks projects in Python? This list will help you:

 #  Project             Stars
 1  Redash             24,917
 2  dolly              10,784
 3  sqlglot             5,441
 4  dbrx                2,363
 5  optscale              969
 6  dbx                   433
 7  databricks-sdk-py     297
 8  nutter                261
 9  dbt-databricks        180
10  xonai-dashboard        10
11  fastdbfs                4
