Databricks

Open-source projects categorized as Databricks

Top 23 Databricks Open-Source Projects

  • Redash

    Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.

  • Project mention: Redash: Connect to data source, easily visualize, dashboard and share your data | news.ycombinator.com | 2024-03-20
  • dolly

    Databricks’ Dolly, a large language model trained on the Databricks Machine Learning Platform

  • Project mention: "[D]" Using data from Alpaca for a commercial version of a Open LLM | /r/MachineLearning | 2023-07-02
  • sqlglot

    Python SQL Parser and Transpiler

  • Project mention: Transpile Any SQL to PostgreSQL Dialect | news.ycombinator.com | 2024-03-18

    Recommend checking out https://github.com/tobymao/sqlglot if you are interested in this capability for other SQL dialects

    Tools like this are helpful for:

    - Rendering SQL in a consistent way, e.g. for snapshot testing

  • SynapseML

    Simple and Distributed Machine Learning

  • Project mention: FLaNK Stack Weekly for 12 September 2023 | dev.to | 2023-09-12
  • dbrx

    Code examples and resources for DBRX, a large language model developed by Databricks

  • Project mention: Hello OLMo: A Open LLM | news.ycombinator.com | 2024-04-08

    One thing I wanted to add and call attention to is the importance of licensing in open models. This is often overlooked when we blindly accept the vague branding of models as “open”, but I am noticing that many open weight models are actually using encumbered proprietary licenses rather than standard open source licenses that are OSI approved (https://opensource.org/licenses). As an example, Databricks’s DBRX model has a proprietary license that forces adherence to their highly restrictive Acceptable Use Policy by referencing a live website hosting their AUP (https://github.com/databricks/dbrx/blob/main/LICENSE), which means as they change their AUP, you may be further restricted in the future. Meta’s Llama is similar (https://github.com/meta-llama/llama/blob/main/LICENSE). I’m not sure who can depend on these models given this flaw.

  • spark

    .NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers. (by dotnet)

  • delta-rs

    A native Rust library for Delta Lake, with bindings into Python

  • Project mention: Delta-rs – a Rust-based implementation of deltalake | news.ycombinator.com | 2024-04-08
  • optscale

    FinOps and MLOps platform to run ML/AI and regular cloud workloads with optimal performance and cost.

  • Project mention: Profile and instrument ML experiments and optimize their performance expenses | news.ycombinator.com | 2023-09-27
  • multiwoven

    🔥 Open Source Reverse ETL and Customer Data Platform (CDP). An open-source alternative to Hightouch, Census, and RudderStack.

  • Project mention: Multiwoven Reverse ETL (0.2.0) – Open-Source Alternative to Hightouch and Census | news.ycombinator.com | 2024-04-19

    Multiwoven is now a leading Open Source Alternative to Hightouch, Census, and RudderStack.

    It's been a great journey so far, and we are excited to announce a major update to Multiwoven - our new release, Multiwoven 0.2.0, is now available!

    Repo: https://github.com/Multiwoven/multiwoven

    This release brings a host of new features, enhancements, and bug fixes to streamline data syncs and user experience.

    From new connectors to advanced reporting dashboards, as a team, we have been working hard on these updates based on the feedback and requests from our customers and the community.

    - 10+ new connectors added to Multiwoven, including

  • mlcraft

    Synmetrix – open source semantic layer / Boost your LLM precision

  • Project mention: Show HN: Synmetrix – Open-Source Platform for Data and Metrics Management | news.ycombinator.com | 2024-02-28
  • dbx

    🧱 Databricks CLI eXtensions - aka dbx is a CLI tool for development and advanced Databricks workflows management.

  • terraform-provider-databricks

    Databricks Terraform Provider

  • databricks-sdk-py

    Databricks SDK for Python (Beta)

  • Project mention: CI/CD for Databricks | /r/dataengineering | 2023-07-11

    To build custom deployment scripts, that go beyond declarative definitions, you are welcome to use https://github.com/databricks/databricks-sdk-py, https://github.com/databricks/databricks-sdk-jvm, and https://github.com/databricks/databricks-sdk-go.

  • nutter

    Testing framework for Databricks notebooks

  • analytics-toolbox-core

    A set of UDFs and Procedures to extend BigQuery, Snowflake, Redshift, Postgres and Databricks with Spatial Analytics capabilities

  • dbt-databricks

    A dbt adapter for Databricks.

  • terraform-databricks-examples

    Examples of using Terraform to deploy Databricks resources

  • Project mention: I can’t terraform my company’s Databricks environment and I’m going insane. | /r/dataengineering | 2023-06-20

    Use the Databricks Terraform examples; the external credentials and external locations in UC should help.

  • scalable-data-science

    Scalable Data Science, course sets in big data using Apache Spark over Databricks and their mathematical, statistical and computational foundations using SageMath.

  • stowage

    Bloat-free, no BS cloud storage SDK.

  • spark

    Performance Observability for Apache Spark (by dataflint)

  • Project mention: Show HN: DataFlint, performance monitoring for Apache Spark | news.ycombinator.com | 2023-12-28
  • databricks-sdk-go

    Databricks SDK for Go

  • Project mention: CI/CD for Databricks | /r/dataengineering | 2023-07-11

    To build custom deployment scripts, that go beyond declarative definitions, you are welcome to use https://github.com/databricks/databricks-sdk-py, https://github.com/databricks/databricks-sdk-jvm, and https://github.com/databricks/databricks-sdk-go.

  • delta-go

    Native Delta Lake Implementation in Go

  • Project mention: Delta-go supports Azure Blob now | /r/golang | 2023-05-26
  • databricks-sdk-java

    Databricks SDK for Java

  • Project mention: CI/CD for Databricks | /r/dataengineering | 2023-07-11

    To build custom deployment scripts, that go beyond declarative definitions, you are welcome to use https://github.com/databricks/databricks-sdk-py, https://github.com/databricks/databricks-sdk-jvm, and https://github.com/databricks/databricks-sdk-go.

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source Databricks projects? This list will help you:

Rank Project Stars
1 Redash 24,917
2 dolly 10,784
3 sqlglot 5,441
4 SynapseML 4,967
5 dbrx 2,363
6 spark 1,997
7 delta-rs 1,820
8 optscale 969
9 multiwoven 617
10 mlcraft 467
11 dbx 433
12 terraform-provider-databricks 403
13 databricks-sdk-py 297
14 nutter 261
15 analytics-toolbox-core 185
16 dbt-databricks 180
17 terraform-databricks-examples 177
18 scalable-data-science 164
19 stowage 157
20 spark 123
21 databricks-sdk-go 43
22 delta-go 33
23 databricks-sdk-java 24
