Databricks Strikes $1.3B Deal for Generative AI Startup MosaicML

Our great sponsors

InfluxDB - Power Real-Time Data Analytics at Scale

WorkOS - The modern identity platform for B2B SaaS

SaaSHub - Software Alternatives and Reviews

Our great sponsors

delta

69 6,897 9.8 Scala

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs (by delta-io)

Databricks provides Jupyter lab like notebooks for analysis and ETL pipelines using spark through pyspark, sparkql or scala. I think R is supported as well but it doesn't interop as well with their newer features as well as python and SQL do. It interfaces with cloud storage backend like S3 and offers some improvements to the parquet format of data querying that allows for updating, ordering and merged through https://delta.io . They integrate pretty seamlessly to other data visualisation tooling if you want to use it for that but their built in graphs are fine for most cases. They also have ML on rails type through menus and models if I recall but I typically don't use it for that. I've typically used it for ETL or ELT type workflows for data that's too big or isn't stored in a database.

open_llama

52 7,193 5.3

OpenLLaMA, a permissively licensed open source reproduction of Meta AI’s LLaMA 7B trained on the RedPajama dataset

OpenLLaMA models up to 13B parameters have now been trained on 1T tokens:
https://github.com/openlm-research/open_llama

InfluxDB

www.influxdata.com sponsored

Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
ggml

69 9,642 9.8 C

Tensor library for machine learning

Mosaic's MPT models are already supported in GGML: https://github.com/ggerganov/ggml
Here's MPT-30B running in 4-bit precision on CPU :) https://twitter.com/abacaj/status/1673133443339763712?s=20

qlora

80 9,388 7.4 Jupyter Notebook

QLoRA: Efficient Finetuning of Quantized LLMs

I used: https://github.com/artidoro/qlora but there are quite a few others that likely work better. It was literally my first attempt at doing anything like this, and took the better part of an evening to work through CUDA/Python issues to get it training, and ~20 hours of training.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project