Databricks Strikes $1.3B Deal for Generative AI Startup MosaicML

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • delta

    An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs (by delta-io)

  • Databricks provides JupyterLab-like notebooks for analysis and ETL pipelines using Spark through PySpark, Spark SQL, or Scala. I think R is supported as well, but it doesn't interoperate with their newer features as well as Python and SQL do. It interfaces with cloud storage backends like S3 and offers some improvements over plain Parquet for data querying that allow for updating, ordering, and merging through https://delta.io . They integrate pretty seamlessly with other data visualisation tooling if you want to use it for that, but their built-in graphs are fine for most cases. They also have "ML on rails"-type features through menus and models, if I recall, but I typically don't use it for that. I've typically used it for ETL- or ELT-type workflows for data that's too big or isn't stored in a database.

  • open_llama

    OpenLLaMA, a permissively licensed open source reproduction of Meta AI’s LLaMA 7B trained on the RedPajama dataset

  • OpenLLaMA models up to 13B parameters have now been trained on 1T tokens:

    https://github.com/openlm-research/open_llama

  • ggml

    Tensor library for machine learning

  • Mosaic's MPT models are already supported in GGML: https://github.com/ggerganov/ggml

    Here's MPT-30B running in 4-bit precision on CPU :) https://twitter.com/abacaj/status/1673133443339763712?s=20

  • qlora

    QLoRA: Efficient Finetuning of Quantized LLMs

  • I used: https://github.com/artidoro/qlora but there are quite a few others that likely work better. It was literally my first attempt at doing anything like this, and took the better part of an evening to work through CUDA/Python issues to get it training, and ~20 hours of training.
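
The comments above mention running MPT-30B in 4-bit precision and finetuning quantized models with QLoRA. As a rough illustration of what "4-bit" storage means, here is a minimal pure-Python sketch of block-wise symmetric quantization. It is an assumption-laden toy: the block size, the function names, and the rounding scheme are hypothetical choices for this example, and it is not the actual GGML file format or the NF4 data type used by QLoRA.

```python
from typing import List, Tuple

# Hypothetical block size, chosen only for illustration.
BLOCK_SIZE = 32

Block = Tuple[float, List[int]]


def quantize_block(values: List[float]) -> Block:
    """Map floats to signed 4-bit codes in [-8, 7] sharing one float scale."""
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        # An all-zero block needs no scale; every code is zero.
        return 0.0, [0] * len(values)
    scale = max_abs / 7.0
    # Clamp after rounding so every code fits in 4 signed bits.
    codes = [max(-8, min(7, round(v / scale))) for v in values]
    return scale, codes


def dequantize_block(block: Block) -> List[float]:
    """Recover approximate floats from a (scale, codes) pair."""
    scale, codes = block
    return [scale * c for c in codes]


def quantize(weights: List[float]) -> List[Block]:
    """Split the weights into fixed-size blocks and quantize each one."""
    return [
        quantize_block(weights[i:i + BLOCK_SIZE])
        for i in range(0, len(weights), BLOCK_SIZE)
    ]


def dequantize(blocks: List[Block]) -> List[float]:
    """Concatenate the dequantized blocks back into one flat list."""
    restored: List[float] = []
    for block in blocks:
        restored.extend(dequantize_block(block))
    return restored
```

In this sketch each weight costs 4 bits plus a share of one scale per block, and the reconstruction error per weight is bounded by half the block's scale. Real implementations pack two codes per byte and use more refined schemes.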

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.


Related posts

  • [D] Is there other better data format for LLM to generate structured data?

    1 project | /r/MachineLearning | 10 Dec 2023
  • Delta vs Iceberg: make love not war

    1 project | /r/MicrosoftFabric | 30 Jun 2023
  • Medallion/lakehouse architecture data modelling

    1 project | /r/dataengineering | 3 Jun 2023
  • whenNotMatchedBySourceUpdate not existing? Trying to upsert parquet into Delta table

    1 project | /r/apachespark | 10 May 2023
  • Delta.io/deltalake self hosting

    2 projects | /r/bigdata | 26 Apr 2023