| | gretel-synthetics | thinc |
|---|---|---|
| Mentions | 4 | 4 |
| Stars | 535 | 2,794 |
| Growth | 3.2% | 0.5% |
| Activity | 7.2 | 7.6 |
| Last commit | 5 days ago | 6 days ago |
| Language | Python | Python |
| License | GNU General Public License v3.0 or later | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
gretel-synthetics
- Ask HN: If we train an LLM with “data” instead of “language” tokens
Hey there! Co-founder of Gretel.ai here, and I think I can provide some insights on this topic.
Firstly, the concept you're hinting at is not purely traditional ML. In traditional machine learning, we often prioritize feature extraction and engineering specific to a given problem space before training.
What you're describing, and what we've been working on at Gretel.ai, is leveraging the power of models like Large Language Models (LLMs) to understand and extrapolate from vast amounts of diverse data without the need for time-consuming feature engineering. Here's a link to our open-source library for synthetic data generation, https://github.com/gretelai/gretel-synthetics (currently supporting GAN and RNN-based language models), and to our recent announcement of a Tabular LLM we're training to help people build with data: https://gretel.ai/tabular-llm
A few areas where we've found tabular or Large Data Models to be really useful are:
- Libraries for synthetic data?
You can try QuantGAN (https://github.com/PakAndrey/QuantGANforRisk) and also DoppelGANger (https://github.com/gretelai/gretel-synthetics/tree/master/src/gretel_synthetics/timeseries_dgan).
- Which open source tool for generating synthetic data sets?
- Gretel-synthetics: open-source library to create synthetic datasets
thinc
- JAX – NumPy on the CPU, GPU, and TPU, with great automatic differentiation
Agree, though I wouldn’t call PyTorch a drop-in for NumPy either. CuPy is the drop-in. Excepting some corner cases, you can use the same code for both. Thinc’s ops work with both NumPy and CuPy:
https://github.com/explosion/thinc/blob/master/thinc/backend...
- Tinygrad: A simple and powerful neural network framework
I love those tiny DNN frameworks. Some examples that I studied in the past (I still use PyTorch for work-related projects):
thinc, by the creators of spaCy: https://github.com/explosion/thinc
- Good examples of functional-like Python code that one can study?
thinc - defining neural nets in a functional way. jax - a new deep learning framework that puts the emphasis on functions rather than tensors. I've tested it for a couple of applications and it's really cool: you can write code the way you'd write math expressions in papers, using NumPy. That speeds up development significantly and makes code much more readable.
- thinc - A refreshing functional take on deep learning, compatible with your favorite libraries
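To make the "functional take" mentioned in the excerpts above more concrete, here is a minimal pure-Python sketch of the general idea: each layer is a plain function, and a combinator composes layers into a model. This is an illustration only and does not use thinc's or JAX's actual API; the helper names `linear`, `relu`, and `chain`, and all weights, are hypothetical choices made for this example.

```python
from functools import reduce
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def linear(weights: Sequence[Sequence[float]], bias: Sequence[float]) -> Layer:
    """Return a layer that computes W @ x + b for fixed (hypothetical) parameters."""
    def forward(x: Vector) -> Vector:
        return [
            sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)
        ]
    return forward


def relu() -> Layer:
    """Return a layer that clips negative values to zero."""
    def forward(x: Vector) -> Vector:
        return [max(0.0, v) for v in x]
    return forward


def chain(*layers: Layer) -> Layer:
    """Compose layers left to right into a single layer."""
    def forward(x: Vector) -> Vector:
        return reduce(lambda acc, layer: layer(acc), layers, x)
    return forward


# The model is itself just a function built from smaller functions.
model = chain(
    linear([[1.0, -1.0], [0.5, 0.5]], [0.0, -2.0]),
    relu(),
    linear([[2.0, 1.0]], [0.5]),
)

output = model([3.0, 1.0])
print(output)  # [4.5]
```

Because a composed model has the same type as a single layer, it can be passed to `chain` again, which is the property that makes this style easy to read and reuse. Real libraries add parameter management, gradients, and array backends on top of this basic shape.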
What are some alternatives?
Copulas - A library to model multivariate data using copulas.
quantulum3 - Library for unit extraction - fork of quantulum for python3
gretel-python-client - The Gretel Python Client allows you to interact with the Gretel REST API.
jax - Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
rex-gym - OpenAI Gym environments for an open-source quadruped robot (SpotMicro)
horovod - Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
adversarial-robustness-toolbox - Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
extending-jax - Extending JAX with custom C++ and CUDA code
CTGAN - Conditional GAN for generating synthetic tabular data.
dm-haiku - JAX-based neural network library
AI-basketball-analysis - AI web app and API to analyze basketball shots and shooting pose.
AIF360 - A comprehensive set of fairness metrics for datasets and machine learning models, explanations for these metrics, and algorithms to mitigate bias in datasets and models.