Such an explosion of protein AI lately. It’s the absolute best time to be a protein scientist with an interest in ML. Every new model type is inevitably tried out on proteins. In this case, by grad students at a very famous protein design lab (Baker Lab at University of Washington). And they usually find some interesting application. Protein design presents tons of interesting challenges.
The very largest plain transformer models trained on protein sequences (analogous to plain text) are about 15B parameters; I am thinking of Meta AI's ESM-2 [1]. These can do for protein sequences what LLMs do for text: they can "fill in the blank" to design variations, generate new proteins that look like their training data (which consists of all natural protein sequences), and tell you how likely a given sequence is to exist.
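To make the "fill in the blank" idea concrete, here is a toy sketch in plain Python. It is not the real ESM-2 API (that lives in the facebookresearch/esm repo and requires downloading model weights); the `toy_score` function below is a made-up stand-in for the model's output logits, used only to show the mechanics of masked-token prediction over the 20 standard amino acids:

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fill_in_the_blank(sequence, masked_pos, score_fn):
    """Score every amino acid at the masked position; return the best one.

    A real protein language model would produce these scores from a
    learned output head; score_fn here is any callable seq -> float.
    """
    logits = [
        score_fn(sequence[:masked_pos] + aa + sequence[masked_pos + 1:])
        for aa in AMINO_ACIDS
    ]
    probs = softmax(logits)
    best = max(range(len(AMINO_ACIDS)), key=lambda i: probs[i])
    return AMINO_ACIDS[best], probs[best]

# Hypothetical stand-in scoring function: just rewards hydrophobic
# residues, so we can see the machinery work end to end.
def toy_score(seq):
    return sum(1.0 for aa in seq if aa in "AILMFVW")

guess, prob = fill_in_the_blank("MKALVIL", masked_pos=2, score_fn=toy_score)
```

Repeating this over every position in a sequence (masking one residue at a time and summing the log-probabilities of the true residues) is also how these models assign a pseudo-likelihood to a whole sequence.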
Some cool variations of transformers have applications in protein design, like the now-famous SE(3)-equivariant transformer used in the structure prediction module of AlphaFold [2], now appearing in TFA.
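A minimal sketch of the symmetry SE(3)-equivariant architectures build in (this is an illustration of the property, not of any particular model): rigidly rotating and translating a structure should leave geometric features like pairwise distances unchanged, while predicted coordinates should transform right along with the input.

```python
import math

def rotate_z(p, theta):
    """Rotate a 3D point about the z-axis by angle theta."""
    x, y, z = p
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y, z)

def rigid_transform(points, theta, t):
    """Apply a rotation about z plus a translation t (an SE(3) motion)."""
    return [
        tuple(a + b for a, b in zip(rotate_z(p, theta), t))
        for p in points
    ]

def pairwise_distances(points):
    """All pairwise Euclidean distances, an SE(3)-invariant feature."""
    return [
        math.dist(p, q)
        for i, p in enumerate(points)
        for q in points[i + 1:]
    ]

# Toy "backbone" coordinates (made up for illustration).
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.5)]
moved = rigid_transform(coords, theta=0.7, t=(2.0, -1.0, 3.0))

d_before = pairwise_distances(coords)
d_after = pairwise_distances(moved)
# The distance features agree to floating-point precision.
```

An SE(3)-equivariant network is constrained so this holds by construction, rather than having to learn it from augmented data.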
1. https://github.com/facebookresearch/esm
2. https://github.com/deepmind/alphafold