progen vs deepblast
| | progen | deepblast |
|---|---|---|
| Mentions | 6 | 2 |
| Stars | 565 | 99 |
| Growth | 2.5% | - |
| Activity | 0.0 | 4.8 |
| Last commit | 9 months ago | about 1 month ago |
| Language | Python | Python |
| License | BSD 3-clause "New" or "Revised" License | BSD 3-clause "New" or "Revised" License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
progen
- Large language models generate functional protein sequences across families
  This was supposed to be a reply to another comment. The GitHub repo is from 2022: https://github.com/salesforce/progen
- What is a recent scientific discovery that you find exciting?
  For all you programmer types, these are the repos for each of them. AlphaFold - ProGen - ProtGPT2
- [R] Large language models generate functional protein sequences across diverse families
  Code and models: https://github.com/salesforce/progen
  Salesforce/progen: projects and models for protein engineering and design
- Myth debunked: Myths about nanorobots
  This tool by Salesforce called ProGen is an LLM that can create new enzymes from prompts: https://github.com/salesforce/progen
deepblast
- [D] To all the machine learning engineers: most difficult model task/type you’ve ever had to work with?
- Strains clustering based on predicted protein genes.
  Haven't tried it, but you could try running TM-vec / DeepBLAST -- where embedding vector distance is designed to approximate structural similarity. We're still polishing the API for large queries, but it is already pretty fast, so give it a shot. We tried to design it to be a drop-in replacement for BLAST.
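
The comment above describes ranking proteins by the distance between their embedding vectors. A minimal sketch of that idea follows, assuming cosine similarity as the measure (the comment only says "distance") and using made-up 3-dimensional vectors in place of real embeddings. The names `cosine_similarity`, `rank_by_similarity`, and the `protein_*` entries are hypothetical and are not part of the TM-vec or DeepBLAST API.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, database):
    """Return (name, score) pairs sorted from most to least similar to the query."""
    scores = [(name, cosine_similarity(query, vec)) for name, vec in database.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real protein embeddings (hypothetical values).
database = {
    "protein_a": [0.9, 0.1, 0.0],
    "protein_b": [0.0, 1.0, 0.2],
    "protein_c": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]

ranking = rank_by_similarity(query, database)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

For strain clustering, the same pairwise scores could feed a standard clustering routine. Real embeddings would come from the tool itself, not from hand-written lists like these.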
What are some alternatives?
ProteinStructurePrediction - Protein structure prediction is the task of predicting the 3-dimensional structure (shape) of a protein given its amino acid sequence and any available supporting information. In this section, we will install and inspect sidechainnet, a dataset with tools for predicting and inspecting protein structures, complete two simplified implementations of attention-based networks for predicting protein angles from amino acid sequences, and visualize our predictions along the way.
pypdb - A Python API for the RCSB Protein Data Bank (PDB)
esm - Evolutionary Scale Modeling (esm): Pretrained language models for proteins
MolecularNodes - Toolbox for molecular animations in Blender, powered by Geometry Nodes.
alphafold - Open source code for AlphaFold.
evodiff - Generation of protein sequences and evolutionary alignments via discrete diffusion models
reCOGnizer - A tool for domain based annotation with databases from the Conserved Domains Database
Biopython - Official git repository for Biopython (originally converted from CVS)
basaran - Basaran is an open-source alternative to the OpenAI text completion API. It provides a compatible streaming API for your Hugging Face Transformers-based text generation models.