tape
Tasks Assessing Protein Embeddings (TAPE), a set of five biologically relevant semi-supervised learning tasks spread across different domains of protein biology. (by songlab-cal)
ProFET
ProFET: Protein Feature Engineering Toolkit for Machine Learning (by ddofer)
| | tape | ProFET |
|---|---|---|
| Mentions | 1 | 1 |
| Stars | 620 | 56 |
| Growth | 0.0% | - |
| Activity | 0.0 | 0.0 |
| Last commit | over 1 year ago | over 8 years ago |
| Language | Python | Python |
| License | BSD 3-clause "New" or "Revised" License | GNU General Public License v3.0 only |
The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
tape
Posts with mentions or reviews of tape.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2021-05-30.
- ProteinBERT: A universal deep-learning model of protein sequence and function
We evaluated based on downstream tasks (multiple supervised benchmarks, including 4 from TAPE), not the LM performance.
ProFET
Posts with mentions or reviews of ProFET.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2021-05-30.
- ProteinBERT: A universal deep-learning model of protein sequence and function
That should be trivial for it; attention models are good for "feature X exists somewhere in the text". That said, if your feature is just the presence of some short motif, why not just use n-gram/k-mer features? Those are invariant to location, and super fast/simple. I did some packages in the past for that, specifically for proteins (ProFET, ASAP (for residue level)).
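To make the k-mer suggestion in the post above concrete, here is a minimal sketch of location-invariant k-mer counting in plain Python. It is not ProFET's or ASAP's API; the function names `kmer_counts` and `has_motif` and the example sequences are hypothetical, chosen only for illustration.

```python
from collections import Counter


def kmer_counts(sequence: str, k: int = 3) -> Counter:
    """Count every overlapping k-mer (substring of length k) in a sequence."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A sequence shorter than k yields an empty range, so the Counter is empty.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def has_motif(sequence: str, motif: str) -> bool:
    """Report whether a short motif occurs anywhere in the sequence."""
    return kmer_counts(sequence, len(motif))[motif] > 0


# Two made-up sequences carrying the same motif at different positions.
seq_start = "RGDAAAMK"
seq_end = "AAAMKRGD"

# The count for the motif is the same wherever it sits in the sequence.
print(kmer_counts(seq_start, 3)["RGD"], kmer_counts(seq_end, 3)["RGD"])
print(has_motif(seq_start, "RGD"), has_motif("AAAMKAAA", "RGD"))
```

Because the counts discard position, a downstream classifier sees the same feature value for both sequences, which is the location invariance the post describes. For longer motifs or gapped patterns the feature space grows quickly, so this sketch only suits short, exact motifs.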
What are some alternatives?
When comparing tape and ProFET you can also consider the following projects:
protein-bert-pytorch - Implementation of ProteinBERT in Pytorch
fashion-mnist - A MNIST-like fashion product database. Benchmark
beir - A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
evodiff - Generation of protein sequences and evolutionary alignments via discrete diffusion models
text - Models, data loaders and abstractions for language processing, powered by PyTorch
pypdb - A Python API for the RCSB Protein Data Bank (PDB)
protein_bert
openfold - Trainable, memory-efficient, and GPU-friendly PyTorch reproduction of AlphaFold 2
asap - Amino-Acid Sequence Annotation Predictor (ASAP)
ronin - RoNIN: Robust Neural Inertial Navigation in the Wild