reCOGnizer | progen | |
---|---|---|
9 | 6 | Mentions |
26 | 565 | Stars |
- | 2.5% | Growth |
7.4 | 0.0 | Activity |
4 months ago | 9 months ago | Last commit |
HTML | Python | Language |
BSD 3-clause "New" or "Revised" License | BSD 3-clause "New" or "Revised" License | License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
reCOGnizer
-
Is there a way to achieve a similar COG annotation for Haloferax mediterranei?
reCOGnizer (https://github.com/iquasere/reCOGnizer) will do that for you. It also outputs the COG categories in a krona plot.
-
Why do some HMMs come with a separate score table instead of embedding in as cut_ga, cut_tc, or cut_nc?
What tool are you using to analyze them? Something like reCOGnizer (https://github.com/iquasere/reCOGnizer) gives you the bitscores and e-values in a TSV table, next to the matches.
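If the bitscores and e-values are available in a TSV table, applying your own cutoffs is straightforward. The following is a minimal sketch using only the Python standard library. The column names (`query`, `hmm`, `bitscore`, `evalue`), the example rows, and the threshold values are all assumptions made for illustration; they are not taken from reCOGnizer's actual output, so check the header of your own file first.

```python
import csv
import io

# Hypothetical TSV content. The column names and values are illustrative
# assumptions, not the real header of any specific tool's output.
EXAMPLE_TSV = (
    "query\thmm\tbitscore\tevalue\n"
    "prot1\tCOG0001\t250.3\t1e-70\n"
    "prot1\tCOG0002\t35.1\t0.002\n"
    "prot2\tCOG0003\t120.0\t3e-30\n"
)


def filter_hits(tsv_text, max_evalue=1e-5, min_bitscore=50.0):
    """Keep rows whose e-value and bitscore both pass the chosen thresholds."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = []
    for row in reader:
        passes_evalue = float(row["evalue"]) <= max_evalue
        passes_bitscore = float(row["bitscore"]) >= min_bitscore
        if passes_evalue and passes_bitscore:
            kept.append(row)
    return kept


hits = filter_hits(EXAMPLE_TSV)
for hit in hits:
    print(hit["query"], hit["hmm"], hit["bitscore"], hit["evalue"])
```

For a real file, you would pass the file's contents instead of `EXAMPLE_TSV` and adjust the column names and thresholds to your data.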
-
Metadbgwas, my tool ! What could I do better ?
Finally, a huge help in development is setting up Continuous Integration/Deployment for the tool. CD refers to the conda suggestion above, as your tool would be readily available in a controlled environment. CI could use GitHub Actions. You can see an example here
-
Any programs/packages that will allow me to compare cluster annotations obtained from metagenomic data?
You may run MOSCA (https://github.com/iquasere/MOSCA), which performs all major steps of metagenomics analysis. It includes the functional classification you are looking for: UPIMAPI (https://github.com/iquasere/UPIMAPI) annotates with the UniProt database as reference and obtains information including taxonomy, EC numbers, and the GO terms you mentioned, while reCOGnizer (https://github.com/iquasere/reCOGnizer) annotates with the CDD database as reference and obtains orthologous group information (COG, Pfam, etc.).
-
Metatranscriptomics Workflow Questions?
Prediction of coding sequences takes as input the contigs you obtained, and gives you the translated genes. Besides annotating with the KEGG database, you may also want to annotate with more general-purpose databases (e.g. UniProt), as these provide more taxonomic and functional information. MOSCA includes UPIMAPI (https://github.com/iquasere/UPIMAPI) and reCOGnizer (https://github.com/iquasere/reCOGnizer), which annotate genes against the UniProt and CDD databases using two different methods, providing complementary information. This is the same methodology used by widely popular tools such as eggNOG-mapper and Prokka, but those use other databases.
-
Problems using Prokka
Install with mamba instead of conda, no more problems. Or use UPIMAPI (https://github.com/iquasere/UPIMAPI) together with reCOGnizer (https://github.com/iquasere/reCOGnizer), since these tools obtain better results when annotating proteins
-
How would you name the program that does cog annotation?
I think reCOGnizer is a nice name
-
Is there any other tool for COG annotation of the bacterial genome than EggNOG mapper?
reCOGnizer (https://github.com/iquasere/reCOGnizer) can annotate with COGs and the other databases available at CDD. It obtains all information concerning COG descriptions and categories, and outputs krona plots and TSV tables in formats that are easy to analyze. There are also mantis (https://github.com/PedroMTQ/mantis), prokka (https://github.com/tseemann/prokka) and DFAST (https://github.com/nigyta/dfast_core). Prokka and DFAST work on contigs, while reCOGnizer and mantis take proteins as input.
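Once COG annotations are in a TSV table, a per-category summary takes only a few lines. The sketch below is an illustration under stated assumptions: the column names (`protein`, `cog`, `category`) and the example rows are hypothetical and do not reflect the actual layout of reCOGnizer's tables. It relies on the fact that a COG can be assigned to more than one functional category, commonly written as adjacent letters (e.g. `EG`).

```python
import csv
import io
from collections import Counter

# Hypothetical annotation table. Column names and rows are assumptions
# made for illustration only.
EXAMPLE_TSV = (
    "protein\tcog\tcategory\n"
    "prot1\tCOG0001\tH\n"
    "prot2\tCOG0002\tE\n"
    "prot3\tCOG0003\tH\n"
    "prot4\tCOG0004\tEG\n"
)


def count_categories(tsv_text):
    """Count how many annotated proteins fall into each COG category letter."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = Counter()
    for row in reader:
        # A multi-letter value such as "EG" contributes to each of its letters.
        for letter in row["category"]:
            counts[letter] += 1
    return counts


category_counts = count_categories(EXAMPLE_TSV)
for letter, n in sorted(category_counts.items()):
    print(letter, n)
```

Counting a multi-category COG once per letter is one possible choice; splitting the count fractionally between categories is another, depending on how the summary will be used.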
progen
-
Large language models generate functional protein sequences across families
This was supposed to be a reply to another comment. The GitHub repo is from 2022:
https://github.com/salesforce/progen
-
What is a recent scientific discovery that you find exciting?
For all you programmer types, these are the repos for each of them: AlphaFold, ProGen, ProtGPT2
-
[R] Large language models generate functional protein sequences across diverse families
Code and models: https://github.com/salesforce/progen
- Salesforce/progen: projects and models for protein engineering and design
-
Myth debunked: Myths about nanorobots
This tool by Salesforce called ProGen is an LLM that can create new enzymes from prompts: https://github.com/salesforce/progen
What are some alternatives?
Ory Keto - Open Source (Go) implementation of "Zanzibar: Google's Consistent, Global Authorization System". Ships gRPC, REST APIs, newSQL, and an easy and granular permission language. Supports ACL, RBAC, and other access models.
deepblast - Neural Networks for Protein Sequence Alignment
prokka - :zap: :aquarius: Rapid prokaryotic genome annotation
ProteinStructurePrediction - Protein structure prediction is the task of predicting the 3-dimensional structure (shape) of a protein given its amino acid sequence and any available supporting information. In this section, we will install and inspect sidechainnet, a dataset with tools for predicting and inspecting protein structures, complete two simplified implementations of attention-based networks for predicting protein angles from amino acid sequences, and visualize our predictions along the way.
orfipy - Fast and flexible ORF finder
esm - Evolutionary Scale Modeling (esm): Pretrained language models for proteins
guardian_db - Guardian DB integration for tracking tokens and ensuring logout cannot be replayed.
alphafold - Open source code for AlphaFold.
pyfaidx - Efficient pythonic random access to fasta subsequences
Biopython - Official git repository for Biopython (originally converted from CVS)
basaran - Basaran is an open-source alternative to the OpenAI text completion API. It provides a compatible streaming API for your Hugging Face Transformers-based text generation models.