cookiecutter-snakemake-workflow
metaGEM
 | cookiecutter-snakemake-workflow | metaGEM
---|---|---
Mentions | 1 | 14
Stars | 55 | 170
Growth | - | -
Activity | 1.8 | 6.3
Last commit | over 2 years ago | 4 months ago
Language | Python | Python
License | MIT License | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
cookiecutter-snakemake-workflow
-
Development notes in Snakemake workflows
I currently use cookiecutter to start new Snakemake repositories (projects) because it's really comfortable and guarantees that I follow an organized and recommended directory structure. However, I have a tendency to make notes during development. Maybe I made some decision based on something I learned on Biostars or Reddit and I want to remember that, or I start by outlining my workflow steps in written form. So far I've used both the README file and the Snakemake file (comments) to do that, but I'm wondering if anyone has a suggestion for how to fit development notes within a Snakemake project.
metaGEM
-
Metagenomic samples analysis: From taxonomic classification to reads mapping
Maybe this workflow will be interesting to you: it assembles metagenomes, generates bins, and also creates metabolic models for predicting interactions within microbiomes 💎 https://github.com/franciscozorrilla/metaGEM
-
Is Developing a Multi-omics Pipeline Feasible for a Master's Thesis?
For example, I started developing metaGEM for my master's thesis, although it took another 2 years of work to get the publication and GitHub repo ready 💎 It’s not multi-omics, but it spans metagenomic analysis and metabolic modeling.
-
Favorite Pipeline/Methods Figure
shameless self plug https://github.com/franciscozorrilla/metaGEM it helps to get feedback from your colleagues, especially the more design-minded ones. here's what mine looked like before feedback https://github.com/franciscozorrilla/metaGEM/wiki
-
Why bother reconstructing MAGs ?
TLDR you get higher genomic resolution compared to 16S. Also consider the fact that there is a lot of strain level variation within species, which you completely miss out on without shotgun or long read sequencing. Self plugging our workflow that takes in shotgun sequencing reads, assembles MAGs and then reconstructs metabolic models that can be used for flux balance analysis simulations https://github.com/franciscozorrilla/metaGEM
-
MetaQuast for assembling samples from complex communities
In my experience I haven’t found MetaQUAST or other assembly evaluation tools very useful, precisely because they are geared toward reference-genome-based assessment. I don’t think there is a standardized way of assessing your assemblies (someone please correct me if I’m wrong), but it helps to look at the distribution of contig lengths. For example, an assembly with a distribution peak around 10 kbp is much better than one with a peak around 1 kbp. In your case you probably want to bin the assembled contigs into MAGs and then assess the quality of those genomes using a tool like CheckM or BUSCO. If you want to get an idea of tools/workflows you can use, then maybe check out the metaGEM pipeline on GitHub or read the paper.
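To make the contig-length idea concrete, here is a minimal sketch of how one might look at that distribution using only the Python standard library. It is not taken from metaGEM or MetaQUAST; the function names (`contig_lengths`, `length_histogram`, `peak_bin`) and the log-spaced binning are assumptions chosen for illustration.

```python
import math
from collections import Counter
from typing import Iterable, Iterator


def contig_lengths(fasta_lines: Iterable[str]) -> Iterator[int]:
    """Yield the sequence length of each record in FASTA-formatted lines."""
    length = None  # stays None until the first ">" header is seen
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length  # finish the previous record
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length  # finish the last record


def length_histogram(lengths: Iterable[int], bins_per_decade: int = 2) -> Counter:
    """Count contigs per log10-spaced bin, keyed by the bin's lower edge in bp."""
    counts = Counter()
    for n in lengths:
        if n <= 0:
            continue  # skip empty records
        bin_index = math.floor(math.log10(n) * bins_per_decade)
        lower_edge = round(10 ** (bin_index / bins_per_decade))
        counts[lower_edge] += 1
    return counts


def peak_bin(hist: Counter) -> int:
    """Return the lower edge (bp) of the bin holding the most contigs."""
    return max(hist.items(), key=lambda item: item[1])[0]
```

With a real assembly you would pass an open file handle, for example `contig_lengths(open("contigs.fasta"))`, where `contigs.fasta` is a hypothetical path. A peak bin near 10,000 rather than 1,000 would correspond to the "10 kbp versus 1 kbp" comparison above. Note that this counts contigs rather than bases, so a length-weighted histogram may tell a different story.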
-
Finding BGCs from antiSMASH database in metagenomes
One thing you could try is generating metagenome assembled genomes (MAGs) from each metagenome, use those MAGs to automatically reconstruct genome scale metabolic models (GEMs), and then do flux balance analysis (FBA) based simulations to compare the predicted metabolism across different treatments. We developed the metaGEM pipeline for exactly this purpose, you can read more about it here or check it out on GitHub
-
Need guidance/plan/roadmap to transition from bioinformatics to systems biology and network biology.
At the risk of self promotion, I would invite you to check out the metaGEM 💎 GitHub repo/paper. This was originally my MS thesis which I published at the start of my PhD. It is a Snakemake workflow for generating context specific genome-scale metabolic models and predicting metabolic interactions within microbial communities directly from short read metagenomic data, i.e. bioinformatics + systems biology. If this is interesting, you may also want to check out my other pinned repos which include related tutorials and resources.
-
Bad tools that NEED improvement
Paper: https://academic.oup.com/nar/article/49/21/e126/6382386 GitHub: https://github.com/franciscozorrilla/metaGEM
-
Looking for Voluntary/Part-time Bioinformatics projects/work!!
Perhaps you may be interested in contributing to the development of the metaGEM pipeline? There are a number of ideas/modifications that I wanted to explore but have not had the time to do yet. You can find these in the issues section under the "method" label (https://github.com/franciscozorrilla/metaGEM/labels/method). You can have a look and see if anything piques your interest; in particular, I think this would be a good one to address: https://github.com/franciscozorrilla/metaGEM/issues/31 . Here is the paper if you want to get more info about the pipeline itself: doi.org/10.1093/nar/gkab815
-
Advice on how to go about genome scale metabolic model construction
GitHub: https://github.com/franciscozorrilla/metaGEM Paper: https://academic.oup.com/nar/advance-article/doi/10.1093/nar/gkab815/6382386
What are some alternatives?
cookiecutter-data-science - A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.
carveme - CarveMe: genome-scale metabolic model reconstruction
pypsa-eur - PyPSA-Eur: A Sector-Coupled Open Optimisation Model of the European Energy System
EukCC - Tool to estimate genome quality of microbial eukaryotes
FastAPI-template - Feature rich robust FastAPI template.
EukRep - Classification of Eukaryotic and Prokaryotic sequences from metagenomic datasets
hecatomb - hecatomb is a virome analysis pipeline for analysis of Illumina sequence data
PhaMers - A bioinformatic tool for identifying bacteriophages using machine learning and k-mers
GraphBin2 - ☯️🧬 Refined and Overlapped Binning of Metagenomic Contigs Using Assembly Graphs
quast - Genome assembly evaluation tool
merqury - k-mer based assembly evaluation
aviary - A hybrid assembly and MAG recovery pipeline (and more!)