Top 22 Python Bioinformatic Projects
Official git repository for Biopython (originally converted from CVS)
Project mention: Seq: A programming language for high-performance computational genomics | news.ycombinator.com | 2021-09-15
It might be pretty useful as a teaching tool, but I'm skeptical of its long-term benefit to professionals. I'm not sure the ecosystem of Seq users will be large enough, y'know? Again, it's pretty impressive work, and it's come a long way. I wish the devs all the best. :)
DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.
Project mention: [D] Is deep learning having an impact in life sciences yet? | reddit.com/r/bioinformatics | 2021-07-04
DeepVariant is another example.
Single-Cell Analysis in Python. Scales to >1M cells.
Project mention: Flipping one histogram below the axis. | reddit.com/r/matlab | 2021-09-02
I am plotting 2 1-D histograms on top of one another using a hold on command. Is there a way to have one histogram be upside down, and then to flip the entire plot 90˚? I am looking to create a violin plot https://github.com/theislab/scanpy/issues/1448 on my own, having one histogram on the left, and one on the right.
Data intensive science for everyone. (by galaxyproject)
Project mention: Developed a new kind of dual extruder system on fully custom built 3D printer | reddit.com/r/3Dprinting | 2021-03-01
Aggregate results from bioinformatics analyses across many samples into a single report.
Project mention: How to use MultiQC? I am trying to run it to compile the summary from my FASTQC but I keep getting the "Sample has no read" error. | reddit.com/r/bioinformatics | 2021-08-18
If that all checks out, then I would have to see more of your files in order to help, sorry. Submitting the issue at https://github.com/ewels/MultiQC/issues would help you more.
Scalable genomic data analysis.
Project mention: Ask HN: Who is hiring? (July 2021) | news.ycombinator.com | 2021-07-01
Broad Institute of MIT and Harvard | Cambridge, MA | Associate Software Engineer | Onsite
We are seeking an associate software engineer interested in contributing to an open-source data visualization library for analyzing the biological impact of human genetic variation. You will contribute to projects like gnomAD (https://gnomad.broadinstitute.org), the world's largest catalogue of human genetic variation used by hundreds of thousands of researchers, and help us scale towards millions of genomes in the coming years. We are also developing next-generation tools for enabling genetic analyses of large biobanks across richly phenotyped individuals (https://genebass.org). In this role you will gain experience developing data-intensive web applications with Typescript, React, Python, Terraform, Google Cloud Platform, and will make use of the scalable data analysis library Hail (https://hail.is). Key to our success is growing a strong team with a diverse membership who foster a culture of continual learning, and who support the growth and success of one another. Towards this end, we are committed to seeking applications from women and from underrepresented groups. We know that many excellent candidates choose not to apply despite their capabilities; please allow us to enthusiastically counter this tendency.
Please provide a CV and links to previous work or projects, ideally with contributions visible on GitHub.
email: [email protected]
Scripts to download genomes from the NCBI FTP servers
Project mention: Downloading genomes from database via command line FTP | reddit.com/r/bioinformatics | 2021-07-16
I know you said Ensembl, but if you can live with NCBI, I would suggest https://github.com/kblin/ncbi-genome-download
Interactive network visualization in Python and Dash, powered by Cytoscape.js
Project mention: Visualize Neo4j Nodes in Python | reddit.com/r/Neo4j | 2021-10-28
Efficient pythonic random access to fasta subsequences
Project mention: Can anyone recommend notable examples of simple python projects with unit tests? | reddit.com/r/bioinformatics | 2021-03-07
My package for indexing FASTA files has some extensive tests. Every time someone raised an issue I’d write a test to reproduce the issue and add it after I fix the code. This way I can test for regressions. https://github.com/mdshw5/pyfaidx
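The workflow described above (reproduce each reported issue as a test, fix the code, keep the test) can be sketched roughly as follows. This is a minimal illustration only: `fetch_subsequence` and the bug reports in the comments are hypothetical and are not part of pyfaidx.

```python
def fetch_subsequence(sequence: str, start: int, end: int) -> str:
    """Return a slice of `sequence` using 1-based, inclusive coordinates.

    Hypothetical helper written for this illustration; not pyfaidx's API.
    """
    if start < 1 or end < start:
        raise ValueError("coordinates must satisfy 1 <= start <= end")
    # Clamp the end to the sequence length instead of failing past the end.
    return sequence[start - 1:min(end, len(sequence))]


# One test per (imagined) bug report. Each stays in the suite after the fix,
# so the same bug cannot quietly come back.

def test_first_base_is_included():
    # Imagined report: the first base was dropped (off-by-one).
    assert fetch_subsequence("ACGTACGT", 1, 4) == "ACGT"


def test_end_past_sequence_is_clamped():
    # Imagined report: requesting past the end raised an error.
    assert fetch_subsequence("ACGT", 3, 10) == "GT"


def test_invalid_coordinates_raise():
    # Imagined report: a start of 0 silently returned wrong data.
    try:
        fetch_subsequence("ACGT", 0, 2)
    except ValueError:
        return
    raise AssertionError("expected ValueError for start < 1")
```

The functions follow the `test_` naming convention, so a runner such as pytest would collect them automatically.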
A comprehensive library for computational molecular biology
Project mention: Journey into bioinformatics | reddit.com/r/bioinformatics | 2021-04-28
For bioinformatics in Python, the BioPython library (https://biopython.org/) is commonly used. An alternative to this package is Biotite (https://www.biotite-python.org/), a package I am maintaining.
Clairvoyante: a multi-task convolutional deep neural network for variant calling in Single Molecule Sequencing
Project mention: Installing packages for python2 with pip when they're already installed for python3 | reddit.com/r/learnpython | 2021-06-29
Hi! This might be a dumb question, but I am stuck at this problem. I am using Ubuntu 20.04 with Anaconda and I have both Python 2.7 and 3.7 on my system. I am trying to use this tool: https://github.com/aquaskyline/Clairvoyante but I am running into problems that there are packages, such as numpy and intervaltree that are installed for python3 but not for python2. When I run a python2 script that requires this package, I am getting errors like
Structural variant toolkit for VCFs
Project mention: VCF files for ML and AI | reddit.com/r/bioinformatics | 2021-11-12
If you use python and are familiar with pandas (which is more data science friendly than anything VCF), Truvari has a utility for conversion. `truvari vcf2df input.vcf.gz output.jl` If the VCFs are very large, I'd also consider scikit-allel.
A versatile DNA sequence optimizer
Project mention: Medical professionals, what is the stupidest misconception a patient has had about the human body? | reddit.com/r/AskReddit | 2021-02-20
There exists an algorithm and its implementation called DNA Chisel which can optimize viruses or bacteria computationally and it was used for example to replicate the BioNTech vaccine.
Rapid comparison and dereplication of genomes (by MrOlm)
Project mention: How to dereplicate MAGs? (metagenome assembly genomes) | reddit.com/r/bioinformatics | 2021-07-25
You should look at https://github.com/MrOlm/drep . That's exactly the tool you need. Usually one takes all their bins and dereplicates them at 99% ANI for strain-level dereplication or 95% at species level. For preserving the best quality, you would need to check yourself if the best genome survived the dereplication afaik. You probably want some sort of quality score and choose the one with the highest score out of the bins that got grouped as one by dereplication.
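The last suggestion, keeping the highest-scoring bin from each dereplicated group, could be sketched like this. The scoring formula (completeness minus five times contamination) and the input layout are assumptions made for illustration; they are not taken from the post and are not necessarily how dRep scores genomes.

```python
from collections import defaultdict


def quality_score(completeness: float, contamination: float) -> float:
    """Assumed illustrative score: reward completeness, penalize contamination."""
    return completeness - 5.0 * contamination


def pick_representatives(bins):
    """Choose the best-scoring bin in each cluster.

    `bins` is an iterable of (name, cluster_id, completeness, contamination)
    tuples; this layout is hypothetical. Returns {cluster_id: best_bin_name}.
    Ties are broken by the lexicographically larger bin name.
    """
    clusters = defaultdict(list)
    for name, cluster_id, completeness, contamination in bins:
        score = quality_score(completeness, contamination)
        clusters[cluster_id].append((score, name))
    return {cid: max(members)[1] for cid, members in clusters.items()}


example_bins = [
    ("bin_1", "c1", 95.0, 2.0),  # score 85.0
    ("bin_2", "c1", 98.0, 4.0),  # score 78.0
    ("bin_3", "c2", 80.0, 1.0),  # score 75.0
]
representatives = pick_representatives(example_bins)
```

With these example values, `bin_1` represents cluster `c1` even though `bin_2` is more complete, because the assumed score penalizes its higher contamination.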
WebLogo 3: Sequence Logos redrawn
Project mention: Amino acid alignment to find shared motifs | reddit.com/r/bioinformatics | 2021-05-09
You can use this on the command line to make nice sequence logos: https://github.com/WebLogo/weblogo
BioGrakn - COVID Knowledge Graph
Project mention: How Roche Discovered Novel Potential Gene Targets with TypeDB | dev.to | 2021-06-10
For this purpose, David built out a set of rules and schema in TypeQL. Below is just a very small excerpt of how such data can be modelled — taken from BioGrakn-Covid, a Vaticle community-led project by Konrad Mysliwiec (Data Science Software Engineer, Roche). Note that this is a selected schema; the full schema can be found within the BioGrakn-Covid schema file.
A package to annotate protein sequences (by PedroMTQ)
Project mention: Is there any other tool for COG annotation of the bacterial genome than EggNOG mapper? | reddit.com/r/bioinformatics | 2021-11-15
Hello, I'm the developer of Mantis (https://github.com/PedroMTQ/mantis). Mantis doesn't use a database for COGs specifically but it does output some of the IDs you mentioned (e.g., KOs, COGs). If this is important for your work I could consider creating a COG centric database (or at least format it to be natively compatible with Mantis). Anyhow, please check the GitHub page and message me or post an issue and I'll try to help out.
Fast and flexible ORF finder
Project mention: Looking for a tool to convert a whole fasta file with CDS sequences to a fasta file with protein sequences. | reddit.com/r/bioinformatics | 2021-08-08
You can probably use or adapt https://github.com/urmi-21/orfipy. I've never used this but it looks fantastic.
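For readers who only need the conversion asked about in the thread (CDS FASTA to protein FASTA), a plain-Python sketch is shown here. It does not use orfipy; it assumes every record is already an in-frame CDS and applies the standard genetic code. The function names are made up for this example.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate an in-frame CDS, stopping at the first stop codon.

    Unknown codons (for example ones containing N) become 'X'; a trailing
    partial codon is ignored.
    """
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        amino_acid = CODON_TABLE.get(cds[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def cds_fasta_to_protein_fasta(lines) -> str:
    """Return protein FASTA text for every CDS record in `lines`."""
    return "\n".join(
        f">{header}\n{translate(seq)}" for header, seq in read_fasta(lines)
    )


example = [">gene1", "ATGGCC", "TAA", ">gene2", "ATGAAATTTGG"]
protein_fasta = cds_fasta_to_protein_fasta(example)
```

Passing an open file handle instead of the `example` list works the same way, since the reader only iterates over lines.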
Convert GWAS summary statistics to VCF
Project mention: Nuitka: An extremely compatible Python compiler | news.ycombinator.com | 2021-09-01
Here is the original repo I have tried to speed up using:
python -m nuitka --clang --follow-imports main.py
If someone can make the program run faster by whatever means, it will make a bunch of people quite happy.
A geometric deep learning pipeline for predicting protein interface contacts.
Project mention: [R] Geometric Transformers for Protein Interface Contact Prediction | reddit.com/r/MachineLearning | 2021-10-06
Calculates the melting temperature (in Celsius) of a user-imported forward and reverse primer
Project mention: tm_calculator_gui: Calculates the melting temperature (in Celsius) of a user-imported forward and reverse primer. It chooses between 2 algorithms: 1. basic Tm calculation based on nucleotide content 2. Tm calculation algorithm based on salt concentration (such as in Primer-BLAST). Available as GUI exe | reddit.com/r/labrats | 2021-08-24
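The description names two kinds of Tm calculation without giving formulas. As a rough sketch, the code below uses two widely quoted textbook approximations as stand-ins: the Wallace rule for the nucleotide-content method and a salt-adjusted %GC formula for the second. Whether tm_calculator_gui uses these exact forms is an assumption, and the function names and default salt concentration are made up for this example.

```python
import math


def tm_wallace(primer: str) -> float:
    """Wallace rule: Tm = 2 * (A + T) + 4 * (G + C), in degrees Celsius.

    Usually quoted for short oligos only; assumed stand-in for the
    tool's "basic" method.
    """
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2.0 * at + 4.0 * gc


def tm_salt_adjusted(primer: str, na_molar: float = 0.05) -> float:
    """Salt-adjusted approximation, in degrees Celsius:

    Tm = 81.5 + 16.6 * log10([Na+]) + 0.41 * (%GC) - 675 / N

    `na_molar` is the monovalent cation concentration in mol/L; the
    0.05 default is an illustrative choice.
    """
    p = primer.upper()
    n = len(p)
    if n == 0:
        raise ValueError("primer must not be empty")
    gc_percent = 100.0 * (p.count("G") + p.count("C")) / n
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent - 675.0 / n


forward_primer = "ACGT" * 5  # 20-mer with 50% GC, used only as an example
basic_tm = tm_wallace(forward_primer)
salt_tm = tm_salt_adjusted(forward_primer, na_molar=1.0)
```

For the 20-mer example, the Wallace rule gives 60.0 and the salt-adjusted formula at 1 M Na+ gives 68.25, which shows how far apart such approximations can land for the same primer.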
Python Bioinformatics related posts
Is there any other tool for COG annotation of the bacterial genome than EggNOG mapper?
5 projects | reddit.com/r/bioinformatics | 15 Nov 2021
Functional annotation of prokaryotic genomes
4 projects | reddit.com/r/bioinformatics | 20 Oct 2021
[R] Geometric Transformers for Protein Interface Contact Prediction
1 project | reddit.com/r/MachineLearning | 6 Oct 2021
Deciding on genome assembly software
6 projects | reddit.com/r/bioinformatics | 1 Oct 2021
Seq: A programming language for high-performance computational genomics
9 projects | news.ycombinator.com | 15 Sep 2021
What would be the best coding language to run simulations with?
1 project | reddit.com/r/AskProgramming | 5 Sep 2021
Flipping one histogram below the axis.
1 project | reddit.com/r/matlab | 2 Sep 2021
What are some of the best open-source Bioinformatic projects in Python? This list will help you: