Python Bioinformatics

Open-source Python projects categorized as Bioinformatics

Top 22 Python Bioinformatics Projects

  • GitHub repo dash

    Analytical Web Apps for Python, R, Julia, and Jupyter. No JavaScript Required.

    Project mention: How can i create a cloud interface using python? | reddit.com/r/learnpython | 2021-11-27
  • GitHub repo Biopython

    Official git repository for Biopython (originally converted from CVS)

    Project mention: Seq: A programming language for high-performance computational genomics | news.ycombinator.com | 2021-09-15

    It might be pretty useful as a teaching tool, but I'm skeptical of its long-term benefit to professionals. I'm not sure the ecosystem of Seq users will be large enough, y'know? Again, it's pretty impressive work, and it's come a long way. I wish the devs all the best. :)

    1. https://biopython.org/

  • GitHub repo deepvariant

    DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.

    Project mention: [D] Is deep learning having an impact in life sciences yet? | reddit.com/r/bioinformatics | 2021-07-04

    DeepVariant is another example.

  • GitHub repo scanpy

    Single-Cell Analysis in Python. Scales to >1M cells.

    Project mention: Flipping one histogram below the axis. | reddit.com/r/matlab | 2021-09-02

    I am plotting 2 1-D histograms on top of one another using a hold on command. Is there a way to have one histogram be upside down, and then to flip the entire plot 90˚? I am looking to create a violin plot https://github.com/theislab/scanpy/issues/1448 on my own, having one histogram on the left, and one on the right.

  • GitHub repo galaxy

    Data intensive science for everyone. (by galaxyproject)

    Project mention: Developed a new kind of dual extruder system on fully custom built 3D printer | reddit.com/r/3Dprinting | 2021-03-01
  • GitHub repo MultiQC

    Aggregate results from bioinformatics analyses across many samples into a single report.

    Project mention: How to use MultiQC? I am trying to run it to compile the summary from my FASTQC but I keep getting the "Sample has no read" error. | reddit.com/r/bioinformatics | 2021-08-18

    If that all checks out then I would have to see more of your files in order to help, sorry. Submitting the issue at https://github.com/ewels/MultiQC/issues would help you more

  • GitHub repo Hail

    Scalable genomic data analysis.

    Project mention: Ask HN: Who is hiring? (July 2021) | news.ycombinator.com | 2021-07-01

    Broad Institute of MIT and Harvard | Cambridge, MA | Associate Software Engineer | Onsite

    We are seeking an associate software engineer interested in contributing to an open-source data visualization library for analyzing the biological impact of human genetic variation. You will contribute to projects like gnomAD (https://gnomad.broadinstitute.org), the world's largest catalogue of human genetic variation used by hundreds of thousands of researchers, and help us scale towards millions of genomes in the coming years. We are also developing next-generation tools for enabling genetic analyses of large biobanks across richly phenotyped individuals (https://genebass.org). In this role you will gain experience developing data-intensive web applications with TypeScript, React, Python, Terraform, and Google Cloud Platform, and will make use of the scalable data analysis library Hail (https://hail.is). Key to our success is growing a strong team with a diverse membership who foster a culture of continual learning, and who support the growth and success of one another. Towards this end, we are committed to seeking applications from women and from underrepresented groups. We know that many excellent candidates choose not to apply despite their capabilities; please allow us to enthusiastically counter this tendency.

    Please provide a CV and links to previous work or projects, ideally with contributions visible on GitHub.

    email: [email protected]

  • GitHub repo ncbi-genome-download

    Scripts to download genomes from the NCBI FTP servers

    Project mention: Downloading genomes from database via command line FTP | reddit.com/r/bioinformatics | 2021-07-16

    I know you said Ensembl, but if you can live with NCBI, I would suggest https://github.com/kblin/ncbi-genome-download

  • GitHub repo dash-cytoscape

    Interactive network visualization in Python and Dash, powered by Cytoscape.js

    Project mention: Visualize Neo4j Nodes in Python | reddit.com/r/Neo4j | 2021-10-28
  • GitHub repo pyfaidx

    Efficient pythonic random access to fasta subsequences

    Project mention: Can anyone recommend notable examples of simple python projects with unit tests? | reddit.com/r/bioinformatics | 2021-03-07

    My package for indexing FASTA files has some extensive tests. Every time someone raised an issue I’d write a test to reproduce the issue and add it after I fix the code. This way I can test for regressions. https://github.com/mdshw5/pyfaidx

  • GitHub repo biotite

    A comprehensive library for computational molecular biology

    Project mention: Journey into bioinformatics | reddit.com/r/bioinformatics | 2021-04-28

    For bioinformatics in Python, the BioPython library (https://biopython.org/) is commonly used. An alternative to this package is Biotite (https://www.biotite-python.org/), a package I am maintaining.

  • GitHub repo Clairvoyante

    Clairvoyante: a multi-task convolutional deep neural network for variant calling in Single Molecule Sequencing

    Project mention: Installing packages for python2 with pip when they're already installed for python3 | reddit.com/r/learnpython | 2021-06-29

    Hi! This might be a dumb question, but I am stuck at this problem. I am using Ubuntu 20.04 with Anaconda and I have both Python 2.7 and 3.7 on my system. I am trying to use this tool: https://github.com/aquaskyline/Clairvoyante but I am running into problems that there are packages, such as numpy and intervaltree that are installed for python3 but not for python2. When I run a python2 script that requires this package, I am getting errors like

  • GitHub repo truvari

    Structural variant toolkit for VCFs

    Project mention: VCF files for ML and AI | reddit.com/r/bioinformatics | 2021-11-12

    If you use Python and are familiar with pandas (which is more data-science friendly than anything VCF), Truvari has a utility for conversion: `truvari vcf2df input.vcf.gz output.jl`. If the VCFs are very large, I'd also consider scikit-allel.
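To illustrate what a VCF-to-table conversion involves, here is a minimal standard-library sketch. It is not Truvari's implementation and does not reproduce its output format; the helper names `parse_info` and `vcf_to_rows` are hypothetical, and only the eight fixed VCF columns are handled (sample columns are ignored).

```python
import io

# Fixed columns defined by the VCF specification.
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_info(info_field):
    """Turn 'SVTYPE=DEL;SVLEN=-50;IMPRECISE' into a dict (flags map to True)."""
    info = {}
    if info_field in ("", "."):
        return info
    for entry in info_field.split(";"):
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


def vcf_to_rows(handle):
    """Yield one dict per VCF record from an open text handle (hypothetical helper)."""
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information lines, the header line, and blanks
        fields = line.rstrip("\n").split("\t")
        row = dict(zip(VCF_COLUMNS, fields[:8]))
        row["POS"] = int(row["POS"])
        row["INFO"] = parse_info(row["INFO"])
        yield row


# Tiny in-memory example; a real file would be opened with open() or gzip.open(path, "rt").
example_vcf = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t1000\tsv1\tA\t<DEL>\t.\tPASS\tSVTYPE=DEL;SVLEN=-50;IMPRECISE\n"
)
rows = list(vcf_to_rows(example_vcf))
```

If pandas is installed, the resulting list of dicts can be passed to `pandas.DataFrame(rows)` to get a table for further analysis.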

  • GitHub repo DnaChisel

    :pencil2: A versatile DNA sequence optimizer

    Project mention: Medical professionals, what is the stupidest misconception a patient has had about the human body? | reddit.com/r/AskReddit | 2021-02-20

    There exists an algorithm and its implementation called DNA Chisel which can optimize viruses or bacteria computationally and it was used for example to replicate the BioNTech vaccine.

  • GitHub repo drep

    Rapid comparison and dereplication of genomes (by MrOlm)

    Project mention: How to dereplicate MAGs? (metagenome assembly genomes) | reddit.com/r/bioinformatics | 2021-07-25

    You should look at https://github.com/MrOlm/drep. That's exactly the tool you need. Usually one takes all their bins and dereplicates them at 99% ANI for strain-level dereplication or 95% at species level. To preserve the best quality, you would need to check yourself whether the best genome survived the dereplication, afaik. You probably want some sort of quality score and choose the one with the highest score out of the bins that got grouped as one by dereplication.
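The idea in the comment above (group bins by an ANI threshold, then keep the highest-scoring bin per group) can be sketched as follows. This is only an illustration and not dRep's algorithm: it assumes simple single-linkage grouping via union-find, and the input structures `ani_pairs` and `scores` as well as the function name `dereplicate` are hypothetical.

```python
def dereplicate(genomes, ani_pairs, scores, threshold=0.95):
    """Group genomes whose pairwise ANI >= threshold; keep the best-scoring one per group.

    genomes:   list of genome names
    ani_pairs: dict mapping (name_a, name_b) -> ANI as a fraction (hypothetical input)
    scores:    dict mapping name -> quality score (hypothetical; higher is better)
    """
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), ani in ani_pairs.items():
        if ani >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for g in genomes:
        clusters.setdefault(find(g), []).append(g)

    # One winner per cluster: the member with the highest quality score.
    return sorted(max(members, key=scores.get) for members in clusters.values())


genomes = ["bin1", "bin2", "bin3", "bin4"]
ani_pairs = {
    ("bin1", "bin2"): 0.992,
    ("bin1", "bin3"): 0.96,
    ("bin2", "bin3"): 0.955,
    ("bin3", "bin4"): 0.80,
}
scores = {"bin1": 85.0, "bin2": 92.5, "bin3": 70.0, "bin4": 60.0}

species_level = dereplicate(genomes, ani_pairs, scores, threshold=0.95)
strain_level = dereplicate(genomes, ani_pairs, scores, threshold=0.99)
```

With the made-up values above, the 95% threshold merges bin1, bin2 and bin3 into one group, while the 99% threshold merges only bin1 and bin2; in both cases bin2 is kept from its group because it has the highest score.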

  • GitHub repo typedb-data-bio-covid

    BioGrakn - COVID Knowledge Graph

    Project mention: How Roche Discovered Novel Potential Gene Targets with TypeDB | dev.to | 2021-06-10

    For this purpose, David built out a set of rules and schema in TypeQL. Below is just a very small excerpt of how such data can be modelled — taken from BioGrakn-Covid, a Vaticle community-led project by Konrad Mysliwiec (Data Science Software Engineer, Roche). Note that this is a selected schema; the full schema can be found within the BioGrakn-Covid schema file.

  • GitHub repo mantis

    A package to annotate protein sequences (by PedroMTQ)

    Project mention: Is there any other tool for COG annotation of the bacterial genome than EggNOG mapper? | reddit.com/r/bioinformatics | 2021-11-15

    Hello, I'm the developer of Mantis (https://github.com/PedroMTQ/mantis). Mantis doesn't use a database for COGs specifically but it does output some of the IDs you mentioned (e.g., KOs, COGs). If this is important for your work I could consider creating a COG centric database (or at least format it to be natively compatible with Mantis). Anyhow, please check the GitHub page and message me or post an issue and I'll try to help out.

  • GitHub repo orfipy

    Fast and flexible ORF finder

    Project mention: Looking for a tool to convert a whole fasta file with CDS sequences to a fasta file with protein sequences. | reddit.com/r/bioinformatics | 2021-08-08

    You can probably use or adapt https://github.com/urmi-21/orfipy. I've never used this but it looks fantastic.
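For the narrower task in the question (translating a FASTA file of CDS sequences into protein sequences), a minimal standard-library sketch is shown below. It is not how orfipy works; it assumes every sequence is already in frame starting at its first base and uses the standard genetic code (NCBI translation table 1). The helper names `translate` and `read_fasta` are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding sequence; unknown codons become 'X', trailing stops are dropped."""
    cds = cds.upper().replace("U", "T")
    protein = []
    # Ignore any incomplete codon at the 3' end.
    for i in range(0, len(cds) - len(cds) % 3, 3):
        protein.append(CODON_TABLE.get(cds[i:i + 3], "X"))
    return "".join(protein).rstrip("*")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Made-up example records; a real file would be passed as an open file handle.
cds_fasta = [">gene1", "ATGGCCAAG", "TAA", ">gene2", "ATGTTTNNNTGA"]
proteins = {name: translate(seq) for name, seq in read_fasta(cds_fasta)}
```

Sequences that need ORF detection, alternative genetic codes, or reverse-strand handling are outside the scope of this sketch and are where a dedicated tool is the better choice.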

  • GitHub repo gwas2vcf

    Convert GWAS summary statistics to VCF

    Project mention: Nuitka: An extremely compatible Python compiler | news.ycombinator.com | 2021-09-01

    Here is the original repo I have tried to speed up using:

    python -m nuitka --clang --follow-imports main.py

    repo: https://github.com/MRCIEU/gwas2vcf

    If someone can make the program run faster by whatever means, it will make a bunch of people quite happy.

  • GitHub repo DeepInteract

    A geometric deep learning pipeline for predicting protein interface contacts.

    Project mention: [R] Geometric Transformers for Protein Interface Contact Prediction | reddit.com/r/MachineLearning | 2021-10-06
  • GitHub repo tm_calculator_gui

    Calculates the melting temperature(in Celsius) of a user imported forward and reverse primer

    Project mention: tm_calculator_gui: Calculates the melting temperature (in Celsius) of a user-imported forward and reverse primer. It chooses between 2 algorithms: 1. basic Tm calculation based on nucleotide content 2. Tm calculation algorithm based on salt concentration (such as in Primer-BLAST). Available as GUI exe | reddit.com/r/labrats | 2021-08-24
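The listing does not say which formulas tm_calculator_gui uses. As an assumption-laden illustration of the two kinds of calculation it mentions, the sketch below uses two common textbook approximations: the Wallace rule (nucleotide content only) and a salt-adjusted formula. The function names and the default sodium concentration of 0.05 M are hypothetical choices, not taken from the project.

```python
import math


def tm_basic(primer):
    """Wallace rule: 2 degrees C per A/T plus 4 degrees C per G/C (intended for short primers)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def tm_salt_adjusted(primer, na_molar=0.05):
    """Salt-adjusted approximation: 81.5 + 16.6*log10([Na+]) + 0.41*(%GC) - 675/N."""
    primer = primer.upper()
    n = len(primer)
    gc_percent = 100 * (primer.count("G") + primer.count("C")) / n
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent - 675 / n


# Made-up 20-nt primer with 50% GC content.
forward = "ATGCATGCATGCATGCATGC"
basic_tm = tm_basic(forward)
salt_tm = tm_salt_adjusted(forward)
```

The two approximations can differ by many degrees for the same primer, which is one reason a tool might offer both and let the user choose.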
NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2021-11-27.

Index

What are some of the best open-source Bioinformatics projects in Python? This list will help you:

#   Project                 Stars
1   dash                    15,469
2   Biopython               2,908
3   deepvariant             2,381
4   scanpy                  1,015
5   galaxy                  883
6   MultiQC                 777
7   Hail                    761
8   ncbi-genome-download    574
9   dash-cytoscape          383
10  pyfaidx                 330
11  biotite                 170
12  Clairvoyante            159
13  truvari                 119
14  DnaChisel               114
15  drep                    108
16  weblogo                 103
17  typedb-data-bio-covid   34
18  mantis                  29
19  orfipy                  25
20  gwas2vcf                18
21  DeepInteract            18
22  tm_calculator_gui       1