Python Bioinformatics

Open-source Python projects categorized as Bioinformatics

Top 23 Python Bioinformatics Projects

  • dash

    Data Apps & Dashboards for Python. No JavaScript Required.

    Project mention: NiceGUI: Let any browser be the frontend for your Python code | reddit.com/r/Python | 2023-01-15

    Of course there are valid use cases for splitting frontend and backend technologies. NiceGUI is for those who don’t want to leave the Python ecosystem and like to reap the benefits of having all code in one place. There are other options like Streamlit, Dash, Anvil, JustPy, and Pynecone. But we initially created NiceGUI to easily handle the state of external hardware like LEDs, motors, and cameras. Additionally, we wanted to offer a gentle learning curve while still providing the ability to go all the way down to HTML, CSS, and JavaScript if needed.

  • Biopython

    Official git repository for Biopython (originally converted from CVS)

    Project mention: Biology related exercises and "challenges" to train by myself | reddit.com/r/learnpython | 2023-02-01

    I think you might find something of a community around BioPython, which might be helpful. Just looking at the capabilities will probably be instructive as well.

  • deepvariant

    DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.

    Project mention: Give me your suggestions for papers with a Convolutional Neural Network in Bioinformatics | reddit.com/r/bioinformatics | 2022-07-12

    See https://www.nature.com/articles/nbt.4235 for the paper and https://github.com/google/deepvariant for the code.

  • scanpy

    Single-cell analysis in Python. Scales to >1M cells.

    Project mention: Useful Python Decorators for Data Scientists | news.ycombinator.com | 2022-05-23

  • scispacy

    A full spaCy pipeline and models for scientific/biomedical documents.

    Project mention: Guidance needed: Extracting diseases and symptoms from medical text | reddit.com/r/LanguageTechnology | 2022-11-05

    https://github.com/medspacy/medspacy and https://allenai.github.io/scispacy/ should get you most of the way there

  • galaxy

    Data intensive science for everyone.

    Project mention: BIOINFORMATICS PROJECT | reddit.com/r/bioinformatics | 2022-10-16

  • deep_gcns_torch

    Pytorch Repo for DeepGCNs (ICCV'2019 Oral, TPAMI'2021), DeeperGCN (arXiv'2020) and GNN1000(ICML'2021): https://www.deepgcns.org

  • MultiQC

    Aggregate results from bioinformatics analyses across many samples into a single report.

    Project mention: RNA-seq analysis | reddit.com/r/bioinformatics | 2022-11-16

    I would recommend looking at the pages for FastQC and MultiQC. I run FastQC on my fastq files, then MultiQC on the results to collect all of that individual data into one report. You can also use MultiQC to assess the quality of your alignments, at least after using the STAR aligner (probably others too; I have only used STAR).
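
    The run-per-file, aggregate-once workflow described above can be sketched with the standard library. This is a minimal illustration, not FastQC's or MultiQC's code or report format: it assumes 4-line FASTQ records with Phred+33 quality encoding, and `summarize_fastq` and `aggregate_report` are hypothetical helpers that compute a read count and mean base quality per sample, then merge every sample into one tab-separated table.

```python
import io
from typing import Dict, TextIO


def summarize_fastq(handle: TextIO) -> Dict[str, float]:
    """Count reads and compute the mean base quality of one FASTQ file.

    Hypothetical helper: assumes 4-line records and Phred+33 quality encoding.
    """
    lines = [line.rstrip("\n") for line in handle]
    reads = total_bases = total_quality = 0
    # Records are: header, sequence, '+' separator, quality string.
    for i in range(0, len(lines) - 3, 4):
        quality = lines[i + 3]
        reads += 1
        total_bases += len(quality)
        total_quality += sum(ord(ch) - 33 for ch in quality)  # Phred+33 decoding
    mean_quality = total_quality / total_bases if total_bases else 0.0
    return {"reads": reads, "mean_quality": round(mean_quality, 2)}


def aggregate_report(samples: Dict[str, TextIO]) -> str:
    """Merge per-sample summaries into one tab-separated report, one row per sample."""
    rows = ["sample\treads\tmean_quality"]
    for name in sorted(samples):
        stats = summarize_fastq(samples[name])
        rows.append(f"{name}\t{stats['reads']}\t{stats['mean_quality']}")
    return "\n".join(rows)


if __name__ == "__main__":
    # Tiny in-memory FASTQ files stand in for real per-sample files.
    samples = {
        "sample_A": io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGA\n+\n####\n"),
        "sample_B": io.StringIO("@r1\nTTGA\n+\n5555\n"),
    }
    print(aggregate_report(samples))
```

    Real MultiQC works the other way around: it searches directories for log and report files that other tools have already written and parses them, rather than computing the statistics itself.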

  • Hail

    Scalable genomic data analysis.

    Project mention: We're wasting money by only supporting gzip for raw DNA files | news.ycombinator.com | 2023-01-09

  • ncbi-genome-download

    Scripts to download genomes from the NCBI FTP servers

    Project mention: I have a question about the FTP of annotation files from NCBI's Genbank and RefSeq | reddit.com/r/bioinformatics | 2022-08-02

    If you have the taxonomic IDs of your organisms of interest, there are existing parallelized download tools that are more efficient like https://github.com/kblin/ncbi-genome-download or bit-dl-ncbi-assemblies from https://github.com/AstrobioMike/bit

  • biostar-central

    Biostar Q&A

    Project mention: first stop codon | reddit.com/r/learnpython | 2022-12-08

    also consider posting bioinformatics questions over at https://www.biostars.org/ instead of reddit

  • dash-cytoscape

    Interactive network visualization in Python and Dash, powered by Cytoscape.js

  • dash-bio

    Open-source bioinformatics components for Dash

    Project mention: New CRISPR-based map ties every human gene to its function | news.ycombinator.com | 2022-06-10

    > Where are the polished, powerful design tools for biology

    User interfaces for biology have drastically improved over the last 10 years.

    Domain-specific tools like genome browsers, protein viewers, or phylogenetic explorers [1-3] almost all look and feel a lot better than they did in 2012.

    The biggest exception here is UCSC Genome Browser, which has an old-school design and web technology stack. That said, it's steadily added features over the years, has substantially sleekened UX in its periphery, and remains widely used.

    There are also bespoke visual design resources for biology applications that are good and getting better, like BioRender and PhyloPic [4-5]. There are multi-tiered packages like Dash Bio [6] that wrap biology components together. There's a Blender biology community, too!

    ---

    1. Genome browsers and components: https://jbrowse.org/jb2/, https://www.ncbi.nlm.nih.gov/genome/gdv, https://igv.org/app, https://eweitz.github.io/ideogram

    2. Protein viewers: https://pymol.org/, https://nglviewer.org/ngl/

    3. Phylogenetic explorers: https://clades.nextstrain.org/

    4. https://biorender.com/

    5. http://phylopic.org/

    6. https://github.com/plotly/dash-bio, https://dash.gallery/Portal/?search=[Pharma]

  • DnaFeaturesViewer

    Python library to plot DNA sequence features (e.g. from Genbank files)

    Project mention: Software to make in-scale illustrations of genomic locations | reddit.com/r/bioinformatics | 2022-09-10

    If you can build a Python environment and do a little coding, DnaFeaturesViewer or pyGenomeViz would be good choices. You can generate the following figure from a Genbank file with about 10 lines of code. Of course, you can specify the range of coordinates to be plotted.

  • clinker

    Gene cluster comparison figure generator

  • Sniffles

    Structural variation caller using third generation sequencing

  • pyfaidx

    Efficient pythonic random access to fasta subsequences
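
    pyfaidx reads and writes samtools-compatible `.fai` index files, which store, for each record, its length, the byte offset of its first base, the number of bases per line, and the number of bytes per line. With those four numbers a subsequence can be located by arithmetic instead of by scanning the file. The standard-library sketch below illustrates that idea only; it is not pyfaidx's API, it assumes a fixed line width within each record and no blank lines, and `FaiEntry`, `build_index`, and `fetch` are hypothetical names that use 0-based, end-exclusive coordinates.

```python
import io
from typing import BinaryIO, Dict, NamedTuple, Optional


class FaiEntry(NamedTuple):
    length: int      # total number of bases in the record
    offset: int      # byte offset of the record's first base
    line_bases: int  # bases per full sequence line
    line_width: int  # bytes per full line, including the line terminator


def build_index(handle: BinaryIO) -> Dict[str, FaiEntry]:
    """Scan a FASTA file once and record .fai-style coordinates for each record.

    Assumes a constant line width within each record and no blank lines.
    """
    index: Dict[str, FaiEntry] = {}
    name: Optional[str] = None
    length = offset = line_bases = line_width = 0
    pos = 0  # byte position of the current line
    for raw in handle:
        if raw.startswith(b">"):
            if name is not None:
                index[name] = FaiEntry(length, offset, line_bases, line_width)
            name = raw[1:].split()[0].decode()  # record name is the first word
            length = line_bases = line_width = 0
            offset = pos + len(raw)  # sequence starts right after the header line
        else:
            bases = len(raw.rstrip(b"\r\n"))
            if line_bases == 0:
                line_bases, line_width = bases, len(raw)
            length += bases
        pos += len(raw)
    if name is not None:
        index[name] = FaiEntry(length, offset, line_bases, line_width)
    return index


def fetch(handle: BinaryIO, index: Dict[str, FaiEntry], name: str, start: int, end: int) -> str:
    """Return bases [start, end) of one record by seeking straight to them."""
    entry = index[name]
    end = min(end, entry.length)
    if start >= end:
        return ""

    def byte_pos(base: int) -> int:
        # Each completed line adds line_width bytes; the remainder lands within a line.
        full_lines, column = divmod(base, entry.line_bases)
        return entry.offset + full_lines * entry.line_width + column

    first, last = byte_pos(start), byte_pos(end - 1)
    handle.seek(first)
    chunk = handle.read(last - first + 1)
    return chunk.replace(b"\n", b"").replace(b"\r", b"").decode()


if __name__ == "__main__":
    fasta = io.BytesIO(b">chr1 description\nACGTACGTAC\nGGGGCCCCTT\nAAA\n>chr2\nTTTT\n")
    idx = build_index(fasta)
    print(fetch(fasta, idx, "chr1", 8, 13))  # → ACGGG (crosses a line break)
```

    Only the bytes between the first and last requested base are read, so the cost of a lookup does not grow with the size of the file.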

  • sourmash

    Quickly search, compare, and analyze genomic and metagenomic data sets.

    Project mention: Any good meta-transcriptomics pipelines | reddit.com/r/bioinformatics | 2022-12-29

    have you seen https://www.nature.com/articles/s41587-019-0209-9 and https://github.com/sourmash-bio/sourmash ?
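
    sourmash compares data sets through small hash-based sketches rather than full k-mer sets. The following is a simplified, standard-library illustration of the FracMinHash ("scaled") idea: keep only k-mer hashes below a threshold of 2^64 / scaled, then estimate Jaccard similarity and containment from the hashes that survive. It is not sourmash's implementation: it hashes with BLAKE2b where sourmash uses MurmurHash3, it ignores reverse complements, and `sketch`, `jaccard`, and `containment` are hypothetical names.

```python
import hashlib
from typing import Set

MAX_HASH = 2**64  # size of the 64-bit hash space


def kmer_hash(kmer: str) -> int:
    """64-bit hash of a k-mer. BLAKE2b is an assumption; sourmash uses MurmurHash3."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(sequence: str, k: int = 21, scaled: int = 1) -> Set[int]:
    """FracMinHash-style sketch: keep each k-mer hash below MAX_HASH // scaled.

    Roughly 1/scaled of the distinct k-mers survive. Reverse complements are ignored.
    """
    threshold = MAX_HASH // scaled
    seq = sequence.upper()
    kept = set()
    for i in range(len(seq) - k + 1):
        h = kmer_hash(seq[i:i + k])
        if h < threshold:
            kept.add(h)
    return kept


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Estimated Jaccard similarity: shared hashes over all hashes."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment(query: Set[int], target: Set[int]) -> float:
    """Estimated fraction of the query's k-mers that also occur in the target."""
    return len(query & target) / len(query) if query else 0.0


if __name__ == "__main__":
    genome = "ACGTACGTGGCCAATT"
    fragment = "ACGTACGTGG"
    a, b = sketch(genome, k=4), sketch(fragment, k=4)
    print(jaccard(a, b), containment(b, a))  # → 0.5 1.0
```

    Containment is often the more informative number when a small genome is compared against a large metagenome, because Jaccard similarity is dragged down by the size difference. With `scaled` in the hundreds or thousands (sourmash DNA sketches commonly use 1000), only a small but consistent fraction of hashes is kept, which is what keeps sketches small.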

  • biotite

    A comprehensive library for computational molecular biology

  • bakta

    Rapid & standardized annotation of bacterial genomes, MAGs & plasmids

    Project mention: orfs in DNA | reddit.com/r/bioinformatics | 2022-10-11

    If you need more accurate ORF(CDS) prediction including functional annotation, I recommend using CLI tools such as prokka, bakta, or DFAST (DFAST is also available in a web version).
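
    For a quick look before running a full annotation tool, open reading frames can be found with a short scan. This is a deliberately naive sketch and not how prokka, bakta, or DFAST predict genes: it assumes ATG as the only start codon and TAA, TAG, and TGA as stops (standard genetic code), scans only the three forward-strand frames, and `find_orfs` is a hypothetical helper whose `min_codons` threshold counts the stop codon.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 30) -> List[Tuple[int, int, int]]:
    """Return (frame, start, end) for ATG...stop ORFs on the forward strand.

    Coordinates are 0-based and end-exclusive; `end` includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG after the previous stop opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # look for the next ATG after this stop
    return sorted(orfs, key=lambda orf: orf[1])


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTTAGCC", min_codons=4))  # → [(2, 2, 14)]
```

    Opening each ORF at the first ATG after the previous stop yields the longest ORF per stop codon. Scanning the reverse complement the same way would cover the other strand.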

  • pysradb

    Package for fetching metadata and downloading data from SRA/ENA/GEO

    Project mention: Systematic way to collect GEO datasets | reddit.com/r/bioinformatics | 2022-10-05

    If you are okay with looking at already processed data, check out https://dee2.io/. Otherwise there is https://github.com/ncbi/sra-tools for getting fastq files (a cli tool) or https://github.com/saketkc/pysradb (python)

  • truvari

    Structural variant toolkit for VCFs

  • hgvs

    Python library to parse, format, validate, normalize, and map sequence variants. `pip install hgvs`

    Project mention: [Question] How to transform HGVSg to HGVSp? | reddit.com/r/bioinformatics | 2022-08-12

    Python > https://github.com/biocommons/hgvs
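
    Converting an HGVSg description to HGVSp needs transcript structure and alignment data, which the hgvs library supplies through its variant mappers; a regular expression cannot do that part. What a short sketch can show is how a simple substitution breaks down into accession, coordinate type, position, and reference and alternate bases. The standard-library example below handles only single-nucleotide substitutions with plain integer positions; `Substitution` and `parse_substitution` are hypothetical names, and this is not the hgvs package's parser, which covers the full grammar.

```python
import re
from typing import NamedTuple, Optional

# Matches only simple substitutions such as "NC_000007.13:g.36561662C>T".
# Intronic offsets (c.22+1A>T), ranges, indels, and protein (p.) syntax are out of scope.
SUBSTITUTION = re.compile(
    r"^(?P<accession>[A-Z]{2}_\d+(?:\.\d+)?)"  # RefSeq-style accession with optional version
    r":(?P<coord_type>[gcnm])\."               # genomic, coding, non-coding, or mitochondrial
    r"(?P<position>\d+)"
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


class Substitution(NamedTuple):
    accession: str
    coord_type: str
    position: int
    ref: str
    alt: str


def parse_substitution(text: str) -> Optional[Substitution]:
    """Split a simple HGVS substitution into its parts, or return None if unsupported."""
    match = SUBSTITUTION.match(text.strip())
    if match is None:
        return None
    return Substitution(
        accession=match["accession"],
        coord_type=match["coord_type"],
        position=int(match["position"]),
        ref=match["ref"],
        alt=match["alt"],
    )


if __name__ == "__main__":
    print(parse_substitution("NC_000007.13:g.36561662C>T"))
```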

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2023-02-01.

Index

What are some of the best open-source bioinformatics projects in Python? This list ranks them by GitHub stars:

Rank  Project               Stars
1     dash                  18,034
2     Biopython             3,422
3     deepvariant           2,702
4     scanpy                1,376
5     scispacy              1,315
6     galaxy                1,050
7     deep_gcns_torch       1,003
8     MultiQC               927
9     Hail                  854
10    ncbi-genome-download  696
11    biostar-central       549
12    dash-cytoscape        485
13    dash-bio              457
14    DnaFeaturesViewer     449
15    clinker               416
16    Sniffles              387
17    pyfaidx               384
18    sourmash              344
19    biotite               340
20    bakta                 257
21    pysradb               239
22    truvari               200
23    hgvs                  192