Bad tools that NEED improvement

This page summarizes the projects mentioned and recommended in the original post on /r/bioinformatics

  • Cenote_Unlimited_Breadsticks

    DEPRECATED: Discover divergent virus sequences, prune flanking cellular sequences, make basic report

  • Cenote Unlimited Breadsticks: I have run ~160k simulated contigs of various lengths through this tool, and none have been predicted as phage. I haven't opened an issue yet because I need to make sure the problem isn't on my end and that I can provide a reprex. Also, you cannot choose the output directory, so it clutters your working directory.

  • DeePhage

    A tool for distinguishing temperate phage-derived and virulent phage-derived sequences in metavirome data using deep learning

  • DeePhage and PPR-Meta: both are by the same group. They require MATLAB, which makes them tricky to run on an HPC or cloud system. They both say that if you need to run several samples concurrently, you must clone the tool into a new directory for each one(!), likely because temporary files are written to the working directory. That makes them entirely unscalable.

  • PPR-Meta

    A tool for identifying phages and plasmids from metagenomic fragments using deep learning

  • PhaMers

    A bioinformatic tool for identifying bacteriophages using machine learning and k-mers

  • PhaMers: several open issues. I opened the one about bad FASTA header parsing. Even after I reformatted my headers to get past that, new errors popped up (I didn't bother opening issues for them, since the authors were unresponsive to others). Also, output is written to the current working directory, which is annoying.

  • RNN-VirSeeker

    This is a deep learning method for identification of viral contigs with short length from metagenomic data.

  • RNN-VirSeeker: paths to the training data and the actual input data are hard-coded, so the source code must be edited before the tool can be used. That is bad practice and unscalable. They also don't follow Python conventions: test.py conventionally indicates a unit/integration test file for something like pytest, but in their "tool" it is the name of the actual tool's file.

  • ViraMiner

    CNN based classifier for detecting viral sequences among metagenomic contigs

  • ViraMiner: bad documentation. I wasn't sure how to install or run it. Maybe it works? I never got that far.

  • virMine

  • virMine: the Docker container can't be built because of outdated and unavailable dependencies. Even after I resolved that myself, the build installs a package that needs CLI input during container building, which cannot be supplied, so it gets stuck in a loop. Cannot be installed.

  • virnet

    VirNet: A deep attention model for viral reads identification

  • VirNet: several issues with their dependencies not being available. For fun, you have to read the issue opened on their "requirments.txt". It pointed out the bad dependencies and mentioned that the file "requirments.txt" was misspelled and so didn't match the README. They updated the README to reflect the misspelling, left the bad file name and bad dependencies, and closed the issue.

  • VirusSeeker-Virome

    VirusSeeker is a fully automated and modular software package designed for mining sequence data to identify sequences of microbial origin.

  • VirusSeeker: no installation or running instructions on GitHub. They do have them on their lab website, but you have to edit source files to point to the databases. Too much of a pain, so I'm not sure how well it works once that is done.

  • metaGEM

    :gem: An easy-to-use workflow for generating context specific genome-scale metabolic models and predicting metabolic interactions within microbial communities directly from metagenomic data

  • Paper: https://academic.oup.com/nar/article/49/21/e126/6382386 GitHub: https://github.com/franciscozorrilla/metaGEM

  • MetaRon

    Metagenomic opeRon Prediction pipeline. MetaRon presents the first pipeline for the prediction of metagenomic operons without any functional or experimental data.

  • MetaRon is a tool for metagenomic operon prediction that was published in BMC Genomics in 2020. As far as I can tell, no one has been able to get it to work, and the authors are unresponsive on GitHub. If I knew any Python, I would take a crack at fixing it.
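
Several of the complaints above (Cenote Unlimited Breadsticks, PhaMers, DeePhage, PPR-Meta) come down to tools that write output or temporary files into the current working directory. A possible workaround, sketched here in Python, is to launch each run inside its own scratch directory and then move whatever the tool wrote into an output directory you choose. This is only a sketch under assumptions: `run_isolated` is a hypothetical helper, not part of any of these tools, and the one-line command in the demo is a stand-in for a real invocation. It also assumes the tool can be started from an arbitrary directory and that any input paths in the command are absolute, since the working directory changes.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def run_isolated(cmd, outdir):
    """Run cmd in a fresh scratch directory, then move its files into outdir.

    Hypothetical helper: cmd is a list of arguments as accepted by
    subprocess.run, and any input paths inside it must be absolute.
    """
    outdir = Path(outdir).resolve()
    outdir.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as scratch:
        # The tool sees the scratch directory as its working directory, so
        # files written by concurrent runs cannot collide with each other.
        subprocess.run(cmd, cwd=scratch, check=True)
        # Keep everything the tool left behind, but in a place we picked.
        for item in Path(scratch).iterdir():
            shutil.move(str(item), str(outdir / item.name))
    return outdir


if __name__ == "__main__":
    # Stand-in for a real tool: a one-liner that writes a file to its cwd.
    fake_tool = [
        sys.executable,
        "-c",
        "open('predictions.tsv', 'w').write('contig_1\\tphage\\n')",
    ]
    result_dir = run_isolated(fake_tool, Path(tempfile.mkdtemp()) / "sample_A")
    print(sorted(p.name for p in result_dir.iterdir()))
```

Calling `run_isolated` once per sample, each with its own output directory, keeps runs apart without cloning the tool for every sample. It does not help with tools that must be started from inside their own installation directory.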
