Bacterial WGS reads and assembly quality questions

This page summarizes the projects mentioned and recommended in the original post on /r/bioinformatics

  • I have ONT and Illumina reads for a bacterial WGS (1.8 Mb) and I'm following Ryan Wick's methods for "perfect bacterial genome assembly" (https://github.com/rrwick/Perfect-bacterial-genome-tutorial). I've run into a few questions that I have not been able to find answers to. I'm a grad student struggling in a mostly clinical lab.

  • ALE

    Assembly Likelihood Estimator (by sc932)

  • I used ALE (https://github.com/sc932/ALE) and Prodigal to evaluate assembly quality. The ALE score was -15000000, which I think is terrible, and the mean Prodigal length was 300 (I think this is good?). Does anyone know of a guide to interpreting ALE scores besides the original publication? Any recommendations on other ways to evaluate de novo assemblies when there is no existing reference genome?
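For readers who want to reproduce the "mean Prodigal length" figure mentioned above, here is a minimal sketch. It assumes the metric is the mean amino-acid length of the predicted proteins and that the input is the text of a protein FASTA such as the one Prodigal writes with its `-a` option, with stop codons shown as a trailing `*`. The function name `mean_protein_length` and the example records are hypothetical, not taken from the post.

```python
def mean_protein_length(fasta_text: str) -> float:
    """Return the mean amino-acid length of the records in a protein FASTA string.

    Assumption: stop codons appear as '*' and should not count towards length.
    """
    lengths = []
    current = None  # running length of the record being read, None before the first header
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line.replace("*", ""))
    if current is not None:
        lengths.append(current)
    if not lengths:
        raise ValueError("no FASTA records found")
    return sum(lengths) / len(lengths)


# Hypothetical two-record example; real input would be read from the Prodigal output file.
example = """>gene_1
MKT*
>gene_2
MKTAYIAK
QR*
"""
print(mean_protein_length(example))
```

Comparing this number between assembly versions (for example before and after polishing) is usually more informative than reading it in isolation, since indel errors tend to break genes into shorter fragments.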

  • plassembler

    Program to quickly and accurately assemble plasmids in hybrid and long-only sequenced bacterial isolates

  • All looks pretty good to me. 1 is good and expected (a circularised chromosome is great), and 2/3 is pretty normal: the peak at >10x is just because the histogram puts 10-49x together in the same bin, and the first few bases of the Illumina reads often jump around randomly until the composition settles down to approximately the GC content. Maybe run fastp on those short reads if you are concerned about the first bases.

    With 4, I'd run a web BLAST of some chunks of the assembly against nr to see if it's close to anything, such as related species or strains (maybe not useful if this is a completely novel species). Also, the polisher you used (e.g. Polypolish) should tell you somewhere how many changes it made. If it's many thousands, then you might have a problem with the long and short reads not matching well (maybe if they came from different extractions). Maybe try something like https://github.com/gbouras13/plassembler (my own tool, so self plug) to see if the long and short read sets match well.

    Another thing to try would be running the assembly through an annotation program like Bakta. You would hope to see a high coding density and lots of well-annotated CDSs.

    All in all, what you've done looks pretty great to be honest. Ryan Wick's tutorials are the bible, so you're already reading the right thing. Here's the preprint too in case you haven't read it: https://preprints.scielo.org/index.php/scielo/preprint/view/5053
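The coding density check suggested in the reply can be sketched as follows. This is a rough illustration, assuming the annotation is available as GFF3 text (tab-separated, with the feature type in column 3 and 1-based inclusive start and end in columns 4 and 5) and that coding density means the fraction of genome bases covered by at least one CDS feature. The function name `coding_density` and the example records are hypothetical.

```python
def coding_density(gff_text: str, genome_length: int) -> float:
    """Fraction of the genome covered by CDS features in GFF3 text.

    Overlapping or adjacent CDS intervals on the same sequence are merged
    so that no base is counted twice.
    """
    intervals = []
    for line in gff_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Skip non-feature lines (e.g. an embedded FASTA section) and non-CDS features.
        if len(fields) < 5 or fields[2] != "CDS":
            continue
        intervals.append((fields[0], int(fields[3]), int(fields[4])))

    intervals.sort()
    covered = 0
    cur_seq, cur_start, cur_end = None, 0, 0
    for seq, start, end in intervals:
        if seq == cur_seq and start <= cur_end + 1:
            # Overlaps or touches the current merged interval: extend it.
            cur_end = max(cur_end, end)
        else:
            if cur_seq is not None:
                covered += cur_end - cur_start + 1
            cur_seq, cur_start, cur_end = seq, start, end
    if cur_seq is not None:
        covered += cur_end - cur_start + 1
    return covered / genome_length


# Hypothetical records: two overlapping CDSs on one contig and one on another.
example = (
    "contig_1\tsketch\tCDS\t1\t100\t.\t+\t0\tID=a\n"
    "contig_1\tsketch\tCDS\t91\t200\t.\t-\t0\tID=b\n"
    "contig_1\tsketch\tgene\t1\t200\t.\t+\t.\tID=g\n"
    "contig_2\tsketch\tCDS\t1\t50\t.\t+\t0\tID=c\n"
)
print(coding_density(example, genome_length=1000))
```

What counts as "high" depends on the organism, so treat any threshold as a rule of thumb; a value far below what related genomes show may point to indel errors that break up open reading frames.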

