Bacterial WGS reads and assembly quality questions

Perfect-bacterial-genome-tutorial

2 109 4.5 Python

I have ONT and Illumina reads for a bacterial WGS project (1.8 Mb genome) and I'm following Ryan Wick's methods for "perfect bacterial genome assembly" (https://github.com/rrwick/Perfect-bacterial-genome-tutorial). I've run into a few questions that I have not been able to find answers to. I'm a grad student struggling in a mostly clinical lab.

ALE

1 32 2.2 C

Assembly Likelihood Estimator (by sc932)

I used ALE (https://github.com/sc932/ALE) and Prodigal to evaluate assembly quality. The ALE score was -15000000, which I think is terrible, and the mean Prodigal gene length was 300 (I think this is good?). Does anyone know of a guide to interpreting ALE scores besides the original publication? Any recommendations on other ways to evaluate de novo assemblies without an existing reference genome?
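
As one simple reference-free sanity check alongside ALE, basic contiguity and composition statistics can be computed straight from the assembly FASTA. The following is a minimal sketch, not part of ALE or the tutorial; the function names `read_fasta` and `assembly_stats` and the toy sequences are made up for illustration.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (name, sequence) pairs from an open FASTA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def assembly_stats(handle):
    """Return contig count, total length, N50 and GC fraction."""
    seqs = [seq for _, seq in read_fasta(handle)]
    lengths = sorted((len(s) for s in seqs), reverse=True)
    total = sum(lengths)

    # N50: length of the contig at which the running sum of contig
    # lengths (longest first) reaches half of the total assembly length.
    running, n50 = 0, 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            n50 = length
            break

    gc = sum(s.count("G") + s.count("C") for s in seqs)
    return {
        "contigs": len(lengths),
        "total_bp": total,
        "n50": n50,
        "gc_fraction": gc / total if total else 0.0,
    }


# Toy input standing in for a real assembly file (hypothetical data).
demo = StringIO(">chromosome\nACGTACGTAC\n>plasmid\nGGCC\n")
print(assembly_stats(demo))
```

For a real assembly, pass `open("assembly.fasta")` (a placeholder file name) instead of the `StringIO` object. For a finished bacterial genome you would expect a handful of contigs, a total length close to the expected genome size, and a GC fraction consistent with related species.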

plassembler

2 48 8.8 Python

Program to quickly and accurately assemble plasmids in hybrid and long-only sequenced bacterial isolates

All looks pretty good to me. 1 is good and expected, and a circularised chromosome is great. 2/3 is pretty normal: the peak at >10x is just because 10-49x all falls into the same histogram bin, and the first few bases of Illumina reads often jump around randomly until they settle down to approximately the GC content. Maybe run fastp on those short reads if you are concerned about the first bases.

With 4, I’d run a web BLAST of some chunks of the assembly against nr to see if it’s close to anything, such as a related species or strain (maybe not useful if this is a completely novel species). Also, the polisher you used (e.g. Polypolish) should tell you somewhere how many changes it made. If it’s many thousands, then you might have a problem with the long and short reads not matching well (maybe if they came from different extractions). Maybe try something like https://github.com/gbouras13/plassembler (my own tool, so self plug) to see if the long and short read sets match well. Another thing to try would be running the assembly through an annotation program like Bakta: you would hope to see a high coding density and lots of well-annotated CDS.

All in all, what you’ve done looks pretty great to be honest. Ryan Wick’s tutorials are the bible, so you’re already reading the right thing. Here’s the preprint too in case you haven’t read it: https://preprints.scielo.org/index.php/scielo/preprint/view/5053
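
The coding density check mentioned above can be sketched in a few lines, assuming you have gene predictions in GFF3 format (for example from Prodigal or Bakta) and know the assembly length. This is an illustrative sketch only: the function name `coding_stats` and the toy annotation lines are made up, and genes that wrap around the origin of a circular contig are not handled.

```python
def coding_stats(gff_lines, genome_length):
    """Summarise CDS features from GFF3 lines for a genome of known length.

    Returns the CDS count, mean CDS length in bp, and coding density
    (fraction of the genome covered by at least one CDS).
    """
    intervals = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; stop parsing
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "CDS":
            continue
        # GFF3 coordinates are 1-based and inclusive.
        intervals.append((fields[0], int(fields[3]), int(fields[4])))

    lengths = [end - start + 1 for _, start, end in intervals]

    # Merge overlapping CDS on the same contig so shared bases count once.
    covered = 0
    contig, cur_start, cur_end = None, 0, -1
    for name, start, end in sorted(intervals):
        if name != contig or start > cur_end:
            if contig is not None:
                covered += cur_end - cur_start + 1
            contig, cur_start, cur_end = name, start, end
        else:
            cur_end = max(cur_end, end)
    if contig is not None:
        covered += cur_end - cur_start + 1

    return {
        "cds_count": len(lengths),
        "mean_cds_bp": sum(lengths) / len(lengths) if lengths else 0.0,
        "coding_density": covered / genome_length if genome_length else 0.0,
    }


# Toy annotation standing in for real output (hypothetical data).
demo_gff = [
    "##gff-version 3",
    "chr\tdemo\tCDS\t1\t300\t.\t+\t0\tID=cds1",
    "chr\tdemo\tCDS\t251\t550\t.\t-\t0\tID=cds2",
    "chr\tdemo\tCDS\t701\t1000\t.\t+\t0\tID=cds3",
]
print(coding_stats(demo_gff, genome_length=1000))
```

A sharp drop in coding density or in mean CDS length compared with related genomes can point to frameshifts from uncorrected indel errors, which break genes into shorter fragments.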

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

This page summarizes the projects mentioned and recommended in the original post on /r/bioinformatics Post date: 10 Jan 2023
