plassembler
Program to quickly and accurately assemble plasmids in hybrid and long-only sequenced bacterial isolates
I have ONT and Illumina reads for a bacterial WGS (1.8 Mb) and I'm following Ryan Wick's methods for "perfect bacterial genome assembly" (https://github.com/rrwick/Perfect-bacterial-genome-tutorial). I've run into a few questions that I haven't been able to find answers to. I'm a grad student struggling in a mostly clinical lab.
I used ALE (https://github.com/sc932/ALE) and Prodigal to evaluate assembly quality. The ALE score was -15000000, which I think is terrible, and the mean Prodigal length was 300 (I think this is good?). Does anyone know of a guide to interpreting ALE scores besides the original publication? Any recommendations on other ways to evaluate de novo assemblies when there is no existing reference genome?
All looks pretty good to me. 1 is good and expected (a circularised chromosome is great), and 2/3 are pretty normal: the peak at >10x is just because the histogram puts 10-49x together in the same bin, and the first few bases of Illumina reads often jump around randomly until they settle down to approximately the GC content. Maybe run fastp on those short reads if you are concerned about the first bases.

With 4, I'd run a web BLAST of some chunks of the assembly against nr to see if it's close to any related species or strains (maybe not useful if this is a completely novel species). Also, the polisher you used (e.g. Polypolish) should tell you somewhere how many changes it made. If it's many thousands, then you might have a problem with the long and short reads not matching well (maybe if they came from different extractions). Maybe try something like https://github.com/gbouras13/plassembler (my own tool, so self plug) to see if the long and short read sets match well.

Another thing to try would be running the assembly through an annotation program like Bakta: you would hope to see a high coding density and lots of well-annotated CDS.

All in all, what you've done looks pretty great to be honest. Ryan Wick's tutorials are the bible, so you're already reading the right thing. Here's the preprint too in case you haven't read it: https://preprints.scielo.org/index.php/scielo/preprint/view/5053
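If you want to put rough numbers on "coding density" and mean gene length yourself, here is a minimal Python sketch. It assumes you have a tab-separated GFF3 annotation (the kind Prodigal or Bakta can write) with `CDS` in the third column, and that you know the total assembly length. The function names and the toy data are made up for illustration and are not part of either tool.

```python
def parse_cds_intervals(gff_text):
    """Return (contig, start, end) for every CDS row in GFF3 text."""
    intervals = []
    for line in gff_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and GFF comments/directives
        fields = line.split("\t")
        if len(fields) < 9 or fields[2] != "CDS":
            continue  # keep only well-formed CDS feature rows
        intervals.append((fields[0], int(fields[3]), int(fields[4])))
    return intervals


def mean_cds_length(intervals):
    """Mean CDS length in nucleotides (GFF coordinates are 1-based, inclusive)."""
    lengths = [end - start + 1 for _, start, end in intervals]
    return sum(lengths) / len(lengths) if lengths else 0.0


def coding_density(intervals, genome_length):
    """Fraction of the genome covered by at least one CDS.

    Overlapping genes on the same contig are merged first so that
    shared bases are not counted twice.
    """
    by_contig = {}
    for contig, start, end in intervals:
        by_contig.setdefault(contig, []).append((start, end))

    covered = 0
    for spans in by_contig.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:
                cur_end = max(cur_end, end)  # overlapping: extend current span
            else:
                covered += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        covered += cur_end - cur_start + 1
    return covered / genome_length


# Toy example with invented coordinates, not real data.
example_gff = "\n".join([
    "##gff-version 3",
    "contig_1\ttool\tCDS\t1\t900\t.\t+\t0\tID=gene_1",
    "contig_1\ttool\tCDS\t851\t1500\t.\t-\t0\tID=gene_2",
    "contig_1\ttool\tCDS\t2001\t2300\t.\t+\t0\tID=gene_3",
])
cds = parse_cds_intervals(example_gff)
print(mean_cds_length(cds), coding_density(cds, genome_length=3000))
```

The mean here is in nucleotides, so divide by 3 for an approximate length in amino acids. For a real assembly you would read the GFF file from disk and pass in the summed length of all contigs.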