sra-tools
pysradb
Metric | sra-tools | pysradb |
---|---|---|
Mentions | 8 | 1 |
Stars | 1,054 | 296 |
Growth | 1.4% | - |
Activity | 9.2 | 7.3 |
Last commit | 6 days ago | 8 months ago |
Language | C | Python |
License | GNU General Public License v3.0 or later | BSD 3-clause "New" or "Revised" License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
sra-tools
- Bulk RNA seq analysis.
  Note that the conda sra-tools package does not support ARM yet (M chips use a customized ARM architecture), and the official repo does not provide a pre-compiled version for ARM either. You should be able to compile it from source, but YMMV. Similarly, STAR does not officially support ARM, but compiling it yourself is always worth a try.
- fasterq-dump target disk-limit
  According to this, the disk space required should be about 17 times the size of the accession (for both output files and tmp files). As can be seen below, I have >500GB of disk space. Yet, I receive this output:
- Scientists Are Finding Fungi in Cancerous Tumors
  Start with downloading SRA toolkit: https://github.com/ncbi/sra-tools/wiki/02.-Installing-SRA-To...
  Find some data of interest: https://www.ncbi.nlm.nih.gov/sra?term=(%22Homo%20sapiens%22[... (This searches SRA for human genome sequences on Illumina with fastq files available)
  Run fasterq-dump on the SRR (listed as "Runs" in the SRA page of your choice):
- Systematic way to collect GEO datasets
  If you are okay with looking at already processed data, check out https://dee2.io/. Otherwise there is https://github.com/ncbi/sra-tools for getting fastq files (a CLI tool) or https://github.com/saketkc/pysradb (Python).
- How to extract gene sequences from SRA
  Fasterq-dump also doesn't play nice sometimes. By that I mean it causes problems like this: https://github.com/ncbi/sra-tools/issues/383
- How do I enable prefetch? Keep getting "-bash: defaults: command not found"
  In this case, you're missing software. Specifically, you're missing sra-tools. Fortunately, this is provided via Homebrew as sratoolkit. After installing that formula, you should be able to use the prefetch command.
- SRAToolKit for pipeline
  Yeah, unfortunately this required key entry is by design for some reason (see more here: https://github.com/ncbi/sra-tools/issues/291).
- SRA data from NCBI - moved?
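
Two of the posts above lend themselves to a small illustration: the rule of thumb that fasterq-dump needs about 17 times the accession size in disk space, and the step "run fasterq-dump on the SRR". The following is a minimal Python sketch under stated assumptions, not the command from either post. The accession `SRR000001`, the 2 GiB size, and the directory names are placeholders. The factor 17 is taken from the quoted post and is not verified here. The helper names are made up for illustration. The only fasterq-dump options used are `--outdir`, `--temp`, and `--threads`.

```python
import shutil

# Rule of thumb quoted in the post above: output files plus temp files
# need roughly 17x the size of the accession. This is an estimate only.
SPACE_FACTOR = 17


def required_bytes(accession_bytes, factor=SPACE_FACTOR):
    """Estimate the disk space (output + temp) needed for one accession."""
    return accession_bytes * factor


def has_enough_space(accession_bytes, path="."):
    """Compare the estimate with the free space on the filesystem holding `path`."""
    return shutil.disk_usage(path).free >= required_bytes(accession_bytes)


def fasterq_dump_command(run, outdir, tmpdir, threads=4):
    """Build (but do not execute) a fasterq-dump command line as a list."""
    return [
        "fasterq-dump", run,
        "--outdir", outdir,        # where the FASTQ files are written
        "--temp", tmpdir,          # scratch directory for temp files
        "--threads", str(threads),
    ]


if __name__ == "__main__":
    run = "SRR000001"              # placeholder accession
    size = 2 * 1024 ** 3           # placeholder accession size: 2 GiB
    print("estimated bytes needed:", required_bytes(size))
    print("enough free space here:", has_enough_space(size))
    print("command:", " ".join(fasterq_dump_command(run, "fastq", "tmp")))
```

If the space check passes and sra-tools is installed, the returned list can be passed to `subprocess.run(cmd, check=True)`. Building the command as a list avoids shell quoting problems.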
pysradb
- Systematic way to collect GEO datasets
  If you are okay with looking at already processed data, check out https://dee2.io/. Otherwise there is https://github.com/ncbi/sra-tools for getting fastq files (a CLI tool) or https://github.com/saketkc/pysradb (Python).
What are some alternatives?
bwa - Burrows-Wheeler Aligner for short-read alignment (see minimap2 for long-read alignment)
kingfisher-download - Easier download/extract of FASTA/Q read data and metadata from the ENA, NCBI, AWS or GCP.
biostar-central - Biostar Q&A
GermlineMutationCalling - An adaptable Snakemake workflow which uses GATK's best practice recommendations to perform germline mutation calling starting with BAM files
HomeBrew - 🍺 The missing package manager for macOS (or Linux)
edgecase - A framework for extracting telomeric reads from single-molecule sequencing experiments, describing their sequence variation and motifs, and for haplotype inference.