Top 3 kmer Open-Source Projects
-
DNABERT
DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome
-
To get started, the instructions say pre-training is optional, so you can skip to step 3. This is where I got tripped up: "Note that the sequences are in kmer format, so you will need to convert your sequences into that." From what I understand, you need to do this so that all of the sequences are the same length? So kmer=6 means all of the sequences are length 6? Someone suggested that I take the first nucleotide of the promoter and grab the 3 nucleotides before it and the 3 nucleotides after it (±3 bases). I don't think that's how the kmer format works, though. I tried replicating how I think it works below (I got confused on the last row of the 'after' df). Please correct me if I'm wrong!
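For comparison, here is a minimal Python sketch of how we understand DNABERT's "kmer format": each sequence is split into overlapping k-mers (a sliding window of width k, moving one base at a time) and the k-mers are joined with spaces. Under that reading, kmer=6 fixes the length of each token, not the length of the sequence, so a 10-base sequence becomes five 6-mers. The function name `seq2kmer` and the example sequence are hypothetical choices for illustration, not code taken from the DNABERT repository.

```python
def seq2kmer(seq: str, k: int) -> str:
    """Split a DNA sequence into overlapping k-mers joined by spaces.

    Assumption: this mirrors the space-separated, overlapping k-mer
    format DNABERT expects. A sequence of length L yields L - k + 1 k-mers.
    """
    if k <= 0 or k > len(seq):
        raise ValueError("k must be between 1 and len(seq)")
    # Slide a window of width k across the sequence with step 1.
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    return " ".join(kmers)


if __name__ == "__main__":
    promoter = "ATGGCTAGCT"  # hypothetical 10-base example sequence
    print(seq2kmer(promoter, 6))
    # ATGGCT TGGCTA GGCTAG GCTAGC CTAGCT
```

Note that sequences of different lengths simply produce different numbers of k-mers; nothing in this conversion pads or trims sequences to a common length.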
I’m not a bioinformatician, just a software engineer. I made krust because I wanted to learn Rust. I’ve kept working on it, and I’m surprised it has as many as 22 stars on GitHub, mostly from people who seem to work in bioinformatics. So it seems to be somewhat interesting or useful, but I don’t know what would make it more or less useful for a specialist.
kmer related posts
What are some of the best open-source kmer projects? This list will help you:
Rank | Project | Stars
---|---|---
1 | DNABERT | 549
2 | sourmash | 437
3 | krust | 29