Comparing the similarity between a set of protein sequences

Biopython

31 4,167 9.6 Python

Official git repository for Biopython (originally converted from CVS)

USEARCH will do all-against-all comparisons, cluster the sequences, and produce an alignment for each cluster. You can set the clustering threshold (the proportion of identical residues). The alignments are written in FASTA format, which is pretty standard. If all you want is basic similarity, it might be easiest to write something that calculates normalized Hamming distances (typically called p-distances in the molecular evolution literature) between pairs of aligned sequences; the Hamming distance is only defined for sequences of equal length, so work from the alignments rather than the raw sequences. I suspect the Biopython FASTA reader (you can install Biopython from https://biopython.org/) will be good enough for reading them.
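A minimal sketch of the p-distance idea, assuming the sequences are already aligned to the same length. The function names, the choice to skip gapped columns, and the file name in the usage note are illustrative assumptions, not part of any tool mentioned above. Only `Bio.SeqIO.parse` comes from Biopython.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, gap_chars="-."):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Assumption: columns with a gap in either sequence are ignored.
        if a in gap_chars or b in gap_chars:
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return mismatches / compared


def pairwise_p_distances(records):
    """Map each pair of ids to its p-distance; records is {id: aligned sequence}."""
    return {
        (id_a, id_b): p_distance(records[id_a], records[id_b])
        for id_a, id_b in combinations(records, 2)
    }


def read_aligned_fasta(path):
    """Load an aligned FASTA file into {id: sequence string} using Biopython."""
    # Imported here so the functions above also work without Biopython installed.
    from Bio import SeqIO
    return {rec.id: str(rec.seq) for rec in SeqIO.parse(path, "fasta")}
```

With a hypothetical alignment file, usage might look like `pairwise_p_distances(read_aligned_fasta("cluster0.aln.fasta"))`. Similarity is then simply one minus the p-distance.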

diamond

3 954 6.3 C++

Accelerated BLAST compatible local sequence aligner. (by bbuchfink)

DIAMOND (https://github.com/bbuchfink/diamond) might help. It has a protein sequence clustering option, so you could cluster your sequences and then take the centroid of each cluster. Vary the BLAST parameters to increase or decrease the number of clusters.
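As a rough sketch of the "take the centroids" step, the snippet below groups cluster members by centroid. It assumes the clustering output is a two-column, tab-separated file with the centroid id in the first column and a member id in the second; check the DIAMOND documentation for the format your version actually writes. The function name and file name are hypothetical.

```python
import csv
from collections import defaultdict


def read_clusters(lines):
    """Group member ids by centroid id from two-column tab-separated lines."""
    clusters = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        centroid, member = row[0], row[1]
        clusters[centroid].append(member)
    return dict(clusters)
```

Usage might look like `with open("clusters.tsv") as fh: clusters = read_clusters(fh)`, after which `sorted(clusters)` gives the centroid ids and `len(clusters)` shows how the number of clusters changes as you vary the parameters.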

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
