mmh3
murmurhash
| | mmh3 | murmurhash |
|---|---|---|
| Mentions | 2 | 2 |
| Stars | 304 | 42 |
| Growth | - | - |
| Activity | 7.5 | 5.0 |
| Last commit | 4 months ago | 6 months ago |
| Language | C | C++ |
| License | MIT License | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
mmh3
- Does Python have a siphash implementation ready to use?
  I am playing with some dict implementations, and so far I have used either a murmur hash library or some custom bit manipulation.
- Data Ingestion - Build Your Own "Map Reduce"?
  Some notes: We don't need SHA-256, and not even Base64; nothing bad will happen if the keys are not distributed perfectly evenly. We could take MMH3: googling "python murmurhash" gives two interesting results, and since both use the same C++ code, let's take the one with the most stars. Other options would be to simply do (% NUM_SHARDS) or even shift right (however, the shard count must then be a power of 2).
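The two shard-selection options mentioned in the excerpt above (modulo and right shift) could look roughly like the sketch below. It is only an illustration: `NUM_SHARDS = 8`, the helper names, and the use of CRC-32 from the standard library as a stand-in 32-bit hash are all assumptions made here, not something taken from the original post. A MurmurHash3 binding such as mmh3 could presumably be swapped in for `hash32`, as long as it yields an unsigned 32-bit value.

```python
import zlib

# Hypothetical shard count; it must be a power of two for the shift variant.
NUM_SHARDS = 8
# Number of bits to drop so that only the top log2(NUM_SHARDS) bits remain.
SHIFT = 32 - (NUM_SHARDS.bit_length() - 1)


def hash32(key: str) -> int:
    """Stand-in 32-bit hash (CRC-32); not MurmurHash, used only for illustration."""
    return zlib.crc32(key.encode("utf-8")) & 0xFFFFFFFF


def shard_by_modulo(key: str) -> int:
    """Pick a shard with (% NUM_SHARDS); works for any positive shard count."""
    return hash32(key) % NUM_SHARDS


def shard_by_shift(key: str) -> int:
    """Pick a shard from the top bits; requires a power-of-two shard count."""
    return hash32(key) >> SHIFT


if __name__ == "__main__":
    for key in ("user:1", "user:2", "order:99"):
        print(key, shard_by_modulo(key), shard_by_shift(key))
```

The modulo variant uses the low bits of the hash and the shift variant uses the high bits, so the two generally assign the same key to different shards; either is fine as long as one is used consistently.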
murmurhash
- Is anyone using PyPy for real work?
  If you have very large dicts, you might find this hash table I wrote for spaCy helpful: https://github.com/explosion/preshed . You need to key the data with 64-bit keys. We use this wrapper around murmurhash for it: https://github.com/explosion/murmurhash
  There are no docs, so obviously this might not be for you. But the software does work, and is efficient. It's been executed many, many millions of times now.
- Data Ingestion - Build Your Own "Map Reduce"?
  Some notes: We don't need SHA-256, and not even Base64; nothing bad will happen if the keys are not distributed perfectly evenly. We could take MMH3: googling "python murmurhash" gives two interesting results, and since both use the same C++ code, let's take the one with the most stars. Other options would be to simply do (% NUM_SHARDS) or even shift right (however, the shard count must then be a power of 2).
What are some alternatives?
py-spy - Sampling profiler for Python programs
mrjob - Run MapReduce jobs on Hadoop or Amazon Web Services
preshed - 💥 Cython hash tables that assume keys are pre-hashed
python-mysql-replication - Pure Python Implementation of MySQL replication protocol build on top of PyMYSQL
sparc-curation - code and files for SPARC curation workflows
psycopg2cffi - Port to cffi with some speed improvements
MurMurHash - This little tool is to calculate a MurmurHash value of a favicon to hunt phishing websites on the Shodan platform.
pymssql - Official home for the pymssql source code.
legion - The Legion Parallel Programming System
Pyjion - Pyjion - A JIT for Python based upon CoreCLR