murmurhash vs gevent
Metric | murmurhash | gevent
---|---|---
Mentions | 2 | 5
Stars | 42 | 6,161
Growth | - | 0.3%
Activity | 5.0 | 8.7
Latest commit | 6 months ago | 3 months ago
Language | C++ | Python
License | MIT License | GNU General Public License v3.0 or later
Stars: the number of stars that a project has on GitHub. Growth: month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
murmurhash
-
Is anyone using PyPy for real work?
If you have very large dicts, you might find this hash table I wrote for spaCy helpful: https://github.com/explosion/preshed . You need to key the data with 64-bit keys. We use this wrapper around murmurhash for it: https://github.com/explosion/murmurhash
There are no docs, so this obviously might not be for you. But the software does work, and it is efficient. It has been executed many, many millions of times by now.
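The 64-bit-key requirement can be sketched in plain Python. This is not preshed's API: `key64` and `count_tokens` are hypothetical helpers, and `hashlib.blake2b` with an 8-byte digest stands in for the murmurhash wrapper only because it ships with the standard library. The point is that the table stores fixed-width integer keys rather than the strings themselves.

```python
import hashlib
from collections import Counter
from typing import Iterable


def key64(text: str) -> int:
    """Map a string to an unsigned 64-bit integer key.

    Hypothetical helper: blake2b with an 8-byte digest stands in for a
    64-bit MurmurHash here, purely because hashlib is in the stdlib.
    """
    digest = hashlib.blake2b(text.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def count_tokens(tokens: Iterable[str]) -> Counter:
    """Count tokens in a table keyed by their 64-bit hashes, not the strings."""
    counts: Counter = Counter()
    for token in tokens:
        counts[key64(token)] += 1
    return counts


counts = count_tokens("the cat sat on the mat".split())
print(counts[key64("the")])  # 2
```

Because the table never sees the original strings, two strings that collided on the same 64-bit key would silently share an entry; with 64-bit keys that is unlikely for realistic vocabulary sizes.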
-
Data Ingestion - Build Your Own "Map Reduce"?
Some notes: we don't need SHA-256, or even base64; nothing bad happens if the keys are not distributed perfectly evenly. We could use MMH3: googling "python murmurhash" gives two interesting results, and since both use the same C++ code, let's take the one with the most stars. Other options would be to simply take the key modulo the shard count (% NUM_SHARDS), or even shift right (though that requires the shard count to be a power of 2).
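The three sharding options above can be compared in a short sketch. To keep it dependency-free, `murmur3_32` is a pure-Python port of the reference MurmurHash3 x86_32 function; in practice you would call the mmh3 package instead (for example `mmh3.hash(key, signed=False)`). `NUM_SHARDS` and the `shard_by_*` helpers are hypothetical names, not part of any library.

```python
from collections import Counter

NUM_SHARDS = 8  # hypothetical shard count; a power of 2 so the shift option also works
MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit integer left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def _scramble(k: int) -> int:
    """Per-block mixing step of MurmurHash3 x86_32."""
    k = (k * 0xCC9E2D51) & MASK32
    k = _rotl32(k, 15)
    return (k * 0x1B873593) & MASK32


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """Pure-Python MurmurHash3 x86_32 returning an unsigned 32-bit int."""
    h = seed & MASK32
    n_blocks = len(data) // 4
    for i in range(n_blocks):
        k = int.from_bytes(data[4 * i : 4 * i + 4], "little")
        h ^= _scramble(k)
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32
    tail = data[4 * n_blocks :]
    if tail:
        # The remaining 1-3 bytes are packed little-endian and mixed once.
        h ^= _scramble(int.from_bytes(tail, "little"))
    # Finalization ("fmix32") so every input bit affects every output bit.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


def shard_by_murmur(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Option 1: hash the key, then take the hash modulo the shard count."""
    return murmur3_32(key.encode("utf-8")) % num_shards


def shard_by_modulo(key_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Option 2: integer keys that are already well spread can skip hashing."""
    return key_id % num_shards


def shard_by_shift(hash32: int, num_shards: int = NUM_SHARDS) -> int:
    """Option 3: keep the top bits of a 32-bit hash; num_shards must be a power of 2."""
    if num_shards < 1 or num_shards & (num_shards - 1):
        raise ValueError("num_shards must be a power of 2")
    shard_bits = num_shards.bit_length() - 1
    return hash32 >> (32 - shard_bits)


# Usage: inspect how evenly 10,000 string keys spread across the shards.
keys = [f"user-{i}" for i in range(10_000)]
print(Counter(shard_by_murmur(k) for k in keys))
```

The modulo option is only safe when the keys are already evenly spread (for example, sequential IDs); string keys generally need a hash first, which is where MurmurHash earns its place over a cryptographic hash such as SHA-256.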
gevent
-
Is anyone using PyPy for real work?
A sub-question for the folks here: is anyone using the combination of gevent and PyPy for a production application? Or, more generally, other libraries that do deep monkey-patching across the Python standard library?
Things like https://github.com/gevent/gevent/issues/676 and the fix at https://github.com/gevent/gevent/commit/f466ec51ea74755c5bee... indicate to me that there are subtleties in how PyPy's memory management interacts with low-level tweaks like gevent's, which have relied on often-implicit historical assumptions about the timing of memory management.
I'm not sure this is limited to gevent, either: other libraries like Sentry, New Relic, and OpenTelemetry also install low-level monkey-patched hooks, and it's unclear whether they're low-level enough to run into similar issues.
For a stack without any monkey-patching, I'd be overjoyed to use PyPy. But between gevent and these monitoring tools, practically every project needs at least some monkey-patching, and I think there's a lack of clarity on how battle-tested PyPy is with tools like these.
-
SynchronousOnlyOperation from Celery task using gevent execution pool on Django ORM
-
How to Choose the Right Python Concurrency API
I'm not sure how much it replicates the CSP model, but the closest thing I've found to Go-style concurrency in Python is gevent: https://github.com/gevent/gevent
I personally still prefer to use it in all my projects.
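To illustrate the Go comparison, here is a producer/consumer pair that communicates over a queue much as two goroutines would over a channel. It is a sketch that assumes gevent is installed (`gevent.spawn`, `gevent.joinall` and `gevent.queue.Queue` are gevent APIs); if it is missing, the code falls back to threads and the standard-library `queue.Queue`, which offers the same `put`/`get` interface. The `None` sentinel plays the role of closing the channel, and `spawn`, `join_all`, `producer`, `consumer` and `run_pipeline` are hypothetical names.

```python
try:
    import gevent
    from gevent.queue import Queue

    def spawn(fn, *args):
        return gevent.spawn(fn, *args)

    def join_all(tasks):
        gevent.joinall(tasks)

except ImportError:  # fallback so the sketch still runs without gevent
    import threading
    from queue import Queue

    def spawn(fn, *args):
        thread = threading.Thread(target=fn, args=args)
        thread.start()
        return thread

    def join_all(tasks):
        for task in tasks:
            task.join()

_DONE = None  # sentinel that plays the role of closing a Go channel


def producer(channel, n):
    for i in range(n):
        channel.put(i)
    channel.put(_DONE)


def consumer(channel, results):
    while True:
        item = channel.get()  # blocks (or yields to other greenlets) until an item arrives
        if item is _DONE:
            break
        results.append(item * item)


def run_pipeline(n):
    channel = Queue()
    results = []
    join_all([spawn(producer, channel, n), spawn(consumer, channel, results)])
    return results


print(run_pipeline(5))  # [0, 1, 4, 9, 16]
```

With gevent, both functions run as greenlets in a single OS thread, and `channel.get()` cooperatively yields to the producer instead of blocking the process.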
-
I have a problem with installing Ajenti on a 64-bit Ubuntu 21.04 server
Greenlet seems to have some trouble compiling with Python 3.9. https://github.com/gevent/gevent/issues/1627
What are some alternatives?
mmh3 - Python extension for MurmurHash (MurmurHash3), a set of fast and robust hash functions.
eventlet - Concurrent networking library for Python
mrjob - Run MapReduce jobs on Hadoop or Amazon Web Services
Ray - Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
preshed - 💥 Cython hash tables that assume keys are pre-hashed
Faust - Python Stream Processing
python-mysql-replication - Pure Python implementation of the MySQL replication protocol, built on top of PyMySQL
Thespian Actor Library - Python Actor concurrency library
sparc-curation - code and files for SPARC curation workflows
kombu - Messaging library for Python.
psycopg2cffi - Port to cffi with some speed improvements
Tomorrow - Magic decorator syntax for asynchronous code in Python