mmh3
mrjob
Metric | mmh3 | mrjob
---|---|---
Mentions | 2 | 1
Stars | 306 | 2,609
Growth | - | 0.0%
Activity | 7.5 | 0.0
Latest commit | 4 months ago | about 1 year ago
Language | C | Python
License | MIT License | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
mmh3
- Does python have a siphash implementation ready to use?
  I am playing with some dict implementation and so far I have either used murmur hash library or some custom bit manipulation.
- Data Ingestion - Build Your Own "Map Reduce"?
  Some notes: We don't need SHA-256, or even base64; nothing bad will happen if the keys are not distributed perfectly evenly. We could use MMH3: googling "python murmurhash" gives 2 interesting results, and since both use the same C++ code, let's take the one with the most stars. Other options would be to simply take the key modulo the shard count (% NUM_SHARDS), or even to shift right (which requires the shard count to be a power of 2).
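The sharding options mentioned in that post can be sketched roughly as follows. This is a minimal illustration, not the post author's code: `NUM_SHARDS`, `hash32`, `shard_by_modulo`, and `shard_by_shift` are hypothetical names, and the `zlib.crc32` fallback is a stand-in added only so the sketch still runs when `mmh3` is not installed.

```python
import zlib

try:
    import mmh3  # third-party package: pip install mmh3

    def hash32(key: str) -> int:
        # signed=False asks mmh3 for an unsigned 32-bit MurmurHash3 value.
        return mmh3.hash(key, signed=False)

except ImportError:

    def hash32(key: str) -> int:
        # Stand-in only: CRC32 is NOT MurmurHash3, but it is also a
        # deterministic unsigned 32-bit value, which is all the sketch needs.
        return zlib.crc32(key.encode("utf-8"))


NUM_SHARDS = 8  # hypothetical shard count; a power of 2 so both options work


def shard_by_modulo(key: str) -> int:
    # Works for any positive shard count.
    return hash32(key) % NUM_SHARDS


def shard_by_shift(key: str) -> int:
    # Keeps only the top log2(NUM_SHARDS) bits of the 32-bit hash,
    # so NUM_SHARDS must be a power of 2.
    bits = NUM_SHARDS.bit_length() - 1
    return hash32(key) >> (32 - bits)


if __name__ == "__main__":
    for key in ("user:1", "user:2", "order:99"):
        print(key, shard_by_modulo(key), shard_by_shift(key))
```

Both functions map the same key to the same shard on every run and on every machine, which is the property a home-grown "map" stage depends on. Python's built-in `hash()` is not suitable here because string hashing is randomized per process by default.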
mrjob
What are some alternatives?
murmurhash - 💥 Cython bindings for MurmurHash2
Apache Spark - A unified analytics engine for large-scale data processing
py-spy - Sampling profiler for Python programs
luigi - Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
dumbo - Python module that allows one to easily write and run Hadoop programs.
dpark - Python clone of Spark, a MapReduce-like framework in Python
streamparse - Run Python in Apache Storm topologies. Pythonic API, CLI tooling, and a topology DSL.
data-science-ipython-notebooks - Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.