Top 4 Python Hadoop Projects
-
data-science-ipython-notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
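The Hadoop MapReduce material mentioned above usually involves Hadoop Streaming, which lets plain Python scripts act as the mapper and reducer: each stage reads lines on stdin and writes tab-separated key/value pairs on stdout, and Hadoop sorts the mapper output by key before the reducer sees it. The sketch below is a minimal word count under that assumption. The script name `wordcount.py`, its `map`/`reduce` arguments and the `mapper`/`reducer` function names are illustrative, not taken from the notebooks.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'word<TAB>1' for every whitespace-separated word (lowercased)."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts per word; input must already be sorted by key,
    which Hadoop's shuffle-and-sort phase guarantees."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Hypothetical CLI: `python wordcount.py map` or `python wordcount.py reduce`.
# Only runs when a stage argument is given.
if __name__ == "__main__" and len(sys.argv) > 1:
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

Locally, `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce` mimics the job, with `sort` standing in for the shuffle. On a cluster the two stages would be handed to the Hadoop Streaming jar through its `-mapper` and `-reducer` options; the jar's path depends on the installation.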
-
luigi
Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, etc. It also comes with Hadoop support built in.
Maybe if your use case is “smallish” and doesn’t require the whole studio suite, you could check out apscheduler for running Python “tasks” on a schedule and luigi for building pipelines.
-
nagios-plugins
450+ AWS, Hadoop, Cloud, Kafka, Docker, Elasticsearch, RabbitMQ, Redis, HBase, Solr, Cassandra, ZooKeeper, HDFS, Yarn, Hive, Presto, Drill, Impala, Consul, Spark, Jenkins, Travis CI, Git, MySQL, Linux, DNS, Whois, SSL Certs, Yum Security Updates, Kubernetes, Cloudera, etc.
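Nagios plugins in general follow a simple contract: the exit code carries the state (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), and the first line of output is a human-readable status, optionally followed by `|` and performance data in the form `label=value;warn;crit`. The sketch below shows that contract for a hypothetical threshold check; `check_value`, `main`, the `queue_depth` label and the `check_queue.py` name are illustrative and not taken from the nagios-plugins repository.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_value(label: str, value: float, warn: float, crit: float) -> tuple[int, str]:
    """Return (exit_code, status_line) for a metric where higher is worse."""
    if value >= crit:
        state = CRITICAL
    elif value >= warn:
        state = WARNING
    else:
        state = OK
    # Text before '|' is for humans; text after it is performance data.
    line = f"{STATE_NAMES[state]}: {label} is {value:g} | {label}={value:g};{warn:g};{crit:g}"
    return state, line


def main(argv: list[str]) -> int:
    """Parse VALUE WARN CRIT from argv, print the status line, return the exit code."""
    try:
        value, warn, crit = (float(arg) for arg in argv[1:4])
    except ValueError:
        # Missing or non-numeric arguments map to UNKNOWN, not OK.
        print("UNKNOWN: usage: check_queue.py VALUE WARN CRIT")
        return UNKNOWN
    code, line = check_value("queue_depth", value, warn, crit)
    print(line)
    return code
```

A real plugin script would end with `sys.exit(main(sys.argv))` so that Nagios sees the exit code. Unexpected output, such as extra lines or a missing state prefix, is a common reason a plugin's result looks "weird" in the Nagios UI.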
-
Spooq
The link to the project: https://github.com/Breaka84/Spooq/blob/master/spooq/loader/hive_loader.py
Python Hadoop related posts
- pandas 2.0 and the Arrow revolution (part I)
- Why use Python over SQL?
- Apache Spark on Apple m1
- Query a Rest API via SQL?
- Weird Nagios plugin output
- check_yum.py weird output
-
A note from our sponsor:
www.saashub.com | 6 Jun 2023
Index
What are some of the best open-source Hadoop projects in Python? This list will help you:
# | Project | Stars
---|---|---
1 | data-science-ipython-notebooks | 25,155
2 | luigi | 16,560
3 | nagios-plugins | 1,098
4 | Spooq | 8