Top 4 Python Hadoop Projects
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
Project mention: In the context of Python what is a Bob Job? | /r/learnpython | 2022-07-10
Maybe if your use case is “smallish” and doesn’t require the whole studio suite you could check out apscheduler for doing python “tasks” on a schedule and luigi to build pipelines.
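To make "dependency resolution" in a batch pipeline more concrete, here is a minimal sketch using only Python's standard library (`graphlib`, Python 3.9+). It is not Luigi's API and not how Luigi is implemented; the task names and the `run_pipeline` helper are hypothetical and only illustrate the idea of running each job after the jobs it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "clean": {"extract"},
    "aggregate": {"clean"},
    "report": {"aggregate", "clean"},
}


def run_pipeline(deps):
    """Visit every task after its dependencies and return the execution order."""
    order = []
    for task in TopologicalSorter(deps).static_order():
        # A real batch job would do its work here; this sketch only records the name.
        order.append(task)
    return order


execution_order = run_pipeline(dependencies)
print(execution_order)
```

A workflow tool such as Luigi adds much more on top of this ordering step, for example tracking which outputs already exist, scheduling, and visualization, so treat the sketch as an illustration of the concept only.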
450+ AWS, Hadoop, Cloud, Kafka, Docker, Elasticsearch, RabbitMQ, Redis, HBase, Solr, Cassandra, ZooKeeper, HDFS, Yarn, Hive, Presto, Drill, Impala, Consul, Spark, Jenkins, Travis CI, Git, MySQL, Linux, DNS, Whois, SSL Certs, Yum Security Updates, Kubernetes, Cloudera etc...
Project mention: Using Spooq to load a large scale of data | /r/apachespark | 2022-12-22
the link to the project: https://github.com/Breaka84/Spooq/blob/master/spooq/loader/hive_loader.py
Python Hadoop related posts
pandas 2.0 and the Arrow revolution (part I)
1 project | /r/dataengineering | 23 Feb 2023
Why use Python over SQL?
1 project | /r/datascience | 25 Dec 2022
Apache Spark on Apple m1
2 projects | /r/apachespark | 22 Feb 2022
Query a Rest API via SQL?
2 projects | /r/dataengineering | 4 Oct 2021
Weird Nagios plugin output
1 project | /r/sysadmin | 18 Apr 2021
check_yum.py weird output
1 project | /r/nagios | 16 Apr 2021
What are some of the best open-source Hadoop projects in Python? This list will help you: