Automate your data processing pipeline in 9 steps ⚙️

This page summarizes the projects mentioned and recommended in the original post on dev.to.

  • vaderSentiment

    VADER Sentiment Analysis. VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media, and works well on texts from other domains.

  • I was really excited, though also a bit overwhelmed by all the things I had to set up for this project. In total, I spent five days learning the tools, debugging, and building this pipeline with Python (including libraries like Tweepy, TextBlob, VADER, and SQLAlchemy), Postgres, MongoDB, Docker, and Airflow (the most frustrating part...). If you're interested in seeing how I did this, you can check out the project on GitHub and read this blog post.

  • tweets-docker-pipeline

    Docker pipeline for streaming tweets and their sentiment score to a Slack channel

  • SQLAlchemy

    The Database Toolkit for Python

  • PostgreSQL

    Mirror of the official PostgreSQL Git repository. Note that this is just a *mirror* - we don't work with pull requests on GitHub. To contribute, please see https://wiki.postgresql.org/wiki/Submitting_a_Patch

  • Load the cleaned tweets and their sentiment scores into a Postgres database

  • MongoDB

    The MongoDB Database

  • Store the collected tweets in a MongoDB database

  • Docker Compose

    Define and run multi-container applications with Docker

  • A few months ago, I completed a Data Science bootcamp, where one week was all about data engineering, ETL pipelines, and workflow automation. The project for that week was to create a database of tweets that use the hashtag #OnThisDay, along with their sentiment score, and post tweets in a Slack channel to inform members about historical events that happened on that day. This pipeline had to be done with Docker Compose and included six steps:

  • twurl

    OAuth-enabled curl for the Twitter API

  • Next, we are going to collect tweets with the hashtag #OnThisDay. To do this, first you need to create a Twitter Developer account and register an app. Follow the instructions in our reference docs to learn how to set up your Twitter app and get the necessary credentials (Consumer Key and Consumer Secret). Once you have your credentials, copy and paste them in the Credentials field of the Twitter node. Next, set the parameters:

  • Airflow

    Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
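
The mentions above describe scoring tweets for sentiment and loading the cleaned tweets with their scores into a relational database. As a rough illustration only, here is a minimal sketch of that transform-and-load step using just the Python standard library. It is not the original author's code: `TOY_LEXICON`, `clean_tweet`, `toy_sentiment`, and `load_tweets` are hypothetical names, the tiny lexicon with a one-word negation rule is a toy stand-in for VADER, and an in-memory SQLite database stands in for Postgres.

```python
import re
import sqlite3

# Hypothetical mini-lexicon; VADER ships a far larger, empirically rated one.
TOY_LEXICON = {"great": 2.0, "happy": 1.5, "historic": 0.5,
               "sad": -1.5, "terrible": -2.5}
NEGATIONS = {"not", "never", "no"}


def clean_tweet(text):
    """Drop URLs and @mentions, keep hashtag words, collapse whitespace."""
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    text = text.replace("#", "")
    return " ".join(text.split())


def toy_sentiment(text):
    """Sum word valences, flipping a word's sign if the previous word negates it."""
    words = re.findall(r"[a-z']+", text.lower())
    total = 0.0
    for i, word in enumerate(words):
        valence = TOY_LEXICON.get(word, 0.0)
        if i > 0 and words[i - 1] in NEGATIONS:
            valence = -valence
        total += valence
    # Squash the sum into (-1, 1) so long and short tweets are comparable.
    return total / (total * total + 15) ** 0.5


def load_tweets(conn, raw_tweets):
    """Clean, score, and insert tweets; return the number of rows written."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets "
        "(text TEXT NOT NULL, sentiment REAL NOT NULL)"
    )
    rows = [(c, toy_sentiment(c)) for c in map(clean_tweet, raw_tweets)]
    conn.executemany("INSERT INTO tweets (text, sentiment) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


# Demo with made-up tweets; SQLite in memory stands in for Postgres here.
demo_conn = sqlite3.connect(":memory:")
load_tweets(demo_conn, [
    "#OnThisDay a great discovery https://example.com",
    "@user not happy about it",
])
for text, score in demo_conn.execute("SELECT text, sentiment FROM tweets"):
    print(f"{score:+.3f}  {text}")
```

In a pipeline like the one described, the scoring function would presumably be replaced by a real analyzer such as VADER, and the SQLite connection by a Postgres connection (for example through SQLAlchemy), with each stage running in its own container.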
