Python and ETL

This page summarizes the projects mentioned and recommended in the original post on /r/dataengineering

  • Rudderstack

    Privacy and Security focused Segment-alternative, in Golang and React

    Check out Rudderstack, an open source project that collects data from various sources (databases, apps, etc.) and prepares it for business analytics. Let me know if you have any questions.

  • polars

    Dataframes powered by a multithreaded, vectorized query engine, written in Rust

    Shameless plug, but I genuinely believe polars is the best tool for the job if performance, schema validity, and RAM usage are important to you. Depending on your machine, it is 2x to 70x faster than pandas. It uses Arrow memory and thus has proper null handling, and it offers query optimization, a lot of parallelization, an insanely fast CSV parser, and much lower RAM usage than pandas.

  • connector-x

    Fastest library to load data from DB to DataFrames in Rust and Python

    For SQL reading I'd really recommend connector-x. They do a great job of preventing unneeded serialization and don't have to go through Python.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
