Show HN: Data load tool (dlt) – Python library to automate the creation of datasets

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • dlt

    data load tool (dlt) is an open source Python library that makes data loading easy 🛠️

  • Hi HN,

    We're Anna, Adrian, Marcin and Matt, developers of dlt. dlt is an open source library to automatically create datasets out of messy, unstructured data sources. You can use the library to move data from almost anywhere into most well-known SQL and vector stores, data lakes, storage buckets, or local engines like DuckDB. It automates many cumbersome data engineering tasks and can be handled by anyone who knows Python.

    Here's our GitHub: https://github.com/dlt-hub/dlt

    Here's our Colab demo: https://colab.research.google.com/drive/1DhaKW0tiSTHDCVmPjM-...

    — — —

    In the past we wrote hundreds of Python scripts to fit messy data sources into something you can work with in Python - a database, a Pandas frame, or just a Python list. We were solving the same problems and making the same mistakes again and again.

    This is why we built an easy-to-use Python library called dlt that automates most data engineering tasks. It hides the complexities of data loading and automatically generates structured, clean datasets for immediate querying and sharing.

    — — —

    At its core, dlt removes the need to create dataset schemas, react to changing data, generate append or merge statements, and move the data in a transactional and idempotent manner. Those things are automated and can be declared right in the Python code, just by decorating functions.

    Add the @dlt.resource decorator, give it a few hints, and convert any data into a simple pipeline that creates and updates datasets.

    dlt gets the details out of your way:

    1. You do not need to worry about the structure of a database or Parquet files

    dlt will create a nice, typed schema out of your data and will migrate it when the data changes. You can put data contracts and Pydantic models on top to keep your data clean.

    2. You do not need to write any INSERT/UPDATE or data copy statements

    dlt will push the data to DuckDB, Weaviate, storage buckets and many popular SQL stores. It will align the data types, file formats, and identifier names automatically.

    3. You do not need to worry about adding new data or updating existing records.

    dlt lets you declare how to load the data and how to load it incrementally, and it keeps the loading state together with the data so they are always in sync.
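
A sketch of declarative incremental loading, assuming a hypothetical "issues" resource: rows are merged on a primary key, and dlt remembers the highest "updated_at" value it has seen in the pipeline state, so reruns do not duplicate data:

```python
import dlt

# merge on "id"; load only rows at or past the last saved "updated_at"
@dlt.resource(primary_key="id", write_disposition="merge")
def issues(updated_at=dlt.sources.incremental("updated_at", initial_value="2023-01-01")):
    rows = [
        {"id": 1, "title": "first", "updated_at": "2023-05-01"},
        {"id": 2, "title": "second", "updated_at": "2023-06-01"},
    ]
    for row in rows:
        # in a real source you would pass updated_at.start_value to the API
        if row["updated_at"] >= updated_at.start_value:
            yield row

pipeline = dlt.pipeline(
    pipeline_name="incr_demo", destination="duckdb", dataset_name="demo"
)
pipeline.run(issues())  # first run loads both rows
pipeline.run(issues())  # rerun: merge plus saved state means no duplicates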

    4. You keep your existing way of developing and testing code

    Iterate and test quickly on your laptop or in a dev container. Run locally on DuckDB and just swap the destination name to go to the cloud - your code, schema and data will stay the same.

    5. You can work with data on your laptop.

    Combine dlt with other tools and libraries to process data locally. DuckDB, Pandas, Arrow tables and Rust-based loading libraries like ConnectorX work nicely with dlt and process data blazingly fast compared to the cloud.

    6. You do not need to worry if your pipeline will work when you deploy it.

    dlt is a minimalistic Python library; it requires no backend and works wherever Python works. You can fine-tune it to run in constrained environments like AWS Lambda, or run it with Airflow, GitHub Actions or Dagster.

    dlt has an Apache 2.0 license. We plan to make money by offering organizations a paid control plane, where dlt users can track and govern what every pipeline does, manage schemas and contracts across the organization, create data catalogues, and share them with team members and customers.

  • verified-sources

    Contribute to dlt verified sources 🔥

  • - get data from any storage bucket: https://github.com/dlt-hub/verified-sources/tree/master/sour...



Related posts

  • Data load tool (dlt) – open-source Python library that makes data loading easy

    1 project | news.ycombinator.com | 17 Oct 2023
  • [Discussion] How to implement Data Contracts generically? Seeking advice from data contract users.

    1 project | /r/MachineLearning | 6 Sep 2023
  • Ask HN: Freelancer? Seeking freelancer? (December 2023)

    3 projects | news.ycombinator.com | 3 Dec 2023
  • Can we take a moment to appreciate how much of dataengineering is open source?

    8 projects | /r/dataengineering | 23 Nov 2022
  • What is a personality type of a Data Engineer?

    2 projects | /r/dataengineering | 26 Oct 2022