mack vs ibis
Metric | mack | ibis
---|---|---
Mentions | 5 | 23
Stars | 271 | 4,241
Growth | - | 6.5%
Activity | 5.9 | 10.0
Latest commit | 3 months ago | 2 days ago
Language | Python | Python
License | MIT License | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
mack
-
Implementing and using SCD Type 2
Isn't there still a library from Databricks? I have never used it, though: https://github.com/MrPowers/mack
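For readers new to the pattern: SCD Type 2 preserves history by expiring the current version of a changed row and inserting a new current version, rather than overwriting it. The sketch below is plain Python over lists of dicts, not mack's implementation (which works on Delta tables through PySpark). The helper name `scd2_upsert` and the bookkeeping columns `is_current`, `effective_time`, and `end_time` are illustrative assumptions.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def scd2_upsert(table: List[Row], updates: List[Row], key: str,
                attrs: List[str], now: str) -> List[Row]:
    """Apply SCD Type 2 changes and return a new table (hypothetical helper)."""
    result = [dict(r) for r in table]  # shallow copies, so the input is untouched
    # Index the current version of each business key.
    current = {r[key]: r for r in result if r["is_current"]}
    for upd in updates:
        existing = current.get(upd[key])
        if existing is not None and all(existing[a] == upd[a] for a in attrs):
            continue  # tracked attributes unchanged: keep the current version
        if existing is not None:
            # Expire the old version instead of overwriting it.
            existing["is_current"] = False
            existing["end_time"] = now
        new_row = {key: upd[key], **{a: upd[a] for a in attrs},
                   "is_current": True, "effective_time": now, "end_time": None}
        result.append(new_row)
        current[upd[key]] = new_row
    return result
```

Deletes in the source and late-arriving updates are out of scope for this sketch.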
-
Spark/databricks seems amazing?
I was a Databricks user for 5 years and spent almost all my time inside the IntelliJ IDE developing code. I wrote almost all code in a text editor, unit tested all code (I actually authored the popular Scala Spark / PySpark testing libraries: https://github.com/MrPowers/) and had everything set up with CI/CD. I did lots of OSS PySpark/Scala Spark work too. I only used Databricks notebooks for data exploration and for lightweight notebooks that would invoke functions (defined in Python wheel / JAR files). I am on the Delta Lake team at Databricks now and still do all my work in text editors (see this project: https://github.com/MrPowers/mack) and create lots of examples in Jupyter Notebooks. So I definitely think it's possible to limit notebook exposure.
-
PySpark OSS Contribution Opportunity
Great, would love your help. You can also check out the mack project if you'd like to work on a Delta Lake + PySpark project: https://github.com/MrPowers/mack/issues
-
Spark open source community is awesome
A couple of devs just added a `find_composite_key_candidates` function so users can easily identify columns that could be used as a unique identifier in their Delta table.
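The idea behind such a function can be sketched in plain Python: try column combinations, smallest first, until one produces a distinct value tuple for every row. This is an illustrative brute-force approach over a list of dicts, not mack's implementation; the helper name `find_key_candidates` and its `exclude` parameter are hypothetical.

```python
from itertools import combinations
from typing import Any, Dict, Iterable, List


def find_key_candidates(rows: List[Dict[str, Any]],
                        exclude: Iterable[str] = ()) -> List[str]:
    """Return the smallest column combination that uniquely identifies each row.

    Returns an empty list if no combination works (e.g. fully duplicated rows).
    """
    if not rows:
        return []
    skip = set(exclude)
    cols = [c for c in rows[0] if c not in skip]
    # Check single columns first, then pairs, then triples, and so on.
    for size in range(1, len(cols) + 1):
        for combo in combinations(cols, size):
            values = {tuple(r[c] for c in combo) for r in rows}
            if len(values) == len(rows):  # no two rows share these values
                return list(combo)
    return []
```

The search is exponential in the number of columns, so the `exclude` list is a cheap way to prune columns that are obviously not part of a key.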
-
How to append data to Delta tables without adding any duplicates
Fair points. Here's the code repo: https://github.com/MrPowers/mack
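The core rule behind an append without duplicates is simple: only insert incoming rows whose key columns do not already exist in the target. On Delta Lake this is usually expressed as an insert-only MERGE (`whenNotMatchedInsertAll`) on those key columns; the plain-Python sketch below only illustrates the logic, and the helper name `append_new_rows` is hypothetical, not mack's API.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def append_new_rows(existing: List[Row], new_rows: List[Row],
                    key_cols: List[str]) -> List[Row]:
    """Return existing rows plus any new rows whose key is not already present."""
    seen = {tuple(r[c] for c in key_cols) for r in existing}
    appended = list(existing)  # do not mutate the caller's list
    for row in new_rows:
        key = tuple(row[c] for c in key_cols)
        if key not in seen:  # skip rows whose key already exists
            appended.append(row)
            seen.add(key)  # also drops duplicates within the incoming batch
    return appended
```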
ibis
-
Show HN: Hashquery, a Python library for defining reusable analysis
I really don't understand the appeal of dbt vs a proper programming language. The templating approach leads to massive amounts of spaghetti code. I look forward to trying out something like Ibis [0]
0: https://ibis-project.org/
-
This Week In Python
ibis – portable Python dataframe library
- Ibis: The portable Python dataframe library
- FLaNK Stack 26 February 2024
-
Quarto
The main benefit is that you get a Python (or R, Julia or Rust) interpreter, so you can evaluate code. A good example of the value of this is the Ibis docs, which use Quarto: https://ibis-project.org/
-
Polars – A bird's eye view of Polars
I've found Polars quite intuitive, though for Python I lean more towards [ibis](https://ibis-project.org/). The interface is nearly identical, but Ibis has the benefit of building SQL queries before pulling any actual data (like dbplyr), whereas Polars requires the data to be in memory (at least for RDBs, though correct me if I'm wrong).
This seems to me like a good argument for only using Ibis, but I'm happy to be convinced otherwise.
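To make "building SQL queries before pulling any actual data" concrete, here is a toy deferred query builder in plain Python. It is not Ibis's API; the `LazyQuery` class and its methods are hypothetical. Each call returns a new immutable plan, and SQL text is produced only when `to_sql()` is called, so the database would do the filtering and only the result rows would be transferred.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class LazyQuery:
    """Toy deferred query: each method returns a new plan; no data is read."""
    table: str
    columns: Tuple[str, ...] = ("*",)
    predicates: Tuple[str, ...] = ()
    row_limit: Optional[int] = None

    def select(self, *cols: str) -> "LazyQuery":
        return replace(self, columns=cols)

    def filter(self, predicate: str) -> "LazyQuery":
        # Predicates are raw SQL strings: fine for a toy, unsafe for untrusted input.
        return replace(self, predicates=self.predicates + (predicate,))

    def limit(self, n: int) -> "LazyQuery":
        return replace(self, row_limit=n)

    def to_sql(self) -> str:
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.predicates:
            sql += " WHERE " + " AND ".join(f"({p})" for p in self.predicates)
        if self.row_limit is not None:
            sql += f" LIMIT {self.row_limit}"
        return sql


query = LazyQuery("orders").filter("amount > 100").select("id", "amount").limit(10)
print(query.to_sql())  # SELECT id, amount FROM orders WHERE (amount > 100) LIMIT 10
```

Keeping the plan immutable means a base query can be reused and refined without side effects, which is the same property that makes deferred dataframe expressions composable.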
- Ibis – Universal Interface for Data Wrangling
-
Vanna.ai: Chat with your SQL database
Please add Ibis Birdbrain https://ibis-project.github.io/ibis-birdbrain/ to the list. Birdbrain is an AI-powered data bot, built on Ibis and Marvin, supporting more than 18 database backends.
See https://github.com/ibis-project/ibis and https://ibis-project.org for more details.
- Ibis
What are some alternatives?
chispa - PySpark test helper methods with beautiful error messages
snowflake-connector-python - Snowflake Connector for Python
delta-rs - A native Rust library for Delta Lake, with bindings into Python
PySpark-Boilerplate - A boilerplate for writing PySpark Jobs
os-lib - OS-Lib is a simple, flexible, high-performance Scala interface to common OS filesystem and subprocess APIs
Apache Impala - Apache Impala
jodie - Delta lake and filesystem helper methods
pangres - SQL upsert using pandas DataFrames for PostgreSQL, SQlite and MySQL with extra features
sqlite_scanner - DuckDB extension to read and write to SQLite databases
katacoda
nodejs-polars - nodejs front-end of polars
django-clickhouse - This project's goal is to build Yandex ClickHouse database into Django project.