| | grai-core | jupysql |
|---|---|---|
| Mentions | 6 | 8 |
| Stars | 269 | 598 |
| Growth | 2.2% | 7.0% |
| Activity | 9.5 | 9.3 |
| Latest commit | 3 days ago | 16 days ago |
| Language | Python | Python |
| License | GNU General Public License v3.0 or later | Apache License 2.0 |
Stars: the number of stars a project has on GitHub. Growth: month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
grai-core
-
Launch HN: Grai (YC S22) – Open-Source Data Observability Platform
Elastic v2 if one is interested in such things: https://github.com/grai-io/grai-core/blob/v0.1.33/LICENSE
-
Standalone lineage tool
I'm not sure if this is precisely what you're looking for, but Grai might serve your needs. The backend data model allows you to push any arbitrary metadata you want or need onto the lineage graph and retrieve it through either the REST or graph API. I'm one of the authors, so I'm happy to answer any questions you might have.
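To make "push arbitrary metadata onto the lineage graph" concrete, here is a minimal in-memory sketch. The `LineageGraph` and `Node` names are hypothetical illustrations, not Grai's data model or client API, and retrieval through Grai's REST or graph API is not modeled.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    """A data asset (table, column, dashboard...) with free-form metadata."""
    name: str
    metadata: dict[str, Any] = field(default_factory=dict)


class LineageGraph:
    """Hypothetical in-memory lineage graph; NOT Grai's actual data model."""

    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}
        # Edge metadata keyed by (source, destination).
        self.edges: dict[tuple[str, str], dict[str, Any]] = {}

    def add_node(self, name: str, **metadata: Any) -> Node:
        node = self.nodes.setdefault(name, Node(name))
        node.metadata.update(metadata)  # merge arbitrary key/value pairs
        return node

    def add_edge(self, source: str, destination: str, **metadata: Any) -> None:
        # Make sure both endpoints exist before recording the edge.
        self.add_node(source)
        self.add_node(destination)
        self.edges.setdefault((source, destination), {}).update(metadata)

    def find(self, **criteria: Any) -> list[str]:
        """Return names of nodes whose metadata matches every criterion."""
        return sorted(
            name for name, node in self.nodes.items()
            if all(node.metadata.get(k) == v for k, v in criteria.items())
        )


graph = LineageGraph()
graph.add_node("raw.orders", owner="ingest-team", pii=True)
graph.add_node("analytics.revenue", owner="analytics", pii=False)
graph.add_edge("raw.orders", "analytics.revenue", job="dbt")
print(graph.find(pii=True))  # ['raw.orders']
```

Keeping metadata as an open dictionary is what lets callers attach whatever keys they need (owners, PII flags, job names) without changing the schema.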
-
Data Load Diagram
We've been looking at building something like this for Grai specifically to support Airflow but haven't yet prioritized it.
-
Grai, a self-hosted data lineage tool. Test downstream impact of data migration changes
We were frustrated because, although we had tests in our data warehouse, they only notified us after an outage occurred. What we needed was a way to detect changes during CI/CD so we could fix things before they impacted production. So we developed Grai, an open-source data lineage toolkit with pre-built integrations for the most common data stores, designed to work with CI tools like GitHub Actions.
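The CI/CD check described above amounts to this: given a proposed change to one asset, walk the lineage graph to find everything downstream of it and fail the build if anything would be affected. Here is a minimal sketch under that assumption. `LINEAGE`, `downstream`, and `check_change` are hypothetical names, and the hard-coded graph stands in for lineage that a tool like Grai would collect from its integrations. This is not Grai's implementation.

```python
from collections import deque

# Hypothetical column-level lineage: each key feeds the assets in its list.
LINEAGE = {
    "raw.orders.amount": ["staging.orders.amount"],
    "staging.orders.amount": ["analytics.revenue.total", "analytics.refunds.total"],
    "analytics.revenue.total": ["dashboard.finance"],
}


def downstream(lineage: dict[str, list[str]], changed: str) -> list[str]:
    """Breadth-first search for every asset that depends on `changed`."""
    seen: set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


def check_change(changed: str) -> int:
    """Return a CI-style exit code: 1 if the change has downstream consumers."""
    impacted = downstream(LINEAGE, changed)
    for asset in impacted:
        print(f"{changed} -> {asset} would be affected")
    return 1 if impacted else 0


print(downstream(LINEAGE, "raw.orders.amount"))
```

A CI step could pass the return value of `check_change(...)` to `sys.exit` so that a nonzero code fails the job. Breadth-first search visits each asset once, even when several paths lead to it.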
jupysql
-
Show HN: JupySQL – a SQL client for Jupyter (ipython-SQL successor)
Hey, HN community!
We're stoked to launch JupySQL today! JupySQL is an open-source library that brings a modern SQL experience to Jupyter. JupySQL is compatible with all major databases, such as Snowflake, Redshift, PostgreSQL, MySQL, MariaDB, DuckDB, SQL Server, ClickHouse, Trino, and more!
To get started, check out our tutorial: https://jupysql.ploomber.io/en/latest/quick-start.html
SQL is the de facto language for data analysis; however, analysis often requires a mix of SQL and Python. JupySQL bridges this gap, allowing users to execute SQL queries seamlessly in Jupyter and continue their analysis in Python. Add %%sql to the top of your cell and start writing SQL.
Here are some of JupySQL's main features:
- Syntax highlighting
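The SQL-then-Python workflow from the post can be sketched outside Jupyter. This example uses Python's built-in `sqlite3` module as a stand-in. It does not use JupySQL's `%%sql` magic, and the `trips` table is made-up sample data.

```python
import sqlite3

# An in-memory SQLite database stands in for a real warehouse; in a notebook,
# JupySQL would run the SQL in a %%sql cell instead of via sqlite3 calls.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE trips (city TEXT, distance_km REAL);
    INSERT INTO trips VALUES
        ('Lisbon', 2.5), ('Lisbon', 8.5), ('Porto', 5.0), ('Porto', 1.0);
    """
)

# Step 1: SQL does the aggregation.
rows = conn.execute(
    "SELECT city, AVG(distance_km) AS avg_km FROM trips GROUP BY city ORDER BY city"
).fetchall()

# Step 2: the result is ordinary Python data, so the analysis continues in Python.
avg_by_city = dict(rows)
longest = max(avg_by_city, key=avg_by_city.get)
print(avg_by_city)  # {'Lisbon': 5.5, 'Porto': 3.0}
print(longest)      # Lisbon
```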
-
JupySQL: Connecting to a SQL database from Jupyter
Please show your support with a star: https://github.com/ploomber/jupysql
- GitHub - ploomber/jupysql: Better SQL in Jupyter.
- SQL CTEs in Jupyter notebooks, DuckDB integration and more
- TL;DR: JupySQL makes it easier to incorporate SQL functionality within Jupyter, access modern data processing DBs (like DuckDB) and Polars, and explore data through plotting.
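For readers unfamiliar with CTEs (common table expressions): a `WITH` clause names an intermediate query that later parts of the statement can reference. The following minimal sketch uses the standard-library `sqlite3` module and a made-up `sales` table. It illustrates plain SQL CTEs, not any JupySQL-specific helpers for composing queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES ('north', 100), ('north', 250), ('south', 80), ('south', 40);
    """
)

# The CTE `region_totals` names an intermediate result that the final
# SELECT filters, instead of nesting a subquery.
query = """
WITH region_totals AS (
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
)
SELECT region, total
FROM region_totals
WHERE total > 150
"""
result = conn.execute(query).fetchall()
print(result)  # [('north', 350)]
```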
-
Evidence – Business Intelligence as Code
If anyone is looking for something like this in Python/Jupyter, check out JupySQL: https://github.com/ploomber/jupysql
- A full-featured SQL client for Jupyter
-
Pandas v2.0 Released
How are people reconciling data frame APIs like pandas/polars with SQL engines like BigQuery, Snowflake, and DuckDB?
Most of my notebooks are a mix of SQL and Python: SQL for most processing, then I dump the results into a pandas data frame (via https://github.com/ploomber/jupysql) and use Python for operations that are difficult to express in SQL (or that I don't know how to express in SQL), so I end up with 80% SQL, 20% Python.
Unsure if this is the best workflow but it's the most efficient one I've come up with.
Disclaimer: my team develops JupySQL.
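The 80% SQL / 20% Python split might look like this in practice. SQL filters and orders the rows, then Python computes a per-group median, a step that is awkward to express in many SQL dialects. This sketch uses `sqlite3` and plain lists instead of JupySQL and pandas, and the `latency` table is made-up sample data.

```python
import sqlite3
from collections import defaultdict
from statistics import median

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE latency (service TEXT, ms INTEGER);
    INSERT INTO latency VALUES
        ('api', 120), ('api', 80), ('api', 300), ('api', 95),
        ('db', 15), ('db', 40), ('db', 22);
    """
)

# SQL handles the bulk of the processing: filtering and ordering rows.
rows = conn.execute(
    "SELECT service, ms FROM latency WHERE ms < 1000 ORDER BY service"
).fetchall()

# Python handles the step that is awkward in portable SQL: a per-group median.
by_service: dict[str, list[int]] = defaultdict(list)
for service, ms in rows:
    by_service[service].append(ms)
medians = {service: median(values) for service, values in by_service.items()}
print(medians)  # {'api': 107.5, 'db': 22}
```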
What are some alternatives?
dbt-snowflake-monitoring - A dbt package from SELECT to help you monitor Snowflake performance and costs
tpch
awesome-data-catalogs - Awesome Data Catalogs and Observability Platforms.
datapane - Build and share data reports in 100% Python
MindsDB - The platform for customizing AI from enterprise data
nba-monte-carlo - Monte Carlo simulation of the NBA season, leveraging dbt, duckdb and evidence.dev
django-pgschemas - Django multi-tenancy through Postgres schemas
chdb-server-bak - API Server for chDB, an in-process SQL OLAP Engine powered by ClickHouse
sqlparse - A non-validating SQL parser module for Python
pytest-mock-resources - Pytest Fixtures that let you actually test against external resource (Postgres, Mongo, Redshift...) dependent code.
ibis - the portable Python dataframe library
prism - Prism is the easiest way to develop, orchestrate, and execute data pipelines in Python.