pg-bulk-ingest vs asyncpg

| | pg-bulk-ingest | asyncpg |
|---|---|---|
| Mentions | 3 | 16 |
| Stars | 34 | 6,699 |
| Growth | - | 1.3% |
| Activity | 8.7 | 6.4 |
| Last commit | 15 days ago | 8 days ago |
| Language | Python | Python |
| License | MIT License | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
pg-bulk-ingest
- Show HN: pg-bulk-ingest – now with multi-table support
Ah the name - you're not the first to mention it! Do you (or anyone lurking...) have any suggestions as to what it might better be called?
On what it does/why it exists, we've kept the README quite light to avoid duplication, with the main bits of the docs at https://pg-bulk-ingest.docs.trade.gov.uk/
But to try to answer the question here:
On a plain set of INSERT statements: there are lots of cases where that would be fine, and pg-bulk-ingest (/its future name ;-) would be unnecessary, so you might as well use INSERT statements.
But pg-bulk-ingest does lots of things that a set of INSERT statements doesn't:
- It uses COPY, which in many cases is (much?) faster than INSERT
- Show HN: pg-bulk-ingest – Bulk ingest into PostgreSQL with high-watermarking
asyncpg
- PyPy has been working for me for several years now
- Ask HN: Is Python async/await some kind of joke?
- SQLAlchemy/asyncpg => you can’t use it if you’re using PgBouncer (necessary most of the time with Postgres) in transaction mode? What?? https://github.com/MagicStack/asyncpg/issues/1058
- Differences from Psycopg2
OK I stand corrected, asyncpg has these two C files:
https://github.com/MagicStack/asyncpg/blob/master/asyncpg/pr...
https://github.com/MagicStack/asyncpg/blob/master/asyncpg/pr...
If you are interested, here is a post by the psycopg author about psycopg2 and psycopg3 performance versus asyncpg.
https://www.varrazzo.com/blog/2020/05/19/a-trip-into-optimis...
- Asyncpg – A Fast PostgreSQL Database Client Library for Python/Asyncio
- Ruby Outperforms C: Breaking the Catch-22
This pure Python library claims quite fabulous performance: https://github.com/MagicStack/asyncpg
I believe it, because that team has done lots of great stuff, but I haven't used it; I just remember thinking it was interesting that the performance was so good. Not sure how much of that comes from running on the asyncio loop (or which loop they used for the benchmarks).
- PgBouncer is useful, important, and fraught with peril
What a great post. We have had a ton of issues with users using PgBouncer, and it's not because things are "broken" per se; the situation is just very complicated. PgBouncer's docs are also, IMO, in need of updating: more detailed overall, and in a few critical cases less misleading, specifically the prepared statements docs.
This blog post refers to this misleading nature at https://jpcamara.com/2023/04/12/pgbouncer-is-useful.html#pre... .
> PgBouncer says it doesn’t support prepared statements in either PREPARE or protocol-level format. What it actually doesn’t support are named prepared statements in any form.
That's also not really accurate. You can use a named prepared statement just fine in transaction mode: start a transaction (so you aren't in autocommit), use a named statement, and it works fine. You just can't use it again in another transaction, because by then it will be "gone" (more accurately, "unmoored": it might be in your session, or it might be in someone else's session). Making things worse, once the prepared statement is "unmoored", its name can conflict with another client attempting to use the same name.
So to use named prepared statements, you can (less ideally) name them with random strings to avoid conflicts, or you can DEALLOCATE the prepared statement(s) you used at the end of your transaction. For our users that use asyncpg, we have them use a UUID for prepared statement names to avoid these conflicts (asyncpg added this feature for us here: https://github.com/MagicStack/asyncpg/issues/837).

However, they could just as well use DEALLOCATE ALL: set it as their `server_reset_query`, and, so that it also runs in transaction mode, set `server_reset_query_always`, so that it's called at the end of every transaction. PgBouncer, IMO entirely misleadingly, documents this as "This setting is for working around broken setups that run applications that use session features over a transaction-pooled PgBouncer." That's why nobody uses it: PgBouncer claims such setups are "broken". It's not any more broken than switching out the PostgreSQL session underneath a connection that uses multiple transactions. PgBouncer can do better here and make this clearer and more accommodating of real-world database drivers.
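The two workarounds described above can be sketched as follows. The UUID-derived name avoids collisions with "unmoored" statements left behind by other clients, and the PREPARE/EXECUTE/DEALLOCATE sequence is scoped to a single transaction. The statement body and types here are hypothetical; only the naming and DEALLOCATE pattern is the point.

```python
# Sketch: conflict-free named prepared statements behind a
# transaction-pooled PgBouncer.
import uuid


def unique_statement_name() -> str:
    # A UUID-derived name is effectively guaranteed not to collide with
    # a statement left on the server session by another client.
    return "stmt_" + uuid.uuid4().hex


def prepare_and_release_sql(name: str) -> list[str]:
    # All of this must run inside ONE transaction when PgBouncer is in
    # transaction mode, and the DEALLOCATE must happen before COMMIT.
    return [
        "BEGIN",
        f"PREPARE {name} (int) AS SELECT $1 + 1",  # hypothetical statement
        f"EXECUTE {name}(41)",
        f"DEALLOCATE {name}",
        "COMMIT",
    ]
```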
- Library to connect Python to Postgresql
asyncpg is another great driver if you're using asyncio and want maximum performance (although it also breaks with the DBAPI, the tradeoff may be worth it).
- aiopg vs asyncpg vs psycopg3
asyncpg: 5.5k stars, last commit recently, ~150 issues, some incompatibility, few open PRs, extensive README. Includes a benchmark showing it's supposedly 3x faster than aiopg and psycopg2; psycopg3 is not mentioned in the benchmark.
- Announcing Quart-DB
Quart-DB uses asyncpg to manage the connections and buildpg to parse the named parameter bindings.
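What buildpg does here is rewrite `:name`-style parameters into the `$1, $2, ...` positional placeholders that asyncpg expects. A minimal illustration of that rewriting (my own naive sketch, not buildpg's actual implementation; it ignores complications like `::type` casts and quoted strings):

```python
# Naive sketch of named-parameter to positional-placeholder rewriting.
import re


def render(query: str, **params) -> tuple[str, list]:
    args: list = []

    def repl(match: re.Match) -> str:
        # Append the bound value and emit the next $N placeholder.
        args.append(params[match.group(1)])
        return f"${len(args)}"

    sql = re.sub(r":(\w+)", repl, query)
    return sql, args
```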
- Should I use TimescaleDB or partitioning is enough?
A major performance boost, specifically on inserts with TimescaleDB, actually came from starting to use https://github.com/MagicStack/asyncpg.