NHacker Next
  • new
  • past
  • show
  • ask
  • show
  • jobs
  • submit
Parallel Snapshotting: make pg_dump and pg_restore multi-threaded per table (blog.peerdb.io)
cswilliams 10 days ago [-]
Several years ago at a company I no longer work at, when we swapped from single-threaded pgdumps (from a replica) to multi-threaded pgdumps (-j flag), we occasionally saw some dumps which were "inconsistent". Not sure if that's the best term for it, but we would occasionally get duplicate foreign key errors on pg_restore or even if pg_restore succeeded, sometimes primary key sequences would be less than the maximum id of our table (so new inserts would fail). This was on standard postgres 13.3 (not aurora) on AWS RDS. Nothing really noteworthy about our setup either that I can remember. I tried reporting it at the time on both the postgres mailing list and with AWS support and unfortunately both pointed fingers at the other and I eventually gave up since it was not easy to reproduce. We unfortunately ended up having to go back to single threaded dumps which never produced any of these "corrupt" dumps but was obviously much slower.
lordgilman 9 days ago [-]
I can’t speak specifically to duplicate foreign key errors but I possibly fixed that bug via a fix to parallel dump/restore in the 12-13 era. It would manifest as a dump that would fail on restore with errors related to schema dependencies (trying to restore database objects out-of-order).
porsager 10 days ago [-]
I was expecting to see https://github.com/dimitri/pgcopydb when clicking this. How is this different?
saisrirampur 10 days ago [-]
OP here. pgcopydb is also a great tool. It does pg_dump|pg_restore. It cannot multi-thread (parallelize) single table data migration
lmz 10 days ago [-]
seems like it can parallelize to an extent: https://pgcopydb.readthedocs.io/en/latest/concurrency.html#s...
fyrn_ 10 days ago [-]
It only uses pg_dump and pg_restore for transfer of the schema, the actual data transfer is a custom mechanism.
joevandyk 10 days ago [-]
does pgcopydb support copying a single table in multiple threads?
digger495 10 days ago [-]
Because this is advertising :)
victor106 9 days ago [-]
Does this work with Azure|AWS Postgres as source and replicate to an onPrem Postgres and vice-versa?
saisrirampur 9 days ago [-]
Yes it should work for any postgres database as a source or a target. Let it be on the cloud or on-premise
Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact
Rendered at 12:10:45 GMT+0000 (Coordinated Universal Time) with Vercel.