[clickpipes] update FAQ guidance for initial load PG<=13 #5271
base: main
1. **Drop the existing pipe**: This is necessary to apply new settings.
2. **Delete destination tables on ClickHouse**: Ensure that the tables created by the previous pipe are removed.
3. **Create a new pipe with optimized settings**: Typically, increase the snapshot number of rows per partition to between 1 million and 10 million, depending on your specific requirements and the load your Postgres instance can handle.
For Postgres versions 13 or lower, CTID range scans are very slow and therefore ClickPipes does not use them. Instead we read the entire table as a single partition, essentially making it single-threaded (therefore ignoring both number of rows per partition and parallel threads settings). It is critical to adjust these settings to instead move multiple tables in parallel for fast initial loads.
Do we want to mention custom partitioning column for PG13?
These adjustments should significantly enhance the performance of the initial load, especially for older Postgres versions. If you are using Postgres 14 or later, these settings are less impactful due to improved support for CTID range scans.
This now refers to the deleted lines
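For reference while reviewing, the partitioning behavior described in the FAQ text can be sketched roughly as below. This is only an illustrative model of the wording in the diff, not ClickPipes code: the function name `snapshot_partitions` and its parameters are hypothetical, and the real partitioning logic may differ.

```python
import math


def snapshot_partitions(pg_major_version: int, table_rows: int, rows_per_partition: int) -> int:
    """Estimate how many snapshot partitions one table is split into.

    Hypothetical helper that mirrors the FAQ wording only.
    """
    if rows_per_partition <= 0:
        raise ValueError("rows_per_partition must be positive")
    if pg_major_version <= 13:
        # Per the FAQ text: CTID range scans are not used on PG <= 13,
        # so the whole table is read as a single partition and the
        # rows-per-partition setting is ignored.
        return 1
    # On PG >= 14 (assumed here), the table is split by row count.
    return max(1, math.ceil(table_rows / rows_per_partition))


# A 50M-row table with 1M rows per partition:
pg13 = snapshot_partitions(13, 50_000_000, 1_000_000)  # single partition
pg16 = snapshot_partitions(16, 50_000_000, 1_000_000)  # split by row count
```

Under this model, a single large table on Postgres 13 or lower gains nothing from a smaller partition size, which is why the text points at loading multiple tables in parallel instead.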