-
Notifications
You must be signed in to change notification settings - Fork 84
system store - reduce load time (part of performance effort 4.20) #9096
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
WalkthroughThe changes in this pull request introduce a new configuration constant Changes
Sequence Diagram(s)sequenceDiagram
participant SystemStore
participant Database
SystemStore->>Database: Read new data (limit concurrency by SYSTEM_STORE_LOAD_CONCURRENCY)
Database-->>SystemStore: Return data
SystemStore->>SystemStore: Schedule periodic refresh calls
The sequence diagram shows the Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
npm error Exit handler never called! 📜 Recent review detailsConfiguration used: .coderabbit.yaml 📒 Files selected for processing (4)
🚧 Files skipped from review as they are similar to previous changes (4)
⏰ Context from checks skipped due to timeout of 90000ms (3)
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/server/system_services/system_store.js (1)
439-444
: Minor: expensive debug payload serialisation
JSON.stringify(new_data)
andutil.inspect
on large datasets allocate tens of MB and block the event-loop even when log level is low.
Consider wrapping both calls with the samedbg.should_log(1)
guard to avoid the work entirely.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
config.js
(1 hunks)src/server/system_services/system_store.js
(3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: build-noobaa-image
- GitHub Check: run-jest-unit-tests
- GitHub Check: run-package-lock-validation
🔇 Additional comments (3)
src/server/system_services/system_store.js (3)
354-356
: Constructor: keep the config values but fail fast on nonsense input
config.LOAD_FRESH_THRESHOLD
andconfig.READ_NEW_DATA_CONCURRENCY
are already integers.
Still, enforce basic sanity ( > 0 ) to avoid later surprises.-this.LOAD_FRESH_THRESHOLD = config.LOAD_FRESH_THRESHOLD || 100; -this.READ_NEW_DATA_CONCURRENCY = config.READ_NEW_DATA_CONCURRENCY || 5; +this.LOAD_FRESH_THRESHOLD = Math.max(1, config.LOAD_FRESH_THRESHOLD); +this.READ_NEW_DATA_CONCURRENCY = Math.max(1, config.READ_NEW_DATA_CONCURRENCY);
418-419
: Good move: instantiateSystemStoreData
once per loadCreating
new_data
outside the semaphore saves redundant allocations in contended scenarios.
520-520
:✅ Verification successful
Verify
P.map_with_concurrency
exists – otherwise this is a breaking change
P.map_with_concurrency
is not part of the standard bluebird or nativePromise
helpers.
If the localutil/promise
never exported it, this will throw at runtime.If not found, either add the helper or fall back to
P.map
with explicit throttling.
🏁 Script executed:
#!/bin/bash # Check whether the helper is defined anywhere in the repo rg --no-heading -n "function .*map_with_concurrency" | headLength of output: 146
Verified helper implementation:
map_with_concurrency
is defined insrc/util/promise.js
(line 52), so this call is safe and does not introduce a breaking change.
config.js
Outdated
config.LOAD_FRESH_THRESHOLD = parseInt(process.env.LOAD_FRESH_THRESHOLD, 10) || 100; | ||
config.READ_NEW_DATA_CONCURRENCY = parseInt(process.env.READ_NEW_DATA_CONCURRENCY, 10) || 5; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Guard against non-positive or NaN values when parsing the new tunables
parseInt()
returns NaN
for empty/non-numeric input, and users might also pass 0
or negative numbers by mistake.
Relying on the ||
fallback keeps the default safe, but if a caller explicitly sets READ_NEW_DATA_CONCURRENCY=0
or <0
, the code will silently fall back to 5
, hiding the mis-configuration. Make the validation explicit and fail early.
-config.LOAD_FRESH_THRESHOLD = parseInt(process.env.LOAD_FRESH_THRESHOLD, 10) || 100;
-config.READ_NEW_DATA_CONCURRENCY = parseInt(process.env.READ_NEW_DATA_CONCURRENCY, 10) || 5;
+const _freshRaw = parseInt(process.env.LOAD_FRESH_THRESHOLD, 10);
+config.LOAD_FRESH_THRESHOLD = Number.isFinite(_freshRaw) && _freshRaw > 0 ? _freshRaw : 100;
+
+const _concurrencyRaw = parseInt(process.env.READ_NEW_DATA_CONCURRENCY, 10);
+config.READ_NEW_DATA_CONCURRENCY =
+ Number.isFinite(_concurrencyRaw) && _concurrencyRaw > 0 ? _concurrencyRaw : 5;
This makes wrong inputs visible instead of being silently ignored.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
config.LOAD_FRESH_THRESHOLD = parseInt(process.env.LOAD_FRESH_THRESHOLD, 10) || 100; | |
config.READ_NEW_DATA_CONCURRENCY = parseInt(process.env.READ_NEW_DATA_CONCURRENCY, 10) || 5; | |
const _freshRaw = parseInt(process.env.LOAD_FRESH_THRESHOLD, 10); | |
config.LOAD_FRESH_THRESHOLD = Number.isFinite(_freshRaw) && _freshRaw > 0 ? _freshRaw : 100; | |
const _concurrencyRaw = parseInt(process.env.READ_NEW_DATA_CONCURRENCY, 10); | |
config.READ_NEW_DATA_CONCURRENCY = | |
Number.isFinite(_concurrencyRaw) && _concurrencyRaw > 0 ? _concurrencyRaw : 5; |
🤖 Prompt for AI Agents
In config.js around lines 253 to 255, the code uses parseInt with a fallback
that silently ignores invalid, zero, or negative values for LOAD_FRESH_THRESHOLD
and READ_NEW_DATA_CONCURRENCY. To fix this, add explicit validation after
parsing to check if the values are positive integers; if not, throw an error or
exit early to fail fast and make misconfigurations visible instead of defaulting
silently.
38341fc
to
50e4d2c
Compare
50e4d2c
to
5c2e8fb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
src/server/system_services/system_store.js (1)
415-426
: Address the past review concern about fresh-skip logic.The past review comment identified a critical issue where the fresh-skip logic ignores the
since
parameter, potentially dropping updates when a caller requests data with an older timestamp but the store is considered "fresh".The concern raised in the past review about the fresh-skip logic ignoring the
since
hint is still valid and should be addressed. The current implementation could skip necessary reloads whensince < last_update_time
but the data is considered fresh according to the threshold.
🧹 Nitpick comments (1)
src/server/system_services/system_store.js (1)
363-364
: Review the periodic refresh strategy for potential interference.The periodic refresh using
setInterval
ensures data freshness, but consider potential issues:
- The refresh runs every 10 minutes regardless of system activity
- It might interfere with explicit load operations or cause unnecessary database load during low-activity periods
- The arrow function binding might not preserve the correct
this
contextConsider using a more sophisticated refresh strategy:
- setInterval(this.refresh, this.START_REFRESH_THRESHOLD); + setInterval(() => this.refresh(), this.START_REFRESH_THRESHOLD);And potentially add activity-based refresh control to avoid unnecessary refreshes during idle periods.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
config.js
(1 hunks)src/server/bg_services/cluster_hb.js
(1 hunks)src/server/bg_services/md_aggregator.js
(3 hunks)src/server/system_services/system_store.js
(6 hunks)
✅ Files skipped from review due to trivial changes (1)
- config.js
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: run-jest-unit-tests
- GitHub Check: Build Noobaa Image
- GitHub Check: run-package-lock-validation
- GitHub Check: Summary
🔇 Additional comments (9)
src/server/bg_services/cluster_hb.js (1)
96-96
: LGTM - Performance optimization for heartbeat updates.Disabling notifications for heartbeat updates reduces unnecessary cluster propagation overhead. This is appropriate since heartbeat data is typically consumed locally and doesn't require immediate cluster-wide synchronization.
src/server/bg_services/md_aggregator.js (3)
71-71
: LGTM - Performance optimization for storage stats updates.Disabling notifications for incremental storage statistics updates is appropriate since these are internal maintenance operations that don't require immediate cluster-wide propagation.
81-81
: LGTM - Consistent with batch processing approach.Disabling notifications for the global last update timestamp aligns with the incremental processing strategy where notifications are handled separately from the update batching.
209-209
: Verify notification strategy for time skew reset operations.While disabling notifications here aligns with the performance optimization, the time skew reset operation resets all storage statistics to NOOBAA_EPOCH. This is a significant system state change that might benefit from immediate cluster propagation to ensure consistency.
Consider whether this critical recovery operation should retain notifications or if the delayed propagation is acceptable for your use case.
src/server/system_services/system_store.js (5)
354-354
: LGTM - Configurable concurrency control for database load balancing.Adding configurable concurrency control helps prevent database overload during system store loading. The default value of 5 seems reasonable for most deployments.
515-515
: LGTM - Concurrency control for database collection reads.Replacing
P.map
withP.map_with_concurrency
using the configured limit prevents overwhelming the database during incremental data loading. This addresses the performance objective of optimizing concurrent database queries.
607-607
: LGTM - Flexible notification control for performance optimization.Adding the
publish
parameter with a default value oftrue
maintains backward compatibility while allowing callers to disable notifications for performance-sensitive operations. This supports the performance optimization goals.
617-617
: LGTM - Conditional notification publishing implementation.The conditional check properly implements the notification control based on the
publish
parameter, maintaining the existing behavior when notifications are enabled.
433-439
: Enhanced logging provides valuable performance insights.The detailed logging with conditional debug level checks helps monitor system store performance without impacting production performance when debugging is disabled.
5c2e8fb
to
b89f054
Compare
b89f054
to
4b2fe6d
Compare
Signed-off-by: Amit Prinz Setter <alphaprinz@gmail.com>
4b2fe6d
to
36b9892
Compare
} | ||
//call refresh periodically | ||
P.delay_unblocking(this.START_REFRESH_THRESHOLD).then(this.refresh); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can cause a lot of refresh()
calls to be queued. I believe that refresh()
is called on every RPC request as a middleware (see here), so I don't think we should schedule another refresh.
Describe the Problem
Reduce system_store.load() cpu load.
Explain the Changes
Issues: Fixed #xxx / Gap #xxx
Testing Instructions:
Summary by CodeRabbit
Summary by CodeRabbit
Here are the updated release notes: