-
Notifications
You must be signed in to change notification settings - Fork 1.9k
fix: downloads are never larger than 20mb #5247
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The latest updates on your projects. Learn more about Vercel for GitHub.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Summary
This PR implements file size limiting for Google Drive downloads to prevent memory issues from large files. The changes add a size_threshold
parameter that flows through the entire Google Drive download pipeline, implementing chunked downloading with progress monitoring that aborts downloads exceeding the threshold.
Key modifications include:
- Adding
size_threshold
parameter todownload_request()
function signature - Creating a new
_donwload_request()
helper function that usesMediaIoBaseDownload
with chunked downloading (chunk size set to 2x the threshold) - Implementing progress monitoring that stops downloads when
resumable_progress > size_threshold
- Threading the
size_threshold
parameter through_download_and_extract_sections_basic()
and other related functions - Early termination returns empty bytes when threshold is exceeded
The implementation uses Google's chunked download API to monitor progress and prevent full download of oversized files, which improves system stability by avoiding memory exhaustion from large Google Drive files. This change integrates with the existing Google Drive connector architecture while adding size enforcement at the download level.
Confidence score: 2/5
- This PR has critical implementation issues that could cause runtime failures
- Score significantly lowered due to a typo in function name
_donwload_request
(missing 'w') which will cause NameError exceptions - Pay close attention to the new download functions and verify the typo is fixed before merging
1 file reviewed, 3 comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 issues found across 1 file
React with 👍 or 👎 to teach cubic. You can also tag @cubic-dev-ai
to give feedback, ask questions, or re-run the review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as we can address the comments I think this should be good.
Maybe we think a bit more about the magic number portion and then clean this up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Summary
This review covers only the changes made since the last review (commit 381269a), not the entire PR.
The most recent changes address previously identified issues in the Google Drive file download size enforcement implementation. The developer has made several important corrections:
- Fixed function name typo: Corrected
_donwload_request
to_download_request
- Fixed log message typo: Changed 'Skipping2.' to 'Skipping.'
- Improved chunk size calculation: Modified from
size_threshold * 2
tosize_threshold + CHUNK_SIZE_BUFFER
(64 bytes) - Added null safety: Implemented a check for
download_progress
before accessing its properties - Added new constant: Introduced
CHUNK_SIZE_BUFFER = 64
for more precise chunk size control
These changes refine the size threshold enforcement mechanism that prevents memory spikes and timeouts when downloading large Google Drive files. The chunk size adjustment is particularly important as it ensures the system only downloads slightly more than the threshold (20MB + 64 bytes) rather than potentially double the limit, making the size cap more effective. The null safety check prevents potential AttributeError exceptions during the download progress monitoring.
Confidence score: 4/5
- This PR addresses all previously identified issues and is safe to merge with minimal risk
- Score reflects the fixes to typos, logical improvements to chunk sizing, and addition of proper null safety checks
- Pay close attention to the download logic in
_download_request
function to ensure the size threshold enforcement works as expected in production
2 files reviewed, no comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
langgraph==0.2.72 | ||
langgraph-checkpoint==2.0.13 | ||
langgraph-sdk==0.1.44 | ||
lazy_imports==1.0.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
integration tests were failing bc we were missing this import
* fix: downloads are never larger than 20mb * JT comments * import to fix integration tests
* fix: downloads are never larger than 20mb * JT comments * import to fix integration tests
* fix: downloads are never larger than 20mb * JT comments * import to fix integration tests
Description
see title
How Has This Been Tested?
tested in UI
Backporting (check the box to trigger backport action)
Note: You have to check that the action passes, otherwise resolve the conflicts manually and tag the patches.
Summary by cubic
Enforces a 20 MB cap on Google Drive file downloads and exports. Files that exceed the limit are skipped early to avoid memory spikes and timeouts.