Adopt pagination in list_shared_examples to ensure all data is returned #1687
Bug and Reproduction
When using client.clone_public_dataset(), only the first 100 examples of a large dataset were cloned instead of the full dataset. This causes data loss and user confusion.
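To make the failure mode concrete, here is a minimal, self-contained simulation. It is not the reproduction script attached to the PR. It assumes an offset/limit list endpoint that returns at most 100 items per request; `DATASET`, `fetch_page`, and `list_examples_single_request` are hypothetical names. Issuing a single request and treating the response as the whole dataset returns only the first page.

```python
from typing import Dict, List

# Hypothetical dataset of 250 examples held by the "server".
DATASET: List[Dict[str, int]] = [{"id": i} for i in range(250)]

# Assumed default page size, matching the 100 examples observed in the bug.
DEFAULT_LIMIT = 100


def fetch_page(offset: int = 0, limit: int = DEFAULT_LIMIT) -> List[Dict[str, int]]:
    """Simulate one GET to an offset/limit list endpoint."""
    return DATASET[offset : offset + limit]


def list_examples_single_request() -> List[Dict[str, int]]:
    """Buggy pattern: one request, and no follow-up pages are requested."""
    return fetch_page()


if __name__ == "__main__":
    examples = list_examples_single_request()
    print(f"returned {len(examples)} of {len(DATASET)} examples")  # returned 100 of 250 examples
```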
Reproduction steps (a script is also attached at the end of the PR):
Solution
Refactored list_shared_examples to use the existing helper function _get_paginated_list, which correctly handles paginated API responses.
This change resolves the data extraction issue, and the fix was confirmed by cloning a dataset across two LangSmith sites (see the Test section).
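The general offset/limit pattern that such a helper follows can be sketched as below. This is a simplified, hypothetical illustration, not the actual `_get_paginated_list` implementation in `client.py`; `get_paginated_list` and the `fetch_page(offset, limit)` callable are assumed names standing in for the SDK's request plumbing.

```python
from typing import Any, Callable, Dict, Iterator, List

Page = List[Dict[str, Any]]


def get_paginated_list(
    fetch_page: Callable[[int, int], Page], limit: int = 100
) -> Iterator[Dict[str, Any]]:
    """Yield every item by requesting successive offset/limit pages.

    Stops when a page is empty or shorter than `limit`, which is taken
    to mean the last page has been reached.
    """
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        if not page:
            break
        yield from page
        if len(page) < limit:
            break
        offset += len(page)


if __name__ == "__main__":
    data = [{"id": i} for i in range(250)]
    items = list(get_paginated_list(lambda off, lim: data[off : off + lim]))
    print(len(items))  # 250
```

The stop condition assumes the server honors the requested `limit`. If a server capped page size below `limit`, a short page would end the loop early, so it is worth confirming how the real endpoint behaves.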
Test
With this PR's change, I cloned a dataset from a LangSmith personal cloud account to a LangSmith instance running locally via docker-compose, and successfully verified the cloned data. See the reproduction script below.
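One way to check that a clone is complete is to compare source and destination examples by content rather than by ID, since IDs differ between instances. The sketch below is hypothetical and separate from the attached script: it assumes each example is a dict with `inputs` and `outputs` keys, and `missing_after_clone` is an illustrative name.

```python
import json
from collections import Counter
from typing import Any, Dict, Iterable, List


def _fingerprint(example: Dict[str, Any]) -> str:
    """Serialize the parts of an example expected to survive a clone.

    IDs differ between instances, so only inputs and outputs are
    compared (an assumption about what "same data" means here).
    """
    return json.dumps(
        {"inputs": example.get("inputs"), "outputs": example.get("outputs")},
        sort_keys=True,
    )


def missing_after_clone(
    source: Iterable[Dict[str, Any]], cloned: Iterable[Dict[str, Any]]
) -> List[str]:
    """Return fingerprints that appear in the source more often than in the clone."""
    # Counter subtraction keeps only positive counts, so duplicates are handled.
    diff = Counter(map(_fingerprint, source)) - Counter(map(_fingerprint, cloned))
    return sorted(diff.elements())


if __name__ == "__main__":
    src = [{"id": i, "inputs": {"q": i}, "outputs": {"a": i * 2}} for i in range(250)]
    truncated = [dict(e, id=f"new-{e['id']}") for e in src[:100]]
    print(len(missing_after_clone(src, truncated)))  # 150
```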
In addition, I added a unit test in test_client.py to verify that list_shared_examples handles pagination correctly.
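A pytest-style test along these lines could exercise a pagination path with a mocked endpoint. It is a hypothetical sketch, not the test added in `test_client.py`: `list_all` stands in for a paginated lister, and `fetch` is a `unittest.mock.Mock` returning pages of 100, 100, and 50 items. The test checks both that every item is returned and that each request advanced the offset.

```python
from unittest import mock


def list_all(fetch_page, limit=100):
    """Hypothetical paginated lister under test (offset/limit loop)."""
    offset = 0
    while True:
        page = fetch_page(offset=offset, limit=limit)
        yield from page
        if len(page) < limit:
            return
        offset += len(page)


def test_list_all_follows_pagination():
    # Three mocked responses: two full pages and a final short page.
    pages = [
        [{"id": i} for i in range(0, 100)],
        [{"id": i} for i in range(100, 200)],
        [{"id": i} for i in range(200, 250)],
    ]
    fetch = mock.Mock(side_effect=pages)

    items = list(list_all(fetch))

    assert [item["id"] for item in items] == list(range(250))
    assert fetch.call_args_list == [
        mock.call(offset=0, limit=100),
        mock.call(offset=100, limit=100),
        mock.call(offset=200, limit=100),
    ]


if __name__ == "__main__":
    test_list_all_follows_pagination()
    print("ok")
```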
More Considerations
Feel free to let me know if you'd like more integration test coverage. I looked through all methods that use request_with_retries() in client.py, and they all seem safe (each is either a POST or a single-ID lookup with no pagination risk). However, I'm not sure about this one, since I haven't used the splits feature; I will verify it offline with the owner.