Adopt pagination to list_shared_examples to ensure all data returned #1687

EugeneJinXin · 2025-04-23T23:29:50Z

Bug and Reproduction

When using client.clone_public_dataset(), only the first 100 examples of a large dataset are cloned instead of the full dataset. This causes silent data loss and confuses users.

Reproduction steps (a script is also attached at the end of this PR):

1. Find a public dataset with more than 100 examples.
2. Run client.clone_public_dataset(...).
3. Count the examples in the cloned dataset.
4. Result: only 100 examples are copied.

Solution

Refactored list_shared_examples, which clone_public_dataset uses to fetch the examples, to use the existing helper function _get_paginated_list, which correctly handles paginated API responses.

This change resolves the truncated data extraction issue. I confirmed the fix by cloning a dataset across two LangSmith instances (see the Test section).
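The general pattern behind the fix can be sketched as follows. This is a minimal, self-contained illustration, not the actual _get_paginated_list implementation: it assumes an offset/limit API whose server caps each response at 100 items, and `fetch_page` and `get_paginated_list` are hypothetical stand-ins for the real HTTP call and helper.

```python
from typing import Callable, Dict, Iterator, List

PAGE_SIZE = 100  # assumed server-side cap per response


def get_paginated_list(
    fetch_page: Callable[[Dict[str, int]], List[dict]],
) -> Iterator[dict]:
    """Yield every item by requesting successive offset/limit pages (hypothetical helper)."""
    offset = 0
    while True:
        page = fetch_page({"offset": offset, "limit": PAGE_SIZE})
        yield from page
        # A short (or empty) page means the server has nothing more to return.
        if len(page) < PAGE_SIZE:
            break
        offset += len(page)


# Fake endpoint standing in for the shared-examples API: 214 rows, at most 100 per response.
ROWS = [{"id": i} for i in range(214)]


def fake_fetch(params: Dict[str, int]) -> List[dict]:
    start = params["offset"]
    return ROWS[start : start + min(params["limit"], PAGE_SIZE)]


# Mimics the pre-fix behavior described above: one request sees only the first page.
single_request = fake_fetch({"offset": 0, "limit": PAGE_SIZE})
# The paginated helper keeps requesting pages until a short page arrives.
all_rows = list(get_paginated_list(fake_fetch))
print(len(single_request), len(all_rows))  # 100 214
```

Stopping on a short page avoids needing a total count from the server; when the total is an exact multiple of the page size, the loop costs one extra request that returns an empty page.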

Test

With this PR's change, I cloned a dataset from my LangSmith personal cloud account to a LangSmith instance running locally via docker-compose, and verified the cloned data. The script below reproduces this check.

eujin@eujin-mn1 langsmith-project % python3 /Users/eujin/langsmith-project/main.py -v
Passed! All 214 rows cloned

from langsmith import Client


def reproduce_issue():
    """Clone a dataset with 200+ examples and verify that every example was copied."""

    # Target: a LangSmith instance running locally via docker-compose
    ls_client = Client(api_url="http://localhost:1980/api/v1")
    dataset_name = "eujin_test_200_rows"

    dataset_public_url = (
        "https://smith.langchain.com/public/0dfe83c3-079e-4ee3-b6a5-01a6508066ea/d"
    )
    # Name the clone explicitly so the read_dataset call below finds it
    ls_client.clone_public_dataset(dataset_public_url, dataset_name=dataset_name)
    cloned_dataset = ls_client.read_dataset(dataset_name=dataset_name)

    assert cloned_dataset.example_count == 214, (
        f"Expected 214 examples, got {cloned_dataset.example_count}"
    )

    print("Passed! All 214 rows cloned")


if __name__ == "__main__":
    reproduce_issue()

In addition, I added a unit test in test_client.py to ensure that list_shared_examples handles pagination correctly.
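For illustration, a pagination test in that spirit might look like the following. It is not the PR's actual test in test_client.py: it exercises a hypothetical `list_all` helper against a `unittest.mock.Mock` fake endpoint (assuming an offset/limit API with a 100-item cap) and checks both the returned count and the offsets that were requested.

```python
from typing import Iterator
from unittest import mock

PAGE_SIZE = 100  # assumed per-response cap, matching the 100-example truncation


def list_all(fetch_page) -> Iterator[dict]:
    """Hypothetical paginated lister: request pages until a short page arrives."""
    offset = 0
    while True:
        page = fetch_page(offset=offset, limit=PAGE_SIZE)
        yield from page
        if len(page) < PAGE_SIZE:
            return
        offset += PAGE_SIZE


def test_list_all_follows_pages() -> None:
    rows = [{"id": i} for i in range(214)]
    # The mock returns the requested slice, as an offset/limit API would.
    fetch = mock.Mock(side_effect=lambda offset, limit: rows[offset : offset + limit])

    result = list(list_all(fetch))

    assert len(result) == 214
    # Three requests: 100 + 100 + 14 rows.
    assert [c.kwargs["offset"] for c in fetch.call_args_list] == [0, 100, 200]


test_list_all_follows_pages()
```

Asserting on the requested offsets, not just the final count, catches a helper that returns the right number of rows by accident (for example, by re-fetching the same page).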

More Considerations
Let me know if you'd like more integration test coverage. I reviewed every method in client.py that uses request_with_retries(), and they appear safe: each is either a POST or a single-ID lookup with no pagination risk. However, I'm not sure about the one related to splits, since I haven't used the splits feature; I'll verify it offline with its owner.

jacoblee93 · 2025-10-03T00:50:28Z

python/langsmith/client.py

            share_token (Union[UUID, str]): The share token or URL of the shared dataset.
            example_ids (Optional[List[UUID, str]], optional): The IDs of the examples to filter by. Defaults to None.
-
+            limit (Optional[int]): Maximum number of examples to return, by default None.


Suggested change:
- limit (Optional[int]): Maximum number of examples to return, by default None.
+ limit (Optional[int]): Maximum number of examples to return. Defaults to no limit.
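The suggested wording ("Defaults to no limit") fits a common way of layering a `limit` on top of a paginated generator: `itertools.islice` with a stop of `None` yields everything. The sketch below assumes that design and a 100-item page size; `iter_examples` and `list_shared_examples_sketch` are hypothetical and are not the LangSmith client's code.

```python
import itertools
from typing import Iterator, Optional

PAGE_SIZE = 100  # assumed per-response cap


def iter_examples(total: int) -> Iterator[int]:
    """Hypothetical paginated source: yields example IDs one page at a time."""
    for start in range(0, total, PAGE_SIZE):
        # Each loop iteration stands in for one HTTP request.
        yield from range(start, min(start + PAGE_SIZE, total))


def list_shared_examples_sketch(total: int, limit: Optional[int] = None) -> Iterator[int]:
    # islice(it, None) imposes no cap, so limit=None means "return every example".
    # Because the source is lazy, pages beyond the limit are never requested.
    return itertools.islice(iter_examples(total), limit)


print(len(list(list_shared_examples_sketch(214))))            # 214
print(len(list(list_shared_examples_sketch(214, limit=150))))  # 150
```

With `limit=150`, only the first two pages are generated; the third page (IDs 200 to 213) is never produced.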

Adopt pagination to list_shared_examples to ensure all data returned

1de6a75

EugeneJinXin marked this pull request as ready for review April 24, 2025 00:04

EugeneJinXin requested review from baskaryan and jacoblee93 October 3, 2025 00:41

jacoblee93 approved these changes Oct 3, 2025

jacoblee93 reviewed Oct 3, 2025

2 participants

Adopt pagination to list_shared_examples to ensure all data returned #1687

Are you sure you want to change the base?

Adopt pagination to list_shared_examples to ensure all data returned #1687

Uh oh!

Conversation

EugeneJinXin commented Apr 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jacoblee93 Oct 3, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

EugeneJinXin commented Apr 23, 2025 •

edited

Loading