@Abhinavexists Abhinavexists commented May 27, 2025

Fixes #31334

This PR fixes a pydantic.ValidationError that occurs when using retriever.invoke() with ChromaDB-backed retrievers. The error is triggered when ChromaDB returns documents where page_content is None, which violates Pydantic's validation rules during Document instantiation.

To address this, the following internal methods were updated:

  • _results_to_docs_and_scores
  • _results_to_docs_and_vectors
  • get_by_ids

These methods now filter out documents with None as page_content, preventing invalid entries from reaching the validation layer. Additionally, this PR fixes a missing ID assignment in _results_to_docs_and_vectors() to ensure consistent ID handling across all retrieval methods.
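The filtering described above can be sketched roughly as follows. This is an illustration, not the PR's actual diff: `Document` here is a plain dataclass standing in for the real Pydantic model, `results_to_docs_and_scores` is a hypothetical free function rather than the private method, and the nested-list layout of `results` (keys `ids`, `documents`, `metadatas`, `distances`) is an assumption about the shape of a Chroma query result.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class Document:
    """Stand-in for the real Document class (a Pydantic model upstream)."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)
    id: Optional[str] = None

    def __post_init__(self) -> None:
        # Mimic the validation failure: None is not an acceptable page_content.
        if not isinstance(self.page_content, str):
            raise ValueError("page_content must be a string")


def results_to_docs_and_scores(
    results: Dict[str, Any],
) -> List[Tuple[Document, float]]:
    """Hypothetical helper: build (Document, score) pairs, skipping None content."""
    rows = zip(
        results["documents"][0],
        results["metadatas"][0],
        results["ids"][0],
        results["distances"][0],
    )
    return [
        # Pass the ID through so every retrieval path returns documents with IDs.
        (Document(page_content=text, metadata=meta or {}, id=doc_id), score)
        for text, meta, doc_id, score in rows
        if text is not None  # drop entries that would fail validation
    ]


# Assumed result shape: one inner list per query.
sample = {
    "ids": [["a", "b", "c"]],
    "documents": [["first", None, "third"]],
    "metadatas": [[{"k": 1}, None, None]],
    "distances": [[0.1, 0.2, 0.3]],
}
pairs = results_to_docs_and_scores(sample)
print([(doc.id, score) for doc, score in pairs])  # [('a', 0.1), ('c', 0.3)]
```

Filtering before construction means the entry with `None` content never reaches the validator. The trade-off is that a caller may receive fewer results than requested.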

Testing

  • All existing integration tests pass (34 passed, 3 skipped)
  • Added unit tests for retrieval scenarios involving None page_content

@dosubot dosubot bot added the size:M label May 27, 2025
@dosubot dosubot bot added the Ɑ: retriever and bug labels May 27, 2025
@Abhinavexists
Author

@ccurme, could you review this fix?

Collaborator

@eyurtsev eyurtsev left a comment

There are a number of unintentional changes in this PR.

  1. Empty files
  2. changes to the uv lock file
  3. creation of an extra uv.lock file

etc

@eyurtsev eyurtsev self-assigned this Jun 2, 2025
@Abhinavexists
Author

There are a number of unintentional changes in this PR.

  1. Empty files
  2. changes to the uv lock file
  3. creation of an extra uv.lock file

etc

Sure, fixing those changes.


codspeed-hq bot commented Jun 2, 2025

CodSpeed WallTime Performance Report

Merging #31377 will not alter performance

Comparing Abhinavexists:feature/chromadb-validationerr (0bb71e8) with master (c687965)

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

Summary

✅ 13 untouched


codspeed-hq bot commented Jun 2, 2025

CodSpeed Instrumentation Performance Report

Merging #31377 will not alter performance

Comparing Abhinavexists:feature/chromadb-validationerr (0bb71e8) with master (c687965)

Summary

✅ 14 untouched

@Abhinavexists
Author

@eyurtsev
I've reverted the unintentional dependency changes. However, the CI is still failing with ModuleNotFoundError: No module named 'pytest_benchmark' when importing from langchain_tests. The langchain-tests package already includes this dependency. Could you provide guidance on how to properly resolve this CI issue?

@mdrxy mdrxy changed the title from "chroma: fix pydantic validation error when using retriever.invoke" to "fix(chroma): pydantic validation error when using retriever.invoke()" Jul 16, 2025
@Abhinavexists
Author

@eyurtsev are there any pending changes in the PR, or is everything fine?

@mdrxy mdrxy added the integration label Aug 20, 2025
Labels
bug, integration
Development

Successfully merging this pull request may close these issues.

Pydantic validation error when using retriever.invoke()
4 participants