feat: add embedding hiding configuration and align spec with instrumentation #2162
I will pull this out of draft once I unbreak the build.
7c023fa to 5a25a52
5a25a52 to 8cfa839
@mikeldking this should be ready to review, because the code that does embedding is up to date. A couple of packages are red for reasons unrelated to this change and need some work to get green.
Agno: It essentially doesn't function because it's only partially ported to version 2.0. We need to bump the minimum version requirement to 2.0, as 1.5 is no longer compatible. Additionally, the tests aren't correct because they only catch the LLM spans, not the HTTP requests made by the agent, which results in inconsistent runs. If you'd like, I can share my partially completed branch, but I won't be able to dedicate time to iterating on it further.
smolagents: Fixing the mypy errors in smolagents might require less effort overall.
Signed-off-by: Adrian Cole <adrian@tetrate.io>
Here's the branch. I can't complete it, but I got pretty far: https://github.com/Arize-ai/openinference/compare/main...codefromthecrypt:openinference:refactor-agno-tests?expand=1
Signed-off-by: Adrian Cole <adrian@tetrate.io>
Added the smolagents fix. The crewai failure is new, but I can't reproduce it locally. Agno is the biggie, but it's not related to this change either.
"pytest-recording",
"openai",
"ddgs",
"duckduckgo-search",
we need this to be ddgs as it got renamed for agno
cool.
FYI, I plan to break this PR apart into pieces to make it easier to review. Any comments here I'll carry over to the partitioned PRs, and I'll mark this as draft until only the spec/substrate remains.
Part 1: openai #2210. Once it's in, I'll make the same changes for litellm and pull both off this PR.
Impact
This PR enhances data privacy for embedding operations and brings consistency across all embedding instrumentation providers, including BeeAI and Haystack.
Key Features
🔒 Privacy Controls for Embeddings
- OPENINFERENCE_HIDE_EMBEDDINGS_VECTORS: Redacts embedding vectors with "__REDACTED__"
- OPENINFERENCE_HIDE_EMBEDDINGS_TEXT: Redacts embedding text content
🎯 Standardized Embedding Instrumentation
All providers now use:
- "CreateEmbeddings"
- embedding.embeddings.N.embedding.{text|vector}
- embedding.invocation_parameters
- llm.system attribute for provider identification
Provider-specific improvements:
BeeAI:
- embedding.invocation_parameters extraction from input events
- llm.system: "beeai" identification
Haystack:
- llm.system based on component class name
OpenAI/LiteLLM:
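To make the privacy controls and the attribute layout listed above more concrete, here is a minimal sketch of how the two environment variables could shape the flattened `embedding.embeddings.N.embedding.{text|vector}` attributes. It is not the code from this PR: the helper names `embedding_attributes` and `_flag` are hypothetical, the rule that a flag is on only when set to "true" is an assumption, and reusing "__REDACTED__" for hidden text is also an assumption (the description names that placeholder only for vectors).

```python
import os
from typing import Any, Dict, Mapping, Optional, Sequence

REDACTED = "__REDACTED__"  # placeholder named in the PR description for vectors


def _flag(env: Mapping[str, str], name: str) -> bool:
    # Assumption: a flag counts as enabled only when set to "true" (any case).
    return env.get(name, "").strip().lower() == "true"


def embedding_attributes(
    texts: Sequence[str],
    vectors: Sequence[Sequence[float]],
    system: str,
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: flatten embeddings into span attributes."""
    env = os.environ if env is None else env
    hide_vectors = _flag(env, "OPENINFERENCE_HIDE_EMBEDDINGS_VECTORS")
    hide_text = _flag(env, "OPENINFERENCE_HIDE_EMBEDDINGS_TEXT")

    # llm.system identifies the provider, e.g. "beeai".
    attrs: Dict[str, Any] = {"llm.system": system}
    for i, (text, vector) in enumerate(zip(texts, vectors)):
        prefix = f"embedding.embeddings.{i}.embedding"
        # Assumption: hidden text uses the same placeholder as hidden vectors.
        attrs[f"{prefix}.text"] = REDACTED if hide_text else text
        attrs[f"{prefix}.vector"] = REDACTED if hide_vectors else list(vector)
    return attrs


# Usage: hide only the vectors, passing the environment explicitly.
attrs = embedding_attributes(
    ["hello"],
    [[0.1, 0.2]],
    system="beeai",
    env={"OPENINFERENCE_HIDE_EMBEDDINGS_VECTORS": "true"},
)
```

Passing `env` explicitly keeps the sketch easy to test; when it is omitted, the helper reads `os.environ`.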
📋 Specification Alignment