feat: add latency tracking and enhanced token usage details (#665)
## Summary
Added latency tracking and enhanced token usage details for LLM responses, providing more granular metrics for performance monitoring and cost analysis.
## Changes
- Added latency tracking to Vertex provider responses, measuring and reporting API call duration in milliseconds
- Enhanced `BifrostLLMUsage` structure with detailed token usage fields for both prompt and completion tokens
- Added support for specialized token types like cached tokens, reasoning tokens, audio tokens, and prediction tokens
- Implemented conversion methods between different response formats to preserve token usage details
- Updated UI to display detailed token usage information when available
- Added support for the Vertex `provider/model` format in pricing lookup
- Removed commented-out code related to response pooling in Vertex provider
## Type of change
- [x] Feature
- [x] Refactor
## Affected areas
- [x] Core (Go)
- [ ] Transports (HTTP)
- [x] Providers/Integrations
- [x] Plugins
- [x] UI (Next.js)
- [ ] Docs
## How to test
Test the latency tracking and enhanced token usage details with various providers:
```sh
# Core/Transports
go version
go test ./...
# UI
cd ui
pnpm i
pnpm test
pnpm build
```
Verify that latency is reported in milliseconds in the response and that detailed token usage information is displayed in the UI when available.
## Breaking changes
- [ ] Yes
- [x] No
## Related issues
N/A
## Security considerations
No security implications as this PR only enhances metrics reporting.
## Checklist
- [x] I read `docs/contributing/README.md` and followed the guidelines
- [x] I added/updated tests where appropriate
- [x] I updated documentation where needed
- [x] I verified builds succeed (Go and UI)
- [x] I verified the CI pipeline passes locally if applicable