
PhiRho (Collaborator) commented May 13, 2025

Adds instrumentation for basic RED (Rate, Errors, Duration) metrics, as well as custom tracking on the authentication endpoints to keep track of incoming email addresses.
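A minimal sketch of how all three RED signals can be captured with the official Python client `prometheus_client` and a pure ASGI middleware follows. The metric names, the label set and the `REDMetricsMiddleware` class are illustrative assumptions, not the code merged in this PR. It labels by raw path only for brevity; in a real service that label needs to be kept low-cardinality.

```python
import time

from prometheus_client import CollectorRegistry, Counter, Histogram

# A dedicated registry keeps this sketch isolated from the process-wide default one.
REGISTRY = CollectorRegistry()

# Rate and Errors: one counter, with the status code as a label so error ratios can be derived.
HTTP_REQUESTS = Counter(
    "http_requests_total",
    "Total HTTP requests handled",
    ["method", "path", "status"],
    registry=REGISTRY,
)

# Duration: a histogram of request latency in seconds.
HTTP_LATENCY = Histogram(
    "http_request_duration_seconds",
    "HTTP request latency in seconds",
    ["method", "path"],
    registry=REGISTRY,
)


class REDMetricsMiddleware:
    """Pure ASGI middleware recording Rate, Errors and Duration for HTTP requests."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        # Lifespan and websocket traffic is passed through untouched.
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        status_code = 500  # assumed if the app raises before sending a response
        start = time.perf_counter()

        async def send_wrapper(message):
            # Capture the real status code from the response start message.
            nonlocal status_code
            if message["type"] == "http.response.start":
                status_code = message["status"]
            await send(message)

        try:
            await self.app(scope, receive, send_wrapper)
        finally:
            # Recorded even when the app raises, so failures still count as errors.
            method, path = scope["method"], scope["path"]
            HTTP_REQUESTS.labels(method=method, path=path, status=str(status_code)).inc()
            HTTP_LATENCY.labels(method=method, path=path).observe(time.perf_counter() - start)
```

Wrapping an application is then `app = REDMetricsMiddleware(app)`. Recording inside `finally` with a default status of 500 means a handler that raises is still counted, and still timed.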

An example dashboard is included (local Grafana and Prometheus):
[Screenshot: example Grafana dashboard]

PhiRho added 7 commits May 13, 2025 10:58
- Updated docker-compose.yml to include Prometheus and Grafana services.
- Added environment variable ENABLE_METRICS to enable Prometheus metrics in the backend.
- Implemented Prometheus middleware for FastAPI to track HTTP requests and audit events.
- Enhanced audit logging to record metrics for events and durations.
- Created Prometheus configuration file for scraping metrics from the FastAPI application.
- Added Grafana provisioning files for dashboards and data sources to visualize metrics.
- Updated requirements.txt to include Prometheus client libraries.
- Updated docker-compose.yml to include an env_file for environment variables.
- Refactored login and registration endpoints in auth.py to utilize a new metrics tracking function for authentication attempts.
- Introduced a new auth.py module for tracking authentication metrics with Prometheus.
- Simplified error handling and improved user feedback during login and registration processes.
- Updated Grafana dashboard configuration to visualize authentication metrics.
- Updated the login and registration endpoints in auth.py to enhance user experience with detailed documentation and error handling.
- Removed metrics tracking for authentication attempts to streamline the login process.
- Added support for both form data and JSON formats in login and registration requests.
- Improved the sign-in process to automatically register users and create teams if they do not exist.
- Enhanced error messages for invalid login and sign-in data.
- Added a new function to track authentication requests using Prometheus metrics in auth.py.
- Updated login, registration, and email validation endpoints to call the tracking function.
- Refactored Prometheus middleware to handle error responses and record metrics accurately.
- Modified Grafana dashboard configuration to reflect changes in authentication metrics tracking.
- Added a logging mechanism for authentication events in auth.py, including login attempts, registration, and email validation.
- Configured a TimedRotatingFileHandler to manage log files for authentication events.
- Updated Prometheus middleware to track authentication request metrics with success and failure statuses.
- Enhanced Grafana dashboard configuration to reflect changes in authentication metrics and improve visualization.
- Cleaned up unused code related to previous metrics tracking.
- Introduced AuthMiddleware to handle user authentication and store user data in request state.
- Updated existing middleware (AuditLogMiddleware and PrometheusMiddleware) to utilize user data from request state instead of directly querying for user information.
- Enhanced get_current_user_from_auth function to check for user in request state, improving efficiency.
- Cleaned up unused authentication code in middleware for better maintainability.
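Several of the commits above describe counting authentication requests by success/failure status and logging authentication events through a `TimedRotatingFileHandler`. A minimal sketch of that split, assuming a hypothetical `track_auth_request(endpoint, email, success)` helper, a made-up `auth_requests_total` metric and a log file in the system temp directory. The sketch puts the email address in the log line rather than in a metric label, since one time series per address would grow without bound.

```python
import logging
import os
import tempfile
from logging.handlers import TimedRotatingFileHandler

from prometheus_client import CollectorRegistry, Counter

REGISTRY = CollectorRegistry()

# Hypothetical metric: counts auth requests by endpoint and outcome only.
# The email address is deliberately NOT a label, to keep cardinality bounded.
AUTH_REQUESTS = Counter(
    "auth_requests_total",
    "Authentication requests by endpoint and outcome",
    ["endpoint", "status"],
    registry=REGISTRY,
)

# Hypothetical log location; the file rotates at midnight and keeps 7 old files.
LOG_PATH = os.path.join(tempfile.gettempdir(), "auth_events.log")
auth_logger = logging.getLogger("auth_events")
auth_logger.setLevel(logging.INFO)
auth_logger.propagate = False  # keep auth events out of the root logger's output
if not auth_logger.handlers:  # avoid stacking handlers if this module is re-imported
    handler = TimedRotatingFileHandler(LOG_PATH, when="midnight", backupCount=7)
    handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
    auth_logger.addHandler(handler)


def track_auth_request(endpoint: str, email: str, success: bool) -> None:
    """Count the attempt in Prometheus and write the per-user detail to the log."""
    status = "success" if success else "failure"
    AUTH_REQUESTS.labels(endpoint=endpoint, status=status).inc()
    auth_logger.info("endpoint=%s email=%s status=%s", endpoint, email, status)
```

Dashboards can then chart the success/failure ratio per endpoint from the counter, while questions about individual addresses are answered from the rotated log files.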
PhiRho (Collaborator Author) commented May 14, 2025

Latest version of the default dashboard, taking into account cardinality fixes in the code:
[Screenshot: updated default Grafana dashboard]
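The cardinality fixes themselves are not shown in this thread. A frequent source of high cardinality in RED metrics is labelling by the raw URL path, so that every user or team ID creates a new time series. A minimal stdlib sketch of collapsing ID-like path segments before the path is used as a label; `normalize_path` and the `{id}`/`{uuid}` placeholders are assumptions, and using the matched route template from the router would be a more precise alternative.

```python
import re

# Hypothetical normalization rules: collapse path segments that are clearly IDs.
_INT = re.compile(r"\d+")
_UUID = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def normalize_path(path: str) -> str:
    """Map a raw request path onto a low-cardinality label value."""
    segments = []
    for segment in path.split("/"):
        if _INT.fullmatch(segment):
            segments.append("{id}")
        elif _UUID.fullmatch(segment):
            segments.append("{uuid}")
        else:
            segments.append(segment)
    return "/".join(segments)
```

For example, `normalize_path("/teams/42")` returns `"/teams/{id}"`, so requests for every team share one series.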

czue left a comment

Looks good! Keep in mind that I don't know much about Prometheus and didn't review the config files.

class AuthMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        # Skip auth for certain paths
        if request.url.path in ["/health", "/docs", "/openapi.json", "/metrics"]:

Super nitpicky, but this list could maybe benefit from being extracted to const.PUBLIC_ENDPOINTS or similar and then reused here and in the other middleware that checks the same paths; I imagine it'd be easier to manage centrally whenever the list changes.

    authorization=auth_header if auth_header else None,
    db=db
)
# Store essential user data instead of the full SQLAlchemy object

What's the benefit of doing this? Especially given that at least sometimes we need the full SQLAlchemy object later (via get_current_user_from_auth).

PhiRho (Collaborator Author) replied:

Bug fix: I got a DetachedInstanceError in some of my tests because some fields on the user were lazy-loaded later on, after the instance had been detached from its session. We do still have to reload the SQLAlchemy object later, but fewer times than before.
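A minimal sketch of the pattern described here: store a plain, session-independent copy of the user instead of the ORM instance, reading every needed attribute while the instance is still attached to its session. `UserSnapshot`, `snapshot_user`, `get_user_id` and the `id`/`email`/`is_admin` fields are hypothetical. In the middleware the snapshot would be assigned to something like `request.state.user`, and `get_user_id` shows one way for downstream code to read the ID whether it receives a snapshot, an ORM object or a dict.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class UserSnapshot:
    """Immutable copy of the user fields the middleware needs (hypothetical fields)."""

    id: int
    email: str
    is_admin: bool


def snapshot_user(user: Any) -> UserSnapshot:
    # Call this while the ORM object is still attached to its session: every
    # attribute is read now, so nothing is lazy-loaded after detachment.
    return UserSnapshot(id=user.id, email=user.email, is_admin=user.is_admin)


def get_user_id(user: Any) -> Optional[int]:
    """Return the user ID from a snapshot, an ORM-like object or a plain dict."""
    if user is None:
        return None
    if isinstance(user, dict):
        return user.get("id")
    return getattr(user, "id", None)
```

The trade-off raised in the review still applies: code that needs relationships or other ORM behavior has to reload the full object inside a live session.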

PhiRho added 2 commits May 14, 2025 13:00
- Updated the `update_budget_period` function to retrieve spend information from the `info` key in the response, improving data handling.
- Added error logging to capture exceptions during budget period updates, enhancing traceability.
- Refactored user ID retrieval in `AuditLogMiddleware` to handle different user data structures, improving robustness.
- Introduced a new test for updating budget duration as a team admin, ensuring proper functionality and API interaction.
- Added a new PUBLIC_PATHS setting in config.py to centralize the definition of public endpoints.
- Updated AuditLogMiddleware, AuthMiddleware, and PrometheusMiddleware to utilize the PUBLIC_PATHS setting for path checks, improving maintainability and consistency across middleware.
PhiRho merged commit 30e60f5 into dev on May 14, 2025
1 check passed
PhiRho deleted the backend-metrics branch on May 15, 2025 08:06