feat(clp-package): Add Python profiling decorator for internal query performance measurement. #1423
Conversation
Walkthrough

Adds a new profiling utility module using pyinstrument, wires it into selected scheduler and executor task functions via a decorator, and declares pyinstrument as a dependency. No existing function signatures change; instrumentation is additive and supports sync/async functions with optional job/task context extraction.

Changes
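The decorator described in the walkthrough can be sketched as follows. This is a hypothetical stand-in, not the PR's code: it records wall-clock durations in a list instead of running a `pyinstrument.Profiler` and writing reports, but it shows the sync/async dual-wrapper dispatch the walkthrough describes:

```python
import asyncio
import functools
import inspect
import time
from typing import Any, Callable, Optional

# Collected (section_name, duration) records; the real module would instead
# start/stop a pyinstrument.Profiler and save HTML/TXT reports.
RECORDS: list = []

def profile(section_name: Optional[str] = None) -> Callable:
    """Sketch of a sync/async-aware profiling decorator (timing stand-in)."""
    def decorator(func: Callable) -> Callable:
        name = section_name or func.__name__

        if inspect.iscoroutinefunction(func):
            @functools.wraps(func)
            async def async_wrapper(*args: Any, **kwargs: Any) -> Any:
                start = time.perf_counter()
                try:
                    return await func(*args, **kwargs)
                finally:
                    # finally ensures we record even when the target raises
                    RECORDS.append((name, time.perf_counter() - start))
            return async_wrapper

        @functools.wraps(func)
        def sync_wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                RECORDS.append((name, time.perf_counter() - start))
        return sync_wrapper
    return decorator

@profile(section_name="add")
def add(a: int, b: int) -> int:
    return a + b

@profile()
async def async_add(a: int, b: int) -> int:
    return a + b

print(add(1, 2))                      # → 3
print(asyncio.run(async_add(3, 4)))   # → 7
print([name for name, _ in RECORDS])  # → ['add', 'async_add']
```

The `inspect.iscoroutinefunction` branch is what lets one decorator cover both the scheduler's async coroutines and the executors' sync task functions.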
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Caller
    participant Decorator as profile(...)
    participant Target as Wrapped Function
    participant Profiler as pyinstrument.Profiler
    participant Storage as CLP_LOGS_DIR/profiles
    Caller->>Decorator: invoke wrapped function(...)
    alt profiling disabled
        Decorator->>Target: call directly
        Target-->>Decorator: result/exception
    else profiling enabled
        Decorator->>Decorator: extract job_id/task_id (args, dot-paths)
        Decorator->>Profiler: start(interval)
        note over Profiler,Decorator: If parent profiler active: skip with debug
        Decorator->>Target: execute
        Target-->>Decorator: result/exception
        Decorator->>Profiler: stop()
        Decorator->>Storage: save HTML + TXT with section/job/task/timestamp
    end
    Decorator-->>Caller: return result or raise
```
```mermaid
sequenceDiagram
    autonumber
    participant Celery as Celery Worker
    participant Task as @profile + @app.task
    participant Profiler as pyinstrument.Profiler
    participant Storage as profiles/
    Celery->>Task: run search/extract_stream
    rect rgba(200,230,255,0.3)
        note left of Task: Profiling wrapper
        Task->>Profiler: start()
        Task->>Task: execute task logic
        Task->>Profiler: stop()
        Task->>Storage: write reports
    end
    Task-->>Celery: result
```
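The decorator ordering shown in the diagram above (`@app.task` outermost, `@profile` inner, so Celery registers the already-profiled callable) can be sketched with stand-ins. Everything here is hypothetical scaffolding: `FakeCeleryApp` mimics only the registration behaviour of a real Celery app, and the `profile` stand-in just prints instead of running pyinstrument.

```python
import functools

def profile(func):
    """Stand-in profiling wrapper (the real decorator would start/stop pyinstrument)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print(f"profiling {func.__name__}")
        return func(*args, **kwargs)
    return wrapper

class FakeCeleryApp:
    """Minimal stand-in for a Celery app: registers the callable it is given."""
    def __init__(self):
        self.tasks = {}

    def task(self, func):
        self.tasks[func.__name__] = func
        return func

app = FakeCeleryApp()

# @app.task must be outermost so the *profiled* callable is what gets registered;
# functools.wraps preserves the original __name__ for registration.
@app.task
@profile
def search(job_id: str) -> str:
    return f"results for {job_id}"

print(app.tasks["search"]("job-1"))  # prints "profiling search" then "results for job-1"
```

If the decorators were stacked the other way around, Celery would register the raw function and the profiling wrapper would never run.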
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Actionable comments posted: 9
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (5)

- components/clp-py-utils/clp_py_utils/profiling_utils.py (1 hunk)
- components/clp-py-utils/pyproject.toml (1 hunk)
- components/job-orchestration/job_orchestration/executor/query/extract_stream_task.py (2 hunks)
- components/job-orchestration/job_orchestration/executor/query/fs_search_task.py (2 hunks)
- components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
- components/job-orchestration/job_orchestration/executor/query/fs_search_task.py (1)
  - components/clp-py-utils/clp_py_utils/profiling_utils.py (1): `profile` (28-119)
- components/job-orchestration/job_orchestration/executor/query/extract_stream_task.py (1)
  - components/clp-py-utils/clp_py_utils/profiling_utils.py (1): `profile` (28-119)
- components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py (1)
  - components/clp-py-utils/clp_py_utils/profiling_utils.py (1): `profile` (28-119)
🪛 Ruff (0.14.0)
components/clp-py-utils/clp_py_utils/profiling_utils.py
- 17-17: Import `Callable` from `collections.abc` instead (UP035)
- 17-17: `typing.Tuple` is deprecated; use `tuple` instead (UP035)
- 54-54: Missing return type annotation for private function `async_wrapper` (ANN202)
- 54-54: Missing type annotation for `*args` (ANN002)
- 54-54: Missing type annotation for `**kwargs` (ANN003)
- 70-71: Logging statement uses f-string (G004)
- 88-88: Missing return type annotation for private function `sync_wrapper` (ANN202)
- 88-88: Missing type annotation for `*args` (ANN002)
- 88-88: Missing type annotation for `**kwargs` (ANN003)
- 104-105: Logging statement uses f-string (G004)
- 185-185: Do not catch blind exception: `Exception` (BLE001)
- 186-186: Logging statement uses f-string (G004)
- 216-216: Logging statement uses f-string (G004)
- 222-222: `datetime.datetime.now()` called without a `tz` argument (DTZ005)
- 246-246: f-string without any placeholders; remove extraneous `f` prefix (F541)
- 257-259: Logging statement uses f-string (G004)
- 263-263: `logging.exception(...)` should be used instead of `.error(..., exc_info=True)` (G201)
- 263-263: Logging statement uses f-string (G004)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: package-image
- GitHub Check: lint-check (macos-15)
- GitHub Check: rust-checks (macos-15)
- GitHub Check: rust-checks (ubuntu-22.04)
- GitHub Check: rust-checks (ubuntu-24.04)
🔇 Additional comments (1)
components/clp-py-utils/pyproject.toml (1)
19-19: Verify runtime support for pyinstrument and packaging impact
- pyinstrument includes native components; confirm it installs cleanly in your worker/scheduler images and doesn’t bloat slim runtimes unnecessarily.
- If profiling is rarely enabled, consider making it an optional extra to keep minimal installs lean.
```python
logger = logging.getLogger(__name__)

F = TypeVar("F", bound=Callable[..., Any])
```
🧹 Nitpick | 🔵 Trivial
Type‑hints modernisation and stricter types (Ruff UP035, annotations)
- Import `Callable` from `collections.abc`.
- Prefer the built-in generic `tuple[str, str]` over `typing.Tuple`.
Apply this diff:

```diff
-from typing import Any, Callable, Optional, Tuple, TypeVar
+from typing import Any, Optional, TypeVar
+from collections.abc import Callable
@@
-) -> Tuple[str, str]:
+) -> tuple[str, str]:
```
Also applies to: 17-17, 128-128
🤖 Prompt for AI Agents
In components/clp-py-utils/clp_py_utils/profiling_utils.py around lines 17,
20-24 and 128, update type imports and tuple annotations: replace Callable
imported from typing with Callable from collections.abc, remove typing.Tuple
usage and use built-in generic tuple[str, str] where applicable, and adjust any
related imports (drop unused typing names) so the file uses modern annotations
consistent with Ruff UP035; ensure F = TypeVar("F", bound=Callable[..., Any])
still references the new Callable import.
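The modernized annotations suggested above can be checked in isolation; this is a minimal, hypothetical fragment (the function name `extract_ids` is invented for illustration) showing that the `collections.abc` import and PEP 585 built-in generics work together:

```python
from collections.abc import Callable
from typing import Any, TypeVar

# Same bound as the module under review, but with Callable from collections.abc.
F = TypeVar("F", bound=Callable[..., Any])

def extract_ids(job_id: str, task_id: str) -> tuple[str, str]:
    # Built-in generic tuple[str, str] replaces typing.Tuple (PEP 585, Python 3.9+).
    return (job_id, task_id)

print(extract_ids("j1", "t2"))  # → ('j1', 't2')
```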
```python
except RuntimeError as e:
    # Skip profiling this function to avoid conflicts
    if "already a profiler running" in str(e):
        logger.debug(
            f"Skipping nested profiling for {name} "
            f"(parent profiler already active)"
        )
        return await func(*args, **kwargs)
    raise
```
🧹 Nitpick | 🔵 Trivial
Make nested-profiler detection resilient
String match is brittle. Compare case‑insensitively and match a broader substring.
Apply this diff:

```diff
 except RuntimeError as e:
     # Skip profiling this function to avoid conflicts
-    if "already a profiler running" in str(e):
+    if "profiler" in str(e).lower() and "already" in str(e).lower():
         logger.debug(
             f"Skipping nested profiling for {name} "
             f"(parent profiler already active)"
         )
```
Repeat the same change in the sync wrapper block.
Also applies to: 100-108
🧰 Tools
🪛 Ruff (0.14.0): 70-71: Logging statement uses f-string (G004)
🤖 Prompt for AI Agents
In components/clp-py-utils/clp_py_utils/profiling_utils.py around lines 66-74
(and repeat for the sync wrapper around lines 100-108), the nested-profiler
detection uses a brittle case-sensitive substring check; update it to perform a
case-insensitive containment check against a broader phrase (e.g. lowercasing
the exception text and testing for "already .*profiler" or simply "already a
profiler" using lowercased string containment) so it matches variants and
casing, then keep the existing debug log and return behavior when matched; apply
the same change to the sync wrapper block as well.
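The more resilient check suggested above can be factored into a small predicate and exercised against a few message variants. This is a sketch of the review's suggestion, not the PR's actual code; the helper name is invented:

```python
def is_nested_profiler_error(exc: BaseException) -> bool:
    """Case-insensitive containment check, tolerant of message-wording variants."""
    msg = str(exc).lower()
    return "profiler" in msg and "already" in msg

# Matches the exact message the original code checked for...
print(is_nested_profiler_error(RuntimeError("already a profiler running")))     # → True
# ...and casing/wording variants the brittle substring match would miss...
print(is_nested_profiler_error(RuntimeError("A profiler is ALREADY running")))  # → True
# ...while unrelated RuntimeErrors still propagate.
print(is_nested_profiler_error(RuntimeError("division by zero")))               # → False
```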
```python
def extract_value(param_spec: str) -> str:
    """Extract value from parameter, supporting dot notation for attributes."""
    if not param_spec:
        return ""

    # Split on '.' to handle attribute access
    parts = param_spec.split(".")
    param_name = parts[0]

    # Find the parameter value
    value = None
    if param_name in kwargs:
        value = kwargs[param_name]
    elif param_name in param_names:
        idx = param_names.index(param_name)
        if idx < len(args):
            value = args[idx]

    if value is None:
        return ""

    # Navigate through attributes if dot notation was used
    for attr_name in parts[1:]:
        if hasattr(value, attr_name):
            value = getattr(value, attr_name)
        else:
            return ""

    return str(value)
```
🧹 Nitpick | 🔵 Trivial
Optional: support dict-style context extraction
If callers pass dict-like objects, walk keys as well as attributes.
Apply this diff:

```diff
 for attr_name in parts[1:]:
-    if hasattr(value, attr_name):
+    if isinstance(value, dict) and attr_name in value:
+        value = value[attr_name]
+    elif hasattr(value, attr_name):
         value = getattr(value, attr_name)
     else:
         return ""
```
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
def extract_value(param_spec: str) -> str:
    """Extract value from parameter, supporting dot notation for attributes."""
    if not param_spec:
        return ""

    # Split on '.' to handle attribute access
    parts = param_spec.split(".")
    param_name = parts[0]

    # Find the parameter value
    value = None
    if param_name in kwargs:
        value = kwargs[param_name]
    elif param_name in param_names:
        idx = param_names.index(param_name)
        if idx < len(args):
            value = args[idx]

    if value is None:
        return ""

    # Navigate through keys/attributes if dot notation was used
    for attr_name in parts[1:]:
        if isinstance(value, dict) and attr_name in value:
            value = value[attr_name]
        elif hasattr(value, attr_name):
            value = getattr(value, attr_name)
        else:
            return ""

    return str(value)
```
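The dict-aware dot-path walk can be demonstrated with a standalone variant. This sketch inlines `args`, `kwargs`, and `param_names` as parameters (the real closure captures them from the wrapper), and the test objects are invented:

```python
from types import SimpleNamespace

def extract_value(param_spec, args, kwargs, param_names):
    """Dot-path extraction that walks dict keys as well as object attributes."""
    if not param_spec:
        return ""
    parts = param_spec.split(".")
    param_name = parts[0]
    value = None
    if param_name in kwargs:
        value = kwargs[param_name]
    elif param_name in param_names:
        idx = param_names.index(param_name)
        if idx < len(args):
            value = args[idx]
    if value is None:
        return ""
    for attr_name in parts[1:]:
        if isinstance(value, dict) and attr_name in value:
            value = value[attr_name]          # dict-style lookup (review suggestion)
        elif hasattr(value, attr_name):
            value = getattr(value, attr_name)  # attribute-style lookup (original)
        else:
            return ""
    return str(value)

job = SimpleNamespace(id="job-42")
print(extract_value("job.id", (job,), {}, ["job"]))                       # → job-42
print(extract_value("ctx.task_id", (), {"ctx": {"task_id": "t-7"}}, []))  # → t-7
print(repr(extract_value("missing.path", (), {}, [])))                    # → ''
```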
```python
def _is_profiling_enabled() -> bool:
    """
    Check if profiling is enabled.
    TODO: Add `CLPConfig` mechanism to enable/disable profiling for each component.
    :return: False
    """
    return False
```
🛠️ Refactor suggestion | 🟠 Major
Enable profiling via env and make interval tunable
Avoid code edits to toggle profiling. Read from env and allow interval override.
Apply this diff:

```diff
-PROFILING_INTERVAL = 0.001
+PROFILING_INTERVAL = float(os.getenv("CLP_PROFILING_INTERVAL", "0.005"))
@@
 def _is_profiling_enabled() -> bool:
@@
-    :return: False
+    :return: True if CLP_ENABLE_PROFILING is truthy, else False
@@
-    return False
+    return os.getenv("CLP_ENABLE_PROFILING", "").lower() in {"1", "true", "yes", "on"}
```
Also applies to: 25-25
🤖 Prompt for AI Agents
In components/clp-py-utils/clp_py_utils/profiling_utils.py around lines 191-199
(and also apply same change at line 25), replace the hardcoded return False with
logic that reads an environment variable to enable profiling (e.g.
CLP_PROFILING_ENABLED or CLP_PROFILING) and returns True for common truthy
values ("1","true","yes") and False otherwise; also make the profiling interval
tunable by reading an environment variable (e.g. CLP_PROFILING_INTERVAL_SECONDS)
elsewhere where the interval is used (or add a helper _get_profiling_interval()
that reads the env and falls back to the current default), parsing ints safely
and using a sensible default on parse failure.
```python
output_dir = Path(os.getenv("CLP_LOGS_DIR")) / "profiles"
output_dir.mkdir(exist_ok=True, parents=True)
```
Critical: handle missing CLP_LOGS_DIR to avoid crash when enabled
Path(os.getenv("CLP_LOGS_DIR")) raises a TypeError if the env var is unset. Provide a safe fallback and warn.
Apply this diff:

```diff
-output_dir = Path(os.getenv("CLP_LOGS_DIR")) / "profiles"
+logs_dir = os.getenv("CLP_LOGS_DIR")
+if not logs_dir:
+    logger.warning("CLP_LOGS_DIR is not set; writing profiles to CWD ./profiles")
+    output_dir = Path.cwd() / "profiles"
+else:
+    output_dir = Path(logs_dir) / "profiles"
```
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
logs_dir = os.getenv("CLP_LOGS_DIR")
if not logs_dir:
    logger.warning("CLP_LOGS_DIR is not set; writing profiles to CWD ./profiles")
    output_dir = Path.cwd() / "profiles"
else:
    output_dir = Path(logs_dir) / "profiles"
output_dir.mkdir(exist_ok=True, parents=True)
```
🤖 Prompt for AI Agents
components/clp-py-utils/clp_py_utils/profiling_utils.py around lines 233-235:
currently Path(os.getenv("CLP_LOGS_DIR")) will raise a TypeError if the env var
is unset; instead read the env var into a variable, check if it's None or empty,
log a warning (use the module logger or warnings.warn) and choose a safe
fallback directory (e.g. tempfile.mkdtemp(prefix="clp_logs_") or
os.getcwd()/".clp_logs"), then construct a Path from that string and call
output_dir.mkdir(exist_ok=True, parents=True); ensure the code uses the
checked/fallback path variable rather than calling Path on a possibly None
value.
```python
f.write(f"CLP Query Profiling Report (pyinstrument)\n")
f.write(f"Section: {section_name}\n")
if job_id:
    f.write(f"Job ID: {job_id}\n")
if task_id:
    f.write(f"Task ID: {task_id}\n")
f.write(f"Timestamp: {timestamp}\n")
f.write("=" * 80 + "\n\n")
f.write(profiler.output_text(unicode=True, color=False))
```
🧹 Nitpick | 🔵 Trivial

Logging/text nits: remove stray f, use logger.exception, timezone

- Remove the stray f-string with no placeholders.
- Prefer `logger.exception` over `error(..., exc_info=True)`.
- Consider timezone-aware timestamps or `time.time()` for filenames.

Apply this diff:

```diff
-            f.write(f"CLP Query Profiling Report (pyinstrument)\n")
+            f.write("CLP Query Profiling Report (pyinstrument)\n")
@@
             logger.info(
                 f"Profile saved: {section_name} "
                 f"(duration={duration:.6f}s, samples={sample_count}) "
                 f"HTML={html_path}, TXT={txt_path}"
             )
@@
-        except Exception as e:
-            logger.error(f"Failed to save profile for {section_name}: {e}", exc_info=True)
+        except Exception:
+            logger.exception("Failed to save profile for %s", section_name)
```

Optionally:

```diff
-    timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S_%f")
+    timestamp = datetime.datetime.now(datetime.timezone.utc).strftime("%Y%m%d_%H%M%S_%fZ")
```

Also applies to: 256-260, 263-263, 222-222
🧰 Tools
🪛 Ruff (0.14.0): 246-246: f-string without any placeholders; remove extraneous `f` prefix (F541)
```python
    WorkerConfig,
)
from clp_py_utils.clp_logging import set_logging_level
from clp_py_utils.profiling_utils import profile
```
🧹 Nitpick | 🔵 Trivial

Decorator usage looks correct; add explicit context mapping and env check

- Order is right (`@profile` wraps the function that Celery registers).
- Consider passing `job_id_param="job_id", task_id_param="task_id"` for clarity.
- Ensure CLP_LOGS_DIR is set in the worker env; otherwise saving profiles will fail once enabled (see `_save_profile`).

Also applies to: 183-184
🤖 Prompt for AI Agents
In
components/job-orchestration/job_orchestration/executor/query/extract_stream_task.py
around lines 17 and 183-184, update the @profile decorator usage to pass
explicit context mapping by adding job_id_param="job_id" and
task_id_param="task_id" so the profiler knows which function args map to
job/task IDs, and add an environment check (e.g., verify CLP_LOGS_DIR is present
and non-empty in os.environ) before attempting to save profiles in _save_profile
to avoid failures when the worker env is missing CLP_LOGS_DIR.
```python
    WorkerConfig,
)
from clp_py_utils.clp_logging import set_logging_level
from clp_py_utils.profiling_utils import profile
```
🧹 Nitpick | 🔵 Trivial

Good instrumentation; consider explicit job/task mapping and env readiness

- Order of decorators is correct.
- Optionally pass `job_id_param="job_id", task_id_param="task_id"` to be explicit.
- Ensure CLP_LOGS_DIR exists in the search worker environment when profiling is enabled.

Also applies to: 169-170
🤖 Prompt for AI Agents
In
components/job-orchestration/job_orchestration/executor/query/fs_search_task.py
around line 15 (and also at lines 169-170), the profile decorator is used but
not explicit about which args map to job and task and the code assumes
CLP_LOGS_DIR exists; update the profile decorator calls to include
job_id_param="job_id", task_id_param="task_id" for clarity, and add an
environment-readiness check that ensures CLP_LOGS_DIR exists (create the
directory if missing or raise a clear error) before profiling is enabled so
profiling can write logs reliably.
```python
)
from clp_py_utils.core import read_yaml_config_file
from clp_py_utils.decorators import exception_default_value
from clp_py_utils.profiling_utils import profile
```
🧹 Nitpick | 🔵 Trivial

Scheduler profiling: sane targets; add guardrails for async sections and interval

- Targets are sensible; `job_id_param="job.id"` is correct.
- For long-lived async functions (`acquire_reducer_for_job`, `handle_finished_search_job`), consider:
  - Using a coarser interval (e.g., 5-10 ms) to reduce overhead/trace size.
  - Making interval and enablement configurable via env to avoid code changes when toggling.

Also applies to: 547-555, 569-611, 898-902
🤖 Prompt for AI Agents
In
components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py
around lines 49 (and also apply similar changes at 547-555, 569-611, 898-902),
the profiler is applied with defaults that are too fine-grained for long‑lived
async sections; update profiling to use a coarser sampling interval (e.g.,
5–10ms) and add guardrails: make both profiling enablement and interval
configurable via environment variables (e.g., JOB_SCHEDULER_PROFILING_ENABLED,
JOB_SCHEDULER_PROFILE_INTERVAL_MS) and wrap or conditionally apply the profiler
only when enabled so async functions like acquire_reducer_for_job and
handle_finished_search_job use the higher interval (or are skipped) to reduce
overhead and trace size.
(not a formal review)
Note: we are migrating the build system from Poetry to uv in #1405, so feel free to hold off on the dependency changes.
```toml
mariadb = "~1.0.11"
mysql-connector-python = "^8.2.0"
pydantic = "^2.11.9"
pyinstrument = "^5.0.0"
```
Do we plan to include the profiling feature in the formal releases? If not, maybe we should put this into a

```toml
[dependency-groups]
dev = [
    ...
]
```

That said, the `pyinstrument` package is only ~270 KB (https://pypi.org/project/pyinstrument/#files), so it shouldn't hurt to be bundled into the release.
Not a full review either; just caught some errors while taking a glance.

It would be better if we could add some instructions in the PR to indicate:

- How to turn on the profiler (it seems the flag must be tuned at the code level and the package rebuilt).
- A step-by-step setup for testing, and the expected outputs.
```python
def profile(
    section_name: Optional[str] = None,
```
```diff
-    section_name: Optional[str] = None,
+    section_name: str | None = None,
```

Do it for all occurrences.
```python
    task_id_param: Optional[str] = None,
) -> Callable[[F], F]:
    """
    Decorator for profiling function execution with automatic context extraction.
```
Change `Decorator` to `Decorators`. Do it for all docstrings.
```python
    :param section_name: Override for profile section name. If None, uses function name.
    :param job_id_param: Parameter name to extract job_id from (default: "job_id").
                         Can use dot notation for attributes, e.g., "job.id"
```
We don't indent to align with `:`; we should only indent by 4 spaces. @junhaoliao can u confirm just in case I'm outdated?
||
return async_wrapper # type: ignore | ||
|
||
else: |
Don't add `else` after the previous `if` block unconditionally returns.
```python
    :param job_id_param: Name/path of the parameter containing job_id (default: "job_id").
    :param task_id_param: Name/path of the parameter containing task_id (default: "task_id").
```
Why not make the parameters default to these values instead of marking them optional?
```python
    """
    Extract job_id and task_id from function arguments.
    :param func: The function being profiled
```
```diff
-    :param func: The function being profiled
+    :param func: The function being profiled.
```

Don't miss the trailing `.`. Do it for all docstrings.
```python
    Check if profiling is enabled.
    TODO: Add `CLPConfig` mechanism to enable/disable profiling for each component.
    :return: False
```
```diff
-    :return: False
+    :return: Whether the profiler is enabled.
```
Description

This pull request introduces a new lightweight profiling utility for CLP query execution based on PyInstrument and applies it to certain query execution functions. The profiling decorator generates HTML flame graphs and text summaries for performance analysis, with context extraction for job and task IDs.

Currently, profiling can only be enabled/disabled by modifying the return value of `is_profiling_enabled` in `profiling_utils.py`. Per-component config options should be added in a future PR.

- Added a `profiling_utils.py` module in `clp_py_utils` providing a `profile` decorator for both sync and async functions, automatic context extraction, and output of HTML and text profiling reports to a designated directory.
- Added `pyinstrument` as a dependency in `pyproject.toml` to support profiling features.
- Applied the `@profile` decorator to some `query_scheduler.py` functions.
- Applied the `@profile` decorator to the `search` and `extract_stream` task functions.

Checklist

- … breaking change.
Validation performed
Performed queries using the CLP package with profiling enabled and disabled.
Summary by CodeRabbit

- New Features
- Chores