
Conversation

Contributor

@Eden-D-Zhang Eden-D-Zhang commented Oct 15, 2025

Description

This pull request introduces a lightweight profiling utility for CLP query execution, built on pyinstrument, and applies it to selected query execution functions. The profiling decorator generates HTML flame graphs and text summaries for performance analysis, with automatic context extraction for job and task IDs.

Currently, profiling can only be enabled/disabled by modifying the return value of is_profiling_enabled in profiling_utils.py. Per-component config options should be added in a future PR.

  • Added new profiling_utils.py module in clp_py_utils providing a profile decorator for both sync and async functions, automatic context extraction, and output of HTML and text profiling reports to a designated directory.
  • Added pyinstrument as a dependency in pyproject.toml to support profiling features.
  • Added @profile decorator to some query_scheduler.py functions.
  • Added @profile decorator to search and extract_stream task functions.
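
For context, the sync/async dual-wrapper pattern described above can be sketched with a self-contained toy stand-in. This is NOT the actual `profiling_utils` code: the names are illustrative and `time.perf_counter` stands in for pyinstrument.

```python
import asyncio
import functools
import inspect
import time
from typing import Any, Callable, Optional, TypeVar

F = TypeVar("F", bound=Callable[..., Any])


def profile(section_name: Optional[str] = None) -> Callable[[F], F]:
    """Toy decorator: times sync and async functions (stand-in for pyinstrument)."""

    def decorator(func: F) -> F:
        name = section_name or func.__name__

        if inspect.iscoroutinefunction(func):

            @functools.wraps(func)
            async def async_wrapper(*args: Any, **kwargs: Any) -> Any:
                start = time.perf_counter()
                try:
                    return await func(*args, **kwargs)
                finally:
                    # Real code would stop the profiler and save HTML/TXT reports here.
                    print(f"[profile] {name}: {time.perf_counter() - start:.6f}s")

            return async_wrapper  # type: ignore[return-value]

        @functools.wraps(func)
        def sync_wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                print(f"[profile] {name}: {time.perf_counter() - start:.6f}s")

        return sync_wrapper  # type: ignore[return-value]

    return decorator


@profile(section_name="search")
def search(job_id: str) -> str:
    return f"results for {job_id}"


@profile()
async def extract_stream(task_id: int) -> int:
    await asyncio.sleep(0)
    return task_id * 2


print(search("job-1"))
print(asyncio.run(extract_stream(21)))
```

`functools.wraps` keeps the wrapped function's name and docstring intact, which matters when the decorator sits under Celery's `@app.task` registration.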

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and it has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Performed queries using the CLP package with profiling enabled and disabled.

Summary by CodeRabbit

  • New Features

    • Optional runtime profiling can be enabled to produce interactive HTML and plain-text summaries with timestamps and contextual job/task IDs. Works for synchronous and asynchronous operations and avoids conflicts when nested.
  • Chores

    • Added profiling hooks across scheduler and task workflows; no behavioural or interface changes to existing functionality.

@Eden-D-Zhang Eden-D-Zhang requested a review from a team as a code owner October 15, 2025 19:22
Contributor

coderabbitai bot commented Oct 15, 2025

Walkthrough

Adds a new profiling utility module using pyinstrument, wires it into selected scheduler and executor task functions via a decorator, and declares pyinstrument as a dependency. No existing function signatures change; instrumentation is additive and supports sync/async functions with optional job/task context extraction.

Changes

Cohort / File(s) Summary
Profiling utilities module
components/clp-py-utils/clp_py_utils/profiling_utils.py
New decorator-based profiling utility supporting sync/async functions, context extraction (job_id/task_id), nested-profiler guard, and saving HTML/TXT outputs to CLP_LOGS_DIR/profiles.
Dependency update
components/clp-py-utils/pyproject.toml
Added dependency: pyinstrument = "^5.1.1".
Executor task instrumentation
components/job-orchestration/job_orchestration/executor/query/extract_stream_task.py, components/job-orchestration/job_orchestration/executor/query/fs_search_task.py
Imported profile and decorated Celery tasks (extract_stream, search) with section names for profiling.
Scheduler instrumentation
components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py
Imported profile and decorated dispatch_query_job, acquire_reducer_for_job (async), and handle_finished_search_job (async) with section names and job_id_param="job.id".

Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  participant Caller
  participant Decorator as profile(...)
  participant Target as Wrapped Function
  participant Profiler as pyinstrument.Profiler
  participant Storage as CLP_LOGS_DIR/profiles

  Caller->>Decorator: invoke wrapped function(...)
  alt profiling disabled
    Decorator->>Target: call directly
    Target-->>Decorator: result/exception
  else profiling enabled
    Decorator->>Decorator: extract job_id/task_id (args, dot-paths)
    Decorator->>Profiler: start(interval)
    note over Profiler,Decorator: If parent profiler active: skip with debug
    Decorator->>Target: execute
    Target-->>Decorator: result/exception
    Decorator->>Profiler: stop()
    Decorator->>Storage: save HTML + TXT with section/job/task/timestamp
  end
  Decorator-->>Caller: return result or raise
```
```mermaid
sequenceDiagram
  autonumber
  participant Celery as Celery Worker
  participant Task as @profile + @app.task
  participant Profiler as pyinstrument.Profiler
  participant Storage as profiles/

  Celery->>Task: run search/extract_stream
  rect rgba(200,230,255,0.3)
    note left of Task: Profiling wrapper
    Task->>Profiler: start()
    Task->>Task: execute task logic
    Task->>Profiler: stop()
    Task->>Storage: write reports
  end
  Task-->>Celery: result
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The pull request title "Add Python profiling decorator for internal query performance measurement" directly and accurately reflects the primary changes in the changeset. The PR introduces a new profiling utilities module with a decorator mechanism (profiling_utils.py), adds pyinstrument as a dependency, and applies this decorator to key query execution functions. The title is specific, clear, and conveys the core purpose without vague terminology or misleading information. A teammate reviewing commit history would immediately understand that this PR adds profiling capabilities to the query execution pipeline.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cc85f93 and b013429.

⛔ Files ignored due to path filters (1)
  • components/clp-py-utils/poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (1)
  • components/clp-py-utils/pyproject.toml (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: package-image
  • GitHub Check: rust-checks (ubuntu-22.04)
  • GitHub Check: rust-checks (ubuntu-24.04)
  • GitHub Check: rust-checks (macos-15)


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 9

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d164950 and cc85f93.

📒 Files selected for processing (5)
  • components/clp-py-utils/clp_py_utils/profiling_utils.py (1 hunks)
  • components/clp-py-utils/pyproject.toml (1 hunks)
  • components/job-orchestration/job_orchestration/executor/query/extract_stream_task.py (2 hunks)
  • components/job-orchestration/job_orchestration/executor/query/fs_search_task.py (2 hunks)
  • components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
components/job-orchestration/job_orchestration/executor/query/fs_search_task.py (1)
components/clp-py-utils/clp_py_utils/profiling_utils.py (1)
  • profile (28-119)
components/job-orchestration/job_orchestration/executor/query/extract_stream_task.py (1)
components/clp-py-utils/clp_py_utils/profiling_utils.py (1)
  • profile (28-119)
components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py (1)
components/clp-py-utils/clp_py_utils/profiling_utils.py (1)
  • profile (28-119)
🪛 Ruff (0.14.0)
components/clp-py-utils/clp_py_utils/profiling_utils.py

17-17: Import from collections.abc instead: Callable

Import from collections.abc

(UP035)


17-17: typing.Tuple is deprecated, use tuple instead

(UP035)


54-54: Missing return type annotation for private function async_wrapper

(ANN202)


54-54: Missing type annotation for *args

(ANN002)


54-54: Missing type annotation for **kwargs

(ANN003)


70-71: Logging statement uses f-string

(G004)


88-88: Missing return type annotation for private function sync_wrapper

(ANN202)


88-88: Missing type annotation for *args

(ANN002)


88-88: Missing type annotation for **kwargs

(ANN003)


104-105: Logging statement uses f-string

(G004)


185-185: Do not catch blind exception: Exception

(BLE001)


186-186: Logging statement uses f-string

(G004)


216-216: Logging statement uses f-string

(G004)


222-222: datetime.datetime.now() called without a tz argument

(DTZ005)


246-246: f-string without any placeholders

Remove extraneous f prefix

(F541)


257-259: Logging statement uses f-string

(G004)


263-263: Logging .exception(...) should be used instead of .error(..., exc_info=True)

(G201)


263-263: Logging statement uses f-string

(G004)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: package-image
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: rust-checks (macos-15)
  • GitHub Check: rust-checks (ubuntu-22.04)
  • GitHub Check: rust-checks (ubuntu-24.04)
🔇 Additional comments (1)
components/clp-py-utils/pyproject.toml (1)

19-19: Verify runtime support for pyinstrument and packaging impact

  • pyinstrument includes native components; confirm it installs cleanly in your worker/scheduler images and doesn’t bloat slim runtimes unnecessarily.
  • If profiling is rarely enabled, consider making it an optional extra to keep minimal installs lean.
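
If the optional-extra route is taken, one way to express it under Poetry might look like the following sketch (the extra name `profiling` is an assumption, not anything defined in this PR):

```toml
[tool.poetry.dependencies]
pyinstrument = { version = "^5.1.1", optional = true }

[tool.poetry.extras]
profiling = ["pyinstrument"]
```

Consumers would then opt in with something like `pip install 'clp-py-utils[profiling]'` (package name shown for illustration only).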

Comment on lines +20 to +24

logger = logging.getLogger(__name__)

F = TypeVar("F", bound=Callable[..., Any])


🧹 Nitpick | 🔵 Trivial

Type‑hints modernisation and stricter types (Ruff UP035, annotations)

  • Import Callable from collections.abc.
  • Prefer built‑in generic tuple[str, str] over typing.Tuple.

Apply this diff:

-from typing import Any, Callable, Optional, Tuple, TypeVar
+from typing import Any, Optional, TypeVar
+from collections.abc import Callable
@@
-) -> Tuple[str, str]:
+) -> tuple[str, str]:

Also applies to: 17-17, 128-128

🤖 Prompt for AI Agents
In components/clp-py-utils/clp_py_utils/profiling_utils.py around lines 17,
20-24 and 128, update type imports and tuple annotations: replace Callable
imported from typing with Callable from collections.abc, remove typing.Tuple
usage and use built-in generic tuple[str, str] where applicable, and adjust any
related imports (drop unused typing names) so the file uses modern annotations
consistent with Ruff UP035; ensure F = TypeVar("F", bound=Callable[..., Any])
still references the new Callable import.

Comment on lines +66 to +74
```python
except RuntimeError as e:
    # Skip profiling this function to avoid conflicts
    if "already a profiler running" in str(e):
        logger.debug(
            f"Skipping nested profiling for {name} "
            f"(parent profiler already active)"
        )
        return await func(*args, **kwargs)
    raise
```

🧹 Nitpick | 🔵 Trivial

Make nested-profiler detection resilient

String match is brittle. Compare case‑insensitively and match a broader substring.

Apply this diff:

-                except RuntimeError as e:
+                except RuntimeError as e:
                     # Skip profiling this function to avoid conflicts
-                    if "already a profiler running" in str(e):
+                    if "profiler" in str(e).lower() and "already" in str(e).lower():
                         logger.debug(
                             f"Skipping nested profiling for {name} "
                             f"(parent profiler already active)"
                         )

Repeat the same change in the sync wrapper block.

Also applies to: 100-108

🧰 Tools
🪛 Ruff (0.14.0)

70-71: Logging statement uses f-string

(G004)

🤖 Prompt for AI Agents
In components/clp-py-utils/clp_py_utils/profiling_utils.py around lines 66-74
(and repeat for the sync wrapper around lines 100-108), the nested-profiler
detection uses a brittle case-sensitive substring check; update it to perform a
case-insensitive containment check against a broader phrase (e.g. lowercasing
the exception text and testing for "already .*profiler" or simply "already a
profiler" using lowercased string containment) so it matches variants and
casing, then keep the existing debug log and return behavior when matched; apply
the same change to the sync wrapper block as well.
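
The looser match the comment proposes can be isolated in one small helper shared by both wrappers (the function name here is hypothetical):

```python
def is_nested_profiler_error(exc: BaseException) -> bool:
    """Return True if the exception looks like pyinstrument's nested-profiler error."""
    msg = str(exc).lower()
    return "profiler" in msg and "already" in msg


# Matches the current wording and plausible variants, case-insensitively.
print(is_nested_profiler_error(RuntimeError("There is already a profiler running")))
print(is_nested_profiler_error(RuntimeError("A profiler is ALREADY active")))
print(is_nested_profiler_error(ValueError("unrelated failure")))
```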

Comment on lines +147 to +176
```python
def extract_value(param_spec: str) -> str:
    """Extract value from parameter, supporting dot notation for attributes."""
    if not param_spec:
        return ""

    # Split on '.' to handle attribute access
    parts = param_spec.split(".")
    param_name = parts[0]

    # Find the parameter value
    value = None
    if param_name in kwargs:
        value = kwargs[param_name]
    elif param_name in param_names:
        idx = param_names.index(param_name)
        if idx < len(args):
            value = args[idx]

    if value is None:
        return ""

    # Navigate through attributes if dot notation was used
    for attr_name in parts[1:]:
        if hasattr(value, attr_name):
            value = getattr(value, attr_name)
        else:
            return ""

    return str(value)
```


🧹 Nitpick | 🔵 Trivial

Optional: support dict-style context extraction

If callers pass dict-like objects, walk keys as well as attributes.

Apply this diff:

-            for attr_name in parts[1:]:
-                if hasattr(value, attr_name):
-                    value = getattr(value, attr_name)
-                else:
-                    return ""
+            for attr_name in parts[1:]:
+                if isinstance(value, dict) and attr_name in value:
+                    value = value[attr_name]
+                elif hasattr(value, attr_name):
+                    value = getattr(value, attr_name)
+                else:
+                    return ""
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```python
def extract_value(param_spec: str) -> str:
    """Extract value from parameter, supporting dot notation for attributes."""
    if not param_spec:
        return ""

    # Split on '.' to handle attribute access
    parts = param_spec.split(".")
    param_name = parts[0]

    # Find the parameter value
    value = None
    if param_name in kwargs:
        value = kwargs[param_name]
    elif param_name in param_names:
        idx = param_names.index(param_name)
        if idx < len(args):
            value = args[idx]

    if value is None:
        return ""

    # Navigate through attributes (or dict keys) if dot notation was used
    for attr_name in parts[1:]:
        if isinstance(value, dict) and attr_name in value:
            value = value[attr_name]
        elif hasattr(value, attr_name):
            value = getattr(value, attr_name)
        else:
            return ""

    return str(value)
```
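
To see the dict-aware walk in action, here is a standalone adaptation (a free function taking a params dict, whereas the real helper is a closure over args/kwargs/param_names):

```python
def extract_value(param_spec: str, params: dict) -> str:
    """Resolve a dot path against params, walking dict keys or attributes."""
    if not param_spec:
        return ""
    parts = param_spec.split(".")
    value = params.get(parts[0])
    if value is None:
        return ""
    for attr_name in parts[1:]:
        if isinstance(value, dict) and attr_name in value:
            value = value[attr_name]
        elif hasattr(value, attr_name):
            value = getattr(value, attr_name)
        else:
            return ""
    return str(value)


class Job:
    id = 42


print(extract_value("job.id", {"job": Job()}))      # attribute access -> "42"
print(extract_value("job.id", {"job": {"id": 7}}))  # dict access -> "7"
print(extract_value("job.oops", {"job": Job()}))    # missing path -> ""
```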

Comment on lines +191 to +199
```python
def _is_profiling_enabled() -> bool:
    """
    Check if profiling is enabled.
    TODO: Add `CLPConfig` mechanism to enable/disable profiling for each component.
    :return: False
    """
    return False
```


🛠️ Refactor suggestion | 🟠 Major

Enable profiling via env and make interval tunable

Avoid code edits to toggle profiling. Read from env and allow interval override.

Apply this diff:

-PROFILING_INTERVAL = 0.001
+PROFILING_INTERVAL = float(os.getenv("CLP_PROFILING_INTERVAL", "0.005"))
@@
-def _is_profiling_enabled() -> bool:
+def _is_profiling_enabled() -> bool:
@@
-    :return: False
+    :return: True if CLP_ENABLE_PROFILING is truthy, else False
@@
-    return False
+    return os.getenv("CLP_ENABLE_PROFILING", "").lower() in {"1", "true", "yes", "on"}

Also applies to: 25-25

🤖 Prompt for AI Agents
In components/clp-py-utils/clp_py_utils/profiling_utils.py around lines 191-199
(and also apply same change at line 25), replace the hardcoded return False with
logic that reads an environment variable to enable profiling (e.g.
CLP_PROFILING_ENABLED or CLP_PROFILING) and returns True for common truthy
values ("1","true","yes") and False otherwise; also make the profiling interval
tunable by reading an environment variable (e.g. CLP_PROFILING_INTERVAL_SECONDS)
elsewhere where the interval is used (or add a helper _get_profiling_interval()
that reads the env and falls back to the current default), parsing ints safely
and using a sensible default on parse failure.

Comment on lines +233 to +235
output_dir = Path(os.getenv("CLP_LOGS_DIR")) / "profiles"
output_dir.mkdir(exist_ok=True, parents=True)


⚠️ Potential issue | 🔴 Critical

Critical: handle missing CLP_LOGS_DIR to avoid crash when enabled

Path(os.getenv("CLP_LOGS_DIR")) raises a TypeError if the env var is unset. Provide a safe fallback and warn.

Apply this diff:

-        output_dir = Path(os.getenv("CLP_LOGS_DIR")) / "profiles"
+        logs_dir = os.getenv("CLP_LOGS_DIR")
+        if not logs_dir:
+            logger.warning("CLP_LOGS_DIR is not set; writing profiles to CWD ./profiles")
+            output_dir = Path.cwd() / "profiles"
+        else:
+            output_dir = Path(logs_dir) / "profiles"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```python
logs_dir = os.getenv("CLP_LOGS_DIR")
if not logs_dir:
    logger.warning("CLP_LOGS_DIR is not set; writing profiles to CWD ./profiles")
    output_dir = Path.cwd() / "profiles"
else:
    output_dir = Path(logs_dir) / "profiles"
output_dir.mkdir(exist_ok=True, parents=True)
```
🤖 Prompt for AI Agents
components/clp-py-utils/clp_py_utils/profiling_utils.py around lines 233-235:
currently Path(os.getenv("CLP_LOGS_DIR")) will raise a TypeError if the env var
is unset; instead read the env var into a variable, check if it's None or empty,
log a warning (use the module logger or warnings.warn) and choose a safe
fallback directory (e.g. tempfile.mkdtemp(prefix="clp_logs_") or
os.getcwd()/".clp_logs"), then construct a Path from that string and call
output_dir.mkdir(exist_ok=True, parents=True); ensure the code uses the
checked/fallback path variable rather than calling Path on a possibly None
value.

Comment on lines +246 to +255
```python
f.write(f"CLP Query Profiling Report (pyinstrument)\n")
f.write(f"Section: {section_name}\n")
if job_id:
    f.write(f"Job ID: {job_id}\n")
if task_id:
    f.write(f"Task ID: {task_id}\n")
f.write(f"Timestamp: {timestamp}\n")
f.write("=" * 80 + "\n\n")
f.write(profiler.output_text(unicode=True, color=False))
```


🧹 Nitpick | 🔵 Trivial

Logging/text nits: remove stray f, use logger.exception, timezone

  • Remove the stray f-string with no placeholders.
  • Prefer logger.exception over error(..., exc_info=True).
  • Consider timezone-aware timestamps or time.time() for filenames.

Apply this diff:

-            f.write(f"CLP Query Profiling Report (pyinstrument)\n")
+            f.write("CLP Query Profiling Report (pyinstrument)\n")
@@
-        logger.info(
+        logger.info(
             f"Profile saved: {section_name} "
             f"(duration={duration:.6f}s, samples={sample_count}) "
             f"HTML={html_path}, TXT={txt_path}"
         )
@@
-    except Exception as e:
-        logger.error(f"Failed to save profile for {section_name}: {e}", exc_info=True)
+    except Exception:
+        logger.exception("Failed to save profile for %s", section_name)

Optionally:

-        timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S_%f")
+        timestamp = datetime.datetime.now(datetime.timezone.utc).strftime("%Y%m%d_%H%M%S_%fZ")

Also applies to: 256-260, 263-263, 222-222

🧰 Tools
🪛 Ruff (0.14.0)

246-246: f-string without any placeholders

Remove extraneous f prefix

(F541)

WorkerConfig,
)
from clp_py_utils.clp_logging import set_logging_level
from clp_py_utils.profiling_utils import profile

🧹 Nitpick | 🔵 Trivial

Decorator usage looks correct; add explicit context mapping and env check

  • Order is right (@profile wraps the function that Celery registers).
  • Consider passing job_id_param="job_id", task_id_param="task_id" for clarity.
  • Ensure CLP_LOGS_DIR is set in worker env; otherwise saving profiles will fail once enabled (see _save_profile).

Also applies to: 183-184

🤖 Prompt for AI Agents
In
components/job-orchestration/job_orchestration/executor/query/extract_stream_task.py
around lines 17 and 183-184, update the @profile decorator usage to pass
explicit context mapping by adding job_id_param="job_id" and
task_id_param="task_id" so the profiler knows which function args map to
job/task IDs, and add an environment check (e.g., verify CLP_LOGS_DIR is present
and non-empty in os.environ) before attempting to save profiles in _save_profile
to avoid failures when the worker env is missing CLP_LOGS_DIR.

WorkerConfig,
)
from clp_py_utils.clp_logging import set_logging_level
from clp_py_utils.profiling_utils import profile

🧹 Nitpick | 🔵 Trivial

Good instrumentation; consider explicit job/task mapping and env readiness

  • Order of decorators is correct.
  • Optionally pass job_id_param="job_id", task_id_param="task_id" to be explicit.
  • Ensure CLP_LOGS_DIR exists in the search worker environment when profiling is enabled.

Also applies to: 169-170

🤖 Prompt for AI Agents
In
components/job-orchestration/job_orchestration/executor/query/fs_search_task.py
around line 15 (and also at lines 169-170), the profile decorator is used but
not explicit about which args map to job and task and the code assumes
CLP_LOGS_DIR exists; update the profile decorator calls to include
job_id_param="job_id", task_id_param="task_id" for clarity, and add an
environment-readiness check that ensures CLP_LOGS_DIR exists (create the
directory if missing or raise a clear error) before profiling is enabled so
profiling can write logs reliably.

)
from clp_py_utils.core import read_yaml_config_file
from clp_py_utils.decorators import exception_default_value
from clp_py_utils.profiling_utils import profile

🧹 Nitpick | 🔵 Trivial

Scheduler profiling: sane targets; add guardrails for async sections and interval

  • Targets are sensible; job_id_param="job.id" is correct.
  • For long‑lived async functions (acquire_reducer_for_job, handle_finished_search_job), consider:
    • Using a coarser interval (e.g., 5–10ms) to reduce overhead/trace size.
    • Making interval and enablement configurable via env to avoid code changes when toggling.

Also applies to: 547-555, 569-611, 898-902

🤖 Prompt for AI Agents
In
components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py
around lines 49 (and also apply similar changes at 547-555, 569-611, 898-902),
the profiler is applied with defaults that are too fine-grained for long‑lived
async sections; update profiling to use a coarser sampling interval (e.g.,
5–10ms) and add guardrails: make both profiling enablement and interval
configurable via environment variables (e.g., JOB_SCHEDULER_PROFILING_ENABLED,
JOB_SCHEDULER_PROFILE_INTERVAL_MS) and wrap or conditionally apply the profiler
only when enabled so async functions like acquire_reducer_for_job and
handle_finished_search_job use the higher interval (or are skipped) to reduce
overhead and trace size.

Member

@junhaoliao junhaoliao left a comment


(not a formal review)

note we are migrating the build system from poetry to uv in #1405 so feel free to hold off on the dependency changes

mariadb = "~1.0.11"
mysql-connector-python = "^8.2.0"
pydantic = "^2.11.9"
pyinstrument = "^5.0.0"

do we plan to include the profiling feature in the formal releases? if not, maybe we should put this into a

[dependency-groups]
dev = [
    ...
]

that said, the pyinstrument package is only ~270KB (https://pypi.org/project/pyinstrument/#files) so it shouldn't hurt to be bundled into the release.
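
With the uv migration (#1405) in mind, the dev-only route sketched above would use a PEP 735 dependency group, which uv understands. The version pin below is illustrative:

```toml
[dependency-groups]
dev = [
    "pyinstrument>=5.1,<6",
]
```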

Member

@LinZhihao-723 LinZhihao-723 left a comment


Not a full review either; just caught some errors at a glance.
It would be better if we added some instructions in the PR indicating:

  • How to turn on the profiler (it seems we have to flip the flag at the code level and rebuild the package).
  • A step-by-step setup for testing, and the expected outputs.



def profile(
section_name: Optional[str] = None,

Suggested change
section_name: Optional[str] = None,
section_name: str | None = None,

Do it for all occurrences.

task_id_param: Optional[str] = None,
) -> Callable[[F], F]:
"""
Decorator for profiling function execution with automatic context extraction.

Change "Decorator" to "Decorators".
Do it for all docstrings.

:param section_name: Override for profile section name. If None, uses function name.
:param job_id_param: Parameter name to extract job_id from (default: "job_id").
Can use dot notation for attributes, e.g., "job.id"

We don't indent to align with `:`; we should only indent by 4 spaces. @junhaoliao can you confirm, just in case I'm outdated?


return async_wrapper # type: ignore

else:

Don't add an `else` when the preceding `if` block unconditionally returns.

Comment on lines +135 to +136
:param job_id_param: Name/path of the parameter containing job_id (default: "job_id").
:param task_id_param: Name/path of the parameter containing task_id (default: "task_id").

Why not make these the parameters' default values instead of marking them optional?

"""
Extract job_id and task_id from function arguments.
:param func: The function being profiled

Suggested change
:param func: The function being profiled
:param func: The function being profiled.

Don't miss the trailing `.`. Do it for all docstrings.

Check if profiling is enabled.
TODO: Add `CLPConfig` mechanism to enable/disable profiling for each component.
:return: False

Suggested change
:return: False
:return: Whether the profiler is enabled.
