Skip to content

Conversation

sitaowang1998
Copy link
Contributor

@sitaowang1998 sitaowang1998 commented Oct 14, 2025

Description

This PR adds pydantic annotated serializer for pathlib.Path and StrEnum to avoid adding custom model serialization for fields of these types.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • Celery compression works.

Summary by CodeRabbit

  • New Features

    • Configuration enums now serialize as human-readable strings (storage, database, query engines, AWS auth type).
    • File-system paths in settings and outputs now serialize as plain strings.
    • Added pre-validation for AWS authentication to catch invalid inputs earlier.
  • Refactor

    • Unified path and enum serialization across public models for consistent external behaviour.
    • Removed bespoke primitive-dump methods in favour of standardized serialization.

junhaoliao and others added 23 commits September 29, 2025 22:46
…ost type and removing duplicated validators.
@sitaowang1998 sitaowang1998 requested a review from a team as a code owner October 14, 2025 15:39
Copy link
Contributor

coderabbitai bot commented Oct 14, 2025

Walkthrough

Refactors config models to use string-serialized enum aliases and path-as-string types (PathStr) via new Pydantic serializers. Removes multiple dump_to_primitive_dict methods, adds a pre-validation hook in AwsAuthentication, and updates public field annotations and serialization behavior across config-related models.

Changes

Cohort / File(s) Summary
Pydantic serialisation utilities
components/clp-py-utils/clp_py_utils/pydantic_serialization_utils.py
Added PlainSerializer-based helpers: StrEnumSerializer (serializes enums via .value) and PathStr (serializes pathlib.Path to str).
Config models: enum and path serialisation, validation hook
components/clp-py-utils/clp_py_utils/clp_config.py
Added string-serialized enum aliases: StorageEngineStr, DatabaseEngineStr, QueryEngineStr, AwsAuthTypeStr. Replaced public enum and path field types with these aliases and PathStr. Removed multiple dump_to_primitive_dict methods. Added @model_validator(mode="before") validate_authentication for AwsAuthentication. Updated signatures for Package, Database, AwsAuthentication, FsStorage, S3Storage, ArchiveOutput, StreamOutput, CLPConfig, and WorkerConfig.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant Config as CLPConfig / Package / Models
  participant Auth as AwsAuthentication
  participant Pyd as Pydantic (validators)
  participant Ser as Serializers (StrEnumSerializer / PathStr)

  User->>Config: Instantiate models with enums and paths
  activate Pyd
  Pyd->>Auth: Run model validators
  note right of Auth #DDEBF7: validate_authentication (mode="before")
  Auth-->>Pyd: Normalized auth data
  deactivate Pyd

  User->>Config: Serialize models (e.g., .model_dump())
  activate Ser
  Ser-->>Config: Enums -> string via .value
  Ser-->>Config: Paths -> string via str(path)
  deactivate Ser
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title concisely summarises the primary refactor by highlighting the introduction of an annotated serializer to replace custom serialization in the clp-package and directly relates to the main changes in the pull request.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)

672-672: Fix broken enum comparison after type change.

After changing AwsAuthentication.type to AwsAuthTypeStr (an enum with string serialization), the comparison at Line 672 is now broken. The code compares AwsAuthType.profile.value (a string) with auth.type (an AwsAuthType enum object), which will always evaluate to False. This breaks the validation logic that checks whether profile authentication is used.

Apply this diff to fix the comparison:

-        if AwsAuthType.profile.value == auth.type:
+        if AwsAuthType.profile == auth.type:
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 77b122c and 3a57db7.

📒 Files selected for processing (2)
  • components/clp-py-utils/clp_py_utils/clp_config.py (9 hunks)
  • components/clp-py-utils/clp_py_utils/pydantic_serialization_utils.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: package-image
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: rust-checks (ubuntu-22.04)
  • GitHub Check: rust-checks (macos-15)
  • GitHub Check: rust-checks (ubuntu-24.04)
🔇 Additional comments (8)
components/clp-py-utils/clp_py_utils/pydantic_serialization_utils.py (2)

6-6: LGTM!

The StrEnumSerializer correctly extracts the string value from enum instances for serialization, enabling clean enum-to-string conversion in Pydantic models.


8-8: LGTM!

The PathStr type correctly serializes pathlib.Path objects as strings, eliminating the need for custom serialization methods in models.

components/clp-py-utils/clp_py_utils/clp_config.py (6)

24-24: LGTM!

The import correctly brings in the new serialization utilities that enable cleaner, more declarative model serialization.


118-118: LGTM!

The string-serialized enum type aliases correctly wrap the base enum types with StrEnumSerializer, ensuring enums are serialized as their string values rather than enum objects.

Also applies to: 126-126, 135-135, 150-150


154-155: LGTM!

The field type updates correctly use the string-serialized enum aliases, maintaining enum type safety at runtime while enabling automatic string serialization.

Also applies to: 181-181, 369-369


373-398: LGTM!

The pre-validation hook correctly validates the authentication configuration, ensuring that required fields are set based on the authentication type and preventing invalid combinations.


415-415: LGTM!

The path fields correctly use PathStr to enable automatic string serialization, eliminating the need for custom serialization methods.

Also applies to: 430-430, 453-453, 457-457, 461-461, 465-465, 469-469


571-571: LGTM!

The CLPConfig path fields correctly use PathStr for automatic string serialization, including the private attributes.

Also applies to: 577-582

Comment on lines +1 to +8
import pathlib
from typing import Annotated

from pydantic import PlainSerializer

StrEnumSerializer = PlainSerializer(lambda enum_value: enum_value.value)

PathStr = Annotated[pathlib.Path, PlainSerializer(lambda path_value: str(path_value))]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Consider adding docstrings for public utilities.

Adding a module docstring and docstrings for StrEnumSerializer and PathStr would improve discoverability and maintainability. Consider also adding an __all__ export list to explicitly declare the public API.

Apply this diff to add documentation:

+"""
+Pydantic serialization utilities for common types.
+
+Provides annotated types and serializers to handle enum and path serialization
+without requiring custom model serialization methods.
+"""
+
 import pathlib
 from typing import Annotated
 
 from pydantic import PlainSerializer
 
+# Serializer that converts enum instances to their string values
 StrEnumSerializer = PlainSerializer(lambda enum_value: enum_value.value)
 
+# Annotated type for pathlib.Path that serializes to string
 PathStr = Annotated[pathlib.Path, PlainSerializer(lambda path_value: str(path_value))]
+
+__all__ = ["StrEnumSerializer", "PathStr"]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
import pathlib
from typing import Annotated
from pydantic import PlainSerializer
StrEnumSerializer = PlainSerializer(lambda enum_value: enum_value.value)
PathStr = Annotated[pathlib.Path, PlainSerializer(lambda path_value: str(path_value))]
"""
Pydantic serialization utilities for common types.
Provides annotated types and serializers to handle enum and path serialization
without requiring custom model serialization methods.
"""
import pathlib
from typing import Annotated
from pydantic import PlainSerializer
# Serializer that converts enum instances to their string values
StrEnumSerializer = PlainSerializer(lambda enum_value: enum_value.value)
# Annotated type for pathlib.Path that serializes to string
PathStr = Annotated[pathlib.Path, PlainSerializer(lambda path_value: str(path_value))]
__all__ = ["StrEnumSerializer", "PathStr"]
🤖 Prompt for AI Agents
In components/clp-py-utils/clp_py_utils/pydantic_serialization_utils.py around
lines 1 to 8, add a module-level docstring describing the purpose (helpers to
create Pydantic PlainSerializers for common types), add concise
docstrings/comments for the StrEnumSerializer and PathStr explaining what they
serialize and expected input/output (e.g., StrEnumSerializer converts Enum
instances to their .value string; PathStr serializes pathlib.Path to str), and
add an __all__ list exporting ("StrEnumSerializer", "PathStr") so the public API
is explicit; keep wording short and place docstrings immediately above the
definitions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the file name, how about just file_utils? Though I understand the utils here likely are only compatible with Pydantic, the name "Pydantic" is still a bit implementation-specific which i feel can be omitted from the name.

Copy link
Member

@junhaoliao junhaoliao Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, can we also:

  • Only export the lambda functions in serialization_utils.py: serialize_path = lambda ...
  • Create the Pydantic types in clp_config.py (or move them into a separate types.py source file. maybe in a future refactoring PR) e.g.,
# in clp_config.py
PathStr = Annotated[
    pathlib.Path,
    PlainSerializer(serialize_path)
]

so that we can decouples the serialization utils from Pydantic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel TypeStr is not a very intuitive name. Is there a better suggestion?

Comment on lines +581 to +582
_container_image_id_path: PathStr = PrivateAttr(default=CLP_PACKAGE_CONTAINER_IMAGE_ID_PATH)
_version_file_path: PathStr = PrivateAttr(default=CLP_VERSION_FILE_PATH)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(just to show we have considered -)

Since these are PrivateAttr, they won't serialize by default anyway. The type change is harmless though.


StrEnumSerializer = PlainSerializer(lambda enum_value: enum_value.value)

PathStr = Annotated[pathlib.Path, PlainSerializer(lambda path_value: str(path_value))]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(i'm not sure if this is worth doing but we shall investigate)

can we embed the validator _validate_directory with BeforeValidator() into the Annotation as well?
https://docs.pydantic.dev/latest/api/functional_validators/#pydantic.functional_validators.BeforeValidator

@sitaowang1998 sitaowang1998 changed the title refactor(cl-package): Use annoated serializer to avoid custom serialization. refactor(clp-package): Use annoated serializer to avoid custom serialization. Oct 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants