SDK and CLI for CrateDB Cloud Cluster APIs #81


Draft · wants to merge 36 commits into base: main
Conversation

@amotl amotl commented Nov 14, 2023

About

Wraps a few calls of croud into a different kind of API, mostly for managing clusters. It is in the same spirit as, and builds upon, GH-73.

Documentation

https://cratedb-toolkit--81.org.readthedocs.build/cluster/
https://cratedb-toolkit--81.org.readthedocs.build/util/shell.html

Configuration

Configuration works by defining a few environment variables. For convenience, put them within a dotenv file (.env).

CRATEDB_CLUSTER_NAME=Hotzenplotz
CRATEDB_USERNAME='admin'
CRATEDB_PASSWORD='H3IgNXNvQBJM3CiElOiVHuSp6CjXMCiQYhB4I9dLccVHGvvvitPSYr1vTpt4'
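For illustration only, a minimal stdlib sketch of what such a dotenv loader does (in practice, packages like python-dotenv handle this; the helper below is hypothetical and not part of the toolkit):

```python
import os

def load_dotenv(path: str = ".env") -> None:
    """Minimal dotenv loader: KEY=value lines, optional quotes, '#' comments."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Real environment variables take precedence over the dotenv file.
            os.environ.setdefault(key.strip(), value.strip().strip("'\""))
```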

Python SDK

Deploy/start/resume cluster, run database workload, and suspend cluster again.

from pprint import pprint
from cratedb_toolkit import ManagedCluster

# Acquire database cluster handle, obtaining cluster identifier
# or name from the user's environment.
cluster = ManagedCluster.from_env().start()

# Run database workload.
results = cluster.query("SELECT * from sys.summits LIMIT 2;")
pprint(results)

# Suspend cluster.
cluster.suspend()

See also examples/python/cloud_cluster.py and examples/python/cloud_import.py.

CLI interface

Deploy cluster, import data, and query a few samples worth of data.

ctk cluster start
ctk load table "https://cdn.crate.io/downloads/datasets/cratedb-datasets/machine-learning/timeseries/nab-machine-failure.csv"
ctk shell --command 'SELECT * FROM "nab-machine-failure" LIMIT 10;'

See also examples/shell/cloud_cluster.sh and examples/shell/cloud_import.sh.

References

Backlog

  • Software tests
  • Documentation

Trivia

Favorite poems

In the toolkit’s warren, new features abound,
With clusters managed, imports now sound.
CLI and shell, both smarter and spry,
Python and notebooks reach for the sky.
Docker skips gently if it’s not around,
While tests and examples in harmony are found.
Oh, what a hop for this code-loving bunny!
Data to the cloud—smooth, fast, and sunny!

A rabbit hopped to cloud and sky,
With toolkit new, it aimed to try:
"Start a cluster, import with flair,
Use Python, shell, or notebook there.
Settings checked, containers spun,
Tests and docs—oh what fun!
Cloud and code now leap as one!" 🐇☁️

In the cloud and on the ground,
Rabbits hop where clusters are found.
With toolkit new and CLI bright,
They probe, deploy, and import with might.
From shell to Python, tests abound,
Data hops in—jobs resound!
🐇✨

🐇
A toolkit grows with clusters grand,
Cloud or local, at your command.
With settings checked and errors clear,
New guides and tests are gathered here.
Import, deploy, or query deep,
The rabbit’s promise: data you’ll keep!
Hopping forward, code refined,
Cloud and shell, now intertwined.


@amotl amotl force-pushed the amo/cloud-second branch 3 times, most recently from 89f8c3a to f794fd9 on November 15, 2023 12:19
@amotl amotl marked this pull request as ready for review November 16, 2023 13:14
Comment on lines 146 to 154
# FIXME: Fix `croud clusters deploy`.
# It yields *two* payloads to stdout, making it
# unusable in JSON-capturing situations.
# The main advantage of the `JSONDecoder` class is that it also provides
# a `.raw_decode` method, which will ignore extra data after the end of the JSON.
# https://stackoverflow.com/a/75168292
payload = wr.invoke()
decoder = json.JSONDecoder()
# `raw_decode` returns a `(data, end_index)` tuple; discard the index.
data, _ = decoder.raw_decode(payload)
Observation

This works around a minor flaw of croud, which emits its JSON output twice, rendering stdout unparseable as a single document.

Suggestion

croud should be changed to emit only a single JSON payload to stdout when invoking croud clusters deploy.
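The mechanics of the workaround can be demonstrated in isolation: `raw_decode` parses the first JSON document and returns it together with the index where parsing stopped, ignoring any trailing data such as a second payload (the payload contents below are made up for illustration):

```python
import json

# Simulated stdout carrying two concatenated JSON payloads.
payload = '{"id": "cluster-1"}\n{"status": "deploying"}'

decoder = json.JSONDecoder()
data, end = decoder.raw_decode(payload)
print(data)                   # {'id': 'cluster-1'}
print(payload[end:].strip())  # the extra, ignored payload
```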

@amotl amotl commented Apr 26, 2025

We submitted an upstream fix about this obstacle.

Comment on lines 66 to 86
def fix_job_info_table_name(self):
    """
    Adjust fully qualified table name by adding appropriate quotes.
    Fixes a minor flaw on the upstream API.

    Currently, the API returns `testdrive.pems-1`, but that name cannot be
    used 1:1, because it is not properly quoted.

    So, converge the table name into `"testdrive"."pems-1"` manually, for a
    fully qualified representation.

    FIXME: Remove after upstream has fixed the flaw.
    """
    job_info = self.info
    if "destination" in job_info and "table" in job_info["destination"]:
        table = job_info["destination"]["table"]
        if '"' not in table and "." in table:
            schema, table = table.split(".")
            table = f'"{schema}"."{table}"'
            job_info["destination"]["table"] = table

Observation

This works around another minor upstream flaw.

Suggestion

The API should either return schema and table names as separate attributes (preferred), or quote the value of the existing destination.table attribute so that it can be reused without further ado.

Example

{
  "destination": {
    "table": "\"testdrive\".\"pems-1\""
  }
}
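For illustration, the quoting rule this suggestion asks for can be sketched as a small helper (hypothetical, not part of croud or the API):

```python
def quote_relation(name: str) -> str:
    """Quote `schema.table` into `"schema"."table"`; pass through already-quoted or schema-less names."""
    if '"' in name or "." not in name:
        return name
    schema, table = name.split(".", 1)
    return f'"{schema}"."{table}"'

print(quote_relation("testdrive.pems-1"))  # "testdrive"."pems-1"
```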


We created an upstream ticket about this observation.


Comment on lines 67 to 97
def configure_croud(no_spinner: bool = None, use_spinner: bool = None):
    """
    Turn off croud's Halo spinner when running in Jupyter Notebooks. It does not work well.

    - https://github.com/ManrajGrover/halo/issues/32
    - https://github.com/manrajgrover/halo/issues/179
    """
    if no_spinner or ((CONFIG.RUNNING_ON_JUPYTER or CONFIG.RUNNING_ON_PYTEST) and not use_spinner):
        mod = types.ModuleType(
            "croud.tools.spinner", "Mocking the croud.tools.spinner module, to turn off the Halo spinner"
        )
        setattr(mod, "HALO", NoopContextManager())  # noqa: B010
        sys.modules["croud.tools.spinner"] = mod


class NoopContextManager:
    """
    For making the Halo progressbar a no-op.
    """

    def __init__(self, *args, **kwargs):
        pass

    def __enter__(self):
        pass

    def __exit__(self, exc_type, exc_value, exc_traceback):
        pass

    def stop(self):
        pass
@amotl amotl commented Nov 17, 2023

Observation

We had to turn off the Halo spinner currently used in croud, because it did not work well within a Jupyter Notebook environment. We are observing exactly those issues, despite the former being officially resolved. Apparently, it came back.

Suggestion

Submit a patch to croud so it only uses interactivity when is_tty() is True, or similar. At least, don't start the Halo spinner at module scope; initialize/configure it at runtime instead.

@amotl amotl commented Apr 26, 2025

We've reported this to the issue tracker of the upstream project.

@amotl amotl force-pushed the amo/cloud-second branch 3 times, most recently from 062ee9b to 7d89748 on November 18, 2023 01:56
Comment on lines +18 to +19
# Log in to CrateDB Cloud.
croud login --idp azuread
@amotl amotl commented Nov 20, 2023

Observation

There is an alternative, headless way of doing that: Using croud config show, you can display the croud configuration in YAML format.

current-profile: cratedb.cloud
default-format: table
profiles:
  cratedb.cloud:
    auth-token: xxxxxxxxxx
    endpoint: https://console.cratedb.cloud
    key: REDACTED
    organization-id: null
    region: _any_
    secret: xxxxxxxxxx

If you fill in the key and secret values, obtained by running croud api-keys create to create an API key 1, operations like croud clusters list will start working without further ado, even after logging out again using croud logout.

It works well. Thanks, @proddata.

Suggestion

Based on those insights, improve the SDK correspondingly, by also accepting environment variables CRATEDB_CLOUD_API_KEY and CRATEDB_CLOUD_API_SECRET.

Footnotes

  1. Alternatively, you can obtain an API key on your account page, at the "API keys" section. -- https://console.cratedb.cloud/account/settings


We've implemented headless/unattended operations in croud and ctk. Thanks again.

@property
def headless_config(self) -> Configuration:
    tmp_file = NamedTemporaryFile()
    tmp_path = Path(tmp_file.name)
    config = Configuration("headless.yaml", tmp_path)
    config.profile["key"] = os.environ.get("CRATEDB_CLOUD_API_KEY")
    config.profile["secret"] = os.environ.get("CRATEDB_CLOUD_API_SECRET")
    config.profile["organization-id"] = os.environ.get("CRATEDB_CLOUD_ORGANIZATION_ID")
    return config
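Not part of the patch, but the same environment-variable pattern can be sketched as a standalone helper that fails fast when credentials are missing (variable names taken from the snippet above; the helper itself is hypothetical):

```python
import os

def read_headless_settings() -> dict:
    """Collect credentials for headless mode, failing fast when any are missing."""
    names = [
        "CRATEDB_CLOUD_API_KEY",
        "CRATEDB_CLOUD_API_SECRET",
        "CRATEDB_CLOUD_ORGANIZATION_ID",
    ]
    settings = {name: os.environ.get(name) for name in names}
    missing = [name for name, value in settings.items() if value is None]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return settings
```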


I think the interface could be further improved by using the new JWT authentication mechanism instead of traditional credentials?

@amotl amotl commented Apr 26, 2025

Added the topic about using JWT to the backlog, to be processed on a later iteration.

@amotl amotl commented May 2, 2025

Fluent authentication using the provided per-cluster JWT token has been implemented per 72d3fd9.

amotl added 25 commits May 2, 2025 19:18
Various functional and usability refinements to support headless and
unattended operations on CrateDB Cloud clusters, now also including
support for resume and suspend operations.
... likewise, rename `--cratedb-http-url` to `--http-url`.
- `CRATEDB_CLOUD_CLUSTER_ID` to `CRATEDB_CLUSTER_ID`
- `CRATEDB_CLOUD_CLUSTER_NAME` to `CRATEDB_CLUSTER_NAME`
@amotl amotl force-pushed the amo/cloud-second branch from 69cce55 to 8e1a905 on May 2, 2025 18:48
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 8

🔭 Outside diff range comments (1)
tests/io/test_import.py (1)

111-112: 🛠️ Refactor suggestion

Implement database verification for data integrity

Similar to the previous TODO, this test is missing validation that the data was actually imported correctly.

Consider adding:

# Read back data from database to verify import success
cloud_environment.database.run_sql("REFRESH TABLE basic;")
result = cloud_environment.database.run_sql("SELECT COUNT(*) FROM basic;")
assert result > [(0,)]  # Ensure table has data
♻️ Duplicate comments (6)
cratedb_toolkit/cfr/cli.py (1)

112-113: Updated database address retrieval for consistency.

The code now uses the standardized variable names (http_url and sqlalchemy_url) for retrieving the database address information, maintaining backward compatibility and improving consistency across the codebase.

Also applies to: 141-142, 164-165, 181-182, 207-208

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 112-112: cratedb_toolkit/cfr/cli.py#L112
Added line #L112 was not covered by tests

cratedb_toolkit/adapter/rockset/cli.py (1)

9-9: Option name standardization improves consistency across the codebase.

Refactoring the CLI options to use the standardized option_sqlalchemy_url decorator and consistent parameter names improves maintainability and user experience. This change aligns with the broader effort to unify cluster addressing options throughout the toolkit.

Also applies to: 29-29, 34-34, 38-38, 41-41

tests/conftest.py (1)

168-169: Consider improving the skip message with specific missing variables

When tests are skipped due to missing environment variables, it would be helpful to specify which ones are missing.

-    if any(setting is None for setting in settings.values()):
-        raise pytest.skip("Missing environment variables for headless mode with croud")
+    missing_vars = [k for k, v in settings.items() if v is None]
+    if missing_vars:
+        raise pytest.skip(f"Missing environment variables for headless mode with croud: {', '.join(missing_vars)}")
tests/io/test_import.py (1)

57-63: 🛠️ Refactor suggestion

Convert non-deterministic skipif to explicit xfail for better test reliability

This skip condition depends on an environment variable and makes test execution non-deterministic. As mentioned in a previous review (which I see was acknowledged but not yet addressed), this approach can mask real regressions.

-@pytest.mark.skipif(
-    os.getenv("FORCE", "no") != "yes",
-    reason="Only works when invoked exclusively, using "
-    "`FORCE=yes pytest --no-cov tests/io/test_import.py -k managed_csv_local`. "
-    "Otherwise croaks per `AssertionError: ERROR: The following arguments are required: --url`. "
-    "We don't know why.",
-)
+@pytest.mark.xfail(
+    reason="ctk CLI currently requires an explicit --url argument",
+    strict=True,
+)

This would surface failures instead of silently skipping them, making test behaviors more consistent and failures more visible.

cratedb_toolkit/io/croud.py (1)

158-160: Avoid mutating TableAddress in-place & use central quoting helper

The code is still directly modifying the target.table property in-place, which was flagged in a previous review. This can lead to unexpected behavior if the TableAddress object is used elsewhere after this operation.

-if target.schema is not None:
-    target.table = f'"{target.schema}"."{target.table}"'
+if target.schema is not None:
+    # Create a local variable instead of modifying target
+    quoted_table = table_fqn(f"{target.schema}.{target.table}")
+else:
+    quoted_table = table_fqn(target.table)
+
+# Use the quoted name in the API call but keep target intact
+return CloudJob(info=self.cluster.create_import_job(
+    resource=resource,
+    target=dataclasses.replace(target, table=quoted_table)))
cratedb_toolkit/cluster/core.py (1)

246-275: 🛠️ Refactor suggestion

acquire() returns immediately after deploy but cluster may not be ready.

The acquire() method calls deploy() and then probe() but doesn't implement any retry logic or waiting for the cluster to fully start. This could cause downstream code to fail if it tries to interact with a cluster that's not yet ready.

Add polling with timeout to ensure the cluster is fully operational:

     def acquire(self) -> "ManagedCluster":
         """
         Acquire a database cluster.
         This means going through the steps of deploy and/or start, as applicable.
 
         - When cluster does not exist, create/deploy it.
         - When a cluster exists, but is stopped/hibernated, start/resume it.
         """
         self.probe()
         if self.exists:
             suspended = self.info.cloud.get("suspended")
             if suspended is True:
                 logger.info(f"Cluster is suspended, resuming it: id={self.cluster_id}, name={self.cluster_name}")
                 self.resume()
+                self._wait_for_cluster_ready()
             elif suspended is False:
                 logger.info(f"Cluster is running: id={self.cluster_id}, name={self.cluster_name}")
             else:
                 raise CroudException(f"Cluster in unknown state: {self.cluster_name}")
         else:
             logger.info(f"Cluster does not exist, deploying it: id={self.cluster_id}, name={self.cluster_name}")
             self.deploy()
             logger.info(f"Cluster deployed: id={self.cluster_id}, name={self.cluster_name}")
             self.probe()
+            self._wait_for_cluster_ready()
             if not self.exists:
                 # TODO: Gather and propagate more information why the deployment failed.
                 #       Capture deployment logs or status if available.
                 #       Possibly use details from `last_async_operation`.
                 raise CroudException(f"Deployment of cluster failed: {self.cluster_name}")
         return self
+        
+    def _wait_for_cluster_ready(self, timeout: int = 300, interval: int = 5) -> bool:
+        """
+        Wait for cluster to be fully operational with exponential backoff.
+        
+        Args:
+            timeout: Maximum time to wait in seconds
+            interval: Initial polling interval in seconds
+            
+        Returns:
+            bool: True if cluster is ready, False if timed out
+        """
+        start_time = time.time()
+        current_interval = interval
+        
+        while time.time() - start_time < timeout:
+            try:
+                # Test if cluster is ready by running a simple query
+                result = self.query("SELECT 1")
+                if result == [{"1": 1}]:
+                    logger.info(f"Cluster is ready: id={self.cluster_id}, name={self.cluster_name}")
+                    return True
+            except Exception as ex:
+                logger.debug(f"Cluster not yet ready: {ex}")
+                
+            # Sleep with exponential backoff (capped at 30 seconds)
+            time.sleep(min(current_interval, 30))
+            current_interval *= 1.5
+            
+        logger.warning(f"Timed out waiting for cluster to be ready: id={self.cluster_id}, name={self.cluster_name}")
+        return False
🧹 Nitpick comments (14)
doc/cluster/_address.md (1)

6-9: Consider adding examples for each addressing option.

While the documentation is clear, adding brief examples showing how to use each option would further improve usability. For example, showing sample command lines and environment variable assignments would help users adopt these options quickly.

doc/cluster/python.md (2)

40-42: Consider using proper MyST directive syntax for definition lists

The environment variables definition appears to use a custom format with colons. For better compatibility and rendering, consider using MyST's standard definition list syntax:

-:Environment variables:
-  `CRATEDB_CLUSTER_ID`, `CRATEDB_CLUSTER_NAME`, `CRATEDB_SQLALCHEMY_URL`, `CRATEDB_HTTP_URL`
+**Environment variables:**
+: `CRATEDB_CLUSTER_ID`, `CRATEDB_CLUSTER_NAME`, `CRATEDB_SQLALCHEMY_URL`, `CRATEDB_HTTP_URL`
🧰 Tools
🪛 LanguageTool

[uncategorized] ~40-~40: Loose punctuation mark.
Context: ...r parameters or environment variables. :Environment variables: `CRATEDB_CLUSTE...

(UNLIKELY_OPENING_PUNCTUATION)


53-60: Consider adding more comprehensive Python example

While the current example demonstrates basic usage, consider expanding it to show more real-world scenarios:

 from pprint import pprint
 from cratedb_toolkit import ManagedCluster
 
 # Connect to CrateDB Cloud and run the database workload.
 with ManagedCluster.from_env() as cluster:
     pprint(cluster.query("SELECT * from sys.summits LIMIT 2;"))
+
+# Alternative: explicitly specify cluster by name
+cluster = ManagedCluster(name="my-cluster-name")
+cluster.start()
+try:
+    # Run some database operations
+    results = cluster.query("SELECT count(*) FROM sys.nodes;")
+    print(f"Number of nodes: {results[0][0]}")
+finally:
+    # Explicitly suspend the cluster when done
+    cluster.suspend()
doc/util/shell.md (2)

50-51: Consider adding example with parameter quoting

For SQL commands with special characters or spaces, users might need to handle shell quoting. Consider adding an example with proper quoting:

 ```shell
 ctk shell --cluster-name hotzenplotz --command "SELECT * from sys.summits LIMIT 2;"
+```
+```shell
+# Example with complex query requiring careful quoting
+ctk shell --cluster-name hotzenplotz --command "SELECT name, height FROM sys.summits WHERE name LIKE '%Alps%' ORDER BY height DESC LIMIT 5;"

34-47: Duplicated authentication guidance

This authentication section appears to be identical to the one in doc/cluster/python.md. Consider extracting this to a shared include file to maintain consistency and reduce duplication.

# Create an authentication include file (e.g., ../cluster/_authentication.md)
# Then in both files use:
:::{include} ../cluster/_authentication.md
:::
tests/conftest.py (2)

98-98: Consider implementing the TODO synchronization

The TODO comment suggests synchronizing reset_environment with prune_environment, but doesn't provide specifics. Consider either implementing this synchronization or providing more detail about the intended approach.

A potential implementation would be to extract the common functionality into a helper function that both fixtures can use.


59-62: Use a constant for the container image name

The hardcoded "crate/crate:nightly" value should be extracted to a constant at the module level for better maintainability. This would also support any future implementation of the TODO on line 59.

+# Default CrateDB container image to use for tests
+CRATEDB_TEST_IMAGE = "crate/crate:nightly"
+
 class CrateDBFixture(PytestTestcontainerAdapter):
     # ...
     def setup(self):
         from cratedb_toolkit.testing.testcontainers.cratedb import CrateDBContainer

         # TODO: Make image name configurable.
-        self.container = CrateDBContainer("crate/crate:nightly")
+        self.container = CrateDBContainer(CRATEDB_TEST_IMAGE)
doc/cluster/backlog.md (1)

29-31: Consider rewording the Shell section for clarity.

The sentence structure is slightly awkward with the comma before "because".

-  Shell: Currently, because parameters like `username` and `password` are obtained from
+  Shell: Currently parameters like `username` and `password` are obtained from
🧰 Tools
🪛 LanguageTool

[formatting] ~29-~29: If the ‘because’ clause is essential to the meaning, do not use a comma before the clause.
Context: ...rt_managed_csv_local- Shell: Currently, because parameters likeusernameandpasswor...

(COMMA_BEFORE_BECAUSE)

cratedb_toolkit/util/setting.py (1)

93-104: Enhance error message in check_mutual_exclusiveness

The guidance string could be difficult to read if there are many parameters. Consider formatting them as bullet points or adding line breaks.

-    guidance = f"Use one of the CLI argument {parameter_names} or environment variable {environment_variables}"
+    cli_options = ", ".join(parameter_names)
+    env_options = ", ".join(environment_variables)
+    guidance = f"Use one of:\n- CLI arguments: {cli_options}\n- Environment variables: {env_options}"
cratedb_toolkit/cluster/cli.py (2)

112-118: Add missing documentation for stop command

The ManagedCluster(...).probe().stop() call is missing a comment explaining what the probe() method does before stop().

    with handle_command_errors("stop cluster"):
        # Acquire the database cluster handle and submit the `suspend` command.
-        cluster = ManagedCluster(cluster_id=cluster_id, cluster_name=cluster_name).probe().stop()
+        # The probe() method validates the cluster exists before attempting to stop it
+        cluster = ManagedCluster(cluster_id=cluster_id, cluster_name=cluster_name).probe().stop()
        logger.info(f"Successfully suspended cluster: {cluster}")
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 112-112: cratedb_toolkit/cluster/cli.py#L112
Added line #L112 was not covered by tests


[warning] 114-115: cratedb_toolkit/cluster/cli.py#L114-L115
Added lines #L114 - L115 were not covered by tests


[warning] 118-118: cratedb_toolkit/cluster/cli.py#L118
Added line #L118 was not covered by tests


138-151: Consider consistent yaml/json handling library usage

For the YAML output, you're using yaml.dump() but for JSON you're using json.dumps() with explicit indentation. Consider using a helper function to maintain consistent formatting.

-        if output_format == "json":
-            print(json.dumps(data, indent=2), file=sys.stdout)  # noqa: T201
-        elif output_format == "yaml":
-            print(yaml.dump(data), file=sys.stdout)  # noqa: T201
-        else:
-            raise ValueError(f"Unknown output format: {output_format}")
+        def format_output(data, fmt):
+            if fmt == "json":
+                return json.dumps(data, indent=2)
+            elif fmt == "yaml":
+                return yaml.dump(data)
+            else:
+                raise ValueError(f"Unknown output format: {fmt}")
+                
+        print(format_output(data, output_format), file=sys.stdout)  # noqa: T201
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 138-138: cratedb_toolkit/cluster/cli.py#L138
Added line #L138 was not covered by tests


[warning] 140-143: cratedb_toolkit/cluster/cli.py#L140-L143
Added lines #L140 - L143 were not covered by tests


[warning] 146-149: cratedb_toolkit/cluster/cli.py#L146-L149
Added lines #L146 - L149 were not covered by tests


[warning] 151-151: cratedb_toolkit/cluster/cli.py#L151
Added line #L151 was not covered by tests

cratedb_toolkit/config.py (1)

30-32: Plan for implementing the TODO comment

There's a TODO comment to provide CONFIG as a context manager. This would be a valuable improvement for temporarily configuring the environment.

Would you like me to suggest an implementation for the context manager approach mentioned in the TODO comment?

cratedb_toolkit/cluster/croud.py (2)

15-16: TODO marked as potential improvement for version handling strategy.

The TODO comment suggests a potential shift to using latest or nightly as the default CrateDB version instead of a fixed version. This would ensure users always get the most recent features, but could introduce instability.

Would you like guidance on implementing a configuration system that allows users to choose between stable, latest, or nightly versions with appropriate warnings?


354-355: URL detection can be further improved.

The current URL detection relies on checking for "://" in the URL and then Path.exists() which may fail for some valid cloud storage URLs or network paths.

Consider enhancing the URL detection to handle more edge cases:

-        is_remote = "://" in resource.url
-        if not is_remote and Path(resource.url).exists():
+        # Check for known remote URL schemes or patterns
+        is_remote = ("://" in resource.url and 
+                   any(resource.url.startswith(prefix) for prefix in 
+                       ['http://', 'https://', 's3://', 'gs://', 'azure://']))
+        
+        # If it doesn't look like a remote URL, check if it's a local file
+        if not is_remote:
+            path = Path(resource.url)
+            if not path.exists():
+                logger.warning(f"File not found: {resource.url}")
+                if "://" in resource.url:
+                    # Might be an uncommon URL scheme, treat as remote
+                    is_remote = True
+                else:
+                    raise ValueError(f"Resource not found: {resource.url}")
+            
+        if not is_remote:  # It's a confirmed local file
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 69cce55 and 8e1a905.

📒 Files selected for processing (93)
  • .github/workflows/main.yml (1 hunks)
  • .github/workflows/mongodb.yml (1 hunks)
  • CHANGES.md (1 hunks)
  • cratedb_toolkit/__init__.py (2 hunks)
  • cratedb_toolkit/adapter/rockset/cli.py (2 hunks)
  • cratedb_toolkit/adapter/rockset/server/dependencies.py (1 hunks)
  • cratedb_toolkit/api/cli.py (0 hunks)
  • cratedb_toolkit/api/guide.py (0 hunks)
  • cratedb_toolkit/api/main.py (0 hunks)
  • cratedb_toolkit/cfr/cli.py (8 hunks)
  • cratedb_toolkit/cfr/marimo.py (1 hunks)
  • cratedb_toolkit/cli.py (2 hunks)
  • cratedb_toolkit/cluster/cli.py (2 hunks)
  • cratedb_toolkit/cluster/core.py (1 hunks)
  • cratedb_toolkit/cluster/croud.py (1 hunks)
  • cratedb_toolkit/cluster/guide.py (1 hunks)
  • cratedb_toolkit/cluster/model.py (1 hunks)
  • cratedb_toolkit/cluster/util.py (0 hunks)
  • cratedb_toolkit/cmd/tail/cli.py (3 hunks)
  • cratedb_toolkit/config.py (1 hunks)
  • cratedb_toolkit/exception.py (2 hunks)
  • cratedb_toolkit/info/cli.py (3 hunks)
  • cratedb_toolkit/info/http.py (1 hunks)
  • cratedb_toolkit/io/cli.py (4 hunks)
  • cratedb_toolkit/io/croud.py (4 hunks)
  • cratedb_toolkit/io/mongodb/adapter.py (3 hunks)
  • cratedb_toolkit/io/mongodb/api.py (1 hunks)
  • cratedb_toolkit/job/cli.py (0 hunks)
  • cratedb_toolkit/job/croud.py (0 hunks)
  • cratedb_toolkit/model.py (3 hunks)
  • cratedb_toolkit/option.py (1 hunks)
  • cratedb_toolkit/options.py (0 hunks)
  • cratedb_toolkit/settings/compare.py (4 hunks)
  • cratedb_toolkit/shell/cli.py (3 hunks)
  • cratedb_toolkit/testing/testcontainers/cratedb.py (5 hunks)
  • cratedb_toolkit/testing/testcontainers/influxdb2.py (2 hunks)
  • cratedb_toolkit/testing/testcontainers/minio.py (2 hunks)
  • cratedb_toolkit/testing/testcontainers/mongodb.py (2 hunks)
  • cratedb_toolkit/testing/testcontainers/util.py (4 hunks)
  • cratedb_toolkit/util/app.py (1 hunks)
  • cratedb_toolkit/util/client.py (1 hunks)
  • cratedb_toolkit/util/common.py (2 hunks)
  • cratedb_toolkit/util/crash.py (2 hunks)
  • cratedb_toolkit/util/croud.py (8 hunks)
  • cratedb_toolkit/util/database.py (2 hunks)
  • cratedb_toolkit/util/runtime.py (1 hunks)
  • cratedb_toolkit/util/setting.py (1 hunks)
  • doc/backlog/main.md (2 hunks)
  • doc/bugs.md (1 hunks)
  • doc/cfr/systable.md (1 hunks)
  • doc/cluster/_address.md (1 hunks)
  • doc/cluster/backlog.md (1 hunks)
  • doc/cluster/cli.md (1 hunks)
  • doc/cluster/index.md (1 hunks)
  • doc/cluster/python.md (1 hunks)
  • doc/cluster/tutorial.md (1 hunks)
  • doc/conf.py (1 hunks)
  • doc/index.md (1 hunks)
  • doc/info/index.md (1 hunks)
  • doc/io/index.md (2 hunks)
  • doc/io/influxdb/loader.md (1 hunks)
  • doc/io/mongodb/loader.md (2 hunks)
  • doc/sandbox.md (2 hunks)
  • doc/util/index.md (1 hunks)
  • doc/util/settings.md (0 hunks)
  • doc/util/shell.md (1 hunks)
  • examples/cloud_import.py (0 hunks)
  • examples/notebook/cloud_import.ipynb (1 hunks)
  • examples/python/cloud_cluster.py (1 hunks)
  • examples/python/cloud_import.py (1 hunks)
  • examples/shell/cloud_cluster.sh (1 hunks)
  • examples/shell/cloud_import.sh (1 hunks)
  • pyproject.toml (8 hunks)
  • tests/api/test_util.py (1 hunks)
  • tests/cluster/test_cli.py (1 hunks)
  • tests/cluster/test_core.py (1 hunks)
  • tests/cluster/test_guide.py (1 hunks)
  • tests/cluster/test_model.py (1 hunks)
  • tests/cmd/test_tail.py (1 hunks)
  • tests/conftest.py (3 hunks)
  • tests/examples/test_python.py (1 hunks)
  • tests/examples/test_shell.py (1 hunks)
  • tests/examples/test_shell.sh (1 hunks)
  • tests/io/dynamodb/__init__.py (0 hunks)
  • tests/io/dynamodb/conftest.py (1 hunks)
  • tests/io/influxdb/conftest.py (4 hunks)
  • tests/io/mongodb/conftest.py (3 hunks)
  • tests/io/test_import.py (6 hunks)
  • tests/retention/test_cli.py (2 hunks)
  • tests/retention/test_examples.py (2 hunks)
  • tests/shell/test_cli.py (1 hunks)
  • tests/test_shell.py (0 hunks)
  • tests/util/shunit2 (1 hunks)
💤 Files with no reviewable changes (11)
  • tests/io/dynamodb/init.py
  • doc/util/settings.md
  • cratedb_toolkit/api/cli.py
  • cratedb_toolkit/cluster/util.py
  • cratedb_toolkit/api/guide.py
  • cratedb_toolkit/job/cli.py
  • cratedb_toolkit/options.py
  • cratedb_toolkit/job/croud.py
  • examples/cloud_import.py
  • tests/test_shell.py
  • cratedb_toolkit/api/main.py
✅ Files skipped from review due to trivial changes (6)
  • doc/bugs.md
  • examples/shell/cloud_cluster.sh
  • cratedb_toolkit/info/http.py
  • examples/python/cloud_cluster.py
  • examples/shell/cloud_import.sh
  • tests/util/shunit2
🚧 Files skipped from review as they are similar to previous changes (54)
  • doc/io/influxdb/loader.md
  • doc/index.md
  • doc/conf.py
  • doc/cfr/systable.md
  • cratedb_toolkit/io/mongodb/api.py
  • tests/retention/test_examples.py
  • doc/info/index.md
  • tests/io/dynamodb/conftest.py
  • doc/util/index.md
  • .github/workflows/mongodb.yml
  • tests/cluster/test_model.py
  • CHANGES.md
  • tests/cmd/test_tail.py
  • cratedb_toolkit/info/cli.py
  • cratedb_toolkit/util/common.py
  • tests/io/mongodb/conftest.py
  • tests/examples/test_shell.sh
  • cratedb_toolkit/io/mongodb/adapter.py
  • doc/cluster/index.md
  • doc/cluster/tutorial.md
  • cratedb_toolkit/init.py
  • cratedb_toolkit/util/runtime.py
  • doc/sandbox.md
  • tests/examples/test_shell.py
  • cratedb_toolkit/cmd/tail/cli.py
  • cratedb_toolkit/testing/testcontainers/influxdb2.py
  • tests/io/influxdb/conftest.py
  • examples/notebook/cloud_import.ipynb
  • cratedb_toolkit/testing/testcontainers/mongodb.py
  • tests/retention/test_cli.py
  • .github/workflows/main.yml
  • cratedb_toolkit/model.py
  • cratedb_toolkit/testing/testcontainers/minio.py
  • cratedb_toolkit/util/database.py
  • cratedb_toolkit/util/client.py
  • cratedb_toolkit/option.py
  • cratedb_toolkit/settings/compare.py
  • examples/python/cloud_import.py
  • cratedb_toolkit/shell/cli.py
  • cratedb_toolkit/exception.py
  • tests/cluster/test_core.py
  • tests/cluster/test_cli.py
  • tests/api/test_util.py
  • doc/io/mongodb/loader.md
  • pyproject.toml
  • doc/cluster/cli.md
  • tests/cluster/test_guide.py
  • cratedb_toolkit/util/crash.py
  • tests/shell/test_cli.py
  • cratedb_toolkit/util/app.py
  • tests/examples/test_python.py
  • cratedb_toolkit/cluster/guide.py
  • cratedb_toolkit/testing/testcontainers/cratedb.py
  • cratedb_toolkit/cluster/model.py
🧰 Additional context used
🧠 Learnings (4)
tests/conftest.py (2)
Learnt from: amotl
PR: crate/cratedb-toolkit#81
File: tests/conftest.py:55-62
Timestamp: 2025-04-26T17:45:20.650Z
Learning: In the CrateDB toolkit, the container start operation is idempotent, meaning it can be safely called multiple times without causing issues. This allows for flexible container lifecycle management where both `setup()` may call `container.start()` and parent classes like `PytestTestcontainerAdapter` may also invoke `start()` methods.
Learnt from: amotl
PR: crate/cratedb-toolkit#81
File: tests/conftest.py:168-170
Timestamp: 2025-05-02T15:26:10.385Z
Learning: In the `cloud_environment` fixture in tests/conftest.py, environment variables like TEST_CRATEDB_CLOUD_API_KEY, TEST_CRATEDB_CLOUD_API_SECRET, TEST_CRATEDB_CLOUD_ORGANIZATION_ID, and TEST_CRATEDB_CLUSTER_NAME are partially mutually exclusive, meaning not all of them need to be present simultaneously for tests to work.
cratedb_toolkit/cluster/cli.py (1)
Learnt from: amotl
PR: crate/cratedb-toolkit#81
File: cratedb_toolkit/cluster/cli.py:82-89
Timestamp: 2025-04-27T03:07:04.375Z
Learning: In the cratedb-toolkit codebase, the `handle_command_errors` context manager is designed as an error handler for specific operations (not as a general protective wrapper). It's intentionally used to handle errors for targeted operations rather than wrapping entire function executions.
cratedb_toolkit/cluster/core.py (1)
Learnt from: amotl
PR: crate/cratedb-toolkit#81
File: cratedb_toolkit/api/main.py:263-265
Timestamp: 2025-04-26T21:59:59.463Z
Learning: When handling DNS propagation delays after resource creation (like CrateDB Cloud clusters), both fixed sleeps and immediate polling have drawbacks. Fixed sleeps are brittle, but immediate polling risks negative DNS caching. A hybrid approach with a short initial sleep followed by polling with exponential backoff is preferred.
cratedb_toolkit/cfr/marimo.py (1)
Learnt from: amotl
PR: crate/cratedb-toolkit#81
File: cratedb_toolkit/cfr/marimo.py:53-54
Timestamp: 2025-05-02T11:43:42.639Z
Learning: The submodules within `cratedb_toolkit/io` employ a different naming scheme than other parts of the codebase and retain their own conventions (e.g., may continue using `cratedb_sqlalchemy_url` instead of `sqlalchemy_url`).
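The hybrid DNS-propagation wait captured in the learning for cratedb_toolkit/cluster/core.py (short initial sleep, then polling with exponential backoff) can be sketched as follows. All names and defaults here are illustrative, not the toolkit's actual implementation:

```python
import time


def wait_until_reachable(resolve, initial_delay=2.0, max_attempts=6, base_backoff=1.0):
    """Hybrid wait: a short fixed sleep, then polling with exponential backoff.

    `resolve` is a caller-supplied callable returning True once the new
    hostname resolves.
    """
    # Short initial sleep, so a failed first lookup does not get
    # negatively cached by intermediate resolvers.
    time.sleep(initial_delay)
    for attempt in range(max_attempts):
        if resolve():
            return True
        # Exponential backoff between polls: 1s, 2s, 4s, ...
        time.sleep(base_backoff * (2 ** attempt))
    return False
```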
🧬 Code Graph Analysis (3)
cratedb_toolkit/cli.py (1)
cratedb_toolkit/util/setting.py (1)
  • init_dotenv (30-37)
tests/conftest.py (5)
cratedb_toolkit/cluster/core.py (2)
  • ManagedClusterSettings (38-101)
  • start (233-244)
cratedb_toolkit/testing/testcontainers/cratedb.py (6)
  • CrateDBTestAdapter (165-222)
  • CrateDBContainer (31-162)
  • start (176-182)
  • get_connection_url (134-151)
  • get_connection_url (201-207)
  • reset (191-199)
cratedb_toolkit/testing/testcontainers/util.py (4)
  • PytestTestcontainerAdapter (164-206)
  • setup (182-183)
  • start (79-129)
  • start (202-203)
cratedb_toolkit/util/database.py (1)
  • DatabaseAdapter (36-412)
cratedb_toolkit/config.py (1)
  • configure (35-51)
cratedb_toolkit/testing/testcontainers/util.py (5)
tests/conftest.py (1)
  • setup (56-62)
tests/io/dynamodb/conftest.py (1)
  • setup (45-56)
tests/io/mongodb/conftest.py (1)
  • setup (38-48)
tests/io/influxdb/conftest.py (1)
  • setup (28-34)
cratedb_toolkit/testing/testcontainers/cratedb.py (2)
  • start (176-182)
  • stop (184-189)
🪛 LanguageTool
doc/cluster/backlog.md

[formatting] ~19-~19: These punctuation marks differ from each other. Use only one if you are ending a sentence.
Context: ...=nightly? => Add `--channel` option!? ## Iteration +2 - Python API: Make cl...

(DOUBLE_PUNCTUATION_PREMIUM)


[uncategorized] ~26-~26: Loose punctuation mark.
Context: ...upport. Powershell? - ctk cluster list: New - cluster/core.py: FIXME: Accept ...

(UNLIKELY_OPENING_PUNCTUATION)


[formatting] ~29-~29: If the ‘because’ clause is essential to the meaning, do not use a comma before the clause.
Context: ...rt_managed_csv_local - Shell: Currently, because parameters like `username` and passwor...

(COMMA_BEFORE_BECAUSE)

doc/backlog/main.md

[style] ~4-~4: Consider using a more polite way of asking this question.
Context: # Main Backlog ## Iteration +0 - CLI: Is it possible to just use --url instead of `--sqlal...

(IS_IT_POSSIBLE_TO)


[style] ~5-~5: Consider using a more polite way of asking this question.
Context: ...sqlalchemy-url` or `--http-url`? - CLI: Is it possible to just use `--cluster` to address a cl...

(IS_IT_POSSIBLE_TO)

doc/cluster/_address.md

[uncategorized] ~5-~5: Loose punctuation mark.
Context: ... different situations, managed or not. :CLI options: --cluster-id, `--cluste...

(UNLIKELY_OPENING_PUNCTUATION)

doc/cluster/python.md

[uncategorized] ~40-~40: Loose punctuation mark.
Context: ...r parameters or environment variables. :Environment variables: `CRATEDB_CLUSTE...

(UNLIKELY_OPENING_PUNCTUATION)

doc/io/index.md

[uncategorized] ~1-~1: A punctuation mark might be missing here.
Context: (io-subsystem)= # I/O Subsystem Load and extract data int...

(AI_EN_LECTOR_MISSING_PUNCTUATION)

doc/util/shell.md

[uncategorized] ~1-~1: A punctuation mark might be missing here.
Context: (shell)= # ctk shell The database shell interface ...

(AI_EN_LECTOR_MISSING_PUNCTUATION)

🪛 GitHub Actions: Tests: Common
tests/io/test_import.py

[error] 109-109: Test failure in 'test_import_managed_parquet_remote': AssertionError due to missing exact log message 'Import succeeded (status: SUCCEEDED)' in captured log messages, despite import job reporting success.

🪛 GitHub Check: codecov/patch
cratedb_toolkit/cluster/cli.py

[warning] 80-82: cratedb_toolkit/cluster/cli.py#L80-L82
Added lines #L80 - L82 were not covered by tests


[warning] 94-94: cratedb_toolkit/cluster/cli.py#L94
Added line #L94 was not covered by tests


[warning] 96-97: cratedb_toolkit/cluster/cli.py#L96-L97
Added lines #L96 - L97 were not covered by tests


[warning] 100-100: cratedb_toolkit/cluster/cli.py#L100
Added line #L100 was not covered by tests


[warning] 112-112: cratedb_toolkit/cluster/cli.py#L112
Added line #L112 was not covered by tests


[warning] 114-115: cratedb_toolkit/cluster/cli.py#L114-L115
Added lines #L114 - L115 were not covered by tests


[warning] 118-118: cratedb_toolkit/cluster/cli.py#L118
Added line #L118 was not covered by tests


[warning] 138-138: cratedb_toolkit/cluster/cli.py#L138
Added line #L138 was not covered by tests


[warning] 140-143: cratedb_toolkit/cluster/cli.py#L140-L143
Added lines #L140 - L143 were not covered by tests


[warning] 146-149: cratedb_toolkit/cluster/cli.py#L146-L149
Added lines #L146 - L149 were not covered by tests


[warning] 151-151: cratedb_toolkit/cluster/cli.py#L151
Added line #L151 was not covered by tests


[warning] 157-160: cratedb_toolkit/cluster/cli.py#L157-L160
Added lines #L157 - L160 were not covered by tests


[warning] 166-171: cratedb_toolkit/cluster/cli.py#L166-L171
Added lines #L166 - L171 were not covered by tests

cratedb_toolkit/cluster/core.py

[warning] 99-101: cratedb_toolkit/cluster/core.py#L99-L101
Added lines #L99 - L101 were not covered by tests


[warning] 118-125: cratedb_toolkit/cluster/core.py#L118-L125
Added lines #L118 - L125 were not covered by tests


[warning] 128-131: cratedb_toolkit/cluster/core.py#L128-L131
Added lines #L128 - L131 were not covered by tests


[warning] 135-138: cratedb_toolkit/cluster/core.py#L135-L138
Added lines #L135 - L138 were not covered by tests


[warning] 142-142: cratedb_toolkit/cluster/core.py#L142
Added line #L142 was not covered by tests


[warning] 146-146: cratedb_toolkit/cluster/core.py#L146
Added line #L146 was not covered by tests


[warning] 148-148: cratedb_toolkit/cluster/core.py#L148
Added line #L148 was not covered by tests


[warning] 150-153: cratedb_toolkit/cluster/core.py#L150-L153
Added lines #L150 - L153 were not covered by tests

cratedb_toolkit/adapter/rockset/cli.py

[warning] 38-38: cratedb_toolkit/adapter/rockset/cli.py#L38
Added line #L38 was not covered by tests


[warning] 41-41: cratedb_toolkit/adapter/rockset/cli.py#L41
Added line #L41 was not covered by tests

cratedb_toolkit/adapter/rockset/server/dependencies.py

[warning] 12-13: cratedb_toolkit/adapter/rockset/server/dependencies.py#L12-L13
Added lines #L12 - L13 were not covered by tests

cratedb_toolkit/cfr/cli.py

[warning] 112-112: cratedb_toolkit/cfr/cli.py#L112
Added line #L112 was not covered by tests


[warning] 141-141: cratedb_toolkit/cfr/cli.py#L141
Added line #L141 was not covered by tests


[warning] 164-164: cratedb_toolkit/cfr/cli.py#L164
Added line #L164 was not covered by tests


[warning] 181-181: cratedb_toolkit/cfr/cli.py#L181
Added line #L181 was not covered by tests


[warning] 207-207: cratedb_toolkit/cfr/cli.py#L207
Added line #L207 was not covered by tests

cratedb_toolkit/cfr/marimo.py

[warning] 53-54: cratedb_toolkit/cfr/marimo.py#L53-L54
Added lines #L53 - L54 were not covered by tests

🔇 Additional comments (47)
cratedb_toolkit/cfr/cli.py (2)

15-15: Import updated to use centralized CLI utility.

The import change correctly reflects the migration of make_cli from api.cli to util.app, which is part of the codebase restructuring to centralize cluster addressing options.


32-32: Variable renaming aligned with standardization efforts.

The variables have been consistently renamed from cratedb_sqlalchemy_url to sqlalchemy_url, which aligns with the toolkit-wide standardization of connection variable names.

Also applies to: 61-61

cratedb_toolkit/cfr/marimo.py (1)

53-54: Environment variable name standardized correctly.

The variable rename from cratedb_sqlalchemy_url to sqlalchemy_url is consistent with the codebase-wide standardization effort to remove the cratedb_ prefix from option names and variable identifiers.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 53-54: cratedb_toolkit/cfr/marimo.py#L53-L54
Added lines #L53 - L54 were not covered by tests

doc/io/index.md (3)

1-1: Added useful document anchor.

The anchor (io-subsystem)= provides a reference point for linking to this section from other documentation, enhancing cross-document navigation.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~1-~1: A punctuation mark might be missing here.
Context: (io-subsystem)= # I/O Subsystem Load and extract data int...

(AI_EN_LECTOR_MISSING_PUNCTUATION)


57-57: Updated environment variable name for consistency.

The environment variable has been renamed from CRATEDB_CLOUD_CLUSTER_ID to CRATEDB_CLUSTER_ID to align with the updated cluster addressing conventions introduced in the cluster API and CLI subsystems.


64-64: Updated URL to use official CDN.

The example URL for loading the data_marketing.json.gz dataset has been updated to use the official CDN URL, providing a more reliable and consistent source for the example data.

doc/backlog/main.md (2)

3-7: Backlog items for CLI simplification align well with PR objectives.

These new backlog items for simplifying cluster addressing options align perfectly with the SDK and CLI enhancements introduced in this PR. The suggestions to use --url and consolidate cluster addressing options reflect good UX thinking.

🧰 Tools
🪛 LanguageTool

[style] ~4-~4: Consider using a more polite way of asking this question.
Context: # Main Backlog ## Iteration +0 - CLI: Is it possible to just use --url instead of `--sqlal...

(IS_IT_POSSIBLE_TO)


[style] ~5-~5: Consider using a more polite way of asking this question.
Context: ...sqlalchemy-urlor--http-url? - CLI: Is it possible to just use --cluster` to address a cl...

(IS_IT_POSSIBLE_TO)


60-60: Environment variable name update properly reflects cluster abstraction.

Changing from CRATEDB_CLOUD_CLUSTER_ID to CRATEDB_CLUSTER_ID in the backlog supports the unified cluster abstraction layer introduced in this PR, which handles both cloud and standalone clusters.

cratedb_toolkit/cli.py (1)

16-16: Environment variable loading from .env files adds flexibility for configuration.

Adding support for loading environment variables from .env files is an excellent enhancement that aligns with the PR objectives. This implementation ensures configuration through environment variables is simple and flexible for both the SDK and CLI.

Also applies to: 25-25

doc/cluster/_address.md (1)

1-16: Excellent documentation of cluster addressing options.

This documentation clearly explains all available options for addressing clusters and their precedence rules. The explicit mention of environment variables being storable in .env files directly supports the PR's focus on configuration flexibility.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~5-~5: Loose punctuation mark.
Context: ... different situations, managed or not. :CLI options: --cluster-id, `--cluste...

(UNLIKELY_OPENING_PUNCTUATION)

doc/cluster/python.md (1)

1-70: Well-structured documentation for the new CrateDB Cluster Python API

The documentation provides a clear overview of the ManagedCluster class and its capabilities. I particularly appreciate:

  • The logical flow from installation to authentication, configuration, and usage
  • Clear examples showing how to use the context manager pattern
  • Explicit mention of mutual exclusivity between address options
  • Documentation of environment variable support including .env file usage
🧰 Tools
🪛 LanguageTool

[uncategorized] ~40-~40: Loose punctuation mark.
Context: ...r parameters or environment variables. :Environment variables: `CRATEDB_CLUSTE...

(UNLIKELY_OPENING_PUNCTUATION)

doc/util/shell.md (1)

1-89: Comprehensive documentation for the shell command

The documentation thoroughly covers the shell command functionality with clear sections for installation, usage, and different connection scenarios. The examples are practical and demonstrate both CrateDB Cloud and standalone usage patterns.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~1-~1: A punctuation mark might be missing here.
Context: (shell)= # ctk shell The database shell interface ...

(AI_EN_LECTOR_MISSING_PUNCTUATION)

tests/conftest.py (3)

45-71: Well-structured test fixture for CrateDB

The CrateDBFixture class provides a clean abstraction for test container management, inheriting from PytestTestcontainerAdapter and offering appropriate reset functionality. This maintains separation of concerns and follows good testing practices.


89-106: Good environment management with specific reset

The new reset_environment fixture appropriately complements the existing prune_environment fixture by targeting specific variables from ManagedClusterSettings. The implementation also properly guards against None envvar names.


156-182: Clear cloud environment fixture implementation

The cloud_environment fixture properly sets up the required environment for cloud integration testing, handles missing variables, and ensures proper toolkit configuration. Based on the retrieved learnings, the implementation correctly acknowledges that the variables are partially mutually exclusive.

cratedb_toolkit/io/cli.py (3)

8-11: Clean import refactoring

The imports have been appropriately updated to use the new DatabaseCluster abstraction and standardized option decorators, resulting in cleaner and more focused imports.


62-72: Good simplification of cluster creation and handling

The refactoring to use DatabaseCluster.create streamlines the code by delegating cluster creation details to the appropriate class. The renamed source variable also improves clarity over the previous resource name.


66-71:

✅ Verification successful

Verify DatabaseCluster.create handles empty string parameters correctly

From past review comments, there was an issue with empty strings being treated as valid identifiers. Ensure that DatabaseCluster.create properly normalizes empty strings to None to prevent validation errors.


🏁 Script executed:

#!/bin/bash
# Check how DatabaseCluster.create handles empty strings
ast-grep --pattern 'DatabaseCluster.create' -A 20

Length of output: 3255


🏁 Script executed:

#!/bin/bash
set -e
# Locate the class definition for DatabaseCluster
rg -n "class DatabaseCluster" 

Length of output: 99


🏁 Script executed:

#!/bin/bash
set -e
# Display the DatabaseCluster.create method implementation
rg -n -A200 "def create" cratedb_toolkit/cluster/core.py

Length of output: 2372


🏁 Script executed:

#!/bin/bash
set -e
# Display the ManagedCluster class implementation to see parameter handling
rg -n -A200 "class ManagedCluster" cratedb_toolkit/cluster/core.py

Length of output: 11691


Empty strings are already ignored by DatabaseCluster.create

The create method filters out empty strings via option.strip() != "" and uses truthiness checks to convert "" to None, raising the appropriate errors when no valid address is provided. No changes needed.
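The normalization being verified here condenses to a tiny helper along these lines (illustrative only, not the toolkit's code):

```python
def normalize_option(option):
    """Treat None, empty, and whitespace-only strings as absent values."""
    if option is not None and option.strip() != "":
        return option
    return None
```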

doc/cluster/backlog.md (1)

1-70: Structured backlog document facilitates development path.

This well-organized backlog document provides a clear roadmap for future development of the CrateDB Cluster API subsystem. The organization into iterations helps prioritize work and track completed items.

🧰 Tools
🪛 LanguageTool

[formatting] ~19-~19: These punctuation marks differ from each other. Use only one if you are ending a sentence.
Context: ...=nightly? => Add `--channel` option!? ## Iteration +2 - Python API: Make cl...

(DOUBLE_PUNCTUATION_PREMIUM)


[uncategorized] ~26-~26: Loose punctuation mark.
Context: ...upport. Powershell? - ctk cluster list: New - cluster/core.py: FIXME: Accept ...

(UNLIKELY_OPENING_PUNCTUATION)


[formatting] ~29-~29: If the ‘because’ clause is essential to the meaning, do not use a comma before the clause.
Context: ...rt_managed_csv_local - Shell: Currently, because parameters like `username` and passwor...

(COMMA_BEFORE_BECAUSE)

cratedb_toolkit/util/setting.py (3)

13-20: Good design choice using dataclass for Setting.

Using a dataclass for the Setting object provides a clean way to associate click parameters with optional group identifiers, which is useful for the mutual exclusivity checks.


30-37: Robust environment variable handling in init_dotenv

The implementation properly integrates with CONFIG to skip loading from .env files when running in test environments. This helps prevent test environments from being affected by local development settings.


40-69: Flexible configuration options for different contexts

The obtain_settings function provides a comprehensive approach to configuration, supporting multiple sources (CLI, env vars) and different contexts (Jupyter, pytest). This implementation promotes flexibility in how the toolkit can be used.

cratedb_toolkit/cluster/cli.py (3)

21-58: Well-documented command help functions

The help functions for each command provide clear examples which will be valuable for users learning the CLI. This is a good practice that improves usability.


80-82: Properly implemented info command with error handling

The info command correctly uses the handle_command_errors context manager and the ClusterInformation.from_id_or_name method, following the recommended pattern from previous reviews.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 80-82: cratedb_toolkit/cluster/cli.py#L80-L82
Added lines #L80 - L82 were not covered by tests


94-100: Well-structured start command with appropriate error handling

The start command properly implements the cluster start operation and displays the result. The code follows the project convention of using the error handler for the specific operation.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 94-94: cratedb_toolkit/cluster/cli.py#L94
Added line #L94 was not covered by tests


[warning] 96-97: cratedb_toolkit/cluster/cli.py#L96-L97
Added lines #L96 - L97 were not covered by tests


[warning] 100-100: cratedb_toolkit/cluster/cli.py#L100
Added line #L100 was not covered by tests

cratedb_toolkit/config.py (3)

8-28: Good environment detection implementation

The properties for detecting Jupyter and pytest environments are clean and maintainable. Using environment variables for detection is a reliable approach.


68-87: Effective solution for Halo spinner issues

The approach to disable the Halo spinner in Jupyter and pytest environments is clever and well-implemented. The reference to the upstream issue is helpful for context.


89-96: Clean implementation of NoopHalo

The NoopHalo class effectively extends nullcontext with the additional stop method, providing a seamless replacement for the Halo spinner.
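The pattern reads roughly like this sketch, which subclasses `contextlib.nullcontext` and adds the no-op `stop` method; the real class may carry additional methods:

```python
from contextlib import nullcontext


class NoopHalo(nullcontext):
    """No-op replacement for the Halo spinner in Jupyter/pytest contexts."""

    def stop(self):
        # Halo callers invoke stop() explicitly; make it a harmless no-op.
        pass
```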

cratedb_toolkit/io/croud.py (3)

47-65: Well-designed CloudJob properties with defensive error handling

The properties for accessing job information (id, status, message) correctly handle missing keys with .get() and default values, preventing KeyError exceptions. This is good defensive programming, especially for cloud API integrations where response structures might vary.


67-84: Good handling of upstream API quirk with proper documentation

The fix_job_info_table_name method addresses an upstream API issue with clear documentation about the problem and the workaround. The FIXME comment properly flags this as temporary until the upstream issue is fixed.


111-126: Improved error handling with meaningful messages

The error handling in load_resource has been enhanced with a specific timeout-based approach, clear error messages, and appropriate fallback to use the original job info if needed. This makes debugging cloud import issues much easier.

cratedb_toolkit/testing/testcontainers/util.py (3)

48-54: Robust port detection and normalization for container address

The improvements to get_real_host_address() make it more resilient by:

  1. Dynamically determining the port attribute from either self.port or self.port_to_expose
  2. Adding proper error handling when no port is found
  3. Stripping protocol suffixes from port strings (e.g., "5432/tcp" → "5432")

This enhances reliability when working with different container types that might expose port information differently.
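The suffix stripping in item 3 can be illustrated with a hypothetical helper (not the actual `get_real_host_address()` code):

```python
def normalize_port(port) -> int:
    """Strip protocol suffixes such as "/tcp" from a container port value."""
    return int(str(port).split("/", 1)[0])
```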


141-162: Well-designed Docker exception handling for test environments

The DockerSkippingContainer class provides a clean approach to handling Docker unavailability in tests by:

  1. Setting _container early to avoid destructor issues
  2. Catching Docker exceptions and converting them to pytest skips
  3. Identifying different error conditions like "Connection aborted" and "Error while fetching server API version"

This makes tests more robust in environments where Docker might not be available.


164-206: Good abstraction with PytestTestcontainerAdapter for test containers

The PytestTestcontainerAdapter class provides a clean interface for Pytest-based test containers with:

  1. An abstract setup() method that subclasses must implement
  2. Error handling that converts Docker exceptions to pytest skips
  3. Standard start() and stop() methods for lifecycle management

This promotes consistent patterns across test fixtures and reduces code duplication.

However, there's some duplication in the Docker exception handling logic between this class and DockerSkippingContainer. Consider extracting the common error detection logic to a shared helper function:

def is_docker_unavailable_error(ex):
    """Check if exception indicates Docker daemon is unavailable."""
    return any(token in str(ex) for token in (
        "Connection aborted", 
        "Error while fetching server API version"
    ))

This would help maintain consistency as noted in your FIXME comments.

cratedb_toolkit/util/croud.py (4)

47-62: Good solution for handling partial JSON responses

The invoke_safedecode method elegantly handles the case where the API might return multiple JSON payloads, using JSONDecoder.raw_decode() to extract only the first valid JSON object. The method includes clear documentation explaining the upstream issue and properly unpacks the tuple returned by raw_decode().
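The `raw_decode()` technique can be demonstrated with an illustrative stand-in (not `invoke_safedecode` itself):

```python
import json


def decode_first_json(payload: str):
    """Return only the first JSON document from a string that may
    carry trailing payloads."""
    decoder = json.JSONDecoder()
    # raw_decode returns the parsed object plus the index where parsing stopped.
    obj, end = decoder.raw_decode(payload)
    return obj
```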


170-179: Improved log redirection with context

The approach to redirecting croud log messages to the Python logger is well done. Setting the level to DEBUG and prefixing with [croud] makes it clear where these messages originated, and the detailed comment explains the alternative approaches considered.


213-260: Well-designed CroudClient with custom User-Agent and headless support

The CroudClient class provides several improvements:

  1. Custom User-Agent header for better telemetry and tracking
  2. Factory method with clear documentation
  3. Support for headless operation using environment variables
  4. Monkey-patching the upstream client for consistent behavior

However, the headless configuration doesn't verify if required credentials are actually available before using them:

@classproperty
def get_headless_config(cls) -> Configuration:
    cfg = Configuration("croud.yaml")
    if cfg._file_path.exists() and "CRATEDB_CLOUD_API_KEY" not in os.environ:
        return cfg

    tmp_file = NamedTemporaryFile()
    tmp_path = Path(tmp_file.name)
    config = Configuration("headless.yaml", tmp_path)

    # Get credentials from the environment.
    config.profile["key"] = os.environ.get("CRATEDB_CLOUD_API_KEY")
    config.profile["secret"] = os.environ.get("CRATEDB_CLOUD_API_SECRET")
    config.profile["organization-id"] = os.environ.get("CRATEDB_CLOUD_ORGANIZATION_ID")
+    
+    # Check if required credentials are available
+    if not config.profile["key"] or not config.profile["secret"]:
+        logger.warning("Missing required cloud credentials: CRATEDB_CLOUD_API_KEY and/or CRATEDB_CLOUD_API_SECRET")

    return config

This would provide better diagnostics when credentials are missing.


282-320: Well-documented and robust table name quoting function

The table_fqn function is:

  1. Well-documented with clear docstring and examples
  2. Handles various cases (simple table names, schema-qualified names)
  3. Includes proper validation for empty input
  4. Has good examples showing expected behavior

This will be useful for building proper SQL queries with quoted identifiers.
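For reference, the described behavior could look like this hypothetical re-creation; the real `table_fqn` may differ in edge cases:

```python
def quote_table_fqn(name: str) -> str:
    """Quote a possibly schema-qualified table name with double quotes."""
    if not name:
        raise ValueError("Table name must not be empty")
    parts = name.split(".")
    # Re-quote each segment, tolerating already-quoted input.
    return ".".join('"{}"'.format(part.strip('"')) for part in parts)
```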

cratedb_toolkit/cluster/croud.py (6)

152-169: LGTM! Robust project creation implementation.

The create_project method properly handles command arguments safely by passing them as separate items rather than concatenated strings, which prevents issues with special characters in project names and organization IDs.


179-227: Well-structured cluster deployment with smart subscription handling.

The code intelligently selects a subscription when only one exists and provides clear error messages with helpful context when multiple subscriptions exist. The credential validation is thorough and user-friendly.


227-253: Robust command construction with safe argument handling.

The deployment arguments are correctly passed as separate items rather than concatenated strings, which prevents issues with special characters. The code also leverages environment variables for configuration parameters like tier and product name.


281-298: Clean implementation of suspend/resume operations.

The cluster suspension and resumption API calls are properly implemented using the REST API approach with clear error reporting. The code follows a consistent pattern for error checking and exception handling.


310-394: Import job creation logic handles local and remote resources appropriately.

The code correctly differentiates between local files and remote URLs using a robust check, and dynamically constructs the appropriate command arguments. The validation of required parameters is thorough and occurs early.


395-406: JWT token retrieval implemented correctly.

The implementation of get_jwt_token follows a consistent error handling pattern and returns a properly formatted response object. This is critical for authenticating cluster operations.

cratedb_toolkit/cluster/core.py (3)

37-95: Well-structured settings management with proper mutual exclusivity.

The ManagedClusterSettings class effectively encapsulates cluster configuration and handles mutual exclusivity validation for settings like cluster ID vs. cluster name. The implementation correctly leverages environment variables and CLI options with appropriate help text.


144-163: Robust context manager implementation with proper error handling.

The __exit__ method implementation correctly handles exceptions during cleanup. It follows best practices by:

  1. Logging the original exception without suppressing it
  2. Attempting to close connections first
  3. Using a finally block to ensure stop is called when needed regardless of connection closing errors
  4. Properly logging all errors but not suppressing the original exception

This ensures resources are properly cleaned up even when errors occur.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 146-146: cratedb_toolkit/cluster/core.py#L146
Added line #L146 was not covered by tests


[warning] 148-148: cratedb_toolkit/cluster/core.py#L148
Added line #L148 was not covered by tests


[warning] 150-153: cratedb_toolkit/cluster/core.py#L150-L153
Added lines #L150 - L153 were not covered by tests
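The cleanup ordering described above (log without suppressing, close connections first, stop in a `finally` block) can be sketched with a minimal stand-in class; this is not the actual ManagedCluster implementation:

```python
import logging

logger = logging.getLogger(__name__)


class ClusterHandle:
    """Sketch of the context-manager cleanup pattern described above."""

    def __init__(self):
        self.closed = False
        self.stopped = False

    def close_connections(self):
        self.closed = True

    def stop(self):
        self.stopped = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc is not None:
            logger.error("Error inside cluster context: %s", exc)
        try:
            self.close_connections()
        except Exception as ex:
            logger.error("Error closing connections: %s", ex)
        finally:
            # Always stop, even if closing connections failed.
            self.stop()
        return False  # propagate the original exception
```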


586-642: Great implementation of the factory pattern for cluster creation.

The DatabaseCluster class provides a clean factory implementation that enforces exactly one connection method. The validation logic is robust, and the error messages are clear and informative. This design pattern makes it easy to instantiate the appropriate cluster type based on the available connection information.
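The exactly-one-of validation can be sketched as follows; the parameter names mirror the docs but this is not the actual `DatabaseCluster.create` signature:

```python
def pick_address(cluster_id=None, cluster_name=None, sqlalchemy_url=None):
    """Require exactly one cluster addressing option, ignoring blanks."""
    provided = {
        key: value
        for key, value in {
            "cluster_id": cluster_id,
            "cluster_name": cluster_name,
            "sqlalchemy_url": sqlalchemy_url,
        }.items()
        if value is not None and str(value).strip() != ""
    }
    if len(provided) != 1:
        raise ValueError(
            "Exactly one of cluster id, cluster name, or SQLAlchemy URL is required")
    return next(iter(provided.items()))
```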

Comment on lines +86 to +87
# TODO: Read back data from database.

@coderabbitai coderabbitai bot May 2, 2025

🛠️ Refactor suggestion

Implement database verification for data integrity

The TODO indicates the test isn't fully validating the import operation by verifying the data actually made it to the database.

Consider adding verification similar to the standalone tests:

# Read back data from database to verify import success
cloud_environment.database.run_sql("REFRESH TABLE dummy;")
result = cloud_environment.database.run_sql("SELECT COUNT(*) FROM dummy;")
assert result == [(2,)]  # Assuming same data as in dummy_csv
🤖 Prompt for AI Agents (early access)
In tests/io/test_import.py at lines 86 to 87, the test currently lacks verification that imported data is correctly stored in the database. To fix this, add code to refresh the relevant database table and query it to confirm the expected number of rows exist, similar to the example provided. This will ensure the test validates data integrity after import.

Member Author

@amotl amotl May 2, 2025

@coderabbitai: I do not see a todo item: It's a comment that describes what the subsequent code does.

@amotl There are actually two TODO items in the code:

  1. In the test_import_managed_csv_local function at line 86:
     # TODO: Read back data from database.
  2. In the test_import_managed_parquet_remote function at the end:
     # TODO: Read back data from database.

These TODOs indicate that verification of the imported data should be added to ensure data integrity after import, which is what the previous suggestion was addressing.

@amotl amotl force-pushed the amo/cloud-second branch from 8e1a905 to 6b8f1aa Compare May 2, 2025 18:59

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

♻️ Duplicate comments (2)
doc/cluster/backlog.md (1)

44-50: 🛠️ Refactor suggestion

Missing resolution action for concurrent resume operations

The issue with concurrent resume operations is well-identified, but after the step to evaluate the status (cloud.info.last_async_operation.status), there's no clear action defined for what to do when this situation is detected.

  => Evaluate `cloud.info.last_async_operation.status` = `SENT|IN_PROGRESS`.

  See also https://github.com/crate/cratedb-toolkit/actions/runs/14682363239/job/41206608090?pr=81#step:5:384.
+  => Add retry mechanism with exponential backoff to automatically handle this case, or 
+     provide clear error messages instructing users to wait for the existing operation to complete.
cratedb_toolkit/io/croud.py (1)

162-164: In-place modification of TableAddress may surprise callers

The current implementation mutates target.table when a schema is provided, which could lead to unexpected behavior for code that still needs to access the original schema and table separately.

Consider creating a new instance rather than modifying the original:

-        if target.schema is not None:
-            target.table = f'"{target.schema}"."{target.table}"'
+        table_param = target.table
+        if target.schema is not None:
+            table_param = f'"{target.schema}"."{target.table}"'
+
+        # Create job without modifying the original target
+        return CloudJob(info=self.cluster.create_import_job(
+            resource=resource,
+            target=dataclasses.replace(target, table=table_param)
+        ))
🧹 Nitpick comments (4)
cratedb_toolkit/cluster/model.py (1)

25-28: Consider tracking TODOs as GitHub issues

There are TODOs in the code for token persistence and automatic refreshing, but they might get lost or forgotten.

Consider converting these TODOs to GitHub issues to ensure they're tracked properly and addressed in future iterations. You can reference the issues in the code comments to maintain the context.

doc/backlog/main.md (1)

3-6: Consider more collaborative phrasing for CLI suggestions

The current phrasing uses "Is it possible to..." which might come across as questioning rather than suggesting. A more collaborative or direct tone might be more effective.

## Iteration +0
- CLI: Is it possible to just use `--url` instead of `--sqlalchemy-url` or `--http-url`?
- CLI: Is it possible to just use `--cluster` to address a cluster, effectively shrinking
  all four addressing options into a single one? Maybe optionally?
+## Iteration +0
+- CLI: Consider simplifying by using `--url` instead of `--sqlalchemy-url` or `--http-url`
+- CLI: Explore consolidating all four addressing options into a single `--cluster` option
+  for more intuitive cluster addressing. Could be implemented as an option alongside existing methods.
🧰 Tools
🪛 LanguageTool

[style] ~4-~4: Consider using a more polite way of asking this question.
Context: # Main Backlog ## Iteration +0 - CLI: Is it possible to just use --url instead of `--sqlal...

(IS_IT_POSSIBLE_TO)


[style] ~5-~5: Consider using a more polite way of asking this question.
Context: ...`--sqlalchemy-url` or `--http-url`? - CLI: Is it possible to just use `--cluster` to address a cl...

(IS_IT_POSSIBLE_TO)

cratedb_toolkit/cluster/cli.py (1)

146-151: Simplify output format validation in list_jobs

The format validation is duplicating what Click already does with the Choice type.

# Display job information.
if output_format == "json":
    print(json.dumps(data, indent=2), file=sys.stdout)  # noqa: T201
elif output_format == "yaml":
    print(yaml.dump(data), file=sys.stdout)  # noqa: T201
-else:
-    raise ValueError(f"Unknown output format: {output_format}")

The else case is unnecessary since Click's Choice type already validates that output_format will only be either "json" or "yaml".

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 146-149: cratedb_toolkit/cluster/cli.py#L146-L149
Added lines #L146 - L149 were not covered by tests


[warning] 151-151: cratedb_toolkit/cluster/cli.py#L151
Added line #L151 was not covered by tests

cratedb_toolkit/io/croud.py (1)

111-117: Retry loop might be too simple for real-world cloud scenarios

The job discovery logic uses fixed retry delay which could be insufficient for cloud environments where job processing times may vary significantly.

Consider implementing exponential backoff to handle varying latencies:

-        for _ in range(max_retries):
-            job = self.find_job(job_id=job_id)
-            if job.found:
-                break
-            time.sleep(retry_delay)
+        for attempt in range(max_retries):
+            job = self.find_job(job_id=job_id)
+            if job.found:
+                break
+            # Exponential backoff with base delay, capped at a reasonable maximum
+            sleep_time = min(retry_delay * (2 ** attempt), 5.0)
+            time.sleep(sleep_time)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8e1a905 and 6b8f1aa.

📒 Files selected for processing (12)
  • cratedb_toolkit/cli.py (2 hunks)
  • cratedb_toolkit/cluster/cli.py (2 hunks)
  • cratedb_toolkit/cluster/core.py (1 hunks)
  • cratedb_toolkit/cluster/croud.py (1 hunks)
  • cratedb_toolkit/cluster/model.py (1 hunks)
  • cratedb_toolkit/io/croud.py (4 hunks)
  • cratedb_toolkit/job/cli.py (0 hunks)
  • cratedb_toolkit/job/croud.py (0 hunks)
  • cratedb_toolkit/util/client.py (1 hunks)
  • doc/backlog/main.md (3 hunks)
  • doc/cluster/backlog.md (1 hunks)
  • tests/cluster/test_cli.py (1 hunks)
💤 Files with no reviewable changes (2)
  • cratedb_toolkit/job/croud.py
  • cratedb_toolkit/job/cli.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • cratedb_toolkit/cli.py
  • cratedb_toolkit/util/client.py
  • tests/cluster/test_cli.py
🧰 Additional context used
🧠 Learnings (2)
cratedb_toolkit/cluster/core.py (1)
Learnt from: amotl
PR: crate/cratedb-toolkit#81
File: cratedb_toolkit/api/main.py:263-265
Timestamp: 2025-04-26T21:59:59.463Z
Learning: When handling DNS propagation delays after resource creation (like CrateDB Cloud clusters), both fixed sleeps and immediate polling have drawbacks. Fixed sleeps are brittle, but immediate polling risks negative DNS caching. A hybrid approach with a short initial sleep followed by polling with exponential backoff is preferred.
cratedb_toolkit/cluster/cli.py (1)
Learnt from: amotl
PR: crate/cratedb-toolkit#81
File: cratedb_toolkit/cluster/cli.py:82-89
Timestamp: 2025-04-27T03:07:04.375Z
Learning: In the cratedb-toolkit codebase, the `handle_command_errors` context manager is designed as an error handler for specific operations (not as a general protective wrapper). It's intentionally used to handle errors for targeted operations rather than wrapping entire function executions.
🪛 GitHub Check: codecov/patch
cratedb_toolkit/cluster/core.py

[warning] 99-101: cratedb_toolkit/cluster/core.py#L99-L101
Added lines #L99 - L101 were not covered by tests


[warning] 118-125: cratedb_toolkit/cluster/core.py#L118-L125
Added lines #L118 - L125 were not covered by tests


[warning] 128-131: cratedb_toolkit/cluster/core.py#L128-L131
Added lines #L128 - L131 were not covered by tests


[warning] 135-138: cratedb_toolkit/cluster/core.py#L135-L138
Added lines #L135 - L138 were not covered by tests


[warning] 142-142: cratedb_toolkit/cluster/core.py#L142
Added line #L142 was not covered by tests


[warning] 146-146: cratedb_toolkit/cluster/core.py#L146
Added line #L146 was not covered by tests


[warning] 148-148: cratedb_toolkit/cluster/core.py#L148
Added line #L148 was not covered by tests


[warning] 150-153: cratedb_toolkit/cluster/core.py#L150-L153
Added lines #L150 - L153 were not covered by tests

cratedb_toolkit/cluster/cli.py

[warning] 80-82: cratedb_toolkit/cluster/cli.py#L80-L82
Added lines #L80 - L82 were not covered by tests


[warning] 94-94: cratedb_toolkit/cluster/cli.py#L94
Added line #L94 was not covered by tests


[warning] 96-97: cratedb_toolkit/cluster/cli.py#L96-L97
Added lines #L96 - L97 were not covered by tests


[warning] 100-100: cratedb_toolkit/cluster/cli.py#L100
Added line #L100 was not covered by tests


[warning] 112-112: cratedb_toolkit/cluster/cli.py#L112
Added line #L112 was not covered by tests


[warning] 114-115: cratedb_toolkit/cluster/cli.py#L114-L115
Added lines #L114 - L115 were not covered by tests


[warning] 118-118: cratedb_toolkit/cluster/cli.py#L118
Added line #L118 was not covered by tests


[warning] 138-138: cratedb_toolkit/cluster/cli.py#L138
Added line #L138 was not covered by tests


[warning] 140-143: cratedb_toolkit/cluster/cli.py#L140-L143
Added lines #L140 - L143 were not covered by tests


[warning] 146-149: cratedb_toolkit/cluster/cli.py#L146-L149
Added lines #L146 - L149 were not covered by tests


[warning] 151-151: cratedb_toolkit/cluster/cli.py#L151
Added line #L151 was not covered by tests


[warning] 157-160: cratedb_toolkit/cluster/cli.py#L157-L160
Added lines #L157 - L160 were not covered by tests


[warning] 166-171: cratedb_toolkit/cluster/cli.py#L166-L171
Added lines #L166 - L171 were not covered by tests

🪛 LanguageTool
doc/backlog/main.md

[style] ~4-~4: Consider using a more polite way of asking this question.
Context: # Main Backlog ## Iteration +0 - CLI: Is it possible to just use --url instead of `--sqlal...

(IS_IT_POSSIBLE_TO)


[style] ~5-~5: Consider using a more polite way of asking this question.
Context: ...`--sqlalchemy-url` or `--http-url`? - CLI: Is it possible to just use `--cluster` to address a cl...

(IS_IT_POSSIBLE_TO)

doc/cluster/backlog.md

[uncategorized] ~9-~9: The preposition “in” seems more likely in this position.
Context: ...issues], if you want to signal interest about them. If you feel you could whip up and...

(AI_EN_LECTOR_REPLACEMENT_PREPOSITION)


[formatting] ~19-~19: These punctuation marks differ from each other. Use only one if you are ending a sentence.
Context: ...=nightly? => Add `--channel` option!? ## Iteration +2 - Python API: Make `cl...

(DOUBLE_PUNCTUATION_PREMIUM)


[uncategorized] ~26-~26: Loose punctuation mark.
Context: ...upport. Powershell? - ctk cluster list: New - cluster/core.py: FIXME: Accept ...

(UNLIKELY_OPENING_PUNCTUATION)


[formatting] ~29-~29: If the ‘because’ clause is essential to the meaning, do not use a comma before the clause.
Context: ...rt_managed_csv_local` - Shell: Currently, because parameters like `username` and `passwor...

(COMMA_BEFORE_BECAUSE)

⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: build-and-publish
  • GitHub Check: Generic: Python 3.12 on OS ubuntu-latest
  • GitHub Check: Generic: Python 3.9 on OS ubuntu-latest
  • GitHub Check: CFR for OS windows-latest
  • GitHub Check: Generic: Python 3.8 on OS ubuntu-latest
  • GitHub Check: Cloud: Python 3.13 on OS ubuntu-latest
🔇 Additional comments (11)
cratedb_toolkit/io/croud.py (3)

23-84: Well-structured CloudJob dataclass introduction

This is a clean implementation of a dataclass that encapsulates job details with proper error handling for missing fields. The properties and methods provide convenient access to job attributes without having to navigate nested dictionaries.
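Such a dataclass wrapper could look roughly like this (a minimal sketch; field and property names are illustrative, not the actual `CloudJob` API):

```python
from dataclasses import dataclass, field

@dataclass
class JobInfo:
    """Sketch: wrap the raw job payload so callers avoid nested-dict access."""
    info: dict = field(default_factory=dict)

    @property
    def found(self) -> bool:
        return bool(self.info)

    @property
    def status(self) -> str:
        return self.info.get("status", "UNKNOWN")

    @property
    def message(self) -> str:
        return self.info.get("progress", {}).get("message", "")

job = JobInfo(info={"status": "SUCCEEDED",
                    "progress": {"message": "Import finished"}})
```

Using `.get()` with defaults means missing fields yield sensible fallbacks instead of raising `KeyError`.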


67-84: Good workaround for upstream API flaw with a clear FIXME note

The method properly addresses the issue with unquoted table names in the API response. The documentation clearly explains the problem, solution, and signals this is temporary via the FIXME comment.
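The identifier-quoting workaround can be sketched like this (assumed helper, not the toolkit's actual function), so mixed-case or hyphenated names like `nab-machine-failure` survive the round trip:

```python
def quote_table(schema, table):
    """Sketch: double-quote identifiers, escaping embedded quotes by doubling."""
    quoted = '"' + table.replace('"', '""') + '"'
    if schema is not None:
        quoted = '"' + schema.replace('"', '""') + '"' + "." + quoted
    return quoted

addr = quote_table("doc", "nab-machine-failure")
```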


142-149: Robust retry mechanism implementation for job retrieval

The code now includes a solid retry mechanism to find jobs after submission, with proper error handling if the job never appears in the listing.

cratedb_toolkit/cluster/croud.py (5)

22-90: Well-structured URL generators improve code organization

The CloudRootUrlGenerator and CloudClusterUrlGenerator classes provide a clean, maintainable approach to generating API URLs. This pattern reduces duplication and centralizes URL construction logic.
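The URL-generator pattern could be sketched as follows (the base URL and endpoint paths here are assumptions for illustration, not the verified CrateDB Cloud API layout):

```python
class ClusterUrlGenerator:
    """Sketch: centralize API URL construction for one cluster."""
    def __init__(self, base="https://console.cratedb.cloud/api/v2", cluster_id=None):
        self.base = base.rstrip("/")
        self.cluster_id = cluster_id

    @property
    def cluster(self) -> str:
        return f"{self.base}/clusters/{self.cluster_id}/"

    @property
    def import_jobs(self) -> str:
        return self.cluster + "import-jobs/"

urls = ClusterUrlGenerator(cluster_id="e1b2")
```

Keeping every path in one place means an API version bump touches a single class instead of scattered string literals.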


198-226: Smart subscription selection with clear error messages

The implementation intelligently selects a subscription when only one exists and provides clear error messages when there are none or multiple subscriptions. The username and password requirements are properly validated with helpful error messages.
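The selection logic described could look roughly like this (a hedged sketch; function name and payload shape are assumptions):

```python
def select_subscription(subscriptions):
    """Sketch: auto-select when exactly one subscription exists,
    otherwise fail with an actionable message."""
    if not subscriptions:
        raise ValueError(
            "No subscriptions found; create one in the Cloud console first")
    if len(subscriptions) > 1:
        ids = ", ".join(s["id"] for s in subscriptions)
        raise ValueError(
            f"Multiple subscriptions found ({ids}); select one explicitly")
    return subscriptions[0]["id"]

sub = select_subscription([{"id": "sub-123"}])

try:
    select_subscription([])
    empty_rejected = False
except ValueError:
    empty_rejected = True
```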


228-252: Improved deployment configuration with environment variables

The code now uses environment variables for all deployment parameters, including tier and product name, making the deployment process more flexible without requiring code changes.
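Reading deployment parameters from the environment might be sketched like this (the variable names and defaults are illustrative, not necessarily the toolkit's actual ones):

```python
import os

def deployment_config(env=os.environ):
    """Sketch: deployment parameters come from the environment,
    with defaults so nothing is hard-coded in the deployment path."""
    return {
        "tier": env.get("CRATEDB_CLOUD_TIER", "default"),
        "product_name": env.get("CRATEDB_CLOUD_PRODUCT_NAME", "crfree"),
    }

config = deployment_config(env={"CRATEDB_CLOUD_TIER": "gp3"})
```

Passing `env` explicitly also makes the function trivial to test without mutating the process environment.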


354-382: Remote resource detection logic improved

The code now properly detects remote resources using the presence of "://" in the URL, rather than relying solely on Path.exists(), which can give false positives/negatives in certain scenarios.
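The scheme-based check amounts to a one-liner (sketch):

```python
def is_remote(resource: str) -> bool:
    """Classify by URL scheme instead of Path.exists(), which misfires
    when a local file happens to share the resource's name."""
    return "://" in resource

remote = is_remote("https://cdn.crate.io/downloads/datasets/x.csv")
local = is_remote("data/x.csv")
```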


383-385: Safe command argument construction

Arguments are now passed as separate items rather than being concatenated with =, which prevents issues with special characters in values.
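The separate-argument construction can be sketched like this (hypothetical helper; the point is that each flag and its value are distinct argv items, so values containing `=` or spaces survive intact):

```python
def build_args(command, **params):
    """Sketch: append '--key' and its value as two argv items
    instead of concatenating them with '='."""
    argv = list(command)
    for key, value in params.items():
        argv.append(f"--{key.replace('_', '-')}")
        argv.append(str(value))
    return argv

argv = build_args(["croud", "clusters", "import-jobs", "create"],
                  file_path="a=b.csv")
```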

cratedb_toolkit/cluster/core.py (3)

37-101: Well-designed settings class with mutual exclusiveness validation

The ManagedClusterSettings class provides a comprehensive approach for handling cluster settings with proper validation of mutually exclusive options. The separation of CLI and environment variable configuration is clean and follows best practices.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 99-101: cratedb_toolkit/cluster/core.py#L99-L101
Added lines #L99 - L101 were not covered by tests


140-163: Robust context manager implementation with proper error handling

The __enter__ and __exit__ methods ensure proper resource management, with careful handling of exceptions and appropriate logging. The design ensures both connections and clusters are cleaned up properly, with separate try/except/finally blocks.

The static analysis indicates this code lacks test coverage. Consider adding tests specifically for the context manager behavior:

# Example test for context manager behavior
def test_managed_cluster_context_manager():
    # Setup test doubles/mocks
    cluster = ManagedCluster(cluster_id="test-id")
    cluster.start = MagicMock(return_value="start_result")
    cluster.close_connections = MagicMock()
    cluster.stop = MagicMock()
    
    # Test normal operation
    with cluster as result:
        assert result == "start_result"
    
    # Verify method calls
    cluster.start.assert_called_once()
    cluster.close_connections.assert_called_once()
    
    # Test with stop_on_exit=True
    cluster = ManagedCluster(cluster_id="test-id", stop_on_exit=True)
    cluster.start = MagicMock(return_value="start_result")
    cluster.close_connections = MagicMock()
    cluster.stop = MagicMock()
    
    with cluster:
        pass
    
    cluster.stop.assert_called_once()
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 142-142: cratedb_toolkit/cluster/core.py#L142
Added line #L142 was not covered by tests


[warning] 146-146: cratedb_toolkit/cluster/core.py#L146
Added line #L146 was not covered by tests


[warning] 148-148: cratedb_toolkit/cluster/core.py#L148
Added line #L148 was not covered by tests


[warning] 150-153: cratedb_toolkit/cluster/core.py#L150-L153
Added lines #L150 - L153 were not covered by tests


586-642: Excellent cluster factory pattern with clear validation

The DatabaseCluster class provides a clean factory pattern that enforces mutual exclusivity of connection methods and clearly signals errors with specific exceptions. The whitespace validation in address counting is a nice touch to avoid empty strings being counted as valid options.

Comment on lines +40 to +46
    @property
    def cloud_id(self) -> str:
        return self.cloud["id"]

    @property
    def cloud_name(self) -> str:
        return self.cloud["name"]

🛠️ Refactor suggestion

Add error handling to cloud_id and cloud_name properties

The properties directly access dictionary keys which could raise KeyError if the keys don't exist.

@property
def cloud_id(self) -> str:
-    return self.cloud["id"]
+    if "id" not in self.cloud:
+        raise ValueError("Cloud cluster information is missing 'id' field")
+    return self.cloud["id"]

@property
def cloud_name(self) -> str:
-    return self.cloud["name"]
+    if "name" not in self.cloud:
+        raise ValueError("Cloud cluster information is missing 'name' field")
+    return self.cloud["name"]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

    @property
    def cloud_id(self) -> str:
        if "id" not in self.cloud:
            raise ValueError("Cloud cluster information is missing 'id' field")
        return self.cloud["id"]

    @property
    def cloud_name(self) -> str:
        if "name" not in self.cloud:
            raise ValueError("Cloud cluster information is missing 'name' field")
        return self.cloud["name"]
🤖 Prompt for AI Agents (early access)
In cratedb_toolkit/cluster/model.py around lines 40 to 46, the cloud_id and cloud_name properties directly access dictionary keys without error handling, which can raise KeyError if the keys are missing. Modify these properties to safely access the keys using the dictionary get method with a default value or add try-except blocks to catch KeyError and handle it gracefully, such as returning None or a default string.

Comment on lines +204 to +230
def probe(self) -> "ManagedCluster":
"""
Probe a CrateDB Cloud cluster, API-wise.
TODO: Investigate callers, and reduce number of invocations.
TODO: self._jwt_ctx is created once in probe() and reused for every query.
If the per-cluster JWT expires (default 1 h) subsequent queries will fail.
Consider refreshing self._jwt_ctx (or simply calling jwt_token_patch()
inside query()) when probe() is older than e.g. 30 min.
"""
try:
self.info = ClusterInformation.from_id_or_name(cluster_id=self.cluster_id, cluster_name=self.cluster_name)
self.cluster_id = self.info.cloud["id"]
self.cluster_name = self.info.cloud["name"]
self.address = DatabaseAddress.from_httpuri(self.info.cloud["url"])
self._jwt_ctx = jwt_token_patch(self.info.jwt.token)
if self.cluster_id:
self.operation = CloudClusterServices(cluster_id=self.cluster_id)

except (CroudException, DatabaseAddressMissingError) as ex:
self.exists = False
if "Cluster not found" not in str(ex):
raise
if self.info.cloud:
self.exists = True
logger.info(f"Cluster information: name={self.info.cloud.get('name')}, url={self.info.cloud.get('url')}")
return self

🛠️ Refactor suggestion

JWT token expiration not handled

The current implementation creates a JWT token context once in probe() and reuses it for every query. If the token expires (typically after 1 hour), subsequent queries will fail.

Consider tracking token creation time and refreshing it when needed:

def __init__(self, ...):
    # Existing code...
    self._jwt_ctx = nullcontext()
+   self._jwt_token_created_at = 0  # Timestamp when token was created

def probe(self):
    # Existing code...
    self._jwt_ctx = jwt_token_patch(self.info.jwt.token)
+   self._jwt_token_created_at = time.time()
    # Rest of the method...

def query(self, sql: str):
    # Ensure we have cluster connection details
    if not self.info or not self.info.cloud.get("url"):
        self.probe()
+   
+   # Refresh token if it's older than 30 minutes (to be safe before the 1h expiry)
+   current_time = time.time()
+   if current_time - self._jwt_token_created_at > 1800:  # 30 minutes
+       logger.debug("Refreshing JWT token")
+       self._jwt_ctx = jwt_token_patch(self.info.jwt.token)
+       self._jwt_token_created_at = current_time
    
    with self._jwt_ctx:
        # Rest of the method...

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents (early access)
In cratedb_toolkit/cluster/core.py around lines 204 to 230, the JWT token context is created once in the probe() method and reused indefinitely, which causes failures when the token expires after about an hour. To fix this, add tracking of the JWT token creation time when setting self._jwt_ctx, then modify the query logic to check the token age before each query and refresh the token by calling jwt_token_patch() again if the token is older than a threshold (e.g., 30 minutes). This ensures the token is valid for all queries after probe() is called.
