PyCaret: On CI, only use model `arima`, providing a smaller MASE by amotl · Pull Request #300 · crate/cratedb-examples

amotl · 2024-02-13T22:07:05Z

Problem

ML: Update to mlflow-cratedb 2.10.2 #296 is constantly failing now.
AutoML: CI trips with ValueError: Input contains NaN. #298

Solution

@andnig suggested:

Use a different model for the test run, one which has less MASE.

Thanks!

References

AutoML: CI trips with CellTimeoutError / ValueError: Input contains NaN. #170 (comment)
AutoML: CI trips with ValueError: Input contains NaN. #298 (comment)

amotl · 2024-02-27T12:12:43Z

@andnig suggested at #298 (comment) that this patch is wrong, so I am closing the PR again. Thanks!

coderabbitai · 2026-04-24T01:19:34Z

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2208d354-b028-4d0b-b022-7d646f2ede1f

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check	⚠️ Warning	The pull request title references 'arima' model, but the actual changes show only 'et_cds_dt' is used in CI, not 'arima'. This is misleading.	Update the title to accurately reflect that only 'et_cds_dt' model is used on CI, not 'arima'. For example: 'PyCaret: On CI, only use model et_cds_dt, providing smallest MASE'.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description check	✅ Passed	The description is directly related to the changeset, explaining the problem (failing CI tests GH-296 and GH-298) and the solution (using a model with lower MASE) implemented in the code changes.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch amo/tame-mlflow

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8208e842-75c3-4366-b8ce-870cb047a493

📥 Commits

Reviewing files that changed from the base of the PR and between 8e3582f and 452b061.

📒 Files selected for processing (2)

topic/machine-learning/pycaret/automl_timeseries_forecasting_with_pycaret.ipynb
topic/machine-learning/pycaret/automl_timeseries_forecasting_with_pycaret.py

coderabbitai · 2026-04-24T01:22:16Z

    "    best_models = compare_models(sort=\"MASE\",\n",
-    "                                 include=[\"ets\", \"et_cds_dt\", \"naive\"],\n",
+    "                                 include=[\"et_cds_dt\"],\n",
    "                                 n_select=3)\n",


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Confirm downstream usage of `best_models` to see if 3 entries are assumed. rg -nP -C3 '\bbest_models\b' --iglob '*pycaret*' # List PyCaret time-series model ids used in this repo for sanity-checking et_cds_dt vs ets_cds_dt. rg -nP -C2 '\b(et|ets)_cds_dt\b' --iglob '*pycaret*'

Repository: crate/cratedb-examples

Length of output: 12423

🌐 Web query:

PyCaret time series model ids: does "et_cds_dt" and/or "ets_cds_dt" exist, and what do they refer to?

💡 Result:

Yes, "et_cds_dt" exists as a PyCaret time series model ID. It refers to "Extra Trees w/ Cond. Deseasonalize & Detrending", an Extra Trees Regressor (from scikit-learn) applied after conditional deseasonalization and detrending of the time series data. "ets_cds_dt" does not exist; "ets" is a separate model ID for the ETS (Exponential Smoothing State Space Model) forecaster, but it is not suffixed with "_cds_dt".

Citations:

1: https://pycaret.readthedocs.io/en/stable/api/time_series.html

2: https://github.com/pycaret/pycaret/blob/master/pycaret/containers/models/time_series.py

3: https://pycaret.readthedocs.io/en/latest/api/time_series.html

4: https://github.com/pycaret/pycaret/blob/master/tutorials/time_series/forecasting/univariate_without_exogeneous_part2.ipynb

Fix n_select=3 to align with the single-model include list.

With include=["et_cds_dt"], compare_models has only one candidate to rank, so n_select=3 will silently return a single-element list on CI. Downstream code iterates over best_models (lines 1163, 1660) and passes it to blend_models, which may behave differently with 1 model versus 3. Change n_select=3 to n_select=1 on the CI branch to align intent with behavior.

Proposed fix

if "PYTEST_CURRENT_TEST" in os.environ: best_models = compare_models(sort="MASE", include=["et_cds_dt"], - n_select=3) + n_select=1)

> If I look at the failed CI run, I see the the esm model has an > incredibly high MASE and RMSSE. This mostly indicates that the > model is not very well suited for the data. I suggested it, as > it is very lightweight, but well, too lightweight as it seems 😓. > To go forward, you could use a different model for the test run, one > which has less MASE. Run the whole PyCaret model suite locally and > select one of the top 5 models instead of `exp_smooth`.

amotl force-pushed the amo/tame-mlflow branch from 1e7f486 to e8c65af Compare February 13, 2024 22:31

amotl changed the title ~~MLflow: Use models "arima" and "ets" only on CI, to reduce consumption~~ MLflow: On CI, use model "ets_cds_dt" only, having the smallest MASE Feb 13, 2024

amotl mentioned this pull request Feb 14, 2024

AutoML: CI trips with ValueError: Input contains NaN. #298

Open

amotl closed this Feb 27, 2024

amotl deleted the amo/tame-mlflow branch February 27, 2024 12:12

amotl restored the amo/tame-mlflow branch April 24, 2026 01:19

amotl reopened this Apr 24, 2026

amotl force-pushed the amo/tame-mlflow branch from e8c65af to 452b061 Compare April 24, 2026 01:19

coderabbitai Bot reviewed Apr 24, 2026

View reviewed changes

amotl linked an issue Apr 24, 2026 that may be closed by this pull request

AutoML: CI trips with ValueError: Input contains NaN. #298

Open

amotl changed the title ~~MLflow: On CI, use model "ets_cds_dt" only, having the smallest MASE~~ PyCaret: On CI, use model "ets_cds_dt" only, having the smallest MASE Apr 24, 2026

amotl changed the title ~~PyCaret: On CI, use model "ets_cds_dt" only, having the smallest MASE~~ PyCaret: On CI, use model "ets_cds_dt" only, providing the smallest MASE Apr 24, 2026

amotl force-pushed the amo/tame-mlflow branch 3 times, most recently from 707cbcd to 89fb1ad Compare April 25, 2026 22:42

amotl changed the title ~~PyCaret: On CI, use model "ets_cds_dt" only, providing the smallest MASE~~ PyCaret: On CI, only use model arima, providing a smaller MASE Apr 25, 2026

amotl force-pushed the amo/tame-mlflow branch from 89fb1ad to d1a00b3 Compare April 25, 2026 23:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PyCaret: On CI, only use model `arima`, providing a smaller MASE#300

PyCaret: On CI, only use model `arima`, providing a smaller MASE#300
amotl wants to merge 1 commit intomainfrom
amo/tame-mlflow

amotl commented Feb 13, 2024 •

edited

Loading

Uh oh!

amotl commented Feb 27, 2024 •

edited

Loading

Uh oh!

coderabbitai Bot commented Apr 24, 2026 •

edited

Loading

Review skipped

❌ Failed checks (2 warnings)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Apr 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

amotl commented Feb 13, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Solution

References

Uh oh!

amotl commented Feb 27, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

❌ Failed checks (2 warnings)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

amotl commented Feb 13, 2024 •

edited

Loading

amotl commented Feb 27, 2024 •

edited

Loading

coderabbitai Bot commented Apr 24, 2026 •

edited

Loading