Skip to content

PyCaret: On CI, only use model arima, providing a smaller MASE#300

Draft
amotl wants to merge 1 commit intomainfrom
amo/tame-mlflow
Draft

PyCaret: On CI, only use model arima, providing a smaller MASE#300
amotl wants to merge 1 commit intomainfrom
amo/tame-mlflow

Conversation

@amotl
Copy link
Copy Markdown
Member

@amotl amotl commented Feb 13, 2024

@amotl amotl changed the title MLflow: Use models "arima" and "ets" only on CI, to reduce consumption MLflow: On CI, use model "ets_cds_dt" only, having the smallest MASE Feb 13, 2024
@amotl
Copy link
Copy Markdown
Member Author

amotl commented Feb 27, 2024

@andnig suggested at #298 (comment) that this patch is wrong, so I am closing the PR again. Thanks!

@amotl amotl closed this Feb 27, 2024
@amotl amotl deleted the amo/tame-mlflow branch February 27, 2024 12:12
@amotl amotl restored the amo/tame-mlflow branch April 24, 2026 01:19
@amotl amotl reopened this Apr 24, 2026
@amotl amotl force-pushed the amo/tame-mlflow branch from e8c65af to 452b061 Compare April 24, 2026 01:19
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 24, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2208d354-b028-4d0b-b022-7d646f2ede1f

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ⚠️ Warning The pull request title references 'arima' model, but the actual changes show only 'et_cds_dt' is used in CI, not 'arima'. This is misleading. Update the title to accurately reflect that only 'et_cds_dt' model is used on CI, not 'arima'. For example: 'PyCaret: On CI, only use model et_cds_dt, providing smallest MASE'.
✅ Passed checks (3 passed)
Check name Status Explanation
Description check ✅ Passed The description is directly related to the changeset, explaining the problem (failing CI tests GH-296 and GH-298) and the solution (using a model with lower MASE) implemented in the code changes.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch amo/tame-mlflow

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8208e842-75c3-4366-b8ce-870cb047a493

📥 Commits

Reviewing files that changed from the base of the PR and between 8e3582f and 452b061.

📒 Files selected for processing (2)
  • topic/machine-learning/pycaret/automl_timeseries_forecasting_with_pycaret.ipynb
  • topic/machine-learning/pycaret/automl_timeseries_forecasting_with_pycaret.py

Comment on lines 1090 to 1092
" best_models = compare_models(sort=\"MASE\",\n",
" include=[\"ets\", \"et_cds_dt\", \"naive\"],\n",
" include=[\"et_cds_dt\"],\n",
" n_select=3)\n",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm downstream usage of `best_models` to see if 3 entries are assumed.
rg -nP -C3 '\bbest_models\b' --iglob '*pycaret*'
# List PyCaret time-series model ids used in this repo for sanity-checking et_cds_dt vs ets_cds_dt.
rg -nP -C2 '\b(et|ets)_cds_dt\b' --iglob '*pycaret*'

Repository: crate/cratedb-examples

Length of output: 12423


🌐 Web query:

PyCaret time series model ids: does "et_cds_dt" and/or "ets_cds_dt" exist, and what do they refer to?

💡 Result:

Yes, "et_cds_dt" exists as a PyCaret time series model ID. It refers to "Extra Trees w/ Cond. Deseasonalize & Detrending", an Extra Trees Regressor (from scikit-learn) applied after conditional deseasonalization and detrending of the time series data. "ets_cds_dt" does not exist; "ets" is a separate model ID for the ETS (Exponential Smoothing State Space Model) forecaster, but it is not suffixed with "_cds_dt".

Citations:


Fix n_select=3 to align with the single-model include list.

With include=["et_cds_dt"], compare_models has only one candidate to rank, so n_select=3 will silently return a single-element list on CI. Downstream code iterates over best_models (lines 1163, 1660) and passes it to blend_models, which may behave differently with 1 model versus 3. Change n_select=3 to n_select=1 on the CI branch to align intent with behavior.

Proposed fix
 if "PYTEST_CURRENT_TEST" in os.environ:
     best_models = compare_models(sort="MASE",
                                  include=["et_cds_dt"],
-                                 n_select=3)
+                                 n_select=1)

@amotl amotl linked an issue Apr 24, 2026 that may be closed by this pull request
@amotl amotl changed the title MLflow: On CI, use model "ets_cds_dt" only, having the smallest MASE PyCaret: On CI, use model "ets_cds_dt" only, having the smallest MASE Apr 24, 2026
@amotl amotl changed the title PyCaret: On CI, use model "ets_cds_dt" only, having the smallest MASE PyCaret: On CI, use model "ets_cds_dt" only, providing the smallest MASE Apr 24, 2026
@amotl amotl force-pushed the amo/tame-mlflow branch 3 times, most recently from 707cbcd to 89fb1ad Compare April 25, 2026 22:42
@amotl amotl changed the title PyCaret: On CI, use model "ets_cds_dt" only, providing the smallest MASE PyCaret: On CI, only use model arima, providing a smaller MASE Apr 25, 2026
> If I look at the failed CI run, I see the the esm model has an
> incredibly high MASE and RMSSE. This mostly indicates that the
> model is not very well suited for the data. I suggested it, as
> it is very lightweight, but well, too lightweight as it seems 😓.

> To go forward, you could use a different model for the test run, one
> which has less MASE. Run the whole PyCaret model suite locally and
> select one of the top 5 models instead of `exp_smooth`.
@amotl amotl force-pushed the amo/tame-mlflow branch from 89fb1ad to d1a00b3 Compare April 25, 2026 23:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

AutoML: CI trips with ValueError: Input contains NaN.

1 participant