PyCaret: On CI, only use model arima, providing a smaller MASE #300
Force-pushed from 1e7f486 to e8c65af.
@andnig suggested in #298 (comment) that this patch is wrong, so I am closing the PR again. Thanks!
Actionable comments posted: 1
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8208e842-75c3-4366-b8ce-870cb047a493
📒 Files selected for processing (2)
topic/machine-learning/pycaret/automl_timeseries_forecasting_with_pycaret.ipynb
topic/machine-learning/pycaret/automl_timeseries_forecasting_with_pycaret.py
   " best_models = compare_models(sort=\"MASE\",\n",
-  " include=[\"ets\", \"et_cds_dt\", \"naive\"],\n",
+  " include=[\"et_cds_dt\"],\n",
   " n_select=3)\n",
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Confirm downstream usage of `best_models` to see if 3 entries are assumed.
rg -nP -C3 '\bbest_models\b' --iglob '*pycaret*'
# List PyCaret time-series model ids used in this repo for sanity-checking et_cds_dt vs ets_cds_dt.
rg -nP -C2 '\b(et|ets)_cds_dt\b' --iglob '*pycaret*'
Repository: crate/cratedb-examples
Length of output: 12423
🌐 Web query:
PyCaret time series model ids: does "et_cds_dt" and/or "ets_cds_dt" exist, and what do they refer to?
💡 Result:
Yes, "et_cds_dt" exists as a PyCaret time series model ID. It refers to "Extra Trees w/ Cond. Deseasonalize & Detrending", an Extra Trees Regressor (from scikit-learn) applied after conditional deseasonalization and detrending of the time series data. "ets_cds_dt" does not exist; "ets" is a separate model ID for the ETS (Exponential Smoothing State Space Model) forecaster, but it is not suffixed with "_cds_dt".
Citations:
- 1: https://pycaret.readthedocs.io/en/stable/api/time_series.html
- 2: https://github.com/pycaret/pycaret/blob/master/pycaret/containers/models/time_series.py
- 3: https://pycaret.readthedocs.io/en/latest/api/time_series.html
- 4: https://github.com/pycaret/pycaret/blob/master/tutorials/time_series/forecasting/univariate_without_exogeneous_part2.ipynb
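A typo in a model ID, such as `ets_cds_dt` for `et_cds_dt`, is easy to make. The following is a minimal, PyCaret-free sketch of a pre-flight check that flags unknown IDs and suggests close matches using the standard library's `difflib.get_close_matches`. The helper `check_model_ids` and the `KNOWN_MODEL_IDS` set are hypothetical: the set only contains IDs mentioned in this thread, and in a real notebook the authoritative list would come from PyCaret itself (for example, the `models()` function in `pycaret.time_series`).

```python
import difflib

# Hypothetical subset of PyCaret time-series model IDs, limited to the IDs
# mentioned in this discussion. Not an exhaustive or authoritative list.
KNOWN_MODEL_IDS = {"naive", "arima", "ets", "exp_smooth", "et_cds_dt"}


def check_model_ids(requested):
    """Return a dict mapping each unknown model ID to its closest known IDs."""
    problems = {}
    for model_id in requested:
        if model_id not in KNOWN_MODEL_IDS:
            # Suggest near-misses, e.g. "et_cds_dt" for "ets_cds_dt".
            problems[model_id] = difflib.get_close_matches(
                model_id, sorted(KNOWN_MODEL_IDS), n=3, cutoff=0.6
            )
    return problems


# "ets" is valid; "ets_cds_dt" is flagged with "et_cds_dt" as the suggestion.
print(check_model_ids(["ets", "ets_cds_dt"]))
```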
Fix n_select=3 to align with the single-model include list.
With include=["et_cds_dt"], compare_models has only one candidate to rank, so n_select=3 will silently return a single-element list on CI. Downstream code iterates over best_models (lines 1163, 1660) and passes it to blend_models, which may behave differently with 1 model versus 3. Change n_select=3 to n_select=1 on the CI branch to align intent with behavior.
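The reviewer's point about `n_select` can be illustrated without PyCaret. The sketch below is a plain-Python stand-in for "rank candidates by MASE, keep the top n". The names `select_top_n` and `effective_n_select` are hypothetical, the scores are made-up placeholders, and the code does not reproduce PyCaret's `compare_models`, in particular not its return type when only one model is selected.

```python
def select_top_n(scores, n_select):
    """Rank candidate models by score (lower is better) and keep the top n.

    Hypothetical stand-in for a "compare and select" step; not PyCaret code.
    Slicing never fails, so asking for more models than exist simply returns
    fewer entries, with no error or warning.
    """
    ranked = sorted(scores, key=scores.get)
    return ranked[:n_select]


def effective_n_select(include, requested):
    """Clamp the request to the number of candidates actually included."""
    return min(requested, len(include))


# Made-up placeholder scores, for illustration only.
full_run = {"ets": 2.1, "et_cds_dt": 1.4, "naive": 3.0}
ci_run = {"et_cds_dt": 1.4}

print(select_top_n(full_run, n_select=3))  # three models
print(select_top_n(ci_run, n_select=3))    # only one model, silently
print(effective_n_select(["et_cds_dt"], requested=3))
```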
Proposed fix
if "PYTEST_CURRENT_TEST" in os.environ:
    best_models = compare_models(sort="MASE",
                                 include=["et_cds_dt"],
-                                n_select=3)
+                                n_select=1)
Force-pushed from 707cbcd to 89fb1ad.
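The approach named in the PR title, restricting the CI run to `arima` while a local run compares the full model suite, could be factored into a small helper so that the `include` list and `n_select` cannot drift apart. This is a sketch under assumptions: `candidate_models` and `compare_kwargs` are hypothetical names, and the `PYTEST_CURRENT_TEST` check is borrowed from the proposed fix above.

```python
import os


def candidate_models(env=None):
    """Return the model IDs to compare, or None to compare all models.

    Hypothetical helper. pytest sets PYTEST_CURRENT_TEST while a test runs,
    so its presence is treated as the "running on CI" signal.
    """
    env = os.environ if env is None else env
    if "PYTEST_CURRENT_TEST" in env:
        return ["arima"]
    return None


def compare_kwargs(env=None):
    """Build keyword arguments for a compare step, keeping n_select in sync."""
    include = candidate_models(env)
    if include is None:
        # Full local run: rank everything and keep the top 3, as in the notebook.
        return {"sort": "MASE", "n_select": 3}
    # CI run: never ask for more models than were included.
    return {"sort": "MASE", "include": include, "n_select": len(include)}


# Possible usage (not executed here):
#   best_models = compare_models(**compare_kwargs())
print(compare_kwargs({"PYTEST_CURRENT_TEST": "test_notebook (call)"}))
print(compare_kwargs({}))
```

Whether downstream code that iterates over `best_models` still works when only one model is selected should be checked against the PyCaret version in use.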
Problem
ValueError: Input contains NaN. #298
Solution
@andnig suggested:
> If I look at the failed CI run, I see the esm model has an incredibly high MASE and RMSSE. This mostly indicates that the model is not very well suited for the data. I suggested it, as it is very lightweight, but well, too lightweight as it seems 😓.
> To go forward, you could use a different model for the test run, one which has less MASE. Run the whole PyCaret model suite locally and select one of the top 5 models instead of `exp_smooth`.
Thanks!
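The quoted suggestion hinges on MASE (and RMSSE) being very high. As a reference point, here is a minimal sketch of the textbook definitions: both scale the forecast error by the in-sample error of a seasonal naive forecast with period `m`, so a value above 1 means the model's forecast error is larger than that naive baseline's in-sample error. The function names are illustrative; PyCaret's own metric implementations may differ in details such as how `m` is chosen.

```python
import math


def _seasonal_naive_errors(train, m):
    # In-sample errors of the seasonal naive forecast y[t] = y[t - m].
    return [train[t] - train[t - m] for t in range(m, len(train))]


def mase(y_true, y_pred, train, m=1):
    """Mean Absolute Scaled Error: forecast MAE over in-sample naive MAE."""
    errors = _seasonal_naive_errors(train, m)
    scale = sum(abs(e) for e in errors) / len(errors)
    mae = sum(abs(a - f) for a, f in zip(y_true, y_pred)) / len(y_true)
    return mae / scale


def rmsse(y_true, y_pred, train, m=1):
    """Root Mean Squared Scaled Error: the squared-error analogue of MASE."""
    errors = _seasonal_naive_errors(train, m)
    scale = sum(e * e for e in errors) / len(errors)
    mse = sum((a - f) ** 2 for a, f in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse / scale)


train = [10, 12, 14, 16]
print(mase([18, 20], [17, 22], train))   # 0.75
print(rmsse([18, 20], [17, 22], train))
```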
References
- CellTimeoutError/ValueError: Input contains NaN. #170 (comment)
- ValueError: Input contains NaN. #298 (comment)