Description
When training a model with the lightgbm.dask estimators, this warning is always emitted:
/Users/runner/miniforge/envs/test-env/lib/python3.11/site-packages/lightgbm/dask.py:549: UserWarning: Parameter n_jobs will be ignored.
_log_warning(f"Parameter {param_alias} will be ignored.")
Nothing in lightgbm's public interface can suppress this, and it shows up even when every parameter is left at its default value. That's a little annoying and a little confusing, so it should be changed.
Specifically, for num_threads and its aliases, the warning should not be raised if the value is -1 or None.
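The proposed behavior could be sketched roughly as follows. This is a standalone illustration, not the code in lightgbm/dask.py: drop_thread_params is a hypothetical helper name, the alias set is copied from the _ConfigAliases.get("num_threads") output quoted later in this issue, and the standard-library warnings module stands in for lightgbm's internal _log_warning.

```python
import warnings

# Alias set copied from the _ConfigAliases.get("num_threads") output in this issue.
NUM_THREADS_ALIASES = {"n_jobs", "num_threads", "num_thread", "nthread", "nthreads"}


def drop_thread_params(params):
    """Hypothetical helper: remove thread-count aliases from a params dict.

    The alias is always removed, but a warning is only emitted when the
    caller set it to something other than None or -1.
    """
    cleaned = dict(params)  # work on a copy so the caller's dict is untouched
    for alias in NUM_THREADS_ALIASES:
        if alias not in cleaned:
            continue
        value = cleaned.pop(alias)
        if value is not None and value != -1:
            warnings.warn(f"Parameter {alias} will be ignored.", UserWarning)
    return cleaned
```

With this shape, default estimator settings such as n_jobs=None pass through silently, while an explicit n_jobs=4 still tells the user that the value is being overridden.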
Reproducible example
import dask.array as da
import lightgbm as lgb
from distributed import Client, LocalCluster
from sklearn.datasets import make_blobs
X, y = make_blobs(n_samples=1000, n_features=50, centers=2)
cluster = LocalCluster()
client = Client(cluster)
dX = da.from_array(X, chunks=(100, 50))
dy = da.from_array(y, chunks=(100,))
dask_model = lgb.DaskLGBMClassifier(n_estimators=10)
dask_model.fit(dX, dy)
Environment info
LightGBM version or commit hash: 3654eca
Command(s) you used to install LightGBM
cmake -B build -S .
cmake --build build --target _lightgbm
sh build-python.sh install --precompile
Additional Comments
This is coming from here:
LightGBM/python-package/lightgbm/dask.py
Lines 544 to 550 in 3654eca
# Some passed-in parameters can be removed:
# * 'num_machines': set automatically from Dask worker list
# * 'num_threads': overridden to match nthreads on each Dask process
for param_alias in _ConfigAliases.get("num_machines", "num_threads"):
    if param_alias in params:
        _log_warning(f"Parameter {param_alias} will be ignored.")
        params.pop(param_alias)
I think this happens because n_jobs is an alias for num_threads.
from lightgbm.basic import _ConfigAliases
_ConfigAliases.get("num_threads")
# {'n_jobs', 'num_threads', 'num_thread', 'nthread', 'nthreads'}
And n_jobs is guaranteed to be present in params there, because it's in the signature of the estimators.
LightGBM/python-package/lightgbm/dask.py
Line 1135 in 3654eca
n_jobs: Optional[int] = None,
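To illustrate why a constructor argument always ends up in params, here is a minimal sketch of the scikit-learn style convention in which get_params() is derived from the __init__ signature. TinyEstimator is a hypothetical stand-in written for this example; it is not how lightgbm or scikit-learn actually implement get_params().

```python
import inspect


class TinyEstimator:
    """Hypothetical stand-in for a scikit-learn style estimator."""

    def __init__(self, n_estimators=100, n_jobs=None):
        self.n_estimators = n_estimators
        self.n_jobs = n_jobs

    def get_params(self):
        # Every name in the constructor signature is reported,
        # whether or not the caller passed it explicitly.
        names = inspect.signature(self.__init__).parameters
        return {name: getattr(self, name) for name in names}


# n_jobs was never passed, yet it is still a key in the params dict.
params = TinyEstimator(n_estimators=10).get_params()
print("n_jobs" in params)
```

Under that convention, a check of the form `if param_alias in params` matches even when the user left n_jobs at its default, which is consistent with the warning appearing on every fit.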
I noticed this in CI logs: