[python-package] [dask] Dask estimators raise an unavoidable warning #6797

@jameslamb

Description

When training a model with the lightgbm.dask estimators, this warning is always emitted:

/Users/runner/miniforge/envs/test-env/lib/python3.11/site-packages/lightgbm/dask.py:549: UserWarning: Parameter n_jobs will be ignored.
  _log_warning(f"Parameter {param_alias} will be ignored.")

Nothing in lightgbm's public interface can suppress this, and it shows up even when every parameter is left at its default value. That's a little annoying and a little confusing, so it should be changed.

Specifically, for num_threads and its aliases, the warning should not be raised if the value is -1 or None.
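A rough sketch of that behavior, outside of lightgbm, might look like the following. Everything here is an assumption for illustration: `drop_ignored_params` is a hypothetical helper rather than anything in lightgbm, the num_threads alias set is hard-coded from the `_ConfigAliases` output quoted in this issue, and num_machines is represented by its main name only.

```python
import warnings
from typing import Any, Dict

# Hard-coded for illustration; the real code would get these from _ConfigAliases.
NUM_THREADS_ALIASES = {"n_jobs", "num_threads", "num_thread", "nthread", "nthreads"}
NUM_MACHINES_ALIASES = {"num_machines"}  # simplification: main name only


def drop_ignored_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: return a copy of params without Dask-managed keys.

    num_machines always warns when present (unchanged behavior).
    num_threads aliases warn only when the value is something other than
    None or -1, i.e. only when the user asked for a specific thread count.
    """
    cleaned = dict(params)
    for alias in sorted(NUM_MACHINES_ALIASES):
        if alias in cleaned:
            warnings.warn(f"Parameter {alias} will be ignored.", UserWarning, stacklevel=2)
            cleaned.pop(alias)
    for alias in sorted(NUM_THREADS_ALIASES):
        if alias in cleaned:
            value = cleaned.pop(alias)
            if value is not None and value != -1:
                warnings.warn(f"Parameter {alias} will be ignored.", UserWarning, stacklevel=2)
    return cleaned
```

With this shape, `drop_ignored_params({"n_jobs": None, "n_estimators": 10})` drops `n_jobs` silently, while `drop_ignored_params({"n_jobs": 4})` still warns.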

Reproducible example

import dask.array as da
import lightgbm as lgb
from distributed import Client, LocalCluster
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=1000, n_features=50, centers=2)

cluster = LocalCluster()
client = Client(cluster)

dX = da.from_array(X, chunks=(100, 50))
dy = da.from_array(y, chunks=(100,))
dask_model = lgb.DaskLGBMClassifier(n_estimators=10)
dask_model.fit(dX, dy)

Environment info

LightGBM version or commit hash: 3654eca

Command(s) you used to install LightGBM

cmake -B build -S .
cmake --build build --target _lightgbm
sh build-python.sh install --precompile

Additional Comments

This is coming from here:

# Some passed-in parameters can be removed:
#   * 'num_machines': set automatically from Dask worker list
#   * 'num_threads': overridden to match nthreads on each Dask process
for param_alias in _ConfigAliases.get("num_machines", "num_threads"):
    if param_alias in params:
        _log_warning(f"Parameter {param_alias} will be ignored.")
        params.pop(param_alias)

I think this happens because n_jobs is an alias for num_threads.

from lightgbm.basic import _ConfigAliases
_ConfigAliases.get("num_threads")
# {'n_jobs', 'num_threads', 'num_thread', 'nthread', 'nthreads'}

And n_jobs is guaranteed to be present in params at that point, because it's in the constructor signature of the estimators:

n_jobs: Optional[int] = None,
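To illustrate why a constructor argument ends up in params even when the caller never sets it, here is a small standalone sketch. `TinyEstimator` is a hypothetical stand-in, not a lightgbm class, and its `get_params()` only imitates the scikit-learn convention of collecting every constructor argument by name.

```python
import inspect
from typing import Any, Dict, Optional


class TinyEstimator:
    """Hypothetical estimator with a scikit-learn-style constructor."""

    def __init__(self, n_estimators: int = 100, n_jobs: Optional[int] = None):
        self.n_estimators = n_estimators
        self.n_jobs = n_jobs

    def get_params(self) -> Dict[str, Any]:
        # Collect every constructor argument, whether or not the caller set it.
        signature = inspect.signature(type(self).__init__)
        names = [name for name in signature.parameters if name != "self"]
        return {name: getattr(self, name) for name in names}


params = TinyEstimator(n_estimators=10).get_params()
print(params)              # {'n_estimators': 10, 'n_jobs': None}
print("n_jobs" in params)  # True
```

Because the key is always there, a check of the form `if param_alias in params` cannot tell a default apart from an explicit choice; only the value can.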

I noticed this in CI logs:

https://github.com/microsoft/LightGBM/actions/runs/12922031182/job/36037058867#step:3:6718
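Until this is changed, one possible stopgap outside of lightgbm's own interface is a filter from Python's standard warnings module. The sketch below assumes the message is emitted as a UserWarning in the process that calls fit(), as the log line at the top of this issue suggests. `fit_quietly` and `noisy_fit` are hypothetical names used only for this example; `noisy_fit` stands in for a Dask estimator's fit().

```python
import warnings


def fit_quietly(fit, *args, **kwargs):
    """Call fit(*args, **kwargs) with the 'will be ignored' UserWarning filtered out."""
    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore",
            message=r"Parameter \w+ will be ignored\.",
            category=UserWarning,
        )
        return fit(*args, **kwargs)


def noisy_fit(X, y):
    # Stand-in for an estimator's fit(); emits the same message as in this issue.
    warnings.warn("Parameter n_jobs will be ignored.", UserWarning)
    return "fitted"


result = fit_quietly(noisy_fit, "X", "y")
```

For the reproducible example above, the equivalent call would be `fit_quietly(dask_model.fit, dX, dy)`. Note that this also hides the warning when a thread count was set explicitly, so it is blunter than fixing the check itself.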
