LightGBM on binarized dataset

beetle 🪲

Features

are binary:

1 if feature value is zero, and
0 otherwise.

Model

is LightGBM

1.6 CV
1.77 Public LB

Observation

Surprisingly, this model gives decent results. Our take is that zeros carry a lot of information in this dataset.

Pipeline diagram

dummies_missing is the Step that implements binarization (shown below). The implementation is in data_cleaning.py:L78.

import pandas as pd


# BaseTransformer is provided by the project's pipeline framework.
class DummiesMissing(BaseTransformer):
    def __init__(self, missing_value=0):
        self.missing_value = missing_value

    def transform(self, X, **kwargs):
        # Flag every cell equal to missing_value (0 by default).
        missing_mask = X.values == self.missing_value
        missing_columns = ['{}_is_missing'.format(col) for col in X.columns]
        # One 0/1 indicator column per input column.
        X_is_missing = pd.DataFrame(missing_mask.astype(int), columns=missing_columns)
        return {'categorical_features': X_is_missing}
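
To make the output concrete, here is a small usage sketch on a toy DataFrame. The class body is repeated from above in condensed form so the snippet runs on its own, and the empty BaseTransformer is a hypothetical stand-in for the real base class; the column names and values are made up.

```python
import pandas as pd


class BaseTransformer:
    """Hypothetical stand-in for the pipeline framework's base class."""


class DummiesMissing(BaseTransformer):
    def __init__(self, missing_value=0):
        self.missing_value = missing_value

    def transform(self, X, **kwargs):
        # 1 where the cell equals missing_value, 0 otherwise.
        missing_mask = X.values == self.missing_value
        missing_columns = ['{}_is_missing'.format(col) for col in X.columns]
        X_is_missing = pd.DataFrame(missing_mask.astype(int), columns=missing_columns)
        return {'categorical_features': X_is_missing}


# Toy input: zeros play the role of "missing" values.
X = pd.DataFrame({'a': [0.0, 3.5, 0.0], 'b': [1.2, 0.0, 0.0]})
binarized = DummiesMissing().transform(X)['categorical_features']
print(binarized)
```

Each input column yields one `<name>_is_missing` column, so the binarized frame has the same shape as the input.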

[Figure: pipeline-solution-2]

Check our GitHub organization https://github.com/neptune-ml for more cool stuff 😃

Kamil & Kuba, core contributors

Open solutions

honey bee 🐝 LightGBM and 5fold CV
beetle 🪲 LightGBM on binarized dataset
dromedary camel 🐪 LightGBM with row aggregations
whale 🐳 LightGBM on dimension reduced dataset
water buffalo 🐃 Exploring various dimension reduction techniques
blowfish 🐡 bucketing row aggregations
