This repository was archived by the owner on Jun 22, 2022. It is now read-only.

LightGBM on binarized dataset

Kamil A. Kaczmarek edited this page Jul 10, 2018 · 2 revisions

🪲

Features

are binary indicators of zero values:

  • 1 if the original feature value is zero,
  • 0 otherwise.
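A minimal pandas sketch of this binarization rule follows. It assumes the features sit in a numeric DataFrame. binarize_zeros is a hypothetical helper name used only for illustration and is not part of the repository. The _is_missing suffix mirrors the column naming of the dummies_missing step.

```python
import pandas as pd


def binarize_zeros(X: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: map each cell to 1 if it equals zero, else 0."""
    # Element-wise comparison with 0 yields a boolean frame; cast it to 0/1 ints.
    indicators = X.eq(0).astype(int)
    # Name the columns like the dummies_missing step does ('<col>_is_missing').
    return indicators.add_suffix('_is_missing')


if __name__ == '__main__':
    X = pd.DataFrame({'f1': [0.0, 3.5, 0.0], 'f2': [12.0, 0.0, 7.0]})
    print(binarize_zeros(X))
```

The row index of the input is preserved, so the indicator frame stays aligned with the original rows. NaN cells compare unequal to zero and therefore map to 0.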

Model

is LightGBM, with the following scores:

  • CV: 1.6
  • Public LB: 1.77

Observation

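Surprisingly, this model achieves decent results. Our interpretation is that zeros are quite important here: the model sees only whether each value is zero, so the zero pattern alone appears to carry a substantial part of the signal.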

Pipeline diagram

dummies_missing is the Step that implements the binarization (shown below). Its implementation is in data_cleaning.py:L78.

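import pandas as pd
from steppy.base import BaseTransformer


class DummiesMissing(BaseTransformer):
    def __init__(self, missing_value=0):
        super().__init__()
        self.missing_value = missing_value

    def transform(self, X, **kwargs):
        # 1 where the cell equals missing_value (zero by default), 0 otherwise
        missing_mask = (X.values == self.missing_value).astype(int)
        missing_columns = ['{}_is_missing'.format(col) for col in X.columns]
        # Keep X's index so the indicators stay aligned with the input rows
        X_is_missing = pd.DataFrame(missing_mask, columns=missing_columns, index=X.index)
        return {'categorical_features': X_is_missing}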

[Figure: pipeline-solution-2 (pipeline diagram)]