This repository was archived by the owner on Jun 22, 2022. It is now read-only.
LightGBM on binarized dataset
Kamil A. Kaczmarek edited this page Jul 10, 2018 · 2 revisions
Features are binary: 1 if the feature value is zero, and 0 otherwise.
The model is LightGBM, with the following scores:
- 1.6 CV
- 1.77 Public LB
Surprisingly, this model gives decent results. Our take is that zeros are quite important in this dataset.
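One way to probe this intuition on your own data is to measure how many entries are zero. The following is a minimal sketch, assuming the features live in a pandas DataFrame. The helper name `zero_fraction` and the toy data are hypothetical and are not part of the original solution.

```python
import pandas as pd


def zero_fraction(X: pd.DataFrame) -> pd.Series:
    """Fraction of entries equal to zero in each column (hypothetical helper)."""
    # (X == 0) is a boolean frame; the column-wise mean is the share of True values
    return (X == 0).mean()


# Toy data, invented for illustration only
toy = pd.DataFrame({
    'f1': [0.0, 3.5, 0.0, 0.0],
    'f2': [1.0, 0.0, 2.0, 0.0],
})

per_column = zero_fraction(toy)
# Share of zeros over the whole table
overall = float((toy.values == 0).mean())

print(per_column)
print(overall)
```

If most entries turn out to be zero, the position of the zeros may carry much of the signal, which would be consistent with the result above.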
dummies_missing is the Step that implements binarization (shown below). The implementation is in data_cleaning.py:L78.
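import numpy as np
import pandas as pd

# BaseTransformer is the pipeline's base class for a Step; its import is not shown in this excerpt.
class DummiesMissing(BaseTransformer):
    def __init__(self, missing_value=0):
        self.missing_value = missing_value

    def transform(self, X, **kwargs):
        # Boolean mask: True wherever a cell equals missing_value (0 by default)
        missing_mask = np.where(X.values == self.missing_value, True, False)
        missing_columns = ['{}_is_missing'.format(col) for col in X.columns]
        # Cast the mask to 0/1 integers, one indicator column per input column
        X_is_missing = pd.DataFrame(missing_mask.astype(int), columns=missing_columns)
        return {'categorical_features': X_is_missing}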
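To see what this transformation produces, here is a minimal standalone sketch of the same logic that runs without the pipeline framework. The function name `binarize_zeros` and the toy data are hypothetical. Unlike the Step above, this sketch also passes the input index to the output so the rows stay aligned, which is an addition and not part of the original code.

```python
import pandas as pd


def binarize_zeros(X: pd.DataFrame, missing_value=0) -> pd.DataFrame:
    """Return a 0/1 indicator frame: 1 where a cell equals missing_value, else 0."""
    mask = X.values == missing_value
    columns = ['{}_is_missing'.format(col) for col in X.columns]
    # Reusing X.index keeps rows aligned with the input (an addition in this sketch)
    return pd.DataFrame(mask.astype(int), columns=columns, index=X.index)


# Toy data, invented for illustration only
toy = pd.DataFrame({
    'f1': [0.0, 3.5, 0.0],
    'f2': [1.0, 0.0, 0.0],
})

binarized = binarize_zeros(toy)
print(binarized)
```

Each output column is named after its input column with an `_is_missing` suffix, matching the naming used in the Step.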
Check our GitHub organization https://github.com/neptune-ml for more cool stuff 😃
Kamil & Kuba, core contributors
- 🐝 LightGBM and 5fold CV
- 🪲 LightGBM on binarized dataset
- 🐪 LightGBM with row aggregations
- 🐳 LightGBM on dimension reduced dataset
- 🐃 Exploring various dimension reduction techniques
- 🐡 bucketing row aggregations