This repository was archived by the owner on Jun 22, 2022. It is now read-only.
Jakub edited this page Jun 26, 2018 · 24 revisions

Write-up

Solution 1:

Validation: 5-fold CV with folds generated by a custom cross-validator:

class KFoldByTargetValue(BaseCrossValidator):
 ...

where the data is sorted by target value and observations are then assigned to separate folds one by one. It is implemented in src/utils.py.
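
The fold assignment described above can be sketched as follows. This is a minimal illustration written as a plain generator function, not the KFoldByTargetValue class from src/utils.py; the name kfold_by_target_value and its signature are assumptions made for this example.

```python
import numpy as np


def kfold_by_target_value(y, n_splits=5):
    """Yield (train_idx, test_idx) pairs with folds balanced by target value.

    Hypothetical helper for illustration only; it is not the
    KFoldByTargetValue class from src/utils.py.
    """
    y = np.asarray(y)
    # Sort observations by target value (stable sort keeps ties in input order).
    sorted_idx = np.argsort(y, kind="stable")
    # Deal the sorted observations into folds one by one (round-robin).
    fold_of = np.empty(len(y), dtype=int)
    fold_of[sorted_idx] = np.arange(len(y)) % n_splits
    for fold in range(n_splits):
        test_idx = np.flatnonzero(fold_of == fold)
        train_idx = np.flatnonzero(fold_of != fold)
        yield train_idx, test_idx
```

Because neighbouring target values land in different folds, every fold should see roughly the same target distribution, which is presumably the motivation for this scheme.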

  1. Preprocessing:
  • drop constant columns
  • drop duplicate columns
  • drop columns where zero over % of time
  2. Feature Extraction:
  • as is
  3. Model:
  • lightGBM raw 1.39 CV 1.43 Public LB
  • zero treated as missing
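
The three preprocessing steps of Solution 1 could look roughly like this. It is a sketch under assumptions: the function name is hypothetical, and max_zero_fraction is a placeholder because the write-up does not state the threshold that was actually used.

```python
import numpy as np


def drop_uninformative_columns(X, max_zero_fraction=0.99):
    """Return (X_reduced, kept_column_indices).

    max_zero_fraction is a hypothetical threshold; the write-up does not
    state the value that was used.
    """
    X = np.asarray(X, dtype=float)
    keep = np.ones(X.shape[1], dtype=bool)

    # 1. Constant columns: a single unique value carries no signal.
    keep &= X.min(axis=0) != X.max(axis=0)

    # 2. Duplicate columns: keep only the first occurrence of each column.
    seen = set()
    for j in range(X.shape[1]):
        key = X[:, j].tobytes()
        if key in seen:
            keep[j] = False
        seen.add(key)

    # 3. Columns that are zero in more than max_zero_fraction of the rows.
    keep &= (X == 0).mean(axis=0) <= max_zero_fraction

    kept = np.flatnonzero(keep)
    return X[:, kept], kept
```

The returned column indices should be stored and reused on the test set so that train and test keep the same columns.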

Solution 2:

  1. Preprocessing:
  2. Feature Extraction:
  • is_missing dummy table
  3. Model:
  • lightGBM is_missing 1.6 CV 1.77 Public LB
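
One way to build such an is_missing table is sketched below. It assumes that zeros stand for missing values, as in Solution 1; the write-up does not spell out the exact definition, and the function name is hypothetical.

```python
import numpy as np


def is_missing_table(X):
    """Return a 0/1 matrix flagging entries assumed to be missing.

    Assumption: zeros stand for missing values, as in Solution 1; NaNs are
    flagged as well.
    """
    X = np.asarray(X, dtype=float)
    return ((X == 0) | np.isnan(X)).astype(np.int8)
```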

Solution 3:

  1. Preprocessing:
  2. Feature Extraction:
  • row aggregation features mean/max/min/std/count_non_zero/fraction_non_zero
  3. Model:
  • lightGBM agg 1.36 CV 1.48 LB
  • lightGBM raw + agg 1.35 CV 1.41 LB
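
The row aggregations listed above can be computed along these lines. This sketch assumes the statistics are taken over all values in a row, zeros included, and uses the population standard deviation; the write-up does not say which variant was used, and the function name is hypothetical.

```python
import numpy as np


def row_aggregations(X):
    """Return a dict of per-row statistics, one array entry per observation.

    Assumption: statistics are computed over every column, zeros included.
    """
    X = np.asarray(X, dtype=float)
    count_non_zero = np.count_nonzero(X, axis=1)
    return {
        "mean": X.mean(axis=1),
        "max": X.max(axis=1),
        "min": X.min(axis=1),
        "std": X.std(axis=1),  # population std (ddof=0)
        "count_non_zero": count_non_zero,
        "fraction_non_zero": count_non_zero / X.shape[1],
    }
```

Stacking these columns next to the raw features, for example with np.column_stack, would give an input comparable to the "raw + agg" model.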

Solution 4:

  1. Preprocessing:
  2. Feature Extraction:
  • truncated svd projection
  truncated_svd__n_components: 50
  truncated_svd__n_iter: 10
  • pca projection
  pca__n_components: 100
  • fast ica projection
  fast_ica__n_components: 15
  • factor analysis
  factor_analysis__n_components: 50
  • gaussian random projection
  gaussian_random_projection__n_components: 50
  gaussian_projection__eps: 0.1

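Note: the eps parameter turned out not to matter; 0.01, 0.1 and 1.0 gave exactly the same results. This is expected if the projection is scikit-learn's GaussianRandomProjection, where eps is only used when n_components is set to 'auto'.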

  • sparse random projection
  sparse_random_projection__n_components: 50
  3. Model:
  • lightGBM truncated svd 1.56 CV
  • lightGBM pca 1.55 CV
  • lightGBM fast ica 1.57 CV
  • lightGBM factor analysis 1.51 CV
  • lightGBM gaussian random projection 1.63 CV
  • lightGBM sparse random projection 1.47 CV
  • lightGBM projections (all) 1.47 CV
  • lightGBM projections best (sparse random projection + factor analysis + truncated svd + fast-ica) 1.448 CV
  • lightGBM projections second best (sparse random projection) 1.452 CV
  • lightGBM raw + projections (second best) CV
  • lightGBM projections (second best) + aggregations 1.345 CV
  • lightGBM raw + projections (second best) + aggregations 1.3416 CV
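
The projection features of Solution 4 could be produced roughly as follows. The sketch assumes the parameter names above map onto the scikit-learn transformers of the same name; build_projections, project and the random_state argument are hypothetical and not taken from the repository.

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis, FastICA, TruncatedSVD
from sklearn.random_projection import (
    GaussianRandomProjection,
    SparseRandomProjection,
)


def build_projections(random_state=0):
    """Map a projection name to an unfitted transformer (hypothetical helper)."""
    return {
        "truncated_svd": TruncatedSVD(
            n_components=50, n_iter=10, random_state=random_state
        ),
        "pca": PCA(n_components=100, random_state=random_state),
        "fast_ica": FastICA(n_components=15, random_state=random_state),
        "factor_analysis": FactorAnalysis(
            n_components=50, random_state=random_state
        ),
        # eps is ignored here because n_components is given explicitly.
        "gaussian_random_projection": GaussianRandomProjection(
            n_components=50, eps=0.1, random_state=random_state
        ),
        "sparse_random_projection": SparseRandomProjection(
            n_components=50, random_state=random_state
        ),
    }


def project(X, names, random_state=0):
    """Fit the named projections on X and stack their outputs column-wise."""
    projections = build_projections(random_state)
    blocks = [projections[name].fit_transform(X) for name in names]
    return np.hstack(blocks)
```

For example, project(X, ["sparse_random_projection", "factor_analysis", "truncated_svd", "fast_ica"]) would build an input like the "projections best" combination. In a real pipeline the transformers would be fitted on the training folds only and then applied to the validation fold with transform.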