This repository was archived by the owner on Jun 22, 2022. It is now read-only.
Jakub edited this page Jun 25, 2018 · 24 revisions

Write-up

Solution 1:

  1. Preprocessing:
  • drop constant columns
  • drop duplicate columns
  • drop columns that are zero more than % of the time
  2. Feature Extraction:
  • raw features, as is
  3. Model:
  • LightGBM on raw features: 1.39 CV, 1.43 Public LB
  • zeros treated as missing values
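
The Solution 1 preprocessing can be sketched in pandas as below. This is a minimal sketch, not the repository's code: the function name `drop_uninformative_columns` and its `max_zero_fraction` threshold are hypothetical (the write-up leaves the percentage unspecified). `LGB_PARAMS` only illustrates LightGBM's real `zero_as_missing` option; the `objective` value is an assumption.

```python
import pandas as pd


def drop_uninformative_columns(df: pd.DataFrame, max_zero_fraction: float) -> pd.DataFrame:
    """Drop constant, duplicate and mostly-zero columns (Solution 1 preprocessing).

    max_zero_fraction is a hypothetical threshold: the write-up does not state it.
    """
    # Constant columns: a single distinct value across all rows (NaN counts as a value).
    constant = [c for c in df.columns if df[c].nunique(dropna=False) <= 1]
    df = df.drop(columns=constant)

    # Duplicate columns: keep the first of any group of identical columns.
    df = df.loc[:, ~df.T.duplicated().to_numpy()]

    # Mostly-zero columns: zero in more than max_zero_fraction of the rows.
    zero_fraction = (df == 0).mean().to_numpy()
    return df.loc[:, zero_fraction <= max_zero_fraction]


# Illustrative LightGBM parameters for the model step. zero_as_missing is a real
# LightGBM option that treats zeros as missing; the objective is an assumption.
LGB_PARAMS = {"objective": "regression", "zero_as_missing": True}
```

The three filters run in the order listed, so duplicate detection only compares the columns that survived the constant-column check.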

Solution 2:

  1. Preprocessing:
  2. Feature Extraction:
  • is_missing dummy table
  3. Model:
  • LightGBM on is_missing features: 1.6 CV, 1.77 Public LB
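
One way to build the is_missing dummy table of Solution 2, assuming that "missing" means NaN or, following Solution 1, zero. The helper `is_missing_table` and the `_is_missing` column suffix are hypothetical names.

```python
import numpy as np
import pandas as pd


def is_missing_table(df: pd.DataFrame, zero_is_missing: bool = True) -> pd.DataFrame:
    """Return a 0/1 dummy table with one indicator column per input column.

    Assumption: a value counts as missing if it is NaN or, when
    zero_is_missing is True, exactly zero.
    """
    missing = df.isna()
    if zero_is_missing:
        # Mirror Solution 1's zero-as-missing treatment.
        missing = missing | (df == 0)
    out = missing.astype(np.int8)
    out.columns = [f"{c}_is_missing" for c in df.columns]
    return out
```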

Solution 3:

  1. Preprocessing:
  2. Feature Extraction:
  • row aggregation features: mean/max/min/std/count_non_zero/fraction_non_zero
  3. Model:
  • LightGBM on aggregations: 1.37 CV
  • LightGBM on raw + aggregations: 1.44 CV
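
A sketch of the Solution 3 row aggregations. It assumes the statistics are taken over every value in a row, zeros included (the write-up does not say whether zeros were excluded), and uses the population standard deviation; `row_aggregation_features`, `raw_plus_agg` and the `row_*` column names are hypothetical.

```python
import numpy as np
import pandas as pd


def row_aggregation_features(df: pd.DataFrame) -> pd.DataFrame:
    """Per-row summary statistics over all columns (zeros included)."""
    values = df.to_numpy(dtype=float)
    non_zero = (values != 0).sum(axis=1)
    return pd.DataFrame(
        {
            "row_mean": values.mean(axis=1),
            "row_max": values.max(axis=1),
            "row_min": values.min(axis=1),
            "row_std": values.std(axis=1),  # population std (ddof=0)
            "row_count_non_zero": non_zero,
            "row_fraction_non_zero": non_zero / values.shape[1],
        },
        index=df.index,
    )


def raw_plus_agg(df: pd.DataFrame) -> pd.DataFrame:
    """'raw + agg' variant: raw columns side by side with the aggregates."""
    return pd.concat([df, row_aggregation_features(df)], axis=1)
```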

Solution 4:

  1. Preprocessing:
  2. Feature Extraction:
  • truncated SVD projection
    truncated_svd__n_components: 50
    truncated_svd__n_iter: 10
  • PCA projection
    pca__n_components: 100
  • FastICA projection
    fast_ica__n_components: 15
  • factor analysis
    factor_analysis__n_components: 50
  • Gaussian random projection
    gaussian_random_projection__n_components: 100
    gaussian_projection__eps: 0.1

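Note: as it turns out, the eps parameter doesn't matter; 0.01, 0.1 and 1.0 all gave exactly the same results. This is expected if these settings map to scikit-learn's GaussianRandomProjection, which only uses eps to choose the number of components when n_components="auto"; with n_components fixed at 100, eps is ignored.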
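
The eps behaviour can be checked directly, assuming scikit-learn's `GaussianRandomProjection`: with `n_components` fixed and the same `random_state`, different eps values give identical projections. The random input matrix and the seed below are illustrative only.

```python
import numpy as np
from sklearn.random_projection import GaussianRandomProjection

# Illustrative data: 200 rows, 300 features.
X = np.random.default_rng(0).normal(size=(200, 300))

# Same n_components and random_state, three different eps values.
projections = [
    GaussianRandomProjection(n_components=100, eps=eps, random_state=42).fit_transform(X)
    for eps in (0.01, 0.1, 1.0)
]

# eps is only consulted when n_components="auto", so all three should match.
print(all(np.array_equal(projections[0], p) for p in projections[1:]))
```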

  • sparse random projection
    sparse_random_projection__n_components: 50
  3. Model:
  • LightGBM on truncated SVD: 1.56 CV
  • LightGBM on PCA: 1.55 CV
  • LightGBM on FastICA: 1.57 CV
  • LightGBM on factor analysis: 1.51 CV
  • LightGBM on Gaussian random projection: 1.63 CV
  • LightGBM on sparse random projection: 1.47 CV
  • LightGBM on all projections: CV
  • LightGBM on raw + projections: CV
  • LightGBM on raw + projections + aggregations: CV
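
The Solution 4 projections could be set up as follows, assuming the config keys map onto the scikit-learn estimators of the same names (`TruncatedSVD`, `PCA`, `FastICA`, `FactorAnalysis`, `GaussianRandomProjection`, `SparseRandomProjection`). The helpers `make_projections` and `projection_features` and the seed are hypothetical; each projection is fitted on the training rows only and then applied to the test rows.

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis, FastICA, TruncatedSVD
from sklearn.random_projection import GaussianRandomProjection, SparseRandomProjection

SEED = 1234  # hypothetical seed; the write-up does not list one


def make_projections(seed=SEED):
    """One transformer per projection, using the n_components/n_iter values above."""
    return {
        "truncated_svd": TruncatedSVD(n_components=50, n_iter=10, random_state=seed),
        "pca": PCA(n_components=100, random_state=seed),
        "fast_ica": FastICA(n_components=15, random_state=seed),
        "factor_analysis": FactorAnalysis(n_components=50, random_state=seed),
        "gaussian_random_projection": GaussianRandomProjection(
            n_components=100, eps=0.1, random_state=seed
        ),
        # dense_output=True so the result can be hstacked with the other arrays
        "sparse_random_projection": SparseRandomProjection(
            n_components=50, dense_output=True, random_state=seed
        ),
    }


def projection_features(X_train, X_test, names=None):
    """Fit each selected projection on X_train, transform both sets, and stack
    the outputs column-wise. names=None uses all six ("projections" variant);
    a single name such as ["pca"] gives one per-method feature set."""
    projections = make_projections()
    names = list(projections) if names is None else names
    train_parts, test_parts = [], []
    for name in names:
        model = projections[name]
        train_parts.append(model.fit_transform(X_train))
        test_parts.append(model.transform(X_test))
    return np.hstack(train_parts), np.hstack(test_parts)
```

For the "raw + projections" variant, the projected columns would be stacked next to the raw ones, e.g. `np.hstack([X_train, train_proj])`. Note that `PCA(n_components=100)` needs at least 100 rows and 100 columns in the training matrix.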