
# On the behavior of lbjava when dealing with real-valued features


## Introduction

There has been speculation that lbjava fails when using real-valued features.

Thanks to @Slash0BZ (Ben Zhou), we ran a comprehensive set of experiments on a few example problems, with different numbers of real-valued features, across different algorithms and different numbers of training iterations.

## NewsGroup Experiments

### Methodology

The data I used was from http://qwone.com/~jason/20Newsgroups/20news-18828.tar.gz

I downloaded and unzipped the data, then used the script https://github.com/Slash0BZ/Cogcomp-Utils/blob/master/LBJava/20news_data_parser.py to move a randomly selected 90% of the files (within each tag) to a training data path and the rest to a testing data path.

The parsed data can be read directly with https://github.com/Slash0BZ/lbjava/blob/master/lbjava-examples/src/main/java/edu/illinois/cs/cogcomp/lbjava/examples/DocumentReader.java

I first tested the original NewsGroupClassifier.lbj, defined in https://github.com/Slash0BZ/lbjava/blob/master/lbjava-examples/src/main/lbj/NewsGroupClassifier.lbj
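
For orientation, that classifier follows the standard LBJava 20 newsgroups setup, roughly as sketched below. This is only an approximation: the data paths, round count, and learner parameters are placeholders, so see the linked file for the exact definition.

```
// The label of a document is its newsgroup tag.
discrete NewsGroupLabel(Document d) <- { return d.getLabel(); }

// Bag-of-words: one discrete feature per word in the document.
discrete% BagOfWords(Document d) <- {
    List words = d.getWords();
    for (int i = 0; i < words.size(); i++)
        sense words.get(i);
}

// The learned classifier: predicts the label from the bag-of-words features.
discrete NewsGroupClassifier(Document d) <-
learn NewsGroupLabel
  using BagOfWords
  from new DocumentReader("data/20news/train") 40 rounds  // placeholder path
  with SparseAveragedPerceptron
  testFrom new DocumentReader("data/20news/test")         // placeholder path
end
```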

Then I added a constant real feature to each of the examples; for example:

```
real[] RealFeatureConstant(Document d) <- {
    int k = 3;
    sense k;
}
```

Using `int` here is fine because the lbjava parser transforms the data type to `double`.
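
Note that defining a feature generator by itself does nothing: it also has to be listed in the classifier's `using` clause. In the learn block sketched above, the relevant line becomes:

```
  using BagOfWords, RealFeatureConstant
```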

Then I tried multiple real features randomly generated from a Gaussian distribution. The code I used was:

```
import java.util.Random;

real[] GaussianRealFeatures(Document d) <- {
    // Emit roughly one Gaussian-distributed feature per word of the document.
    List words = d.getWords();
    Random ran = new Random();
    for (int i = 0; i < words.size() - 1; i++)
        sense ran.nextGaussian() * 10;
}
```

I used bash scripts to run the tests on several algorithms.
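
Each run differs only in the number of rounds in the `from` clause and the learner named in the `with` clause, so switching algorithms is a one-line change (a sketch, with placeholder paths as before):

```
discrete NewsGroupClassifier(Document d) <-
learn NewsGroupLabel
  using BagOfWords, GaussianRealFeatures
  from new DocumentReader("data/20news/train") 50 rounds
  // Swap in SparseWinnow, PassiveAggressive, SparseConfidenceWeighted,
  // or BinaryMIRA here to produce the other columns below.
  with SparseAveragedPerceptron
  testFrom new DocumentReader("data/20news/test")
end
```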

### Result Tables

NewsGroup (table for a single real feature; values are accuracy %)

| Condition \ Algorithm | SparseAveragedPerceptron | SparseWinnow | PassiveAggressive | SparseConfidenceWeighted | BinaryMIRA |
| --- | --- | --- | --- | --- | --- |
| 1 round w/o real features | 48.916 | 92.597 | 19.038 | – | 33.739 |
| 1 round w/ real features | 47.753 | 92.491 | 23.268 | – | 32.364 |
| 10 rounds w/o real features | 82.390 | 91.539 | 24.802 | – | 76.891 |
| 10 rounds w/ real features | 82.126 | 91.529 | 12.427 | – | 75.939 |
| 50 rounds w/o real features | 84.823 | 91.592 | 14.120 | – | 77.208 |
| 50 rounds w/ real features | 85.299 | 91.433 | 19.566 | – | 76.891 |
| 100 rounds w/o real features | 85.828 | 91.433 | 12.956 | – | 76.574 |
| 100 rounds w/ real features | 84.770 | 91.486 | 15.442 | – | 61.026 |

(– indicates no result: SparseConfidenceWeighted training never finished; see Problems below.)

NewsGroup (table for the same number of Gaussian random real features as discrete ones; values are accuracy %)

| Condition \ Algorithm | SparseAveragedPerceptron | SparseWinnow | PassiveAggressive | BinaryMIRA |
| --- | --- | --- | --- | --- |
| 1 round w/o real features | 51.454 | 92.597 | 12.057 | 33.739 |
| 1 round w/ real features | 17.980 | 6.081 | 14.913 | 14.225 |
| 10 rounds w/o real features | 82.813 | 91.539 | 22.369 | 76.891 |
| 10 rounds w/ real features | 52.829 | – | 42.517 | 45.743 |
| 50 rounds w/o real features | 84.294 | 91.592 | 21.100 | 77.208 |
| 50 rounds w/ real features | 75.727 | – | 67.054 | 75.198 |
| 100 rounds w/o real features | 85.506 | 91.433 | 17.768 | 76.574 |
| 100 rounds w/ real features | 77.631 | – | 74.828 | 74.194 |

(– indicates no result: SparseWinnow threw a NullPointerException during testing; see Problems below.)

### Problems

1. SparseConfidenceWeighted training took too long on my server; I had to kill the process after waiting for a long time. This is why its column in the first table is empty.

2. SparseWinnow throws a NullPointerException during testing (testDiscrete()) when multiple real features are added and the number of training rounds is larger than 1. The missing SparseWinnow entries in the second table are due to this.

## Badges Experiments

### Methodology

The setup is similar to the NewsGroup experiments above, and the constant real feature is the same.

Multiple constant real features are added through:

```
real[] RealFeatures3(String line) <- {
    // One constant feature per character of the line.
    for (int i = 0; i < line.length(); i++) {
        int k = 3;
        sense k;
    }
}
```

Multiple random real features are added through:

```
import java.util.Random;

real[] GaussianRealFeatures(String line) <- {
    // One Gaussian-distributed feature per character of the line.
    Random ran = new Random();
    for (int i = 0; i < line.length(); i++)
        sense ran.nextGaussian() * 10;
}
```
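
As in the NewsGroup experiments, these generators are then added to the badges classifier's `using` clause. A sketch of what that looks like (the label, base feature, and reader names here are placeholders for whatever the badges example actually defines):

```
discrete BadgeClassifier(String line) <-
learn BadgeLabel                  // placeholder label classifier
  using BadgeFeatures, RealFeatures3, GaussianRealFeatures
  from new BadgesDataReader("data/badges/train") 10 rounds  // placeholder reader and path
  with SparsePerceptron
  testFrom new BadgesDataReader("data/badges/test")
end
```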

### Result Tables

Badges (table for a single real feature; values are accuracy %)

| Condition \ Algorithm | SparsePerceptron | SparseWinnow | NaiveBayes |
| --- | --- | --- | --- |
| 1 round w/o real features | 100.0 | 95.745 | 100.0 |
| 1 round w/ real features | 100.0 | 95.745 | 100.0 |
| 10 rounds w/o real features | 100.0 | 100.0 | 100.0 |
| 10 rounds w/ real features | 100.0 | 100.0 | 100.0 |
| 50 rounds w/o real features | 100.0 | 100.0 | 100.0 |
| 50 rounds w/ real features | 100.0 | 100.0 | 100.0 |
| 100 rounds w/o real features | 100.0 | 100.0 | 100.0 |
| 100 rounds w/ real features | 100.0 | 100.0 | 100.0 |

Badges (table for the same number of constant real features as discrete features; values are accuracy %)

| Condition \ Algorithm | SparsePerceptron | SparseWinnow | NaiveBayes |
| --- | --- | --- | --- |
| 1 round w/o real features | 100.0 | 95.745 | 100.0 |
| 1 round w/ real features | 74.468 | 100.0 | 100.0 |
| 10 rounds w/o real features | 100.0 | 100.0 | 100.0 |
| 10 rounds w/ real features | 78.723 | 100.0 | 100.0 |
| 50 rounds w/o real features | 100.0 | 100.0 | 100.0 |
| 50 rounds w/ real features | 100.0 | 100.0 | 100.0 |
| 100 rounds w/o real features | 100.0 | 100.0 | 100.0 |
| 100 rounds w/ real features | 100.0 | 100.0 | 100.0 |

Badges (table for the same number of random Gaussian real features as discrete features; values are accuracy %)

| Condition \ Algorithm | SparsePerceptron | SparseWinnow | NaiveBayes |
| --- | --- | --- | --- |
| 1 round w/o real features | 100.0 | 95.745 | 100.0 |
| 1 round w/ real features | 55.319 | 56.383 | 100.0 |
| 10 rounds w/o real features | 100.0 | 100.0 | 100.0 |
| 10 rounds w/ real features | 62.766 | 100.0 | 100.0 |
| 50 rounds w/o real features | 100.0 | 100.0 | 100.0 |
| 50 rounds w/ real features | 74.468 | 87.234 | 100.0 |
| 100 rounds w/o real features | 100.0 | 100.0 | 100.0 |
| 100 rounds w/ real features | 86.170 | 100.0 | 100.0 |

## Conclusions

The conclusion is that as more real-valued features are added, more training iterations are needed to train the system; there is no clear issue with real-valued features themselves.
