
On behavior of lbjava when dealing with real valued features.

Xuanyu Zhou edited this page Mar 2, 2017 · 17 revisions

Mixed feature experiments on LBJava examples

There was speculation that LBJava fails when using real-valued features.

Thanks to @Slash0BZ (Ben Zhou), we ran a comprehensive set of experiments on a few example problems, with different numbers of real-valued features, across different algorithms and different numbers of training iterations.

NewsGroup Experiments

Methodology

The data I used was from http://qwone.com/~jason/20Newsgroups/20news-18828.tar.gz

I downloaded and unzipped the data and used the script https://github.com/Slash0BZ/Cogcomp-Utils/blob/master/LBJava/20news_data_parser.py to move a randomly selected 90% of the files (in each tag) to a training data path and the rest to a testing data path.
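The split logic performed by that script can be sketched in plain Java (a hypothetical helper mirroring the script's behavior, not the actual Python code):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class TrainTestSplit {
    // Randomly assign a fraction of the files to training and the rest to
    // testing, mirroring the per-tag 90/10 split the parser script performs.
    public static List<List<String>> split(List<String> files, double trainFraction, long seed) {
        List<String> shuffled = new ArrayList<>(files);
        Collections.shuffle(shuffled, new Random(seed));
        int cut = (int) Math.round(shuffled.size() * trainFraction);
        List<List<String>> result = new ArrayList<>();
        result.add(new ArrayList<>(shuffled.subList(0, cut)));                // training set
        result.add(new ArrayList<>(shuffled.subList(cut, shuffled.size()))); // test set
        return result;
    }

    public static void main(String[] args) {
        List<String> files = new ArrayList<>();
        for (int i = 0; i < 100; i++) files.add("doc" + i + ".txt");
        List<List<String>> parts = split(files, 0.9, 42L);
        System.out.println(parts.get(0).size() + " train, " + parts.get(1).size() + " test");
    }
}
```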

The parsed data can be read and used directly through https://github.com/Slash0BZ/lbjava/blob/master/lbjava-examples/src/main/java/edu/illinois/cs/cogcomp/lbjava/examples/DocumentReader.java

Then I first tested the original NewsGroupClassifier.lbj defined as https://github.com/Slash0BZ/lbjava/blob/master/lbjava-examples/src/main/lbj/NewsGroupClassifier.lbj

Then I added a constant real feature for each of the examples, an example is

```
real[] RealFeatureConstant(Document d) <- { int k = 3; sense k; }
```

Using `int` here is fine because the LBJava parser converts the value to `double`.

Then I tried multiple real features that are randomly generated by Gaussian distribution. The code I used was

```
import java.util.Random;
real[] GaussianRealFeatures(Document d) <- {
    List words = d.getWords();
    Random ran = new Random();
    for (int i = 0; i < words.size() - 1; i++)
        sense ran.nextGaussian() * 10;
}
```
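Outside the LBJ syntax, the same feature generation can be sketched in plain Java (the scaling factor of 10 and the `words.size() - 1` loop bound follow the snippet above; `Document`/`getWords()` are LBJava-example types, replaced here by a plain word list):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class GaussianFeatures {
    // Produce one N(0, 10^2)-distributed real feature per word minus one,
    // mirroring the `words.size() - 1` loop bound in the LBJ generator.
    public static double[] generate(List<String> words, long seed) {
        Random ran = new Random(seed);
        int n = Math.max(0, words.size() - 1);
        double[] features = new double[n];
        for (int i = 0; i < n; i++) {
            features[i] = ran.nextGaussian() * 10;
        }
        return features;
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>();
        words.add("real"); words.add("valued"); words.add("features");
        double[] f = generate(words, 7L);
        System.out.println(f.length + " features generated");
    }
}
```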

I used bash scripts to run tests on several algorithms. I wrote a Python script that modifies the .lbj file to make things easier. The script can be found at https://github.com/Slash0BZ/Cogcomp-Utils/blob/master/LBJava/modify_lbj_file.py

Result Tables

NewsGroup (table for single real feature)

| Condition\Algorithm | SparseAveragedPerceptron | SparseWinnow | PassiveAggressive | SparseConfidenceWeighted | BinaryMIRA |
|---|---|---|---|---|---|
| 1 round w/o real features | 48.916 | 92.597 | 19.038 | – | 33.739 |
| 1 round w/ real features | 47.753 | 92.491 | 23.268 | – | 32.364 |
| 10 rounds w/o real features | 82.390 | 91.539 | 24.802 | – | 76.891 |
| 10 rounds w/ real features | 82.126 | 91.529 | 12.427 | – | 75.939 |
| 50 rounds w/o real features | 84.823 | 91.592 | 14.120 | – | 77.208 |
| 50 rounds w/ real features | 85.299 | 91.433 | 19.566 | – | 76.891 |
| 100 rounds w/o real features | 85.828 | 91.433 | 12.956 | – | 76.574 |
| 100 rounds w/ real features | 84.770 | 91.486 | 15.442 | – | 61.026 |

(– : SparseConfidenceWeighted training did not finish; see Problems below.)

NewsGroup (table for the same number of Gaussian random real features as discrete ones)

| Condition\Algorithm | SparseAveragedPerceptron | SparseWinnow | PassiveAggressive | BinaryMIRA |
|---|---|---|---|---|
| 1 round w/o real features | 51.454 | 92.597 | 12.057 | 33.739 |
| 1 round w/ real features | 17.980 | 6.081 | 14.913 | 14.225 |
| 10 rounds w/o real features | 82.813 | 91.539 | 22.369 | 76.891 |
| 10 rounds w/ real features | 52.829 | – | 42.517 | 45.743 |
| 50 rounds w/o real features | 84.294 | 91.592 | 21.100 | 77.208 |
| 50 rounds w/ real features | 75.727 | – | 67.054 | 75.198 |
| 100 rounds w/o real features | 85.506 | 91.433 | 17.768 | 76.574 |
| 100 rounds w/ real features | 77.631 | – | 74.828 | 74.194 |

(– : SparseWinnow threw a NullPointerException during testing; see Problems below.)

### Problems

  1. SparseConfidenceWeighted training took too long on my server. I had to kill the process after waiting for a long time.

  2. SparseWinnow throws a NullPointerException during the testing (testDiscrete()) process after multiple real features are added, if the number of training rounds is larger than 1.

Badges Experiments

Methodology

The methodology is similar to the NewsGroup experiments above, and the constant real feature is the same.

The multiple real features are added through

```
real[] RealFeatures3(String line) <- {
    for(int i = 0; i < line.length(); i++){
        int k = 3;
        sense k;
    }
}
```
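In plain Java, this generator amounts to emitting one constant real feature per character of the line (a sketch of the feature layout, not the LBJava API):

```java
public class ConstantFeatures {
    // One constant real feature (value 3.0) per character in the line,
    // matching the loop over line.length() in the LBJ snippet;
    // the `int k = 3` is widened to double when sensed.
    public static double[] generate(String line) {
        double[] features = new double[line.length()];
        for (int i = 0; i < line.length(); i++) {
            features[i] = 3.0;
        }
        return features;
    }

    public static void main(String[] args) {
        System.out.println(generate("badge").length + " constant features");
    }
}
```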

The random multiple real features are added through

```
import java.util.Random;
real[] GaussianRealFeatures(String line) <- {
    for (int i = 0; i < line.length(); i++){
        Random ran = new Random();
        sense ran.nextGaussian() * 10;
    }
}
```

### Result Tables

Badges (table for single real feature)

| Condition\Algorithm | SparsePerceptron | SparseWinnow | NaiveBayes |
|---|---|---|---|
| 1 round w/o real features | 100.0 | 95.745 | 100.0 |
| 1 round w/ real features | 100.0 | 95.745 | 100.0 |
| 10 rounds w/o real features | 100.0 | 100.0 | 100.0 |
| 10 rounds w/ real features | 100.0 | 100.0 | 100.0 |
| 50 rounds w/o real features | 100.0 | 100.0 | 100.0 |
| 50 rounds w/ real features | 100.0 | 100.0 | 100.0 |
| 100 rounds w/o real features | 100.0 | 100.0 | 100.0 |
| 100 rounds w/ real features | 100.0 | 100.0 | 100.0 |

Badges (table for the same number of constant real features as discrete features)

| Condition\Algorithm | SparsePerceptron | SparseWinnow | NaiveBayes |
|---|---|---|---|
| 1 round w/o real features | 100.0 | 95.745 | 100.0 |
| 1 round w/ real features | 74.468 | 100.0 | 100.0 |
| 10 rounds w/o real features | 100.0 | 100.0 | 100.0 |
| 10 rounds w/ real features | 78.723 | 100.0 | 100.0 |
| 50 rounds w/o real features | 100.0 | 100.0 | 100.0 |
| 50 rounds w/ real features | 100.0 | 100.0 | 100.0 |
| 100 rounds w/o real features | 100.0 | 100.0 | 100.0 |
| 100 rounds w/ real features | 100.0 | 100.0 | 100.0 |

Badges (table for the same number of random Gaussian real features as discrete features)

| Condition\Algorithm | SparsePerceptron | SparseWinnow | NaiveBayes |
|---|---|---|---|
| 1 round w/o real features | 100.0 | 95.745 | 100.0 |
| 1 round w/ real features | 55.319 | 56.383 | 100.0 |
| 10 rounds w/o real features | 100.0 | 100.0 | 100.0 |
| 10 rounds w/ real features | 62.766 | 100.0 | 100.0 |
| 50 rounds w/o real features | 100.0 | 100.0 | 100.0 |
| 50 rounds w/ real features | 74.468 | 87.234 | 100.0 |
| 100 rounds w/o real features | 100.0 | 100.0 | 100.0 |
| 100 rounds w/ real features | 86.170 | 100.0 | 100.0 |

Conclusions

The conclusion is that, as more real-valued features are added, more training iterations are needed to train the system, and there are no clear issues with real-valued features.

Mixed feature experiments on POS Tagger

Introduction

The goal of this experiment is to append, for each word, the 25-dimensional real-valued phrase-similarity vector defined at https://gitlab-beta.engr.illinois.edu/cogcomp/illinois-phrasesim/blob/master/src/main/java/edu/illinois/cs/cogcomp/sim/PhraseSim.java to the end of the original feature vector in POSTaggerKnown, defined at https://github.com/Slash0BZ/illinois-cogcomp-nlp/blob/master/pos/src/main/java/edu/illinois/cs/cogcomp/pos/lbjava/POSTaggerKnown.java

Methodology

Since the code that is supposed to be generated by LBJava from the .lbj files has been modified manually to a certain extent, and many of those changes lack documentation, I was unable to directly modify https://github.com/Slash0BZ/illinois-cogcomp-nlp/blob/master/pos/src/main/lbj/POSKnown.lbj and replace the original classes with the newly generated ones. So I took another approach: I first generated a paragamVector class from the following code


```
real[] paragamVector(Token w) <- {
    PhraseSim ps = null;
    try{
        ps = PhraseSim.getInstance();
    }
    catch (FileNotFoundException e){

    }
    double[] vec = ps.getVector(w.form);
    for(int i = 0; i < vec.length; i++){
        sense vec[i];
    }
}
```

The generated class can be found at https://github.com/Slash0BZ/illinois-cogcomp-nlp/blob/master/pos/src/main/java/edu/illinois/cs/cogcomp/pos/lbjava/paragamVector.java

I manually added some code so that it handles the case where a word does not have a corresponding vector.
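A minimal sketch of such a guard, assuming `getVector` returns `null` for out-of-vocabulary words (my reading of the code, not a documented contract; `VectorGuard` is a hypothetical helper, not the actual edit):

```java
public class VectorGuard {
    // Hypothetical guard: if the word has no paragram vector (null),
    // return an empty array so the sense loop simply runs zero times
    // and no real features are emitted for that word.
    public static double[] safeVector(double[] vec) {
        return (vec == null) ? new double[0] : vec;
    }

    public static void main(String[] args) {
        System.out.println(safeVector(null).length + " features for an unknown word");
    }
}
```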

Then I modified the code in POSTaggerKnown$$1.java, which is the extractor for the POSTaggerKnown class. The code can be found at https://github.com/Slash0BZ/illinois-cogcomp-nlp/blob/master/pos/src/main/java/edu/illinois/cs/cogcomp/pos/lbjava/POSTaggerKnown%24%241.java

With some further changes to make the code work on my local machine, POSTrain ran without errors.

Results

| Rounds\Features | Without real features | With real features | Difference |
|---|---|---|---|
| 50 | 96.525 | 96.420 | 0.105 |
| 100 | 96.600 | 96.556 | 0.044 |
| 150 | 96.599 | 96.568 | 0.031 |
| 200 | 96.610 | 96.588 | 0.022 |
| 250 | 96.613 | 96.593 | 0.020 |
| 300 | 96.610 | 96.596 | 0.014 |
| 350 | 96.609 | 96.593 | 0.016 |
| 400 | 96.604 | 96.583 | 0.021 |
| 450 | 96.598 | 96.587 | 0.011 |
| 500 | 96.589 | 96.582 | 0.007 |

Above is the results table. I ran two threads using different model files and the same data files. Column 2 is the result with the original feature vector defined on the GitHub page of POS; column 3 is the result after introducing the real features produced from phrasesim, as mentioned in the introduction section.

Everything else except the feature vector is the same across these two trials.

Some observations: the accuracy of both settings starts to drop at around rounds 250 to 300, and the accuracy difference between the two settings keeps shrinking until both settings overfit. However, there is no sign that the new feature vector with the phrasesim features performs better than the original features.

Result validation

I also validated the results to make sure the new 25-dimensional vector is added to the feature vector. I first checked the size of the lexicon and confirmed a growth of 25, from 51359 unique features to 51384 unique features for each example.
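The lexicon check amounts to simple arithmetic on the counts above:

```java
public class LexiconGrowthCheck {
    // Growth in the number of unique lexicon features after adding
    // the phrase-similarity vector.
    public static int growth(int before, int after) {
        return after - before;
    }

    public static void main(String[] args) {
        // One new feature per dimension of the 25-dimensional vector.
        System.out.println("growth = " + growth(51359, 51384)); // prints "growth = 25"
    }
}
```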

Then I also checked the feature values and confirmed that the discrete features have values of 1.0 / 0.0 while the real features have double values.

Conclusions