Description
The basic example provided at the link below does not seem to work; the final action it returns is always 0:
https://cs.stanford.edu/people/karpathy/convnetjs/docs.html
Proof:
I tried changing this line:
var reward = action === 0 ? 1.0 : 0.0;
into:
var reward = action === 1 ? 1.0 : 0.0;
and got the same result: the final action is still 0, although with the reward flipped I would expect the learned action to be 1.
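The check above looks at the action for a single final state only. A slightly more robust check, sketched below, tallies the chosen action over many random states. `tallyActions` and `stubBrain` are hypothetical names introduced for illustration; the helper only assumes an object with a `forward(state)` method that returns an action index, as `brain.forward` does in the example. The stub stands in for a trained brain so the sketch runs without the library.

```javascript
// Hypothetical helper: count how often each action index is chosen
// across nTrials random states. `brain` is any object exposing
// forward(state) that returns an action index.
function tallyActions(brain, numInputs, numActions, nTrials) {
  var counts = [];
  for (var a = 0; a < numActions; a++) {
    counts.push(0);
  }
  for (var t = 0; t < nTrials; t++) {
    var state = [];
    for (var i = 0; i < numInputs; i++) {
      state.push(Math.random());
    }
    var action = brain.forward(state);
    counts[action] += 1;
  }
  return counts;
}

// Stand-in for a trained brain, used only so this sketch runs on its own.
var stubBrain = { forward: function (state) { return 0; } };

var counts = tallyActions(stubBrain, 3, 2, 100);
console.log(counts);
```

With a real brain, this would presumably be called after setting `brain.epsilon_test_time = 0.0` and `brain.learning = false`, passing `brain` in place of `stubBrain`. If the counts stay concentrated on action 0 for both reward definitions, that would support the report.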
Code Example:
/START CODE/
var brain = new deepqlearn.Brain(3, 2); // 3 inputs, 2 possible outputs (0,1)
var state = [Math.random(), Math.random(), Math.random()];
for(var k=0;k<10000;k++) {
  var action = brain.forward(state); // returns index of chosen action
  var reward = action === 0 ? 1.0 : 0.0;
  brain.backward([reward]); // <-- learning magic happens here
  state[Math.floor(Math.random()*3)] += Math.random()*2-0.5;
}
brain.epsilon_test_time = 0.0; // don't make any more random choices
brain.learning = false;
// get an optimal action from the learned policy
var action = brain.forward(state);
/END CODE/
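One thing that may be worth ruling out: the example passes the reward wrapped in an array, `brain.backward([reward])`. This is an assumption and has not been confirmed against the library source here, but if the library adds the reward to a number internally (for example a target of the form reward + gamma * maxQ), JavaScript would coerce the array to a string and concatenate instead of adding. The sketch below shows only that language behavior; `gamma` and `maxQ` are illustrative values, not taken from the library.

```javascript
// Assumed shape of a Q-learning target: reward + gamma * maxQ.
// gamma and maxQ are illustrative values, not library defaults.
var gamma = 0.5;
var maxQ = 1.0;

var arrayReward = [1.0];  // reward as passed in the example
var scalarReward = 1.0;   // reward as a plain number

// The array is converted to the string "1", so "+" concatenates.
var targetFromArray = arrayReward + gamma * maxQ;
// Plain numeric addition.
var targetFromScalar = scalarReward + gamma * maxQ;

console.log(typeof targetFromArray, targetFromArray);   // string 10.5
console.log(typeof targetFromScalar, targetFromScalar); // number 1.5
```

If that assumption holds, trying `brain.backward(reward)` with a plain number might be a quick experiment to see whether the learned action changes.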