
Question about Nesterov method #42

@chrismaliszewski

Description


Either I don't understand something, or there is something wrong.
In the second for loop the code says:

```python
for idx, gradient in enumerate(model.derivate(x, y)):
    # Here we need to do a bit of gymnastic because of how the code is setup
    # We need to save the parameters state, modify it, do the simulation and then reset the parameter state
    # The update happen in the next section
    prev_weight = model.weights[idx]
    model.weights[idx] = decay_factor*gradient
    g[idx] = decay_factor*g[idx] + learning_rate*gradient
    model.weights[idx] = prev_weight
    # Update the model parameter
    model.weights[idx] = model.weights[idx] - g[idx]
```

If you check the code: you save `prev_weight`, then overwrite `model.weights[idx]`, but the new value is never read anywhere, and shortly afterwards you assign the saved `prev_weight` value right back.
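To restate the observation with a stripped-down sketch (the variable names below are illustrative stand-ins, not the repo's actual objects): the three assignments save, overwrite, and immediately restore the same slot, so the intermediate value is never read and the sequence is a no-op.

```python
# Stand-ins for model.weights[idx], decay_factor, and gradient
weights = [0.5]
decay_factor, gradient = 0.9, 0.2

prev_weight = weights[0]               # save
weights[0] = decay_factor * gradient   # overwrite -- never read
weights[0] = prev_weight               # restore -- net effect: nothing

assert weights[0] == 0.5  # unchanged, as if the two middle lines never ran
```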

My questions are:
a. Why is the derivative calculated at x, y (i.e. theta) instead of at theta - gamma * v_{t-1}?
b. What is the purpose of changing the `model.weights` value here if it is never used?
c. What is the purpose of `prev_weight`?

What don't I understand? What am I missing?
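For comparison, question (a) refers to the standard Nesterov formulation, where the gradient is evaluated at the look-ahead point theta - gamma * v_{t-1} rather than at theta. A minimal sketch of that update (function and variable names here are hypothetical, and the toy quadratic loss is a stand-in for the repo's model):

```python
import numpy as np

def nag_step(w, v, grad, lr=0.1, decay=0.9):
    """One Nesterov accelerated gradient step (sketch, not the repo's code)."""
    # Evaluate the gradient at the look-ahead point w - decay * v,
    # not at w itself -- this is exactly the point raised in question (a).
    lookahead = w - decay * v
    v_new = decay * v + lr * grad(lookahead)
    w_new = w - v_new
    return w_new, v_new

# Toy quadratic loss f(w) = 0.5 * ||w||^2, whose gradient is w itself.
grad = lambda w: w
w = np.array([1.0, -2.0])
v = np.zeros_like(w)
for _ in range(100):
    w, v = nag_step(w, v, grad)
# w shrinks toward the minimum at the origin
```

Note that implementations which temporarily shift the weights, recompute the gradient, and then restore them are one way to realize the look-ahead; the snippet quoted above appears to attempt this but never recomputes anything between the overwrite and the restore.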
