
Question about Nesterov method #42

@chrismaliszewski

Description


Either I don't understand something, or there is something wrong.
In the second for loop the code says:

```python
for idx, gradient in enumerate(model.derivate(x, y)):
    # Here we need to do a bit of gymnastic because of how the code is setup
    # We need to save the parameters state, modify it, do the simulation and then reset the parameter state
    # The update happen in the next section
    prev_weight = model.weights[idx]
    model.weights[idx] = decay_factor*gradient
    g[idx] = decay_factor*g[idx] + learning_rate*gradient
    model.weights[idx] = prev_weight
    # Update the model parameter
    model.weights[idx] = model.weights[idx] - g[idx]
```

If you check the code: you save `prev_weight`, then overwrite `model.weights[idx]`, but the new value is never read anywhere, and shortly afterwards you assign the saved `prev_weight` value right back.
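To restate the observation with a stripped-down sketch (the variable names below are illustrative stand-ins, not the repo's actual objects): the three assignments save, overwrite, and immediately restore the same slot, so the intermediate value is never read and the sequence is a no-op.

```python
# Stand-ins for model.weights[idx], decay_factor, and gradient
weights = [0.5]
decay_factor, gradient = 0.9, 0.2

prev_weight = weights[0]               # save
weights[0] = decay_factor * gradient   # overwrite -- never read
weights[0] = prev_weight               # restore -- net effect: nothing

assert weights[0] == 0.5  # unchanged, as if the two middle lines never ran
```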

My questions are:
a. Why is the derivative calculated at x, y (i.e. theta) instead of at theta - gamma * v_{t-1}?
b. What is the purpose of changing the `model.weights` value here if it is never used?
c. What is the purpose of `prev_weight`?

What don't I understand? What am I missing?
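For comparison, question (a) refers to the standard Nesterov formulation, where the gradient is evaluated at the look-ahead point theta - gamma * v_{t-1} rather than at theta. A minimal sketch of that update (function and variable names here are hypothetical, and the toy quadratic loss is a stand-in for the repo's model):

```python
import numpy as np

def nag_step(w, v, grad, lr=0.1, decay=0.9):
    """One Nesterov accelerated gradient step (sketch, not the repo's code)."""
    # Evaluate the gradient at the look-ahead point w - decay * v,
    # not at w itself -- this is exactly the point raised in question (a).
    lookahead = w - decay * v
    v_new = decay * v + lr * grad(lookahead)
    w_new = w - v_new
    return w_new, v_new

# Toy quadratic loss f(w) = 0.5 * ||w||^2, whose gradient is w itself.
grad = lambda w: w
w = np.array([1.0, -2.0])
v = np.zeros_like(w)
for _ in range(100):
    w, v = nag_step(w, v, grad)
# w shrinks toward the minimum at the origin
```

Note that implementations which temporarily shift the weights, recompute the gradient, and then restore them are one way to realize the look-ahead; the snippet quoted above appears to attempt this but never recomputes anything between the overwrite and the restore.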
