Model Feed forward and Back propagation
I'm gonna explain in detail how model_tensor_input_ff and model_tensor_input_bp work!
The function:
void model_tensor_input_ff(model* m, int tensor_depth, int tensor_i, int tensor_j, float* input)
Description:
/* This function computes the feed-forward for a model m. Each layer at index l computes the feed-forward
 * from the first layer at index l-1. If the input is a 1d array you should split its dimension
 * into 3 dimensions to turn the input into a tensor, for example:
 * I have an input array of length 59, then I can split it in 3 dimensions: depth = 1, rows = 1, cols = 59
*
* Input:
*
* @ model* m:= the model with the layers
* @ int tensor_depth:= the depth of the input tensor
* @ int tensor_i:= the number of rows of the tensor
* @ int tensor_j:= the number of columns of the tensor
* @ float* input:= your input array
*
* */
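For example, here is a minimal call sketch (just an illustration, assuming a model m has already been built elsewhere with the library's construction functions, and that its first layer expects a 1 x 1 x 59 input as in the example above):
/* a flat array of 59 floats passed as a tensor with depth = 1, rows = 1, cols = 59 */
float input[59];
int i;
for(i = 0; i < 59; i++)
    input[i] = 0.5;
model_tensor_input_ff(m, 1, 1, 59, input);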
If no model is passed as a parameter, the function returns without doing anything.
if(m == NULL)
return;
k1, k2 and k3 are used during the feed-forward to keep track, respectively, of which fully-connected, convolutional and residual convolutional layer we have reached in the model.
int i,j,z,w,count,count2,z2,k1 = 0, k2 = 0, k3 = 0;
There are 4 fundamental functions used during the feed-forward: ff_fcl_fcl, ff_fcl_cl, ff_cl_fcl, ff_cl_cl. These compute respectively the feed-forward between a fully-connected and a fully-connected layer, a fully-connected and a convolutional layer, a convolutional and a fully-connected layer, and a convolutional and a convolutional layer, handling all the cases where there is DROPOUT, NO DROPOUT, DROPOUT_TEST, CONVOLUTION, NO_CONVOLUTION, PADDING, POOLING, NORMALIZATION. So, to make things easier, the input is put inside a convolutional layer (temp), where it is copied into temp->post_activation. The activation flag is set to SIGMOID just to tell the next layer during the feed-forward: "Hey, there is an activation in this convolutional layer, so look inside the temp->post_activation array for your input".
/* Setting the input inside a convolutional structure*/
cl* temp = (cl*)malloc(sizeof(cl));
temp->post_activation = (float*)malloc(sizeof(float)*tensor_depth*tensor_i*tensor_j);
temp->normalization_flag = NO_NORMALIZATION;
temp->pooling_flag = NO_POOLING;
temp->activation_flag = SIGMOID;
temp->n_kernels = tensor_depth;
temp->rows1 = tensor_i;
temp->cols1 = tensor_j;
copy_array(input,temp->post_activation,tensor_depth*tensor_i*tensor_j);
There is a double "for" loop, but the running time is still O(m->layers). The double loop is there only because you can have multiple layers at the same level, for example 2 layers with the layer parameter set to the same number. Pay attention to this form, because the feed-forward is computed considering as input only the first layer before the current one; having multiple layers at the same level is not recommended, and if you want a ramification of the feed-forward and back-propagation it is recommended to use multiple models instead. The sla matrix says: "Hey, I'm a matrix of m->layers*m->layers; a generic sla[i][j] is set to 0 if there is no layer there, otherwise it is set to the FCLS, CLS or RLS flag." Here is an example:
model* m is a model with 6 total layers: 2 convolutional layers inside a residual layer, 2 convolutional layers and 2 fully-connected layers. Then sla[0][0] = RLS, sla[1][0] = RLS, sla[2][0] = CLS, sla[3][0] = CLS, sla[4][0] = FCLS, sla[5][0] = FCLS, and every other sla[i][j] is 0.
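If you want to check how sla has been filled for your own model, a small helper like the following can print it. This is just a sketch added here for illustration (it is not part of the library); it only uses the m->layers and m->sla fields and the FCLS, CLS and RLS flags that appear in the code above.
void print_sla(model* m){
    int i;
    /* with one layer per level, only the first column of sla matters */
    for(i = 0; i < m->layers; i++){
        if(m->sla[i][0] == FCLS)
            printf("level %d: fully-connected layer\n", i);
        else if(m->sla[i][0] == CLS)
            printf("level %d: convolutional layer\n", i);
        else if(m->sla[i][0] == RLS)
            printf("level %d: convolutional layer inside a residual structure\n", i);
        else
            printf("level %d: empty\n", i);
    }
}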
/* apply the feed forward to the model*/
for(i = 0; i < m->layers; i++){
for(j = 0; j < m->layers && m->sla[i][j] != 0; j++){
First of all we handle the case where the input is stored in temp and not in a layer inside the model:
if(!i)
If we are in a fully-connected layer, first of all we check whether this layer uses the softmax activation function while not being the last layer of the model; in that case we print an error and exit, because the softmax function should be set only as the last function of a model. Then we compute the feed-forward between temp and the current layer. After that we increase k1, as if to say: "Hey, we have reached the first fully-connected layer of the model, so next time we can consider the second one."
if(m->sla[i][j] == FCLS){
if(m->fcls[k1]->activation_flag == SOFTMAX && i != m->layers-1 && m->sla[i+1][0] != 0){
fprintf(stderr,"Error: the softmax can be applied only on the last fully-connected layers\n");
exit(1);
}
ff_cl_fcl(temp,m->fcls[k1]);
k1++;
}
The same happens if we reach a convolutional layer; the differences are that the softmax function must not be applied at all and that we increase k2 instead of k1.
else if(m->sla[i][j] == CLS){
if(m->cls[k2]->activation_flag == SOFTMAX){
fprintf(stderr,"Error: the softmax can be applied only on the last fully-connected layers\n");
exit(1);
}
ff_cl_cl(temp,m->cls[k2]);
k2++;
}
If we have reached a residual layer (a convolutional layer inside a residual structure), we store in z the index of the residual structure we have reached, and k3 - count is the index of the convolutional layer inside that residual structure. Let's make an example. We have 2 residual layers, each with 3 convolutional layers, and we have already processed 5 of these 6 convolutional layers, so k3 is set to 5. At the 6th convolutional layer we enter this branch (else if); after the loop, count is set to 6 and z to 2. Then z--, so z = 1, and count -= m->rls[z]->n_cl, so count = 6-3 = 3, and k3 - count = 5-3 = 2. So m->rls[z]->cls[k3-count] is the last convolutional layer of the second residual structure of the model, i.e. the 6th convolutional layer overall among the residual layers.
else if(m->sla[i][j] == RLS){
count = 0;
for(z = 0; z < m->n_rl && count <= k3; z++){
count+=m->rls[z]->n_cl;
}
z--;
count-=m->rls[z]->n_cl;
If this convolutional layer is the first layer of the residual structure, we must save the input, to be added later to the output of the last convolutional layer of this residual structure.
if(k3-count == 0)
copy_array(temp->post_activation,m->rls[z]->input,m->rls[z]->channels*m->rls[z]->input_rows*m->rls[z]->input_cols);
Then we compute the feed-forward between temp and this convolutional layer of the residual structure. After that we check whether we are at the last convolutional layer of the residual structure (k3-count == m->rls[z]->n_cl-1). In that case we store in the m->rls[z]->cl_output->pre_activation array the sum of the m->rls[z]->input vector and the output of the last convolutional layer. After that, if m->rls[z]->cl_output->activation_flag specifies an activation function, we compute it and store the result in cl_output->post_activation. Then we increase k3.
ff_cl_cl(temp,m->rls[z]->cls[k3-count]);
if(k3-count == m->rls[z]->n_cl-1){
if(m->rls[z]->cls[k3-count]->pooling_flag)
sum1D(m->rls[z]->input,m->rls[z]->cls[k3-count]->post_pooling,m->rls[z]->cl_output->pre_activation,m->rls[z]->cls[k3-count]->n_kernels*m->rls[z]->cls[k3-count]->rows2*m->rls[z]->cls[k3-count]->cols2);
else if(m->rls[z]->cls[k3-count]->normalization_flag)
sum1D(m->rls[z]->input,m->rls[z]->cls[k3-count]->post_normalization,m->rls[z]->cl_output->pre_activation,m->rls[z]->cls[k3-count]->n_kernels*m->rls[z]->cls[k3-count]->rows1*m->rls[z]->cls[k3-count]->cols1);
else if(m->rls[z]->cls[k3-count]->activation_flag)
sum1D(m->rls[z]->input,m->rls[z]->cls[k3-count]->post_activation,m->rls[z]->cl_output->pre_activation,m->rls[z]->cls[k3-count]->n_kernels*m->rls[z]->cls[k3-count]->rows1*m->rls[z]->cls[k3-count]->cols1);
else
sum1D(m->rls[z]->input,m->rls[z]->cls[k3-count]->pre_activation,m->rls[z]->cl_output->pre_activation,m->rls[z]->cls[k3-count]->n_kernels*m->rls[z]->cls[k3-count]->rows1*m->rls[z]->cls[k3-count]->cols1);
if(m->rls[z]->cl_output->activation_flag == LEAKY_RELU)
leaky_relu_array(m->rls[z]->cl_output->pre_activation,m->rls[z]->cl_output->post_activation, m->rls[z]->cl_output->n_kernels*m->rls[z]->cl_output->rows1*m->rls[z]->cl_output->cols1);
else if(m->rls[z]->cl_output->activation_flag == RELU)
relu_array(m->rls[z]->cl_output->pre_activation,m->rls[z]->cl_output->post_activation, m->rls[z]->cl_output->n_kernels*m->rls[z]->cl_output->rows1*m->rls[z]->cl_output->cols1);
else if(m->rls[z]->cl_output->activation_flag == SIGMOID)
sigmoid_array(m->rls[z]->cl_output->pre_activation,m->rls[z]->cl_output->post_activation, m->rls[z]->cl_output->n_kernels*m->rls[z]->cl_output->rows1*m->rls[z]->cl_output->cols1);
else if(m->rls[z]->cl_output->activation_flag == TANH)
tanhh_array(m->rls[z]->cl_output->pre_activation,m->rls[z]->cl_output->post_activation, m->rls[z]->cl_output->n_kernels*m->rls[z]->cl_output->rows1*m->rls[z]->cl_output->cols1);
}
k3++;
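Before moving on, here is the index computation above packaged as a standalone sketch (just an illustration of the logic, not a function of the library; it only uses the m->n_rl and m->rls[z]->n_cl fields that appear in the code):
/* given the global counter k3, find which residual structure (z) and which
 * convolutional layer inside it (k3 - count) we have reached */
void locate_residual_cl(model* m, int k3, int* z_out, int* local_index){
    int z, count = 0;
    for(z = 0; z < m->n_rl && count <= k3; z++){
        count += m->rls[z]->n_cl;
    }
    z--;
    count -= m->rls[z]->n_cl;
    *z_out = z;
    *local_index = k3 - count;
}
/* with 2 residual structures of 3 convolutional layers each and k3 = 5,
 * this returns z = 1 and local_index = 2, as in the example above */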
Then we consider the case where the input doesn't come from temp, but from a layer we have already met
else{
As before, we check whether the current layer is fully-connected, convolutional or residual, and we take the previous layer as input; we must also check whether that previous layer is fully-connected, convolutional or residual. In the loop that locates the right convolutional layer inside the right residual structure we use k3-1, because if we have already met that layer then k3 has already been increased. Since the current layer is a fully-connected one, we can assume that if the previous layer was a RLS then that residual structure has ended, and we take its cl_output as input.
if(m->sla[i][j] == FCLS){
if(m->fcls[k1]->activation_flag == SOFTMAX && i != m->layers-1 && m->sla[i+1][0] != 0){
fprintf(stderr,"Error: the softmax can be applied only on the last fully-connected layers\n");
exit(1);
}
if(m->sla[i-1][0] == FCLS){
ff_fcl_fcl(m->fcls[k1-1],m->fcls[k1]);
}
else if(m->sla[i-1][0] == CLS){
ff_cl_fcl(m->cls[k2-1],m->fcls[k1]);
}
if(m->sla[i-1][0] == RLS){
count2 = 0;
for(z2 = 0; z2 < m->n_rl && count2 <= k3-1; z2++){
count2+=m->rls[z2]->n_cl;
}
z2--;
count2-=m->rls[z2]->n_cl;
ff_cl_fcl(m->rls[z2]->cl_output,m->fcls[k1]);
}
k1++;
}
The same happens if we are in a convolutional layer:
else if(m->sla[i][j] == CLS){
if(m->cls[k2]->activation_flag == SOFTMAX){
fprintf(stderr,"Error: the softmax can be applied only on the last fully-connected layers\n");
exit(1);
}
if(m->sla[i-1][0] == FCLS){
ff_fcl_cl(m->fcls[k1-1],m->cls[k2]);
}
else if(m->sla[i-1][0] == CLS){
ff_cl_cl(m->cls[k2-1],m->cls[k2]);
}
if(m->sla[i-1][0] == RLS){
count2 = 0;
for(z2 = 0; z2 < m->n_rl && count2 <= k3-1; z2++){
count2+=m->rls[z2]->n_cl;
}
z2--;
count2-=m->rls[z2]->n_cl;
ff_cl_cl(m->rls[z2]->cl_output,m->cls[k2]);
}
k2++;
}
If we are in a residual layer (z and count are computed for the current residual structure just like in the first case) and the previous layer was a fully-connected one, there are different ways to handle the feed-forward. First of all, if we are at the beginning of the residual structure and the previous fully-connected layer has both dropout and an activation function, we apply the dropout mask to the post-activation array of the fully-connected layer and store the result in the input vector of the residual structure; if there is no activation function, the dropout mask is applied to the pre-activation array instead. If there is no dropout, we simply copy the output of the fully-connected layer into the input vector of the residual structure. Then we compute the feed-forward.
if(m->sla[i-1][0] == FCLS){
if(k3-count == 0){
if(m->fcls[k1-1]->dropout_flag){
if(m->fcls[k1-1]->activation_flag){
dot1D(m->fcls[k1-1]->post_activation,m->fcls[k1-1]->dropout_mask,m->rls[z]->input,m->rls[z]->channels*m->rls[z]->input_rows*m->rls[z]->input_cols);
}
else{
dot1D(m->fcls[k1-1]->pre_activation,m->fcls[k1-1]->dropout_mask,m->rls[z]->input,m->rls[z]->channels*m->rls[z]->input_rows*m->rls[z]->input_cols);
}
}
else{
if(m->fcls[k1-1]->activation_flag){
copy_array(m->fcls[k1-1]->post_activation,m->rls[z]->input,m->rls[z]->channels*m->rls[z]->input_rows*m->rls[z]->input_cols);
}
else{
copy_array(m->fcls[k1-1]->pre_activation,m->rls[z]->input,m->rls[z]->channels*m->rls[z]->input_rows*m->rls[z]->input_cols);
}
}
}
ff_fcl_cl(m->fcls[k1-1],m->rls[z]->cls[k3-count]);
}
If the previous layer was a convolutional one, we check whether its output comes from the post-pooling, post-normalization, post-activation or pre-activation array, and we copy it into the input vector of the residual structure.
else if(m->sla[i-1][0] == CLS){
if(k3-count == 0){
if(m->cls[k2-1]->pooling_flag){
copy_array(m->cls[k2-1]->post_pooling,m->rls[z]->input,m->rls[z]->channels*m->rls[z]->input_rows*m->rls[z]->input_cols);
}
else if(m->cls[k2-1]->normalization_flag){
copy_array(m->cls[k2-1]->post_normalization,m->rls[z]->input,m->rls[z]->channels*m->rls[z]->input_rows*m->rls[z]->input_cols);
}
else if(m->cls[k2-1]->activation_flag){
copy_array(m->cls[k2-1]->post_activation,m->rls[z]->input,m->rls[z]->channels*m->rls[z]->input_rows*m->rls[z]->input_cols);
}
else{
copy_array(m->cls[k2-1]->pre_activation,m->rls[z]->input,m->rls[z]->channels*m->rls[z]->input_rows*m->rls[z]->input_cols);
}
}
ff_cl_cl(m->cls[k2-1],m->rls[z]->cls[k3-count]);
}
If the previous layer was a residual one and we are at the beginning of the current residual structure, we store that layer's output in the input vector of the current residual structure, then we compute the feed-forward.
if(m->sla[i-1][0] == RLS){
count2 = 0;
for(z2 = 0; z2 < m->n_rl && count2 <= k3-1; z2++){
count2+=m->rls[z2]->n_cl;
}
z2--;
count2-=m->rls[z2]->n_cl;
if(k3-count == 0){
if(m->rls[z2]->cl_output->activation_flag)
copy_array(m->rls[z2]->cl_output->post_activation,m->rls[z]->input,m->rls[z]->channels*m->rls[z]->input_rows*m->rls[z]->input_cols);
else
copy_array(m->rls[z2]->cl_output->pre_activation,m->rls[z]->input,m->rls[z]->channels*m->rls[z]->input_rows*m->rls[z]->input_cols);
}
if(z2!=z){
ff_cl_cl(m->rls[z2]->cl_output,m->rls[z]->cls[k3-count]);
}
else{
ff_cl_cl(m->rls[z2]->cls[k3-1-count2],m->rls[z]->cls[k3-count]);
}
}
Then, if we are at the last convolutional layer of the residual structure, we add its output to the input vector of the residual structure, storing the new output in cl_output (and applying the cl_output activation function, if any). After that we increase k3.
if(k3-count == m->rls[z]->n_cl-1){
if(m->rls[z]->cls[k3-count]->pooling_flag)
sum1D(m->rls[z]->input,m->rls[z]->cls[k3-count]->post_pooling,m->rls[z]->cl_output->pre_activation,m->rls[z]->cls[k3-count]->n_kernels*m->rls[z]->cls[k3-count]->rows2*m->rls[z]->cls[k3-count]->cols2);
else if(m->rls[z]->cls[k3-count]->normalization_flag)
sum1D(m->rls[z]->input,m->rls[z]->cls[k3-count]->post_normalization,m->rls[z]->cl_output->pre_activation,m->rls[z]->cls[k3-count]->n_kernels*m->rls[z]->cls[k3-count]->rows1*m->rls[z]->cls[k3-count]->cols1);
else if(m->rls[z]->cls[k3-count]->activation_flag)
sum1D(m->rls[z]->input,m->rls[z]->cls[k3-count]->post_activation,m->rls[z]->cl_output->pre_activation,m->rls[z]->cls[k3-count]->n_kernels*m->rls[z]->cls[k3-count]->rows1*m->rls[z]->cls[k3-count]->cols1);
else
sum1D(m->rls[z]->input,m->rls[z]->cls[k3-count]->pre_activation,m->rls[z]->cl_output->pre_activation,m->rls[z]->cls[k3-count]->n_kernels*m->rls[z]->cls[k3-count]->rows1*m->rls[z]->cls[k3-count]->cols1);
if(m->rls[z]->cl_output->activation_flag == LEAKY_RELU)
leaky_relu_array(m->rls[z]->cl_output->pre_activation,m->rls[z]->cl_output->post_activation, m->rls[z]->cl_output->n_kernels*m->rls[z]->cl_output->rows1*m->rls[z]->cl_output->cols1);
else if(m->rls[z]->cl_output->activation_flag == RELU)
relu_array(m->rls[z]->cl_output->pre_activation,m->rls[z]->cl_output->post_activation, m->rls[z]->cl_output->n_kernels*m->rls[z]->cl_output->rows1*m->rls[z]->cl_output->cols1);
else if(m->rls[z]->cl_output->activation_flag == SIGMOID)
sigmoid_array(m->rls[z]->cl_output->pre_activation,m->rls[z]->cl_output->post_activation, m->rls[z]->cl_output->n_kernels*m->rls[z]->cl_output->rows1*m->rls[z]->cl_output->cols1);
else if(m->rls[z]->cl_output->activation_flag == TANH)
tanhh_array(m->rls[z]->cl_output->pre_activation,m->rls[z]->cl_output->post_activation, m->rls[z]->cl_output->n_kernels*m->rls[z]->cl_output->rows1*m->rls[z]->cl_output->cols1);
}
k3++;
At the end of the function we free the temporary structure allocated at the beginning:
free(temp->post_activation);
free(temp);
The function:
float* model_tensor_input_bp(model* m, int tensor_depth, int tensor_i, int tensor_j, float* input, float* error, int error_dimension);
Description:
/* This function computes the back-propagation for a model m. Each layer at index l computes the backprop
 * from the first layer at index l+1. If the input is a 1d array you should split its dimension
 * into 3 dimensions to turn the input into a tensor, for example:
 * I have an input array of length 59, then I can split it in 3 dimensions: depth = 1, rows = 1, cols = 59
*
* Input:
*
* @ model* m:= the model with the layers
* @ int tensor_depth:= the depth of the input tensor
* @ int tensor_i:= the number of rows of the tensor
* @ int tensor_j:= the number of columns of the tensor
* @ float* input:= your input array
* @ float* error:= the error of the last layer of the last function computed
* @ int error_dimension:= the dimension of the float* error vector
*
* */
The first lines are almost the same as in the feed-forward function; the difference is that k1, k2 and k3 are initialized to the total number of fully-connected, convolutional and residual convolutional layers, because the back-propagation walks through the model backwards and decrements them.
if(m == NULL)
return NULL;
int i,j,z,w,count,count2,z2,k1 = m->n_fcl, k2 = m->n_cl, k3 = 0;
for(i = 0; i < m->n_rl; i++){
k3+=m->rls[i]->n_cl;
}
/* Setting the input inside a convolutional structure*/
cl* temp = (cl*)malloc(sizeof(cl));
temp->post_activation = (float*)malloc(sizeof(float)*tensor_depth*tensor_i*tensor_j);
temp->normalization_flag = NO_NORMALIZATION;
temp->pooling_flag = NO_POOLING;
temp->activation_flag = SIGMOID;
temp->n_kernels = tensor_depth;
temp->rows1 = tensor_i;
temp->cols1 = tensor_j;
copy_array(input,temp->post_activation,tensor_depth*tensor_i*tensor_j);
The double loop is the same as in the feed-forward, but it starts from the end:
for(i = m->layers-1; i >= 0; i--){
for(j = 0; j < 1 && m->sla[i][j] != 0; j++){
First we handle the case where the current layer is the initial layer of the network, i.e. its input is stored in temp:
if(!i)
We compute the back-propagation according to the type of the current layer, which can be fully-connected, convolutional or residual. In the case of a residual layer, the error coming from above is the error of the addition between the final convolutional output of the residual structure and its input; when we reach the first convolutional layer of the residual structure we sum that residual error with the back-propagated error.
if(m->sla[i][j] == FCLS){
k1--;
if(m->fcls[k1]->activation_flag == SOFTMAX && i != m->layers-1 && m->sla[i+1][0] != 0){
fprintf(stderr,"Error: the softmax can be applied only on the last fully-connected layers\n");
exit(1);
}
error1 = bp_cl_fcl(temp,m->fcls[k1],error1);
}
else if(m->sla[i][j] == CLS){
k2--;
if(m->cls[k2]->activation_flag == SOFTMAX){
fprintf(stderr,"Error: the softmax can be applied only on the last fully-connected layers\n");
exit(1);
}
error1 = bp_cl_cl(temp,m->cls[k2],error1);
}
else if(m->sla[i][j] == RLS){
k3--;
count = 0;
for(z = 0; z < m->n_rl && count <= k3; z++){
count+=m->rls[z]->n_cl;
}
z--;
count-=m->rls[z]->n_cl;
if(k3-count == m->rls[z]->n_cl-1){
error_residual = error1;
}
if(m->rls[z]->cls[k3-count]->activation_flag == SOFTMAX){
fprintf(stderr,"Error: the softmax can be applied only on the last fully-connected layers\n");
exit(1);
}
error1 = bp_cl_cl(temp,m->rls[z]->cls[k3-count],error1);
if(k3-count == 0)
sum1D(error1,error_residual,error1,m->rls[z]->channels*m->rls[z]->input_rows*m->rls[z]->input_cols);
}
So pay attention: if you have a model that ends with a residual layer, model_tensor_input_bp treats the error passed as a parameter as the error of the addition between the last convolutional layer and the input of the residual layer, not as the error of the activation function (if any) applied to that addition. So you need to take the error you want to pass as a parameter and, if the residual layer computes an activation function after the addition, compute the new error manually before passing it. For example, if the residual layer computes the RELU function, you take your error and (new_error being a buffer of the same size that you allocate yourself and then pass as the error parameter):
derivative_relu_array(m->rls[z2]->cl_output->pre_activation,m->rls[z2]->cl_output->temp3,m->rls[z2]->cl_output->n_kernels*m->rls[z2]->cl_output->rows1*m->rls[z2]->cl_output->cols1);
dot1D(m->rls[z2]->cl_output->temp3,error,new_error,m->rls[z2]->cl_output->n_kernels*m->rls[z2]->cl_output->rows1*m->rls[z2]->cl_output->cols1);
(PROBLEM FIXED ON 5/12/2019)
Other possible issues:
- you should not free the errors returned by the model, because they are arrays associated with the layer structures.
- when computing the back-propagation of a model you should pass the dL/da error, except in the case where you have softmax as the final function; in that case you can pass the right output.
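Putting the two notes above together, here is a sketch of how a back-propagation step might be wrapped. It is only an illustration: backprop_step, output, target and output_size are hypothetical names (the last layer's output, the desired output and their length), a 1/2*sum((output - target)^2) loss is assumed so that dL/da is simply output - target, and it is assumed that model_tensor_input_ff has already been called on the same input.
float* backprop_step(model* m, float* input, int depth, int rows, int cols,
                     float* output, float* target, int output_size){
    int i;
    float* error = (float*)malloc(sizeof(float)*output_size);
    /* dL/da of the last layer for a squared-error loss */
    for(i = 0; i < output_size; i++)
        error[i] = output[i] - target[i];
    float* ret_err = model_tensor_input_bp(m, depth, rows, cols, input, error, output_size);
    free(error);     /* our own buffer can be freed */
    return ret_err;  /* the returned array must NOT be freed: it belongs to the layer structures */
}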