diff --git a/README.md b/README.md
index d399a52..3e1094c 100644
--- a/README.md
+++ b/README.md
@@ -15,7 +15,7 @@ Torch takes care of our autograd needs. The documentation is available at https:
 To get a notion of how function learning of a dense layer network works on given data, we will first have a look at the example from the lecture. In the following task you will implement gradient descent learning of a dense neural network using `torch` and use it to learn a function, e.g. a cosine.
 
-- As a first step, create a cosine function in torch and add some noise with `torch.randn`. Use, for example, a signal length of $n = 200$ samples and a period of your choosing. This will be the noisy signal that the model is supposed to learn the underlaying cosine from.
+- Open `src/denoise_cosine.py` and go to the `__main__` function. Look at the code that is already there. You can see that a cosine function with a signal length of $n = 200$ samples has already been created in torch. In the for loop, which will be our training loop, some noise is added to the cosine function with `torch.randn`. This will be the noisy signal that the model is supposed to learn the underlying cosine from.
 
 - Recall the definition of the sigmoid function $\sigma$
 
@@ -33,7 +33,7 @@ To get a notion of how function learning of a dense layer network works on given
 ```
 where $\mathbf{W}_1\in \mathbb{R}^{m,n}, \mathbf{x}\in\mathbb{R}^n, \mathbf{b}\in\mathbb{R}^m$ and $m$ denotes the number of neurons and $n$ the input signal length. Suppose that the input parameters are stored in a [python dictonary](https://docs.python.org/3/tutorial/datastructures.html#dictionaries) with the keys `W_1`, `W_2` and `b`. Use numpys `@` notation for the matrix product.
 
-- Use `torch.randn` to initialize your weights. For a signal length of $200$ the $W_2$ matrix should have e.g. have the shape [200, `hidden_neurons`] and $W_1$ a shape of [`hidden_neurons`, 200].
+- Use `torch.normal` to initialize your weights. This function will sample the values from a normal distribution. To ensure that the weights are not initialized too large, choose a mean of 0 and a standard deviation of 0.5. For a signal length of $200$, the $W_2$ matrix should, for example, have the shape [200, `hidden_neurons`] and $W_1$ a shape of [`hidden_neurons`, 200].
 
 - Implement and test a squared error cost
 
@@ -52,7 +52,7 @@ C_{\text{se}} = \frac{1}{2} \sum_{k=1}^{n} (\mathbf{y}_k - \mathbf{o}_k)^2
 ```
 
-- In the equation above $\mathbf{W} \in \mathbb{R}$ holds for weight matrices and biases $\epsilon$ denotes the step size and $\delta$ the gradient operation with respect to the following weight. Use a loop to repeat weight updates for multiple operations. Try to train for one hundred updates.
+- In the equation above, $\mathbf{W}$ stands for the weight matrices and biases, $\epsilon$ denotes the step size, and $\delta$ the gradient operation with respect to the following weight. Use the loop to repeat the weight updates for multiple iterations. Try to train for one hundred updates.
 
 - At last, compute the network output `y_hat` on the final values to see if the network learned the underlying cosine function. Use `matplotlib.pyplot.plot` to plot the noisy signal and the network output $\mathbf{o}$.
 
@@ -89,7 +89,7 @@ C_{\text{ce}}(\mathbf{y},\mathbf{o})=-\frac{1}{n_b}\sum_{i=1}^{n_b}\sum_{k=1}^{n
 - Initialize the network with the `Net` object (see the `torch` documentation for help).
 
-- Train your network for a fixed number of `EPCOHS` over the entire dataset. Major steps in trianing loop include normalize inputs, model prediction, loss calculation, `.backward()` over loss to compute gradients, `sgd_step` and `zero_grad`. Validate model once per epoch.
+- Train your network for a fixed number of `EPOCHS` over the entire dataset. The major steps in the training loop are: normalizing the inputs, computing the model prediction, calculating the loss, calling `.backward()` on the loss to compute the gradients, `sgd_step`, and `zero_grad`. Validate the model once per epoch.
 
 - When model is trained, load the test data with `test_loader` and calculate the test accuracy.
diff --git a/src/denoise_cosine.py b/src/denoise_cosine.py
index a663164..7a73074 100644
--- a/src/denoise_cosine.py
+++ b/src/denoise_cosine.py
@@ -6,6 +6,7 @@ import torch as th
 from torch.func import grad_and_value
 from tqdm import tqdm
+import os
 
 
 def sigmoid(x: th.Tensor) -> th.Tensor:
@@ -69,7 +70,7 @@ def net_cost(params: Dict, x: th.Tensor, y: th.Tensor) -> th.Tensor:
     pass
     # TODO: Choose a suitable stepsize
     step_size = 0.0
-    iterations = 150
+    iterations = 100
     input_neurons = output_neurons = 200
     # TODO: Choose a proper network size.
     hidden_neurons = 0
@@ -102,6 +103,7 @@ def net_cost(params: Dict, x: th.Tensor, y: th.Tensor) -> th.Tensor:
     plt.plot(x, y_noise, label="input")
     plt.legend()
     plt.grid()
+    os.makedirs("./figures", exist_ok=True)
     plt.savefig("./figures/Denoise.png", dpi=600, bbox_inches="tight")
     plt.show()
     print("Done")
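
For orientation, a minimal sketch of the forward pass the README describes, $\mathbf{o} = \mathbf{W}_2 \sigma(\mathbf{W}_1 \mathbf{x} + \mathbf{b})$, could look as follows. The shapes follow the text; the assumption here is that the single key `b` holds the hidden-layer bias.

```python
from typing import Dict

import torch as th


def sigmoid(x: th.Tensor) -> th.Tensor:
    """Elementwise sigmoid, sigma(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + th.exp(-x))


def net(params: Dict, x: th.Tensor) -> th.Tensor:
    """Dense two-layer network o = W_2 @ sigmoid(W_1 @ x + b).

    Shapes: W_1 is [hidden_neurons, n], b is [hidden_neurons],
    W_2 is [n, hidden_neurons], x is [n].
    """
    hidden = sigmoid(params["W_1"] @ x + params["b"])
    return params["W_2"] @ hidden
```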
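Continuing that sketch, the `torch.normal` initialization, the squared error cost, and the manual gradient-descent loop could be wired together roughly as below. The zero bias initialization, the noise amplitude of 0.1, the step size, and `hidden_neurons = 50` are placeholder assumptions, not values from the exercise.

```python
from torch.func import grad_and_value  # builds on the snippet above


def net_cost(params: Dict, x: th.Tensor, y: th.Tensor) -> th.Tensor:
    """Squared error cost C_se = 0.5 * sum((y - o)^2)."""
    o = net(params, x)
    return 0.5 * th.sum((y - o) ** 2)


if __name__ == "__main__":
    hidden_neurons = 50  # placeholder; choose a proper network size
    params = {
        "W_1": th.normal(0.0, 0.5, size=(hidden_neurons, 200)),
        "b": th.zeros(hidden_neurons),  # assumption: zero-initialized bias
        "W_2": th.normal(0.0, 0.5, size=(200, hidden_neurons)),
    }
    step_size = 0.01  # placeholder; a suitable value must be found by experiment
    x = th.linspace(0.0, 2.0 * th.pi, 200)
    y = th.cos(x)

    for _ in range(100):  # one hundred updates, as the README suggests
        y_noise = y + 0.1 * th.randn(200)  # noisy training signal
        grads, cost = grad_and_value(net_cost)(params, x, y_noise)
        # Gradient descent: W <- W - epsilon * dC/dW for every parameter.
        params = {key: w - step_size * grads[key] for key, w in params.items()}
```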
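For the second part, the `EPOCHS` loop could be structured as in the sketch below. `Net`, `sgd_step`, and `zero_grad` are hypothetical stand-ins written here only to illustrate the listed steps; the exercise repository defines the real ones, and the random tensors merely take the place of the actual `train_loader` data.

```python
import torch as th
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

EPOCHS = 10


class Net(nn.Module):
    """Hypothetical stand-in for the exercise's Net class."""

    def __init__(self) -> None:
        super().__init__()
        self.layers = nn.Sequential(
            nn.Flatten(), nn.Linear(28 * 28, 128), nn.Sigmoid(), nn.Linear(128, 10)
        )

    def forward(self, x: th.Tensor) -> th.Tensor:
        return self.layers(x)


def sgd_step(model: nn.Module, step_size: float) -> None:
    """Apply W <- W - epsilon * dC/dW to every parameter."""
    with th.no_grad():
        for p in model.parameters():
            p -= step_size * p.grad


def zero_grad(model: nn.Module) -> None:
    """Reset the accumulated gradients before the next batch."""
    for p in model.parameters():
        p.grad = None


# Dummy data standing in for the exercise's train_loader.
train_loader = DataLoader(
    TensorDataset(th.randn(256, 1, 28, 28), th.randint(0, 10, (256,))),
    batch_size=32,
)

model = Net()
loss_fn = nn.CrossEntropyLoss()
for epoch in range(EPOCHS):
    for x, y in train_loader:
        x = (x - x.mean()) / x.std()  # normalize inputs
        loss = loss_fn(model(x), y)   # model prediction + loss calculation
        loss.backward()               # compute gradients via autograd
        sgd_step(model, step_size=0.1)
        zero_grad(model)
    # Validation once per epoch would go here (omitted in this sketch).
```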