Following your advice, I switched from the OpenAI Baselines to Stable-Baselines for the Kundur system training.
import tensorflow as tf
from stable_baselines import DQN

def main(learning_rate, env):
    # Reset the default TF1 graph so repeated runs (e.g. a learning-rate sweep) start clean
    tf.reset_default_graph()
    graph = tf.get_default_graph()
    model = DQN(CustomDQNPolicy, env, learning_rate=learning_rate, verbose=0)
    callback = SaveOnBestTrainingRewardCallback(check_freq=1000, storedData=storedData)
    time_steps = 900000
    model.learn(total_timesteps=int(time_steps), callback=callback)
    print("Saving final model to: " + savedModel + "/" + model_name + "_lr_%s_90w.pkl" % (str(learning_rate)))
    model.save(savedModel + "/" + model_name + "_lr_%s_90w.pkl" % (str(learning_rate)))
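For reference, here is a minimal sketch of what my SaveOnBestTrainingRewardCallback looks like, adapted from the callback example in the Stable-Baselines documentation. I am assuming here that storedData is the directory the Monitor wrapper writes episode rewards to:

```python
import os
import numpy as np
from stable_baselines.common.callbacks import BaseCallback
from stable_baselines.results_plotter import load_results, ts2xy

class SaveOnBestTrainingRewardCallback(BaseCallback):
    """Every `check_freq` steps, read the Monitor logs in `storedData` and save
    the model whenever the mean reward over the last 100 episodes improves."""

    def __init__(self, check_freq, storedData, verbose=1):
        super(SaveOnBestTrainingRewardCallback, self).__init__(verbose)
        self.check_freq = check_freq
        self.log_dir = storedData
        self.save_path = os.path.join(storedData, 'best_model')
        self.best_mean_reward = -np.inf

    def _on_step(self):
        if self.n_calls % self.check_freq == 0:
            # Load episode rewards logged by the Monitor wrapper
            x, y = ts2xy(load_results(self.log_dir), 'timesteps')
            if len(x) > 0:
                mean_reward = np.mean(y[-100:])
                if mean_reward > self.best_mean_reward:
                    self.best_mean_reward = mean_reward
                    self.model.save(self.save_path)
        return True
```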
However, after 900,000 training steps the DQN agent still cannot find a good policy. Please see the average-reward progress plot:
https://www.dropbox.com/preview/DQN_adaptivenose.png?role=personal
I used the following environment settings:
case_files_array.append(folder_dir +'/testData/Kundur-2area/kunder_2area_ver30.raw')
case_files_array.append(folder_dir+'/testData/Kundur-2area/kunder_2area.dyr')
dyn_config_file = folder_dir+'/testData/Kundur-2area/json/kundur2area_dyn_config.json'
rl_config_file = folder_dir+'/testData/Kundur-2area/json/kundur2area_RL_config_multiStepObsv.json'
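For the callback above to see episode rewards, the environment is wrapped in a Monitor that logs to storedData before being passed to main(). A minimal sketch, assuming env is the power-system environment constructed from the case and config files above:

```python
from stable_baselines.bench import Monitor

# Assumption: `env` is the environment built from case_files_array,
# dyn_config_file and rl_config_file above; `storedData` is the same
# directory the callback reads episode rewards from.
env = Monitor(env, storedData)
main(learning_rate=1e-4, env=env)  # the learning_rate value here is only an example
```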
My guess is that in the baseline scenario kunder_2area_ver30.raw (without increased system loading), a short circuit might not lead to loss of stability during the simulation. Therefore, the DQN agent perhaps settles on a "no action" policy so as not to incur the actionPenalty = 2.0. According to the reward progress plot, during training the agent never finds a policy with a mean reward better than 603.05, and when testing, mean_reward = 603.05 corresponds to the "no action" policy (please see the figure below):
https://www.dropbox.com/preview/no%20actions%20case.png?role=personal
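One quick way to check this hypothesis would be to roll out an explicit do-nothing policy and compare its mean episode reward with the 603.05 plateau. A minimal sketch, assuming the do-nothing action has index 0 in the discrete action space (adjust if the action mapping differs):

```python
import numpy as np

NO_ACTION = 0  # assumption: index of the "do nothing" action in this env
episode_rewards = []
for _ in range(10):
    obs = env.reset()
    done, total = False, 0.0
    while not done:
        obs, reward, done, _ = env.step(NO_ACTION)
        total += reward
    episode_rewards.append(total)

print("Mean reward of the no-action policy:", np.mean(episode_rewards))
```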
However, this is only my guess and I may be wrong. I am thinking of trying scenarios with increased load to make sure that loss of stability actually occurs during the simulation.
Originally posted by @frostyduck in #9 (comment)