Following your advice, I switched from the OpenAI Baselines to Stable-Baselines for the Kundur system training.
import tensorflow as tf
from stable_baselines import DQN

def main(learning_rate, env):
    # Reset the default TF1 graph so repeated runs (e.g. a learning-rate sweep) start clean
    tf.reset_default_graph()
    graph = tf.get_default_graph()
    model = DQN(CustomDQNPolicy, env, learning_rate=learning_rate, verbose=0)
    callback = SaveOnBestTrainingRewardCallback(check_freq=1000, storedData=storedData)
    time_steps = 900000
    model.learn(total_timesteps=int(time_steps), callback=callback)
    print("Saving final model to: " + savedModel + "/" + model_name + "_lr_%s_90w.pkl" % (str(learning_rate)))
    model.save(savedModel + "/" + model_name + "_lr_%s_90w.pkl" % (str(learning_rate)))
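For reference, here is a minimal sketch of what my SaveOnBestTrainingRewardCallback looks like, adapted from the callback example in the Stable-Baselines documentation. I am assuming here that storedData is the directory the Monitor wrapper writes episode rewards to:

```python
import os
import numpy as np
from stable_baselines.common.callbacks import BaseCallback
from stable_baselines.results_plotter import load_results, ts2xy

class SaveOnBestTrainingRewardCallback(BaseCallback):
    """Every `check_freq` steps, read the Monitor logs in `storedData` and save
    the model whenever the mean reward over the last 100 episodes improves."""

    def __init__(self, check_freq, storedData, verbose=1):
        super(SaveOnBestTrainingRewardCallback, self).__init__(verbose)
        self.check_freq = check_freq
        self.log_dir = storedData
        self.save_path = os.path.join(storedData, 'best_model')
        self.best_mean_reward = -np.inf

    def _on_step(self):
        if self.n_calls % self.check_freq == 0:
            # Load episode rewards logged by the Monitor wrapper
            x, y = ts2xy(load_results(self.log_dir), 'timesteps')
            if len(x) > 0:
                mean_reward = np.mean(y[-100:])
                if mean_reward > self.best_mean_reward:
                    self.best_mean_reward = mean_reward
                    self.model.save(self.save_path)
        return True
```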
However, after 900,000 training steps the DQN agent still cannot find a good policy. Please see the average-reward progress plot:
https://www.dropbox.com/preview/DQN_adaptivenose.png?role=personal
I used the following environment settings:
case_files_array.append(folder_dir +'/testData/Kundur-2area/kunder_2area_ver30.raw')
case_files_array.append(folder_dir+'/testData/Kundur-2area/kunder_2area.dyr')
dyn_config_file = folder_dir+'/testData/Kundur-2area/json/kundur2area_dyn_config.json'
rl_config_file = folder_dir+'/testData/Kundur-2area/json/kundur2area_RL_config_multiStepObsv.json'
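For the callback above to see episode rewards, the environment is wrapped in a Monitor that logs to storedData before being passed to main(). A minimal sketch, assuming env is the power-system environment constructed from the case and config files above:

```python
from stable_baselines.bench import Monitor

# Assumption: `env` is the environment built from case_files_array,
# dyn_config_file and rl_config_file above; `storedData` is the same
# directory the callback reads episode rewards from.
env = Monitor(env, storedData)
main(learning_rate=1e-4, env=env)  # the learning_rate value here is only an example
```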
My guess is that in the baseline scenario kunder_2area_ver30.raw (without increased system loading), a short circuit might not lead to loss of stability during the simulation. Therefore, the DQN agent perhaps settles on a "no action" policy so as not to incur the actionPenalty = 2.0. According to the reward progress plot, during training the agent never finds a policy with a mean reward better than 603.05, and when testing, mean_reward = 603.05 corresponds to the "no action" policy (please see the figure below):
https://www.dropbox.com/preview/no%20actions%20case.png?role=personal
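One quick way to check this hypothesis would be to roll out an explicit do-nothing policy and compare its mean episode reward with the 603.05 plateau. A minimal sketch, assuming the do-nothing action has index 0 in the discrete action space (adjust if the action mapping differs):

```python
import numpy as np

NO_ACTION = 0  # assumption: index of the "do nothing" action in this env
episode_rewards = []
for _ in range(10):
    obs = env.reset()
    done, total = False, 0.0
    while not done:
        obs, reward, done, _ = env.step(NO_ACTION)
        total += reward
    episode_rewards.append(total)

print("Mean reward of the no-action policy:", np.mean(episode_rewards))
```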
However, this is only my guess and I may be wrong. I am thinking of trying scenarios with increased load to make sure that loss of stability actually occurs during the simulation.
Originally posted by @frostyduck in #9 (comment)