Hi,

Thanks for this work. It's really useful.

Why did you define the score function (line 66) as follows?

self.logp / float(self.leng - 1 + 1e-6) + alpha * reward

Can you explain this definition?

Thanks,
Best