Hello, I found a performance issue in the definition of cudnn_gru in MRC/BiDAF/layers.py: tf.zeros([1, batch_size, num_units]) is created repeatedly during program execution, which reduces efficiency. I think it should be created once, before the loop.
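To make the suggested change concrete, here is a minimal, TensorFlow-free sketch of the pattern. It is not the actual code in layers.py. ZeroFactory, its zeros method, build_states_in_loop and build_states_hoisted are hypothetical names, and ZeroFactory.zeros only stands in for tf.zeros while counting how many times a zero state is built. The sketch assumes the zero state is only read and never modified, so one shared instance can safely be reused across iterations:

```python
class ZeroFactory:
    """Hypothetical stand-in for tf.zeros that counts how often it is called."""

    def __init__(self):
        self.calls = 0

    def zeros(self, shape):
        # Build an immutable nested-tuple "tensor" of zeros with a 3-D shape.
        self.calls += 1
        d0, d1, d2 = shape
        return tuple(tuple((0.0,) * d2 for _ in range(d1)) for _ in range(d0))


def build_states_in_loop(factory, num_layers, batch_size, num_units):
    # Current pattern: the zero state is rebuilt on every loop iteration.
    states = []
    for _ in range(num_layers):
        init = factory.zeros([1, batch_size, num_units])
        states.append(init)
    return states


def build_states_hoisted(factory, num_layers, batch_size, num_units):
    # Suggested fix: build the zero state once, before the loop, then reuse it.
    # Sharing is safe here only because the state is immutable and read-only.
    init = factory.zeros([1, batch_size, num_units])
    states = []
    for _ in range(num_layers):
        states.append(init)
    return states


loop_factory = ZeroFactory()
build_states_in_loop(loop_factory, num_layers=3, batch_size=2, num_units=4)
print(loop_factory.calls)  # 3

hoisted_factory = ZeroFactory()
build_states_hoisted(hoisted_factory, num_layers=3, batch_size=2, num_units=4)
print(hoisted_factory.calls)  # 1
```

Both functions return equal states, but the hoisted version builds the zeros only once. If each layer instead wrapped its zero state in its own trainable variable, hoisting would change behavior, so this only applies when the state is a plain constant.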
The same issue exists in:
Looking forward to your reply. Btw, I would be glad to create a PR to fix it if you are too busy.