Add BLEU, ROUGE metrics to pythainlp.benchmarks

Currently, we need to run word tokenization as a preprocessing step before using sacrebleu and rouge_score to calculate BLEU/ROUGE scores for Thai. I think this is tedious, repetitive work, so I think pythainlp.benchmarks should provide functions for calculating BLEU and ROUGE metrics.
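
To illustrate the manual step, here is a minimal sketch of what I mean for BLEU. The names `pretokenize` and `thai_bleu` are hypothetical (they do not exist in pythainlp.benchmarks today), and I am assuming `pythainlp.tokenize.word_tokenize` as the default tokenizer and `sacrebleu.corpus_bleu` with `tokenize="none"` so that sacrebleu does not re-tokenize the text. ROUGE would need the same kind of pre-tokenization.

```python
from typing import Callable, List, Optional, Sequence

# A tokenizer takes a raw string and returns a list of tokens.
Tokenizer = Callable[[str], List[str]]


def _default_tokenizer(text: str) -> List[str]:
    # Imported lazily so the helper can also be used with a custom tokenizer.
    from pythainlp.tokenize import word_tokenize

    return word_tokenize(text, keep_whitespace=False)


def pretokenize(text: str, tokenizer: Optional[Tokenizer] = None) -> str:
    """Hypothetical helper: join tokens with spaces so that a
    whitespace-based metric sees Thai word boundaries."""
    tokenize = tokenizer or _default_tokenizer
    # Drop empty or whitespace-only tokens before joining.
    return " ".join(tok for tok in tokenize(text) if tok.strip())


def thai_bleu(
    hypotheses: Sequence[str],
    references: Sequence[str],
    tokenizer: Optional[Tokenizer] = None,
) -> float:
    """Hypothetical helper: corpus-level BLEU with one reference per hypothesis."""
    import sacrebleu

    hyps = [pretokenize(h, tokenizer) for h in hypotheses]
    refs = [pretokenize(r, tokenizer) for r in references]
    # tokenize="none": the text is already split into space-separated words.
    return sacrebleu.corpus_bleu(hyps, [refs], tokenize="none").score
```

With something like this in pythainlp.benchmarks, a user could call `thai_bleu(hypotheses, references)` directly instead of repeating the tokenization step in every project.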