- Does vllm work under a version mismatch with the current env, and do you separate the eval and train environments? Either way, I still need a version of vllm installed.
- In a separate env, using the command below, I tried different annotator engines (gpt-4o and gpt-4-turbo) and got numbers that don't make sense. Have you ever tried different annotators? With gpt-4o I get a result where DPO > SimPO, while gpt-4-turbo gives the opposite.

```shell
alpaca_eval evaluate_from_model \
  --model_configs /mnt/vepfs/fs_users/***/xAI-RLHF/***/SimPO/eval/alpacaeval2/configs/Llama-3-Base-8B-SFT-SimPO.yaml \
  --annotators_config weighted_alpaca_eval_gpt4_turbo
```