Dear authors,
Thank you very much for sharing the ProtTrans models and for all your work on this project.
I’m trying to better understand the exact datasets used to train ProtBert and ProtBert-BFD, specifically the releases of the protein databases involved:
- For ProtBert, which UniRef100 release was used?
- For ProtBert-BFD, which release of the BFD database was used?
 
Thanks again, and looking forward to your response!
Best regards,
Alberto