Hi, I wonder if it would be possible to add a deduplication step before calculating MSAs in ColabFold. When generating MSAs for a large batch of alphafold2-multimer-v3 analyses, many proteins recur across different protein:protein pairs, and the MSA of each common protein is recomputed every time it appears. For example, if I duplicate proteinA:proteinB 1000 times, ColabFold will use MMseqs2 to calculate the MSA of proteinA 1000 times and the MSA of proteinB 1000 times, even though computing each MSA once would suffice. A rough sketch of the idea is below.
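Something like this is what I have in mind, as a minimal sketch only: `compute_msa` is a hypothetical stand-in for whatever per-chain MMseqs2 call ColabFold actually makes, and the sequences are dummies.

```python
def compute_msa(sequence: str) -> str:
    # Hypothetical stand-in for the per-chain MMseqs2 call ColabFold makes
    # today; in reality this would return the chain's a3m text.
    return f">query\n{sequence}\n"

def msas_for_pairs(pairs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Compute each unique chain's MSA once and reuse it for every pair."""
    cache: dict[str, str] = {}
    for a, b in pairs:
        for seq in (a, b):
            if seq not in cache:  # first sighting: compute; afterwards: reuse
                cache[seq] = compute_msa(seq)
    return [(cache[a], cache[b]) for a, b in pairs]

# proteinA:proteinB duplicated 1000 times now costs 2 MSA computations, not 2000.
pairs = [("MKTAYIAKQR", "MKVLWAALLG")] * 1000
msas = msas_for_pairs(pairs)
```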
Additionally, when using MMseqs2 with multiple GPUs, I noticed that the precomputed index is loaded and split across the GPUs. For an 80 GB A100 or H100, the entire database fits comfortably in a single GPU's memory, so I wonder if it would be possible to adjust how the database is distributed based on available GPU memory. For example, could each A100/H100 hold a full copy of the database to reduce inter-GPU communication, especially when the GPUs are not connected via NVLink? A rough sketch of the workaround I have in mind follows. Thanks!
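One way to approximate this today might be to shard the queries instead of the database: pin one search process per GPU via `CUDA_VISIBLE_DEVICES`, each loading the full database, so no cross-GPU traffic is needed at all. This is just a sketch under assumptions I have not verified: the shard names `query_shard0..3` and the target name `colabfold_envdb` are illustrative, and I am assuming the queries were pre-split (e.g. with `mmseqs createsubdb`) and the target is a GPU-compatible database.

```python
import os
import subprocess

def search_on_gpu(gpu_id: int, query_db: str, target_db: str) -> subprocess.Popen:
    """Launch one mmseqs search pinned to a single GPU holding the full DB."""
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)  # this process sees only one GPU
    return subprocess.Popen(
        ["mmseqs", "search", query_db, target_db,
         f"result_gpu{gpu_id}", f"tmp_gpu{gpu_id}", "--gpu", "1"],
        env=env,
    )

# Assumed: queries pre-split into query_shard0..query_shard3; each shard
# searches the *whole* database on its own GPU, trading memory for locality.
procs = [search_on_gpu(i, f"query_shard{i}", "colabfold_envdb") for i in range(4)]
for p in procs:
    p.wait()
```

The trade-off is deliberate: full replication uses more total GPU memory than splitting the index, but when the database fits on one card it removes the inter-GPU communication entirely, which seems especially attractive without NVLink.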