Description
When I run

python inference.py mypdb.fasta data/pdb_mmcif/mmcif_files/ \
    --use_precomputed_alignments ./alignments \
    --output_dir ./ \
    --gpus 4 \
    --model_preset multimer \
    --uniref90_database_path data/uniref90/uniref90.fasta \
    --mgnify_database_path data/mgnify/mgy_clusters_2022_05.fa \
    --pdb70_database_path data/pdb70/pdb70 \
    --uniref30_database_path data/uniref30/UniRef30_2021_03 \
    --bfd_database_path data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --uniprot_database_path data/uniprot/uniprot.fasta \
    --pdb_seqres_database_path data/pdb_seqres/pdb_seqres.txt \
    --param_path data/params/params_model_1_multimer_v3.npz \
    --model_name model_1_multimer_v3 \
    --jackhmmer_binary_path $(which jackhmmer) \
    --hhblits_binary_path $(which hhblits) \
    --hhsearch_binary_path $(which hhsearch) \
    --kalign_binary_path $(which kalign) \
    --enable_workflow \
    --inplace
it fails with the following output:

running in multimer mode...
[01/26/24 20:11:10] INFO colossalai - colossalai - INFO: /home/khuang/anaconda3/envs/fastfold/lib/python3.8/site-packages/colossalai/context/parallel_context.py:521 set_device
INFO colossalai - colossalai - INFO: process rank 0 is bound to device 0
[01/26/24 20:11:10] INFO colossalai - colossalai - INFO: /home/khuang/anaconda3/envs/fastfold/lib/python3.8/site-packages/colossalai/context/parallel_context.py:521 set_device
[01/26/24 20:11:10] INFO colossalai - colossalai - INFO: /home/khuang/anaconda3/envs/fastfold/lib/python3.8/site-packages/colossalai/context/parallel_context.py:521 set_device
INFO colossalai - colossalai - INFO: process rank 3 is bound to device 3
[01/26/24 20:11:10] INFO colossalai - colossalai - INFO: /home/khuang/anaconda3/envs/fastfold/lib/python3.8/site-packages/colossalai/context/parallel_context.py:521 set_device
INFO colossalai - colossalai - INFO: process rank 2 is bound to device 2
INFO colossalai - colossalai - INFO: process rank 1 is bound to device 1
[01/26/24 20:11:12] INFO colossalai - colossalai - INFO: /home/khuang/anaconda3/envs/fastfold/lib/python3.8/site-packages/colossalai/context/parallel_context.py:557 set_seed
[01/26/24 20:11:12] INFO colossalai - colossalai - INFO: /home/khuang/anaconda3/envs/fastfold/lib/python3.8/site-packages/colossalai/context/parallel_context.py:557 set_seed
[01/26/24 20:11:12] INFO colossalai - colossalai - INFO: /home/khuang/anaconda3/envs/fastfold/lib/python3.8/site-packages/colossalai/context/parallel_context.py:557 set_seed
INFO colossalai - colossalai - INFO: initialized seed on rank 2, numpy: 1024, python random: 1024, ParallelMode.DATA: 1024, ParallelMode.TENSOR: 1026, the default parallel seed is ParallelMode.DATA.
INFO colossalai - colossalai - INFO: initialized seed on rank 3, numpy: 1024, python random: 1024, ParallelMode.DATA: 1024, ParallelMode.TENSOR: 1027, the default parallel seed is ParallelMode.DATA.
INFO colossalai - colossalai - INFO: initialized seed on rank 1, numpy: 1024, python random: 1024, ParallelMode.DATA: 1024, ParallelMode.TENSOR: 1025, the default parallel seed is ParallelMode.DATA.
[01/26/24 20:11:12] INFO colossalai - colossalai - INFO: /home/khuang/anaconda3/envs/fastfold/lib/python3.8/site-packages/colossalai/context/parallel_context.py:557 set_seed
INFO colossalai - colossalai - INFO: initialized seed on rank 0, numpy: 1024, python random: 1024, ParallelMode.DATA: 1024, ParallelMode.TENSOR: 1024, the default parallel seed is ParallelMode.DATA.
INFO colossalai - colossalai - INFO: /home/khuang/anaconda3/envs/fastfold/lib/python3.8/site-packages/colossalai/initialize.py:116 launch
INFO colossalai - colossalai - INFO: Distributed environment is initialized, data parallel size: 1, pipeline parallel size: 1, tensor parallel size: 4
Traceback (most recent call last):
File "inference.py", line 556, in
main(args)
File "inference.py", line 164, in main
inference_multimer_model(args)
File "inference.py", line 293, in inference_multimer_model
torch.multiprocessing.spawn(inference_model, nprocs=args.gpus, args=(args.gpus, result_q, batch, args))
File "/home/khuang/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/khuang/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
while not context.join():
File "/home/khuang/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 160, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/home/khuang/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
fn(i, *args)
File "/home/khuang/video/FastFold-main/inference.py", line 151, in inference_model
out = model(batch)
File "/home/khuang/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/khuang/video/FastFold-main/fastfold/model/hub/alphafold.py", line 522, in forward
outputs, m_1_prev, z_prev, x_prev = self.iteration(
File "/home/khuang/video/FastFold-main/fastfold/model/hub/alphafold.py", line 209, in iteration
else self.input_embedder(feats)
File "/home/khuang/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/khuang/video/FastFold-main/fastfold/model/nn/embedders_multimer.py", line 141, in forward
tf_emb_i = self.linear_tf_z_i(tf)
File "/home/khuang/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/khuang/anaconda3/envs/fastfold/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)
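Since the traceback shows the cublasSgemm call coming from F.linear inside the multimer input embedder (linear_tf_z_i), one way to narrow this down might be to check whether a plain fp32 F.linear works at all on each of the four GPUs in the same environment. Below is a minimal sketch of such a check; the tensor sizes are placeholder assumptions, not the real dimensions of linear_tf_z_i. If it passes on every device, the issue is more likely in the tensors reaching the embedder (shape/dtype) or a CUDA/PyTorch mismatch rather than in the GPUs themselves.

```python
# Standalone sanity check (not part of FastFold): run the same kind of
# fp32 matmul that F.linear dispatches to cuBLAS, on every visible GPU.
import torch
import torch.nn.functional as F

def check_gpu_linear():
    print("torch:", torch.__version__, "cuda:", torch.version.cuda)
    for i in range(torch.cuda.device_count()):
        dev = torch.device(f"cuda:{i}")
        x = torch.randn(256, 21, device=dev)   # placeholder input shape
        w = torch.randn(128, 21, device=dev)   # placeholder weight shape
        b = torch.zeros(128, device=dev)
        y = F.linear(x, w, b)                  # same call that fails in the traceback
        torch.cuda.synchronize(dev)
        print(f"cuda:{i} OK, output shape {tuple(y.shape)}")

if __name__ == "__main__":
    check_gpu_linear()
```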
Has anyone run into this before?