Having issues retraining the model with the existing data provided #6

@mujeebarshad

I am getting the following error when retraining on the USPTO data provided at this link: https://drive.google.com/drive/folders/1lZOLRGyZy18EVow7gyxtKWvs_yuwlIE3?usp=sharing

The atom-count error shouldn't appear at all, since this is the same data the model was already trained on. Any thoughts?

[rank0]: Traceback (most recent call last):
[rank0]:   File "/usr/local/bin/unicore-train", line 8, in <module>
[rank0]:     sys.exit(cli_main())
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/unicore_cli/train.py", line 418, in cli_main
[rank0]:     distributed_utils.call_main(args, main)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/unicore/distributed/utils.py", line 186, in call_main
[rank0]:     distributed_main(int(os.environ["LOCAL_RANK"]), main, args, kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/unicore/distributed/utils.py", line 160, in distributed_main
[rank0]:     main(args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/unicore_cli/train.py", line 105, in main
[rank0]:     extra_state, epoch_itr = checkpoint_utils.load_checkpoint(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/unicore/checkpoint_utils.py", line 223, in load_checkpoint
[rank0]:     extra_state, epoch_itr = trainer.load_checkpoint(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/unicore/trainer.py", line 433, in load_checkpoint
[rank0]:     epoch_itr = self.get_train_iterator(
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/unicore/trainer.py", line 508, in get_train_iterator
[rank0]:     self.reset_dummy_batch(batch_iterator.first_batch)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/unicore/data/iterators.py", line 243, in first_batch
[rank0]:     return self.collate_fn([self.dataset[i] for i in self.frozen_batches[0]])
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/unicore/data/iterators.py", line 243, in <listcomp>
[rank0]:     return self.collate_fn([self.dataset[i] for i in self.frozen_batches[0]])
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/unicore/data/base_wrapper_dataset.py", line 18, in __getitem__
[rank0]:     return self.dataset[index]
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/unicore/data/nested_dictionary_dataset.py", line 69, in __getitem__
[rank0]:     return OrderedDict((k, ds[index]) for k, ds in self.defn.items())
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/unicore/data/nested_dictionary_dataset.py", line 69, in <genexpr>
[rank0]:     return OrderedDict((k, ds[index]) for k, ds in self.defn.items())
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/unimol/data/molecule_dataset.py", line 116, in __getitem__
[rank0]:     return self.__getitem_cached__(self.epoch, idx)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/unimol/data/molecule_dataset.py", line 121, in __getitem_cached__
[rank0]:     data = self.dataset[idx]
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/unimol/data/key_dataset.py", line 27, in __getitem__
[rank0]:     return self.__cached_item__(idx, self.epoch)
[rank0]:   File "/usr/local/lib/python3.10/dist-packages/unimol/data/key_dataset.py", line 24, in __cached_item__
[rank0]:     return self.dataset[idx][self.key]
[rank0]:   File "/content/NAG2G/NAG2G/data/graphormer_dataset.py", line 132, in __getitem__
[rank0]:     return self.__getitem_cached__(self.epoch, index)
[rank0]:   File "/content/NAG2G/NAG2G/data/graphormer_dataset.py", line 137, in __getitem_cached__
[rank0]:     reactant = self.reactant_dataset[index]
[rank0]:   File "/content/NAG2G/NAG2G/data/graphormer_dataset.py", line 67, in __getitem__
[rank0]:     return self.__getitem_cached__(self.epoch, index)
[rank0]:   File "/content/NAG2G/NAG2G/data/graphormer_dataset.py", line 71, in __getitem_cached__
[rank0]:     smiles = self.dataset[index]
[rank0]:   File "/content/NAG2G/NAG2G/data/random_smiles_dataset.py", line 75, in __getitem__
[rank0]:     return self.__getitem_cached__(self.epoch, index)
[rank0]:   File "/content/NAG2G/NAG2G/data/random_smiles_dataset.py", line 85, in __getitem_cached__
[rank0]:     nm = Chem.RenumberAtoms(reactant_mol, list_reactant)
[rank0]: ValueError: atomCounts shorter than the number of atoms
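
As far as I can tell, RDKit raises this ValueError when the order list passed to Chem.RenumberAtoms has fewer entries than the molecule has atoms. A minimal sketch for narrowing down which sample triggers it is below. The helper name explain_order_mismatch is hypothetical (it is not part of NAG2G or RDKit), and it works on plain integers so it can be run without RDKit installed.

```python
from typing import List, Optional


def explain_order_mismatch(num_atoms: int, new_order: List[int]) -> Optional[str]:
    """Return a description of why new_order is not a valid atom ordering.

    Hypothetical debugging helper, not part of NAG2G or RDKit.
    Returns None when new_order is a permutation of 0..num_atoms-1.
    """
    if len(new_order) != num_atoms:
        return (
            f"order has {len(new_order)} entries "
            f"but the molecule has {num_atoms} atoms"
        )
    if sorted(new_order) != list(range(num_atoms)):
        return "order is not a permutation of 0..num_atoms-1"
    return None


if __name__ == "__main__":
    # A 3-atom molecule (e.g. ethanol without explicit hydrogens).
    print(explain_order_mismatch(3, [2, 1, 0]))  # valid permutation
    print(explain_order_mismatch(3, [1, 0]))     # too short
    print(explain_order_mismatch(3, [0, 0, 1]))  # repeated index
```

Assuming the variable names shown in the traceback, the check could be called as explain_order_mismatch(reactant_mol.GetNumAtoms(), list_reactant) just before the Chem.RenumberAtoms call in random_smiles_dataset.py, logging the index and SMILES whenever it returns a message.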
