
Training script #51


Open
nviolante25 wants to merge 23 commits into main
Conversation

nviolante25

From #42

@kvuong2711

kvuong2711 commented Apr 21, 2025

Hi @nviolante25,

Thanks for the PR. I'm trying to test it out, but I keep running into float16/float32 dtype issues that make the attention function crash. I'm running python main.py --base configs/example_training/seva-clipl_dl3dv.yaml. I'm also happy to contribute to the PR if you have some clues on where to potentially fix this.

@nviolante25
Author

Hi @kvuong2711,
I'm having the same issue; the only workaround so far has been to disable the with sdpa_kernel(SDPBackend.FLASH_ATTENTION) context. I'm not sure what is causing it to fail in the first place. I thought it could be the torch version, but unfortunately changing it didn't help.
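For reference, a minimal sketch of the kind of workaround I mean, assuming the crash comes from query/key/value reaching scaled_dot_product_attention in mixed float16/float32 dtypes under autocast (the helper name, the cast, and the backend list are illustrative, not the actual code in this PR):

```python
import torch.nn.functional as F
from torch.nn.attention import sdpa_kernel, SDPBackend

def attention(q, k, v):
    # Illustrative only: cast key/value to the query dtype so the fused
    # kernels never see mixed float16/float32 inputs.
    k = k.to(q.dtype)
    v = v.to(q.dtype)
    # Allow fallback backends instead of forcing FLASH_ATTENTION alone,
    # so PyTorch can pick a kernel that supports the given dtypes.
    with sdpa_kernel([SDPBackend.FLASH_ATTENTION,
                      SDPBackend.EFFICIENT_ATTENTION,
                      SDPBackend.MATH]):
        return F.scaled_dot_product_attention(q, k, v)
```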

@jensenz-sai
Contributor

jensenz-sai commented Apr 25, 2025

@kvuong2711 @nviolante25,

Yup, it's a bit tricky to set up flash attention for mixed-precision training. You could replace it with the xformers attention implemented in the generative-models codebase.
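A minimal sketch of what that swap could look like, assuming inputs of shape (b, n, heads * dim_head) and the public xformers.ops.memory_efficient_attention API (the function name and reshapes below are illustrative, not the exact generative-models code):

```python
from einops import rearrange
import xformers.ops as xops

def xformers_attention(q, k, v, heads):
    # Illustrative only: flatten heads into the batch dimension,
    # (b, n, heads * dim_head) -> (b * heads, n, dim_head), a layout
    # that xformers' memory-efficient attention accepts.
    q, k, v = map(lambda t: rearrange(t, "b n (h d) -> (b h) n d", h=heads), (q, k, v))
    out = xops.memory_efficient_attention(q, k, v, attn_bias=None)
    # Restore the original (b, n, heads * dim_head) layout.
    return rearrange(out, "(b h) n d -> b n (h d)", h=heads)
```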
