Vulkan crash on AMD RDNA1 (RX 5500 XT) during buffer initialization #3611

## Summary
whisper.cpp crashes with `VK_ERROR_DEVICE_LOST` on an AMD RX 5500 XT (RDNA1/Navi14) when using the Vulkan backend. The crash occurs during KV cache initialization, before any inference runs.

## Environment
- **GPU:** AMD Radeon RX 5500 XT (gfx1012, NAVI14, 8GB VRAM)
- **Driver:** RADV (Mesa 24.2.8-1ubuntu1~24.04.1)
- **Vulkan:** 1.3.289
- **OS:** Linux Mint 22.1 (Ubuntu 24.04 based), kernel 6.8.0
- **whisper.cpp:** Latest master (commit f53dc748)

## Steps to Reproduce
```bash
git clone https://github.com/ggml-org/whisper.cpp
cd whisper.cpp
cmake -B build -DGGML_VULKAN=1
cmake --build build -j
./models/download-ggml-model.sh base.en
./build/bin/whisper-cli -m models/ggml-base.en.bin -f samples/jfk.wav
```

## Expected Behavior
Whisper transcribes the audio using the Vulkan GPU backend.

## Actual Behavior
The process aborts during startup with:
```
radv/amdgpu: The CS has been rejected, see dmesg for more information (-22).
terminate called after throwing an instance of 'vk::DeviceLostError'
  what():  vk::Queue::submit: ErrorDeviceLost
```

## Stack Trace
```
#6  ggml_vk_submit(std::shared_ptr<vk_context_struct>&, vk::Fence)
#7  ggml_vk_buffer_memset(std::shared_ptr<vk_buffer_struct>&, unsigned long, unsigned int, unsigned long)
#8  whisper_kv_cache_init(whisper_kv_cache&, ggml_backend*, ggml_type, long, long, int)
#9  whisper_init_state()
#10 whisper_init_from_file_with_params()
```

## Root Cause Analysis

The crash occurs in `ggml_vk_buffer_memset()` at `ggml-vulkan.cpp:6588`:
```cpp
subctx->s->buffer.fillBuffer(dst->buffer, offset, size, c);
```

This function records `vkCmdFillBuffer` into a command buffer from the **transfer queue** pool (`dst->device->transfer_queue.cmd_pool`) and submits it there. On RDNA1, the transfer queue (SDMA engine) appears to reject the resulting command submission (the "CS" in the RADV message) with EINVAL (-22).

### Key observations:
1. The GPU is correctly detected as RDNA1: `ggml_vulkan: 0 = Radeon RX 5500 XT (RADV NAVI14)`
2. The crash happens before any compute shaders run
3. Error code -22 (EINVAL) suggests invalid parameters or an unsupported operation
4. CPU mode works fine with `--no-gpu`
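
To make observation 3 concrete, here is a minimal sketch of how the `-22` in the RADV message maps to `EINVAL`. It assumes a Linux host, where kernel interfaces report failures as negated errno values. The helper name `describe_kernel_error` is hypothetical and is not part of whisper.cpp or Mesa.

```cpp
#include <cerrno>   // EINVAL
#include <cstring>  // std::strerror
#include <string>

// Hypothetical helper: turn a negated errno value, such as the -22 printed
// by "radv/amdgpu: The CS has been rejected ... (-22)", into readable text.
std::string describe_kernel_error(int negated_errno) {
    const int err = -negated_errno;        // -22 becomes 22
    std::string text = std::strerror(err); // libc description of the code
    if (err == EINVAL) {
        text += " (EINVAL)";               // tag the case seen in this report
    }
    return text;
}
```

Calling `describe_kernel_error(-22)` returns the libc description of errno 22 with ` (EINVAL)` appended. The `dmesg` output mentioned in the RADV message would be needed to see which part of the submission the kernel objected to.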

## Attempted Workarounds (all failed)
- `--no-flash-attn` - still crashes
- `GGML_VK_DISABLE_ASYNC=1` - still crashes
- `GGML_VK_PREFER_HOST_MEMORY=1` - still crashes (the device is not UMA, so the GPU path is still used)
- `VK_INSTANCE_LAYERS=VK_LAYER_KHRONOS_validation` - still crashes
- `RADV_PERFTEST=transfer_queue=0` - still crashes

## Potential Fix

The issue may be that `fillBuffer` on the dedicated transfer queue doesn't work correctly on RDNA1. Possible solutions:

1. **Use compute queue for fillBuffer on RDNA1:**
   Modify `ggml_vk_buffer_memset()` to use `compute_queue` instead of `transfer_queue` when `device->architecture == vk_device_architecture::AMD_RDNA1`.

2. **Add CPU fallback for non-UMA discrete GPUs:**
   The current code only uses a CPU memset when `eHostVisible && uma`. For discrete GPUs, it could allocate a host-visible staging buffer, fill it with a CPU memset, and copy it into the device buffer.

3. **Use vkCmdUpdateBuffer instead:**
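   For small buffers (the Vulkan spec limits `vkCmdUpdateBuffer` to 65536 bytes per call), `vkCmdUpdateBuffer` might be accepted on the transfer queue where `vkCmdFillBuffer` is rejected.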
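
The decision logic behind the three options can be sketched roughly as follows. This is an illustration only: `Arch`, `Queue`, and all three helper functions are hypothetical stand-ins, not the actual ggml-vulkan types or API, and whether any of these paths avoids the crash on RDNA1 is untested. The only external facts used are the `vkCmdUpdateBuffer` limits from the Vulkan specification (at most 65536 bytes, with offset and size multiples of 4).

```cpp
#include <cstdint>

// Hypothetical stand-ins for illustration; not the ggml-vulkan definitions.
enum class Arch { AMD_RDNA1, Other };
enum class Queue { Transfer, Compute };

// Option 1: route the fill to the compute queue on RDNA1 only,
// leaving every other architecture on the transfer queue as today.
Queue pick_fill_queue(Arch arch) {
    return arch == Arch::AMD_RDNA1 ? Queue::Compute : Queue::Transfer;
}

// Option 2 starts from the existing condition described in the issue:
// a direct CPU memset is only possible for host-visible memory on UMA.
// When this returns false, a staging buffer plus copy would be needed.
bool can_host_memset(bool host_visible, bool uma) {
    return host_visible && uma;
}

// Option 3: vkCmdUpdateBuffer accepts at most 65536 bytes per call and
// requires the destination offset and the size to be multiples of 4.
constexpr std::uint64_t kMaxUpdateBufferBytes = 65536;

bool update_buffer_eligible(std::uint64_t offset, std::uint64_t size) {
    return size > 0 && size <= kMaxUpdateBufferBytes &&
           size % 4 == 0 && offset % 4 == 0;
}
```

Keeping the RDNA1 check isolated in one function like `pick_fill_queue` would limit the behavior change to the affected architecture. The eligibility check shows why option 3 cannot cover large KV cache buffers on its own: anything above 65536 bytes would have to be split into multiple calls or handled by another path.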

## Related Issues
- Similar RADV crashes on RDNA1: https://github.com/felipeagc/amd_repro
- whisper.cpp AMD issues: https://github.com/ggml-org/whisper.cpp/issues/2828
- Vulkan broken for AMD in v1.8.0+: https://github.com/ggml-org/whisper.cpp/issues/3455

## System Info
```
$ vulkaninfo --summary | grep -A5 "GPU0"
GPU0:
    apiVersion         = 1.3.289
    driverVersion      = 24.2.8
    vendorID           = 0x1002
    deviceID           = 0x7340
    deviceType         = PHYSICAL_DEVICE_TYPE_DISCRETE_GPU
    deviceName         = Radeon RX 5500 XT (RADV NAVI14)
```

## Note
- This appears to be specific to the SDMA/transfer queue on RDNA1

