Commit 85f2dd4

Use the latest image.
1 parent b374b42 commit 85f2dd4

6 files changed

Lines changed: 39 additions & 21 deletions

.env

Lines changed: 8 additions & 2 deletions
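@@ -45,11 +45,17 @@ OPERATION_MODE=auto
 # Temperature check interval in auto mode (seconds)
 CHECK_INTERVAL=60
 
+
 # =============================================================================
 # GPU temperature monitoring settings (optional)
 # =============================================================================
 # Whether to enable GPU temperature monitoring
-# When set to true, use decision temperature = max(disk temperature, GPU temperature - 20)
+# When set to true, use decision temperature = max(disk temperature, GPU temperature - GPU_TEMP_OFFSET)
 # When set to false, use the disk temperature only
 # Note: GPU monitoring requires running the container with --gpus all
-WITH_GPU_TEMP=false
+WITH_GPU_TEMP=false
+
+# GPU temperature offset (degrees Celsius)
+# Adjusts how much the GPU temperature influences fan control
+# Default is 15°C: this value is subtracted from the GPU temperature before it is compared with the disk temperature
+GPU_TEMP_OFFSET=15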
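The fan-control script in this commit consumes the value as `$((gpu_temp - GPU_TEMP_OFFSET))`, and bash arithmetic only handles integers, so a value such as `15.5` would make that expansion fail. The commit itself does not validate the value. A minimal sketch of a guard, assuming the offset should be a whole, non-negative number of degrees and that falling back to the documented default of 15 is acceptable; `validate_gpu_temp_offset` is a hypothetical name:

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of this commit: normalise GPU_TEMP_OFFSET
# before it reaches integer arithmetic such as $((gpu_temp - GPU_TEMP_OFFSET)).
validate_gpu_temp_offset() {
    local value="${1:-15}"   # empty or missing argument -> documented default of 15

    if [[ "$value" =~ ^[0-9]+$ ]]; then
        # Force base 10 so a value like "08" is not rejected as an invalid octal number.
        echo "$((10#$value))"
    else
        echo "Warning: GPU_TEMP_OFFSET='${value}' is not a whole number; using 15" >&2
        echo 15
    fi
}
```

It could be applied right after the defaulting line in the script, for example `GPU_TEMP_OFFSET=$(validate_gpu_temp_offset "${GPU_TEMP_OFFSET:-}")`.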

.env.example

Lines changed: 10 additions & 2 deletions
@@ -41,14 +41,22 @@ CHECK_INTERVAL=60
 # ===== GPU temperature monitoring settings / GPU Temperature Monitoring =====
 # GPU temperature control switch
 # Set to 'true' to enable GPU temperature monitoring for the decision temperature calculation
-# Decision temperature = max(disk temperature, GPU temperature - 20°C)
+# Decision temperature = max(disk temperature, GPU temperature - GPU_TEMP_OFFSET)
 # Default: false (disk temperature only)
 # GPU temperature control flag
 # Set to 'true' to enable GPU temperature monitoring for decision temperature calculation
-# Decision Temperature = max(Disk Temperature, GPU Temperature - 20°C)
+# Decision Temperature = max(Disk Temperature, GPU Temperature - GPU_TEMP_OFFSET)
 # Default: false (only use disk temperature)
 WITH_GPU_TEMP=false
 
+# GPU temperature offset (degrees Celsius)
+# Adjusts how much the GPU temperature influences fan control
+# Default is 15°C: this value is subtracted from the GPU temperature before it is compared with the disk temperature
+# GPU temperature offset (Celsius)
+# Used to adjust GPU temperature's impact on fan control
+# Default is 15°C, meaning GPU temperature minus this value will be compared with disk temperature
+GPU_TEMP_OFFSET=15
+
 # ===== ESXi host settings / ESXi Host Configuration =====
 # ESXi host connection settings - used to read disk temperatures
 # ESXi host connection settings - Used for retrieving disk temperature

Dockerfile

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 # Dockerfile for iDRAC Fan Speed Control with GPU Support
-# Based on NVIDIA's Ubuntu base image to support GPU temperature monitoring
+# Based on NVIDIA's Ubuntu 24.04 base image to support GPU temperature monitoring
 # If GPU support is not needed, alpine:latest can be used instead
-FROM nvidia/cuda:11.8-base-ubuntu22.04
+FROM nvidia/cuda:12.9.0-runtime-ubuntu24.04
 
 # Set maintainer information
 LABEL maintainer="iDRAC Fan Control Service"

README.md

Lines changed: 11 additions & 8 deletions
@@ -12,7 +12,7 @@
 - 🎮 **GPU Temperature Support**: Optional GPU temperature monitoring with NVIDIA Container Toolkit
 - ⚙️ **Environment Variable Control**: Complete configuration through `.env` files
 - 🐳 **Docker Deployment**: One-click deployment with no manual environment setup
-- 🔥 **Decision Temperature Algorithm**: `max(disk_temp, gpu_temp - 20°C)` ensures optimal cooling
+- 🔥 **Decision Temperature Algorithm**: `max(disk_temp, gpu_temp - GPU_TEMP_OFFSET)` ensures optimal cooling
 - 📊 **Configurable Thresholds**: Customize temperature and fan speed settings
 - 🔄 **Auto/Manual Modes**: Flexible operation modes for different use cases
 
@@ -105,22 +105,24 @@ CHECK_INTERVAL=60 # Check interval in seconds (auto mode only)
 When GPU temperature monitoring is enabled, the system uses this algorithm to calculate the decision temperature:
 
 ```text
-Decision Temperature = max(Disk Temperature, GPU Temperature - 20°C)
+Decision Temperature = max(Disk Temperature, GPU Temperature - GPU_TEMP_OFFSET)
 ```
 
 This algorithm ensures:
 
 - **Proactive GPU cooling**: High GPU temperatures trigger increased fan speeds
-- **Temperature offset compensation**: Accounts for thermal differences between GPU and system
+- **Configurable offset compensation**: Adjustable offset (`GPU_TEMP_OFFSET`) accounts for thermal differences between GPU and system
 - **Disk temperature baseline**: Disk temperature always serves as the minimum baseline
 
 ### Algorithm Examples
 
-| Disk Temp | GPU Temp | GPU-20 | Decision Temp | Reasoning |
+With default `GPU_TEMP_OFFSET=15°C`:
+
+| Disk Temp | GPU Temp | GPU-15 | Decision Temp | Reasoning |
 |-----------|----------|--------|---------------|-----------|
-| 65°C | 70°C | 50°C | **65°C** | Disk temperature is higher |
-| 65°C | 90°C | 70°C | **70°C** | GPU-20 is higher, use adjusted GPU temp |
-| 75°C | 80°C | 60°C | **75°C** | Disk temperature remains baseline |
+| 65°C | 70°C | 55°C | **65°C** | Disk temperature is higher |
+| 65°C | 85°C | 70°C | **70°C** | GPU-15 is higher, use adjusted GPU temp |
+| 75°C | 80°C | 65°C | **75°C** | Disk temperature remains baseline |
 
 ## 🔧 Troubleshooting
 
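The example rows can be reproduced with a small pure function. This is a sketch, not the script's own `get_decision_temp` (which reads live values through `get_drive_temp` and `get_nvidia_temp`): here the disk temperature, GPU temperature, and offset are passed in as arguments, and the name `calc_decision_temp` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical, side-effect-free version of the decision rule:
#   decision = max(disk_temp, gpu_temp - offset)
calc_decision_temp() {
    local disk_temp=$1
    local gpu_temp=$2
    local offset=${3:-15}   # same default as GPU_TEMP_OFFSET
    local gpu_adjusted=$((gpu_temp - offset))

    if (( gpu_adjusted > disk_temp )); then
        echo "$gpu_adjusted"
    else
        echo "$disk_temp"
    fi
}

# Reproduce the three example rows with the default offset of 15.
for row in "65 70" "65 85" "75 80"; do
    read -r disk gpu <<< "$row"
    echo "disk=${disk} gpu=${gpu} -> decision=$(calc_decision_temp "$disk" "$gpu")"
done
# Prints:
# disk=65 gpu=70 -> decision=65
# disk=65 gpu=85 -> decision=70
# disk=75 gpu=80 -> decision=75
```

Raising the offset makes the GPU count for less: with an offset of 20, the second row's 85°C GPU reading adjusts to 65°C and the decision temperature stays at the disk's 65°C.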
@@ -133,7 +135,7 @@ This algorithm ensures:
 nvidia-smi
 
 # Verify Docker GPU support
-docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
+docker run --rm --gpus all nvidia/cuda:12.9.0-runtime-ubuntu24.04 nvidia-smi
 ```
 
 #### Cannot Connect to ESXi Host
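Beyond running `nvidia-smi` by hand, a small check can tell whether `WITH_GPU_TEMP=true` is worth enabling on a given host or container. The function below is a hypothetical sketch, not part of the project. It assumes that `nvidia-smi` must be on the `PATH` and must return a numeric first line for the query `--query-gpu=temperature.gpu --format=csv,noheader,nounits`; anything else prints `false`.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of this commit).
# Prints "true" when nvidia-smi exists and reports a numeric temperature,
# otherwise prints "false" and returns 1, so the output can serve as a WITH_GPU_TEMP value.
suggest_with_gpu_temp() {
    local reading

    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found; keep WITH_GPU_TEMP=false" >&2
        echo false
        return 1
    fi

    # nvidia-smi prints one line per GPU; only the first reading is checked here.
    reading=$(nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits 2>/dev/null | head -n 1)
    reading=${reading//[[:space:]]/}   # strip stray spaces or carriage returns

    if [[ "$reading" =~ ^[0-9]+$ ]]; then
        echo true
    else
        echo "nvidia-smi gave no usable temperature ('${reading}'); keep WITH_GPU_TEMP=false" >&2
        echo false
        return 1
    fi
}
```

Inside a container started with `--gpus all` and a working NVIDIA Container Toolkit, this would be expected to print `true`; on a host without the driver it prints `false`.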
@@ -184,6 +186,7 @@ docker exec idrac-fan-control tail -f /var/log/fan-control/fan_control.log
 | `OPERATION_MODE` | auto | Operation mode (auto/manual) |
 | `CHECK_INTERVAL` | 60 | Check interval in seconds |
 | `WITH_GPU_TEMP` | false | Enable GPU temperature monitoring |
+| `GPU_TEMP_OFFSET` | 15 | GPU temperature offset for decision algorithm (°C) |
 
 ## 🐳 Docker Compose Examples
 
USAGE_GUIDE.md

Lines changed: 1 addition & 1 deletion
@@ -307,7 +307,7 @@ ssh root@your_esxi_host
 
 ```bash
 # Verify NVIDIA Container Toolkit installation
-docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
+docker run --rm --gpus all nvidia/cuda:12.9.0-runtime-ubuntu24.04 nvidia-smi
 
 # Check if GPU is accessible in container
 docker exec idrac-fan-control nvidia-smi

src/FanControlWithEsxiSmart.sh

Lines changed: 7 additions & 6 deletions
@@ -34,6 +34,8 @@ CHECK_INTERVAL=${CHECK_INTERVAL:-60}
 # Defaults to false; GPU temperature monitoring is enabled only when the user explicitly sets WITH_GPU_TEMP=true
 WITH_GPU_TEMP=${WITH_GPU_TEMP:-"false"}
 
+GPU_TEMP_OFFSET=${GPU_TEMP_OFFSET:-15}
+
 # ESXi variables
 ESXI_HOST=${ESXI_HOST:-"REPLACE_TO_YOUR_ESXI_HOST"}
 ESXI_USERNAME=${ESXI_USERNAME:-"REPLACE_TO_YOUR_ESXI_USERNAME"}
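The new `GPU_TEMP_OFFSET=${GPU_TEMP_OFFSET:-15}` line relies on bash's `${parameter:-word}` expansion, which substitutes the default when the variable is unset or set to an empty string. The hypothetical function below isolates that behaviour, showing that a blank `GPU_TEMP_OFFSET=` line in `.env` still yields 15 rather than an empty offset.

```bash
#!/usr/bin/env bash
# Hypothetical demo of the fallback used by the script:
#   GPU_TEMP_OFFSET=${GPU_TEMP_OFFSET:-15}
# The ":-" form replaces both an unset and an empty variable with the default.
effective_gpu_temp_offset() {
    # Run in a subshell so the caller's environment is not modified.
    (
        if [[ $# -eq 0 ]]; then
            unset GPU_TEMP_OFFSET     # simulate the variable being absent
        else
            GPU_TEMP_OFFSET=$1        # simulate a value coming from .env (possibly empty)
        fi
        GPU_TEMP_OFFSET=${GPU_TEMP_OFFSET:-15}
        echo "$GPU_TEMP_OFFSET"
    )
}
```

Without the colon, `${GPU_TEMP_OFFSET-15}` would keep an empty value, and the later `$((gpu_temp - GPU_TEMP_OFFSET))` would then quietly treat the empty offset as 0.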
@@ -66,7 +68,7 @@ get_nvidia_temp() {
 }
 
 # Function to calculate decision temperature
-# Decision temperature = max(disk temperature, GPU temperature - 20)
+# Decision temperature = max(disk temperature, GPU temperature - GPU_TEMP_OFFSET)
 # This logic ensures the fan speed is driven by whichever temperature source is higher
 get_decision_temp() {
     local disk_temp=$(get_drive_temp)
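The body of `get_nvidia_temp` is not part of this diff. On a host with several GPUs, `nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits` prints one reading per line, so whatever feeds `get_decision_temp` has to reduce that to a single number. A sketch of one way to do it, assuming the hottest GPU should drive the fans; `max_gpu_temp` is a hypothetical helper, not the project's implementation:

```bash
#!/usr/bin/env bash
# Hypothetical sketch: reduce one-temperature-per-line input (as printed by
#   nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits)
# to the single highest value. Non-numeric lines (e.g. "[N/A]") are skipped.
max_gpu_temp() {
    local line max=""
    # The "|| [[ -n ... ]]" clause keeps a final line that lacks a trailing newline.
    while IFS= read -r line || [[ -n "$line" ]]; do
        line=${line//[[:space:]]/}           # drop stray spaces / carriage returns
        [[ "$line" =~ ^[0-9]+$ ]] || continue
        if [[ -z "$max" ]] || (( 10#$line > max )); then
            max=$((10#$line))
        fi
    done
    [[ -n "$max" ]] || return 1            # no usable reading at all
    echo "$max"
}

# Example wiring (requires an NVIDIA GPU and driver):
#   nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits | max_gpu_temp
```

Taking the maximum errs toward cooling; averaging would be a different policy and could hide a single hot card.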
@@ -75,7 +77,7 @@ get_decision_temp() {
     # Only take the GPU temperature into account when GPU temperature monitoring is enabled
     if [[ "$WITH_GPU_TEMP" == "true" ]]; then
         local gpu_temp=$(get_nvidia_temp)
-        local gpu_adjusted_temp=$((gpu_temp - 20))
+        local gpu_adjusted_temp=$((gpu_temp - GPU_TEMP_OFFSET))
 
         # Log the raw temperatures for debugging
         echo "$(date '+%Y-%m-%d %H:%M:%S') - Debug: Disk=${disk_temp}°C, GPU=${gpu_temp}°C, GPU_Adjusted=${gpu_adjusted_temp}°C" >&2
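The expression `$((gpu_temp - GPU_TEMP_OFFSET))` trusts `get_nvidia_temp` to print a plain integer. In bash arithmetic an empty `gpu_temp` is silently treated as 0, while a reading such as `N/A` makes the expansion fail with an arithmetic error. A defensive variant might look like the hypothetical helper below, which assumes that an unusable GPU reading should simply be ignored so the decision falls back to the disk temperature.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compute the offset-adjusted GPU temperature only when
# the raw reading is a plain integer; otherwise report failure so the caller
# can fall back to the disk temperature alone.
gpu_adjusted_temp() {
    local gpu_temp=$1
    local offset=${2:-15}   # mirrors GPU_TEMP_OFFSET's default

    if [[ ! "$gpu_temp" =~ ^[0-9]+$ ]]; then
        echo "Warning: ignoring GPU reading '${gpu_temp}'" >&2
        return 1
    fi
    echo "$((10#$gpu_temp - offset))"
}

# Possible use inside get_decision_temp (sketch only):
#   if gpu_adjusted=$(gpu_adjusted_temp "$gpu_temp" "$GPU_TEMP_OFFSET"); then
#       (( gpu_adjusted > decision_temp )) && decision_temp=$gpu_adjusted
#   fi
```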
@@ -85,7 +87,7 @@ get_decision_temp() {
             decision_temp=$gpu_adjusted_temp
         fi
 
-        echo "$(date '+%Y-%m-%d %H:%M:%S') - Decision temperature: ${decision_temp}°C (Disk: ${disk_temp}°C, GPU-20: ${gpu_adjusted_temp}°C)" >&2
+        echo "$(date '+%Y-%m-%d %H:%M:%S') - Decision temperature: ${decision_temp}°C (Disk: ${disk_temp}°C, GPU-${GPU_TEMP_OFFSET}: ${gpu_adjusted_temp}°C)" >&2
     else
         echo "$(date '+%Y-%m-%d %H:%M:%S') - Decision temperature: ${decision_temp}°C (Disk only mode)" >&2
     fi
@@ -159,10 +161,9 @@ manual_mode() {
 
 # Function to run in automatic mode
 auto_mode() {
-    echo "Automatic fan speed control mode"
-    if [[ "$WITH_GPU_TEMP" == "true" ]]; then
+    echo "Automatic fan speed control mode" if [[ "$WITH_GPU_TEMP" == "true" ]]; then
         echo "GPU temperature monitoring enabled - using decision temperature logic"
-        echo "Decision Temperature = max(Disk Temperature, GPU Temperature - 20°C)"
+        echo "Decision Temperature = max(Disk Temperature, GPU Temperature - ${GPU_TEMP_OFFSET}°C)"
     else
         echo "GPU temperature monitoring disabled - using disk temperature only"
     fi
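In the added line that begins `echo "Automatic fan speed control mode" if`, the `echo` and the `if` were joined onto one line. Bash then passes `if`, `[[`, and the rest of the test to `echo` as ordinary arguments and reaches a `then` with no matching `if`, so the `auto_mode` definition fails with a syntax error. A sketch of the intended opening with the line break restored; the name `auto_mode_banner` is hypothetical, and the monitoring loop that follows in the real function is outside this diff, so it is left out.

```bash
#!/usr/bin/env bash
# Sketch of the opening of auto_mode() with the line break restored.
# Only the banner is reproduced; the temperature loop of the real
# function is not shown in this commit and is intentionally omitted.
GPU_TEMP_OFFSET=${GPU_TEMP_OFFSET:-15}
WITH_GPU_TEMP=${WITH_GPU_TEMP:-"false"}

auto_mode_banner() {
    echo "Automatic fan speed control mode"
    if [[ "$WITH_GPU_TEMP" == "true" ]]; then
        echo "GPU temperature monitoring enabled - using decision temperature logic"
        echo "Decision Temperature = max(Disk Temperature, GPU Temperature - ${GPU_TEMP_OFFSET}°C)"
    else
        echo "GPU temperature monitoring disabled - using disk temperature only"
    fi
}
```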
