105 changes: 105 additions & 0 deletions scripts/enumerate_test_intranode.sh
@@ -0,0 +1,105 @@
#!/bin/bash

SKN_PWD=""

# Default values
SKIP_BUILD=false

TEMP=$(getopt -o sw:t:h --long skip-build -n "$0" -- "$@")
Contributor

high

The getopt string sw:t:h declares the options w:, t:, and h, none of which is handled in the case statement below. If a user passes any of them, getopt accepts the option, the case statement falls through to its default branch, and the script fails with an "Invalid option" error. Please remove the unhandled options from the getopt string or add handlers for them.

Suggested change
- TEMP=$(getopt -o sw:t:h --long skip-build -n "$0" -- "$@")
+ TEMP=$(getopt -o s --long skip-build -n "$0" -- "$@")
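
For reference, a minimal sketch of option parsing in which the getopt string and the case statement agree, and in which the help function is defined before it is called. It assumes the enhanced getopt from util-linux (the same tool the script already uses); parse_args is an illustrative wrapper name, not something in the PR.

```shell
#!/bin/bash
# Sketch only: parse_args is a hypothetical wrapper around the parsing logic.

# Defined before the parsing loop so the default branch can call it.
show_help() {
    echo "Usage: $0 [-s|--skip-build]"
}

parse_args() {
    SKIP_BUILD=false
    local parsed
    # Declare only the options that the case statement below handles.
    parsed=$(getopt -o s --long skip-build -n "$0" -- "$@") || return 1
    eval set -- "$parsed"
    while true; do
        case "$1" in
            -s|--skip-build) SKIP_BUILD=true; shift ;;
            --) shift; break ;;
            *) echo "Invalid option: $1" >&2; show_help; return 1 ;;
        esac
    done
}

parse_args "$@" || exit 1
echo "SKIP_BUILD=$SKIP_BUILD"
```

With this layout an unknown option such as -x is rejected by getopt itself, so the default branch acts only as a safety net.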

if [ $? != 0 ]; then
echo "Terminating..." >&2
exit 1
fi

eval set -- "$TEMP"

while true; do
case "$1" in
-s|--skip-build)
SKIP_BUILD=true
shift
;;
--)
shift
break
;;
*)
echo "Invalid option: $1" >&2
show_help
Contributor

high

The show_help function is called here but is not defined in the script. This will cause a "command not found" error when this branch is executed. Please define the show_help function. A simple implementation could be:

show_help() {
    echo "Usage: $0 [-s|--skip-build]"
}

exit 1
;;
esac
done

# Change to the project directory
cd "${SKN_PWD}" || { echo "Directory not found: ${SKN_PWD}"; exit 1; }

# Conditional build
if [ "$SKIP_BUILD" = false ]; then
echo ">>> Building package..."
bash build.sh -a deepep || { echo "Build failed!"; exit 1; }
pip uninstall -y deep-ep
pip install ./output/deep_ep-*.whl || { echo "Install failed!"; exit 1; }
else
echo ">>> Skipping build and install (--skip-build)"
fi

# Enter the test directory
cd ./tests/python/deepep || { echo "Test directory not found"; exit 1; }

# Set environment variables
export HCCL_BUFFSIZE=4096
# Set up the Ascend environment
source /usr/local/Ascend/ascend-toolkit/set_env.sh

# Enumerate test_intranode.py
# Set parameter ranges
NUM_PROCESSES_LIST_=(4 8 16)
Contributor

medium

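The variable name NUM_PROCESSES_LIST_ ends with an underscore, which is inconsistent with the other list variables in the script (e.g., NUM_TOKENS_LIST). It also does not match the loop below, which iterates over NUM_PROCESSES_LIST; that array is never defined, so the outermost loop runs zero times and no test is executed. Please rename it to NUM_PROCESSES_LIST.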

Suggested change
- NUM_PROCESSES_LIST_=(4 8 16)
+ NUM_PROCESSES_LIST=(4 8 16)

NUM_TOKENS_LIST=(1024 2048 4096)
HIDDEN_LIST=(4096 7168)
NUM_TOPK_LIST=(4 8)
NUM_EXPERTS_LIST=(64 128 256)
ACTIVE_RANKS_LIST=("" "0,1" "0,2,3")
ENABLE_DIAGNOSE_LIST=("false" "true")

SCRIPT="test_intranode.py"

# Iterate over all combinations
for NUM_PROCESSES in "${NUM_PROCESSES_LIST[@]}"; do
for NUM_TOKENS in "${NUM_TOKENS_LIST[@]}"; do
for HIDDEN in "${HIDDEN_LIST[@]}"; do
for NUM_TOPK in "${NUM_TOPK_LIST[@]}"; do
for NUM_EXPERTS in "${NUM_EXPERTS_LIST[@]}"; do
for ACTIVE_RANKS in "${ACTIVE_RANKS_LIST[@]}"; do
for ENABLE_DIAGNOSE in "${ENABLE_DIAGNOSE_LIST[@]}"; do

# Build the command
CMD="python3 $SCRIPT \
--num-processes $NUM_PROCESSES \
--num-tokens $NUM_TOKENS \
--hidden $HIDDEN \
--num-topk $NUM_TOPK \
--num-experts $NUM_EXPERTS"

# Add optional arguments
if [ -n "$ACTIVE_RANKS" ]; then
CMD="$CMD --active-ranks \"$ACTIVE_RANKS\""
fi

if [ "$ENABLE_DIAGNOSE" == "true" ]; then
CMD="$CMD --enable-diagnose"
fi

# Print and execute the command
echo "Running: $CMD"
eval $CMD
Contributor

medium

Using eval can be a security risk if the command string is constructed from untrusted input. While it seems safe in this context as the parameters are from predefined lists, it's a good practice to avoid eval. A safer alternative is to build an array of command arguments and execute it directly.
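
One way the array-based alternative could look is sketched below. build_cmd is a hypothetical helper introduced only for illustration; the flag names are the ones already used in the script. The sketch prints the command instead of running it, since the test script is not available here.

```shell
#!/bin/bash
# Sketch only: build_cmd is an illustrative helper, not part of the PR.
# It fills the global array CMD with one element per argument.
build_cmd() {
    local script=$1 num_processes=$2 num_tokens=$3 hidden=$4 num_topk=$5
    local num_experts=$6 active_ranks=$7 enable_diagnose=$8

    CMD=(python3 "$script"
        --num-processes "$num_processes"
        --num-tokens "$num_tokens"
        --hidden "$hidden"
        --num-topk "$num_topk"
        --num-experts "$num_experts")

    # Optional arguments become separate array elements,
    # so no escaped quotes are needed.
    if [ -n "$active_ranks" ]; then
        CMD+=(--active-ranks "$active_ranks")
    fi
    if [ "$enable_diagnose" = "true" ]; then
        CMD+=(--enable-diagnose)
    fi
}

build_cmd test_intranode.py 4 1024 4096 4 64 "0,1" true
echo "Running: ${CMD[*]}"
# In the real loop the command would be executed with: "${CMD[@]}"
```

Because each argument is its own array element, a value such as 0,2,3 reaches the Python script unchanged and is never re-parsed by the shell.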


echo "--------------------------------------------------"

done
done
done
done
done
done
done
103 changes: 103 additions & 0 deletions scripts/enumerate_test_low_latency.sh
@@ -0,0 +1,103 @@
#!/bin/bash

SKN_PWD=""

# Default values
SKIP_BUILD=false

TEMP=$(getopt -o sw:t:h --long skip-build -n "$0" -- "$@")
Contributor

high

The getopt string sw:t:h declares the options w:, t:, and h, none of which is handled in the case statement below. If a user passes any of them, getopt accepts the option, the case statement falls through to its default branch, and the script fails with an "Invalid option" error. Please remove the unhandled options from the getopt string or add handlers for them.

Suggested change
- TEMP=$(getopt -o sw:t:h --long skip-build -n "$0" -- "$@")
+ TEMP=$(getopt -o s --long skip-build -n "$0" -- "$@")

if [ $? != 0 ]; then
echo "Terminating..." >&2
exit 1
fi

eval set -- "$TEMP"

while true; do
case "$1" in
-s|--skip-build)
SKIP_BUILD=true
shift
;;
--)
shift
break
;;
*)
echo "Invalid option: $1" >&2
show_help
Contributor

high

The show_help function is called here but is not defined in the script. This will cause a "command not found" error when this branch is executed. Please define the show_help function. A simple implementation could be:

show_help() {
    echo "Usage: $0 [-s|--skip-build]"
}

exit 1
;;
esac
done

# Change to the project directory
cd "${SKN_PWD}" || { echo "Directory not found: ${SKN_PWD}"; exit 1; }

# Conditional build
if [ "$SKIP_BUILD" = false ]; then
echo ">>> Building package..."
bash build.sh -a deepep || { echo "Build failed!"; exit 1; }
pip uninstall -y deep-ep
pip install ./output/deep_ep-*.whl || { echo "Install failed!"; exit 1; }
else
echo ">>> Skipping build and install (--skip-build)"
fi

# Enter the test directory
cd ./tests/python/deepep || { echo "Test directory not found"; exit 1; }

# Set environment variables
export HCCL_BUFFSIZE=4096
# Set up the Ascend environment
source /usr/local/Ascend/ascend-toolkit/set_env.sh

# Enumerate test_low_latency.py
# Set parameter ranges
NUM_PROCESSES_LIST=(4 8 16)
NUM_TOKENS_LIST=(128 256 512)
HIDDEN_LIST=(4096 7168)
NUM_TOPK_LIST=(4 8)
NUM_EXPERTS_LIST=(64 128 256)

SCRIPT="test_low_latency.py"

# Iterate over all combinations
for NUM_PROCESSES in "${NUM_PROCESSES_LIST[@]}"; do
for NUM_TOKENS in "${NUM_TOKENS_LIST[@]}"; do
for HIDDEN in "${HIDDEN_LIST[@]}"; do
for NUM_TOPK in "${NUM_TOPK_LIST[@]}"; do
for NUM_EXPERTS in "${NUM_EXPERTS_LIST[@]}"; do
for ACTIVE_RANKS in "${ACTIVE_RANKS_LIST[@]}"; do
for ENABLE_DIAGNOSE in "${ENABLE_DIAGNOSE_LIST[@]}"; do
Comment on lines +24 to +25
Contributor

critical

The script loops over ACTIVE_RANKS_LIST and ENABLE_DIAGNOSE_LIST, but neither array is defined in this script. A for loop over an unset array runs zero times, so the innermost loop body is never reached and none of the intended tests run, with no error message. This is a critical bug, likely from a copy-paste error. Please either define these arrays with appropriate values for test_low_latency.py or remove the two loops and the corresponding logic that uses the ACTIVE_RANKS and ENABLE_DIAGNOSE variables.
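
The silent failure can be reproduced in isolation with the sketch below, which also shows one possible guard. require_nonempty is a hypothetical helper, not part of the PR, and the array values are placeholders.

```shell
#!/bin/bash
# Sketch only: require_nonempty is an illustrative helper.

# Return non-zero if any of the named arrays is empty or undefined.
require_nonempty() {
    local name
    for name in "$@"; do
        # Indirect expansion of "name[@]" yields the array's elements.
        local ref="${name}[@]"
        local values=("${!ref}")
        if [ "${#values[@]}" -eq 0 ]; then
            echo "Parameter list $name is empty or undefined" >&2
            return 1
        fi
    done
}

NUM_TOKENS_LIST=(128 256 512)
unset ACTIVE_RANKS_LIST   # mimics the missing definition

runs=0
for NUM_TOKENS in "${NUM_TOKENS_LIST[@]}"; do
    for ACTIVE_RANKS in "${ACTIVE_RANKS_LIST[@]}"; do
        runs=$((runs + 1))
    done
done
echo "Loop body executed $runs times"

# The guard turns the silent skip into a visible error.
require_nonempty NUM_TOKENS_LIST ACTIVE_RANKS_LIST || echo "Aborting before the loops"
```

Calling such a guard once before the nested loops would make a missing list fail loudly instead of producing an empty test run.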


# Build the command
CMD="python3 $SCRIPT \
--num-processes $NUM_PROCESSES \
--num-tokens $NUM_TOKENS \
--hidden $HIDDEN \
--num-topk $NUM_TOPK \
--num-experts $NUM_EXPERTS"

# Add optional arguments
if [ -n "$ACTIVE_RANKS" ]; then
CMD="$CMD --active-ranks \"$ACTIVE_RANKS\""
fi

if [ "$ENABLE_DIAGNOSE" == "true" ]; then
CMD="$CMD --enable-diagnose"
fi

# Print and execute the command
echo "Running: $CMD"
eval $CMD

echo "--------------------------------------------------"

done
done
done
done
done
done
done