Commit 292f678

arjun37602 and Arjun Balaji authored
Terminal bench manual trigger github action (#2312)
* half way thru tb set up
* tb yaml
* TB
* tb bench scripts
* trigger on push change
* typo
* trigger push
* python v
* run local branch instead of prod
* unnecessary env variables removed
* allow log inspection
* implement _env
* npm, rust, cargo, clone github directly
* Update setup_amazon_q.sh
* clean up disk + check amt of free space
* get gcc dependencies
* big timeout
* pipe config files from gh runner to docker
* configure env + working with sso
* changed default
* default to latest
* fixing qchat location + forcing correct auth
* set env vars not just config file
* env vars all caps
* confirm env variables are visible
* roleName + code simplify
* environment variable fix + local working
* use the correct git hash
* larger runner for storage
* use full hash instead of short hash
* fail if hash invalid
* Force to run on manual trigger
* responding to PR comments

---------

Co-authored-by: Arjun Balaji <arjbal@amazon.com>
1 parent e6f5e99 commit 292f678

3 files changed: 194 additions, 0 deletions

.github/workflows/terminal-bench.yaml

Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
# This is a terminal-bench workflow that is manually triggered
# Template taken from https://github.com/actions/starter-workflows/blob/main/automation/manual.yml for reference

name: Terminal-Bench

# Controls when the action will run. Workflow runs when manually triggered using the UI
on:
  workflow_dispatch:
    inputs:
      name:
        description: 'Run terminal-bench workflow to test Q CLI in real terminal environments.'
        default: 'all'
        required: true
        type: string

jobs:
  run-benchmark:
    # avoids disk storage issues
    runs-on: ubuntu-latest-8-cores
    # makes these env vars available in main.py
    env:
      CHAT_DOWNLOAD_ROLE_ARN: ${{ secrets.CHAT_DOWNLOAD_ROLE_ARN }}
      CHAT_BUILD_BUCKET_NAME: ${{ secrets.CHAT_BUILD_BUCKET_NAME }}
    permissions:
      id-token: write
      contents: read
    steps:

      # clear unnecessary storage to ensure docker containers have space
      - name: Cleanup and free disk space
        run: |
          sudo rm -rf /usr/share/dotnet
          sudo rm -rf /opt/ghc
          sudo rm -rf "/usr/local/share/boost"
          sudo rm -rf "$AGENT_TOOLSDIRECTORY"
          sudo rm -rf /usr/local/lib/android
          sudo rm -rf /usr/share/swift
          sudo apt-get clean
          df -h

      - name: Checkout repository
        uses: actions/checkout@v4

      # Captures git hash of branch to query specific S3 bucket
      - name: Set git hash
        run: |
          if [ -n "$GITHUB_SHA" ]; then
            git_hash=$(git rev-parse "$GITHUB_SHA")
          else
            git_hash="latest"
          fi
          # appends to github_env file
          echo "GIT_HASH=$git_hash" >> $GITHUB_ENV
          echo "Git hash set to: $git_hash"

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.13'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install terminal-bench

      # OIDC enabled for github for ArjunPersonal
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_TB_ROLE }}
          aws-region: us-east-1

      - name: Run terminal benchmark
        run: |
          cd terminal-bench-test
          tb run --agent-import-path main:AmazonQCLIAgent --dataset-name terminal-bench-core --dataset-version head

      # uploads results if run fails as well to allow for easy log inspection
      - name: Upload results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: benchmark-results
          path: terminal-bench-test/runs/
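
The "Set git hash" step exports GIT_HASH, which setup_amazon_q.sh later embeds in the S3 key of the build to download. Below is a minimal Python sketch of that key construction, useful for checking a hash locally before triggering a run. The helper name `build_s3_key` and the 40-character hex check are assumptions for illustration; the workflow itself relies on `git rev-parse` and on the S3 download failing for an unknown hash. Accepting "latest" mirrors the workflow's fallback value only; the source does not confirm that such a prefix exists in the bucket.

```python
import re

# A full git SHA-1 hash is 40 lowercase hex characters.
FULL_SHA = re.compile(r"[0-9a-f]{40}")

# Target triple taken from S3_PREFIX in setup_amazon_q.sh.
TARGET = "x86_64-unknown-linux-musl"


def build_s3_key(git_hash: str) -> str:
    """Return the object key for qchat.zip (hypothetical helper).

    Accepts either a full 40-character hash or the literal "latest",
    which is the fallback value the workflow writes to GIT_HASH.
    """
    if git_hash != "latest" and not FULL_SHA.fullmatch(git_hash):
        raise ValueError(f"expected a full 40-character hash, got {git_hash!r}")
    return f"main/{git_hash}/{TARGET}/qchat.zip"
```

Rejecting short hashes up front matches the intent stated in the commit message ("use full hash instead of short hash", "fail if hash invalid"), but the check shown here is a local convenience, not part of the committed workflow.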

terminal-bench-test/main.py

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
import os
import shlex
from pathlib import Path

from terminal_bench.agents.installed_agents.abstract_installed_agent import (
    AbstractInstalledAgent,
)
from terminal_bench.terminal.models import TerminalCommand


class AmazonQCLIAgent(AbstractInstalledAgent):

    @staticmethod
    def name() -> str:
        return "Amazon Q CLI"

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    @property
    def _env(self) -> dict[str, str]:
        """Makes necessary env vars available in docker containers."""
        # SIGv4 = 1 for AWS credentials; a string so it matches dict[str, str]
        env = {
            "AMAZON_Q_SIGV4": "1",
            "AWS_ACCESS_KEY_ID": os.environ.get("AWS_ACCESS_KEY_ID", ''),
            "AWS_SECRET_ACCESS_KEY": os.environ.get("AWS_SECRET_ACCESS_KEY", ''),
            "AWS_SESSION_TOKEN": os.environ.get("AWS_SESSION_TOKEN", ''),
            "GIT_HASH": os.environ.get("GIT_HASH", ''),
            "CHAT_DOWNLOAD_ROLE_ARN": os.environ.get("CHAT_DOWNLOAD_ROLE_ARN", ''),
            "CHAT_BUILD_BUCKET_NAME": os.environ.get("CHAT_BUILD_BUCKET_NAME", '')
        }
        return env

    @property
    def _install_agent_script_path(self) -> os.PathLike:
        return Path(__file__).parent / "setup_amazon_q.sh"

    def _run_agent_commands(self, task_description: str) -> list[TerminalCommand]:
        # Quote the description so the shell passes it to qchat as one argument
        escaped_description = shlex.quote(task_description)

        return [
            # q chat with 30 min max timeout and also we wait on input. Using qchat because of sigv4.
            TerminalCommand(
                command=f"qchat chat --no-interactive --trust-all-tools {escaped_description}",
                max_timeout_sec=1800,
                block=True,
            )
        ]
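
`_run_agent_commands` relies on `shlex.quote` to pass the task description to `qchat` as a single shell argument. The sketch below isolates that step using only the standard library, so it runs without terminal-bench installed. The function name `build_qchat_command` and the sample task text are hypothetical; the command string is copied from main.py.

```python
import shlex


def build_qchat_command(task_description: str) -> str:
    """Build the qchat command line the same way main.py does."""
    escaped = shlex.quote(task_description)
    return f"qchat chat --no-interactive --trust-all-tools {escaped}"


# A description containing shell metacharacters and a single quote.
task = "list files; rm -rf /tmp/x && echo 'done'"
command = build_qchat_command(task)

# Splitting the command the way a POSIX shell would recovers the
# description as exactly one argument, with no extra commands injected.
tokens = shlex.split(command)
assert tokens[:4] == ["qchat", "chat", "--no-interactive", "--trust-all-tools"]
assert tokens[4:] == [task]
```

Without the quoting, the `;` and `&&` in a task description would be interpreted by the shell inside the container as command separators.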

terminal-bench-test/setup_amazon_q.sh

Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
#!/bin/bash
set -e
# if git hash empty then set to latest auto
apt-get update
apt-get install -y curl wget unzip jq

echo "Installing AWS CLI..."
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip -q awscliv2.zip
./aws/install --bin-dir /usr/local/bin --install-dir /usr/local/aws-cli

# Create AWS credentials from environment variables
mkdir -p ~/.aws
cat > ~/.aws/credentials << EOF
[default]
aws_access_key_id = ${AWS_ACCESS_KEY_ID}
aws_secret_access_key = ${AWS_SECRET_ACCESS_KEY}
aws_session_token = ${AWS_SESSION_TOKEN}
EOF
chmod 600 ~/.aws/credentials

cat > ~/.aws/config << EOF
[default]
region = us-east-1
EOF
chmod 600 ~/.aws/config

# Assume role and capture temporary credentials --> needed for s3 bucket access for build
echo "Assuming AWS s3 role"
TEMP_CREDENTIALS=$(aws sts assume-role --role-arn "${CHAT_DOWNLOAD_ROLE_ARN}" --role-session-name S3AccessSession 2>/dev/null || echo '{}')
QCHAT_ACCESSKEY=$(echo "$TEMP_CREDENTIALS" | jq -r '.Credentials.AccessKeyId')
Q_SECRET_ACCESS_KEY=$(echo "$TEMP_CREDENTIALS" | jq -r '.Credentials.SecretAccessKey')
Q_SESSION_TOKEN=$(echo "$TEMP_CREDENTIALS" | jq -r '.Credentials.SessionToken')

# Download specific build from S3 based on commit hash
echo "Downloading Amazon Q CLI build from S3..."
S3_PREFIX="main/${GIT_HASH}/x86_64-unknown-linux-musl"
echo "Downloading qchat.zip from s3://.../${S3_PREFIX}/qchat.zip"

# Try download, if hash is invalid we fail.
AWS_ACCESS_KEY_ID="$QCHAT_ACCESSKEY" AWS_SECRET_ACCESS_KEY="$Q_SECRET_ACCESS_KEY" AWS_SESSION_TOKEN="$Q_SESSION_TOKEN" \
  aws s3 cp "s3://${CHAT_BUILD_BUCKET_NAME}/${S3_PREFIX}/qchat.zip" ./qchat.zip --region us-east-1

# Handle the zip file, copy the qchat executable to /usr/local/bin + symlink from old code
echo "Extracting qchat.zip..."
unzip -q qchat.zip

# move it to /usr/local/bin/qchat for path as qchat may not work otherwise
if cp qchat /usr/local/bin/ && chmod +x /usr/local/bin/qchat; then
  ln -sf /usr/local/bin/qchat /usr/local/bin/q
  echo "qchat installed successfully"
else
  echo "ERROR: Failed to install qchat"
  exit 1
fi

echo "Cleaning q zip"
rm -f qchat.zip
rm -rf qchat
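
The script falls back to `{}` when `aws sts assume-role` fails, so `jq -r` yields the literal string `null` for each credential field and the error only surfaces later, at the S3 download. The following Python sketch shows one way to parse the same JSON and fail at the point of the problem. The function name `parse_assume_role_output` is hypothetical and not part of this commit; the `Credentials` field names are the ones the script reads with `jq`.

```python
import json


def parse_assume_role_output(raw: str) -> dict[str, str]:
    """Map `aws sts assume-role` JSON output to AWS env vars (hypothetical helper).

    Raises RuntimeError when any credential field is missing, instead of
    silently passing placeholder values on to the S3 download.
    """
    payload = json.loads(raw or "{}")
    creds = payload.get("Credentials") or {}
    required = ("AccessKeyId", "SecretAccessKey", "SessionToken")
    missing = [key for key in required if not creds.get(key)]
    if missing:
        raise RuntimeError(f"assume-role output is missing: {', '.join(missing)}")
    return {
        "AWS_ACCESS_KEY_ID": creds["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": creds["SecretAccessKey"],
        "AWS_SESSION_TOKEN": creds["SessionToken"],
    }
```

A similar guard could be written in the shell script itself, for example by testing the extracted values against `null` before calling `aws s3 cp`.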
