Adds sample yaml file for pod creation #73

Open: wants to merge 3 commits into base: main
Changes from 2 commits

100 changes: 100 additions & 0 deletions pod.yaml
@@ -0,0 +1,100 @@
kind: Pod
apiVersion: v1
metadata:
  name: <user>-dev-e2e-1x-aiu

Please add a comment saying: # This is a sample pod name.

spec:
  restartPolicy: Always
  serviceAccountName: default
  imagePullSecrets:
    - name: <user>-secret
  priority: 0
  schedulerName: aiu-scheduler
  enableServiceLinks: true
  containers:
    - resources:
        limits:
          ibm.com/aiu_pf: '1'
        requests:
          ibm.com/aiu_pf: '1'
      terminationMessagePath: /dev/termination-log
      name: <user>-dev-e2e-1x-aiu

Add comment: # Sample container name. Substitute with your own name.
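Whatever replaces `<user>` must still yield a name Kubernetes accepts: the API documents a container name as a DNS label (RFC 1123), meaning lowercase alphanumerics and '-', starting and ending with an alphanumeric character, at most 63 characters. A minimal Python sketch of that check follows; `is_valid_container_name` and the `jdoe` user name are hypothetical and not part of this PR.

```python
import re

# RFC 1123 DNS label, the format Kubernetes requires for container names:
# lowercase alphanumerics and '-', beginning and ending with an alphanumeric.
DNS1123_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")
MAX_LABEL_LEN = 63


def is_valid_container_name(name: str) -> bool:
    """Return True if `name` is a valid RFC 1123 label (hypothetical helper)."""
    return len(name) <= MAX_LABEL_LEN and DNS1123_LABEL.fullmatch(name) is not None


if __name__ == "__main__":
    # "jdoe" is a made-up user; the raw placeholder and uppercase are rejected.
    for candidate in ["jdoe-dev-e2e-1x-aiu", "<user>-dev-e2e-1x-aiu", "JDoe-dev"]:
        print(candidate, is_valid_container_name(candidate))
```

The same string is used for the pod name in the sample, and pod names follow the looser DNS subdomain rules, so passing the label check covers both.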

      command:
        - bash
        - '-c'
      env:
        - name: FLEX_COMPUTE
          value: SENTIENT
        - name: FLEX_DEVICE
          value: PF
        - name: FLEX_OVERWRITE_NMB_FRAME
          value: '1'
        - name: FLEX_UNLINK_DEVMEM
          value: 'false'
        - name: PYTHONUNBUFFERED
          value: '1'
        - name: HOME
          value: /home/senuser
        - name: HF_HUB_OFFLINE
          value: '1'
        - name: HF_HOME
          value: /home/senuser/models/huggingface_cache

Add this comment: # This can be changed to match your local home path environment.

Since this will be exposed externally, we should state that this is only a sample YAML file. Users can adjust the fields to match their AIU image and cluster environment.
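As a sketch of what adjusting the fields might look like, assuming the only required edits are the `<user>` placeholders and, optionally, the image: the hypothetical `render_sample` helper below does plain string substitution on the manifest text, which avoids a YAML load/dump round-trip that could drop comments or reorder keys. The replacement image `example.com/aiu:1` and the user `jdoe` are made up for illustration.

```python
from typing import Optional

# Placeholder and image string as they appear in the sample pod.yaml.
PLACEHOLDER = "<user>"
SAMPLE_IMAGE = "icr.io/ibmaiu_internal/x86_64/dd2/e2e_stable:latest"


def render_sample(template: str, user: str, image: Optional[str] = None) -> str:
    """Replace every "<user>" with `user` (hypothetical helper).

    If `image` is given, every literal occurrence of the sample AIU image
    string is replaced as well.
    """
    if not user or PLACEHOLDER in user:
        raise ValueError("expected a real user name, not the placeholder")
    rendered = template.replace(PLACEHOLDER, user)
    if image is not None:
        rendered = rendered.replace(SAMPLE_IMAGE, image)
    return rendered


# Small excerpt of the sample manifest, used only for illustration.
EXCERPT = """\
metadata:
  name: <user>-dev-e2e-1x-aiu
spec:
  imagePullSecrets:
    - name: <user>-secret
"""

if __name__ == "__main__":
    print(render_sample(EXCERPT, "jdoe"), end="")
```

Applied to the full file, e.g. `render_sample(Path("pod.yaml").read_text(), "jdoe")`, the result could be saved and passed to `kubectl apply -f`.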

        - name: HF_HUB_CACHE
          value: /home/senuser/models/huggingface_cache/hub
        - name: DTLOG_LEVEL
          value: error
        - name: TORCH_SENDNN_LOG
          value: CRITICAL
        - name: DT_DEEPRT_VERBOSE
          value: '-1'
        - name: POD_IMAGE
          value: icr.io/ibmaiu_internal/x86_64/dd2/e2e_stable:latest  # keep identical to the container's image field
        - name: FMS_CHECKOUT
          value: v1.1.0
      securityContext:
        capabilities:
          drop:
            - ALL
        runAsUser: 1000810000
        runAsNonRoot: true
        allowPrivilegeEscalation: false
      imagePullPolicy: IfNotPresent
      volumeMounts:
        - name: dev-shm
          mountPath: /dev/shm
      terminationMessagePolicy: File
      image: &pod_image icr.io/ibmaiu_internal/x86_64/dd2/e2e_stable:latest

Add comment # AIU software image

      workingDir: /home/senuser
      args:
        - |
          source ~/.bashrc
          unset HF_HOME
          cd $HOME
          pip3 install -q -U transformers
          git clone https://github.com/foundation-model-stack/foundation-model-stack.git
          cd foundation-model-stack
          git checkout $FMS_CHECKOUT
          cp ${AIU_AUTOGEN_SENLIB_CONFIG_FILE} /tmp/etc/aiu/senlib_config.json
          FILE=/tmp/etc/aiu/senlib_config.json
          cat $FILE | jq '. += {"RISCV": {"DOOM": { "enable" : false}}, "SNT_MCI" : { "DCR": {"MCI_CTRL": {"ENABLE_RISCV": "0x0"} } }}' > $FILE.jq
          mv $FILE.jq $FILE
          cp /tmp/etc/aiu/senlib_config.json $HOME/.senlib.json
          echo "POD_IMAGE:" $POD_IMAGE >> /tmp/aiu-query-devices.txt
          echo " " >> /tmp/aiu-query-devices.txt
          /opt/sentient/bin/aiu-query-devices >> /tmp/aiu-query-devices.txt
          echo " " >> ~/.bashrc
          echo "cat /tmp/aiu-query-devices.txt" >> ~/.bashrc
          echo 'FLEX_COMPUTE = ' $FLEX_COMPUTE
          echo 'FLEX_DEVICE = ' $FLEX_DEVICE
          echo 'DTLOG_LEVEL = ' $DTLOG_LEVEL
          echo 'TORCH_SENDNN_LOG = ' $TORCH_SENDNN_LOG
          echo 'DT_DEEPRT_VERBOSE = ' $DT_DEEPRT_VERBOSE
          echo 'INFER_SCRIPT = ' $INFER_SCRIPT
          echo 'MODEL = ' $MODEL
          tail -f /dev/null
  serviceAccount: default
  volumes:
    - name: dev-shm
      emptyDir:
        medium: Memory
        sizeLimit: 64Gi
  dnsPolicy: ClusterFirst
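The startup script patches the generated senlib config with `jq '. += {...}'`. In jq, adding two objects is a shallow merge: for a top-level key present on both sides, the right-hand value replaces the left-hand one wholesale, whereas jq's `*` operator merges objects recursively. Any other settings the auto-generated file might hold under `RISCV` or `SNT_MCI` would therefore be discarded; the PR does not show that file, so whether this matters is an open question. The Python sketch below reproduces both semantics on a made-up config (`existing` and its keys are hypothetical).

```python
import json

# The object added by the jq filter in the sample's startup script.
PATCH = {
    "RISCV": {"DOOM": {"enable": False}},
    "SNT_MCI": {"DCR": {"MCI_CTRL": {"ENABLE_RISCV": "0x0"}}},
}


def shallow_merge(config: dict, patch: dict) -> dict:
    """Like jq's `. + patch`: right-hand top-level values win outright."""
    merged = dict(config)
    merged.update(patch)
    return merged


def deep_merge(config: dict, patch: dict) -> dict:
    """Like jq's `. * patch`: recurse when both sides hold objects."""
    merged = dict(config)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


if __name__ == "__main__":
    # Hypothetical existing config with an extra key under RISCV.
    existing = {"RISCV": {"OTHER": 1, "DOOM": {"enable": True}}, "GLOBAL": {"x": 1}}
    print(json.dumps(shallow_merge(existing, PATCH), sort_keys=True))  # RISCV.OTHER gone
    print(json.dumps(deep_merge(existing, PATCH), sort_keys=True))     # RISCV.OTHER kept
```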