Commit 4f6c141

initial commit for eScience tutorial
1 parent 6187830 commit 4f6c141

53 files changed: 3130 additions and 0 deletions

.gitmodules

Lines changed: 6 additions & 0 deletions
@@ -10,3 +10,9 @@
 [submodule "2025-HPCIC/tutorial-code/thicket-tutorial"]
 	path = 2025-HPCIC/tutorial-code/thicket-tutorial
 	url = https://github.com/llnl/thicket-tutorial
+[submodule "2025-eScience/tutorial-code/thicket-tutorial"]
+	path = 2025-eScience/tutorial-code/thicket-tutorial
+	url = https://github.com/llnl/thicket-tutorial
+[submodule "2025-eScience/tutorial-code/caliper-tutorial"]
+	path = 2025-eScience/tutorial-code/caliper-tutorial
+	url = https://github.com/daboehme/caliper-tutorial.git

2025-eScience/README.rst

Lines changed: 139 additions & 0 deletions
@@ -0,0 +1,139 @@
+======================
+eScience 2025 Tutorial
+======================
+
+This directory contains the materials for the eScience 2025 tutorial. The following subsections describe the contents of these materials.
+
+--------
+Contents
+--------
+
+^^^^^^^^^^^^^
+Tutorial Code
+^^^^^^^^^^^^^
+
+The code elements of this tutorial (e.g., Jupyter notebooks, command-line scripts, Markdown/RST instruction files) can all be found in the :code:`tutorial-code` subdirectory. If materials are stored in other git repositories, they can be accessed from this subdirectory
+via a git submodule.
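
To see which directories the submodule mechanism populates, the sketch below lists the `path` entries of a `.gitmodules` file using `git config --file`. The helper name `list_submodule_paths` and the temporary demo file are illustrative assumptions; the two entries are copied from the `.gitmodules` change in this commit.

```shell
#!/usr/bin/env bash
set -euo pipefail

# List the checkout paths recorded in a .gitmodules-style file.
# (Hypothetical helper name; the git invocation itself is standard.)
list_submodule_paths() {
  git config --file "$1" --get-regexp '^submodule\..*\.path$' | awk '{print $2}'
}

# Demo on a throwaway copy of the two eScience entries added in this commit.
demo="$(mktemp)"
cat > "$demo" <<'EOF'
[submodule "2025-eScience/tutorial-code/thicket-tutorial"]
    path = 2025-eScience/tutorial-code/thicket-tutorial
    url = https://github.com/llnl/thicket-tutorial
[submodule "2025-eScience/tutorial-code/caliper-tutorial"]
    path = 2025-eScience/tutorial-code/caliper-tutorial
    url = https://github.com/daboehme/caliper-tutorial.git
EOF
list_submodule_paths "$demo"
rm -f "$demo"

# In a real checkout, the submodule contents are fetched with:
#   git submodule update --init --recursive
```

Running the helper against the repository's own `.gitmodules` would list every tutorial's submodules, not only the eScience ones.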
+
+^^^^^^
+Slides
+^^^^^^
+
+The slides used in presenting this tutorial can be found in the :code:`slides` subdirectory.
+
+^^^^^^
+Docker
+^^^^^^
+
+The Docker definition files (i.e., Dockerfiles) for all the necessary containers can be found in the :code:`docker` subdirectory. There are currently 5 definition files:
+
+1. :code:`Dockerfile.caliper`: builds Caliper and Adiak on top of the :code:`fluxrm/flux-sched:jammy` image from DockerHub
+2. :code:`Dockerfile.thicket`: builds Thicket on top of the image produced by :code:`Dockerfile.caliper`
+3. :code:`Dockerfile.benchpark`: downloads and bootstraps Benchpark on top of the image produced by :code:`Dockerfile.thicket`
+4. :code:`Dockerfile.spawn`: downloads tutorial materials, downloads any remaining necessary packages, and does other setup work on top of the image produced by :code:`Dockerfile.benchpark`
+5. :code:`Dockerfile.init`: ensures user permissions are correct using the super-minimal :code:`alpine/git` image from DockerHub
+
+"""""""""""""""""""""""""""""""""""""""
+Testing the Builds of the Docker Images
+"""""""""""""""""""""""""""""""""""""""
+
+To enable automated testing of the Docker images, all edits to the Dockerfiles above should be made in a branch with an open PR. While a PR is open, a GitHub Actions CI job will
+run and ensure that the images can be built. To properly configure the CI, edit the :code:`github_ci_matrix.json` file in the root of this repository as follows:
+
+1. Set the "tag" field to the tag (i.e., version) of the Docker images you will be generating
+2. Set the "tutorial_dir" field to the name of this directory
+
+The CI reads :code:`github_ci_matrix.json` to get values shared by the matrices of all GitHub Actions jobs.
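
Before opening a PR, it can help to confirm that both fields are present in the matrix file. The sketch below is a plain-text check, not a JSON parser. The helper name `has_json_keys` is hypothetical, and the demo file's layout (two top-level keys with these example values) is an assumption: the README names only the "tag" and "tutorial_dir" fields, so the real `github_ci_matrix.json` may contain more or be structured differently.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Succeed only if every named key appears as a JSON key in the file.
# This is a text match on  "key":  and does not validate the JSON itself.
has_json_keys() {
  local file="$1" key
  shift
  for key in "$@"; do
    grep -Eq "\"${key}\"[[:space:]]*:" "$file" || {
      echo "missing key: ${key}" >&2
      return 1
    }
  done
}

# Hypothetical stand-in for github_ci_matrix.json (layout and values assumed).
demo="$(mktemp)"
cat > "$demo" <<'EOF'
{
  "tag": "escience-2025",
  "tutorial_dir": "2025-eScience"
}
EOF
has_json_keys "$demo" tag tutorial_dir && echo "matrix file has both fields"
rm -f "$demo"
```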
+
+"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
+Pushing the Docker Images to GitHub Container Registry (GHCR)
+"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
+
+Before trying to push to GHCR, someone with the necessary permissions should make sure this repo can push to these images in GHCR (**change names when we decide on appropriate ones**):
+
+* ghcr.io/llnl/caliper
+* ghcr.io/llnl/thicket
+* ghcr.io/llnl/benchpark
+* ghcr.io/llnl/reproducible-benchmarking-spawn
+* ghcr.io/llnl/reproducible-benchmarking-init
+
+If these images do not yet exist, your first push will properly set the permissions. If these images do exist, follow the instructions
+`here <https://docs.github.com/en/packages/learn-github-packages/configuring-a-packages-access-control-and-visibility#ensuring-workflow-access-to-your-package>`_
+to add this repository to each package. Make sure to grant "Write" permissions to the repository while doing this.
+
+After ensuring this repository has the necessary permissions, follow these steps to push the Docker images to GHCR:
+
+1. Make sure all changes to the Dockerfiles have been merged into the :code:`main` branch
+2. From the GitHub webpage, navigate to the "Actions" tab
+3. On the left of the resulting page, click on "Build containers and push to GHCR"
+4. Click on the "Run workflow" button to the right of the page
+5. In the popup menu that appears, select the "main" branch and fill out the requested information
+6. Click the green "Run workflow" button to start the process of building and pushing images
+
+^^^^^^^^^^^^^^
+Infrastructure
+^^^^^^^^^^^^^^
+
+All the infrastructure needed to deploy the tutorial to a Kubernetes cluster with JupyterHub is contained in the :code:`infrastructure` subdirectory.
+This infrastructure is generated by the tool `here <https://lc.llnl.gov/gitlab/lumsden1/hpcic-k8s-configurer>`_.
+The infrastructure can be regenerated as-is using :code:`infrastructure/config.toml`.
+
+----------------------------
+Testing the Tutorial Locally
+----------------------------
+
+To test the tutorial locally, you first need to build all the Docker images except the init image. Before building,
+keep in mind the following dependencies between images:
+
+.. code-block::
+
+   ghcr.io/llnl/caliper --> ghcr.io/llnl/thicket --> ghcr.io/llnl/benchpark --> ghcr.io/llnl/reproducible-benchmarking-spawn
+
+Because of these dependencies, the first thing you should figure out is which (if any) images you need to build locally.
+If a Dockerfile has changes that are **not** on GHCR, you will need to build that image *and all downstream images (based on the dependency chain above)*
+locally before testing. To build an image locally, run the following from this directory (**not the** :code:`docker` **directory**):
+
+.. code-block:: bash
+
+   $ docker build -t <image_name> -f ./docker/<dockerfile_for_image> . # Note the trailing "."
+
+In the command above, :code:`<image_name>` should be one of the GHCR URLs above, followed by a colon, followed by a tag. It could look something
+like :code:`ghcr.io/llnl/benchpark:escience-2025`. Note that :code:`<image_name>` **must** match the value of the :code:`FROM` directive
+in the Dockerfile of the dependent image. For example, to find the :code:`<image_name>` value for :code:`ghcr.io/llnl/benchpark`, look for the :code:`FROM` directive
+in :code:`./docker/Dockerfile.spawn`.
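
The rule "rebuild the changed image and everything downstream" can be sketched as a small dry-run script that only prints the `docker build` commands. The function names and the default `TAG` value are assumptions for illustration; the image names, Dockerfile names, and build order come from the README text. The tag must still be checked against the `FROM` lines in `./docker`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumed tag; it must match the FROM directives of the dependent Dockerfiles.
TAG="${TAG:-escience-2025}"

# Build order taken from the dependency chain in the README.
STAGES=(caliper thicket benchpark spawn)

# Map a stage name to its GHCR image name.
image_for() {
  case "$1" in
    spawn) echo "ghcr.io/llnl/reproducible-benchmarking-spawn:${TAG}" ;;
    *)     echo "ghcr.io/llnl/$1:${TAG}" ;;
  esac
}

# Print (not run) build commands for the first changed stage and all
# downstream stages. Returns non-zero if the stage name is unknown.
print_builds_from() {
  local first="$1" started=0 stage
  for stage in "${STAGES[@]}"; do
    [ "$stage" = "$first" ] && started=1
    if [ "$started" -eq 1 ]; then
      echo "docker build -t $(image_for "$stage") -f ./docker/Dockerfile.${stage} ."
    fi
  done
  [ "$started" -eq 1 ]
}

# Example: Dockerfile.thicket changed, so thicket, benchpark, and spawn are rebuilt.
print_builds_from thicket
```

Piping the output to `bash` from the tutorial directory would run the builds in order; keeping the script as a dry run makes it easy to review the image names first.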
+
+If all the changes to the corresponding Dockerfiles in :code:`docker` have already been pushed to GHCR, you do not need to build locally.
+Instead, you should just pull the spawn image using:
+
+.. code-block:: bash
+
+   $ docker pull ghcr.io/llnl/reproducible-benchmarking-spawn:<tag>
+
+You should replace :code:`<tag>` in the command above with the GHCR tag of the image you want to pull.
+
+Once you have a spawn image (either built locally or pulled from GHCR), you can run it locally
+with the following command:
+
+.. code-block:: bash
+
+   $ docker run --rm -it --entrypoint <entrypoint> --name reproducible_benchmark_tutorial_local -p 8888:8888 <spawn_image_name>
+
+In the command above, :code:`<spawn_image_name>` is the name of the built spawn image. If you built that image locally, this argument
+should match the value you passed to the :code:`-t` flag of :code:`docker build` when building the spawn image. If you pulled the image
+from GHCR, this argument should be :code:`ghcr.io/llnl/reproducible-benchmarking-spawn:<tag>`.
+
+The :code:`<entrypoint>` field in the command above dictates what command runs within the container immediately after startup.
+It can be one of three values:
+
+1. :code:`/local-entrypoint.sh`: this entrypoint script will start a JupyterLab instance and make it available from outside the container.
+2. :code:`/entrypoint.sh`: this entrypoint script will run :code:`jupyterhub-singleuser`. It is intended for use in the cloud JupyterHub deployment and should not be used locally.
+3. :code:`bash`: by specifying :code:`bash` (or any other shell installed in the container), you will get command-line access to the container, instead of a Jupyter environment.
+
+At this point, you should either have a Jupyter URL that you can use to access Jupyter, or you should have shell access to the container.
+You can now do whatever local testing you want of the image.
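
As a minimal sketch of how the entrypoint choice fits into the `docker run` command, the function below assembles and prints the command without running it. The function name `print_run_command` and the example tag are assumptions; the flags, container name, and entrypoint values are the ones given in the README.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print (not run) the docker run command for a given entrypoint and image.
# Warnings go to stderr so the printed command stays clean on stdout.
print_run_command() {
  local entrypoint="$1" image="$2"
  case "$entrypoint" in
    /local-entrypoint.sh|bash) ;;
    /entrypoint.sh)
      echo "warning: /entrypoint.sh is meant for the cloud JupyterHub deployment" >&2 ;;
    *)
      echo "warning: '${entrypoint}' is not one of the three documented values" >&2 ;;
  esac
  echo "docker run --rm -it --entrypoint ${entrypoint}" \
       "--name reproducible_benchmark_tutorial_local -p 8888:8888 ${image}"
}

# Example: local JupyterLab session from an image pulled from GHCR (tag assumed).
print_run_command /local-entrypoint.sh \
  ghcr.io/llnl/reproducible-benchmarking-spawn:escience-2025
```

Passing `bash` as the first argument yields the command for shell access instead of a Jupyter environment.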
+
+------------------------------------
+Deploying the Tutorial to Kubernetes
+------------------------------------
+
+TBA
Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
+# Copyright 2025 Lawrence Livermore National Security, LLC and other
+# Benchpark developers. See the top-level COPYRIGHT file for details.
+#
+# SPDX-License-Identifier: Apache-2.0
+
+# For testing
+# FROM test-thicket
+
+FROM ghcr.io/llnl/thicket:escience-2025
+
+USER root
+
+ENV DEBIAN_FRONTEND=noninteractive
+RUN apt-get update && \
+    apt-get install -y --no-install-recommends \
+    wget \
+    gzip \
+    lsb-release \
+    patch \
+    tar \
+    unzip \
+    xz-utils \
+    zstd \
+    bzip2 \
+    liblapack-dev \
+    libblas-dev \
+    && rm -rf /var/lib/apt/lists/*
+
+SHELL [ "/bin/bash", "-c" ]
+
+USER ${NB_USER}
+
+RUN git clone https://github.com/LLNL/benchpark.git ${HOME}/benchpark && \
+    cd ${HOME}/benchpark && \
+    git checkout -b develop-2025-08-25 develop-2025-08-25 && \
+    git submodule update --init --recursive
+
+USER root
+
+RUN . /opt/global_py_venv/bin/activate && \
+    python3 -m pip install -r ${HOME}/benchpark/requirements.txt
+
+RUN echo 'export PATH=${HOME}/benchpark/bin:$PATH' >> ${HOME}/.bashrc
+
+RUN echo 'export PATH=${HOME}/benchpark/bin:$PATH' >> ${HOME}/.bash_profile
+
+RUN chmod -R 777 ~/ ${HOME}
+
+WORKDIR ${HOME}
+
+RUN mkdir -p ${HOME}/.local/share && \
+    chmod 777 ${HOME}/.local/share
+
+USER ${NB_USER}
+
+# Run this to trigger bootstrap
+RUN . /opt/global_py_venv/bin/activate && \
+    ${HOME}/benchpark/bin/benchpark bootstrap
Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
+# Copyright 2025 Lawrence Livermore National Security, LLC and other
+# Benchpark developers. See the top-level COPYRIGHT file for details.
+#
+# SPDX-License-Identifier: Apache-2.0
+
+# FROM ubuntu:noble
+FROM fluxrm/flux-sched:jammy
+
+# ubuntu:noble added a new 'ubuntu' user in the container.
+# Get rid of it!
+# RUN userdel -r ubuntu
+
+USER root
+
+ENV DEBIAN_FRONTEND=noninteractive
+RUN apt-get update && \
+    apt-get install -y --no-install-recommends \
+    adduser \
+    vim \
+    nano \
+    emacs \
+    build-essential \
+    cmake \
+    python3 \
+    python3-dev \
+    python3-pip \
+    python3-venv \
+    git \
+    util-linux \
+    less \
+    htop \
+    zip \
+    unzip \
+    # NOTE: the flux-sched image already pulls and builds MPICH 4.2.2
+    #       WITHOUT PMIx support (this is important because PMIx is a pain, and
+    #       requires extra setup with Flux).
+    # openmpi-bin \
+    # openmpi-common \
+    # libopenmpi-dev \
+    && rm -rf /var/lib/apt/lists/*
+
+SHELL [ "/bin/bash", "-c" ]
+
+RUN python3 -m venv /opt/global_py_venv
+
+RUN . /opt/global_py_venv/bin/activate && \
+    python3 -m pip install pybind11
+
+ENV CALI_INSTALL_PREFIX=/usr \
+    GIT_CLONE_STAGING_DIR=/tmp
+
+RUN git clone https://github.com/LLNL/Caliper.git ${GIT_CLONE_STAGING_DIR}/Caliper && \
+    cd ${GIT_CLONE_STAGING_DIR}/Caliper && \
+    git fetch origin && \
+    git checkout v2.12.1 && \
+    git submodule update --init --recursive && \
+    git clone https://github.com/LLNL/Adiak.git ${GIT_CLONE_STAGING_DIR}/Adiak && \
+    cd ${GIT_CLONE_STAGING_DIR}/Adiak && \
+    git fetch origin && \
+    git checkout v0.4.1 && \
+    git submodule update --init --recursive
+
+RUN cd ${GIT_CLONE_STAGING_DIR}/Adiak && \
+    mkdir build && \
+    cd build && \
+    cmake \
+        -DENABLE_MPI=ON \
+        -DCMAKE_C_COMPILER=$(which gcc) \
+        -DCMAKE_CXX_COMPILER=$(which g++) \
+        -DBUILD_SHARED_LIBS=ON \
+        -DCMAKE_INSTALL_PREFIX=${CALI_INSTALL_PREFIX} \
+        .. && \
+    make -j 4 && \
+    make install
+
+RUN . /opt/global_py_venv/bin/activate && \
+    cd ${GIT_CLONE_STAGING_DIR}/Caliper && \
+    mkdir build && \
+    cd build && \
+    cmake \
+        -DWITH_TOOLS=ON \
+        -DWITH_MPI=ON \
+        -DWITH_ADIAK=ON \
+        -DWITH_PYTHON_BINDINGS=ON \
+        -Dpybind11_DIR=$(pybind11-config --cmakedir) \
+        -DCMAKE_PREFIX_PATH=${CALI_INSTALL_PREFIX} \
+        -DCMAKE_C_COMPILER=$(which gcc) \
+        -DCMAKE_CXX_COMPILER=$(which g++) \
+        -DBUILD_SHARED_LIBS=ON \
+        -DCMAKE_INSTALL_PREFIX=${CALI_INSTALL_PREFIX} \
+        .. && \
+    make -j 4 && \
+    make install
+
+RUN rm -rf ${GIT_CLONE_STAGING_DIR}/Caliper && rm -rf ${GIT_CLONE_STAGING_DIR}/Adiak
+
+ENV NB_USER=jovyan \
+    NB_UID=1000 \
+    HOME=/home/jovyan
+
+RUN adduser \
+    --disabled-password \
+    --gecos "Default user" \
+    --uid ${NB_UID} \
+    --home ${HOME} \
+    --force-badname \
+    ${NB_USER}
+
+# NOTE: the next two lines grant passwordless sudo to ${NB_USER}.
+#       They must be commented out before the image is pushed to GHCR.
+RUN adduser ${NB_USER} sudo
+RUN echo '%sudo ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers
+
+RUN chmod -R 777 ~/ ${HOME}
+
+ENV SHELL=/usr/bin/bash
+
+RUN mkdir -p ${HOME}/.local/share && \
+    chmod 777 ${HOME}/.local/share
+
+RUN echo $(flux env)
+
+RUN echo 'export PATH=/usr/bin:$PATH' >> ${HOME}/.bashrc && \
+    echo '. /opt/global_py_venv/bin/activate' >> ${HOME}/.bashrc && \
+    echo 'export LD_LIBRARY_PATH=/usr/lib:/usr/lib64:$LD_LIBRARY_PATH' >> ${HOME}/.bashrc
+
+RUN echo 'export PATH=/usr/bin:$PATH' >> ${HOME}/.bash_profile && \
+    echo '. /opt/global_py_venv/bin/activate' >> ${HOME}/.bash_profile && \
+    echo 'export LD_LIBRARY_PATH=/usr/lib:/usr/lib64:$LD_LIBRARY_PATH' >> ${HOME}/.bash_profile
+
+USER ${NB_USER}
+WORKDIR ${HOME}
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+FROM jupyterhub/k8s-hub:4.2.0
+
+ENV JUPYTERHUB_XSRF_ANONYMOUS_IP_CIDRS="0.0.0.0/0"
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+# Copyright 2025 Lawrence Livermore National Security, LLC and other
+# Benchpark developers. See the top-level COPYRIGHT file for details.
+#
+# SPDX-License-Identifier: Apache-2.0
+
+FROM alpine/git
+
+USER root
+
+ENV NB_USER=jovyan \
+    NB_UID=1000 \
+    HOME=/home/jovyan
+
+RUN adduser \
+    -D \
+    -g "Default user" \
+    -u ${NB_UID} \
+    -h ${HOME} \
+    ${NB_USER}
+
+COPY ./docker/init-entrypoint.sh /entrypoint.sh
+RUN chmod 777 /entrypoint.sh
