
Commit 014a276

docs: Add a page on upgrading Ceph

1 parent addefc2 commit 014a276

4 files changed: +189 -0 lines changed

doc/source/conf.py

Lines changed: 4 additions & 0 deletions
@@ -32,16 +32,19 @@
 current_series = "2023.1"
 previous_series = "zed"
 branch = f"stackhpc/{current_series}"
+ceph_series = "quincy"

 # Substitutions loader
 rst_prolog = """
 .. |current_release| replace:: {current_release}
 .. |current_release_git_branch_name| replace:: {current_release_git_branch_name}
 .. |previous_release| replace:: {previous_release}
+.. |ceph_series| replace:: {ceph_series}
 """.format(  # noqa: E501
     current_release_git_branch_name=branch,
     current_release=current_series,
     previous_release=previous_series,
+    ceph_series=ceph_series,
 )

 # -- General configuration ----------------------------------------------------
@@ -125,3 +128,4 @@
 extlinks["skc-doc"] = (f"https://stackhpc-kayobe-config.readthedocs.io/en/stackhpc-{current_series}/", "%s documentation")
 extlinks["kayobe-renos"] = (f"https://docs.openstack.org/releasenotes/kayobe/{current_series}.html", "%s release notes")
 extlinks["kolla-ansible-renos"] = (f"https://docs.openstack.org/releasenotes/kolla-ansible/{current_series}.html", "%s release notes")
+extlinks["ceph-doc"] = (f"https://docs.ceph.com/en/{ceph_series}/", "%s documentation")
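
With this extlink defined, pages can link to the Ceph documentation for the configured series; the new upgrade page added by this commit uses it like this:

.. code-block:: rst

   The Ceph upgrade procedure is described :ceph-doc:`here <cephadm/upgrade>`.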

doc/source/configuration/cephadm.rst

Lines changed: 17 additions & 0 deletions
@@ -103,6 +103,23 @@ Default variables for configuring Ceph are provided in
 but you will likely need to set ``cephadm_osd_spec`` to define the OSD
 specification.

+Ceph release
+~~~~~~~~~~~~
+
+The Ceph release series is not strictly dependent upon the StackHPC OpenStack
+release; however, this configuration does define a default Ceph release series
+and container image tag. The default release series is currently |ceph_series|.
+
+If you wish to use a different Ceph release series, set
+``cephadm_ceph_release``.
+
+If you wish to use different Ceph container image tags, set the following
+variables:
+
+* ``cephadm_image_tag``
+* ``cephadm_haproxy_image_tag``
+* ``cephadm_keepalived_image_tag``
+
 OSD specification
 ~~~~~~~~~~~~~~~~~
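
For illustration, these variables could be set together in a Kayobe configuration file such as ``etc/kayobe/cephadm.yml`` (the file path and the tag values below are assumptions for the sketch, not defaults shipped by this change):

.. code-block:: yaml

   ---
   # Ceph release series to deploy.
   cephadm_ceph_release: quincy

   # Ceph container image tag (an illustrative Quincy-era tag).
   cephadm_image_tag: v17.2.6

   # HAProxy and keepalived image tags used by cephadm ingress services
   # (values are illustrative).
   cephadm_haproxy_image_tag: "2.3"
   cephadm_keepalived_image_tag: "2.1.5"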

doc/source/operations/index.rst

Lines changed: 1 addition & 0 deletions
@@ -14,3 +14,4 @@ This guide is for operators of the StackHPC Kayobe configuration project.
    secret-rotation
    tempest
    upgrading-openstack
+   upgrading-ceph
doc/source/operations/upgrading-ceph.rst

Lines changed: 167 additions & 0 deletions
@@ -0,0 +1,167 @@
==============
Upgrading Ceph
==============

This section describes how to upgrade from one version of Ceph to another.
The Ceph upgrade procedure is described :ceph-doc:`here <cephadm/upgrade>`.

The Ceph release series is not strictly dependent upon the StackHPC OpenStack
release; however, this configuration does define a default Ceph release series
and container image tag. The default release series is currently |ceph_series|.

Prerequisites
=============

Before starting the upgrade, ensure any appropriate prerequisites are
satisfied. These will be specific to each deployment, but here are some
suggestions:

* Ensure that expected test suites are passing, e.g. Tempest.
* Resolve any Prometheus alerts.
* Check for unexpected ``ERROR`` or ``CRITICAL`` messages in OpenSearch
  Dashboard.
* Check Grafana dashboards.

Consider whether the Ceph cluster needs to be upgraded within or outside of a
maintenance/change window.

Preparation
===========

Ensure that the local Kayobe configuration environment is up to date.

If you wish to use a different Ceph release series, set
``cephadm_ceph_release``.

If you wish to use different Ceph container image tags, set the following
variables:

* ``cephadm_image_tag``
* ``cephadm_haproxy_image_tag``
* ``cephadm_keepalived_image_tag``
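
For example, to pin the upgrade to a newer release series, the variable might be set as follows (``reef`` is only a hypothetical target here):

.. code-block:: yaml

   ---
   # Hypothetical target Ceph release series for the upgrade.
   cephadm_ceph_release: reef
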
Upgrading Host Packages
=======================

Prior to upgrading the Ceph storage cluster, it may be desirable to upgrade
system packages on the hosts.

Note that these commands do not affect packages installed in containers, only
those installed on the host.

In order to avoid downtime, it is important to control how package updates are
rolled out. In general, Ceph monitor hosts should be updated *one by one*. For
Ceph OSD hosts it may be possible to update packages in batches of hosts,
provided there is sufficient capacity to maintain data availability.
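
As a sketch, a batch of OSD hosts could be targeted by passing an Ansible host pattern to ``--limit`` (the hostnames here are hypothetical):

.. code-block:: console

   kayobe overcloud host package update --packages "*" --limit storage-01:storage-02:storage-03
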
For each host or batch of hosts, perform the following steps.

Place the host or batch of hosts into maintenance mode:

.. code-block:: console

   sudo cephadm shell -- ceph orch host maintenance enter <host>

To update all eligible packages, use ``*``, escaping if necessary:

.. code-block:: console

   kayobe overcloud host package update --packages "*" --limit <host>

If the kernel has been upgraded, reboot the host or batch of hosts to pick up
the change:

.. code-block:: console

   kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/reboot.yml -l <host>

Remove the host or batch of hosts from maintenance mode:

.. code-block:: console

   sudo cephadm shell -- ceph orch host maintenance exit <host>

Wait for Ceph health to return to ``HEALTH_OK``:

.. code-block:: console

   ceph -s

Wait for Prometheus alerts and errors in OpenSearch Dashboard to resolve, or
address them.

Once happy that the system has been restored to full health, move on to the
next host or batch of hosts.

Sync container images
=====================

If using the local Pulp server to host Ceph images
(``stackhpc_sync_ceph_images`` is ``true``), sync the new Ceph images into the
local Pulp:

.. code-block:: console

   kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/pulp-container-{sync,publish}.yml -e stackhpc_pulp_images_kolla_filter=none

Upgrade Ceph services
=====================

Start the upgrade. If using the local Pulp server to host Ceph images:

.. code-block:: console

   sudo cephadm shell -- ceph orch upgrade start --image <registry>/ceph/ceph:<tag>

Otherwise:

.. code-block:: console

   sudo cephadm shell -- ceph orch upgrade start --image quay.io/ceph/ceph:<tag>
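
For example, upgrading to a specific Quincy point release might look like this (the tag shown is illustrative; use the tag matching your chosen release):

.. code-block:: console

   sudo cephadm shell -- ceph orch upgrade start --image quay.io/ceph/ceph:v17.2.6
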
Check the upgrade status:

.. code-block:: console

   ceph orch upgrade status

Wait for Ceph health to return to ``HEALTH_OK``:

.. code-block:: console

   ceph -s

Watch the cephadm logs:

.. code-block:: console

   ceph -W cephadm
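
Once the upgrade completes, it may also be useful to confirm that all daemons report the target version (an extra check, not part of the procedure above); ``ceph versions`` shows a count of daemons by version:

.. code-block:: console

   sudo cephadm shell -- ceph versions
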
Upgrade Cephadm
===============

Update the Cephadm package:

.. code-block:: console

   kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/cephadm-deploy.yml -e cephadm_package_update=true

Testing
=======

At this point it is recommended to perform a thorough test of the system to
catch any unexpected issues. This may include:

* Checking Prometheus, OpenSearch Dashboards and Grafana
* Smoke tests
* All applicable Tempest tests
* Horizon UI inspection

Cleaning up
===========

Prune unused container images:

.. code-block:: console

   kayobe overcloud host command run -b --command "docker image prune -a -f" -l ceph
