
Commit e935e16

Full shutdown document added
1 parent 86a01a4 commit e935e16

3 files changed (+209 -18 lines changed)

source/full_shutdown.rst

Lines changed: 207 additions & 0 deletions
@@ -0,0 +1,207 @@
.. include:: vars.rst


=======================
Full Shutdown Procedure
=======================

If a full shutdown of the system is required, we advise using the
following order:

* Perform a graceful shutdown of all virtual machine instances
* Stop Ceph (if applicable)
* Put all nodes into maintenance mode in Bifrost
* Shut down compute nodes
* Shut down monitoring node
* Shut down network nodes (if separate from controllers)
* Shut down controllers
* Shut down Ceph nodes (if applicable)
* Shut down seed VM
* Shut down Ansible control host

Virtual Machines shutdown
-------------------------

Contact OpenStack users and ask them to stop their virtual machines
gracefully. If that is not possible, shut down the VMs using the OpenStack
CLI as the admin user:

.. code-block:: bash
   :substitutions:

   for i in `openstack server list --all-projects -c ID -f value` ; \
   do openstack server stop $i ; done

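To confirm that the instances have actually stopped, one simple check (a
minimal sketch, using the same admin credentials as the loop above) is to
list any servers still in the ``ACTIVE`` state; the output should be empty
before continuing:

.. code-block:: bash

   # List instances that are still running; expect no output here.
   openstack server list --all-projects --status ACTIVE
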
.. ifconfig:: deployment['ceph_managed']

Stop Ceph
---------

This procedure is based on the `Red Hat documentation <https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html/administration_guide/understanding-process-management-for-ceph#powering-down-and-rebooting-a-red-hat-ceph-storage-cluster_admin>`__.

- Stop the Ceph clients from using any Ceph resources (RBD, RADOS Gateway, CephFS)

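How the clients are quiesced depends on the deployment. As an illustrative
sketch only (the mount point and device name below are hypothetical), CephFS
mounts can be unmounted and RBD devices unmapped on each client host before
proceeding:

.. code-block:: bash

   # On each client host: unmount CephFS and unmap any RBD block devices.
   umount /mnt/cephfs
   rbd unmap /dev/rbd0
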
- Check that the cluster is in a healthy state

.. code-block:: bash

   ceph status

- Stop CephFS (if applicable)

Stop the CephFS cluster by reducing the number of ranks to 1, setting the
cluster_down flag, and then failing the last rank.

.. code-block:: bash

   ceph fs set FS_NAME max_mds 1
   ceph mds deactivate FS_NAME:1 # rank 2 of 2
   ceph status # wait for rank 1 to finish stopping
   ceph fs set FS_NAME cluster_down true
   ceph mds fail FS_NAME:0

Setting the cluster_down flag prevents standbys from taking over the failed
rank.

- Set the noout, norecover, norebalance, nobackfill, nodown and pause flags.

.. code-block:: bash

   ceph osd set noout
   ceph osd set norecover
   ceph osd set norebalance
   ceph osd set nobackfill
   ceph osd set nodown
   ceph osd set pause

- Shut down the OSD nodes one by one:

.. code-block:: bash

   systemctl stop ceph-osd.target

- Shut down the monitor/manager nodes one by one:

.. code-block:: bash

   systemctl stop ceph.target

Set Bifrost maintenance mode
----------------------------

Set maintenance mode in Bifrost to prevent the nodes from automatically
powering back on:

.. code-block:: bash

   for i in `openstack --os-cloud bifrost baremetal node list -c UUID -f value` ; \
   do openstack --os-cloud bifrost baremetal node maintenance set $i ; done

Shut down nodes
---------------

Shut down the nodes one at a time gracefully using:

.. code-block:: bash

   systemctl poweroff

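The command above is run on each node in turn. As a hedged illustration of
driving this remotely from the Ansible control host (the hostnames are
hypothetical and should be replaced with the nodes of the group currently
being shut down), the same command can be issued over SSH:

.. code-block:: bash

   # Power off a group of hosts one at a time, e.g. the compute nodes.
   for host in cpt01 cpt02 cpt03 ; do
       ssh stack@$host sudo systemctl poweroff
   done
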
Shut down the seed VM
---------------------

Shut down the seed VM on the Ansible control host gracefully using:

.. code-block:: bash
   :substitutions:

   ssh stack@|seed_name| sudo systemctl poweroff
   virsh shutdown |seed_name|


Full Power on Procedure
-----------------------

* Start the Ansible control host and seed VM
* Remove the nodes from maintenance mode in Bifrost
* Recover the MariaDB cluster
* Start Ceph (if applicable)
* Check that all Docker containers are running (see the example after this list)
* Check Kibana for any messages with log level ERROR or equivalent

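To check that all Docker containers are running on a host, one option (a
minimal sketch; the exact set of containers depends on which services are
deployed) is to list containers that are not in the ``Up`` state:

.. code-block:: bash

   # Containers that have exited or are stuck restarting; ideally this
   # prints nothing. Review `docker ps -a` for the full status listing.
   docker ps -a --filter status=exited --filter status=restarting
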
Start Ansible Control Host
--------------------------

The Ansible control host is not enrolled in Bifrost and will have to be powered
on manually.

Start Seed VM
-------------

The seed VM (and any other service VM) should start automatically when the seed
hypervisor is powered on. If it does not, it can be started with:

.. code-block:: bash

   virsh start seed-0

Unset Bifrost maintenance mode
------------------------------

Unsetting maintenance mode in Bifrost should automatically power on the nodes:

.. code-block:: bash
   :substitutions:

   for i in `openstack --os-cloud bifrost baremetal node list -c UUID -f value` ; \
   do openstack --os-cloud bifrost baremetal node maintenance unset $i ; done

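If any node remains powered off after maintenance mode has been unset, it can
be powered on explicitly through Bifrost. This is a sketch reusing the same
``--os-cloud bifrost`` configuration as above; substitute the UUID of the
affected node:

.. code-block:: bash

   # Power on a node that did not come back automatically.
   openstack --os-cloud bifrost baremetal node power on <node-uuid>
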
Recover MariaDB cluster
-----------------------

If all of the servers were shut down at the same time, it is necessary to run a
script to recover the database once they have all started up. This can be done
with the following command:

.. code-block:: bash

   kayobe# kayobe overcloud database recover

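Once the recovery has completed, the state of the Galera cluster can be
verified. The sketch below assumes the Kolla Ansible ``mariadb`` container
name and that database root credentials are available; both depend on the
deployment:

.. code-block:: bash

   # Run on a controller: wsrep_cluster_size should equal the number of
   # controllers and wsrep_cluster_status should be "Primary".
   docker exec -it mariadb mysql -u root -p \
       -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_%';"
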
.. ifconfig:: deployment['ceph_managed']

Start Ceph
----------

This procedure is based on the `Red Hat documentation <https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html/administration_guide/understanding-process-management-for-ceph#powering-down-and-rebooting-a-red-hat-ceph-storage-cluster_admin>`__.

- Start the monitor/manager nodes:

.. code-block:: bash

   systemctl start ceph.target

- Start the OSD nodes:

.. code-block:: bash

   systemctl start ceph-osd.target

- Wait for all the nodes to come up

- Unset the noout, norecover, norebalance, nobackfill, nodown and pause flags

.. code-block:: bash

   ceph osd unset noout
   ceph osd unset norecover
   ceph osd unset norebalance
   ceph osd unset nobackfill
   ceph osd unset nodown
   ceph osd unset pause

- Start CephFS (if applicable)

The CephFS cluster must be brought back up by setting the cluster_down flag to
false:

.. code-block:: bash

   ceph fs set FS_NAME cluster_down false

- Verify the Ceph cluster status

.. code-block:: bash

   ceph status

source/index.rst

Lines changed: 1 addition & 0 deletions
@@ -24,6 +24,7 @@ Contents
    ceph_storage
    managing_users_and_projects
    operations_and_monitoring
+   full_shutdown
    customising_deployment
    gpus_in_openstack

source/operations_and_monitoring.rst

Lines changed: 1 addition & 18 deletions
@@ -502,22 +502,10 @@ Shutting down the seed VM
     kayobe# ssh stack@|seed_name| sudo systemctl poweroff
     kayobe# virsh shutdown |seed_name|
 
-.. _full-shutdown:
-
 Full shutdown
 -------------
 
-In case a full shutdown of the system is required, we advise to use the
-following order:
-
-* Perform a graceful shutdown of all virtual machine instances
-* Shut down compute nodes
-* Shut down monitoring node
-* Shut down network nodes (if separate from controllers)
-* Shut down controllers
-* Shut down Ceph nodes (if applicable)
-* Shut down seed VM
-* Shut down Ansible control host
+Follow separate :doc:`document <full_shutdown>`.
 
 Rebooting a node
 ----------------
@@ -572,11 +560,6 @@ hypervisor is powered on. If it does not, it can be started with:
 
     kayobe# virsh start seed-0
 
-Full power on
--------------
-
-Follow the order in :ref:`full-shutdown`, but in reverse order.
-
 Shutting Down / Restarting Monitoring Services
 ----------------------------------------------
