
Commit d07c6bb

Full shutdown document added
1 parent 86a01a4 commit d07c6bb

3 files changed (+207, -18 lines)

source/full_shutdown.rst

Lines changed: 205 additions & 0 deletions
@@ -0,0 +1,205 @@
.. include:: vars.rst

=======================
Full Shutdown Procedure
=======================

If a full shutdown of the system is required, we advise the following order:

* Perform a graceful shutdown of all virtual machine instances
* Stop Ceph (if applicable)
* Put all nodes into maintenance mode in Bifrost
* Shut down compute nodes
* Shut down monitoring node
* Shut down network nodes (if separate from controllers)
* Shut down controllers
* Shut down Ceph nodes (if applicable)
* Shut down seed VM
* Shut down Ansible control host

Virtual machine shutdown
------------------------

Contact OpenStack users and ask them to stop their virtual machines gracefully.
If that is not possible, shut down the VMs using the OpenStack CLI as the admin
user:

.. code-block:: bash

   for i in `openstack server list --all-projects -c ID -f value` ; \
   do openstack server stop $i ; done

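To confirm that everything has stopped, you can list any instances that are
still active (this assumes the same admin credentials are in use):

.. code-block:: bash

   openstack server list --all-projects --status ACTIVE
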
.. ifconfig:: deployment['ceph_managed']

Stop Ceph
---------

This procedure is based on the `Red Hat documentation <https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html/administration_guide/understanding-process-management-for-ceph#powering-down-and-rebooting-a-red-hat-ceph-storage-cluster_admin>`__.

- Stop the Ceph clients from using any Ceph resources (RBD, RADOS Gateway,
  CephFS); see the example sketch below.

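What this involves depends on the deployment; as a minimal sketch, on each
client host you would unmount any CephFS mounts and unmap any RBD devices (the
mount point and device name below are hypothetical):

.. code-block:: bash

   # Hypothetical client-side cleanup - adjust to your environment
   sudo umount /mnt/cephfs    # unmount a CephFS mount, if present
   sudo rbd unmap /dev/rbd0   # unmap a mapped RBD device, if present
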
- Check that the cluster is in a healthy state:

.. code-block:: bash

   ceph status

- Stop CephFS (if applicable)

Stop the CephFS cluster by reducing the number of ranks to 1, setting the
``cluster_down`` flag, and then failing the last rank.

.. code-block:: bash

   ceph fs set FS_NAME max_mds 1
   ceph mds deactivate FS_NAME:1 # rank 2 of 2
   ceph status # wait for rank 1 to finish stopping
   ceph fs set FS_NAME cluster_down true
   ceph mds fail FS_NAME:0

Setting the ``cluster_down`` flag prevents standbys from taking over the failed
rank.

- Set the noout, norecover, norebalance, nobackfill, nodown and pause flags:

.. code-block:: bash

   ceph osd set noout
   ceph osd set norecover
   ceph osd set norebalance
   ceph osd set nobackfill
   ceph osd set nodown
   ceph osd set pause

- Shut down the OSD nodes one by one:

.. code-block:: bash

   systemctl stop ceph-osd.target

- Shut down the monitor/manager nodes one by one:

.. code-block:: bash

   systemctl stop ceph.target

Set Bifrost maintenance mode
----------------------------

Set maintenance mode in Bifrost to prevent the nodes from automatically
powering back on:

.. code-block:: bash

   for i in `openstack --os-cloud bifrost baremetal node list -c UUID -f value` ; \
   do openstack --os-cloud bifrost baremetal node maintenance set $i ; done

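To confirm the result, list the nodes and check the ``Maintenance`` column:

.. code-block:: bash

   openstack --os-cloud bifrost baremetal node list -c Name -c "Power State" -c Maintenance
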
Shut down nodes
---------------

Shut down the nodes one at a time gracefully using:

.. code-block:: bash

   systemctl poweroff

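If preferred, this can be scripted from the Ansible control host. The sketch
below assumes SSH access as the ``stack`` user and uses hypothetical host
names, so adapt it to your inventory and to the shutdown order above:

.. code-block:: bash

   # Hypothetical host names - shut down compute nodes before controllers
   for host in compute0 compute1 monitoring0 controller0; do
       ssh stack@$host sudo systemctl poweroff
   done
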
Shut down the seed VM
---------------------

Shut down the seed VM on the Ansible control host gracefully using:

.. code-block:: bash
   :substitutions:

   ssh stack@|seed_name| sudo systemctl poweroff
   virsh shutdown |seed_name|

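To confirm that the VM has stopped before powering off its host, check the
domain state:

.. code-block:: bash

   virsh list --all
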
Full Power on Procedure
-----------------------

* Start the Ansible control host and seed VM
* Remove nodes from maintenance mode in Bifrost
* Recover the MariaDB cluster
* Start Ceph (if applicable)
* Check that all Docker containers are running (an example check is given at
  the end of this document)
* Check Kibana for any messages with log level ERROR or equivalent

Start Ansible Control Host
--------------------------

The Ansible control host is not enrolled in Bifrost and will have to be powered
on manually.

Start Seed VM
-------------

The seed VM (and any other service VM) should start automatically when the seed
hypervisor is powered on. If it does not, it can be started with:

.. code-block:: bash

   virsh start seed-0

Unset Bifrost maintenance mode
------------------------------

Unsetting maintenance mode in Bifrost should automatically power the nodes back
on:

.. code-block:: bash

   for i in `openstack --os-cloud bifrost baremetal node list -c UUID -f value` ; \
   do openstack --os-cloud bifrost baremetal node maintenance unset $i ; done

Recover MariaDB cluster
-----------------------

If all of the servers were shut down at the same time, it is necessary to run a
script to recover the database once they have all started up. This can be done
with the following command:

.. code-block:: bash

   kayobe# kayobe overcloud database recover

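If you wish to verify the recovery, one option (assuming the standard Kolla
``mariadb`` container name and that the database root password is to hand) is
to check the Galera cluster size on a controller:

.. code-block:: bash

   # Hypothetical check - container name and credentials may differ
   docker exec mariadb mysql -u root -p'DATABASE_PASSWORD' \
       -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'"
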
.. ifconfig:: deployment['ceph_managed']

Start Ceph
----------

This procedure is based on the `Red Hat documentation <https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html/administration_guide/understanding-process-management-for-ceph#powering-down-and-rebooting-a-red-hat-ceph-storage-cluster_admin>`__.

- Start the monitor/manager nodes:

.. code-block:: bash

   systemctl start ceph.target

- Start the OSD nodes:

.. code-block:: bash

   systemctl start ceph-osd.target

- Wait for all the nodes to come up; the checks below can help.

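For example, ``ceph status`` should show all monitors in quorum, and
``ceph osd tree`` should report the OSDs as ``up``:

.. code-block:: bash

   ceph status
   ceph osd tree
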
- Unset the noout, norecover, norebalance, nobackfill, nodown and pause flags:

.. code-block:: bash

   ceph osd unset noout
   ceph osd unset norecover
   ceph osd unset norebalance
   ceph osd unset nobackfill
   ceph osd unset nodown
   ceph osd unset pause

- Start CephFS (if applicable)

The CephFS cluster must be brought back up by setting the ``cluster_down`` flag
to ``false``:

.. code-block:: bash

   ceph fs set FS_NAME cluster_down false

- Verify the Ceph cluster status:

.. code-block:: bash

   ceph status
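
Finally, check that all Docker containers are running on each host, as noted in
the power-on order above. A minimal check, run per host (or via an ad hoc
Ansible command), is to look for containers that are not up:

.. code-block:: bash

   # Containers that have exited or are stuck restarting warrant a closer look
   docker ps --all --filter status=exited --filter status=restarting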

source/index.rst

Lines changed: 1 addition & 0 deletions
@@ -24,6 +24,7 @@ Contents
    ceph_storage
    managing_users_and_projects
    operations_and_monitoring
+   full_shutdown
    customising_deployment
    gpus_in_openstack

source/operations_and_monitoring.rst

Lines changed: 1 addition & 18 deletions
@@ -502,22 +502,10 @@ Shutting down the seed VM
 kayobe# ssh stack@|seed_name| sudo systemctl poweroff
 kayobe# virsh shutdown |seed_name|

-.. _full-shutdown:
-
 Full shutdown
 -------------

-In case a full shutdown of the system is required, we advise to use the
-following order:
-
-* Perform a graceful shutdown of all virtual machine instances
-* Shut down compute nodes
-* Shut down monitoring node
-* Shut down network nodes (if separate from controllers)
-* Shut down controllers
-* Shut down Ceph nodes (if applicable)
-* Shut down seed VM
-* Shut down Ansible control host
+Follow the separate :doc:`document <full_shutdown>`.

 Rebooting a node
 ----------------

@@ -572,11 +560,6 @@ hypervisor is powered on. If it does not, it can be started with:

 kayobe# virsh start seed-0

-Full power on
--------------
-
-Follow the order in :ref:`full-shutdown`, but in reverse order.
-
 Shutting Down / Restarting Monitoring Services
 ----------------------------------------------