Full shutdown document added #10
Nice work Bartosz, just a couple of questions / suggestions
Stop Ceph
---------
Procedure based on `Red Hat documentation <https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html/administration_guide/understanding-process-management-for-ceph#powering-down-and-rebooting-a-red-hat-ceph-storage-cluster_admin>`__
If there's something equivalent in the community docs it would be better, but the closest I found was https://docs.ceph.com/en/latest/rados/operations/operating/ and it doesn't cover setting all the flags below.
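As a rough illustration of the flag handling discussed here, the sketch below wraps the `ceph osd set` / `ceph osd unset` calls in two small functions. The flag list is an assumption taken from the linked Red Hat procedure, since the PR's own list is not visible in this thread, and the function names are hypothetical.

```shell
# Flags assumed from the linked Red Hat power-down procedure; adjust to match
# whatever the final document actually lists.
CEPH_SHUTDOWN_FLAGS="noout norecover norebalance nobackfill nodown pause"

# Set each flag in turn; stop at the first failure so a partial state is noticed.
set_ceph_shutdown_flags() {
    for flag in $CEPH_SHUTDOWN_FLAGS; do
        ceph osd set "$flag" || return 1
    done
}

# Clear the same flags again when the cluster is powered back on.
unset_ceph_shutdown_flags() {
    for flag in $CEPH_SHUTDOWN_FLAGS; do
        ceph osd unset "$flag" || return 1
    done
}
```

Keeping the list in one variable means the power-on section can reuse it, which reduces the chance of a flag being set at shutdown and never cleared.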
.. code-block:: bash

   systemctl poweroff
There might be a serialised form of the shutdown invocation using Kayobe's tools https://docs.openstack.org/kayobe/latest/administration/overcloud.html#running-commands - perhaps also with a small delay on the shutdown command so that it doesn't immediately cut off the Ansible connection.
Nice addition. Looks like there is some room for automation here, but that can be added iteratively.
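One possible shape for the delayed shutdown suggested above, as a minimal sketch: it assumes `kayobe overcloud host command run` accepts `--command`, `--limit` and `--become` as described in the linked Kayobe page (worth confirming with `--help`), and `delayed_poweroff` is a hypothetical helper name. `shutdown -h +N` schedules the power-off N minutes ahead, so the Ansible task can return before the host goes down.

```shell
# Hypothetical helper: schedule a delayed power-off on a group of overcloud hosts.
#   $1: Ansible limit (group or host pattern), for example "compute"
#   $2: delay in minutes before the hosts power off (default: 1)
delayed_poweroff() {
    limit="$1"
    delay="${2:-1}"
    kayobe overcloud host command run --become \
        --limit "$limit" \
        --command "shutdown -h +${delay}"
}

# Example (not run here): shut down host groups one after another.
#   delayed_poweroff compute 1
#   delayed_poweroff controllers 1
```

Calling the helper once per group keeps the ordering explicit, which fits the step-by-step structure of the procedure.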
.. code-block:: bash

   for i in `openstack server list --all-projects -c ID -f value` ; \
      do openstack server stop $i ; done
Is this asynchronous? Should we check for success?
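On the asynchronous question: `openstack server stop` returns once the API has accepted the request, so a follow-up check is probably worthwhile. The sketch below polls until no server reports `ACTIVE`. The function names, timeout and interval are illustrative assumptions, and it does not look at other states such as `PAUSED`, `SUSPENDED` or `ERROR`.

```shell
# Print the IDs of servers that are still ACTIVE, across all projects.
active_servers() {
    openstack server list --all-projects --status ACTIVE -c ID -f value
}

# Poll until no server is ACTIVE, or give up after a timeout.
#   $1: timeout in seconds (default: 600)
#   $2: polling interval in seconds (default: 10)
wait_for_servers_stopped() {
    timeout="${1:-600}"
    interval="${2:-10}"
    elapsed=0
    while [ -n "$(active_servers)" ]; do
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "Timed out waiting for servers to stop; still ACTIVE:" >&2
            active_servers >&2
            return 1
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
}

# Example (not run here), after the stop loop has been issued:
#   wait_for_servers_stopped 900 15 || echo "check the remaining instances by hand"
```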
- Stop the Ceph clients from using any Ceph resources (RBD, RADOS Gateway, CephFS)
- Check if cluster is in healthy state

.. code-block:: bash
Does it need to be indented more to be part of the bullet?
- Stop CephFS (if applicable)

Stop CephFS cluster by reducing the number of ranks to 1, setting the cluster_down flag, and then failing the last rank.
Again, indentation?
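For reference, the three CephFS steps described in the hunk above might map to commands along these lines. This is a sketch based on the Red Hat Ceph Storage 4 wording: `stop_cephfs` is a hypothetical name, the wait for the extra ranks to stop is left out, and newer Ceph releases are understood to replace the `cluster_down` flag with `ceph fs set <fs_name> down true`, so the exact commands should be checked against the deployed release.

```shell
# Hypothetical helper: take a CephFS file system offline before shutdown.
#   $1: file system name, as shown by `ceph fs ls`
stop_cephfs() {
    if [ -z "$1" ]; then
        echo "usage: stop_cephfs <fs_name>" >&2
        return 2
    fi
    fs_name="$1"
    # Reduce the number of active MDS ranks to 1.
    ceph fs set "$fs_name" max_mds 1 || return 1
    # Mark the file system as down so that no MDS rejoins it.
    ceph fs set "$fs_name" cluster_down true || return 1
    # Fail the last remaining rank (rank 0).
    ceph mds fail "${fs_name}:0"
}
```

In practice `ceph status` would be checked between the first and second step to confirm that only rank 0 is still active.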
----------------------------

Set maintenance mode in bifrost to prevent nodes from automatically
powering back on
Other option is to power off via bifrost
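Both options could be scripted against the Ironic API that Bifrost provides. The sketch below assumes the commands run somewhere the `openstack baremetal` CLI is already configured for Bifrost (for example inside the Bifrost container with the appropriate cloud credentials). The function names and the maintenance reason text are illustrative only.

```shell
# List the UUIDs of all nodes known to Ironic.
baremetal_nodes() {
    openstack baremetal node list -f value -c UUID
}

# Put every node into maintenance mode so Ironic does not power it back on.
set_maintenance_all() {
    for node in $(baremetal_nodes); do
        openstack baremetal node maintenance set --reason "full shutdown" "$node" || return 1
    done
}

# Power off every node through Ironic instead of logging in to each host.
power_off_all() {
    for node in $(baremetal_nodes); do
        openstack baremetal node power off "$node" || return 1
    done
}
```

Powering off through Ironic acts on the node's power state directly rather than through the operating system, so it is best suited to nodes whose services have already been stopped.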

Full Power on Procedure
-----------------------
This needs to be a different heading style. Alternatively (preferably?) this section could go in another page called cold_start.rst.
Or change the page to be: "Shutdown and power on procedures"
* Shut down controllers
* Shut down Ceph nodes (if applicable)
* Shut down seed VM
* Shut down Ansible control host
This one isn't covered
We probably shouldn't make any assumptions about what or where this is. It may not be the seed hypervisor, which should also be called out explicitly.
* Perform a graceful shutdown of all virtual machine instances
* Stop Ceph (if applicable)
* Put all nodes into maintenance mode in Bifrost
* Shut down compute nodes
nit: this lists shutting down different types of nodes separately, but the procedure only stops the services separately, then shuts down all nodes at once.
* Remove nodes from maintenance mode in bifrost
* Recover MariaDB cluster
* Start Ceph (if applicable)
* Check that all docker containers are running
nit: they haven't been started
.. code-block:: bash

   kayobe# kayobe overcloud database recover
Wondering if it would be cleaner to stop the containers before shutdown, to avoid them starting up in a broken state.
Looks like quite a few comments still to be addressed. It's quite hard to review larger changes when force-pushed. Could you add commits, then squash at the end?
following order:

* Perform a graceful shutdown of all virtual machine instances
* Stop Ceph (if applicable)
This might be too early to stop Ceph, in case the OpenStack services are still using Ceph state (e.g. image uploads). Perhaps stop Ceph at the point where the Ceph nodes are shut down.
sure, makes perfect sense - that was Gerrit habit ;)
This would be nice to complete and merge.