Commit 3ac0eb9

Merge branch 'stackhpc/2023.1' into conf/INFRA-629

2 parents 9cc9eb1 + bc83165

50 files changed: +1185 additions, -310 deletions

.github/workflows/stackhpc-container-image-build.yml

Lines changed: 2 additions & 2 deletions

@@ -149,7 +149,7 @@ jobs:
       # Normally installed during host configure.
       - name: Install Docker Python SDK
         run: |
-          sudo pip install docker
+          sudo pip install docker 'requests<2.32.0'

       - name: Get Kolla tag
         id: write-kolla-tag
@@ -253,7 +253,7 @@ jobs:
           if docker push $image; then
             echo "Pushed $image"
             break
-          elif $i == 5; then
+          elif [ $i -eq 5 ] ; then
            echo "Failed to push $image"
            echo $image >> image-build-logs/push-failed-images.txt
          else
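
The second hunk fixes a shell bug: ``elif $i == 5; then`` tries to execute the value of ``$i`` as a command, so the final-attempt branch never fired. A minimal, self-contained sketch of the corrected retry pattern (``attempt_push`` is a hypothetical stand-in for ``docker push $image`` that always fails, to exercise the failure branch):

```shell
# Stand-in for `docker push $image`; always fails so the example reaches
# the final-attempt branch.
attempt_push() {
    return 1
}

for i in 1 2 3 4 5; do
    if attempt_push; then
        echo "Pushed"
        break
    elif [ "$i" -eq 5 ]; then
        # POSIX numeric test; the old `elif $i == 5; then` would instead
        # try to run "1" (the value of $i) as a command.
        echo "Failed to push after 5 attempts"
    fi
done
```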

doc/source/configuration/cloudkitty.rst

Lines changed: 141 additions & 0 deletions

@@ -0,0 +1,141 @@
==========
CloudKitty
==========

Configuring in kayobe-config
============================

By default, CloudKitty uses Gnocchi and Ceilometer as the collector and fetcher
backends. Unless the system has a specific reason not to, we recommend instead
using Prometheus as the backend for both. The following instructions explain
how to do this. Also, see the `Kolla Ansible docs on CloudKitty
<https://docs.openstack.org/kolla-ansible/latest/reference/rating/cloudkitty-guide.html>`__
for more details.

Enable CloudKitty and disable InfluxDB, as we are using OpenSearch as the
storage backend. Set the following in ``kolla.yml``:

.. code-block:: yaml

   kolla_enable_cloudkitty: true
   # Explicitly disable InfluxDB, as we are using OpenSearch as the CloudKitty backend
   kolla_enable_influxdb: false

Set Prometheus as the backend for both the collector and fetcher, and
Elasticsearch as the storage backend. Note that our fork of CloudKitty is
patched so that the CloudKitty Elasticsearch V2 storage backend will also work
with an OpenSearch cluster. Proper support for a V2 OpenSearch storage
backend is still pending in Kolla Ansible `here
<https://review.opendev.org/c/openstack/kolla-ansible/+/898555>`__. Set the
following in ``kolla/globals.yml``:

.. code-block:: yaml

   cloudkitty_collector_backend: prometheus
   cloudkitty_fetcher_backend: prometheus
   cloudkitty_storage_backend: elasticsearch

If you have TLS enabled, you will also need to set the CA file for Prometheus
and Elasticsearch. Set the following in ``kolla/globals.yml``:

.. code-block::

   {% raw %}
   cloudkitty_prometheus_cafile: "{{ openstack_cacert }}"
   cloudkitty_elasticsearch_cafile: "{{ openstack_cacert }}"
   {% endraw %}

The default collection period is one hour, which is likely too long for most
systems, as CloudKitty charges for the **entire** collection period if any
usage is seen within that timeframe, regardless of actual usage: even one
minute of usage will be charged as a full hour. It is therefore recommended
to reduce the collection interval, ``period`` (in units of seconds),
appropriately, e.g. to ten minutes. Furthermore, when using Prometheus as the
collector, you need to change the ``scope_key`` to match the metrics provided
by the Prometheus OpenStack Exporter. Both can be achieved by setting the
following in ``kolla/config/cloudkitty.conf``:

.. code-block:: ini

   [collect]
   scope_key = tenant_id
   period = 600

You will need to configure which metrics CloudKitty should track. The
following example, set in ``kolla/config/cloudkitty/metrics.yml``, will track
VM flavors and the total utilised volume:

.. code-block:: yaml

   metrics:
     openstack_nova_server_status:
       alt_name: instance
       groupby:
         - uuid
         - user_id
         - tenant_id
       metadata:
         - flavor_id
         - name
       mutate: MAP
       mutate_map:
         0.0: 1.0  # ACTIVE
         11.0: 1.0 # SHUTOFF
         12.0: 1.0 # SUSPENDED
         16.0: 1.0 # PAUSED
       unit: instance
     openstack_cinder_limits_volume_used_gb:
       alt_name: storage
       unit: GiB
       groupby:
         - tenant_id

If your system had Monasca deployed in the past, you likely have some
relabelled attributes in the Prometheus OpenStack exporter configuration. To
account for this, you should either remove the custom relabelling (in
``kolla/config/prometheus.yml``) or change your ``metrics.yml`` to use the
correct attributes.
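
For illustration, a leftover Monasca-era rule might look like the following (a hypothetical sketch: the label names are ours, not taken from any particular deployment):

```yaml
# Hypothetical leftover relabelling in kolla/config/prometheus.yml: this
# renames the exporter's tenant_id label, so metrics.yml would have to
# group by project_id instead of tenant_id.
metric_relabel_configs:
  - source_labels: [tenant_id]
    target_label: project_id
```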

Post-configuration with openstack-config
========================================

This is an example `openstack-config
<https://github.com/stackhpc/openstack-config>`__ setup to create mappings for
the metrics configured above. Note that the costs are scaled for the
ten-minute collection period, e.g. a flavor with 1 VCPU will cost 1 unit per
hour.

.. code-block:: yaml

   # Map flavors based on VCPUs
   openstack_ratings_hashmap_field_mappings:
     - service: instance
       name: flavor_id
       mappings:
         - value: '1' # tiny compute flavor (1 vcpu) with an OpenStack flavor ID of 1
           cost: 0.1666666666666666
           type: flat
         - value: '2' # small compute flavor (2 vcpus) with an OpenStack flavor ID of 2
           cost: 0.3333333333333333
           type: flat
         - value: '3' # medium compute flavor (3 vcpus) with an OpenStack flavor ID of 3
           cost: 0.5
           type: flat
         - value: '4' # large compute flavor (4 vcpus) with an OpenStack flavor ID of 4
           cost: 0.6666666666666666
           type: flat
         - value: '5' # xlarge compute flavor (8 vcpus) with an OpenStack flavor ID of 5
           cost: 1.3333333333333333
           type: flat
         - value: '6' # tiny 2 compute flavor (2 vcpus) with an OpenStack flavor ID of 6
           cost: 0.3333333333333333
           type: flat

   # Map volumes based on GB
   openstack_ratings_hashmap_service_mappings:
     - service: storage
       cost: 0.16666666666666666
       type: flat
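
As a sanity check on the cost scaling above, the arithmetic can be sketched in a few lines of Python (the helper name is ours and purely illustrative; CloudKitty performs this aggregation internally):

```python
# Illustrative arithmetic only: convert a flat per-period hashmap cost to an
# effective hourly rate for a given collection period.
def hourly_cost(cost_per_period: float, period_seconds: int = 600) -> float:
    periods_per_hour = 3600 / period_seconds
    return cost_per_period * periods_per_hour

# With period = 600, there are 6 collection periods per hour, so a 1-vCPU
# flavor rated 0.1666... per period costs ~1 unit/hour, and the 8-vCPU
# flavor rated 1.3333... costs ~8 units/hour.
print(round(hourly_cost(0.1666666666666666), 6))
print(round(hourly_cost(1.3333333333333333), 6))
```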

See the `OpenStack CloudKitty Ratings role
<https://github.com/stackhpc/ansible-collection-openstack/tree/main/roles/os_ratings>`__
for more details.

doc/source/configuration/index.rst

Lines changed: 1 addition & 0 deletions

@@ -20,3 +20,4 @@ the various features provided.
    magnum-capi
    ci-cd
    security-hardening
+   cloudkitty

doc/source/configuration/monitoring.rst

Lines changed: 81 additions & 3 deletions

@@ -126,6 +126,8 @@ depending on your configuration, you may need set the
 ``kolla_enable_prometheus_ceph_mgr_exporter`` variable to ``true`` in order to
 enable the ceph mgr exporter.

+.. _os-capacity:
+
 OpenStack Capacity
 ==================

@@ -149,9 +151,19 @@ project domain name in ``stackhpc-monitoring.yml``:

    stackhpc_os_capacity_openstack_region_name: <openstack_region_name>

 Additionally, you should ensure these credentials have the correct permissions
-for the exporter. If you are deploying in a cloud with internal TLS, you may be required
-to disable certificate verification for the OpenStack Capacity exporter
-if your certificate is not signed by a trusted CA.
+for the exporter.
+
+If you are deploying in a cloud with internal TLS, you may be required
+to provide a CA certificate for the OpenStack Capacity exporter if your
+certificate is not signed by a trusted CA. For example, to use a CA certificate
+named ``vault.crt`` that is also added to the Kolla containers:
+
+.. code-block:: yaml
+
+   stackhpc_os_capacity_openstack_cacert: "{{ kayobe_env_config_path }}/kolla/certificates/ca/vault.crt"
+
+Alternatively, to disable certificate verification for the OpenStack Capacity
+exporter:

 .. code-block:: yaml
@@ -169,3 +181,69 @@ If you notice ``HaproxyServerDown`` or ``HaproxyBackendDown`` prometheus
 alerts after deployment, it's likely the os_exporter secrets have not been
 set correctly; double-check you have entered the correct authentication
 information appropriate to your cloud and re-deploy.
+
+Friendly Network Names
+======================
+
+For operators that prefer to see descriptive or friendly interface names, the
+following play can be run. It takes the network names defined in kayobe and
+relabels the devices/interfaces in Prometheus to make use of these names.
+
+**Check the considerations and known limitations below to see if this is
+suitable in a given environment before applying.**
+
+This reuses existing fields to provide good compatibility with existing
+dashboards and alerts.
+
+To enable the change:
+
+.. code-block:: console
+
+   kayobe playbook run etc/kayobe/ansible/prometheus-network-names.yml
+   kayobe overcloud service reconfigure --kt prometheus
+
+This first generates a template based on the ``prometheus.yml.j2`` in
+``etc/kayobe/ansible/``, which is further templated for use with
+kolla-ansible. The result is then rolled out via the service reconfigure.
+
+This helps Prometheus provide insights that can be more easily understood by
+those without an intimate understanding of a given site. Prometheus Node
+Exporter and cAdvisor both provide network statistics using the
+interface/device names. This play causes Prometheus to relabel these fields
+to human-readable names based on the networks defined in kayobe, e.g.
+``bond1.1838`` may become ``storage_network``.
+
+The default labels are preserved with the prefix ``original_``:
+
+* For node_exporter, ``device`` is then used for network names, while
+  ``original_device`` is used for the interface itself.
+* For cAdvisor, ``interface`` is used for network names, and
+  ``original_interface`` is used to preserve the interface name.
+
+:Known-Limitations/Considerations/Requirements:
+
+  Before enabling this feature, the implications must be discussed with the
+  customer. The following are key considerations for that conversation:
+
+  * Only network names defined within kayobe are in scope.
+  * Tenant network interfaces, including SR-IOV, are not considered or
+    modified.
+  * Only the interface directly attributed to a network will be relabelled.
+    This may be a bond, a VLAN-tagged sub-interface, or both. The parent
+    bond, or bond members, are not relabelled unless they are captured
+    within a distinct defined network.
+  * Modified entries will be within existing labels. This may be breaking
+    for anything that expects the original structure, including custom
+    dashboards, alerting, billing, etc.
+  * After applying, there will be inconsistency in the time-series database
+    for the duration of the retention period, i.e. until previously ingested
+    entries expire. Metrics gathered prior to applying these modifications
+    will be unaltered, with all new metrics using the new structure.
+  * The interface names and their purpose must be consistent and unique
+    within the environment, i.e. if eth0 is defined as admin_interface on
+    one node, no other node can include a different network definition using
+    eth0. This does not apply when both devices are bond members, e.g. bond0
+    on a controller has eth0 and eth1 as members, and bond1 on a compute
+    node uses eth0 and eth1 as members. This is not problematic, as only the
+    bond itself is relabelled.
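
The effect of the relabelling can be illustrated with a before/after query sketch (``storage_network`` and ``bond1.1838`` are the example names from above; the metric is standard node_exporter):

```promql
# Before relabelling: per-site interface names must be known in advance
rate(node_network_receive_bytes_total{device="bond1.1838"}[5m])

# After relabelling: the kayobe network name is used, and the raw
# interface name remains available as original_device
rate(node_network_receive_bytes_total{device="storage_network"}[5m])
```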

doc/source/configuration/release-train.rst

Lines changed: 27 additions & 0 deletions
@@ -147,6 +147,33 @@ By default, HashiCorp images (Consul and Vault) are not synced from Docker Hub
 to the local Pulp. To sync these images, set ``stackhpc_sync_hashicorp_images``
 to ``true``.

+Custom container images
+-----------------------
+
+A custom list of container images can be synced to the local Pulp using the
+``stackhpc_pulp_repository_container_repos_extra`` and
+``stackhpc_pulp_distribution_container_extra`` variables:
+
+.. code-block:: yaml
+
+   # List of extra container image repositories.
+   stackhpc_pulp_repository_container_repos_extra:
+     - name: "certbot/certbot"
+       url: "https://registry-1.docker.io"
+       policy: on_demand
+       proxy_url: "{{ pulp_proxy_url }}"
+       state: present
+       include_tags: "nightly"
+       required: True
+
+   # List of extra container image distributions.
+   stackhpc_pulp_distribution_container_extra:
+     - name: certbot
+       repository: certbot/certbot
+       base_path: certbot/certbot
+       state: present
+       required: True
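
Once synced, the distribution's ``base_path`` determines the pull path on the local Pulp registry. Assuming a registry reachable at ``pulp.example.com`` (a placeholder host, not from the source), the image above could then be pulled as:

```console
docker pull pulp.example.com/certbot/certbot:nightly
```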

 Usage
 =====