Use hostPID for ceph-mgr deployment

This change addresses a memory leak in the ceph-mgr deployment.
The leak has also been noted in:

https://review.opendev.org/#/c/711085

Without this change, memory usage for the active ceph-mgr pod
increases steadily by roughly 100MiB per hour until all available
memory is exhausted. Reset messages also appear in the active and
standby ceph-mgr pod logs.

Sample messages:

---

0 client.0 ms_handle_reset on v2:10.0.0.226:6808/1
0 client.0 ms_handle_reset on v2:10.0.0.226:6808/1
0 client.0 ms_handle_reset on v2:10.0.0.226:6808/1

---

The resets and the associated memory leak appear to be caused by
multiple ceph pods sharing the same IP address (because hostNetwork
is true) and the same PID (because hostPID is false, so each pod has
its own PID namespace in which PIDs start at 1).
In the messages above, the "1" at the end of the line is the PID.
Ceph appears to use the Version:IP:Port/PID tuple
(v2:10.0.0.226:6808/1) as a unique identifier. When hostPID is false,
these identifiers can conflict.
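
The suspected conflict can be sketched with a toy model. This is only
an illustration, not Ceph's actual code: CephAddr, ADDR_RE and
parse_addr are hypothetical names, the two-process scenario is an
assumption, and the host PIDs 48211 and 48377 are made up.

```python
import re
from typing import NamedTuple


class CephAddr(NamedTuple):
    """Toy model of the Version:IP:Port/PID tuple seen in the logs."""
    version: str
    ip: str
    port: int
    pid: int


# Matches the address form from the sample messages, e.g. v2:10.0.0.226:6808/1
ADDR_RE = re.compile(r"(v\d+):(\d+\.\d+\.\d+\.\d+):(\d+)/(\d+)")


def parse_addr(line: str) -> CephAddr:
    """Extract the address tuple from a log line."""
    match = ADDR_RE.search(line)
    if match is None:
        raise ValueError(f"no address found in: {line!r}")
    version, ip, port, pid = match.groups()
    return CephAddr(version, ip, int(port), int(pid))


logged = parse_addr("0 client.0 ms_handle_reset on v2:10.0.0.226:6808/1")
print(logged.pid)  # 1

# Assumed scenario: two distinct ceph processes on the same host network
# that present the same IP and port. With hostPID false, each reports PID 1
# from its own PID namespace, so the tuples are indistinguishable.
first = CephAddr("v2", "10.0.0.226", 6808, 1)
second = CephAddr("v2", "10.0.0.226", 6808, 1)
print(first == second)  # True

# With hostPID true, the processes report host PIDs (values made up here),
# so the same IP and port no longer produce the same tuple.
first_host = CephAddr("v2", "10.0.0.226", 6808, 48211)
second_host = CephAddr("v2", "10.0.0.226", 6808, 48377)
print(first_host == second_host)  # False
```

The same parser can be run over ceph-mgr pod logs to check whether the
trailing PID is still 1 after a deployment.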

Setting hostPID to true stops the reset messages and memory leak.

Change-Id: I9821637e75e8f89b59cf39842a6eb7e66518fa2c
Frank Ritchie 2020-07-31 13:23:03 -04:00 committed by Chris Wedgwood
parent 3ce0170da8
commit 5909bcbdef

@@ -51,6 +51,7 @@ spec:
       nodeSelector:
         {{ .Values.labels.mgr.node_selector_key }}: {{ .Values.labels.mgr.node_selector_value }}
       hostNetwork: true
+      hostPID: true
       dnsPolicy: {{ .Values.pod.dns_policy }}
       initContainers:
 {{ tuple $envAll "mgr" list | include "helm-toolkit.snippets.kubernetes_entrypoint_init_container" | indent 8 }}