diff --git a/doc/source/index.rst b/doc/source/index.rst
index 936eb8913..489813aad 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -8,7 +8,9 @@ Contents:
install/index
testing/index
-
+ monitoring/index
+ logging/index
+ readme
Indices and Tables
==================
diff --git a/doc/source/logging/elasticsearch.rst b/doc/source/logging/elasticsearch.rst
new file mode 100644
index 000000000..af0e7a515
--- /dev/null
+++ b/doc/source/logging/elasticsearch.rst
@@ -0,0 +1,196 @@
+Elasticsearch
+=============
+
+The Elasticsearch chart in openstack-helm-infra provides a distributed data
+store to index and analyze logs generated from the OpenStack-Helm services.
+The chart contains templates for:
+
+- Elasticsearch client nodes
+- Elasticsearch data nodes
+- Elasticsearch master nodes
+- An Elasticsearch exporter for providing cluster metrics to Prometheus
+- A cronjob for Elastic Curator to manage data indices
+
+Authentication
+--------------
+
+The Elasticsearch deployment includes a sidecar container that runs an Apache
+reverse proxy to add authentication capabilities for Elasticsearch. The
+username and password are configured under the Elasticsearch entry in the
+endpoints section of the chart's values.yaml.
+
+The configuration for Apache can be found under the conf.httpd key, and uses a
+helm-toolkit function that allows for including gotpl entries in the template
+directly. This allows the use of other templates, like the endpoint lookup
+function templates, directly in the configuration for Apache.
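+
+For example, the default credentials could be overridden at install time along
+the following lines. This is a sketch only: the release name, namespace, and
+the exact key paths under the endpoints tree are assumptions that should be
+verified against the chart's values.yaml.
+
+.. code-block:: bash
+
+  # Assumed key paths: endpoints.elasticsearch.auth.admin.{username,password}
+  helm install --namespace=openstack local/elasticsearch --name=elasticsearch \
+    --set endpoints.elasticsearch.auth.admin.username=admin \
+    --set endpoints.elasticsearch.auth.admin.password=supersecret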
+
+Elasticsearch Service Configuration
+-----------------------------------
+
+The Elasticsearch service configuration file can be modified with a combination
+of pod environment variables and entries in the values.yaml file. Elasticsearch
+does not require much configuration out of the box, and the default values for
+these configuration settings are meant to provide a highly available cluster.
+
+The vital entries in this configuration file are:
+
+- path.data: The path at which to store the indexed data
+- path.repo: The location of any snapshot repositories used to back up indexes
+- bootstrap.memory_lock: Ensures none of the JVM is swapped to disk
+- discovery.zen.minimum_master_nodes: Minimum required masters for the cluster
+
+The bootstrap.memory_lock entry ensures none of the JVM will be swapped to disk
+during execution, and setting this value to false will negatively affect the
+health of your Elasticsearch nodes. The discovery.zen.minimum_master_nodes
+entry sets the minimum number of master-eligible nodes required for your
+Elasticsearch cluster to register as healthy and functional. To avoid
+split-brain scenarios, this should be set to (master-eligible nodes / 2,
+rounded down) + 1; for example, 2 for a cluster with three master nodes.
+
+To read more about Elasticsearch's configuration file, please see the official
+documentation_.
+
+.. _documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/important-settings.html
+
+Elastic Curator
+---------------
+
+The Elasticsearch chart contains a cronjob to run Elastic Curator at specified
+intervals to manage the lifecycle of your indices. Curator can:
+
+- Take and send snapshots of your indexes to a specified snapshot repository
+- Delete indexes older than a specified length of time
+- Restore indexes from previous index snapshots
+- Reindex an index into a new or preexisting index
+
+The full list of supported Curator actions can be found in the actions_ section of
+the official Curator documentation. The list of options available for those
+actions can be found in the options_ section of the Curator documentation.
+
+.. _actions: https://www.elastic.co/guide/en/elasticsearch/client/curator/current/actions.html
+.. _options: https://www.elastic.co/guide/en/elasticsearch/client/curator/current/options.html
+
+Curator's configuration is handled via entries in Elasticsearch's values.yaml
+file and must be overridden to achieve your index lifecycle management
+needs. Please note that any unused field should be left blank, as an entry of
+"None" will result in an exception, because Curator will read it as a Python
+NoneType instead of a blank value.
+
+The section for Curator's service configuration can be found at:
+
+::
+
+ conf:
+ curator:
+ config:
+ client:
+ hosts:
+ - elasticsearch-logging
+ port: 9200
+ url_prefix:
+ use_ssl: False
+ certificate:
+ client_cert:
+ client_key:
+ ssl_no_validate: False
+ http_auth:
+ timeout: 30
+ master_only: False
+ logging:
+ loglevel: INFO
+ logfile:
+ logformat: default
+ blacklist: ['elasticsearch', 'urllib3']
+
+Curator's actions are configured in the following section:
+
+::
+
+ conf:
+ curator:
+ action_file:
+ actions:
+ 1:
+ action: delete_indices
+ description: "Clean up ES by deleting old indices"
+ options:
+ timeout_override:
+ continue_if_exception: False
+ ignore_empty_list: True
+ disable_action: True
+ filters:
+ - filtertype: age
+ source: name
+ direction: older
+ timestring: '%Y.%m.%d'
+ unit: days
+ unit_count: 30
+ field:
+ stats_result:
+ epoch:
+ exclude: False
+
+The Elasticsearch chart contains commented example actions for deleting and
+snapshotting indexes older than 30 days. Please note these actions are provided
+as a reference and are disabled by default to avoid any unexpected behavior
+against your indexes.
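+
+As a sketch, the commented delete action could be enabled with an overrides
+file like the one below. Map entries merge with the chart's defaults, so only
+the changed field is supplied; the action index and file path here are
+assumptions.
+
+.. code-block:: bash
+
+  cat > /tmp/curator-overrides.yaml <<'EOF'
+  conf:
+    curator:
+      action_file:
+        actions:
+          1:
+            options:
+              disable_action: False
+  EOF
+  helm install --namespace=openstack local/elasticsearch --name=elasticsearch \
+    --values=/tmp/curator-overrides.yaml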
+
+Elasticsearch Exporter
+----------------------
+
+The Elasticsearch chart contains templates for an exporter to provide metrics
+for Prometheus. These metrics provide insight into the performance and overall
+health of your Elasticsearch cluster. Please note monitoring for Elasticsearch
+is disabled by default, and must be enabled with the following override:
+
+
+::
+
+ monitoring:
+ prometheus:
+ enabled: true
+
+
+The Elasticsearch exporter uses the same service annotations as the other
+exporters, and no additional configuration is required for Prometheus to target
+the Elasticsearch exporter for scraping. The Elasticsearch exporter is
+configured with command line flags, and the flags' default values can be found
+under the following key in the values.yaml file:
+
+::
+
+ conf:
+ prometheus_elasticsearch_exporter:
+ es:
+ all: true
+ timeout: 20s
+
+These configuration keys control the following behaviors:
+
+- es.all: Gather information from all nodes, not just the connecting node
+- es.timeout: Timeout for metrics queries
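+
+For example, monitoring could be enabled and the query timeout raised with
+overrides along these lines (the release name and namespace are assumptions):
+
+.. code-block:: bash
+
+  helm install --namespace=openstack local/elasticsearch --name=elasticsearch \
+    --set monitoring.prometheus.enabled=true \
+    --set conf.prometheus_elasticsearch_exporter.es.timeout=30s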
+
+More information about the Elasticsearch exporter can be found on the exporter's
+GitHub_ page.
+
+.. _GitHub: https://github.com/justwatchcom/elasticsearch_exporter
+
+
+Snapshot Repositories
+---------------------
+
+Before Curator can store snapshots in a specified repository, Elasticsearch must
+register the configured repository. To achieve this, the Elasticsearch chart
+contains a job for registering an S3 snapshot repository backed by the Ceph
+RADOS Gateway (RGW). This job is disabled by default, as the Curator actions
+for snapshots are disabled by default. To enable the snapshot job, the
+conf.elasticsearch.snapshots.enabled flag must be set to true. The following
+configuration keys are relevant:
+
+- conf.elasticsearch.snapshots.enabled: Enable snapshot repositories
+- conf.elasticsearch.snapshots.bucket: Name of the RGW s3 bucket to use
+- conf.elasticsearch.snapshots.repositories: Name of repositories to create
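+
+As a sketch, snapshots might be enabled with an overrides file like the
+following; the bucket and repository names are illustrative, and the shape of
+the repositories entry should be verified against the chart's values.yaml:
+
+.. code-block:: bash
+
+  cat > /tmp/snapshot-overrides.yaml <<'EOF'
+  conf:
+    elasticsearch:
+      snapshots:
+        enabled: true
+        bucket: elasticsearch-snapshots
+        repositories:
+          - logstash-snapshots
+  EOF
+  helm install --namespace=openstack local/elasticsearch --name=elasticsearch \
+    --values=/tmp/snapshot-overrides.yaml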
+
+More information about Elasticsearch repositories can be found in the official
+Elasticsearch snapshot_ documentation:
+
+.. _snapshot: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-snapshots.html#_repositories
diff --git a/doc/source/logging/fluent-logging.rst b/doc/source/logging/fluent-logging.rst
new file mode 100644
index 000000000..b3ea41899
--- /dev/null
+++ b/doc/source/logging/fluent-logging.rst
@@ -0,0 +1,279 @@
+Fluent-logging
+===============
+
+The fluent-logging chart in openstack-helm-infra provides the base for a
+centralized logging platform for OpenStack-Helm. The chart combines two
+services, Fluentbit and Fluentd, to gather logs generated by the services,
+filter logged events or enrich them with metadata, and then forward them to
+Elasticsearch for indexing.
+
+Fluentbit
+---------
+
+Fluentbit runs as a log-collecting component on each host in the cluster, and
+can be configured to target specific log locations on the host. The Fluentbit_
+configuration schema can be found on the official Fluentbit website.
+
+.. _Fluentbit: http://fluentbit.io/documentation/0.12/configuration/schema.html
+
+Fluentbit provides a set of plugins for ingesting and filtering various log
+types. These plugins include:
+
+- Tail: Tails a defined file for logged events
+- Kube: Adds Kubernetes metadata to a logged event
+- Systemd: Provides ability to collect logs from the journald daemon
+- Syslog: Provides the ability to collect logs from a Unix socket (TCP or UDP)
+
+The complete list of plugins can be found in the configuration_ section of the
+Fluentbit documentation.
+
+.. _configuration: http://fluentbit.io/documentation/current/configuration/
+
+Fluentbit uses parsers to turn unstructured log entries into structured entries
+to make processing and filtering events easier. The two formats supported are
+JSON maps and regular expressions. More information about Fluentbit's parsing
+abilities can be found in the parsers_ section of Fluentbit's documentation.
+
+.. _parsers: http://fluentbit.io/documentation/current/parser/
+
+Fluentbit's service and parser configurations are defined via the values.yaml
+file, which allows for custom definitions of inputs, filters and outputs for
+your logging needs.
+Fluentbit's configuration can be found under the following key:
+
+::
+
+ conf:
+ fluentbit:
+ - service:
+ header: service
+ Flush: 1
+ Daemon: Off
+ Log_Level: info
+ Parsers_File: parsers.conf
+ - containers_tail:
+ header: input
+ Name: tail
+ Tag: kube.*
+ Path: /var/log/containers/*.log
+ Parser: docker
+ DB: /var/log/flb_kube.db
+ Mem_Buf_Limit: 5MB
+ - kube_filter:
+ header: filter
+ Name: kubernetes
+ Match: kube.*
+ Merge_JSON_Log: On
+ - fluentd_output:
+ header: output
+ Name: forward
+ Match: "*"
+ Host: ${FLUENTD_HOST}
+ Port: ${FLUENTD_PORT}
+
+Fluentbit is configured by default to capture logs at the info log level. To
+change this, override the Log_Level key with the appropriate level, as
+documented in Fluentbit's configuration_.
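+
+For example, the service section's log level could be raised to debug with an
+override like the following. Because conf.fluentbit is a list, the index must
+match the position of the service section in the chart's values.yaml (assumed
+here to be the first entry):
+
+.. code-block:: bash
+
+  helm install --namespace=openstack local/fluent-logging --name=fluent-logging \
+    --set 'conf.fluentbit[0].service.Log_Level=debug'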
+
+Fluentbit's parser configuration can be found under the following key:
+
+::
+
+ conf:
+ parsers:
+ - docker:
+ header: parser
+ Name: docker
+ Format: json
+ Time_Key: time
+ Time_Format: "%Y-%m-%dT%H:%M:%S.%L"
+ Time_Keep: On
+
+The values for the fluentbit and parsers keys are consumed by a fluent-logging
+helper template that produces the appropriate configurations for the relevant
+sections. Each list item (keys prefixed with a '-') represents a section in the
+configuration files, and the arbitrary name of the list item should represent a
+logical description of the section defined. The header key represents the type
+of definition (filter, input, output, service or parser), and the remaining
+entries will be rendered as space delimited configuration keys and values. For
+example, the definitions above would result in the following:
+
+::
+
+ [SERVICE]
+ Daemon false
+ Flush 1
+ Log_Level info
+ Parsers_File parsers.conf
+ [INPUT]
+ DB /var/log/flb_kube.db
+ Mem_Buf_Limit 5MB
+ Name tail
+ Parser docker
+ Path /var/log/containers/*.log
+ Tag kube.*
+ [FILTER]
+ Match kube.*
+ Merge_JSON_Log true
+ Name kubernetes
+ [OUTPUT]
+ Host ${FLUENTD_HOST}
+ Match *
+ Name forward
+ Port ${FLUENTD_PORT}
+ [PARSER]
+ Format json
+ Name docker
+ Time_Format %Y-%m-%dT%H:%M:%S.%L
+ Time_Keep true
+ Time_Key time
+
+Fluentd
+-------
+
+Fluentd runs as a forwarding service that receives event entries from Fluentbit
+and routes them to the appropriate destination. By default, Fluentd will route
+all entries received from Fluentbit to Elasticsearch for indexing. The
+Fluentd_ configuration schema can be found at the official Fluentd website.
+
+.. _Fluentd: https://docs.fluentd.org/v0.12/articles/config-file
+
+Fluentd's configuration is handled in the values.yaml file in fluent-logging.
+Similar to Fluentbit, configuration overrides provide flexibility in defining
+custom routes for tagged log events. The configuration can be found under the
+following key:
+
+::
+
+ conf:
+ fluentd:
+ - fluentbit_forward:
+ header: source
+ type: forward
+ port: "#{ENV['FLUENTD_PORT']}"
+ bind: 0.0.0.0
+ - elasticsearch:
+ header: match
+ type: elasticsearch
+ expression: "**"
+ include_tag_key: true
+ host: "#{ENV['ELASTICSEARCH_HOST']}"
+ port: "#{ENV['ELASTICSEARCH_PORT']}"
+ logstash_format: true
+ buffer_chunk_limit: 10M
+ buffer_queue_limit: 32
+ flush_interval: "20"
+ max_retry_wait: 300
+ disable_retry_limit: ""
+
+The values for the fluentd keys are consumed by a fluent-logging helper template
+that produces appropriate configurations for each directive desired. The list
+items (keys prefixed with a '-') represent sections in the configuration file,
+and the name of each list item should represent a logical description of the
+section defined. The header key represents the type of definition (name of the
+fluentd plug-in used), and the expression key is used when the plug-in requires
+a pattern to match against (example: matches on certain input patterns). The
+remaining entries will be rendered as space delimited configuration keys and
+values. For example, the definition above would result in the following:
+
+::
+
+  <source>
+    bind 0.0.0.0
+    port "#{ENV['FLUENTD_PORT']}"
+    @type forward
+  </source>
+  <match **>
+    buffer_chunk_limit 10M
+    buffer_queue_limit 32
+    disable_retry_limit
+    flush_interval 20s
+    host "#{ENV['ELASTICSEARCH_HOST']}"
+    include_tag_key true
+    logstash_format true
+    max_retry_wait 300
+    port "#{ENV['ELASTICSEARCH_PORT']}"
+    @type elasticsearch
+  </match>
+
+Some fluentd plugins require nested definitions. The fluentd helper template
+can handle these definitions with the following structure:
+
+::
+
+ conf:
+ td_agent:
+ - fluentbit_forward:
+ header: source
+ type: forward
+ port: "#{ENV['FLUENTD_PORT']}"
+ bind: 0.0.0.0
+ - log_transformer:
+ header: filter
+ type: record_transformer
+ expression: "foo.bar"
+ inner_def:
+ - record_transformer:
+ header: record
+ hostname: my_host
+ tag: my_tag
+
+In this example, the inner_def list will generate a nested configuration
+entry in the log_transformer section. The nested definitions are handled by
+supplying a list as the value for an arbitrary key, and the list value will
+indicate the entry should be handled as a nested definition. The helper
+template will render the above example key/value pairs as the following:
+
+::
+
+  <filter foo.bar>
+    <record>
+      hostname my_host
+      tag my_tag
+    </record>
+    @type record_transformer
+  </filter>
+
+
+Fluentd Exporter
+----------------------
+
+The fluent-logging chart contains templates for an exporter to provide metrics
+for Fluentd. These metrics provide insight into Fluentd's performance. Please
+note monitoring for Fluentd is disabled by default, and must be enabled with the
+following override:
+
+::
+
+ monitoring:
+ prometheus:
+ enabled: true
+
+
+The Fluentd exporter uses the same service annotations as the other exporters,
+and no additional configuration is required for Prometheus to target the
+Fluentd exporter for scraping. The Fluentd exporter is configured with command
+line flags, and the flags' default values can be found under the following key
+in the values.yaml file:
+
+::
+
+ conf:
+ fluentd_exporter:
+ log:
+ format: "logger:stdout?json=true"
+ level: "info"
+
+These configuration keys control the following behaviors:
+
+- log.format: Define the logger used and format of the output
+- log.level: Log level for the exporter to use
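+
+For example, the exporter's log level could be raised for troubleshooting with
+an override such as the following (the release name and namespace are
+assumptions):
+
+.. code-block:: bash
+
+  helm install --namespace=openstack local/fluent-logging --name=fluent-logging \
+    --set monitoring.prometheus.enabled=true \
+    --set conf.fluentd_exporter.log.level=debug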
+
+More information about the Fluentd exporter can be found on the exporter's
+GitHub_ page.
+
+.. _GitHub: https://github.com/V3ckt0r/fluentd_exporter
diff --git a/doc/source/logging/index.rst b/doc/source/logging/index.rst
new file mode 100644
index 000000000..176293e0c
--- /dev/null
+++ b/doc/source/logging/index.rst
@@ -0,0 +1,11 @@
+OpenStack-Helm Logging
+======================
+
+Contents:
+
+.. toctree::
+ :maxdepth: 2
+
+ elasticsearch
+ fluent-logging
+ kibana
diff --git a/doc/source/logging/kibana.rst b/doc/source/logging/kibana.rst
new file mode 100644
index 000000000..141d80dae
--- /dev/null
+++ b/doc/source/logging/kibana.rst
@@ -0,0 +1,76 @@
+Kibana
+======
+
+The Kibana chart in OpenStack-Helm Infra provides visualization for logs indexed
+into Elasticsearch. These visualizations provide the means to view logs captured
+from services deployed in the cluster and targeted for collection by Fluentbit.
+
+Authentication
+--------------
+
+The Kibana deployment includes a sidecar container that runs an Apache reverse
+proxy to add authentication capabilities for Kibana. The username and password
+are configured under the Kibana entry in the endpoints section of the chart's
+values.yaml.
+
+The configuration for Apache can be found under the conf.httpd key, and uses a
+helm-toolkit function that allows for including gotpl entries in the template
+directly. This allows the use of other templates, like the endpoint lookup
+function templates, directly in the configuration for Apache.
+
+Configuration
+-------------
+
+Kibana's configuration is driven by the chart's values.yaml file. The configuration
+options are found under the following keys:
+
+::
+
+ conf:
+ elasticsearch:
+ pingTimeout: 1500
+ preserveHost: true
+ requestTimeout: 30000
+ shardTimeout: 0
+ startupTimeout: 5000
+ i18n:
+ defaultLocale: en
+ kibana:
+ defaultAppId: discover
+ index: .kibana
+ logging:
+ quiet: false
+ silent: false
+ verbose: false
+ ops:
+ interval: 5000
+ server:
+ host: localhost
+ maxPayloadBytes: 1048576
+ port: 5601
+ ssl:
+ enabled: false
+
+The case of the sub-keys is important as these values are injected into
+Kibana's configuration configmap with the toYaml function. More information on
+the configuration options and available settings can be found in the official
+Kibana documentation_.
+
+.. _documentation: https://www.elastic.co/guide/en/kibana/current/settings.html
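+
+For example, the maximum payload size could be raised with an override like the
+one below; note that the camelCase key must be preserved exactly (the value is
+illustrative):
+
+.. code-block:: bash
+
+  helm install --namespace=openstack local/kibana --name=kibana \
+    --set conf.server.maxPayloadBytes=2097152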
+
+Installation
+------------
+
+.. code-block:: bash
+
+  helm install --namespace=openstack local/kibana --name=kibana
+
+Setting Time Field
+------------------
+
+For Kibana to successfully read the logs from Elasticsearch's indexes, the time
+field will need to be manually set after Kibana has successfully deployed. Upon
+visiting the Kibana dashboard for the first time, a prompt will appear to choose
+the time field with a drop-down menu. The default time field for Elasticsearch
+indexes is '@timestamp'. Once this field is selected, the default view for
+querying log entries can be found by selecting the "Discover" tab.
diff --git a/doc/source/monitoring/grafana.rst b/doc/source/monitoring/grafana.rst
new file mode 100644
index 000000000..61d1f0a72
--- /dev/null
+++ b/doc/source/monitoring/grafana.rst
@@ -0,0 +1,89 @@
+Grafana
+=======
+
+The Grafana chart in OpenStack-Helm Infra provides default dashboards for the
+metrics gathered with Prometheus. The default dashboards include visualizations
+for metrics on: Ceph, Kubernetes, nodes, containers, MySQL, RabbitMQ, and
+OpenStack.
+
+Configuration
+-------------
+
+Grafana
+~~~~~~~
+
+Grafana's configuration is driven by the chart's values.yaml file, and the
+relevant configuration entries are under the following key:
+
+::
+
+ conf:
+ grafana:
+ paths:
+ server:
+ database:
+ session:
+ security:
+ users:
+ log:
+ log.console:
+ dashboards.json:
+ grafana_net:
+
+These keys correspond to sections in the grafana.ini configuration file, and the
+to_ini helm-toolkit function will render these values into the appropriate
+format in grafana.ini. The list of options for these keys can be found in the
+official Grafana configuration_ documentation.
+
+.. _configuration: http://docs.grafana.org/installation/configuration/
+
+Prometheus Data Source
+~~~~~~~~~~~~~~~~~~~~~~
+
+Grafana requires configured data sources for gathering metrics for display in
+its dashboards. The configuration options for datasources are found under the
+following key in Grafana's values.yaml file:
+
+::
+
+ conf:
+ provisioning:
+ datasources:
+ monitoring:
+ name: prometheus
+ type: prometheus
+ access: proxy
+ orgId: 1
+ editable: true
+ basicAuth: true
+
+The Grafana chart will use the keys under each entry beneath
+.conf.provisioning.datasources as inputs to a helper template that will render
+the appropriate configuration for the data source. The key for each data source
+(monitoring in the above example) should map to an entry in the endpoints
+section in the chart's values.yaml, as the data source's URL and authentication
+credentials will be populated by the values defined in the corresponding
+endpoint entry. More information can be found in Grafana's official
+documentation on data sources_.
+
+.. _sources: http://docs.grafana.org/features/datasources/
+
+Dashboards
+~~~~~~~~~~
+
+Grafana adds dashboards during installation, using the dashboards defined in
+YAML under the following key:
+
+::
+
+ conf:
+ dashboards:
+
+
+These YAML definitions are transformed to JSON and added to Grafana's
+configuration configmap, then mounted to the Grafana pods dynamically, allowing
+for flexibility in defining and adding custom dashboards to Grafana. Dashboards
+can be added by inserting a new key along with a YAML dashboard definition as
+the value. Additional dashboards can be found by searching on Grafana's
+dashboards_ page or you can define your own. A JSON-to-YAML tool, such as
+json2yaml_, will help transform any custom or new dashboards from JSON to YAML.
+
+.. _dashboards: https://grafana.com/dashboards
+.. _json2yaml: https://www.json2yaml.com/
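+
+As a sketch, a custom dashboard could be supplied with an overrides file along
+these lines; the dashboard name and minimal definition here are illustrative
+only:
+
+.. code-block:: bash
+
+  cat > /tmp/dashboard-overrides.yaml <<'EOF'
+  conf:
+    dashboards:
+      my_custom_dashboard:
+        title: My Custom Dashboard
+        schemaVersion: 14
+        version: 1
+        rows: []
+  EOF
+  helm install --namespace=openstack local/grafana --name=grafana \
+    --values=/tmp/dashboard-overrides.yaml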
diff --git a/doc/source/monitoring/index.rst b/doc/source/monitoring/index.rst
new file mode 100644
index 000000000..aa87e305c
--- /dev/null
+++ b/doc/source/monitoring/index.rst
@@ -0,0 +1,11 @@
+OpenStack-Helm Monitoring
+=========================
+
+Contents:
+
+.. toctree::
+ :maxdepth: 2
+
+ grafana
+ prometheus
+ nagios
diff --git a/doc/source/monitoring/nagios.rst b/doc/source/monitoring/nagios.rst
new file mode 100644
index 000000000..af970cf6b
--- /dev/null
+++ b/doc/source/monitoring/nagios.rst
@@ -0,0 +1,365 @@
+Nagios
+======
+
+The Nagios chart in openstack-helm-infra can be used to provide an alerting
+service that's tightly coupled to an OpenStack-Helm deployment. The Nagios
+chart uses a custom Nagios core image that includes plugins developed to query
+Prometheus directly for scraped metrics and triggered alarms, query the Ceph
+manager endpoints directly to determine the health of a Ceph cluster, and to
+query Elasticsearch for logged events that meet certain criteria (experimental).
+
+Authentication
+--------------
+
+The Nagios deployment includes a sidecar container that runs an Apache reverse
+proxy to add authentication capabilities for Nagios. The username and password
+are configured under the nagios entry in the endpoints section of the chart's
+values.yaml.
+
+The configuration for Apache can be found under the conf.httpd key, and uses a
+helm-toolkit function that allows for including gotpl entries in the template
+directly. This allows the use of other templates, like the endpoint lookup
+function templates, directly in the configuration for Apache.
+
+Image Plugins
+-------------
+
+The Nagios image used contains custom plugins that can be used for the defined
+service check commands. These plugins include:
+
+- check_prometheus_metric.py: Query Prometheus for a specific metric and value
+- check_exporter_health_metric.sh: Nagios plugin to query prometheus exporter
+- check_rest_get_api.py: Check REST API status
+- check_update_prometheus_hosts.py: Queries Prometheus, updates Nagios config
+- query_prometheus_alerts.py: Nagios plugin to query prometheus ALERTS metric
+
+More information about the Nagios image and plugins can be found here_.
+
+.. _here: https://github.com/att-comdev/nagios
+
+
+Nagios Service Configuration
+----------------------------
+
+The Nagios service is configured via the following section in the chart's
+values file:
+
+::
+
+ conf:
+ nagios:
+ nagios:
+ log_file: /opt/nagios/var/log/nagios.log
+ cfg_file:
+ - /opt/nagios/etc/nagios_objects.cfg
+ - /opt/nagios/etc/objects/commands.cfg
+ - /opt/nagios/etc/objects/contacts.cfg
+ - /opt/nagios/etc/objects/timeperiods.cfg
+ - /opt/nagios/etc/objects/templates.cfg
+ - /opt/nagios/etc/objects/prometheus_discovery_objects.cfg
+ object_cache_file: /opt/nagios/var/objects.cache
+ precached_object_file: /opt/nagios/var/objects.precache
+ resource_file: /opt/nagios/etc/resource.cfg
+ status_file: /opt/nagios/var/status.dat
+ status_update_interval: 10
+ nagios_user: nagios
+ nagios_group: nagios
+ check_external_commands: 1
+ command_file: /opt/nagios/var/rw/nagios.cmd
+ lock_file: /var/run/nagios.lock
+ temp_file: /opt/nagios/var/nagios.tmp
+ temp_path: /tmp
+ event_broker_options: -1
+ log_rotation_method: d
+ log_archive_path: /opt/nagios/var/log/archives
+ use_syslog: 1
+ log_service_retries: 1
+ log_host_retries: 1
+ log_event_handlers: 1
+ log_initial_states: 0
+ log_current_states: 1
+ log_external_commands: 1
+ log_passive_checks: 1
+ service_inter_check_delay_method: s
+ max_service_check_spread: 30
+ service_interleave_factor: s
+ host_inter_check_delay_method: s
+ max_host_check_spread: 30
+ max_concurrent_checks: 60
+ check_result_reaper_frequency: 10
+ max_check_result_reaper_time: 30
+ check_result_path: /opt/nagios/var/spool/checkresults
+ max_check_result_file_age: 3600
+ cached_host_check_horizon: 15
+ cached_service_check_horizon: 15
+ enable_predictive_host_dependency_checks: 1
+ enable_predictive_service_dependency_checks: 1
+ soft_state_dependencies: 0
+ auto_reschedule_checks: 0
+ auto_rescheduling_interval: 30
+ auto_rescheduling_window: 180
+ service_check_timeout: 60
+ host_check_timeout: 60
+ event_handler_timeout: 60
+ notification_timeout: 60
+ ocsp_timeout: 5
+ perfdata_timeout: 5
+ retain_state_information: 1
+ state_retention_file: /opt/nagios/var/retention.dat
+ retention_update_interval: 60
+ use_retained_program_state: 1
+ use_retained_scheduling_info: 1
+ retained_host_attribute_mask: 0
+ retained_service_attribute_mask: 0
+ retained_process_host_attribute_mask: 0
+ retained_process_service_attribute_mask: 0
+ retained_contact_host_attribute_mask: 0
+ retained_contact_service_attribute_mask: 0
+ interval_length: 1
+ check_workers: 4
+ check_for_updates: 1
+ bare_update_check: 0
+ use_aggressive_host_checking: 0
+ execute_service_checks: 1
+ accept_passive_service_checks: 1
+ execute_host_checks: 1
+ accept_passive_host_checks: 1
+ enable_notifications: 1
+ enable_event_handlers: 1
+ process_performance_data: 0
+ obsess_over_services: 0
+ obsess_over_hosts: 0
+ translate_passive_host_checks: 0
+ passive_host_checks_are_soft: 0
+ check_for_orphaned_services: 1
+ check_for_orphaned_hosts: 1
+ check_service_freshness: 1
+ service_freshness_check_interval: 60
+ check_host_freshness: 0
+ host_freshness_check_interval: 60
+ additional_freshness_latency: 15
+ enable_flap_detection: 1
+ low_service_flap_threshold: 5.0
+ high_service_flap_threshold: 20.0
+ low_host_flap_threshold: 5.0
+ high_host_flap_threshold: 20.0
+ date_format: us
+ use_regexp_matching: 1
+ use_true_regexp_matching: 0
+ daemon_dumps_core: 0
+ use_large_installation_tweaks: 0
+ enable_environment_macros: 0
+ debug_level: 0
+ debug_verbosity: 1
+ debug_file: /opt/nagios/var/nagios.debug
+ max_debug_file_size: 1000000
+ allow_empty_hostgroup_assignment: 1
+ illegal_macro_output_chars: "`~$&|'<>\""
+
+Nagios CGI Configuration
+------------------------
+
+The Nagios CGI configuration is defined via the following section in the chart's
+values file:
+
+::
+
+ conf:
+ nagios:
+ cgi:
+ main_config_file: /opt/nagios/etc/nagios.cfg
+ physical_html_path: /opt/nagios/share
+ url_html_path: /nagios
+ show_context_help: 0
+ use_pending_states: 1
+ use_authentication: 0
+ use_ssl_authentication: 0
+ authorized_for_system_information: "*"
+ authorized_for_configuration_information: "*"
+ authorized_for_system_commands: nagiosadmin
+ authorized_for_all_services: "*"
+ authorized_for_all_hosts: "*"
+ authorized_for_all_service_commands: "*"
+ authorized_for_all_host_commands: "*"
+ default_statuswrl_layout: 4
+ ping_syntax: /bin/ping -n -U -c 5 $HOSTADDRESS$
+ refresh_rate: 90
+ result_limit: 100
+ escape_html_tags: 1
+ action_url_target: _blank
+ notes_url_target: _blank
+ lock_author_names: 1
+ navbar_search_for_addresses: 1
+ navbar_search_for_aliases: 1
+
+Nagios Host Configuration
+-------------------------
+
+The Nagios chart includes a single host definition for the Prometheus instance
+queried for metrics. The host definition can be found under the following
+values key:
+
+::
+
+ conf:
+ nagios:
+ hosts:
+ - prometheus:
+ use: linux-server
+ host_name: prometheus
+ alias: "Prometheus Monitoring"
+ address: 127.0.0.1
+ hostgroups: prometheus-hosts
+ check_command: check-prometheus-host-alive
+
+The address for the Prometheus host is defined by the PROMETHEUS_SERVICE
+environment variable in the deployment template, which is determined by the
+monitoring entry in the Nagios chart's endpoints section. The endpoint is then
+available as a macro for Nagios to use in all Prometheus based queries. For
+example:
+
+::
+
+ - check_prometheus_host_alive:
+ command_name: check-prometheus-host-alive
+ command_line: "$USER1$/check_rest_get_api.py --url $USER2$ --warning_response_seconds 5 --critical_response_seconds 10"
+
+The $USER2$ macro above corresponds to the Prometheus endpoint defined in the
+PROMETHEUS_SERVICE environment variable. All checks that use the
+prometheus-hosts hostgroup will map back to the Prometheus host defined by this
+endpoint.
+
+Nagios HostGroup Configuration
+------------------------------
+
+The Nagios chart includes configuration values for defined host groups under the
+following values key:
+
+::
+
+ conf:
+ nagios:
+ host_groups:
+ - prometheus-hosts:
+ hostgroup_name: prometheus-hosts
+ alias: "Prometheus Virtual Host"
+ - base-os:
+ hostgroup_name: base-os
+ alias: "base-os"
+
+These hostgroups are used to define which group of hosts should be targeted by
+a particular Nagios check. An example of a check that targets Prometheus for a
+specific metric query would be:
+
+::
+
+ - check_ceph_monitor_quorum:
+ use: notifying_service
+ hostgroup_name: prometheus-hosts
+ service_description: "CEPH_quorum"
+ check_command: check_prom_alert!ceph_monitor_quorum_low!CRITICAL- ceph monitor quorum does not exist!OK- ceph monitor quorum exists
+ check_interval: 60
+
+An example of a check that targets all hosts for a base-os type check (memory
+usage, latency, etc) would be:
+
+::
+
+ - check_memory_usage:
+ use: notifying_service
+ service_description: Memory_usage
+ check_command: check_memory_usage
+ hostgroup_name: base-os
+
+These two host groups allow for a wide range of targeted checks for determining
+the status of all components of an OpenStack-Helm deployment.
+
+Nagios Command Configuration
+----------------------------
+
+The Nagios chart includes configuration values for the command definitions Nagios
+will use when executing service checks. These values are found under the
+following key:
+
+::
+
+ conf:
+ nagios:
+ commands:
+ - send_service_snmp_trap:
+ command_name: send_service_snmp_trap
+ command_line: "$USER1$/send_service_trap.sh '$USER8$' '$HOSTNAME$' '$SERVICEDESC$' $SERVICESTATEID$ '$SERVICEOUTPUT$' '$USER4$' '$USER5$'"
+ - send_host_snmp_trap:
+ command_name: send_host_snmp_trap
+ command_line: "$USER1$/send_host_trap.sh '$USER8$' '$HOSTNAME$' $HOSTSTATEID$ '$HOSTOUTPUT$' '$USER4$' '$USER5$'"
+ - send_service_http_post:
+ command_name: send_service_http_post
+ command_line: "$USER1$/send_http_post_event.py --type service --hostname '$HOSTNAME$' --servicedesc '$SERVICEDESC$' --state_id $SERVICESTATEID$ --output '$SERVICEOUTPUT$' --monitoring_hostname '$HOSTNAME$' --primary_url '$USER6$' --secondary_url '$USER7$'"
+ - send_host_http_post:
+ command_name: send_host_http_post
+ command_line: "$USER1$/send_http_post_event.py --type host --hostname '$HOSTNAME$' --state_id $HOSTSTATEID$ --output '$HOSTOUTPUT$' --monitoring_hostname '$HOSTNAME$' --primary_url '$USER6$' --secondary_url '$USER7$'"
+ - check_prometheus_host_alive:
+ command_name: check-prometheus-host-alive
+ command_line: "$USER1$/check_rest_get_api.py --url $USER2$ --warning_response_seconds 5 --critical_response_seconds 10"
+
+The list of defined commands can be modified with configuration overrides, which
+makes it possible to define commands specific to an infrastructure deployment.
+These commands can include querying Prometheus for metrics on dependencies for a
+service to determine whether an alert should be raised, executing checks on each
+host to determine network latency or file system usage, or checking each node
+for issues with NTP clock skew.
+
+Note: Since the conf.nagios.commands key contains a list of the defined commands,
+the entire contents of conf.nagios.commands will need to be overridden if
+additional commands are desired, as Helm replaces overridden lists wholesale
+rather than merging them.
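+
+As a sketch, appending a custom command therefore means restating the full
+list. Only two commands are shown below for brevity; a real override would
+repeat every default command, and the custom command itself is illustrative:
+
+.. code-block:: bash
+
+  cat > /tmp/nagios-overrides.yaml <<'EOF'
+  conf:
+    nagios:
+      commands:
+        - check_prometheus_host_alive:
+            command_name: check-prometheus-host-alive
+            command_line: "$USER1$/check_rest_get_api.py --url $USER2$ --warning_response_seconds 5 --critical_response_seconds 10"
+        - check_my_service_endpoint:
+            command_name: check-my-service-endpoint
+            command_line: "$USER1$/check_rest_get_api.py --url http://my-service:8080 --warning_response_seconds 5 --critical_response_seconds 10"
+  EOF
+  helm install --namespace=openstack local/nagios --name=nagios \
+    --values=/tmp/nagios-overrides.yaml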
+
+Nagios Service Check Configuration
+----------------------------------
+
+The Nagios chart includes configuration values for the service checks Nagios
+will execute. These service check commands can be found under the following
+key:
+
+::
+
+ conf:
+ nagios:
+ services:
+ - notifying_service:
+ name: notifying_service
+ use: generic-service
+ flap_detection_enabled: 0
+ process_perf_data: 0
+ contact_groups: snmp_and_http_notifying_contact_group
+ check_interval: 60
+ notification_interval: 120
+ retry_interval: 30
+ register: 0
+ - check_ceph_health:
+ use: notifying_service
+ hostgroup_name: base-os
+ service_description: "CEPH_health"
+ check_command: check_ceph_health
+ check_interval: 300
+ - check_hosts_health:
+ use: generic-service
+ hostgroup_name: prometheus-hosts
+ service_description: "Nodes_health"
+ check_command: check_prom_alert!K8SNodesNotReady!CRITICAL- One or more nodes are not ready.
+ check_interval: 60
+ - check_prometheus_replicas:
+ use: notifying_service
+ hostgroup_name: prometheus-hosts
+ service_description: "Prometheus_replica-count"
+ check_command: check_prom_alert_with_labels!replicas_unavailable_statefulset!statefulset="prometheus"!statefulset {statefulset} has lesser than configured replicas
+ check_interval: 60
+
+The Nagios service configurations define the checks Nagios will perform. These
+checks contain keys for defining: the service type to use, the host group to
+target, the description of the service check, the command the check should use,
+and the interval at which to trigger the service check. These services can also
+be extended to provide additional insight into the overall status of a
+particular service, and allow for advanced checks that determine the overall
+health and liveness of a service. For example, a service check could trigger an
+alarm for the OpenStack services when Nagios detects that the relevant database
+and message queue have become unresponsive.
diff --git a/doc/source/monitoring/prometheus.rst b/doc/source/monitoring/prometheus.rst
new file mode 100644
index 000000000..446589ee4
--- /dev/null
+++ b/doc/source/monitoring/prometheus.rst
@@ -0,0 +1,338 @@
+Prometheus
+==========
+
+The Prometheus chart in openstack-helm-infra provides a time series database and
+a powerful query language for monitoring various components of OpenStack-Helm.
+Prometheus gathers metrics by scraping defined service endpoints or pods at
+specified intervals and indexing them in the underlying time series database.
+
+Authentication
+--------------
+
+The Prometheus deployment includes a sidecar container that runs an Apache
+reverse proxy to add authentication capabilities for Prometheus. The
+username and password are configured under the monitoring entry in the endpoints
+section of the chart's values.yaml.
+
+The configuration for Apache can be found under the conf.httpd key, and uses a
+helm-toolkit function that allows for including gotpl entries in the template
+directly. This allows the use of other templates, like the endpoint lookup
+function templates, directly in the configuration for Apache.
+
+Prometheus Service configuration
+--------------------------------
+
+The Prometheus service is configured via command line flags set during runtime.
+These flags include: setting the configuration file, setting log levels, setting
+characteristics of the time series database, and enabling the web admin API for
+snapshot support. These settings can be configured via the values tree at:
+
+::
+
+ conf:
+ prometheus:
+ command_line_flags:
+ log.level: info
+ query.max_concurrency: 20
+ query.timeout: 2m
+ storage.tsdb.path: /var/lib/prometheus/data
+ storage.tsdb.retention: 7d
+ web.enable_admin_api: false
+ web.enable_lifecycle: false
+
+The Prometheus configuration file contains the definitions for scrape targets
+and the location of the rules files for triggering alerts on scraped metrics.
+The configuration file is defined in the values file, and can be found at:
+
+::
+
+ conf:
+ prometheus:
+ scrape_configs: |
+
+By defining the configuration via the values file, an operator can override all
+configuration components of the Prometheus deployment at runtime.
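+
+For example, the retention period could be extended at install time. Note that
+the literal dots in the flag name must be escaped so --set does not treat them
+as path separators (the release name and namespace are assumptions):
+
+.. code-block:: bash
+
+  helm install --namespace=openstack local/prometheus --name=prometheus \
+    --set 'conf.prometheus.command_line_flags.storage\.tsdb\.retention=30d'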
+
+Kubernetes Endpoint Configuration
+---------------------------------
+
+The Prometheus chart in openstack-helm-infra uses the built-in service discovery
+mechanisms for Kubernetes endpoints and pods to automatically configure scrape
+targets. Functions added to helm-toolkit allow configuration of these targets
+via annotations that can be applied to any service or pod that exposes metrics
+for Prometheus, whether a service for an application-specific exporter or an
+application that provides a metrics endpoint via its service. The values in
+these functions correspond to entries in the monitoring tree under the
+prometheus key in a chart's values.yaml file.
+
+
+The function definitions are shown below:
+
+::
+
+ {{- define "helm-toolkit.snippets.prometheus_service_annotations" -}}
+ {{- $config := index . 0 -}}
+ {{- if $config.scrape }}
+ prometheus.io/scrape: {{ $config.scrape | quote }}
+ {{- end }}
+ {{- if $config.scheme }}
+ prometheus.io/scheme: {{ $config.scheme | quote }}
+ {{- end }}
+ {{- if $config.path }}
+ prometheus.io/path: {{ $config.path | quote }}
+ {{- end }}
+ {{- if $config.port }}
+ prometheus.io/port: {{ $config.port | quote }}
+ {{- end }}
+ {{- end -}}
+
+::
+
+ {{- define "helm-toolkit.snippets.prometheus_pod_annotations" -}}
+ {{- $config := index . 0 -}}
+ {{- if $config.scrape }}
+ prometheus.io/scrape: {{ $config.scrape | quote }}
+ {{- end }}
+ {{- if $config.path }}
+ prometheus.io/path: {{ $config.path | quote }}
+ {{- end }}
+ {{- if $config.port }}
+ prometheus.io/port: {{ $config.port | quote }}
+ {{- end }}
+ {{- end -}}
+
+These functions render the following annotations:
+
+- prometheus.io/scrape: Must be set to true for Prometheus to scrape the target
+- prometheus.io/scheme: Overrides the scheme used to scrape the target if not http
+- prometheus.io/path: Overrides the path used to scrape the target's metrics if not /metrics
+- prometheus.io/port: Overrides the port to scrape metrics on if not the service's default port
+
+Each chart that can be targeted for monitoring by Prometheus has a prometheus
+section under a monitoring tree in the chart's values.yaml, and Prometheus
+monitoring is disabled by default for those services. Example values for the
+required entries can be found in the following monitoring configuration for the
+prometheus-node-exporter chart:
+
+::
+
+ monitoring:
+ prometheus:
+ enabled: false
+ node_exporter:
+ scrape: true
+
+If the prometheus.enabled key is set to true, the condition for applying the
+annotations evaluates to true, and the annotations are set on the targeted
+service or pod. For example:
+
+::
+
+ {{- $prometheus_annotations := $envAll.Values.monitoring.prometheus.node_exporter }}
+ ---
+ apiVersion: v1
+ kind: Service
+ metadata:
+ name: {{ tuple "node_metrics" "internal" . | include "helm-toolkit.endpoints.hostname_short_endpoint_lookup" }}
+ labels:
+ {{ tuple $envAll "node_exporter" "metrics" | include "helm-toolkit.snippets.kubernetes_metadata_labels" | indent 4 }}
+ annotations:
+ {{- if .Values.monitoring.prometheus.enabled }}
+ {{ tuple $prometheus_annotations | include "helm-toolkit.snippets.prometheus_service_annotations" | indent 4 }}
+ {{- end }}
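+
+Once deployed, the rendered annotations can be inspected directly to verify the
+configuration; the service name and namespace here are assumptions based on a
+default deployment:
+
+.. code-block:: bash
+
+  kubectl get service node-exporter --namespace=kube-system \
+    --output=jsonpath='{.metadata.annotations}'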
+
+Kubelet, API Server, and cAdvisor
+---------------------------------
+
+The Prometheus chart includes scrape target configurations for the kubelet, the
+Kubernetes API servers, and cAdvisor. These targets are configured based on a
+kubeadm-deployed Kubernetes cluster, as OpenStack-Helm uses kubeadm to deploy
+Kubernetes in the gates. These configurations may need to change based on your
+chosen method of deployment. Please note the cAdvisor metrics will not be
+captured if the kubelet was started with the following flag:
+
+::
+
+ --cadvisor-port=0
+
+To enable the gathering of the kubelet's custom metrics, the following flag must
+be set:
+
+::
+
+ --enable-custom-metrics
+
+Installation
+------------
+
+The Prometheus chart can be installed with the following command:
+
+.. code-block:: bash
+
+ helm install --namespace=openstack local/prometheus --name=prometheus
+
+The above command results in a Prometheus deployment configured to automatically
+discover services with the necessary annotations for scraping, and to gather
+metrics from the kubelet, the Kubernetes API servers, and cAdvisor.
+
+Extending Prometheus
+--------------------
+
+Prometheus can target various exporters to gather metrics related to specific
+applications to extend visibility into an OpenStack-Helm deployment. Currently,
+openstack-helm-infra contains charts for:
+
+- prometheus-kube-state-metrics: Provides additional Kubernetes metrics
+- prometheus-node-exporter: Provides metrics for nodes and the Linux kernel
+- prometheus-openstack-metrics-exporter: Provides metrics for OpenStack services
+
+Kube-State-Metrics
+~~~~~~~~~~~~~~~~~~
+
+The prometheus-kube-state-metrics chart provides metrics for Kubernetes objects
+as well as metrics for kube-scheduler and kube-controller-manager. Information
+on the specific metrics available via the kube-state-metrics service can be
+found in the kube-state-metrics_ documentation.
+
+The prometheus-kube-state-metrics chart can be installed with the following:
+
+.. code-block:: bash
+
+ helm install --namespace=kube-system local/prometheus-kube-state-metrics --name=prometheus-kube-state-metrics
+
+.. _kube-state-metrics: https://github.com/kubernetes/kube-state-metrics/tree/master/Documentation
+
+Node Exporter
+~~~~~~~~~~~~~
+
+The prometheus-node-exporter chart provides hardware and operating system metrics
+exposed by the Linux kernel. Information on the specific metrics available via
+the Node exporter can be found on the Node_exporter_ GitHub page.
+
+The prometheus-node-exporter chart can be installed with the following:
+
+.. code-block:: bash
+
+ helm install --namespace=kube-system local/prometheus-node-exporter --name=prometheus-node-exporter
+
+.. _Node_exporter: https://github.com/prometheus/node_exporter
+
+OpenStack Exporter
+~~~~~~~~~~~~~~~~~~
+
+The prometheus-openstack-exporter chart provides metrics specific to the
+OpenStack services. The exporter's source code can be found here_. While the
+metrics provided are by no means comprehensive, they will be expanded upon.
+
+Please note the OpenStack exporter requires the creation of a Keystone user to
+successfully gather metrics. To create the required user, the chart uses the
+same keystone user management job the OpenStack service charts use.
+
+The prometheus-openstack-exporter chart can be installed with the following:
+
+.. code-block:: bash
+
+ helm install --namespace=openstack local/prometheus-openstack-exporter --name=prometheus-openstack-exporter
+
+.. _here: https://github.com/att-comdev/openstack-metrics-collector
+
+Other exporters
+~~~~~~~~~~~~~~~
+
+Certain charts in OpenStack-Helm include templates for application-specific
+Prometheus exporters, which keeps the monitoring of those services tightly coupled
+to the chart. The templates for these exporters can be found in the monitoring
+subdirectory in the chart. These exporters are disabled by default, and can be
+enabled by setting the appropriate flag in the monitoring.prometheus key of the
+chart's values.yaml file. The charts containing exporters include:
+
+- Elasticsearch_
+- RabbitMQ_
+- MariaDB_
+- Memcached_
+- Fluentd_
+- Postgres_
+
+.. _Elasticsearch: https://github.com/justwatchcom/elasticsearch_exporter
+.. _RabbitMQ: https://github.com/kbudde/rabbitmq_exporter
+.. _MariaDB: https://github.com/prometheus/mysqld_exporter
+.. _Memcached: https://github.com/prometheus/memcached_exporter
+.. _Fluentd: https://github.com/V3ckt0r/fluentd_exporter
+.. _Postgres: https://github.com/wrouesnel/postgres_exporter
+
+Ceph
+~~~~
+
+Starting with Luminous, Ceph can export metrics with the ceph-mgr prometheus
+module. This module can be enabled in Ceph's values.yaml under the
+ceph_mgr_enabled_plugins key by appending prometheus to the list of enabled
+modules. After enabling the prometheus module, metrics can be scraped on the
+ceph-mgr service endpoint. This relies on the Prometheus annotations attached
+to the ceph-mgr service template, and these annotations can be modified in the
+endpoints section of Ceph's values.yaml file. Information on the specific
+metrics available via the prometheus module can be found in the Ceph
+prometheus_ module documentation.
+
+.. _prometheus: http://docs.ceph.com/docs/master/mgr/prometheus/
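+
+As a sketch, the module might be enabled with an overrides file along these
+lines; the key's location and the rest of the list's contents should be taken
+from Ceph's values.yaml:
+
+.. code-block:: bash
+
+  cat > /tmp/ceph-overrides.yaml <<'EOF'
+  ceph_mgr_enabled_plugins:
+    - status
+    - prometheus
+  EOF
+  helm install --namespace=ceph local/ceph --name=ceph \
+    --values=/tmp/ceph-overrides.yaml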
+
+
+Prometheus Dashboard
+--------------------
+
+Prometheus includes a dashboard that can be accessed via the exposed Prometheus
+endpoint (NodePort or otherwise). This dashboard will give you a
+view of your scrape targets' state, the configuration values for Prometheus's
+scrape jobs and command line flags, a view of any alerts triggered based on the
+defined rules, and a means for using PromQL to query scraped metrics. The
+Prometheus dashboard is a useful tool for verifying Prometheus is configured
+appropriately and to verify the status of any services targeted for scraping via
+the Prometheus service discovery annotations.
+
+Rules Configuration
+-------------------
+
+Prometheus provides a querying language that can operate on defined rules which
+allow for the generation of alerts on specific metrics. The Prometheus chart in
+openstack-helm-infra defines these rules via the values.yaml file. Defining the
+rules in the values file gives operators the flexibility to supply specific
+rules via overrides at installation. The following rules keys are provided:
+
+::
+
+ values:
+ conf:
+ rules:
+ alertmanager:
+ etcd3:
+ kube_apiserver:
+ kube_controller_manager:
+ kubelet:
+ kubernetes:
+ rabbitmq:
+ mysql:
+ ceph:
+ openstack:
+ custom:
+
+These keys provide recording and alert rules for all infrastructure components
+of an OpenStack-Helm deployment. If you wish to exclude rules for a component,
+leave the corresponding tree empty in an overrides file. To read more
+about Prometheus recording and alert rules definitions, please see the official
+Prometheus recording_ and alert_ rules documentation.
+
+.. _recording: https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/
+.. _alert: https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/
+
+Note: Prometheus releases prior to 2.0 used gotpl to define rules. Prometheus
+2.0 changed the rules format to YAML, making them much easier to read. The
+Prometheus chart in openstack-helm-infra uses Prometheus 2.0 by default to take
+advantage of changes to the underlying storage layer and the handling of stale
+data. The chart will not support overrides for Prometheus versions below 2.0,
+as the command line flags for the service changed between versions.
+
+The wide range of exporters included in OpenStack-Helm, coupled with the ability
+to define rules with configuration overrides, allows for the addition of custom
+alerting and recording rules to fit an operator's monitoring needs. Adding new
+rules or modifying existing rules requires overrides for either an existing key
+under conf.rules or the addition of a new key under conf.rules. The addition
+of custom rules can be used to define complex checks that can be extended for
+determining the liveness or health of infrastructure components.
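+
+As a sketch, a simple custom alert rule could be supplied with an overrides
+file like the following; the rule itself is illustrative only and uses the
+Prometheus 2.0 YAML rules format:
+
+.. code-block:: bash
+
+  cat > /tmp/rules-overrides.yaml <<'EOF'
+  conf:
+    rules:
+      custom:
+        groups:
+          - name: custom.rules
+            rules:
+              - alert: ScrapeTargetDown
+                expr: up == 0
+                for: 5m
+                labels:
+                  severity: warning
+                annotations:
+                  summary: "A scrape target has been unreachable for 5 minutes"
+  EOF
+  helm install --namespace=openstack local/prometheus --name=prometheus \
+    --values=/tmp/rules-overrides.yaml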
diff --git a/doc/source/readme.rst b/doc/source/readme.rst
new file mode 100644
index 000000000..a6210d3d8
--- /dev/null
+++ b/doc/source/readme.rst
@@ -0,0 +1 @@
+.. include:: ../../README.rst