deckhand/doc/source/users/substitution.rst
Phil Sphicas 5cd799cc5d Allow source substring extraction
When performing substitutions, there are occasions when the source value
does not exactly match the format required by the destination document
(e.g. the values.yaml structure of an Armada chart).

This change provides the ability extract a substring of the source
value, and substitute that into the destination document.

Two optional fields are added to `src` under `metadata.substitutions`:

  * `pattern`: a regular expression, with optional capture groups
  * `match_group`: the number of the desired capture group

The canonical use case is a chart that requires an image with the repo
name and tag in separate fields, while the substitution source has the
full image path as a single value.

For example, assuming that the source document "software-versions" has:

    data:
      images:
        hello: docker.io/library/hello-world:latest

Then the following set of substitutions would put the repo and tag in
the applicable values in the destination document:

    metadata:
      substitutions:
        - src:
            schema: pegleg/SoftwareVersions/v1
            name: software-versions
            path: .images.hello
            pattern: '^(.*):(.*)'
            match_group: 1
          dest:
            path: .values.images.hello.repo
        - src:
            schema: pegleg/SoftwareVersions/v1
            name: software-versions
            path: .images.hello
            pattern: '^(.*):(.*)'
            match_group: 2
          dest:
            path: .values.images.hello.tag
    data:
      values:
        images:
          hello:
            repo:  # docker.io/library/hello-world
            tag:   # latest

Change-Id: I2fcb0d2b8e2fe3d85479ac2bad0b7b90f434eb77
2022-01-18 13:04:25 -08:00

14 KiB

Document Substitution

Introduction

Document substitution, simply put, allows one document to overwrite parts of its own data with that of another document. Substitution involves a source document sharing data with a destination document, which replaces its own data with the shared data.

Substitution may be leveraged as a mechanism for:

  • inserting secrets into configuration documents
  • reducing data duplication by declaring common data within one document and having multiple other documents substitute data from the common location as needed

During document rendering, substitution is applied at each layer after all merge actions occur. For more information on the interaction between document layering and substitution, see: rendering.

Requirements

Substitutions between documents are not restricted by schema, name, nor layer. Source and destination documents do not need to share the same schema.

No substitution dependency cycle may exist between a series of substitutions. For example, if A substitutes from B, B from C, and C from A, then Deckhand will raise an exception as it is impossible to determine the source data to use for substitution in the presence of a dependency cycle.

Substitution works like this:

The source document is resolved via the src.schema and src.name keys and the src.path key is used relative to the source document's data section to retrieve the substitution data, which is then injected into the data section of the destination document using the dest.path key.

If all the constraints above are correct, then the substitution source data is injected into the destination document's data section, at the path specified by dest.path.

The injection of data into the destination document can be more fine-tuned using a regular expression; see the substitution-pattern section below for more information.

Note

Substitution is only applied to the data section of a document. This is because a document's metadata and schema sections should be immutable within the scope of a revision, for obvious reasons.

Rendering Documents with Substitution

Concrete (non-abstract) documents can be used as a source of substitution into other documents. This substitution is layer-independent, so given the 3 layer example above, which includes global, region and site layers, a document in the region layer could insert data from a document in the site layer.

Example

Here is a sample set of documents demonstrating substitution:

---
schema: deckhand/Certificate/v1
metadata:
  name: example-cert
  storagePolicy: cleartext
  layeringDefinition:
    layer: site
data: |
  CERTIFICATE DATA
---
schema: deckhand/CertificateKey/v1
metadata:
  name: example-key
  storagePolicy: encrypted
  layeringDefinition:
    layer: site
data: |
  KEY DATA
---
schema: deckhand/Passphrase/v1
metadata:
  name: example-password
  storagePolicy: encrypted
  layeringDefinition:
    layer: site
data: my-secret-password
---
schema: armada/Chart/v1
metadata:
  name: example-chart-01
  storagePolicy: cleartext
  layeringDefinition:
    layer: region
  substitutions:
    - dest:
        path: .chart.values.tls.certificate
      src:
        schema: deckhand/Certificate/v1
        name: example-cert
        path: .
    - dest:
        path: .chart.values.tls.key
      src:
        schema: deckhand/CertificateKey/v1
        name: example-key
        path: .
    - dest:
        path: .chart.values.some_url
        pattern: INSERT_[A-Z]+_HERE
      src:
        schema: deckhand/Passphrase/v1
        name: example-password
        path: .
data:
  chart:
    details:
      data: here
    values:
      some_url: http://admin:INSERT_PASSWORD_HERE@service-name:8080/v1
...

The rendered document will look like:

---
schema: armada/Chart/v1
metadata:
  name: example-chart-01
  storagePolicy: cleartext
  layeringDefinition:
    layer: region
  substitutions:
    - dest:
        path: .chart.values.tls.certificate
      src:
        schema: deckhand/Certificate/v1
        name: example-cert
        path: .
    - dest:
        path: .chart.values.tls.key
      src:
        schema: deckhand/CertificateKey/v1
        name: example-key
        path: .
    - dest:
        path: .chart.values.some_url
        pattern: INSERT_[A-Z]+_HERE
      src:
        schema: deckhand/Passphrase/v1
        name: example-password
        path: .
data:
  chart:
    details:
      data: here
    values:
      some_url: http://admin:my-secret-password@service-name:8080/v1
      tls:
        certificate: |
          CERTIFICATE DATA
        key: |
          KEY DATA
...

Substitution with Patterns

Substitution can be controlled in a more fine-tuned fashion using dest.pattern (optional) which functions as a regular expression underneath the hood. The dest.pattern has the following constraints:

  • dest.path key must already exist in the data section of the destination document and must have an associated value.
  • The dest.pattern must be a valid regular expression string.
  • The dest.pattern must be resolvable in the value of dest.path.

If the above constraints are met, then more precise substitution via a pattern can be carried out. If dest.path is a string or multiline string then all occurrences of dest.pattern found in dest.path will be replaced. To handle a more complex dest.path read Recursive Replacement of Patterns.

Example

---
# Source document.
schema: deckhand/Passphrase/v1
metadata:
  name: example-password
  schema: metadata/Document/v1
  layeringDefinition:
    layer: site
  storagePolicy: cleartext
data: my-secret-password
---
# Another source document.
schema: deckhand/Passphrase/v1
metadata:
  name: another-password
  schema: metadata/Document/v1
  layeringDefinition:
    layer: site
  storagePolicy: cleartext
data: another-secret-password
---
# Destination document.
schema: armada/Chart/v1
metadata:
  name: example-chart-01
  schema: metadata/Document/v1
  layeringDefinition:
    layer: region
  substitutions:
    - dest:
        path: .chart.values.some_url
        pattern: INSERT_[A-Z]+_HERE
      src:
        schema: deckhand/Passphrase/v1
        name: example-password
        path: .
    - dest:
        path: .chart.values.script
        pattern: INSERT_ANOTHER_PASSWORD
      src:
        schema: deckhand/Passphrase/v1
        name: another-password
        path: .
data:
  chart:
    details:
      data: here
    values:
      some_url: http://admin:INSERT_PASSWORD_HERE@service-name:8080/v1
      script: |
       some_function("INSERT_ANOTHER_PASSWORD")
       another_function("INSERT_ANOTHER_PASSWORD")

After document rendering, the output for example-chart-01 (the destination document) will be:

---
schema: armada/Chart/v1
metadata:
  name: example-chart-01
  schema: metadata/Document/v1
  [...]
data:
  chart:
    details:
      data: here
    values:
      # Notice string replacement occurs at exact location specified by
      # ``dest.pattern``.
      some_url: http://admin:my-secret-password@service-name:8080/v1
      script: |
       some_function("another-secret-password")
       another_function("another-secret-password")

Recursive Replacement of Patterns

Patterns may also be replaced recursively. This can be achieved by specifying a pattern value and recurse as True (it otherwise defaults to False). Best practice is to limit the scope of the recursion as much as possible: e.g. avoid passing in "$" as the jsonpath, but rather a JSON path that lives closer to the nested strings in question.

Note

Recursive selection of patterns will only consider matching patterns. Non-matching patterns will be ignored. Thus, even if recursion can "pass over" non-matching patterns, they will be silently ignored.

---
# Source document.
schema: deckhand/Passphrase/v1
metadata:
  name: example-password
  schema: metadata/Document/v1
  layeringDefinition:
    layer: site
  storagePolicy: cleartext
data: my-secret-password
---
# Destination document.
schema: armada/Chart/v1
metadata:
  name: example-chart-01
  schema: metadata/Document/v1
  layeringDefinition:
    layer: region
  substitutions:
    - dest:
        # Note that the path encapsulates all 3 entries that require pattern
        # replacement.
        path: .chart.values
        pattern: INSERT_[A-Z]+_HERE
        recurse:
          # Note that specifying the depth is mandatory. -1 means that all
          # layers are recursed through.
          depth: -1
      src:
        schema: deckhand/Passphrase/v1
        name: example-password
        path: .
data:
  chart:
    details:
      data: here
    values:
      # Notice string replacement occurs for all paths recursively captured
      # by dest.path, since all their patterns match dest.pattern.
      admin_url: http://admin:INSERT_PASSWORD_HERE@service-name:35357/v1
      internal_url: http://internal:INSERT_PASSWORD_HERE@service-name:5000/v1
      public_url: http://public:INSERT_PASSWORD_HERE@service-name:5000/v1

After document rendering, the output for example-chart-01 (the destination document) will be:

---
schema: armada/Chart/v1
metadata:
  name: example-chart-01
  schema: metadata/Document/v1
  [...]
data:
  chart:
    details:
      data: here
    values:
      # Notice how the data from the source document is injected into the
      # exact location specified by ``dest.pattern``.
      admin_url: http://admin:my-secret-password@service-name:35357/v1
      internal_url: http://internal:my-secret-passwor@service-name:5000/v1
      public_url: http://public:my-secret-passwor@service-name:5000/v1

Note that the recursion depth must be specified. -1 effectively ignores the depth. Any other positive integer will specify how many levels deep to recurse in order to optimize recursive pattern replacement. Take care to specify the required recursion depth or else too-deep patterns won't be replaced.

Source Pattern Matching (Substring Extraction)

In some cases, only a substring of the substitution source is needed in the destination document. For example, the source document may specify a full image path, while the destination chart requires the repo and tag as separate fields.

This type of substitution can be accomplished with the optional parameters: * src.pattern - a regular expression, with optional capture groups. * src.match_group - the number of the desired capture group.

Note

It is an error to specify src.pattern if the substitution source is not a string (e.g. an object or an array).

Note

If the regex does not match, a warning is logged, and the entire source string is used.

Note

The default src.match_group is 0 (i.e. the entire match). This allows the use of expressions like sha256:.* without parentheses, and without explicitly specifying a match group.

For example, given the following source documents, the distinct values for repo and tag will be extracted from the source image:

---
# Source document.
schema: pegleg/SoftwareVersions/v1
metadata:
  schema: metadata/Document/v1
  name: software-versions
  layeringDefinition:
    abstract: false
    layer: global
  storagePolicy: cleartext
data:
  images:
    hello: docker.io/library/hello-world:latest
---
# Destination document.
schema: armada/Chart/v1
metadata:
  name: example-chart-01
  schema: metadata/Document/v1
  layeringDefinition:
    abstract: false
    layer: global
  substitutions:
    - src:
        schema: pegleg/SoftwareVersions/v1
        name: software-versions
        path: .images.hello
        pattern: '^(.*):(.*)'
        match_group: 1
      dest:
        path: .values.images.hello.repo
    - src:
        schema: pegleg/SoftwareVersions/v1
        name: software-versions
        path: .images.hello
        pattern: '^(.*):(.*)'
        match_group: 2
      dest:
        path: .values.images.hello.tag
data:
  values:
    images:
      hello:
        repo:  # docker.io/library/hello-world
        tag:   # latest

Substitution of Encrypted Data

Deckhand allows data to be encrypted using Barbican <encryption>. Substitution of encrypted data works the same as substitution of cleartext data.

Note that during the rendering process, source and destination documents receive the secrets stored in Barbican.