zuul-jobs/roles/mirror-workspace-git-repos/tasks/main.yaml
Felix Edel 7761396303 mirror-workspace-git-repos: Retry on failure in git update task
We occasionally see this task fail for the first element in the
zuul.projects list with a MODULE FAILURE and a return code of -13
(SIGPIPE) [1]. So far we couldn't identify the root cause, so try to
mitigate this issue by retrying on failure. This solution is similar to
the one used for the "Synchronize repos" task[2].

Since only the first element in the loop fails while subsequent
elements succeed, we currently have three assumptions:

  1. As the task before is using a `delegate_to: localhost` [3],
     there might be a problem with Ansible when switching the connection
     from localhost to the remote host (node).
  2. Since the task before is using the same SSH connection [4] that is
     used by Ansible to push the git repository, there might be some
     "leftovers" on the connection that make the next task fail.
  3. There is also a bug report in Ansible [5] which might be causing
     that error.

[1]:
    {
        "ansible_loop_var": "zj_project",
        "changed": false,
        "failed": true,
        "module_stderr": "",
        "module_stdout": "",
        "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error",
        "rc": -13,
        "zj_project": {...}
    }

[2]: 3b3495e255/roles/mirror-workspace-git-repos/tasks/main.yaml (L32)
[3]: 3b3495e255/roles/mirror-workspace-git-repos/tasks/main.yaml (L25)
[4]: 3b3495e255/roles/mirror-workspace-git-repos/tasks/main.yaml (L16)
[5]: https://github.com/ansible/ansible/issues/81777

Change-Id: I0c4cb87bb076b9b40c9c446dbe5db437daff5897
2023-12-08 06:37:55 -08:00


- name: Allow pushing to non-bare repo
  git_config:
    name: receive.denyCurrentBranch
    value: ignore
    scope: local
    repo: "{{ zuul_workspace_root }}/{{ zj_project.value.src_dir }}"
  with_dict: "{{ zuul.projects }}"
  loop_control:
    loop_var: zj_project

- name: Synchronize src repos to workspace directory
  command: |-
    {% if ansible_connection == "kubectl" %}
      git push {% if mirror_workspace_quiet %}--quiet{% endif %} --mirror "ext::kubectl --context {{ zuul.resources[inventory_hostname].context }} -n {{ zuul.resources[inventory_hostname].namespace }} exec -i {{ zuul.resources[inventory_hostname].pod }} -- %S {{ zuul_workspace_root }}/{{ zj_project.value.src_dir }}"
    {% else %}
      git push {% if mirror_workspace_quiet %}--quiet{% endif %} --mirror git+ssh://{{ ansible_user }}@{{ ansible_host | ipwrap }}:{{ ansible_port }}/{{ zuul_workspace_root }}/{{ zj_project.value.src_dir }}
    {% endif %}
  args:
    chdir: "{{ zuul.executor.work_root }}/{{ zj_project.value.src_dir }}"
  environment:
    GIT_ALLOW_PROTOCOL: ext:ssh
  with_dict: "{{ zuul.projects }}"
  loop_control:
    loop_var: zj_project
  delegate_to: localhost
  # We occasionally see git pushes in the middle of this loop fail then
  # subsequent pushes for other repos succeed. The entire loop ends up
  # failing because one of the pushes failed. Mitigate this by retrying
  # on failure.
  register: git_push
  until: git_push is success
  retries: 3
  # ANSIBLE0006: Skip linting since it triggers on the "git" command,
  # but push is not supported by ansible git module.
  tags:
    - skip_ansible_lint

# Do this as a multi-line shell so that we can do the loop once
- name: Update remote repository state correctly
  shell: |
    set -eu
    # Reset is needed because we pushed to a non-bare repo
    git reset --hard
    # Clean is needed because we pushed to a non-bare repo
    git clean -xdf
    # Undo the config setting we did above
    git config --local --unset receive.denyCurrentBranch
    # checkout the branch matching the branch set up by the executor
    git checkout {% if mirror_workspace_quiet %}--quiet{% endif %} {{ zj_project.value.checkout }}
    # put out a status line with the current HEAD
    echo "{{ zj_project.value.canonical_name }} checked out to:"
    git log --pretty=oneline -1
  args:
    chdir: "{{ zuul_workspace_root }}/{{ zj_project.value.src_dir }}"
  with_dict: "{{ zuul.projects }}"
  loop_control:
    loop_var: zj_project
  # We occasionally see this task fail for the first element in the
  # zuul.projects list with a MODULE FAILURE and a return code of -13
  # (SIGPIPE). This may be caused by
  # https://github.com/ansible/ansible/issues/81777
  # Try to mitigate this issue by retrying on failure.
  register: git_update
  until: git_update is success
  retries: 3
  # ANSIBLE0006: Skip linting since it triggers on the "git" command,
  # but we prefer the shell above
  tags:
    - skip_ansible_lint
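
The retry mitigation used by the push and update tasks can be sketched in isolation as follows. This is a minimal illustration, not part of the role: the task name, the `/bin/true` command, the loop items, the `zj_example` loop variable, the `example_result` variable and the `delay` value are all assumptions chosen for the example. `delay` is a standard Ansible task keyword that sets the pause in seconds between attempts; the role itself does not set it.

```yaml
# Illustrative only: a looped task that is retried when it fails.
# All names and values here are hypothetical and not part of the role.
- name: Example task retried on failure
  command: /bin/true
  loop:
    - first
    - second
  loop_control:
    loop_var: zj_example
  # Store the result so the until condition can inspect it
  register: example_result
  # Re-run the task until it reports success
  until: example_result is success
  # Number of retries allowed before the task is marked as failed
  retries: 3
  # Seconds to wait between attempts (assumed value)
  delay: 5
```

As far as we understand, Ansible evaluates `until` separately for each loop item, so a failure on the first item should be retried on its own without repeating the items that already succeeded.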