11. Appendices

11.1. Quick Steps to Restarting Squid Proxy Container

Downloading and installing several hundred packages per host while testing provisioning of multiple Vagrant virtual machines can take several hours to perform over a 1-5 Mbps network connection. Even a single Vagrant can take around 45 minutes to fully provision after a vagrant destroy. Since this task may need to be done over and over again, even for just one system, the process becomes very tedious and time consuming.

To minimize the number of remote downloads, a local proxy can help immensely. The DIMS project utilizes a squid-deb-proxy running in a Docker container on VM host systems to allow all of the local VMs to take advantage of a single cacheing proxy on the host. This significantly improves performance (cutting the time down to just a few minutes), but this comes at a cost in occassional instability due to the combination of iptables firewall rules that must contain a DOCKER chain for Docker, which attempts to keep the squid-deb-proxy container running across reboots of the VM host can result in the the container effectively “hanging” from time to time. This manifests as a random failure in an Ansible task that is trying to use the configured proxy (e.g., see the python-virtualenv build failure in Section _using_dims_functions_in_bats.)

A bats test exists to test the proxy:

$ test.runner integration/proxy
[+] Running test integration/proxy
 ✗ [S][EV] HTTP download test (using wget, w/proxy if configured)
   (in test file integration/proxy.bats, line 16)
     `[ ! -z "$(wget -q -O - http://http.us.debian.org/debian/dists/jessie/Release | grep non-free/source/Release 2>/dev/null)" ]' failed
 ✗ [S][EV] HTTPS download test (using wget, w/proxy if configured)
   (in test file integration/proxy.bats, line 26)
     `[ ! -z "$(wget -q -O - https://packages.debian.org/jessie/amd64/0install/filelist | grep 0install 2>/dev/null)" ]' failed

2 tests, 2 failures

This error will manifest itself sometimes when doing development work on Vagrants, as can be seen here:

 $ cd /vm/run/purple
 $ make up && make DIMS_ANSIBLE_ARGS="--tags base" reprovision-local
 [+] Creating Vagrantfile
 . . .
 TASK [base : Only "update_cache=yes" if >3600s since last update (Debian)] ****
 Wednesday 16 August 2017  16:55:35 -0700 (0:00:01.968)       0:00:48.823 ******
 fatal: [purple.devops.local]: FAILED! => {
     "changed": false,
     "failed": true
 }

 MSG:

 Failed to update apt cache.


 RUNNING HANDLER [base : update timezone] **************************************
 Wednesday 16 August 2017  16:56:18 -0700 (0:00:43.205)       0:01:32.028 ******

 PLAY RECAP ********************************************************************
 purple.devops.local        : ok=15   changed=7    unreachable=0    failed=1

 Wednesday 16 August 2017  16:56:18 -0700 (0:00:00.000)       0:01:32.029 ******
 ===============================================================================
 base : Only "update_cache=yes" if >3600s since last update (Debian) ---- 43.21s
 . . .
 make[1]: *** [provision] Error 2
 make[1]: Leaving directory `/vm/run/purple'
 make: *** [reprovision-local] Error 2

When it fails like this, it usually means that iptables must be restarted, followed by restarting the docker service. That usually is enough to fix the problem. If not, it may be necessary to also restart the squid-deb-proxy container.

Note

The cause of this the recreation of the DOCKER chain, which removes the rules added by Docker, when restarting just the iptables-persistent service as can be seen here:

$ sudo iptables -nvL | grep "Chain DOCKER"
Chain DOCKER (2 references)
Chain DOCKER-ISOLATION (1 references)
$ sudo iptables-persistent restart
sudo: iptables-persistent: command not found
$ sudo service iptables-persistent restart
 * Loading iptables rules...
 *  IPv4...
 *  IPv6...
   ...done.
$ sudo iptables -nvL | grep "Chain DOCKER"
Chain DOCKER (0 references)

Restarting the docker service will restore the rules for containers that Docker is keeping running across restarts.

$ sudo service docker restart
docker stop/waiting
docker start/running, process 18276
$ sudo iptables -nvL | grep "Chain DOCKER"
Chain DOCKER (2 references)
Chain DOCKER-ISOLATION (1 references)

The solution for this is to notify a special handler that conditionally restarts the docker service after restarting iptables in order to re-establish the proper firewall rules. The handler is shown here:

 - name: conditional restart docker
   service: name=docker state=restarted
   when: hostvars[inventory_hostname].ansible_docker0 is defined

Use of the handler (from roles/base/tasks/main.yml) is shown here:

 - name: iptables v4 rules (Debian)
   template:
     src: '{{ item }}'
     dest: /etc/iptables/rules.v4
     owner: '{{ root_user }}'
     group: '{{ root_group }}'
     mode: 0o600
     validate: '/sbin/iptables-restore --test %s'
   with_first_found:
     - files:
         - '{{ iptables_rules }}'
         - rules.v4.{{ inventory_hostname }}.j2
         - rules.v4.category-{{ category }}.j2
         - rules.v4.deployment-{{ deployment }}.j2
         - rules.v4.j2
       paths:
         - '{{ dims_private }}/roles/{{ role_name }}/templates/iptables/'
         - iptables/
   notify:
     - "restart iptables ({{ ansible_distribution }}/{{ ansible_distribution_release }})"
     - "conditional restart docker"
   become: yes
   when: ansible_os_family == "Debian"
   tags: [ base, config, iptables ]

A tag iptables exists to allow regeneration of the iptables rules and perform the proper restarting sequence, which should be used instead of just restarting the iptables-persistent service manually. Use ansible-playbook instead (e.g., run.playbook --tags iptables) after making changes to variables that affect iptables rules.

$ cd $GIT/dims-dockerfiles/dockerfiles/squid-deb-proxy

$ for S in iptables-persistent docker; do sudo service $S restart; done
 * Loading iptables rules...
 *  IPv4...
 *  IPv6...
   ...done.
docker stop/waiting
docker start/running, process 22065

$ make rm
docker stop dims.squid-deb-proxy
test.runner -dims.squid-deb-proxy
docker rm dims.squid-deb-proxy
-dims.squid-deb-proxy

$ make daemon
docker run \
          --name dims.squid-deb-proxy \
          --restart unless-stopped \
          -v /vm/cache/apt:/cachedir -p 127.0.0.1:8000:8000 squid-deb-proxy:0.7 2>&1 >/dev/null &
2017/07/22 19:31:29| strtokFile: /etc/squid-deb-proxy/autogenerated/pkg-blacklist-regexp.acl not found
2017/07/22 19:31:29| Warning: empty ACL: acl blockedpkgs urlpath_regex "/etc/squid-deb-proxy/autogenerated/pkg-blacklist-regexp.acl"

The test should now succeed:

$ test.runner --level '*' --match proxy
[+] Running test integration/proxy
 ✓ [S][EV] HTTP download test (using wget, w/proxy if configured)
 ✓ [S][EV] HTTPS download test (using wget, w/proxy if configured)

2 tests, 0 failures

11.2. Recovering From Operating System Corruption

Part of the reason for using a Python virtual environment for development is to encapsulate the development Python and its libraries from the system Python and its libraries, in case a failed upgrade breaks Python. Since Python is a primary dependency of Ansible, a broken system Python is a Very Bad Thing. ™

For example, the following change was attempted to try to upgrade pip packages during application of the base role. Here are the changes:

$ git diff
diff --git a/roles/base/tasks/main.yml b/roles/base/tasks/main.yml
index 3ce57d8..182e7d8 100644
--- a/roles/base/tasks/main.yml
+++ b/roles/base/tasks/main.yml
@@ -717,7 +717,7 @@
 - name: Ensure pip installed for system python
   apt:
     name: '{{ item }}'
-    state: installed
+    state: latest
   with_items:
     - python-pip
   become: yes
@@ -725,7 +725,7 @@
   tags: [ base, config ]

 - name: Ensure required system python packages present
-  shell: 'pip install {{ item }}'
+  shell: 'pip install -U {{ item }}'
   with_items:
     - urllib3
     - pyOpenSSL

Applying the base role against two systems resulted in a series of error messages.

$ ansible-playbook master.yml --limit trident --tags base

. . .

PLAY [Configure host "purple.devops.local"] ***********************************

. . .

TASK [base : Ensure required system python packages present] ******************
Thursday 17 August 2017  10:36:13 -0700 (0:00:01.879)       0:02:22.637 *******
changed: [purple.devops.local] => (item=urllib3)
failed: [purple.devops.local] (item=pyOpenSSL) => {
    "changed": true,
    "cmd": "pip install -U pyOpenSSL",
    "delta": "0:00:07.516760",
    "end": "2017-08-17 10:36:24.256121",
    "failed": true,
    "item": "pyOpenSSL",
    "rc": 1,
    "start": "2017-08-17 10:36:16.739361"
}

STDOUT:

Downloading/unpacking pyOpenSSL from https://pypi.python.org/packages/41/bd/751560b317222ba6b6d2e7663a990ac36465aaa026621c6057db130e2faf/pyOpenSSL-17.2.0-py2.py3-none-any.whl#md5=0f8a4b784b6
81231f03edc8dd28612df
Downloading/unpacking six>=1.5.2 from https://pypi.python.org/packages/c8/0a/b6723e1bc4c516cb687841499455a8505b44607ab535be01091c0f24f079/six-1.10.0-py2.py3-none-any.whl#md5=3ab558cf5d4f7a72
611d59a81a315dc8 (from pyOpenSSL)
  Downloading six-1.10.0-py2.py3-none-any.whl
Downloading/unpacking cryptography>=1.9 (from pyOpenSSL)
  Running setup.py (path:/tmp/pip-build-FCbUwT/cryptography/setup.py) egg_info for package cryptography

    no previously-included directories found matching 'docs/_build'
    warning: no previously-included files matching '*' found under directory 'vectors'
Downloading/unpacking idna>=2.1 (from cryptography>=1.9->pyOpenSSL)
Downloading/unpacking asn1crypto>=0.21.0 (from cryptography>=1.9->pyOpenSSL)
Downloading/unpacking enum34 (from cryptography>=1.9->pyOpenSSL)
  Downloading enum34-1.1.6-py2-none-any.whl
Downloading/unpacking ipaddress (from cryptography>=1.9->pyOpenSSL)
  Downloading ipaddress-1.0.18-py2-none-any.whl
Downloading/unpacking cffi>=1.7 (from cryptography>=1.9->pyOpenSSL)
  Running setup.py (path:/tmp/pip-build-FCbUwT/cffi/setup.py) egg_info for package cffi

Downloading/unpacking pycparser from https://pypi.python.org/packages/8c/2d/aad7f16146f4197a11f8e91fb81df177adcc2073d36a17b1491fd09df6ed/pycparser-2.18.tar.gz#md5=72370da54358202a60130e223d4
88136 (from cffi>=1.7->cryptography>=1.9->pyOpenSSL)
  Running setup.py (path:/tmp/pip-build-FCbUwT/pycparser/setup.py) egg_info for package pycparser

    warning: no previously-included files matching 'yacctab.*' found under directory 'tests'
    warning: no previously-included files matching 'lextab.*' found under directory 'tests'
    warning: no previously-included files matching 'yacctab.*' found under directory 'examples'
    warning: no previously-included files matching 'lextab.*' found under directory 'examples'
Installing collected packages: pyOpenSSL, six, cryptography, idna, asn1crypto, enum34, ipaddress, cffi, pycparser
  Found existing installation: pyOpenSSL 0.14
    Not uninstalling pyOpenSSL at /usr/lib/python2.7/dist-packages, owned by OS
  Found existing installation: six 1.8.0
    Not uninstalling six at /usr/lib/python2.7/dist-packages, owned by OS
  Found existing installation: cryptography 0.6.1
    Not uninstalling cryptography at /usr/lib/python2.7/dist-packages, owned by OS
  Running setup.py install for cryptography

    Installed /tmp/pip-build-FCbUwT/cryptography/cffi-1.10.0-py2.7-linux-x86_64.egg
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-FCbUwT/cryptography/setup.py", line 312, in <module>
        **keywords_with_side_effects(sys.argv)
      File "/usr/lib/python2.7/distutils/core.py", line 111, in setup
        _setup_distribution = dist = klass(attrs)
      File "/usr/lib/python2.7/dist-packages/setuptools/dist.py", line 266, in __init__
        _Distribution.__init__(self,attrs)
      File "/usr/lib/python2.7/distutils/dist.py", line 287, in __init__
        self.finalize_options()
      File "/usr/lib/python2.7/dist-packages/setuptools/dist.py", line 301, in finalize_options
        ep.load()(self, ep.name, value)
      File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 2190, in load
        ['__name__'])
    ImportError: No module named setuptools_ext
    Complete output from command /usr/bin/python -c "import setuptools, tokenize;__file__='/tmp/pip-build-FCbUwT/cryptography/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__)
.read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-qKjzie-record/install-record.txt --single-version-externally-managed --compile:


Installed /tmp/pip-build-FCbUwT/cryptography/cffi-1.10.0-py2.7-linux-x86_64.egg

Traceback (most recent call last):

  File "<string>", line 1, in <module>

  File "/tmp/pip-build-FCbUwT/cryptography/setup.py", line 312, in <module>

    **keywords_with_side_effects(sys.argv)

  File "/usr/lib/python2.7/distutils/core.py", line 111, in setup

    _setup_distribution = dist = klass(attrs)

  File "/usr/lib/python2.7/dist-packages/setuptools/dist.py", line 266, in __init__

    _Distribution.__init__(self,attrs)

  File "/usr/lib/python2.7/distutils/dist.py", line 287, in __init__

    self.finalize_options()

  File "/usr/lib/python2.7/dist-packages/setuptools/dist.py", line 301, in finalize_options

    ep.load()(self, ep.name, value)

  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 2190, in load

    ['__name__'])

ImportError: No module named setuptools_ext

----------------------------------------
  Can't roll back cryptography; was not uninstalled
Cleaning up...
Command /usr/bin/python -c "import setuptools, tokenize;__file__='/tmp/pip-build-FCbUwT/cryptography/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '
\n'), __file__, 'exec'))" install --record /tmp/pip-qKjzie-record/install-record.txt --single-version-externally-managed --compile failed with error code 1 in /tmp/pip-build-FCbUwT/cryptogra
phy
Storing debug log for failure in /root/.pip/pip.log

. . .

PLAY RECAP ********************************************************************
purple.devops.local        : ok=60   changed=35   unreachable=0    failed=1

Thursday 17 August 2017  10:36:29 -0700 (0:00:00.001)       0:02:38.799 *******
===============================================================================
base : Ensure required system python packages present ------------------ 16.16s
base : Ensure dims (system-level) subdirectories exist ----------------- 15.85s
base : Only "update_cache=yes" if >3600s since last update (Debian) ----- 5.65s
base : conditional restart docker --------------------------------------- 5.60s
base : Make sure required APT packages are present (Debian) ------------- 2.14s
base : Clean up dnsmasq build artifacts --------------------------------- 2.09s
base : Make sure blacklisted packages are absent (Debian) --------------- 2.03s
base : Check to see if https_proxy is working --------------------------- 1.99s
base : Log start of 'base' role ----------------------------------------- 1.95s
base : Make backports present for APT on Debian jessie ------------------ 1.89s
base : Ensure pip installed for system python --------------------------- 1.88s
base : Only "update_cache=yes" if >3600s since last update -------------- 1.85s
base : Make dbus-1 development libraries present ------------------------ 1.85s
base : iptables v4 rules (Debian) --------------------------------------- 1.84s
base : iptables v6 rules (Debian) --------------------------------------- 1.84s
base : Make full dnsmasq package present (Debian, not Trusty) ----------- 1.82s
base : Create base /etc/hosts file (Debian, RedHat, CoreOS) ------------- 1.64s
base : Make /etc/rsyslog.d/49-consolidation.conf present ---------------- 1.63s
base : Make dnsmasq configuration present on Debian --------------------- 1.60s
base : Ensure DIMS system shell init hook is present (Debian, CoreOS) --- 1.56s

The base role is supposed to ensure the operating system has the fundamental settings and pre-requisites necessary for all other DIMS roles, so applying that role should hopefully fix things, right?

$ ansible-playbook master.yml --limit trident --tags base

. . .

PLAY [Configure host "purple.devops.local"] ***********************************

. . .

TASK [base : Make sure blacklisted packages are absent (Debian)] **************
Thursday 17 August 2017  11:05:08 -0700 (0:00:01.049)       0:00:30.456 *******
...ignoring
An exception occurred during task execution. To see the full traceback, use
-vvv. The error was: AttributeError: 'FFI' object has no attribute 'new_allocator'
failed: [purple.devops.local] (item=[u'modemmanager', u'resolvconf', u'sendmail']) => {
    "failed": true,
    "item": [
        "modemmanager",
        "resolvconf",
        "sendmail"
    ],
    "module_stderr": "Traceback (most recent call last):\n  File \"/tmp/ansible_ehzfMx/
    ansible_module_apt.py\", line 239, in <module>\n    from ansible.module_utils.urls import fetch_url\n
File \"/tmp/ansible_ehzfMx/ansible_modlib.zip/ansible/module_utils/urls.py\", line 153,
in <module>\n  File \"/usr/local/lib/python2.7/dist-packages/urllib3/contrib/pyopenssl.py\", line 46,
in <module>\n    import OpenSSL.SSL\n  File \"/usr/local/lib/python2.7/dist-packages/OpenSSL/__init__.py\",
line 8, in <module>\n    from OpenSSL import rand, crypto, SSL\n  File \"/usr/local/lib/
python2.7/dist-packages/OpenSSL/rand.py\", line 10, in <module>\n    from OpenSSL._util
import (\n  File \"/usr/local/lib/python2.7/dist-packages/OpenSSL/_util.py\", line 18, in
<module>\n    no_zero_allocator = ffi.new_allocator(should_clear_after_alloc=False)\n
AttributeError: 'FFI' object has no attribute 'new_allocator'\n",
    "module_stdout": "",
    "rc": 1
}

MSG:

MODULE FAILURE


TASK [base : Only "update_cache=yes" if >3600s since last update (Debian)] ****
Thursday 17 August 2017  11:05:10 -0700 (0:00:01.729)       0:00:32.186 *******
An exception occurred during task execution. To see the full traceback, use -vvv.
The error was: AttributeError: 'FFI' object has no attribute 'new_allocator'
fatal: [purple.devops.local]: FAILED! => {
    "changed": false,
    "failed": true,
    "module_stderr": "Traceback (most recent call last):\n  File \"/tmp/ansible_ganqlZ/
    ansible_module_apt.py\", line 239, in <module>\n    from ansible.module_utils.urls import fetch_url\n
File \"/tmp/ansible_ganqlZ/ansible_modlib.zip/ansible/module_utils/urls.py\", line 153, in
<module>\n  File \"/usr/local/lib/python2.7/dist-packages/urllib3/contrib/pyopenssl.py\", line 46,
in <module>\n    import OpenSSL.SSL\n  File \"/usr/local/lib/python2.7/dist-packages/
OpenSSL/__init__.py\", line 8, in <module>\n    from OpenSSL import rand, crypto, SSL\n
File \"/usr/local/lib/python2.7/dist-packages/OpenSSL/rand.py\", line 10, in <module>\n
from OpenSSL._util import (\n  File \"/usr/local/lib/python2.7/dist-packages/OpenSSL/_util.py\",
line 18, in <module>\n    no_zero_allocator = ffi.new_allocator(should_clear_after_alloc=False)\n
AttributeError: 'FFI' object has no attribute 'new_allocator'\n",
    "module_stdout": "",
    "rc": 1
}

MSG:

MODULE FAILURE


RUNNING HANDLER [base : update timezone] **************************************
Thursday 17 August 2017  11:05:11 -0700 (0:00:01.530)       0:00:33.716 *******

PLAY RECAP ********************************************************************
purple.devops.local        : ok=14   changed=7    unreachable=0    failed=1

Thursday 17 August 2017  11:05:11 -0700 (0:00:00.001)       0:00:33.717 *******
===============================================================================
base : Log start of 'base' role ----------------------------------------- 1.88s
base : Make sure blacklisted packages are absent (Debian) --------------- 1.73s
base : Create base /etc/hosts file (Debian, RedHat, CoreOS) ------------- 1.55s
base : Only "update_cache=yes" if >3600s since last update (Debian) ----- 1.53s
base : Set timezone variables (Debian) ---------------------------------- 1.53s
base : iptables v6 rules (Debian) --------------------------------------- 1.48s
base : iptables v4 rules (Debian) --------------------------------------- 1.48s
base : Ensure getaddrinfo configuration is present (Debian) ------------- 1.48s
base : Check to see if dims.logger exists yet --------------------------- 1.31s
base : Set domainname (Debian, CoreOS) ---------------------------------- 1.17s
base : Check to see if gpk-update-viewer is running on Ubuntu ----------- 1.16s
base : Set hostname (runtime) (Debian, CoreOS) -------------------------- 1.16s
base : Make /etc/hostname present (Debian, CoreOS) ---------------------- 1.16s
base : Disable IPv6 in kernel on non-CoreOS ----------------------------- 1.16s
debug : include --------------------------------------------------------- 1.07s
base : iptables v4 rules (CoreOS) --------------------------------------- 1.06s
base : iptables v6 rules (CoreOS) --------------------------------------- 1.06s
debug : debug ----------------------------------------------------------- 1.05s
debug : debug ----------------------------------------------------------- 1.05s
debug : debug ----------------------------------------------------------- 1.05s

Since Debian apt is a Python program, it requires Python to install packages. The Python packages are corrupted, so Python will not work properly. This creates a deadlock condition. There is another way to install Python packages, however, so it can be used via Ansible ad-hoc mode:

$ ansible -m shell --become -a 'easy_install -U cffi' trident
yellow.devops.local | SUCCESS | rc=0 >>
Searching for cffi
Reading https://pypi.python.org/simple/cffi/
Best match: cffi 1.10.0
Downloading https://pypi.python.org/packages/5b/b9/790f8eafcdab455bcd3bd908161f802c9ce5adbf702a83aa7712fcc345b7/cffi-1.10.0.tar.gz#md5=2b5fa41182ed0edaf929a789e602a070
Processing cffi-1.10.0.tar.gz
Writing /tmp/easy_install-RmOJBU/cffi-1.10.0/setup.cfg
Running cffi-1.10.0/setup.py -q bdist_egg --dist-dir /tmp/easy_install-RmOJBU/cffi-1.10.0/egg-dist-tmp-lNCOck
compiling '_configtest.c':
__thread int some_threadlocal_variable_42;

x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -fPIC -c
 _configtest.c -o _configtest.o
success!
removing: _configtest.c _configtest.o
compiling '_configtest.c':
int main(void) { __sync_synchronize(); return 0; }

x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -fPIC -c
 _configtest.c -o _configtest.o
x86_64-linux-gnu-gcc -pthread _configtest.o -o _configtest
success!
removing: _configtest.c _configtest.o _configtest
Adding cffi 1.10.0 to easy-install.pth file

Installed /usr/local/lib/python2.7/dist-packages/cffi-1.10.0-py2.7-linux-x86_64.egg
Processing dependencies for cffi
Finished processing dependencies for cffi

purple.devops.local | SUCCESS | rc=0 >>
Searching for cffi
Reading https://pypi.python.org/simple/cffi/
Best match: cffi 1.10.0
Downloading https://pypi.python.org/packages/5b/b9/790f8eafcdab455bcd3bd908161f802c9ce5adbf702a83aa7712fcc345b7/cffi-1.10.0.tar.gz#md5=2b5fa41182ed0edaf929a789e602a070
Processing cffi-1.10.0.tar.gz
Writing /tmp/easy_install-fuS4hd/cffi-1.10.0/setup.cfg
Running cffi-1.10.0/setup.py -q bdist_egg --dist-dir /tmp/easy_install-fuS4hd/cffi-1.10.0/egg-dist-tmp-nOgko4
compiling '_configtest.c':
__thread int some_threadlocal_variable_42;

x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -fPIC -c
 _configtest.c -o _configtest.o
success!
removing: _configtest.c _configtest.o
compiling '_configtest.c':
int main(void) { __sync_synchronize(); return 0; }

x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -fPIC -c
 _configtest.c -o _configtest.o
x86_64-linux-gnu-gcc -pthread _configtest.o -o _configtest
success!
removing: _configtest.c _configtest.o _configtest
Adding cffi 1.10.0 to easy-install.pth file

Installed /usr/local/lib/python2.7/dist-packages/cffi-1.10.0-py2.7-linux-x86_64.egg
Processing dependencies for cffi
Finished processing dependencies for cffi

Now we can back out the addition of the -U flag that caused the corruption and apply the base role to the two hosts using the master.yml playbook.

$ ansible-playbook master.yml --limit trident --tags base

. . .

PLAY [Configure host "purple.devops.local"] ***********************************

. . .

PLAY [Configure host "yellow.devops.local"] ***********************************

. . .

PLAY RECAP ********************************************************************
purple.devops.local        : ok=136  changed=29   unreachable=0    failed=0
yellow.devops.local        : ok=139  changed=53   unreachable=0    failed=0

Thursday 17 August 2017  11:20:08 -0700 (0:00:01.175)       0:10:03.307 *******
===============================================================================
base : Make defined bats tests present --------------------------------- 29.18s
base : Make defined bats tests present --------------------------------- 28.95s
base : Ensure dims (system-level) subdirectories exist ----------------- 15.89s
base : Ensure dims (system-level) subdirectories exist ----------------- 15.84s
base : Ensure required system python packages present ------------------- 8.81s
base : Make sure common (non-templated) BASH scripts are present -------- 8.79s
base : Make sure common (non-templated) BASH scripts are present -------- 8.74s
base : Ensure required system python packages present ------------------- 8.71s
base : Make subdirectories for test categories present ------------------ 6.84s
base : Make links to helper functions present --------------------------- 6.83s
base : Make subdirectories for test categories present ------------------ 6.83s
base : Make links to helper functions present --------------------------- 6.81s
base : Ensure bashrc additions are present ------------------------------ 4.63s
base : Ensure bashrc additions are present ------------------------------ 4.59s
base : Only "update_cache=yes" if >3600s since last update (Debian) ----- 4.45s
base : Make sure common (non-templated) Python scripts are present ------ 3.77s
base : Make sure common (non-templated) Python scripts are present ------ 3.77s
base : conditional restart docker --------------------------------------- 3.17s
base : Make sure common (templated) scripts are present ----------------- 2.96s
base : Make sure common (templated) scripts are present ----------------- 2.94s

In this case, the systems are now back to a functional state and the disruptive change backed out. Were these Vagrants, the problem of a broken system is lessened, so testing should always be done first on throw-away VMs. But on those occassions where something goes wrong on “production” hosts, Ansible ad-hoc mode is a powerful debugging and corrective capability.

11.3. Advanced Ansible Tasks or Jinja Templating

This section includes some advanced uses of Ansible task declaration and/or Jinja templating that may be difficult to learn from Ansible documentation or other sources. Some useful resources that were identified during the DIMS Project are listed in Section bestpractices.

11.3.1. Multi-line fail or debug Output

There are times when it is necessary to produce a long message in a fail or debug play. An answer to the stackoverflow post In YAML, how do I break a string over multiple lines? includes multiple ways to do this. Here is one of them in action in the virtualbox role:

---

# File: roles/virtualbox/tasks/main.yml

# Note: You can't just run 'vboxmanage list runningvms' to get
# a list of running VMs, unless running as the user that started
# the VMs. (Virtualbox keeps state on a per-user basis.)
# Instead, we are looking for running processes.

- name: Look for running VM guests
  shell: "ps -Ao user,pid,cmd|egrep '^USER|virtualbox/VBox.* --startvm'|grep -v ' egrep '"
  register: _vbox_result
  delegate_to: '{{ inventory_hostname }}'
  tags: [ virtualbox, packages ]

- name: Register number of running VM guests
  set_fact:
    _running_vms: '{{ ((_vbox_result.stdout_lines|length) - 1) }}'
  when: _vbox_result is defined
  tags: [ virtualbox, packages ]

- block:
  - fail: msg='Could not determine number of running VMs'
    when: _vbox_result is not defined or _running_vms is not defined
    tags: [ virtualbox, packages ]

  rescue:
  - name: Register failure
    set_fact:
      _running_vms: -1
    tags: [ virtualbox, packages ]

- fail:
    msg: |
      Found {{ _running_vms }} running Virtualbox VM{{ (_running_vms|int == 1)|ternary("","s") }}.
      Virtualbox cannot be updated while VMs are running.
      Please halt or suspend {{ (_running_vms|int == 1)|ternary("this VM","these VMs") }} and apply this role again.
      {% raw %}{% endraw %}
      {% for line in _vbox_result.stdout_lines|default([]) %}
      {{ line }}
      {% endfor %}
  when: _running_vms is defined and _running_vms|int >= 1
  tags: [ virtualbox, packages ]

- include: virtualbox.yml
  when: _running_vms is defined and _running_vms|int == 0
  tags: [ virtualbox, packages ]

# vim: ft=ansible :

When this code is triggered, the output is now clean and clear about what to do.

TASK [virtualbox : fail] *******************************************************************
task path: /home/dittrich/dims/git/ansible-dims-playbooks/roles/virtualbox/tasks/main.yml:33
Wednesday 06 September 2017  12:45:38 -0700 (0:00:01.046)       0:00:51.117 ***
fatal: [dimsdemo1.devops.develop]: FAILED! => {
    "changed": false,
    "failed": true
}

MSG:

Found 1 running Virtualbox VM.
Virtualbox cannot be updated while VMs are running.
Please halt or suspend this VM and apply this role again.

USER       PID CMD
dittrich 15289 /usr/lib/virtualbox/VBoxHeadless --comment orange_default_1504485887221_79778 --startvm 62e20c31-7c2c-417a-a5ab-3a056aa81e2d --vrde config