8. Regular System Maintenance

This chapter covers regular system maintenance tasks, such as updating the ansible-dims-playbooks repo and its related private customization repository, upgrading operating system packages, and generally keeping system components clean and up to date.

8.1. Updating Operating System Packages

Updating system packages, especially applying security patches, is an important part of ensuring the integrity, availability, and confidentiality of information and information systems. Availability is sometimes a concern when applying updates, since an update can disrupt running services. The multi-deployment model adopted by the DIMS Project helps allay this concern: updates can be applied to a test deployment and the patched system components tested there before the same updates are applied to “production” deployment systems.

Two bats system tests, system/updates and system/reboot, are designed to make the normal system update process easier to automate and to apply across the entire deployment. Both tests can be run at once using the following command line:

$ test.runner --match "updates|reboot"
[+] Running test system/updates
 ✗ [S][EV] All APT packages are up to date (Ubuntu)
   (from function `assert' in file system/helpers.bash, line 18,
    in test file system/updates.bats, line 11)
     `assert "0 packages can be updated. 0 updates are security updates." bash -c "/usr/lib/update-notifier/apt-check --human-readable"' failed
   linux-headers-4.4.0-92-generic
   google-chrome-stable
   xul-ext-ubufox
   firefox
   linux-image-4.4.0-92-generic
   linux-generic-lts-xenial
   libgd3
   linux-headers-4.4.0-92
   linux-headers-generic-lts-xenial
   linux-image-extra-4.4.0-92-generic
   linux-image-generic-lts-xenial
   expected: "0 packages can be updated. 0 updates are security updates."
   actual:   "11 packages can be updated.10 updates are security updates."

1 test, 1 failure

[+] Running test system/reboot
 ✓ [S][EV] System does not require a reboot (Ubuntu)

1 test, 0 failures

In this case, the tests show that the system has 11 package updates ready to apply, 10 of which are security updates. The updates test failed, but the reboot test passed.
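The failure message above shows that, on Ubuntu, the system/updates test compares the output of /usr/lib/update-notifier/apt-check --human-readable against a fixed string. When the same information is needed in another script, the machine-readable form is easier to parse. The following is a minimal sketch, not part of ansible-dims-playbooks; it assumes that apt-check, when run without --human-readable, writes the two counts to stderr as `total;security` (as it does on Ubuntu 16.04). The function name updates_summary is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ansible-dims-playbooks).
# Takes the "total;security" string that apt-check writes to stderr,
# prints a one-line summary, and returns 1 when updates are pending.
updates_summary() {
    local total security
    IFS=';' read -r total security <<< "$1"
    if [ "${total:-0}" -eq 0 ]; then
        echo "up to date"
        return 0
    fi
    echo "${total} packages pending (${security:-0} security)"
    return 1
}

# Usage on Ubuntu ("2>&1 >/dev/null" keeps only apt-check's stderr):
#   updates_summary "$(/usr/lib/update-notifier/apt-check 2>&1 >/dev/null)"
```

For the host shown above, this would report 11 packages pending, 10 of them security updates.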

Now run the playbook with the updates tag to install the available upgrades:

$ run.playbook --tags updates -e packages_upgrade=yes

PLAY [Configure host "dimsdemo1.devops.develop"] ******************************
. . .
TASK [base : Check to see if update-manager is running on Ubuntu] *************
Wednesday 16 August 2017  13:06:29 -0700 (0:00:01.049)       0:00:05.392 ******
changed: [dimsdemo1.devops.develop]

TASK [base : Kill update_manager to avoid dpkg lock contention] ***************
Wednesday 16 August 2017  13:06:30 -0700 (0:00:01.239)       0:00:06.631 ******
skipping: [dimsdemo1.devops.develop]

TASK [base : Check to see if gpk-update-viewer is running on Ubuntu] **********
Wednesday 16 August 2017  13:06:31 -0700 (0:00:01.049)       0:00:07.681 ******
skipping: [dimsdemo1.devops.develop]

TASK [base : Kill gpk-update-viewer to avoid dpkg lock contention] ************
Wednesday 16 August 2017  13:06:32 -0700 (0:00:01.048)       0:00:08.729 ******
skipping: [dimsdemo1.devops.develop]

TASK [base : Make sure blacklisted packages are absent (Debian)] **************
Wednesday 16 August 2017  13:06:33 -0700 (0:00:01.084)       0:00:09.814 ******
ok: [dimsdemo1.devops.develop] => (item=[u'modemmanager', u'resolvconf',
u'sendmail', u'whoopsie', u'libwhoopsie0'])

TASK [base : Only "update_cache=yes" if >3600s since last update (Debian)] ****
Wednesday 16 August 2017  13:06:35 -0700 (0:00:02.015)       0:00:11.829 ******
ok: [dimsdemo1.devops.develop]

TASK [base : Make sure required APT packages are present (Debian)] ************
Wednesday 16 August 2017  13:06:37 -0700 (0:00:01.610)       0:00:13.440 ******
ok: [dimsdemo1.devops.develop] => (item=[u'apt-transport-https', u'bash-completion',
u'ca-certificates', u'cpanminus', u'curl', u'dconf-tools', u'git-core',
u'default-jdk', u'gitk', u'gnupg2', u'htop', u'hunspell', u'iptables-persistent',
u'ifstat', u'make', u'myrepos', u'netcat', u'nfs-common', u'chrony', u'ntpdate',
u'openssh-server', u'patch', u'perl', u'postfix', u'python', u'python-apt',
u'remake', u'rsync', u'rsyslog', u'sshfs', u'sshpass', u'strace', u'tree', u'vim',
u'xsltproc', u'chrony', u'nfs-kernel-server', u'smartmontools', u'unzip'])

TASK [base : Make upgraded packages present if we are explicitly upgrading] ***
Wednesday 16 August 2017  13:06:38 -0700 (0:00:01.750)       0:00:15.190 ******
changed: [dimsdemo1.devops.develop]

TASK [base : Check proxy availability] ****************************************
Wednesday 16 August 2017  13:09:12 -0700 (0:02:33.389)       0:02:48.580 ******
. . .
PLAY RECAP ********************************************************************
dimsdemo1.devops.develop   : ok=72   changed=4    unreachable=0    failed=0

Wednesday 16 August 2017  13:10:28 -0700 (0:00:01.069)       0:04:04.737 ******
===============================================================================
base : Make upgraded packages present if we are explicitly upgrading -- 153.39s
. . .

Note

The flag -e packages_upgrade=yes sets the variable packages_upgrade, which must evaluate to true for the role to upgrade packages. This ensures that package updates are only done in a controlled manner. Set this variable to a value that Ansible evaluates as true, either on the command line or in the host vars section of the inventory.
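Which values count as true depends on how the role tests the variable, which is not shown here. Assuming the role uses Ansible's bool filter, the Ansible 2.x releases of this era accept the strings yes, on, 1, and true (compared case-insensitively) as true and treat any other string as false. The hypothetical shell function below mirrors that rule as a quick reference; it is a sketch under that assumption, not part of the playbooks.

```shell
#!/usr/bin/env bash
# Sketch only: mirrors the strings that Ansible's "bool" filter accepts
# as true (yes, on, 1, true; case-insensitive). Assumes the role tests
# packages_upgrade with "| bool". Requires bash 4+ for ${var,,}.
is_ansible_true() {
    case "${1,,}" in
        yes|on|1|true) return 0 ;;
        *) return 1 ;;
    esac
}

# is_ansible_true yes  -> succeeds
# is_ansible_true no   -> fails
```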

Now re-run the two tests.

$ test.runner --match "updates|reboot"
[+] Running test system/updates
 ✓ [S][EV] All APT packages are up to date (Ubuntu)

1 test, 0 failures

[+] Running test system/reboot
 ✗ [S][EV] System does not require a reboot (Ubuntu)
   (in test file system/reboot.bats, line 8)
     `@test "[S][EV] System does not require a reboot (Ubuntu)" {' failed
   linux-image-4.4.0-92-generic
   linux-base
   linux-base

1 test, 1 failure

This time the updates test passes, but some of the installed updates require a reboot, so the reboot test now fails. The reboot needs to be planned and executed carefully to minimize disruption to anything that depends on this system (e.g., virtual machines running on a development system).
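On Ubuntu, packages that need a reboot after installation (typically kernel packages, as in the output above) create the flag file /var/run/reboot-required and append their names to /var/run/reboot-required.pkgs, where a name can appear more than once (note linux-base listed twice above). Whether the system/reboot test reads exactly these files is an assumption. The sketch below shows how a script could report the same condition; the function name reboot_required is hypothetical, and the flag path can be overridden for testing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a reboot is pending, using the
# Ubuntu update-notifier convention of a flag file plus a ".pkgs" list.
# Returns 0 (and lists each package once) if a reboot is required.
reboot_required() {
    local flag="${1:-/var/run/reboot-required}"
    local pkgs="${flag}.pkgs"
    [ -e "$flag" ] || return 1
    echo "Reboot required by:"
    if [ -r "$pkgs" ]; then
        # The .pkgs file may repeat names; sort -u removes duplicates.
        sort -u "$pkgs" | sed 's/^/  /'
    fi
    return 0
}
```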

Attention

A developer workstation or production VM host that runs virtual machines must have its virtual machines shut down or suspended before the VM host is rebooted. Otherwise, the VMs or the VM host may lose network interfaces that use DHCP: the VM host may lose a vboxnet interface, a VM may lose an eth interface, or both.

  • Vagrants are handled as part of the shutdown process when you use the dims.shutdown wrapper script. After reboot, use dims.shutdown --resume (optionally with --group to select specific Vagrants by name or group) to resume them.
  • VirtualBox VMs that were created by hand are not yet supported by dims.shutdown. Use the VirtualBox management GUI to cleanly shut down any running VMs (and, after the reboot, to bring them back up). If this is a remote VM host, use remmina and the VNC wrapper script described in Section Validating VNC over SSH Tunnelling to run the VirtualBox management GUI remotely.
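For hand-created VMs, VirtualBox's own command-line tool is an alternative to the management GUI (this is not something dims.shutdown does). VBoxManage list runningvms prints one line per running VM in the form "name" {uuid}, and VBoxManage controlvm <uuid> savestate saves a VM's state and stops it. The sketch below, whose function name and list file are hypothetical, extracts the UUIDs so the VMs can be saved before the reboot and started again afterwards with VBoxManage startvm.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of dims.shutdown): read the output of
# "VBoxManage list runningvms" on stdin, where each line looks like
#   "vm name" {01234567-89ab-cdef-0123-456789abcdef}
# and print just the UUIDs, one per line.
running_vm_uuids() {
    sed -n 's/.*{\([0-9a-fA-F-]*\)}[[:space:]]*$/\1/p'
}

# Before rebooting (as the user that owns the VMs), save each VM's state
# and keep the list (hypothetical file name) for restarting them later:
#   VBoxManage list runningvms | running_vm_uuids > ~/running-vms.txt
#   for uuid in $(cat ~/running-vms.txt); do VBoxManage controlvm "$uuid" savestate; done
# After the reboot (use --type gui instead if the VMs had a console window):
#   for uuid in $(cat ~/running-vms.txt); do VBoxManage startvm "$uuid" --type headless; done
```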

Using Ansible ad-hoc mode, the checks can be performed on multiple hosts at once:

$ ansible -m shell -a 'test.runner --match reboot' trident
yellow.devops.develop | SUCCESS | rc=0 >>
# [+] Running test system/reboot
1..1
ok 1 [S][EV] System does not require a reboot (Debian)
#

purple.devops.develop | SUCCESS | rc=0 >>
# [+] Running test system/reboot
1..1
ok 1 [S][EV] System does not require a reboot (Debian)
#
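Because test.runner emits TAP-style results in this mode, the ad-hoc output for many hosts can be reduced to just the failing tests. The sketch below assumes the ad-hoc output format shown above, in which each host's results follow a header line such as `yellow.devops.develop | SUCCESS | rc=0 >>` (a failed command shows `FAILED` instead of `SUCCESS`); the function name tap_failures is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read Ansible ad-hoc output containing TAP results
# (as produced by "test.runner --tap") on stdin, print each failing test
# prefixed by the host it came from, and return 1 if any test failed.
tap_failures() {
    local line host="" failed=0
    while IFS= read -r line; do
        case "$line" in
            # Host header, e.g. "yellow.devops.develop | SUCCESS | rc=0 >>"
            *" | "*"rc="*) host="${line%% |*}" ;;
            "not ok "*) echo "${host:+$host: }$line"; failed=1 ;;
        esac
    done
    return "$failed"
}

# Usage:
#   ansible -m shell -a 'test.runner --tap --match "updates|reboot"' trident | tap_failures
```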

As a convenience for the system administrator, the base role manages a cron job that runs a script named dims.updatecheck daily. The variables that control the cron job are defined in the group_vars/all/dims.yml file:

cronjobs:
  - name: 'dims.updatecheck'
    weekday: '*'
    hour: '6'
    minute: '0'
    user: 'ansible'
    job: '{{ dims_bin }}/dims.updatecheck'

The base role creates the following file:

$ cat /etc/cron.d/dims
#Ansible: dims.updatecheck
0 6 * * * ansible /opt/dims/bin/dims.updatecheck

When updates are available, or a reboot is required, email is sent to the root account. Make sure that email to this account is forwarded by setting the postmaster variable to a valid email address. An example of the message that will be sent is shown here:

To: dittrich@u.washington.edu
Subject: dims.updatecheck results from purple.ops.ectf (2017-09-01T23:06:02.211268+00:00)
Message-Id: <20170901230603.9D3C3582@breathe.prisem.washington.edu>
Date: Fri,  1 Sep 2017 16:06:03 -0700 (PDT)
From: root@breathe.prisem.washington.edu (root)

-----------------------------------------------------------------------

Host: purple.ops.ectf
Date: 2017-09-01T23:06:02.211268+00:00

This is a report of available package updates and/or required reboot
status.  The output of the bats tests that were run is included below.

If package updates are necessary, this can be accomplished by running
the Ansible playbook for purple.ops.ectf with the following options:

   --tags updates -e packages_update=true

If a reboot is necessary, ensure that the host (and anyone using it)
is prepared for the reboot:

  o Ensure that all users of external services are aware of any
    potential outage of services provided by this host (or its
    (VMs).

  o Halt or suspend any VMs if this is a VM host (and be prepared
    to ensure they are restart after rebooting is complete.)
    (Use the "dims.shutdown" script to facilitate this. See
    documentation and/or "dims.shutdown --usage".)

  o Notify any active users to ensure no active development work
    is lost.

-----------------------------------------------------------------------
test.runner --tap --match "updates|reboot"

# [+] Running test system/updates
1..1
not ok 1 [S][EV] All APT packages are up to date (Debian)
# (from function `assert' in file system/helpers.bash, line 18,
#  in test file system/updates.bats, line 12)
#   `assert "0 packages can be updated." bash -c "apt list --upgradable 2>/dev/null"' failed
#
# WARNING: apt does not have a stable CLI interface yet. Use with caution in scripts.
#
# expected: "0 packages can be updated."
# actual:   "Listing...firefox-esr/oldstable 52.3.0esr-1~deb8u2 amd64 [upgradable from: 52.2.0esr-1~deb8u1]gir1.2-soup-2.4/oldstable 2.48.0-1+deb8u1 amd64 [upgradable from: 2.48.0-1]git/oldstable 1:2.1.4-2.1+deb8u4 amd64 [upgradable from: 1:2.1.4-2.1+deb8u3]git-core/oldstable 1:2.1.4-2.1+deb8u4 all [upgradable from: 1:2.1.4-2.1+deb8u3]git-man/oldstable 1:2.1.4-2.1+deb8u4 all [upgradable from: 1:2.1.4-2.1+deb8u3]gitk/oldstable 1:2.1.4-2.1+deb8u4 all [upgradable from: 1:2.1.4-2.1+deb8u3]iceweasel/oldstable 52.3.0esr-1~deb8u2 all [upgradable from: 52.2.0esr-1~deb8u1]libdbd-pg-perl/jessie-pgdg 3.6.2-1~pgdg80+1 amd64 [upgradable from: 3.4.2-1]libgd3/oldstable 2.1.0-5+deb8u10 amd64 [upgradable from: 2.1.0-5+deb8u9]libpq5/jessie-pgdg 9.6.4-1.pgdg80+1 amd64 [upgradable from: 9.4.13-0+deb8u1]libsoup-gnome2.4-1/oldstable 2.48.0-1+deb8u1 amd64 [upgradable from: 2.48.0-1]libsoup2.4-1/oldstable 2.48.0-1+deb8u1 amd64 [upgradable from: 2.48.0-1]"
#
# [+] Running test system/reboot
1..1
ok 1 [S][EV] System does not require a reboot (Debian)
#

-----------------------------------------------------------------------
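To confirm where mail for root currently goes on a given host, look at its aliases file, which Postfix consults for local delivery. How the role turns the postmaster variable into an alias is not covered here, so the sketch below simply reads the root entry from /etc/aliases (or another file given as an argument); the function name root_alias is hypothetical. An empty result means mail for root is delivered locally. After editing /etc/aliases by hand, run newaliases so that Postfix picks up the change.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the forwarding target(s) for root from an
# aliases(5) file, e.g. "root: admin@example.com" -> admin@example.com.
root_alias() {
    local aliases="${1:-/etc/aliases}"
    awk -F: '$1 == "root" { sub(/^[ \t]+/, "", $2); print $2 }' "$aliases"
}
```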

8.2. Renewing Letsencrypt Certificates

The imported role ansible-role-certbot, which provides Letsencrypt support, creates a crontab entry in the ansible account that automatically renews the certificate when it is about to expire. You can see the crontab entry using Ansible ad-hoc mode:

$ ansible -m shell -a 'crontab -l' trident
yellow.devops.develop | SUCCESS | rc=0 >>
#Ansible: Certbot automatic renewal.
20 5 * * * /opt/certbot/certbot-auto renew --quiet --no-self-upgrade

purple.devops.develop | SUCCESS | rc=0 >>
#Ansible: Certbot automatic renewal.
20 5 * * * /opt/certbot/certbot-auto renew --quiet --no-self-upgrade

You can also run the renewal command manually at any time, again using Ansible ad-hoc mode:

$ ansible -m shell -a '/opt/certbot/certbot-auto renew --no-self-upgrade' trident
purple.devops.develop | SUCCESS | rc=0 >>
Requesting root privileges to run certbot...
  /home/ansible/.local/share/letsencrypt/bin/letsencrypt renew --no-self-upgrade

-------------------------------------------------------------------------------
Processing /etc/letsencrypt/renewal/breathe.prisem.washington.edu.conf
-------------------------------------------------------------------------------

The following certs are not due for renewal yet:
  /etc/letsencrypt/live/breathe.prisem.washington.edu/fullchain.pem (skipped)
No renewals were attempted.Saving debug log to /var/log/letsencrypt/letsencrypt.log
Cert not yet due for renewal

yellow.devops.develop | SUCCESS | rc=0 >>
Requesting root privileges to run certbot...
  /home/ansible/.local/share/letsencrypt/bin/letsencrypt renew --no-self-upgrade

-------------------------------------------------------------------------------
Processing /etc/letsencrypt/renewal/echoes.prisem.washington.edu.conf
-------------------------------------------------------------------------------

The following certs are not due for renewal yet:
  /etc/letsencrypt/live/echoes.prisem.washington.edu/fullchain.pem (skipped)
No renewals were attempted.Saving debug log to /var/log/letsencrypt/letsencrypt.log
Cert not yet due for renewal
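By default, certbot renew only renews a certificate once it is within 30 days of expiring, which is why both runs above skipped their certificates. To check that condition directly, for example across hosts with Ansible ad-hoc mode, openssl x509 -checkend can be used. A minimal sketch with a hypothetical function name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: return 0 if the certificate expires within the
# given number of days (default 30), 1 if it is still valid beyond that
# window, and 2 if the file cannot be read or parsed.
cert_due_for_renewal() {
    local cert="$1" days="${2:-30}"
    openssl x509 -noout -in "$cert" >/dev/null 2>&1 || return 2
    # "openssl x509 -checkend N" exits 0 if the certificate will NOT
    # expire within the next N seconds.
    if openssl x509 -checkend $(( days * 86400 )) -noout -in "$cert" >/dev/null; then
        return 1
    fi
    return 0
}

# Usage (as root, since /etc/letsencrypt/live is normally readable only by root):
#   cert_due_for_renewal /etc/letsencrypt/live/echoes.prisem.washington.edu/fullchain.pem \
#       && echo "due for renewal"
```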

8.3. Updating Secondary Components

The package update steps above perform what could be called a first-order update: updating the packages for the major components of the operating system. Some of these components, however, use plugins or other sub-components that themselves require updating. This is most disruptive for major releases (e.g., going from PyCharm 2016.2 to 2016.3, as shown in Section Updating PyCharm Community Edition).

Development systems have more of these components requiring secondary updates. Partly for this reason, these types of components are pinned to specific versions. When updating the ansible-dims-playbooks repo, review the changes and check for any required secondary updates.

Attention

You will sometimes need to tell users of the system (e.g., developers) about these secondary updates, because some tools, such as Vagrant and PyCharm, keep plugins in users’ accounts rather than in system directories. Because this process is difficult to automate robustly, each user must take responsibility for updating their own plugins. Otherwise, their toolset can drift out of sync with other developers’ toolsets and cause intermittent failures that are difficult to track down.

In this section, we cover updating Vagrant and PyCharm.

8.3.1. Updating Vagrant Plugins

Vagrant is used to manage VirtualBox virtual machines for development. A few Vagrant plugins were adopted (or at least used experimentally) during DIMS development.

After upgrading Vagrant to a new version, users can update their plugins with the following command:

$ vagrant plugin update
Updating installed plugins...
Fetching: vagrant-ignition-0.0.3.gem (100%)
Successfully uninstalled vagrant-ignition-0.0.1
Updated 'vagrant-ignition' to version '0.0.3'!
Updated 'vagrant-scp' to version '0.5.7'!
Updated 'vagrant-share' to version '1.1.9'!
Updated 'vagrant-triggers' to version '0.5.3'!
Updated 'vagrant-vbguest' to version '0.14.2'!

8.3.2. Updating PyCharm Community Edition

PyCharm is installed using Ansible. The normal workflow for updating a component like PyCharm is to test the new version to ensure it works properly, update the PyCharm variables in the Ansible inventory, export your settings from the old version, and then run the pycharm role for your development system.

PyCharm keeps all of its state, including settings, breakpoints, and indexes, in internal data stores in a directory specific to the version of PyCharm being used. For example, PyCharm 2016.2.3 files are kept in $HOME/.PyCharm2016.2. When updating to release 2016.3.1, the location changes to $HOME/.PyCharmCE2016.3. You need to run PyCharm 2016.2.3 to export your settings, then run the new PyCharm 2016.3.1 version to import them.

To export settings, run PyCharm 2016.2.3 and select File>Export Settings.... A dialog will pop up that allows you to select what to export and where to export it. You can use the defaults, but note where the exported settings file is saved, since you need to select it in the next step. Select Ok to complete the export. See Figure Exporting Settings from PyCharm 2016.2.3.

Exporting Settings from PyCharm 2016.2.3

After PyCharm has been updated, run the new version, select File>Import Settings..., select the .jar file that was created in the previous step, and then select Ok. Again, the defaults can be used for selecting the elements to import. See Figure Importing Settings to PyCharm 2016.3.1.

Importing Settings to PyCharm 2016.3.1

Once you have completed this process and are successfully using version 2016.3.1, you can delete the old directory.

$ rm -rf ~/.PyCharm2016.2
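Since each PyCharm release keeps its state in its own versioned directory, older directories can accumulate over several upgrades. Before deleting anything, it can help to list every state directory except the one in use. A sketch assuming the $HOME/.PyCharm* naming shown above; the function name old_pycharm_dirs is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list PyCharm state directories under a home
# directory (default $HOME), skipping the one for the version in use.
# Assumes the $HOME/.PyCharm* naming used by PyCharm 2016.x.
old_pycharm_dirs() {
    local current="$1" home="${2:-$HOME}" dir
    for dir in "$home"/.PyCharm*; do
        [ -d "$dir" ] || continue
        if [ "$(basename "$dir")" != "$current" ]; then
            echo "$dir"
        fi
    done
}

# Review the list before removing anything, e.g.:
#   old_pycharm_dirs .PyCharmCE2016.3
```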