Recent Posts

Network Split Test Scripts

Today I want to share two simple scripts for simulating a network split and rejoin between two groups of hosts. The split is done by adding per-host network blackhole routes on each host for all hosts of the other group.

Please be careful when using this. A forgotten blackhole route can cost long hours of debugging, as it is something you probably rarely use nowadays.

Script Usage

./ <filter1> <filter2> <hosts>
./ <filter1> <filter2> <hosts>
The script expects SSH equivalency and sudo on the target hosts. The filters are grep patterns.


#!/bin/bash

group1_filter=$1; shift
group2_filter=$1; shift
hosts=$*

hosts1=$(echo $hosts | xargs -n1 | grep "$group1_filter")
hosts2=$(echo $hosts | xargs -n1 | grep "$group2_filter")

if [ "$hosts1" == "" -o "$hosts2" == "" ]; then
    echo "ERROR: Syntax: $0 <filter1> <filter2> <hosts>"
    exit 1
fi

# Blackhole all hosts of group 2 on each host of group 1 and vice versa
for h in $hosts1; do
    echo "Blacklisting other zone on $h"
    for i in $hosts2; do
        ssh $h sudo route add $i gw lo
    done
done
for h in $hosts2; do
    echo "Blacklisting other zone on $h"
    for i in $hosts1; do
        ssh $h sudo route add $i gw lo
    done
done


#!/bin/bash

group1_filter=$1; shift
group2_filter=$1; shift
hosts=$*

hosts1=$(echo $hosts | xargs -n1 | grep "$group1_filter")
hosts2=$(echo $hosts | xargs -n1 | grep "$group2_filter")

if [ "$hosts1" == "" -o "$hosts2" == "" ]; then
    echo "ERROR: Syntax: $0 <filter1> <filter2> <hosts>"
    exit 1
fi

# Remove the blackhole routes again on both groups
for h in $hosts1; do
    echo "De-blacklisting other zone on $h"
    for i in $hosts2; do
        ssh $h sudo route del $i gw lo
    done
done
for h in $hosts2; do
    echo "De-blacklisting other zone on $h"
    for i in $hosts1; do
        ssh $h sudo route del $i gw lo
    done
done
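The host grouping and the per-host route commands of both scripts can also be sketched in Python as a dry run. This is only an illustration under assumptions: `plan_routes` and its parameters are hypothetical names, the filters are treated as regular expressions (similar to the grep patterns), and nothing is executed on any host.

```python
import re

def plan_routes(filter1, filter2, hosts, action="add"):
    """Return (host, command) pairs mirroring the shell scripts.

    Dry run only: the commands are built as strings, never executed.
    """
    group1 = [h for h in hosts if re.search(filter1, h)]
    group2 = [h for h in hosts if re.search(filter2, h)]
    if not group1 or not group2:
        raise ValueError("both filters must match at least one host")

    plan = []
    # First group 1 blackholes group 2, then the other way round
    for own, other in ((group1, group2), (group2, group1)):
        for h in own:
            for i in other:
                plan.append((h, f"sudo route {action} {i} gw lo"))
    return plan

if __name__ == "__main__":
    for host, cmd in plan_routes("^a", "^b", ["a1", "a2", "b1"]):
        print(f"ssh {host} {cmd}")
```

Printing the plan before running the real scripts is a cheap way to check that the filters select the groups you expect. Passing `action="del"` gives the commands of the rejoin script.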

Sequence definitions with kwalify

After a lot of trial and error on how to define a simple sequence in kwalify (which I use as a JSON/YAML schema validator) I want to share this solution for a YAML schema.

My use case is whitelisting certain keys and ensuring their types, so that I can use kwalify to validate YAML files. Doing this for scalars is simple, but hashes and lists of scalar elements are not. Most problematic were the lists...

Defining Arbitrary Scalar Sequences

So how to define a list in kwalify? The user guide gives this example:
  type: seq
  sequence:
     - type: str
This gives us a list of strings. But many lists also contain numbers and some contain structured data. For my use case I want to exclude structured data AND allow numbers. So "type: any" cannot be used. Also "type: any" wouldn't work because it would require defining the mapping for any, which we cannot know in a validation use case where we just want to ensure the list as a type. The great thing is there is a type "text" which you can use to allow a list of strings or numbers or both like this:
  type: seq
  sequence:
     - type: text
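To make the meaning of "text" concrete, here is a small Python sketch of the check such a rule expresses: a list whose elements are strings or numbers, but never structured data. This is not how kwalify is implemented. The function names are hypothetical, and rejecting booleans is my own assumption.

```python
def is_text(value):
    """True for strings and numbers, False for structured data.

    Assumption: booleans are rejected, although Python treats bool as int.
    """
    if isinstance(value, bool):
        return False
    return isinstance(value, (str, int, float))

def is_text_sequence(value):
    """True if value is a list whose elements are all strings or numbers."""
    return isinstance(value, list) and all(is_text(v) for v in value)

if __name__ == "__main__":
    print(is_text_sequence(["a", 1, 2.5]))        # scalars only
    print(is_text_sequence(["a", {"k": "v"}]))    # contains a hash
```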

Building a key name + type validation schema

As already mentioned the need for this is to have a whitelisting schema with simple type validation. Below you see an example for such a schema:
type: map
mapping:
  "default_definition": &allow_hash
     type: map
     mapping:
       =:
         type: any

  "default_list_definition": &allow_list
     type: seq
     sequence:
       # Type text means string or number
       - type: text

  "key1": *allow_hash
  "key2": *allow_list
  "key3":
     type: str

  =:
     type: number
     range: { max: 29384855, min: 29384855 }
At the top there are two dummy keys "default_definition" and "default_list_definition" which we use to define two YAML references "allow_hash" and "allow_list" for generic hashes and scalar only lists.

In the middle of the schema you see three whitelisted keys which, using the references, are typed as hash and list, plus one typed as a string.

Finally for this to be a whitelist we need to refuse all other keys. Note that '=' as a key name stands for a default definition. Now we want to say: default is "not allowed". Sadly kwalify has no mechanism for this that allows expressing something like
    type: invalid
Therefore we resort to an absurd type definition (that we hopefully never use), for example a number that has to be exactly 29384855. All other keys not listed in the whitelist above will hopefully fail to be this number and cause kwalify to throw an error.

This is how the kwalify YAML whitelist works.
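The whitelist idea itself (known keys with a simple type check, everything else refused) can be sketched in a few lines of Python. This is a hypothetical stand-in for illustration, not kwalify. `ALLOWED` and `validate` are made-up names, and unlike the schema trick above this sketch rejects unknown keys outright instead of relying on an absurd number.

```python
# Whitelisted keys and the Python type each one must have
ALLOWED = {
    "key1": dict,   # generic hash, like *allow_hash
    "key2": list,   # list, like *allow_list
    "key3": str,
}

def validate(document):
    """Return a list of error strings; an empty list means the document passed."""
    errors = []
    for key, value in document.items():
        expected = ALLOWED.get(key)
        if expected is None:
            errors.append(f"key '{key}' is not allowed")
        elif not isinstance(value, expected):
            errors.append(f"key '{key}' must be of type {expected.__name__}")
    return errors

if __name__ == "__main__":
    print(validate({"key1": {}, "key2": [1, "a"], "key3": "x"}))
    print(validate({"other": 1}))
```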

PyPI does brownouts for legacy TLS

Nice! Reading through the maintenance notices on my status page aggregator I learned that PyPI started intentionally blocking legacy TLS clients as a way of getting people to switch before TLS 1.0/1.1 support is gone for real.

Here is a quote from their status page:

In preparation for our CDN provider deprecating TLSv1.0 and TLSv1.1 protocols, we have begun rolling brownouts for these protocols for the first ten (10) minutes of each hour.

During that window, clients accessing with clients that do not support TLSv1.2 will receive an HTTP 403 with the error message "This is a brown out of TLSv1 support. TLSv1 support is going away soon, upgrade to a TLSv1.2+ capable client.".

I like this action as a good balance of hurting as much as needed to help end users stop putting off updates.
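If you want to check your own client side, a minimal Python sketch could look like this. It only tests whether the local ssl module was built with TLS 1.2 support and whether a given time falls into the "first ten minutes of each hour" window from the quote. The function names are hypothetical and no request is sent to PyPI.

```python
import ssl
from datetime import datetime, timezone

def in_brownout_window(moment):
    """True during the first ten minutes of an hour, as described in the quote."""
    return moment.minute < 10

def client_supports_tls12():
    """True if the local ssl module was built with TLS 1.2 support."""
    return ssl.HAS_TLSv1_2

if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print("TLS 1.2 available:", client_supports_tls12())
    print("Inside brownout window:", in_brownout_window(now))
```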

Puppet Agent Settings Issue

I experienced a strange puppet agent 4.8 configuration issue this week. To distribute the agent runs over time and even out the puppet master load I wanted to configure the splay settings properly. There are two settings:

What first confused me was that "splay" is not on by default. Of course when using the open source version it makes sense to have it off. Having it on by default sounds more like an enterprise feature :-)

No matter the default, after deploying an agent config with settings like this
runInterval = 3600
splay = true
splayLimit = 3600
... nothing happened. Runs were still not randomized. Checking the active configuration with
# puppet config print | grep splay
it turned out that my config settings were not working at all. What was utterly confusing is that even the runInterval was reported as 1800 (which is the default value). But while the splay just did not work, the effective runInterval was 3600!

After hours of debugging it, I happened to read the puppet documentation section that covers the config sections like [agent] and [main]. It says that [main] configures global settings and other sections can override the settings in [main], which makes sense.

But it just doesn't work this way. In the end the solution was using [main] as config section instead of [agent]:
and with this config "puppet config print" finally reported the settings as effective and the runtime behaviour had the expected randomization.

Maybe I misread something somewhere, but this is really hard to debug. And INI files are not really helpful on Unix. Overriding works better with default files and drop-in directories.
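For reference, the override behaviour the documentation describes (a section like [agent] taking precedence over [main]) can be illustrated with Python's configparser. This is only a sketch of the documented expectation, not of what the agent actually did. The `effective` helper and the sample values are hypothetical.

```python
import configparser

# Hypothetical sample config: [agent] is expected to override [main]
PUPPET_CONF = """
[main]
runInterval = 1800

[agent]
runInterval = 3600
splay = true
"""

def effective(config, section, setting):
    """Look up a setting in `section`, falling back to [main]."""
    if config.has_option(section, setting):
        return config.get(section, setting)
    return config.get("main", setting, fallback=None)

config = configparser.ConfigParser()
config.read_string(PUPPET_CONF)

if __name__ == "__main__":
    for section in ("agent", "master"):
        print(section, "runInterval =", effective(config, section, "runInterval"))
```

Comparing such an expectation with the output of "puppet config print" makes it obvious when a section is silently ignored.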

Python re.sub Examples

Example for re.sub() usage in Python


import re

result = re.sub(pattern, repl, string, count=0, flags=0)

Simple Examples

num = re.sub('abc',  '',    input)           # Delete pattern abc
num = re.sub('abc',  'def', input)           # Replace pattern abc -> def
num = re.sub(r'\s+', ' ',   input)           # Eliminate duplicate whitespaces
num = re.sub('abc(def)ghi', r'\1', input)    # Replace a string with a part of itself
Note: Always write patterns containing \ escapes as raw strings (by adding an r in front of the string). Otherwise Python interprets the \ as a string escape sequence and the regex won't work.
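Here are the same four calls as a self-contained snippet you can paste into an interpreter. The sample text and the variable names are made up for illustration.

```python
import re

text = "abc  def   abcdefghi"

deleted   = re.sub('abc', '', text)               # delete pattern abc
replaced  = re.sub('abc', 'def', text)            # replace abc -> def
collapsed = re.sub(r'\s+', ' ', text)             # collapse whitespace runs
extracted = re.sub('abc(def)ghi', r'\1', text)    # keep only the captured group

for label, value in [("deleted", deleted), ("replaced", replaced),
                     ("collapsed", collapsed), ("extracted", extracted)]:
    print(f"{label:10} {value!r}")
```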

Advanced Usage

Replacement Function

Instead of a replacement string you can provide a function performing dynamic replacements based on the match string like this:
def my_replace(m):
    if <condition>:
       return <replacement variant 1>
    return <replacement variant 2>

result = re.sub(r"\w+", my_replace, input)
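A concrete, runnable variant of such a replacement function might look like this. The rule (upper-case every word longer than four characters) and the function name are just examples.

```python
import re

def shout_long_words(m):
    """Upper-case words longer than four characters, keep the rest unchanged."""
    word = m.group(0)
    if len(word) > 4:
        return word.upper()
    return word

result = re.sub(r"\w+", shout_long_words, "small words stay, longer ones shout")
print(result)
```

The function receives a match object for every match and must return the replacement string.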

Count Replacements

When you want to know how many replacements happened use re.subn() instead. It returns a tuple of the new string and the number of replacements:
result = re.subn(pattern, replacement, input)
print('Result: ', result[0])
print('Replacements: ', result[1])
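A short runnable example with made-up sample text, also showing the count parameter that limits the number of replacements:

```python
import re

text = "order 66, room 101, floor 3"

# Replace every run of digits and count the replacements
new_text, count = re.subn(r"\d+", "#", text)
print(new_text, count)

# Limit the number of replacements with count=
first_only, first_count = re.subn(r"\d+", "#", text, count=1)
print(first_only, first_count)
```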

See also: Python Syntax Python re.match Python re.sub

Helm Error: cannot connect to Tiller

Today I ran "helm" and got the following error:

$ helm status
Error: could not find tiller
It took me some minutes to find the root cause. My first thought was that the tiller installation was gone or broken, but it turned out to be fine. The root cause was that the helm client didn't select the correct namespace and probably stayed in the current namespace (where tiller isn't located).

This is due to the use of an environment variable $TILLER_NAMESPACE (as suggested in the setup docs) which I forgot to persist in my shell.

So running
$ TILLER_NAMESPACE=tiller helm status
solved the issue.

Using Linux keyring secrets from your scripts

When you write scripts that need to perform remote authentication you don't want to include passwords in plain text in the script itself. And if the credentials are personal credentials you cannot deliver them with the script anyway.

Since 2008 the Secret Service API has been standardized and is implemented by GnomeKeyring and ksecretservice. Effectively there is a standard interface to access secrets on Linux desktops.

Sadly the CLI tools are rarely installed by default so you have to add them manually. On Debian
apt install libsecret-tools

Using secret-tool

There are two important modes:

Fetching passwords

The "lookup" command prints the password to STDOUT
/usr/bin/secret-tool lookup <key> <name>

Storing passwords

Note that with "store" you do not pass the password on the command line, as you are prompted to enter it.
/usr/bin/secret-tool store <key> <name>

Scripting with secret-tool

Here is a simple example Bash script to automatically ask, store and use a secret:

ST=/usr/bin/secret-tool
LOGIN="my-login"            # Unique id for your login
LABEL="My special login"    # Human readable label

get_password() {
    $ST lookup "$LOGIN" "$USER"
}

password=$( get_password )
if [ "$password" = "" ]; then
    $ST store --label "$LABEL" "$LOGIN" "$USER"
    password=$( get_password )
fi

if [ "$password" = "" ]; then
    echo "ERROR: Failed to fetch password!"
else
    echo "Credentials: user=$USER password=$password"
fi

Note that the secret will appear in the "Login" keyring. On GNOME you can check the secret with "seahorse".
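The same lookup can be done from a Python script by calling secret-tool via subprocess. This is a minimal sketch under assumptions: it expects secret-tool at /usr/bin/secret-tool as in the Bash example, `lookup_password` is a hypothetical helper, and the `run` parameter only exists so the function can be tried without a keyring.

```python
import subprocess

SECRET_TOOL = "/usr/bin/secret-tool"

def lookup_password(key, name, run=subprocess.run):
    """Return the stored password, or None if secret-tool finds nothing."""
    proc = run([SECRET_TOOL, "lookup", key, name],
               capture_output=True, text=True)
    if proc.returncode != 0 or not proc.stdout:
        return None
    return proc.stdout.rstrip("\n")
```

Usage would be something like `lookup_password("my-login", os.environ["USER"])`. As in the Bash script you should never print the result anywhere except for debugging.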

How to install Helm on Openshift

This is a short summary of things to consider when installing Helm on Openshift.

What is Helm?

Before going into details: helm is a self-proclaimed "Kubernetes Package Manager". While this is not entirely false, in my opinion it is three things. When looking closer it does more of the stuff that automation tools like Puppet, Chef and Ansible do.

Current Installation Issues

Since kubernetes v1.6.1, which introduced RBAC (role based access control) it became harder to properly install helm. Actually the simple installation as suggested on the homepage
# Download and...
helm init
seems to work, but as soon as you run commands like
helm list
you get permission errors. This is of course caused by the tighter access control now in place. Sadly, even with kubernetes now at 1.8, helm still hasn't been updated to take care of the proper permissions.

Openshift to the rescue...

As Redhat somewhat pioneered RBAC in Openshift with their namespace based "projects" concept they are also the ones with a good solution for the helm RBAC troubles.

Setting up Helm on Openshift

Client installation (helm)

curl -s | tar xz
sudo mv linux-amd64/helm /usr/local/bin
sudo chmod a+x /usr/local/bin/helm

helm init --client-only

Server installation (tiller)

With helm being the client only, Helm needs an agent named "tiller" on the kubernetes cluster. Therefore we create a project (namespace) for this agent and install it with "oc create"
export TILLER_NAMESPACE=tiller
oc new-project tiller
oc project tiller
oc process -f -p TILLER_NAMESPACE="${TILLER_NAMESPACE}" | oc create -f -
oc rollout status deployment tiller

Preparing your projects (namespaces)

Finally you have to give tiller access to each of the namespaces you want someone to manage using helm:
export TILLER_NAMESPACE=tiller
oc project 
oc policy add-role-to-user edit "system:serviceaccount:${TILLER_NAMESPACE}:tiller"
After you have done this you can deploy your first service, e.g.
helm install stable/redis --namespace 

See also Helm - Cheat Sheet kubernetes - Cheat Sheet Openshift - Cheat Sheet


See also ulimit - Cheat Sheet

Sometimes you need to increase the open file limit for an application server or the maximum shared memory for your ever-growing master database. In such a case you edit your /etc/security/limits.conf and then wonder how to make the changed limits visible to check whether you have set them correctly. You do not want to find out that they were wrong after your master DB doesn't come up after some incident in the middle of the night...

Instant Applying Limits to Running Processes

Actually you might want to apply the changes directly to a running process in addition to changing /etc/security/limits.conf. In recent Linux distributions (e.g. Debian Jessie) there is a tool "prlimit" to get/set limits.

Usage for changing limits for a PID is

prlimit --pid <pid> --<limit>=<soft>:<hard>
for example
prlimit --pid 12345 --nofile=1024:2048
If you are unlucky and do not have prlimit yet, check out "man 2 prlimit" for instructions on how to compile your own version, because despite the missing user tool the prlimit() system call has been in the kernel for quite a while (since 2.6.36).
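If Python is available, its resource module is another way to reach the same system call without compiling anything. The following is a minimal sketch: `nofile_limits` is a hypothetical helper, and resource.prlimit() is only available on Linux.

```python
import os
import resource

def nofile_limits(pid=None):
    """Return (soft, hard) for the open file limit.

    Without a pid the current process is queried; with a pid,
    resource.prlimit() is used (Linux only).
    """
    if pid is None:
        return resource.getrlimit(resource.RLIMIT_NOFILE)
    return resource.prlimit(pid, resource.RLIMIT_NOFILE)

if __name__ == "__main__":
    soft, hard = nofile_limits()
    print(f"PID {os.getpid()}: nofile soft={soft} hard={hard}")
```

resource.prlimit() also accepts a third argument with a new (soft, hard) tuple to change the limit of another process, which needs the same privileges as the prlimit tool.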

Alternative #1: Re-Login with "sudo -i"

If you do not have prlimit yet and want a changed limit configuration to become visible you might want to try "sudo -i". The reason: you need to re-login as limits from /etc/security/* are only applied on login!

But wait: what about users without login? In such a case you log in as root (which might not share their limits) and sudo into the user, so there is no real login as the user. In this case you must make sure to use the "-i" option of sudo:
sudo -i -u <user>
to simulate an initial login with sudo. This will apply the new limits.

Alternative #2: Make it work for sudo without "-i"

Whether you need "-i" depends on the PAM configuration of your Linux distribution. If you need it then PAM probably loads "" only in /etc/pam.d/login which means at login time but not on sudo. This was introduced in Ubuntu Precise for example. By adding this line

session    required
in /etc/pam.d/sudo limits will also be applied when running sudo without "-i". Still using "-i" might be easier.

Finally: Always Check Effective Limits

The best way is to change the limits and check them by running
prlimit               # for current shell
prlimit --pid <pid>   # for a running process
because it shows both soft and hard limits together. Alternatively call
ulimit -a                # for current shell
cat /proc/<pid>/limits   # for a running process
with the affected user.
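When you want to check limits from a monitoring or deployment script, the content of /proc/<pid>/limits is easy to parse. Here is a small Python sketch. The helper names are hypothetical, and it assumes the usual layout with columns separated by at least two spaces.

```python
import re

def parse_limits(text):
    """Parse the content of /proc/<pid>/limits into {name: (soft, hard)}."""
    limits = {}
    for line in text.splitlines()[1:]:              # skip the header line
        # Limit names contain single spaces, columns are 2+ spaces apart
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) >= 3:
            limits[fields[0]] = (fields[1], fields[2])
    return limits

def read_limits(pid="self"):
    """Read and parse the limits of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_limits(f.read())
```

`read_limits(12345)["Max open files"]` then gives the soft and hard "nofile" limit as strings, which may also be "unlimited".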

Nagios Check

You might also want to have a look at the nofile limit Nagios check.

Nagios Check Plugin for nofile Limit

Following the recent post on how to investigate limit related issues, which gave instructions on what to check if you suspect a system limit to be hit, I want to share this Nagios check to cover the open file descriptor limit. Note that existing Nagios plugins like this only check the global limit, only check one application or do not output all problems. So here is my solution which does the following:

  1. Checks the global file descriptor limit
  2. Uses lsof to check the "nofile" hard limit of all processes
It has two simple parameters -w and -c to specify a percentage threshold. An example call:
./ -w 70 -c 85
could result in the following output indicating two problematic processes:
WARNING memcached (PID 2398) 75% of 1024 used CRITICAL apache (PID 2392) 94% of 4096 used
Here is the check script doing this:

# Check "nofile" limit for all running processes using lsof

MIN_COUNT=0                 # default "nofile" limit is usually 1024, so no checking for
                            # processes with much less open fds needed

WARN_THRESHOLD=80           # default warning: 80% of file limit used
CRITICAL_THRESHOLD=90       # default critical: 90% of file limit used

while getopts "hw:c:" option; do
    case $option in
        w) WARN_THRESHOLD=$OPTARG;;
        c) CRITICAL_THRESHOLD=$OPTARG;;
        h) echo "Syntax: $0 [-w <warning percentage>] [-c <critical percentage>]"; exit 1;;
    esac
done

results=$(
    # Check global limit
    global_max=$(cat /proc/sys/fs/file-nr 2>&1 |cut -f 3)
    global_cur=$(cat /proc/sys/fs/file-nr 2>&1 |cut -f 1)
    ratio=$(( $global_cur * 100 / $global_max))

    if [ $ratio -ge $CRITICAL_THRESHOLD ]; then
        echo "CRITICAL global file usage $ratio% of $global_max used"
    elif [ $ratio -ge $WARN_THRESHOLD ]; then
        echo "WARNING global file usage $ratio% of $global_max used"
    fi

    # We use the following lsof options:
    #
    # -n    to avoid resolving network names
    # -b    to avoid kernel locks
    # -w    to avoid warnings caused by -b
    # +c15  to get somewhat longer process names
    #
    lsof -wbn +c15 2>/dev/null | awk '{print $1,$2}' | sort | uniq -c |\
    while read count name pid remainder; do
        # Only check processes above a sane minimum of open files
        if [ $count -gt $MIN_COUNT ]; then
            # Extract the hard limit from /proc
            limit=$(cat /proc/$pid/limits 2>/dev/null| grep 'open files' | awk '{print $5}')

            # Check if we got something, if not the process must have terminated
            if [ "$limit" != "" ]; then
                ratio=$(( $count * 100 / $limit ))
                if [ $ratio -ge $CRITICAL_THRESHOLD ]; then
                    echo "CRITICAL $name (PID $pid) $ratio% of $limit used"
                elif [ $ratio -ge $WARN_THRESHOLD ]; then
                    echo "WARNING $name (PID $pid) $ratio% of $limit used"
                fi
            fi
        fi
    done
)

if echo $results | grep CRITICAL; then
    exit 2
fi
if echo $results | grep WARNING; then
    exit 1
fi

echo "All processes are fine."
Use the script with caution! At the moment it has no protection against a hanging lsof. So the script might mess up your system if it hangs for some reason. If you have ideas how to improve it please share them in the comments!
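One possible protection against a hanging lsof is running it with a timeout. The following Python sketch shows the idea together with the threshold logic of the check. It is not a port of the script: `classify` and `run_with_timeout` are hypothetical helpers and the default thresholds are taken from the script above.

```python
import subprocess

def classify(count, limit, warn=80, critical=90):
    """Map open file usage to a Nagios state using integer percentages."""
    ratio = count * 100 // limit
    if ratio >= critical:
        return "CRITICAL"
    if ratio >= warn:
        return "WARNING"
    return "OK"

def run_with_timeout(cmd, seconds):
    """Return stdout of cmd, or None if it does not finish in time.

    subprocess.run() kills the child process when the timeout expires.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=seconds)
    except subprocess.TimeoutExpired:
        return None
    return proc.stdout
```

A check could call `run_with_timeout(["lsof", "-wbn", "+c15"], 30)` and report an UNKNOWN state when None comes back instead of blocking forever.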

Solving rtl8812au installation on Ubuntu 17.10

After several fruitless attempts at getting my new dual band Wifi stick to work on my PC I went the hard way and compiled the driver.

Figuring out which driver to use

As drivers support specific device ids the first thing is to determine the id
$ lsusb
Bus 002 Device 013: ID 0bda:a811 Realtek Semiconductor Corp. 
With the id being "0bda:a811" you can search online for a list of driver names. Google suggests rtl8812au in its related searches...
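If you need the id in a script, the lsusb line is easy to take apart. Here is a small Python sketch with a hypothetical `parse_lsusb` helper. The regular expression assumes the output format shown above.

```python
import re

LSUSB_LINE = re.compile(
    r"Bus (?P<bus>\d+) Device (?P<device>\d+): "
    r"ID (?P<vendor>[0-9a-f]{4}):(?P<product>[0-9a-f]{4})\s*(?P<name>.*)"
)

def parse_lsusb(line):
    """Return a dict with bus, device, vendor, product and name, or None."""
    m = LSUSB_LINE.match(line.strip())
    if m is None:
        return None
    return m.groupdict()

if __name__ == "__main__":
    info = parse_lsusb("Bus 002 Device 013: ID 0bda:a811 Realtek Semiconductor Corp. ")
    print(f"{info['vendor']}:{info['product']}")
```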

Finding a source repo

On github you can find several source repos for the rtl8812au driver of different ages. It seems that Realtek is supplying the source on some driver CDs and different people independently put them online. The hard part is to find the most recent one, as only this has fixes for recent kernels.

One with patches for kernel 4.13.x is the one from Vital Koshalev, who already has a PR against the most commonly referenced repo of diederikdehaas (which doesn't work yet!).

Getting the right source

So fetch the source as follows
git clone
git checkout driver-4.3.22-beta-mod

Compilation + Installation of rtl8812AU

The following instructions are to be run as root:
mkdir /usr/src/${DRV_NAME}-${DRV_VERSION}
git archive driver-${DRV_VERSION} | tar -x -C /usr/src/${DRV_NAME}-${DRV_VERSION}
dkms add -m ${DRV_NAME} -v ${DRV_VERSION}
dkms build -m ${DRV_NAME} -v ${DRV_VERSION}
dkms install -m ${DRV_NAME} -v ${DRV_VERSION}
See also DKMS - Cheat Sheet

Loading rtl8812AU

If everything worked well you should now be able to issue
modprobe 8812au
Note the missing "rtl"!!!

If the module loads, but the wifi stick doesn't work immediately it might be that the rtlwifi driver is preventing the self-compiled module from working. So remove it with
rmmod rtlwifi
It will complain about dependencies. You need to rmmod those too. Afterwards the new driver should load properly. To make disabling rtlwifi persistent add it to the modprobe blacklist:
echo "blacklist rtlwifi" >>/etc/modprobe.d/blacklist.conf
Please leave feedback on the instructions if you have problems!

Ensure secure Javascript dependencies

When you write Javascript code or when you want to know if a 3rd party code bases dependencies are secure check out which is an online scanner for github repos package.json contents.

This tool is able to generate badges and gives you details on dependencies

Here is a screenshot of some vulnerable deps

and the badge as seen on the corresponding github page:

While I do not like the badge explosion on it still is an amazingly useful tool to know the issue with this library just looking at the github project.

Openshift S2I and Spring profiles

When porting Springboot applications to Openshift using S2I (source to image) directly from a git repo you cannot rely on a start script passing the proper <profile name> parameter like this

java -jar yourApplication.jar
The only proper ways for injecting application configuration are
  1. Add it to the artifact/repository (meeh)
  2. Mount it using a config map
  3. Pass it via Docker environment variables
And for Openshift variant #3 works fine, as there is an environment variable SPRING_PROFILES_ACTIVE which you can add to your deployment configuration and set to your favourite spring profile name.

USB seq nnnn is taking a long time

When your dmesg/journalctl/syslog says something like

Nov 14 21:43:12 Wolf systemd-udevd[274]: seq 3634 '/devices/pci0000:00/0000:00:14.0/usb2/2-1' is taking a long time
then know that the only proper manly response can be
systemctl restart udev
Don't allow for disrespectful USB messages!!!

Openshift Ultra-fast Bootstrap

Today I want to share some hints on ultra-fast bootstrapping developers to use Openshift. Given that adoption in your organisation depends on developers daring and wanting to use Kubernetes/Openshift I believe showing a clear and easy migration path is the way to go.

Teach the Basics by Failing!

Actually, why not treat Openshift as a user-friendly self-service? Approach it naively and try stuff.

So hold a workshop. Ask people to:
  1. Not use the CLI for now!!! Don't even think about it. Automation comes later!
  2. Login and create a project. That usually works well.
  3. Decide on Docker Image / Source to Image. In the second dialog of the project creation you get presented with those three tabs

    Let them choose their poison.

    Let the image pull fail because they don't find docker images and don't know that they cannot just fetch stuff from anywhere. Explain why this is the case. Show them where to find your preferred base and runtime images in your internal registry which of course is already configured, ready to be used. Show them the base image you suggest.

    Let the template creation fail using an already prepared template. Show where to look up the build error and explain where to find the infamously hidden secrets option everyone needs.

  4. Once the first build fails: explain the logic of applications in Openshift and that they did not only create a project, but also an application. Show the difference of locating builds and deployments. Show how to access logs of both and how to find 'Edit' hidden in the 'Actions' menu.
  5. When the build fails due to SSH connection refused: Explain that (even when using a source secret you already prepared) you need to put the public key in your favourite SCM either globally, per project or per repo for the code pull to work.
  6. When people check the pod first and see it isn't running: Explain again and again the holy trinity of checking stuff:
    1. First check the build
    2. Then check the deployment
    3. Only then check if pods do come up
  7. Finally the pod is green! People will want to access the deployed application and ask you how. Now is the time for a short excursion on services and routing. Maybe show an already configured default. If some service isn't accessible:
    1. Show how to get the pod's TCP endpoint
    2. Show how to attach to a container via the GUI / CLI

Have Docs and Examples Ready

Most important of course is preparation. Do prepare
  1. Walkthrough screenshots
  2. At least one runtime template
  3. At least one base image on your own registry
  4. At least one S2I ready source repository with a hello-world app
  5. Global project settings with
    • a default SSH key for source pulling
    • access configured for your own docker registry
  6. Maybe make an example project visible for all newbies

Things to avoid...

This is of course quite opinionated, but think about it:


Be prepared to iterate this again and again as often as needed.

Docker disable ext4 journaling

Noteworthy point from the Remind Ops: when running docker containers on ext4 consider disabling journaling. Why? Because a throw-away, almost read-only filesystem doesn't need recovery on crash.

See also Docker - Cheat Sheet

Gedit ShellCheck Linter Plugin

Today I had enough of the missing shell linting support in Gedit. So I took the time and derived a gedit-shellcheck plugin from an already existing JSHint plugin written by Xavier Gendre.


The linter can be run using Ctrl-J or from the Tools menu 'Check with ShellCheck'. Here is a screenshot


To use the plugin you need ShellCheck
apt install shellcheck
and you need to place the plugin in your Gedit plugins folder.
git clone
mkdir -p ~/.local/share/gedit/plugins/
cp -r gedit-shellcheck/shellcheck.plugin gedit-shellcheck/shellcheck/ ~/.local/share/gedit/plugins/
Finally restart Gedit and activate the plugin.