Recent Posts

Building a Generic Sysadmin Policy Scanner

After writing the same scripts several times I decided it is time for a generic solution to check Debian servers for configuration consistency. As incidents and mistakes happen each organization collects a set of learnings (let's call it policies) that should be followed in the future. And one important truth is that the free automation and CM tools we use (Chef, Puppet, Ansible, cfengine, Saltstack...) allow to implement policies, but do not seem to care much about proofing correct automation.

How to ensure following policies?

But how to really ensure following these policies? The only way is by checking them and revisiting the check results frequently. One could build a script and send a daily/weekly mail report. This is always a custom solution and that's what I did several times already. So I do it one final time, but this times in a generic way.

Generic Policy Scanning

For me a generic configuration consistency / policy scanner has at least the following requirements:
  1. Optional generic pre-defined policies
  2. Optional custom user-defined policies
  3. Policies checked locally on the host
  4. Policies checked from CM systems
  5. Per host/hostgroup policy enabling
  6. Generic discovery of your hosts
  7. Dynamic per policy/group/host result filtering
  8. Customizable mail reports
  9. Result archival for audits
  10. Some simple trending
  11. Daily diffs, New findings, Resolved Isses
  12. Acknowledging Findings
I started implementing a simple solution (entirely bash and SSH based, realizing requirements 1,2,3,4,6,7,9,10) with It is quite easy to setup by configuring the type of and you can run it instantly with the default set of policy scanners (which of course not necessarily all make sense for all type of systems). Here is a list of the file name which give some indication of there function and groups:
By setting up the results and the static HTML (instructions in in some webserver document root you can browse through the results.


Result overview:

Filter details:

Debugging hiera-eyaml Encryption, Decryption failed

When Hiera works without any problems everything is fine. But when not it is quite hard to debug why it is not working. Here is a troubleshooting list for Hiera when used with hiera-eyaml-gpg.

hiera-eyaml-gpg Decryption failed

First check your GPG key list
gpg --list-keys --homedir=<.gnupg dir>
Check that at least one of the keys listed is in the recipients you use for decrypting. The recipients you used are either listed in your Hiera/Eyaml config file or in a file referenced from there.

To verify what you active config is run eyaml in tracing mode. Note that the "-t" option is only available in newer Eyaml versions (e.g. 2.0.5):
eyaml decrypt -v -t -f somefile.yaml
Trace output
[hiera-eyaml-core]           (Symbol) trace_given        =        (TrueClass) true              
[hiera-eyaml-core]           (Symbol) gpg_always_trust   =       (FalseClass) false             
[hiera-eyaml-core]           (Symbol) trace              =        (TrueClass) true              
[hiera-eyaml-core]           (Symbol) encrypt_method     =           (String) pkcs7             
[hiera-eyaml-core]           (Symbol) gpg_gnupghome      =           (String) /etc/hiera/.gnupg      
[hiera-eyaml-core]           (Symbol) pkcs7_private_key  =           (String) ./keys/private_key.pkcs7.pem
[hiera-eyaml-core]           (Symbol) version            =       (FalseClass) false             
[hiera-eyaml-core]           (Symbol) gpg_gnupghome_given =        (TrueClass) true              
[hiera-eyaml-core]           (Symbol) help               =       (FalseClass) false             
[hiera-eyaml-core]           (Symbol) quiet              =       (FalseClass) false             
[hiera-eyaml-core]           (Symbol) gpg_recipients_file =           (String) ./gpg_recipients  
[hiera-eyaml-core]           (Symbol) string             =         (NilClass)                   
[hiera-eyaml-core]           (Symbol) file_given         =        (TrueClass) true   
Alternatively try manually enforcing recipients and .gnupg location to make it work.
eyaml decrypt -v -t -f somefile.yaml --gpg-recipients-file=<recipients> --gpg-gnupghome=<.gnupg dir>
If it works manually you might want to add the keys ":gpg-recipients-file:" to hiera.yaml and ensure that the mandatory key ":gpg-gnupghome:" is correct.

Checking Necessary Gems

hiera-eyaml-gpg can be run with different GPG-libraries depending on the version you run. Check dependencies on Github.

A possible stack is the following
gem list
gpgme (2.0.5)
hiera (1.3.2)
hiera-eyaml (2.0.1)
hiera-eyaml-gpg (0.4)
The GEM gpgme additionally needs the C library
dpkg -l "*gpg*"
||/ Name                Version             Beschreibung
ii  libgpgme11          1.2.0-1.2+deb6u1    GPGME - GnuPG Made Easy

Using Correct Ruby Version

Another pitfall is running multiple Ruby versions. Ensure to install the GEMs into the correct one. One Debian consider using "ruby-switch" or manually running "update-alternatives" for "gem" and "ruby".

Ruby Switch

apt-get install ruby-switch
ruby-switch --set ruby1.9.1


# Print available versions
update-alternatives --list ruby
update-alternatives --list gem

# Show current config update-alternatives --display ruby update-alternatives --display gem

# If necessary change it update-alternatives --set ruby /usr/bin/ruby1.9.1 update-alternatives --set gem /usr/bin/gem1.9.1

Debugging dovecot ACL Shared Mailboxes Not Showing in Thunderbird

When you can't get ACL shared mailboxes visible with Dovecot and Thunderbird here are some debugging tipps:
  1. Thunderbird fetches the ACLs on startup (and maybe at some other interval). So for testing restart Thunderbird on each change you make.
  2. Ensure the shared mailboxes index can be written. You probably have it configured like
    plugin {
      acl_shared_dict = file:/var/lib/dovecot/db/shared-mailboxes.db
    Check if such a file was created and is populated with new entries when you add ACLs from the mail client. As long as entries do not appear here, nothing can work.
  3. Enable debugging in the dovecot log or use the "debug" flag and check the ACLs for the user who should see a shared mailbox like this:
    doveadm acl debug -u [email protected] shared/users/box
    • Watch out for missing directories
    • Watch out for permission issues
    • Watch out for strangely created paths this could hint a misconfigured namespace prefix

The damage of one second

Update: According to the AWS status page the incident was a problem related to BGP route leaking. AWS does not hint on a leap second related incident as originally suggested by this post!

Tonight we had another leap second and not without suffering at the same time. At the end of the post you can find two screenshots of outages suggested by The screenshots were taken shortly after midnight UTC and you can easily spot those sites with problems by the disting peak at the right site of the graph.

AWS Outage

What is common to many of the affected sites: them being hosted at AWS which had some problems.

[RESOLVED] Internet connectivity issues

Between 5:25 PM and 6:07 PM PDT we experienced an Internet connectivity issue with a provider outside of our network which affected traffic from some end-user networks. The issue has been resolved and the service is operating normally.

The root cause of this issue was an external Internet service provider incorrectly accepting a set of routes for some AWS addresses from a third-party who inadvertently advertised these routes. Providers should normally reject these routes by policy, but in this case the routes were accepted and propagated to other ISPs affecting some end-user’s ability to access AWS resources. Once we identified the provider and third-party network, we took action to route traffic around this incorrect routing configuration. We have worked with this external Internet service provider to ensure that this does not reoccur.

Incident Details

Graphs from

Note that those graphs indicate user reported issues:

Using the memcached telnet interface

This is a short summary of everything important that helps to inspect a running memcached instance. You need to know that memcached requires you to connect to it via telnet. The following post describes the usage of this interface.

How To Connect

Use "ps -ef" to find out which IP and port was passed when memcached was started and use the same with telnet to connect to memcache. Example:

telnet 11211

Supported Commands

The supported commands (the official ones and some unofficial) are documented in the doc/protocol.txt document.

Sadly the syntax description isn't really clear and a simple help command listing the existing commands would be much better. Here is an overview of the commands you can find in the source (as of 16.12.2008):

Command Description Example
get Reads a value get mykey
set Set a key unconditionally set mykey 0 60 5
add Add a new key add newkey 0 60 5
replace Overwrite existing key replace key 0 60 5
append Append data to existing key append key 0 60 15
prepend Prepend data to existing key prepend key 0 60 15
incr Increments numerical key value by given number incr mykey 2
decr Decrements numerical key value by given number decr mykey 5
delete Deletes an existing key delete mykey
flush_all Invalidate specific items immediately flush_all
Invalidate all items in n seconds flush_all 900
stats Prints general statistics stats
Prints memory statistics stats slabs
Prints memory statistics stats malloc
Print higher level allocation statistics stats items
stats detail
stats sizes
Resets statistics stats reset
version Prints server version. version
verbosity Increases log level verbosity
quit Terminate telnet session quit

Traffic Statistics

You can query the current traffic statistics using the command

You will get a listing which serves the number of connections, bytes in/out and much more.

Example Output:

STAT pid 14868
STAT uptime 175931
STAT time 1220540125
STAT version 1.2.2
STAT pointer_size 32
STAT rusage_user 620.299700
STAT rusage_system 1545.703017
STAT curr_items 228
STAT total_items 779
STAT bytes 15525
STAT curr_connections 92
STAT total_connections 1740
STAT connection_structures 165
STAT cmd_get 7411
STAT cmd_set 28445156
STAT get_hits 5183
STAT get_misses 2228
STAT evictions 0
STAT bytes_read 2112768087
STAT bytes_written 1000038245
STAT limit_maxbytes 52428800
STAT threads 1

Memory Statistics

You can query the current memory statistics using

stats slabs

Example Output:

STAT 1:chunk_size 80
STAT 1:chunks_per_page 13107
STAT 1:total_pages 1
STAT 1:total_chunks 13107
STAT 1:used_chunks 13106
STAT 1:free_chunks 1
STAT 1:free_chunks_end 12886
STAT 2:chunk_size 100
STAT 2:chunks_per_page 10485
STAT 2:total_pages 1
STAT 2:total_chunks 10485
STAT 2:used_chunks 10484
STAT 2:free_chunks 1
STAT 2:free_chunks_end 10477
STAT active_slabs 3
STAT total_malloced 3145436

If you are unsure if you have enough memory for your memcached instance always look out for the "evictions" counters given by the "stats" command. If you have enough memory for the instance the "evictions" counter should be 0 or at least not increasing.

Which Keys Are Used?

There is no builtin function to directly determine the current set of keys. However you can use the

stats items
command to determine how many keys do exist.
stats items
STAT items:1:number 220
STAT items:1:age 83095
STAT items:2:number 7
STAT items:2:age 1405
This at least helps to see if any keys are used. To dump the key names from a PHP script that already does the memcache access you can use the PHP code from

Puppet Solve Invalid byte sequence in US-ASCII

When you run "puppet agent" and get
Error: Could not retrieve catalog from remote server: Error 400 on SERVER: invalid byte 
sequence in US-ASCII at /etc/puppet/modules/vendor/
or run "puppet apply" and get
Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not 
parse for environment production: invalid byte sequence in US-ASCII at /etc/puppet/manifests/someclass.pp:1
then the root case is probably the currently configured locale. Check the effective Ruby locale with
ruby -e 'puts Encoding.default_external'
Ensure that it returns a UTF-8 capable locale, if needed set it and rerun Puppet:
export LANG=de_DE.utf-8
export LC_ALL=de_DE.utf-8


If you ever need to get some meaningful facts in a possible Redis vs memcached discussion you might want to benchmark both on your target system.

While Redis brings a tool redis-benchmark, memcached doesn't. But Redis author Salvatore Sanfilippo ported the Redis benchmark to memcached! So it is possible to measure quite similar metrics using the same math and result summaries for both key value stores.

Benchmarking Redis

So setup Redis in cluster mode, master/slave, whatever you like and run the Redis benchmark
apt-get install redis-tools	# available starting with Wheezy backports
redis-benchmark -h <host>

Benchmarking Memcached

And do the same for memcache by compiling the memcached port of the benchmark
apt-get install build-essentials libevent-dev
git clone
cd mc-benchmark.git
and running it with
./mc-benchmark -h <host>
The benchmark output has the same structure, with more output in the Redis version compared to the memcached variant as each command type is tested and the Redis protocol knows many more commands.


After a friend of mine suggested reading "The things you need to know to do web development I felt the need to compile a solution index for the experiences described. In this interesting blog post describes his view of the typical learning curve of a web developer and the tools, solutions and concepts he discovers when he becomes a successful developer.

I do not want to summarize the post but I wanted to compile a list of those solutions and concepts affecting the life of a web developer.

Markup Standards Knowledge HTML, CSS, JSON, YAML
Web Stack Layering Basic knowledge about
  • Using TCP as transport protocol
  • Using HTTP as application protocol
  • Using SSL to encrypt the application layer with HTTPS
  • Using SSL certificates to proof identity for websites
  • Using (X)HTML for the application layer
  • Using DOM to access/manipulate application objects
Web Development Concepts
  • 3 tier server architecture
  • Distinction of static/dynamic content
  • Asynchronous CSS, JS loading
  • Asynchronous networking with Ajax
  • CSS box models
  • CSS Media Queries
  • Content Delivery Networks
  • UX, Usability...
  • Responsive Design
  • Page Speed Optimization
  • HTTP/HTTPS content mixing
  • Cross domain content mixing
  • MIME types
  • API-Pattern: RPC, SOAP, REST
  • Localization and Internationalization
Developer Infrastructure
  • Code Version Repo: usually Git. Hosted or self-hosted e.g. gitlab
  • Continuous Integration: Jenkins, Travis
  • Deployment: Jenkins, Travis, fabric, Bamboo, CruiseControl
Frontend JS Frameworks Mandatory knowledge in jQuery as well as knowing one or more JS frameworks as

Bootstrap, Foundation, React, Angular, Ember, Backbone, Prototype, GWT, YUI
Localization and Internationalization Frontend: usually in JS lib e.g. LocalePlanet or Globalize

Backend: often in gettext or a similar mechanism
Precompiling Resources For Javascript: Minify

For CSS: For Images: ImageMagick

Test everything with Google PageSpeed Insights
Backend Frameworks By language
  • PHP: CakePHP, CodeIgniter, Symfony, Seagull, Zend, Yii (choose one)
  • Python: Django, Tornado, Pylons, Zope, Bottle (choose one)
  • Ruby: Rails, Merb, Camping, Ramaze (choose one)
Web Server Solutions nginx, Apache

For loadbalancing: nginx, haproxy

As PHP webserver: nginx+PHPFPM
RDBMS MySQL (maybe Percona, MariaDB), Postgres
Caching/NoSQL Without replication: memcached, memcachedb, Redis

With replication: Redis, Couchbase, MongoDB, Cassandra

Good comparisons: #1 #2
Hosting If you are unsure about self-hosting vs. cloud hosting have a look at the Cloud Calculator.
Blogs Do not try to self-host blogs. You will fail on keeping them secure and up-to-date and sooner or later they are hacked. Start with a blog hoster right from the start: Choose provider


Most chef recipes are about installing new software including all config files. Also if they are configuration recipes they usually overwrite the whole file and provide a completely recreated configuration. When you have used cfengine and puppet with augtool before you'll be missing the agile editing of config files.

In cfengine2...

You could write
{ home/.bashrc
   AppendIfNoSuchLine "alias rm='rm -i'"

While in puppet...

You'd have:
augeas { "sshd_config":
  context => "/files/etc/ssh/sshd_config",
  changes => [
    "set PermitRootLogin no",

Now how to do it in Chef?

Maybe I missed the correct way to do it until now (please comment if this is the case!) but there seems to be no way to use for example augtool with chef and there is no built-in cfengine like editing. The only way I've seen so far is to use Ruby as a scripting language to change files using the Ruby runtime or to use the Script ressource which allows running other interpreters like bash, csh, perl, python or ruby.

To use it you can define a block named like the interpreter you need and add a "code" attribute with a "here doc" operator (e.g. <<-EOT) describing the commands. Additionally you specify a working directory and a user for the script to be executed with. Example:
bash "some_commands" do
    user "root"
    cwd "/tmp"
    code <<-EOT
       echo "alias rm='rm -i'" >> /root/.bashrc
While it is not a one-liner statement as possible as in cfengine it is very flexible. The Script resource is widely used to perform ad-hoc source compilation and installations in the community codebooks, but we can also use it for standard file editing.

Finally to do conditional editing use not_if/only_if clauses at the end of the Script resource block.

Puppet Apply Only Specific Classes

If you want to apply Puppet changes in an selective manner you can run
puppet apply -t --tags Some::Class
on the client node to only run the single class named "Some::Class".

Why does this work? Because Puppet automatically creates tags for all classes you have. Ensure to upper-case all parts of the class name, because even if you actual Ruby class is "some::class" the Puppet tag will be "Some::Class".

Puppet Agent Noop Pitfalls

The puppet agent command has a --noop switch that allows you to perform a dry-run of your Puppet code.
puppet agent -t --noop
It doesn't change anything, it just tells you what it would change. More or less exact due to the nature of dependencies that might come into existance by runtime changes. But it is pretty helpful and all Puppet users I know use it from time to time.

Unexpected Things

But there are some unexpected things about the noop mode:
  1. A --noop run does trigger the report server.
  2. The --noop run rewrites the YAML state files in /var/lib/puppet
  3. And there is no state on the local machine that gives you the last "real" run result after you overwrite the state files with the --noop run.

Why might this be a problem?

Or the other way around: why Puppet think this is not a problem? Probably because Puppet as an automation tool should overwrite and the past state doesn't really matter. If you use PE or Puppet with PuppetDB or Foreman you have an reporting for past runs anyway, so no need to have a history on the Puppet client.

Why I still do not like it: it avoids having safe and simple local Nagios checks. Using the state YAML you might want to build a simple script checking for run errors. Because you might want a Nagios alert about all errors that appear. Or about hosts that did not run Puppet for quite some time (for example I wanted to disable Puppet on a server for some action and forgot to reenable). Such a check reports false positives each time someone does a --noop run until the next normal run. This hides errors.

Of course you can build all this with cool Devops style SQL/REST/... queries to PuppetDB/Foreman, but checking state locally seems a bit more the old-style robust and simpler sysadmin way. Actively asking the Puppet master or report server for the client state seems wrong. The client should know too.

From a software usability perspective I do not expect a tool do change it's state when I pass --noop. It's unexpected. Of course the documentation is carefull phrased:
Use 'noop' mode where the daemon runs in a no-op or dry-run mode. This is useful for seeing what changes Puppet will make without actually executing the changes.

Getting rid of Bash Ctrl-R

Today was a good day, as I stumbled over this post (at hinting on the following bash key bindings:
bind '"\e[A":history-search-backward'
bind '"\e[B":history-search-forward'
It changes the behaviour of the up and down cursor keys to not go blindly through the history but only through items matching the current prompt. Of course at the disadvantage of having to clear the line to go through the full history. But as this can be achieved by a Ctrl-C at any time it is still preferrable to Ctrl+R Ctrl+R Ctrl+R Ctrl+R Ctrl+R Ctrl+R Ctrl+R Ctrl+R Ctrl+R Ctrl+R Ctrl+R Ctrl+R Ctrl+R Ctrl+R Ctrl+R Ctrl+R Ctrl+R Ctrl+R Ctrl+R ....

How to Munin Graph JVM Memory Usage with Ubuntu tomcat

The following description works when using the Ubuntu "tomcat7" package:

Grab the "java/jstat__heap" plugin from munin-contrib @ github and place it into "/usr/share/munin/plugins/jstat__heap".

Link the plugin into /etc/munin/plugins
ln -s /usr/share/munin/plugins/jstat__heap /etc/munin/plugins/jstat_myname_heap
Choose some useful name instead of "myname". This allows to monitor multiple JVM setups.

Configure each link you created in for example a new plugin config file named "/etc/munin/plugin-conf.d/jstat" which should contain one section per JVM looking like this
user tomcat7
env.pidfilepath /var/run/
env.javahome /usr/

PHP preg_replace() Examples

This post gives some simple examples for using regular expressions with preg_replace() in PHP scripts.

1. Syntax of preg_replace

While full syntax is
mixed preg_replace ( mixed $pattern , mixed 
$replacement , mixed $subject [, int $limit = -1 [, int &$count ]] )

2. Simple Replacing with preg_replace()

$result = preg_replace('/abc/', 'def', $string);   # Replace all 'abc' with 'def'
$result = preg_replace('/abc/i', 'def', $string);  # Replace with case insensitive matching
$result = preg_replace('/\s+/', '', $string);      # Strip all whitespaces

3. Advanced Usage of preg_replace()

Multiple replacements:

$result = preg_replace(
    array('/pattern1/', '/pattern2/'),
    array('replace1', 'replace2'),

Replacement Back References:

$result = preg_replace('/abc(def)hij/', '/\\1/', $string);
$result = preg_replace('/abc(def)hij/', '/$1/', $string);
$result = preg_replace('/abc(def)hij/', '/${1}/', $string);

Do only a finite number of replacements:

# Perform maximum of 5 replacements
$result = preg_replace('/abc/', 'def', $string, -1, 5);

Multi-line replacement

# Strip HTML tag
$result = preg_replace('#.*#m', '', $string);

Simple Chef to Nagios Hostgroup Export

When you are automatizing with chef and use plain Nagios for monitoring you will find duplication quite some configuration. One large part is the hostgroup definitions which usually map many of the chef roles. So if the roles are defined in chef anyway they should be sync'ed to Nagios.

Using "knife" one can extract the roles of a node like this
knife node show -a roles $node | grep -v "^roles:"

Scripting The Role Dumping

Note though that knife only shows roles that were applied on the server already. But this shouldn't be a big problem for a synchronization solution. Next step is to create a usable hostgroup definition in Nagios. To avoid colliding with existing hostgroups let's prefix the generated hostgroup names with "chef-". The only challenge is the regrouping of the role lists given per node by chef into host name lists per role. In Bash 4 using an fancy hash this could be done like this:
declare -A roles

for node in $(knife node list); do for role in $(knife node show -a roles $i |grep -v "roles" ); do roles["$role"]=${roles["$role"]}"$i " done done
Given this it is easy to dump Icinga hostgroup definitions. For example
for role in ${!roles[*]}; do
   echo "define hostgroup {
   hostgroup_name chef-$role
   members ${roles[$role]}
That makes ~15 lines of shell script and a cronjob entry to integrate Chef with Nagios. Of course you also need to ensure that each host name provided by chef has a Nagios host definition. If you know how it resolves you could just dump a host definition while looping over the host list. In any case there is no excuse not to export the chef config :-)

Easy Migrating

Migrating to such an export is easy by using the "chef-" namespace prefix for generated hostgroups. This allows you to smoothly migrate existing Nagions definitions at your own pace. Be sure to only reload Nagios and not restart via cron and to do it at reasonable time to avoid breaking things.

Sharing Screen With Multiple Users

How to detect screen sessions of other users:

screen -ls <user name>/

How to open screen to other users:

  1. Ctrl-A :multiuser on
  2. Ctrl-A :acladd <user to grant access>

Attach to other users screen session:

With session name
screen -x <user name>/<session name>
With PID and tty
screen -x <user name>/<pid>.<ptty>.<host>

Screen tmux Cheat Sheet

Here is a side by side comparison of screen and tmux commands and hotkeys.
Function Screen tmux
Start instance screen screen -S <name> tmux
Attach to instance screen -r <name> screen -x <name> tmux attach
List instances screen -ls screen -ls <user name>/ tmux ls
New Window ^a c ^b c
Switch Window ^a n ^a p ^b n ^b p
List Windows ^a " ^b w
Name Window ^a A ^b ,
Split Horizontal ^a S ^b "
Split Vertical ^a | ^b %
Switch Pane ^a Tab ^b o
Kill Pane ^a x ^b x
Paging ^b PgUp ^b PgDown
Scrolling Mode ^a [ ^b [