Recent Posts

Usage Scenarios for Polscan

The generic sysadmin policy scanner for Debian-based distros "Polscan", which I wrote about recently, is coming along. Right now I am focusing on how to make it really useful in daily work with a lot of systems, which usually means a lot of findings. And the question is: how does the presentation of the findings help you work on all of them?

For me there are roughly three scenarios when working with any sort of auditing tool or policy scanner.

Possible Scenarios

1. Everything under control

Scenario: That's the easy one. Your system automation is top-notch: there are no messy legacy systems, no hacks, no old construction sites, no migrations. Everything is polished, and when a new issue appears you automate it away, and 10 minutes later the fix is silently applied to all your systems.

Presentation of Findings: You are in control, so you want a top-level, bird's-eye view. You spot aberrations and tackle them. You can visually find the rogue policies/groups by their red numbers. And if there are none, you work on reducing warnings because you are bored. You spend most of your time in the summary view, waiting for the auditor to present him with full compliance with everything he asks for :-)

2. I'm swamped!

Scenario: You are afraid of adding more policies, as it would look even worse. You feel like you will never be able to get a clean system, and at the same time your professional pride tells you that you have to get it under control!

Presentation of Findings: Without visible progress there seems to be no point in trying to fix anything, so checking for progress is what matters most. What you care about most is the trend curve of all findings. It gives you hope that one day all systems will be clean.

The problem here is that a ternary OK/WARNING/FAILED state does not reflect that policies have different priorities, and that 2 out of 500 findings might be absolutely critical while 200 others are low-impact issues. A trend curve does not show that you have fixed the 2 critical ones, but it keeps nagging you about not having fixed all of those 500 problems.

3. Let's improve something today

Scenario: It's like scenario #2, but with a positive psychological perspective. You do not care that there are a lot of issues, but you are highly motivated to solve some of them. You browse through the results intending to pick the low-hanging fruit and eliminate it with your "Just do it" attitude.

Presentation of Findings: Skimming results is important. So are statistics, because you want to work on issues that affect a lot of systems. You would like to see metrics of your progress instantly.

What works already

I personally usually find myself in scenario #2, but I know colleagues often have the spontaneous motivation and perspective of scenario #3. And I believe that as the sole sysadmin of a small startup company with only a few systems you might find yourself in scenario #1 (happy you!).

As all three scenarios are realistic use cases, I want Polscan to support them. Currently the main screen of Polscan looks like this:

So how are the different scenarios supported already and where not?
  1. Scenario #1: "Everything under control"
    • Overview with drill down links is implemented
    • Well supported scenario
  2. Scenario #2: "I'm swamped!"
    • Overview has a 30-day trend graph for critical findings
    • Policy/Group drill down result views also have the trending graph
    • Progress is easy to track
    • Overview has 'New' and 'Solved' tables giving delta statistics
    • 'New' and 'Solved' result drill-down is still missing
  3. Scenario #3: "Let's fix something"
    • The per-policy grouping in the overview allows tackling large blocks of findings.
    • No support yet for grouping hosts (e.g. with the same security updates) to work on them
    • No instant feedback on achievements

What I'm working on

Next things to improve the scenarios:

I guess I'll stop here, as too much concept takes away implementation time!

Nonetheless, if you've read this far, I want to hear your opinion!
What is your use case? In which mode are you working and what do you need most?

Building a Generic Sysadmin Policy Scanner

After writing the same scripts several times, I decided it is time for a generic solution to check Debian servers for configuration consistency. As incidents and mistakes happen, each organization collects a set of learnings (let's call them policies) that should be followed in the future. And one important truth is that the free automation and CM tools we use (Chef, Puppet, Ansible, cfengine, Saltstack...) allow you to implement policies, but do not seem to care much about proving that the automation is correct.

How to ensure following policies?

But how do you really ensure that these policies are followed? The only way is to check them and to revisit the check results frequently. One could build a script and send a daily/weekly mail report. This is always a custom solution, and that's what I have done several times already. So I'm doing it one final time, but this time in a generic way.

Generic Policy Scanning

For me a generic configuration consistency / policy scanner has at least the following requirements:
  1. Optional generic pre-defined policies
  2. Optional custom user-defined policies
  3. Policies checked locally on the host
  4. Policies checked from CM systems
  5. Per host/hostgroup policy enabling
  6. Generic discovery of your hosts
  7. Dynamic per policy/group/host result filtering
  8. Customizable mail reports
  9. Result archival for audits
  10. Some simple trending
  11. Daily diffs, New findings, Resolved Issues
  12. Acknowledging Findings
I started implementing a simple solution (entirely bash and SSH based, realizing requirements 1, 2, 3, 4, 6, 7, 9 and 10) with It is quite easy to set up by configuring the type of and you can run it instantly with the default set of policy scanners (which of course do not all make sense for all types of systems).
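To make "policies checked locally on the host" a bit more concrete, here is a minimal sketch of what a single bash policy check could look like. The function name, the chosen policy and the "STATUS policy details" output format are hypothetical and not Polscan's actual scanner interface; the check also ignores sshd "Match" blocks.

```shell
#!/bin/bash
# Hypothetical policy check (not Polscan's real interface): root login via SSH
# must be explicitly disabled. Prints "OK|WARNING|FAILED <policy> [details]".
check_no_ssh_root_login() {
    local config="${1:-/etc/ssh/sshd_config}" value
    if [ ! -r "$config" ]; then
        echo "WARNING ssh-root-login cannot read $config"
        return
    fi
    # sshd uses the first occurrence of a keyword, so only the first
    # uncommented PermitRootLogin line counts (keywords are case-insensitive).
    value=$(awk 'tolower($1) == "permitrootlogin" { print tolower($2); exit }' "$config")
    if [ "$value" = "no" ]; then
        echo "OK ssh-root-login"
    else
        echo "FAILED ssh-root-login PermitRootLogin is '${value:-unset}' in $config"
    fi
}
```

Assuming bash on the remote side, such a function could be run over SSH by shipping its definition along with the call: ssh <host> "$(declare -f check_no_ssh_root_login); check_no_ssh_root_login"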

Implemented Scanners

By setting up the results and the static HTML (instructions in in some webserver document root you can browse through the results.


Result overview:

Filter details:

Debugging hiera-eyaml Encryption, Decryption failed

When Hiera works without any problems, everything is fine. But when it doesn't, it is quite hard to debug why. Here is a troubleshooting list for Hiera when used with hiera-eyaml-gpg.

hiera-eyaml-gpg Decryption failed

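First check your GPG key list:
gpg --list-keys --homedir=<.gnupg dir>
gpg --list-secret-keys --homedir=<.gnupg dir>
Check that at least one of the listed keys belongs to the recipients the data was encrypted for. For decrypting, the secret key of at least one of these recipients must be present. The recipients you used are either listed in your Hiera/eyaml config file or in a file referenced from there.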

To verify your active config, run eyaml in tracing mode. Note that the "-t" option is only available in newer eyaml versions (e.g. 2.0.5):
eyaml decrypt -v -t -f somefile.yaml
Trace output
[hiera-eyaml-core]           (Symbol) trace_given        =        (TrueClass) true              
[hiera-eyaml-core]           (Symbol) gpg_always_trust   =       (FalseClass) false             
[hiera-eyaml-core]           (Symbol) trace              =        (TrueClass) true              
[hiera-eyaml-core]           (Symbol) encrypt_method     =           (String) pkcs7             
[hiera-eyaml-core]           (Symbol) gpg_gnupghome      =           (String) /etc/hiera/.gnupg      
[hiera-eyaml-core]           (Symbol) pkcs7_private_key  =           (String) ./keys/private_key.pkcs7.pem
[hiera-eyaml-core]           (Symbol) version            =       (FalseClass) false             
[hiera-eyaml-core]           (Symbol) gpg_gnupghome_given =        (TrueClass) true              
[hiera-eyaml-core]           (Symbol) help               =       (FalseClass) false             
[hiera-eyaml-core]           (Symbol) quiet              =       (FalseClass) false             
[hiera-eyaml-core]           (Symbol) gpg_recipients_file =           (String) ./gpg_recipients  
[hiera-eyaml-core]           (Symbol) string             =         (NilClass)                   
[hiera-eyaml-core]           (Symbol) file_given         =        (TrueClass) true   
Alternatively, try enforcing the recipients and the .gnupg location manually to make it work:
eyaml decrypt -v -t -f somefile.yaml --gpg-recipients-file=<recipients> --gpg-gnupghome=<.gnupg dir>
If it works this way, you might want to add the key ":gpg-recipients-file:" to hiera.yaml and ensure that the mandatory key ":gpg-gnupghome:" is correct.
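Checking the recipients one by one can also be scripted. This is a minimal sketch, assuming a recipients file with one recipient (key ID or e-mail address) per line; the function name is made up for illustration, and skipping blank and "#" lines is a choice of this sketch, not a documented file format feature. Because decrypting needs the secret key of at least one recipient, it queries the secret keyring.

```shell
#!/bin/bash
# Sketch: report which recipients from a hiera-eyaml-gpg recipients file have a
# secret key in the given GnuPG home directory. Returns success if at least
# one recipient could decrypt. Assumes one recipient per line.
check_eyaml_recipients() {
    local recipients_file="$1" gnupghome="$2" found=0 recipient
    while IFS= read -r recipient || [ -n "$recipient" ]; do
        case "$recipient" in ''|'#'*) continue ;; esac
        if gpg --homedir "$gnupghome" --list-secret-keys "$recipient" >/dev/null 2>&1; then
            echo "secret key found: $recipient"
            found=1
        else
            echo "no secret key:    $recipient"
        fi
    done < "$recipients_file"
    [ "$found" -eq 1 ]
}
```

With the values from the trace output above, a call could look like: check_eyaml_recipients ./gpg_recipients /etc/hiera/.gnupg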

Checking Necessary Gems

hiera-eyaml-gpg can use different GPG libraries depending on the version you run. Check the dependencies on GitHub.

A possible stack is the following
gem list
gpgme (2.0.5)
hiera (1.3.2)
hiera-eyaml (2.0.1)
hiera-eyaml-gpg (0.4)
The gpgme gem additionally needs the C library:
dpkg -l "*gpg*"
||/ Name                Version             Description
ii  libgpgme11          1.2.0-1.2+deb6u1    GPGME - GnuPG Made Easy

Using Correct Ruby Version

Another pitfall is running multiple Ruby versions. Make sure to install the gems into the correct one. On Debian consider using "ruby-switch" or manually running "update-alternatives" for "gem" and "ruby".

Ruby Switch

apt-get install ruby-switch
ruby-switch --set ruby1.9.1


Update Alternatives

# Print available versions
update-alternatives --list ruby
update-alternatives --list gem

# Show current config
update-alternatives --display ruby
update-alternatives --display gem

# If necessary change it
update-alternatives --set ruby /usr/bin/ruby1.9.1
update-alternatives --set gem /usr/bin/gem1.9.1

Debugging dovecot ACL Shared Mailboxes Not Showing in Thunderbird

When you can't get ACL shared mailboxes to show up with Dovecot and Thunderbird, here are some debugging tips:
  1. Thunderbird fetches the ACLs on startup (and maybe at some other interval). So when testing, restart Thunderbird after each change you make.
  2. Ensure the shared mailboxes index can be written. You probably have it configured like
    plugin {
      acl_shared_dict = file:/var/lib/dovecot/db/shared-mailboxes.db
    }
    Check if such a file was created and is populated with new entries when you add ACLs from the mail client. As long as no entries appear here, nothing can work.
  3. Enable debugging in the dovecot log or use the "debug" flag and check the ACLs for the user who should see a shared mailbox like this:
    doveadm acl debug -u <user> shared/users/box
    • Watch out for missing directories
    • Watch out for permission issues
    • Watch out for strangely created paths; these could hint at a misconfigured namespace prefix

The damage of one second

Update: According to the AWS status page the incident was a problem related to BGP route leaking. AWS does not hint at a leap second related incident as originally suggested by this post!

Tonight we had another leap second, and not without suffering. At the end of the post you can find two screenshots of outages suggested by The screenshots were taken shortly after midnight UTC, and you can easily spot the sites with problems by the distinct peak at the right side of the graph.

AWS Outage

What many of the affected sites have in common: they are hosted at AWS, which had some problems.

[RESOLVED] Internet connectivity issues

Between 5:25 PM and 6:07 PM PDT we experienced an Internet connectivity issue with a provider outside of our network which affected traffic from some end-user networks. The issue has been resolved and the service is operating normally.

The root cause of this issue was an external Internet service provider incorrectly accepting a set of routes for some AWS addresses from a third-party who inadvertently advertised these routes. Providers should normally reject these routes by policy, but in this case the routes were accepted and propagated to other ISPs affecting some end-user’s ability to access AWS resources. Once we identified the provider and third-party network, we took action to route traffic around this incorrect routing configuration. We have worked with this external Internet service provider to ensure that this does not reoccur.

Incident Details

Graphs from

Note that those graphs indicate user reported issues:

Using the memcached telnet interface

This is a short summary of everything important that helps to inspect a running memcached instance. memcached speaks a plain-text protocol, so you can simply connect to it via telnet. This post describes the usage of this interface.

How To Connect

Use "ps -ef" to find out which IP and port were passed when memcached was started, and use the same with telnet to connect to memcached. Example:

telnet <ip> 11211

Supported Commands

The supported commands (the official ones and some unofficial) are documented in the doc/protocol.txt document.

Sadly the syntax description isn't really clear and a simple help command listing the existing commands would be much better. Here is an overview of the commands you can find in the source (as of 16.12.2008):

Command     Description                                     Example
get         Reads a value                                   get mykey
set         Set a key unconditionally                       set mykey 0 60 5
add         Add a new key                                   add newkey 0 60 5
replace     Overwrite existing key                          replace key 0 60 5
append      Append data to existing key                     append key 0 60 15
prepend     Prepend data to existing key                    prepend key 0 60 15
incr        Increments numerical key value by given number  incr mykey 2
decr        Decrements numerical key value by given number  decr mykey 5
delete      Deletes an existing key                         delete mykey
flush_all   Invalidate all items immediately                flush_all
            Invalidate all items in n seconds               flush_all 900
stats       Prints general statistics                       stats
            Prints memory statistics                        stats slabs
            Prints memory statistics                        stats malloc
            Print higher level allocation statistics        stats items
                                                            stats detail
                                                            stats sizes
            Resets statistics                               stats reset
version     Prints server version.                          version
verbosity   Increases log level                             verbosity
quit        Terminate telnet session                        quit
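The storage command examples in the table only show the command line. In the text protocol the line is "set <key> <flags> <exptime> <bytes>", followed by a data block of exactly <bytes> bytes on the next line, both terminated by \r\n. A minimal sketch that builds such a request; the function name mc_set is hypothetical and the flags are fixed to 0.

```shell
#!/bin/bash
# Sketch: build a memcached text-protocol "set" request:
#   set <key> <flags> <exptime> <bytes>\r\n<data>\r\n
mc_set() {
    local key="$1" exptime="$2" value="$3" bytes
    # Count bytes, not characters, so multibyte values are framed correctly.
    bytes=$(printf '%s' "$value" | wc -c)
    bytes=$((bytes))   # strip the padding some wc implementations add
    printf 'set %s 0 %s %s\r\n%s\r\n' "$key" "$exptime" "$bytes" "$value"
}
```

Sent to a server, for example with { mc_set mykey 60 hello; printf 'quit\r\n'; } | nc <host> 11211, memcached should answer STORED; the trailing quit makes it close the connection. Netcat variants differ, so depending on yours an extra option may be needed.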

Traffic Statistics

You can query the current traffic statistics using the command

stats

You will get a listing which shows the number of connections, bytes in/out and much more.

Example Output:

STAT pid 14868
STAT uptime 175931
STAT time 1220540125
STAT version 1.2.2
STAT pointer_size 32
STAT rusage_user 620.299700
STAT rusage_system 1545.703017
STAT curr_items 228
STAT total_items 779
STAT bytes 15525
STAT curr_connections 92
STAT total_connections 1740
STAT connection_structures 165
STAT cmd_get 7411
STAT cmd_set 28445156
STAT get_hits 5183
STAT get_misses 2228
STAT evictions 0
STAT bytes_read 2112768087
STAT bytes_written 1000038245
STAT limit_maxbytes 52428800
STAT threads 1

Memory Statistics

You can query the current memory statistics using

stats slabs

Example Output:

STAT 1:chunk_size 80
STAT 1:chunks_per_page 13107
STAT 1:total_pages 1
STAT 1:total_chunks 13107
STAT 1:used_chunks 13106
STAT 1:free_chunks 1
STAT 1:free_chunks_end 12886
STAT 2:chunk_size 100
STAT 2:chunks_per_page 10485
STAT 2:total_pages 1
STAT 2:total_chunks 10485
STAT 2:used_chunks 10484
STAT 2:free_chunks 1
STAT 2:free_chunks_end 10477
STAT active_slabs 3
STAT total_malloced 3145436

If you are unsure if you have enough memory for your memcached instance always look out for the "evictions" counters given by the "stats" command. If you have enough memory for the instance the "evictions" counter should be 0 or at least not increasing.
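Besides the evictions, the get hit ratio (get_hits relative to all gets) is a useful quick indicator. A minimal sketch that summarizes a "stats" dump read from stdin; the function name is made up for illustration.

```shell
#!/bin/bash
# Sketch: summarize memcached "stats" output (STAT lines on stdin) into the
# get hit ratio and the evictions counter.
mc_stats_summary() {
    tr -d '\r' | awk '
        $1 == "STAT" && $2 == "get_hits"   { hits = $3 }
        $1 == "STAT" && $2 == "get_misses" { misses = $3 }
        $1 == "STAT" && $2 == "evictions"  { evictions = $3 }
        END {
            total = hits + misses
            ratio = (total > 0) ? 100 * hits / total : 0
            printf "hit_ratio=%.2f%% evictions=%d\n", ratio, evictions
        }'
}
```

It could be fed with printf 'stats\r\nquit\r\n' | nc <host> 11211 | mc_stats_summary. For the example output above it prints hit_ratio=69.94% evictions=0. Running it twice shows whether the evictions counter is increasing.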

Which Keys Are Used?

There is no builtin function to directly determine the current set of keys. However, you can use the "stats items" command to determine how many keys exist:

stats items
STAT items:1:number 220
STAT items:1:age 83095
STAT items:2:number 7
STAT items:2:age 1405
This at least helps to see if any keys are used. To dump the key names from a PHP script that already does the memcache access you can use the PHP code from

Puppet Solve Invalid byte sequence in US-ASCII

When you run "puppet agent" and get
Error: Could not retrieve catalog from remote server: Error 400 on SERVER: invalid byte sequence in US-ASCII at /etc/puppet/modules/vendor/
or run "puppet apply" and get
Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not parse for environment production: invalid byte sequence in US-ASCII at /etc/puppet/manifests/someclass.pp:1
then the root cause is probably the currently configured locale. Check the effective Ruby default encoding with
ruby -e 'puts Encoding.default_external'
Ensure that it returns UTF-8. If it doesn't, set a UTF-8 locale and rerun Puppet:
export LANG=de_DE.utf-8
export LC_ALL=de_DE.utf-8
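Instead of exporting the locale for the whole session, it can also be set for a single Puppet run only. A sketch; the function name and the UTF8_LOCALE variable are illustrative assumptions, and the locale must be one that "locale -a" lists on your system (C.UTF-8 is usually available on Debian).

```shell
#!/bin/bash
# Sketch: run one command with a UTF-8 locale without changing the shell's
# own environment. Override the locale via UTF8_LOCALE (assumed name).
with_utf8_locale() {
    LANG="${UTF8_LOCALE:-C.UTF-8}" LC_ALL="${UTF8_LOCALE:-C.UTF-8}" "$@"
}
```

Usage would be with_utf8_locale puppet agent -t, or UTF8_LOCALE=de_DE.utf-8 with_utf8_locale puppet agent -t to keep the locale used above.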


If you ever need to get some meaningful facts in a possible Redis vs memcached discussion you might want to benchmark both on your target system.

While Redis comes with a tool called redis-benchmark, memcached doesn't. But Redis author Salvatore Sanfilippo ported the Redis benchmark to memcached! So it is possible to measure quite similar metrics using the same math and result summaries for both key-value stores.

Benchmarking Redis

So set up Redis in cluster mode, master/slave, whatever you like, and run the Redis benchmark
apt-get install redis-tools	# available starting with Wheezy backports
redis-benchmark -h <host>

Benchmarking Memcached

And do the same for memcached by compiling the memcached port of the benchmark
apt-get install build-essential libevent-dev
git clone
cd mc-benchmark
make
and running it with
./mc-benchmark -h <host>
The benchmark output has the same structure, with more output in the Redis version compared to the memcached variant as each command type is tested and the Redis protocol knows many more commands.


After a friend of mine suggested reading "The things you need to know to do web development", I felt the need to compile a solution index for the experiences described. In this interesting blog post the author describes his view of the typical learning curve of a web developer and the tools, solutions and concepts he discovers on his way to becoming a successful developer.

I do not want to summarize the post but I wanted to compile a list of those solutions and concepts affecting the life of a web developer.

Markup Standards: Knowledge of HTML, CSS, JSON, YAML
Web Stack Layering: Basic knowledge about
  • Using TCP as transport protocol
  • Using HTTP as application protocol
  • Using SSL to encrypt the application layer with HTTPS
  • Using SSL certificates to prove identity for websites
  • Using (X)HTML for the application layer
  • Using DOM to access/manipulate application objects
Web Development Concepts:
  • 3 tier server architecture
  • Distinction of static/dynamic content
  • Asynchronous CSS, JS loading
  • Asynchronous networking with Ajax
  • CSS box models
  • CSS Media Queries
  • Content Delivery Networks
  • UX, Usability...
  • Responsive Design
  • Page Speed Optimization
  • HTTP/HTTPS content mixing
  • Cross domain content mixing
  • MIME types
  • API-Pattern: RPC, SOAP, REST
  • Localization and Internationalization
Developer Infrastructure:
  • Code Version Repo: usually Git. Hosted or self-hosted e.g. gitlab
  • Continuous Integration: Jenkins, Travis
  • Deployment: Jenkins, Travis, fabric, Bamboo, CruiseControl
Frontend JS Frameworks: Mandatory knowledge of jQuery as well as of one or more JS frameworks such as

Bootstrap, Foundation, React, Angular, Ember, Backbone, Prototype, GWT, YUI
Localization and Internationalization: Frontend: usually in a JS lib, e.g. LocalePlanet or Globalize

Backend: often gettext or a similar mechanism
Precompiling Resources: For JavaScript: Minify

For CSS: For Images: ImageMagick

Test everything with Google PageSpeed Insights
Backend Frameworks: By language
  • PHP: CakePHP, CodeIgniter, Symfony, Seagull, Zend, Yii (choose one)
  • Python: Django, Tornado, Pylons, Zope, Bottle (choose one)
  • Ruby: Rails, Merb, Camping, Ramaze (choose one)
Web Server Solutions: nginx, Apache

For load balancing: nginx, haproxy

As PHP web server: nginx + PHP-FPM
RDBMS: MySQL (maybe Percona, MariaDB), Postgres
Caching/NoSQL: Without replication: memcached, memcachedb, Redis

With replication: Redis, Couchbase, MongoDB, Cassandra

Good comparisons: #1 #2
Hosting: If you are unsure about self-hosting vs. cloud hosting, have a look at the Cloud Calculator.
Blogs: Do not try to self-host blogs. You will fail at keeping them secure and up to date, and sooner or later they will be hacked. Use a blog hoster right from the start: Choose provider


Most Chef recipes are about installing new software including all config files. Even configuration recipes usually overwrite the whole file and provide a completely recreated configuration. If you have used cfengine, or Puppet with Augeas, before, you'll miss the agile editing of config files.

In cfengine2...

You could write
editfiles:
  { home/.bashrc
     AppendIfNoSuchLine "alias rm='rm -i'"
  }

While in puppet...

You'd have:
augeas { "sshd_config":
  context => "/files/etc/ssh/sshd_config",
  changes => [
    "set PermitRootLogin no",
  ],
}

Now how to do it in Chef?

Maybe I have missed the correct way to do it so far (please comment if this is the case!), but there seems to be no way to use, for example, augtool with Chef, and there is no built-in cfengine-like editing. The only ways I've seen so far are to use Ruby to change files from within the Ruby runtime, or to use the script resource, which allows running other interpreters like bash, csh, perl, python or ruby.

To use it you can define a block named like the interpreter you need and add a "code" attribute with a "here doc" operator (e.g. <<-EOT) describing the commands. Additionally you specify a working directory and a user for the script to be executed with. Example:
bash "some_commands" do
    user "root"
    cwd "/tmp"
    code <<-EOT
       echo "alias rm='rm -i'" >> /root/.bashrc
    EOT
end
While it is not a one-liner as in cfengine, it is very flexible. The script resource is widely used in the community cookbooks to perform ad-hoc source compilation and installation, but we can also use it for standard file editing.

Finally, to do conditional editing, use not_if/only_if clauses at the end of the script resource block.
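The cfengine AppendIfNoSuchLine behaviour can also be reproduced in plain shell, which keeps the code of a Chef bash resource idempotent: running it twice does not add the line twice. A minimal sketch; the function name is made up for illustration.

```shell
#!/bin/bash
# Sketch: shell equivalent of cfengine's AppendIfNoSuchLine. Appends the line
# only if exactly this line is not yet present in the file.
append_if_no_such_line() {
    local file="$1" line="$2"
    # -q: quiet, -x: match the whole line, -F: fixed string instead of a regex
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

The grep -qxF test on its own could also serve as the command of a not_if guard.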

Puppet Apply Only Specific Classes

If you want to apply Puppet changes in a selective manner you can run
puppet apply -t --tags Some::Class
on the client node to only run the single class named "Some::Class".

Why does this work? Because Puppet automatically creates tags for all your classes. Make sure to upper-case all parts of the class name, because even if your actual Puppet class is "some::class" the Puppet tag will be "Some::Class".

Puppet Agent Noop Pitfalls

The puppet agent command has a --noop switch that allows you to perform a dry-run of your Puppet code.
puppet agent -t --noop
It doesn't change anything, it just tells you what it would change. The result is only more or less exact, as some dependencies only come into existence through changes made at runtime. But it is pretty helpful, and all Puppet users I know use it from time to time.

Unexpected Things

But there are some unexpected things about the noop mode:
  1. A --noop run does send a report to the report server.
  2. The --noop run rewrites the YAML state files in /var/lib/puppet
  3. And there is no state on the local machine that gives you the last "real" run result after you overwrite the state files with the --noop run.

Why might this be a problem?

Or the other way around: why does Puppet think this is not a problem? Probably because Puppet as an automation tool should overwrite, and the past state doesn't really matter. If you use PE or Puppet with PuppetDB or Foreman you have reporting for past runs anyway, so there is no need for a history on the Puppet client.

Why I still do not like it: it prevents safe and simple local Nagios checks. Using the state YAML you might want to build a simple script checking for run errors, because you might want a Nagios alert about all errors that appear, or about hosts that did not run Puppet for quite some time (for example, I once disabled Puppet on a server for some action and forgot to re-enable it). Such a check reports wrong results each time someone does a --noop run, until the next normal run. This hides errors.

Of course you can build all this with cool DevOps-style SQL/REST/... queries to PuppetDB/Foreman, but checking state locally seems the more robust and simpler old-style sysadmin way. Actively asking the Puppet master or report server for the client state seems wrong. The client should know too.
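Such a local check could look like the following sketch. It assumes the Puppet 3 default state file /var/lib/puppet/state/last_run_summary.yaml with its "last_run:" (under time) and "failed:" (under resources) keys, and it uses plain line matching instead of a YAML parser. The function name and the default threshold are illustrative. As described above, it is fooled by a --noop run that rewrote the file.

```shell
#!/bin/bash
# Sketch of a local Nagios-style Puppet check. Exit codes follow the Nagios
# convention: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
check_puppet_last_run() {
    local summary="${1:-/var/lib/puppet/state/last_run_summary.yaml}"
    local max_age="${2:-7200}"   # seconds before a run counts as stale
    local now last_run failed age
    if [ ! -r "$summary" ]; then
        echo "UNKNOWN: cannot read $summary"; return 3
    fi
    now=$(date +%s)
    last_run=$(awk '$1 == "last_run:" { print $2; exit }' "$summary")
    failed=$(awk '$1 == "failed:" { print $2; exit }' "$summary")
    if [ -z "$last_run" ]; then
        echo "UNKNOWN: no last_run timestamp in $summary"; return 3
    fi
    age=$((now - last_run))
    if [ "${failed:-0}" -gt 0 ]; then
        echo "CRITICAL: $failed failed resources in last run"; return 2
    elif [ "$age" -gt "$max_age" ]; then
        echo "WARNING: last Puppet run was ${age}s ago"; return 1
    fi
    echo "OK: last run ${age}s ago, no failed resources"
    return 0
}
```

Because of the Nagios-style exit codes, it could be wrapped as an NRPE command.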

From a software usability perspective I do not expect a tool to change its state when I pass --noop. It's unexpected. Of course the documentation is carefully phrased:
Use 'noop' mode where the daemon runs in a no-op or dry-run mode. This is useful for seeing what changes Puppet will make without actually executing the changes.

Getting rid of Bash Ctrl-R

Today was a good day, as I stumbled over this post hinting at the following bash key bindings:
bind '"\e[A":history-search-backward'
bind '"\e[B":history-search-forward'
It changes the behaviour of the up and down cursor keys so that they no longer step blindly through the history, but only through entries starting with what you have already typed at the prompt. The disadvantage is that you have to clear the line to go through the full history. But as this can be done with Ctrl-C at any time, it is still preferable to Ctrl+R Ctrl+R Ctrl+R Ctrl+R Ctrl+R Ctrl+R Ctrl+R Ctrl+R Ctrl+R Ctrl+R Ctrl+R Ctrl+R Ctrl+R Ctrl+R Ctrl+R Ctrl+R Ctrl+R Ctrl+R Ctrl+R ....
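To keep these bindings across sessions, they can go into readline's configuration file instead. A sketch that appends them to the file named by $INPUTRC, falling back to ~/.inputrc, unless they are already present; the function name is made up. Note that readline only falls back to /etc/inputrc when no personal file exists, so when creating a new ~/.inputrc you may want to start it with "$include /etc/inputrc".

```shell
#!/bin/bash
# Sketch: persist the history-search bindings in readline's init file.
# Single quotes keep the \e sequences literal, as inputrc expects them.
persist_history_search() {
    local rc="${INPUTRC:-$HOME/.inputrc}" binding
    for binding in '"\e[A": history-search-backward' \
                   '"\e[B": history-search-forward'; do
        grep -qxF -- "$binding" "$rc" 2>/dev/null || printf '%s\n' "$binding" >> "$rc"
    done
}
```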

Screen tmux Cheat Sheet

Here is a side by side comparison of screen and tmux commands and hotkeys.
Function            Screen                                tmux
Start instance      screen, screen -S <name>              tmux
Attach to instance  screen -r <name>, screen -x <name>    tmux attach
List instances      screen -ls, screen -ls <user name>/   tmux ls
New Window          ^a c                                  ^b c
Switch Window       ^a n, ^a p                            ^b n, ^b p
List Windows        ^a "                                  ^b w
Name Window         ^a A                                  ^b ,
Split Horizontal    ^a S                                  ^b "
Split Vertical      ^a |                                  ^b %
Switch Pane         ^a Tab                                ^b o
Kill Pane           ^a x                                  ^b x
Paging                                                    ^b PgUp, ^b PgDown
Scrolling Mode      ^a [                                  ^b [

How to dry-run with chef-client

The answer is simple: do not "dry-run", do "why-run"!
chef-client --why-run
chef-client -W
And the output looks nicer when using "-Fmin"
chef-client -Fmin -W
As with all other automation tools, the dry-run mode is not very predictive. Still it might indicate some of the things that will happen.

Removing newlines with sed

My goal for today: I want to remember the official sed FAQ solution for removing all newlines (i.e. joining all lines into one):
sed ':a;N;$!ba;s/\n//g' file
so that I don't spend a lot of time on it when I need it again.
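Annotated, the one-liner works as in the sketch below (the function names are made up). The label syntax with semicolons works with GNU sed; BSD sed wants the label in a separate -e expression. When only newlines need to go, tr is a simpler alternative that does not buffer the whole file, but it also drops the final newline.

```shell
#!/bin/bash
# The sed FAQ one-liner, annotated:
#   :a       define label "a"
#   N        append the next input line to the pattern space (separated by \n)
#   $!ba     if this is not the last line, branch back to label "a"
#   s/\n//g  with the whole input in the pattern space, delete all newlines
join_lines() {
    sed ':a;N;$!ba;s/\n//g' "$@"
}

# Alternative: delete every newline character with tr (stdin or one file).
join_lines_tr() {
    if [ $# -gt 0 ]; then tr -d '\n' < "$1"; else tr -d '\n'; fi
}
```

To replace the newlines with spaces instead, change the substitution to s/\n/ /g.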

Redis Performance Debugging

Here are some simple hints on debugging Redis performance issues.

Monitoring Live Redis Queries

Run the "monitor" command to see queries as they are sent to a Redis instance. Do not use it on high-traffic instances!
redis-cli monitor
The output looks like this
redis> MONITOR
1371241093.375324 "monitor"
1371241109.735725 "keys" "*"
1371241152.344504 "set" "testkey" "1"
1371241165.169184 "get" "testkey"
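To find out which commands dominate, a short MONITOR capture can be aggregated. A sketch, assuming lines like the ones above; newer Redis versions insert a "[db client]" part after the timestamp, so the command is taken from the first double-quoted field rather than a fixed column. The function name is made up.

```shell
#!/bin/bash
# Sketch: count commands in a MONITOR capture (files as arguments or stdin),
# most frequent first. Lines without a quoted field (e.g. "OK") are skipped.
count_monitor_commands() {
    awk '{
        for (i = 2; i <= NF; i++) {
            if (substr($i, 1, 1) == "\"") {
                cmd = tolower($i)
                gsub(/"/, "", cmd)
                count[cmd]++
                break
            }
        }
    }
    END { for (c in count) print count[c], c }' "$@" | sort -rn
}
```

A short capture could be taken with timeout 10 redis-cli monitor > monitor.log and then summarized with count_monitor_commands monitor.log.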

Analyzing Slow Commands

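When there are too many queries, better use "slowlog" (e.g. in redis-cli) to see the slow queries recently executed against your Redis instance:
slowlog get 25		# print the 25 most recent slow log entries
slowlog len		# print the number of entries in the slow log
slowlog reset		# clear the slow log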

Debugging Latency

If you suspect latency to be an issue, use the built-in latency measuring support of "redis-cli". First measure the intrinsic system latency on your Redis server (the argument is the test duration in seconds) with
redis-cli --intrinsic-latency 100
and then sample from your Redis clients with
redis-cli --latency -h <host> -p <port>
If you have problems with high latency, check whether transparent huge pages are disabled. Disable them with
echo never > /sys/kernel/mm/transparent_hugepage/enabled
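To check the setting before and after, note that the kernel lists all modes in that file and marks the active one in brackets, e.g. "always madvise [never]". A sketch extracting the active mode; the function name is made up. Keep in mind that the echo above does not survive a reboot.

```shell
#!/bin/bash
# Sketch: print the active transparent huge pages mode (the bracketed word).
thp_mode() {
    local file="${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
    sed -n 's/.*\[\([a-z]*\)].*/\1/p' "$file"
}
```

For example, [ "$(thp_mode)" = never ] || echo "THP still enabled" could serve as a quick check on a Redis host.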

Check Background Save Settings

If your instance seemingly freezes periodically, you probably have background dumping enabled.
grep ^save /etc/redis/redis.conf
Comment out all save lines and set up a cron job to do the dumping, or use a Redis slave that can dump whenever it wants to.

Alternatively you can try to mitigate the effect using the "no-appendfsync-on-rewrite" option (set to "yes") in redis.conf.

Check fsync Setting

By default Redis runs fsync() every second when the append-only file is enabled. Other possibilities are "always" and "no".
grep ^appendfsync /etc/redis/redis.conf
So if you do not care about losing the most recent writes on a crash, you might want to set "no" here.
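To review both persistence settings at a glance, here is a small sketch that lists the active save rules and the AOF settings from a redis.conf. The function name is made up; directives missing from the file fall back to Redis' built-in defaults, which the sketch does not resolve.

```shell
#!/bin/bash
# Sketch: summarize the active (uncommented) persistence directives of a redis.conf.
redis_persistence_summary() {
    local conf="${1:-/etc/redis/redis.conf}"
    awk '
        $1 == "save"        { saves = saves (saves != "" ? ", " : "") $2 " " $3 }
        $1 == "appendonly"  { aof = $2 }
        $1 == "appendfsync" { fsync = $2 }
        END {
            print "save: " (saves != "" ? saves : "(none)")
            print "appendonly: " (aof != "" ? aof : "(unset)")
            print "appendfsync: " (fsync != "" ? fsync : "(unset)")
        }' "$conf"
}
```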