Chef

How to dry-run with chef-client

The answer is simple: do not "dry-run", do "why-run"!

chef-client --why-run
chef-client -W

And the output looks nicer when using "-Fmin"

chef-client -Fmin -W

As with all other automation tools, the dry-run mode is not very predictive. Still it might indicate some of the things that will happen.

Simple Chef to Nagios Hostgroup Export

When you are automatizing with chef and use plain Nagios for monitoring you will find duplication quite some configuration. One large part is the hostgroup definitions which usually map many of the chef roles. So if the roles are defined in chef anyway they should be sync'ed to Nagios.

Using "knife" one can extract the roles of a node like this

knife node show -a roles $node | grep -v "^roles:"

Scripting The Role Dumping

Note though that knife only shows roles that were applied on the server already. But this shouldn't be a big problem for a synchronization solution. Next step is to create a usable hostgroup definition in Nagios. To avoid colliding with existing hostgroups let's prefix the generated hostgroup names with "chef-". The only challenge is the regrouping of the role lists given per node by chef into host name lists per role. In Bash 4 using an fancy hash this could be done like this:

declare -A roles

for node in $(knife node list); do
   for role in $(knife node show -a roles $i |grep -v "roles" ); do
      roles["$role"]=${roles["$role"]}"$i "
   done
done

Given this it is easy to dump Icinga hostgroup definitions. For example

for role in ${!roles[*]}; do
   echo "define hostgroup {
   hostgroup_name chef-$role
   members ${roles[$role]}
}
"
done

That makes ~15 lines of shell script and a cronjob entry to integrate Chef with Nagios. Of course you also need to ensure that each host name provided by chef has a Nagios host definition. If you know how it resolves you could just dump a host definition while looping over the host list. In any case there is no excuse not to export the chef config :-)

Easy Migrating

Migrating to such an export is easy by using the "chef-" namespace prefix for generated hostgroups. This allows you to smoothly migrate existing Nagions definitions at your own pace. Be sure to only reload Nagios and not restart via cron and to do it at reasonable time to avoid breaking things.

Chef: How To Debug Active Attributes

If you experience problems with attribute inheritance on a chef client and watch the chef-client output without knowing what attributes are effective you can either look at the chef GUI or do the same on console using "shef" or in "chef-shell" in newer chef releases.

So run

chef-shell -z

The "-z" is important to get chef-shell to load the currently active run list for the node that a "chef-client" run would use.

Then enter "attributes" to switch to attribute mode

chef > attributes
chef:attributes >

and query anything you like by specifying the attribute path as you do in recipes:

chef:attributes > default["authorized_keys"]
[...]
chef:attributes > node["packages"]
[...]

By just querying for "node" you get a full dump of all attributes.

Missing Roles in "knife node show" Output

Sometimes the knife output can be really confusing:

$ knife node show myserver
Node Name:   myserver1
Environment: _default
FQDN:        myserver1
IP:          
Run List:    role[base], role[mysql], role[apache]
Roles:       base, nrpe, mysql
Recipes:     [...]
Platform:    ubuntu 12.04
Tags:        

Noticed the difference in "Run List" and "Roles"? The run list says "role[apache]", but the list of "Roles" has no Apache. This is because of the role not yet being run on the server. So a

ssh root@myserver chef-client

Solves the issue and Apache appears in the roles list.

The learning: do not use "knife node show" to get the list of configured roles!

Solving 100% CPU usage of Chef beam.smp (RabbitMQ)

Search for chef 100% cpu issue and you will find a lot of sugestions ranging from reboot the server, to restart RabbitMQ and often to check the kernel max file limit.

All of those do not help! What does help is checking RabbitMQ with

rabbitmqctl report | grep -A3 file_descriptors

and have a look at the printed limits and usage. Here is an example:

 {file_descriptors,[{total_limit,8900},
                    {total_used,1028},
                    {sockets_limit,8008},
                    {sockets_used,2}]},

In my case the 100% CPU usage was caused by all of the file handles being used up which for some reason causes RabbitMQ 2.8.4 to go into a crazy endless loop rarely responding at all.

The "total_limit" value is the "nofile" limit for the maximum number of open files you can check using "ulimit -n" as RabbitMQ user. Increase it permanently by defining a RabbitMQ specific limit for example in /etc/security/limits.d/rabbitmq.conf:

rabbitmq    soft   nofile   10000

or using for example

ulimit -n 10000

from the start script or login scripts. Then restart RabbitMQ. The CPU usage should be gone.

Update: This problem only affects RabbitMQ releases up to 1.8.4 and should be fixed starting with 1.8.5.

chef-server "Failed to authenticate."

If your chef GUI suddenly stops working and you see something like the following exception in both server.log and server-webui.log:

merb : chef-server (api) : worker (port 4000) ~ Failed to authenticate. Ensure that your client key is valid. - (Merb::ControllerExceptions::Unauthorized)
/usr/share/chef-server-api/app/controllers/application.rb:56:in `authenticate_every'
/usr/lib/ruby/vendor_ruby/merb-core/controller/abstract_controller.rb:352:in `send'
/usr/lib/ruby/vendor_ruby/merb-core/controller/abstract_controller.rb:352:in `_call_filters'
/usr/lib/ruby/vendor_ruby/merb-core/controller/abstract_controller.rb:344:in `each'
/usr/lib/ruby/vendor_ruby/merb-core/controller/abstract_controller.rb:344:in `_call_filters'
/usr/lib/ruby/vendor_ruby/merb-core/controller/abstract_controller.rb:286:in `_dispatch'
/usr/lib/ruby/vendor_ruby/merb-core/controller/abstract_controller.rb:284:in `catch'
/usr/lib/ruby/vendor_ruby/merb-core/controller/abstract_controller.rb:284:in `_dispatch'
[...]

Then try stopping all chef processes, remove

/etc/chef/webui.pem
/etc/chef/validation.pem

and start everything again. It will regenerate the keys.

The downside is that you have to

knife configure -i

all you knife setup locations again.

How-to Write a Chef Recipe for Editing Config Files

Most chef recipes are about installing new software including all config files. Also if they are configuration recipes they usually overwrite the whole file and provide a completely recreated configuration. When you have used cfengine and puppet with augtool before you'll miss a possibility to edit files.

In cfengine2...

You could write

editfiles:
{ home/.bashrc
   AppendIfNoSuchLine "alias rm='rm -i'"
}

While in puppet...

You'd have:

augeas { "sshd_config":
  context => "/files/etc/ssh/sshd_config",
  changes => [
    "set PermitRootLogin no",
  ],
}

Now how to do it in Chef?

Maybe I missed the correct way to do it until now (please comment if this is the case!) but there seems to be no way to use for example augtool with chef and there is no built-in cfengine like editing. The only way I've seen so far is to use Ruby as a scripting language to change files using the Ruby runtime or to use the Script ressource which allows running other interpreters like bash, csh, perl, python or ruby.

To use it you can define a block named like the interpreter you need and add a "code" attribute with a "here doc" operator (e.g. <<-EOT) describing the commands. Additionally you specify a working directory and a user for the script to be executed with. Example:

bash "some_commands" do
    user "root"
    cwd "/tmp"
    code <<-EOT
       echo "alias rm='rm -i'" >> /root/.bashrc
    EOT
end

While it is not a one-liner statement as possible as in cfengine it is very flexible. The Script resource is widely used to perform ad-hoc source compilation and installations in the community codebooks, but we can also use it for standard file editing.

Finally to do conditional editing use not_if/only_if clauses at the end of the Script resource block.

Syndicate content