Automation

How to dry-run with chef-client

The answer is simple: do not "dry-run", do "why-run"!

chef-client --why-run
chef-client -W

And the output looks nicer when using "-Fmin"

chef-client -Fmin -W

As with all other automation tools, the dry-run mode is not very predictive. Still it might indicate some of the things that will happen.

Chef Gets Push in Q1/2014

Sysadvent features a puppetlabs sponsered article (yes, honestly, check the bottom of the page!) about chef enterprise getting push support. It is supposed to be included in the open source release in Q1/2014.

With this change you can use a push jobs cookbook to define jobs and an extended "knife" with new commands to start and query about jobs:

knife job start ...
knife job list

and

knife node status ...

will tell about job execution status on the remote node.

At a first glance it seems nice. Then again I feel worried when this is intended to get rid of SSH keys. Why do we need to get rid of them exactly? And in exchange for what?

Simple Chef to Nagios Hostgroup Export

When you are automatizing with chef and use plain Nagios for monitoring you will find duplication quite some configuration. One large part is the hostgroup definitions which usually map many of the chef roles. So if the roles are defined in chef anyway they should be sync'ed to Nagios.

Using "knife" one can extract the roles of a node like this

knife node show -a roles $node | grep -v "^roles:"

Scripting The Role Dumping

Note though that knife only shows roles that were applied on the server already. But this shouldn't be a big problem for a synchronization solution. Next step is to create a usable hostgroup definition in Nagios. To avoid colliding with existing hostgroups let's prefix the generated hostgroup names with "chef-". The only challenge is the regrouping of the role lists given per node by chef into host name lists per role. In Bash 4 using an fancy hash this could be done like this:

declare -A roles

for node in $(knife node list); do
   for role in $(knife node show -a roles $i |grep -v "roles" ); do
      roles["$role"]=${roles["$role"]}"$i "
   done
done

Given this it is easy to dump Icinga hostgroup definitions. For example

for role in ${!roles[*]}; do
   echo "define hostgroup {
   hostgroup_name chef-$role
   members ${roles[$role]}
}
"
done

That makes ~15 lines of shell script and a cronjob entry to integrate Chef with Nagios. Of course you also need to ensure that each host name provided by chef has a Nagios host definition. If you know how it resolves you could just dump a host definition while looping over the host list. In any case there is no excuse not to export the chef config :-)

Easy Migrating

Migrating to such an export is easy by using the "chef-" namespace prefix for generated hostgroups. This allows you to smoothly migrate existing Nagions definitions at your own pace. Be sure to only reload Nagios and not restart via cron and to do it at reasonable time to avoid breaking things.

Chef: How To Debug Active Attributes

If you experience problems with attribute inheritance on a chef client and watch the chef-client output without knowing what attributes are effective you can either look at the chef GUI or do the same on console using "shef" or in "chef-shell" in newer chef releases.

So run

chef-shell -z

The "-z" is important to get chef-shell to load the currently active run list for the node that a "chef-client" run would use.

Then enter "attributes" to switch to attribute mode

chef > attributes
chef:attributes >

and query anything you like by specifying the attribute path as you do in recipes:

chef:attributes > default["authorized_keys"]
[...]
chef:attributes > node["packages"]
[...]

By just querying for "node" you get a full dump of all attributes.

Solving chef-client Errors

Problem:

merb : chef-server (api) : worker (port 4000) ~ Connection refused - connect(2) - (Errno::ECONNREFUSED)

Solution: Check why solr is not running and start it

/etc/init.d/chef-solr start

Problem:

merb : chef-server (api) : worker (port 4000) ~ Net::HTTPFatalError: 503 "Service Unavailable" - (Chef::Exceptions::SolrConnectionError)

Solution: You need to check solr log for error. You can find

  • the access log in /var/log/chef/2013_03_01.jetty.log (adapt the date)
  • the solr error log in /var/log/chef/solr.log

Hopefully you find an error trace there.


Problem:

# chef-expander -n 1
/usr/lib/ruby/1.9.1/rubygems/custom_require.rb:36:in `require': cannot load such file -- http11_client (LoadError)
        from /usr/lib/ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
        from /usr/lib/ruby/vendor_ruby/em-http.rb:8:in `'
        from /usr/lib/ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
        from /usr/lib/ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
        from /usr/lib/ruby/vendor_ruby/em-http-request.rb:1:in `'
        from /usr/lib/ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
        from /usr/lib/ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
        from /usr/lib/ruby/vendor_ruby/chef/expander/solrizer.rb:24:in `'
        from /usr/lib/ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
        from /usr/lib/ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
        from /usr/lib/ruby/vendor_ruby/chef/expander/vnode.rb:26:in `'
        from /usr/lib/ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
        from /usr/lib/ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
        from /usr/lib/ruby/vendor_ruby/chef/expander/vnode_supervisor.rb:28:in `'
        from /usr/lib/ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
        from /usr/lib/ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
        from /usr/lib/ruby/vendor_ruby/chef/expander/cluster_supervisor.rb:25:in `'
        from /usr/lib/ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
        from /usr/lib/ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
        from /usr/bin/chef-expander:27:in `
'

Solution: This is a gems dependency issue with the HTTP client gem. Read about it here: http://tickets.opscode.com/browse/CHEF-3495. You might want to check the active Ruby version you have on your system e.g. on Debian run

update-alternatives --config ruby

to find out and change it. Note that the emhttp package from Opscode might require a special version. You can check by listing the package files:

dpkg -L libem-http-request-ruby
/.
/usr
/usr/share
/usr/share/doc
/usr/share/doc/libem-http-request-ruby
/usr/share/doc/libem-http-request-ruby/changelog.Debian.gz
/usr/share/doc/libem-http-request-ruby/copyright
/usr/lib
/usr/lib/ruby
/usr/lib/ruby/vendor_ruby
/usr/lib/ruby/vendor_ruby/em-http.rb
/usr/lib/ruby/vendor_ruby/em-http-request.rb
/usr/lib/ruby/vendor_ruby/em-http
/usr/lib/ruby/vendor_ruby/em-http/http_options.rb
/usr/lib/ruby/vendor_ruby/em-http/http_header.rb
/usr/lib/ruby/vendor_ruby/em-http/client.rb
/usr/lib/ruby/vendor_ruby/em-http/http_encoding.rb
/usr/lib/ruby/vendor_ruby/em-http/multi.rb
/usr/lib/ruby/vendor_ruby/em-http/core_ext
/usr/lib/ruby/vendor_ruby/em-http/core_ext/bytesize.rb
/usr/lib/ruby/vendor_ruby/em-http/mock.rb
/usr/lib/ruby/vendor_ruby/em-http/decoders.rb
/usr/lib/ruby/vendor_ruby/em-http/version.rb
/usr/lib/ruby/vendor_ruby/em-http/request.rb
/usr/lib/ruby/vendor_ruby/1.8
/usr/lib/ruby/vendor_ruby/1.8/x86_64-linux
/usr/lib/ruby/vendor_ruby/1.8/x86_64-linux/em_buffer.so
/usr/lib/ruby/vendor_ruby/1.8/x86_64-linux/http11_client.so

The listing above for example indicates ruby1.8.

Solving 100% CPU usage of Chef beam.smp (RabbitMQ)

Search for chef 100% cpu issue and you will find a lot of sugestions ranging from reboot the server, to restart RabbitMQ and often to check the kernel max file limit.

All of those do not help! What does help is checking RabbitMQ with

rabbitmqctl report | grep -A3 file_descriptors

and have a look at the printed limits and usage. Here is an example:

 {file_descriptors,[{total_limit,8900},
                    {total_used,1028},
                    {sockets_limit,8008},
                    {sockets_used,2}]},

In my case the 100% CPU usage was caused by all of the file handles being used up which for some reason causes RabbitMQ 2.8.4 to go into a crazy endless loop rarely responding at all.

The "total_limit" value is the "nofile" limit for the maximum number of open files you can check using "ulimit -n" as RabbitMQ user. Increase it permanently by defining a RabbitMQ specific limit for example in /etc/security/limits.d/rabbitmq.conf:

rabbitmq    soft   nofile   10000

or using for example

ulimit -n 10000

from the start script or login scripts. Then restart RabbitMQ. The CPU usage should be gone.

Update: This problem only affects RabbitMQ releases up to 1.8.4 and should be fixed starting with 1.8.5.

chef-server "Failed to authenticate."

If your chef GUI suddenly stops working and you see something like the following exception in both server.log and server-webui.log:

merb : chef-server (api) : worker (port 4000) ~ Failed to authenticate. Ensure that your client key is valid. - (Merb::ControllerExceptions::Unauthorized)
/usr/share/chef-server-api/app/controllers/application.rb:56:in `authenticate_every'
/usr/lib/ruby/vendor_ruby/merb-core/controller/abstract_controller.rb:352:in `send'
/usr/lib/ruby/vendor_ruby/merb-core/controller/abstract_controller.rb:352:in `_call_filters'
/usr/lib/ruby/vendor_ruby/merb-core/controller/abstract_controller.rb:344:in `each'
/usr/lib/ruby/vendor_ruby/merb-core/controller/abstract_controller.rb:344:in `_call_filters'
/usr/lib/ruby/vendor_ruby/merb-core/controller/abstract_controller.rb:286:in `_dispatch'
/usr/lib/ruby/vendor_ruby/merb-core/controller/abstract_controller.rb:284:in `catch'
/usr/lib/ruby/vendor_ruby/merb-core/controller/abstract_controller.rb:284:in `_dispatch'
[...]

Then try stopping all chef processes, remove

/etc/chef/webui.pem
/etc/chef/validation.pem

and start everything again. It will regenerate the keys.

The downside is that you have to

knife configure -i

all you knife setup locations again.

Chef: Which nodes have role X / recipe Y

Why is it so hard to find out which nodes have a given role or recipe in chef?

The only way seems to loop yourself:

for node in $(knife node list); do
   if knife node show -r $node | grep 'role\[base\]' >/dev/null; then       
     echo $node;
   fi;
done

Did I miss some other obvious way? I'd like to have some "knife run_list filter ..." command!

How-to Write a Chef Recipe for Editing Config Files

Most chef recipes are about installing new software including all config files. Also if they are configuration recipes they usually overwrite the whole file and provide a completely recreated configuration. When you have used cfengine and puppet with augtool before you'll miss a possibility to edit files.

In cfengine2...

You could write

editfiles:
{ home/.bashrc
   AppendIfNoSuchLine "alias rm='rm -i'"
}

While in puppet...

You'd have:

augeas { "sshd_config":
  context => "/files/etc/ssh/sshd_config",
  changes => [
    "set PermitRootLogin no",
  ],
}

Now how to do it in Chef?

Maybe I missed the correct way to do it until now (please comment if this is the case!) but there seems to be no way to use for example augtool with chef and there is no built-in cfengine like editing. The only way I've seen so far is to use Ruby as a scripting language to change files using the Ruby runtime or to use the Script ressource which allows running other interpreters like bash, csh, perl, python or ruby.

To use it you can define a block named like the interpreter you need and add a "code" attribute with a "here doc" operator (e.g. <<-EOT) describing the commands. Additionally you specify a working directory and a user for the script to be executed with. Example:

bash "some_commands" do
    user "root"
    cwd "/tmp"
    code <<-EOT
       echo "alias rm='rm -i'" >> /root/.bashrc
    EOT
end

While it is not a one-liner statement as possible as in cfengine it is very flexible. The Script resource is widely used to perform ad-hoc source compilation and installations in the community codebooks, but we can also use it for standard file editing.

Finally to do conditional editing use not_if/only_if clauses at the end of the Script resource block.

Get Recipes from the cfengine Design Center

Today I learned that the makers of cfengine have launched a "Design Center" which essentially is a git repository with recipes (called "sketches") for cfengine. This really helps learning the cfengine syntax and getting quick results. It also saves googling wildy for hints on how to do special stuff.

Beside just copy&pasting the sketches that cfengine guarantees to be runnable without modifications the design center wiki explains how to directly install sketches from the repository using cf-sketch.

Using cf-sketch you can search for installable recipes:

# cf-sketch --search utilities
Monitoring::nagios_plugin_agent /tmp/design-center/sketches/utilities/nagios_plugin_agent
[...]

...and install them:

# cf-sketch --install Monitoring::nagios_plugin_agent

cf-sketch itself is a Perl program that need to be set up separately by running

git clone https://github.com/cfengine/design-center/
cd design-center/tools/cf-sketch
make install
Syndicate content