Puppet: System Administration Automated

Closing the first DTrace loop


I ended up spending what time I had for a couple of days at LCA optimizing Puppet's lexer. As I mentioned in my first DTrace post, I'm mostly just trying to use existing scripts for now and worry about understanding the darn thing later.

It turns out that the Swiss Army Knife of available scripts for Ruby is the rb_calltime.d script in the DTraceToolkit. This one gives us pretty much everything we might want to know in a first pass of debugging a Ruby program, including (most importantly for me) the inclusive and exclusive elapsed times for each method in the system (DTrace insists upon calling them functions, but it's Ruby, so we know better).

Unfortunately, I'd already hacked out most of the optimization before I discovered this script, and it doesn't seem to want to run at the moment, so all I can do is show the current data. Here are the methods that take the most time inclusively (meaning that it counts the time from entry to exit, and thus just tells us where the time is being spent in the overall program, not where the actual problems might lie):

lexer.rb             func       Puppet::Parser::Lexer::munge_token 12062622
lexer.rb             func       Puppet::Parser::Lexer::TokenList::lookup 12410291
branch.rb            func       Puppet::Parser::AST::Branch::initialize 19084001
methodhelper.rb      func       Hash::each                       19144097
methodhelper.rb      func       Object::set_options              20712067
ast.rb               func       Puppet::Parser::AST::initialize  22431107
lexer.rb             func       Array::each                      23868735
parser_support.rb    func       Class::new                       28520157
lexer.rb             func       Puppet::Parser::Lexer::find_string_token 29610463
lexer.rb             func       Puppet::Parser::Lexer::find_regex_token 29844180
parser_support.rb    func       Puppet::Parser::Parser::ast      36132924
lexer.rb             func       Puppet::Parser::Lexer::find_token 44581509
lexer.rb             func       Object::catch                    46187059

And here are the exclusive times (i.e., only counting the time spent in each method, not the time between entry and exit):

ast.rb               func       Puppet::Parser::AST::initialize   1719039
lexer.rb             func       Puppet::Parser::Lexer::Token::convert  1920189
branch.rb            func       Puppet::Parser::AST::Branch::initialize  2062436
lexer.rb             func       Object::catch                     2262789
lexer.rb             func       StringScanner::match?             2471109
parser_support.rb    func       Class::new                        3112051
lexer.rb             func       Puppet::Parser::Lexer::skip       3282333
lexer.rb             func       Puppet::Parser::Lexer::find_token  4930508
lexer.rb             func       Puppet::Parser::Lexer::munge_token  5232851
lexer.rb             func       Puppet::Parser::Lexer::find_regex_token  5572052
lexer.rb             func       Puppet::Parser::Lexer::TokenList::lookup  6659161
parser_support.rb    func       Puppet::Parser::Parser::ast       7144763
lexer.rb             func       Hash::[]                          7179650
methodhelper.rb      func       Hash::each                       14954253
lexer.rb             func       Puppet::Parser::Lexer::find_string_token 17045065
lexer.rb             func       Array::each                      20768233
-                    total      -                                132806477

Given this data, we're spending about 1/7th of the total parsing time just in the find_string_token method, which currently looks like this:

def find_string_token
    matched_token = value = nil

    # We know our longest string token is three chars, so try each size in turn
    # until we either match or run out of chars.  This way our worst-case is three
    # tries, where it is otherwise the number of string chars we have.  Also,
    # the lookups are optimized hash lookups, instead of regex scans.
    [3, 2, 1].each do |i|
        str = @scanner.peek(i)
        if matched_token = TOKENS.lookup(str)
            value = @scanner.scan(matched_token.regex)
            break
        end
    end

    return matched_token, value
end

The method is responsible for determining if the next token is a simple string-based token. I've optimized it by taking the fact that the longest string-based tokens are three characters (the <<| and |>> tokens), so I look for three character matches, then two, then one. If I don't get a match by then, then we don't have a match. I could probably optimize further, since these three character tokens are pretty darn rare, especially compared to the one character tokens, but I'd need to hard-code a lot more knowledge about the token list, and really, this iteration should be delimited by an automatic determination of the longest token, rather than hard-coding it.

This really isn't a very good write-up of what I did with DTrace or how it was helpful, other than showing the interesting differences between the exclusive and inclusive data, and letting you know that the rb_calltimes.d script is the one to start with, but hey, that's more than I could find when I started looking, so hopefully this will get you somewhere.

I expect to continue spending more time using DTrace for optimizations, and I'll hopefully start uploading my data so I don't have to worry about taking these snapshots. Graphs would sure be nice....

add to del.icio.us Add to Blinkslist add to furl Digg it add to ma.gnolia Stumble It! add to simpy seed the vine TailRank post to facebook

Tue, 05 Feb 2008 | Tags: , , , , , ,


Name:


E-mail:


URL:


Comment: