Puppet: System Administration Automated

Git branch in your bash prompt


Kevin Barnes has posted his mechanism for getting the current branch of the git repository into his bash prompt.

He mentions color in his article, and it turns out I'm the person who added the color, so I figured I'd post my version.

Here are the functions I have:

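git_current_branch()
{
  # Print the checked-out branch: the line "git branch" marks with "*"
  git branch 2>/dev/null | sed -n '/^\*/ s/^\* //p'
}

git_display()
{
  # Print the colored branch name plus a space; print nothing outside a repository
  br=$(git_current_branch)
  if [ -n "$br" ]; then
      echo "$br" | BRANCH="$br" GIT_COLOR=$(git_color) awk '{if ($1) { print ENVIRON["GIT_COLOR"] ENVIRON["BRANCH"] " " } }'
  fi
}

git_color()
{
    # Orange if "git status" prints any line containing a colon (e.g. "modified:"), else pink.
    # ORANGE and PINK must be exported, or awk's ENVIRON won't see them.
    git status 2>/dev/null | grep -c : | awk '{if ($1 > 0) { print ENVIRON["ORANGE"] } else { print ENVIRON["PINK"] } }'
}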

And then here's my prompt:

title="\033]0;\h:\W\007"
PS1="$title[\$(git_display)$GREEN\w$NOCOLOR]\n\u@\h("'$?'") $ "

First, the git bits. As Kevin mentions, I color the branch name; I use orange if I've got modified files, and pink if I don't (these are names that I map elsewhere to terminal codes). The three functions provide the three, um, functions: See what branch I'm on, see whether there are uncommitted files, and colorize the branch name.

Now, the bash bits.

First, you'll notice I have a multi-line prompt. I first started this when I switched to a color prompt, because for a while there bash didn't like the hidden characters that add color. I got to choose between a multi-line prompt and a prompt that wrapped in broken ways. Since I really only wanted color in the path, I put that on the upper line, avoiding most of the wrapping problems. It was worth it, because (especially when I was still a sysadmin by trade) having color, almost any color, in the prompt makes it easy to pick out my commands from command output.

Now that I've had my prompt this way for about 4 years, I'm pretty fond of it. I think the bash wrapping problems are mostly fixed now, but I'm keeping it.

The $title sets the terminal title (which is slightly useful).

So, that's how I add the git branch, colored by whether I have uncommitted files, to my prompt, with a bit more prompt info thrown in for kicks.


Mon, 05 May 2008


ralsh is Awesome


So, I'm testing ticket #1099, and I run this snippet of code:

user { testing: ensure => present, home => "/var/tmp" }

Then this one:

user { testing: ensure => present, home => "/tmp" }

Sure enough, the home directory changes (although it doesn't actually move the directory, thankfully), so I clearly didn't do due diligence on accepting the bug information from my client. Now I need to remove the user. Sure, I can modify and reexecute the file, but why should I, when I can just do this:

luke@culain(0) $ sudo puppet/bin/ralsh user testing ensure=absent
notice: /User[testing]/ensure: removed
user { 'testing':
    uid => 'absent',
    home => 'absent',
    password => 'absent',
    gid => 'absent',
    groups => 'absent',
    comment => 'absent',
    ensure => 'absent',
    shell => 'absent'
}
luke@culain(0) $

The extra output of the user is kinda silly, and really only matters when printing rather than modifying users, but still, I use this all the time, and I'm quite fond of it and its silly name.


Tue, 26 Feb 2008


Closing the first DTrace loop


I ended up spending what time I had for a couple of days at LCA optimizing Puppet's lexer. As I mentioned in my first DTrace post, I'm mostly just trying to use existing scripts for now and worry about understanding the darn thing later.

It turns out that the Swiss Army Knife of available scripts for Ruby is the rb_calltime.d script in the DTraceToolkit. This one gives us pretty much everything we might want to know in a first pass of debugging a Ruby program, including (most importantly for me) the inclusive and exclusive elapsed times for each method in the system (DTrace insists upon calling them functions, but it's Ruby, so we know better).

Unfortunately, I'd already hacked out most of the optimization before I discovered this script, and it doesn't seem to want to run at the moment, so all I can do is show the current data. Here are the methods that take the most time inclusively (meaning that it counts the time from entry to exit, and thus just tells us where the time is being spent in the overall program, not where the actual problems might lie):

lexer.rb             func       Puppet::Parser::Lexer::munge_token 12062622
lexer.rb             func       Puppet::Parser::Lexer::TokenList::lookup 12410291
branch.rb            func       Puppet::Parser::AST::Branch::initialize 19084001
methodhelper.rb      func       Hash::each                       19144097
methodhelper.rb      func       Object::set_options              20712067
ast.rb               func       Puppet::Parser::AST::initialize  22431107
lexer.rb             func       Array::each                      23868735
parser_support.rb    func       Class::new                       28520157
lexer.rb             func       Puppet::Parser::Lexer::find_string_token 29610463
lexer.rb             func       Puppet::Parser::Lexer::find_regex_token 29844180
parser_support.rb    func       Puppet::Parser::Parser::ast      36132924
lexer.rb             func       Puppet::Parser::Lexer::find_token 44581509
lexer.rb             func       Object::catch                    46187059

And here are the exclusive times (i.e., counting only the time spent in each method's own code, not the time spent in the methods it calls):

ast.rb               func       Puppet::Parser::AST::initialize   1719039
lexer.rb             func       Puppet::Parser::Lexer::Token::convert  1920189
branch.rb            func       Puppet::Parser::AST::Branch::initialize  2062436
lexer.rb             func       Object::catch                     2262789
lexer.rb             func       StringScanner::match?             2471109
parser_support.rb    func       Class::new                        3112051
lexer.rb             func       Puppet::Parser::Lexer::skip       3282333
lexer.rb             func       Puppet::Parser::Lexer::find_token  4930508
lexer.rb             func       Puppet::Parser::Lexer::munge_token  5232851
lexer.rb             func       Puppet::Parser::Lexer::find_regex_token  5572052
lexer.rb             func       Puppet::Parser::Lexer::TokenList::lookup  6659161
parser_support.rb    func       Puppet::Parser::Parser::ast       7144763
lexer.rb             func       Hash::[]                          7179650
methodhelper.rb      func       Hash::each                       14954253
lexer.rb             func       Puppet::Parser::Lexer::find_string_token 17045065
lexer.rb             func       Array::each                      20768233
-                    total      -                                132806477

Given this data, we're spending about 13% (roughly an eighth) of the total parsing time in the find_string_token method alone, which currently looks like this:

def find_string_token
    matched_token = value = nil

    # We know our longest string token is three chars, so try each size in turn
    # until we either match or run out of chars.  This way our worst-case is three
    # tries, where it is otherwise the number of string chars we have.  Also,
    # the lookups are optimized hash lookups, instead of regex scans.
    [3, 2, 1].each do |i|
        str = @scanner.peek(i)
        if matched_token = TOKENS.lookup(str)
            value = @scanner.scan(matched_token.regex)
            break
        end
    end

    return matched_token, value
end

The method is responsible for determining whether the next token is a simple string-based token. I've optimized it by taking advantage of the fact that the longest string-based tokens are three characters (the <<| and |>> tokens), so I look for three-character matches, then two, then one. If I don't get a match by then, there isn't one. I could probably optimize further, since these three-character tokens are pretty darn rare, especially compared to the one-character tokens, but I'd need to hard-code a lot more knowledge about the token list. Really, this iteration should be bounded by an automatically determined longest-token length rather than a hard-coded one.
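Here is a minimal sketch of that automatic bound. It assumes a plain hash from literal token text to token name in place of Puppet's real TOKENS list. STRING_TOKENS, MAX_STRING_TOKEN_LENGTH and the token names below are illustrative, not Puppet's code. Near the end of input, peek can return fewer characters than requested. The sketch skips those lengths, so the scanner only ever advances past text that actually matched:

```ruby
require 'strscan'

# Hypothetical table of literal (string-based) tokens; the names are
# illustrative stand-ins, not a copy of Puppet's TOKENS list.
STRING_TOKENS = {
  '<<|' => :LLCOLLECT,
  '|>>' => :RRCOLLECT,
  '=>'  => :FARROW,
  '=='  => :ISEQUAL,
  '='   => :EQUALS,
  '{'   => :LBRACE,
  '}'   => :RBRACE
}.freeze

# Derive the bound from the table once, instead of hard-coding 3.
MAX_STRING_TOKEN_LENGTH = STRING_TOKENS.keys.map(&:length).max

# Try the longest possible token first and work down to one character,
# so the worst case is MAX_STRING_TOKEN_LENGTH hash lookups.
def find_string_token(scanner)
  MAX_STRING_TOKEN_LENGTH.downto(1) do |len|
    str = scanner.peek(len)
    next if str.length < len # near end of input, fewer chars are left

    if (name = STRING_TOKENS[str])
      scanner.pos += str.bytesize # consume exactly the matched text
      return [name, str]
    end
  end
  [nil, nil]
end

scanner = StringScanner.new('<<|===')
p find_string_token(scanner) # => [:LLCOLLECT, "<<|"]
```

Because the bound comes from the table, adding a longer token later moves it automatically. When the longest token really is three characters, the worst case is still three lookups, the same as the hard-coded version.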

This really isn't a very good write-up of what I did with DTrace or how it was helpful, other than showing the interesting differences between the exclusive and inclusive data and letting you know that the rb_calltime.d script is the one to start with. But hey, that's more than I could find when I started looking, so hopefully it will get you somewhere.
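The relationship between the two views can be sketched on a toy call tree. This is an illustration only: it uses a hypothetical Call struct with made-up timings, not real DTrace output. A method's exclusive time is taken here as its inclusive time minus the inclusive time of the methods it calls directly:

```ruby
# Hypothetical call-tree node: a method name, its inclusive time (entry to
# exit, including callees), and the calls it made. Timings are made up.
Call = Struct.new(:name, :inclusive, :children)

# Exclusive time: inclusive time minus the inclusive time of direct callees.
def exclusive(call)
  call.inclusive - call.children.sum(&:inclusive)
end

# Total [inclusive, exclusive] per method name across the whole tree.
def totals(call, acc = Hash.new { |h, k| h[k] = [0, 0] })
  acc[call.name][0] += call.inclusive
  acc[call.name][1] += exclusive(call)
  call.children.each { |child| totals(child, acc) }
  acc
end

lookup = Call.new('lookup', 30, [])
string = Call.new('find_string_token', 50, [lookup])
regex  = Call.new('find_regex_token', 40, [])
root   = Call.new('find_token', 100, [regex, string])

totals(root).each do |name, (incl, excl)|
  puts format('%-18s inclusive %3d  exclusive %3d', name, incl, excl)
end
```

In this toy tree the exclusive times (10, 40, 20, 30) add up to the root's inclusive 100, which is what makes a total line meaningful for an exclusive listing. The inclusive times overlap and sum to 220, so adding them double-counts. Recursive methods would complicate the inclusive numbers further.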

I expect to continue spending more time using DTrace for optimizations, and I'll hopefully start uploading my data so I don't have to worry about taking these snapshots. Graphs would sure be nice....


Tue, 05 Feb 2008


A bit more DTrace


(This should have been posted a while ago, but I guess I had a problem and it's been sitting uncommitted for a while.)

After pulling apart the skip method in the lexer, so that the various parts are in separate methods, I get this as my count:

Class                                    Method                   Calls   Avg time   Total time
Puppet::Parser::Lexer                    munge_token              56778        358     20335592
Class                                    new                      28242        889     25132822
Puppet::Parser::Parser                   ast                      25881       1147     29695496
Fixnum                                   <                       1817071         16     30723097
StringScanner                            check                   1829886         26     48732560
String                                   length                  3757782         20     78611361
Puppet::Parser::Lexer::TokenList         each                     56778       6618    375813485
Puppet::Parser::Lexer                    find_token               56778       6714    381227038
Hash                                     each                     84949       4563    387630769
Puppet::Parser::Parser                   import                       9   45754308    411788774
Puppet::Parser::Parser                   _reduce_132                  9   45755009    411795083
Object                                   catch                    56018       8086    452970031
Puppet::Parser::Lexer                    scan                       173    2751816    476064309
Racc::Parser                             _racc_yyparse_c            173    2751907    476080064
Object                                   __send__                   173    2751984    476093248
Racc::Parser                             yyparse                    173    2752322    476151712
Puppet::Parser::Parser                   parse                      173    2752742    476224530
Array                                    collect                    331    1446548    478807659
Array                                    each                     26303      18476    485983221

The interesting one there is the Lexer.find_token method. I just created it, and it looks like it's taking about 80% of the total parse time (roughly 381 million of parse's 476 million), which is a helluva lot.

This method is responsible for picking the token to return, and the complicated part is that it has to return the longest match. That's currently done by trying each token in turn (skipping those that don't match) and keeping the longest match. This is expensive, because every token type is tried for every token returned, so the cost is proportional to the number of tokens in the input times the number of token types, which is bad.
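To make that cost concrete, here is a small sketch of naive longest-match lexing. It assumes a hypothetical TOKEN_TYPES table of five regexes and a stats counter; neither is Puppet's real lexer. find_token tries every token type at the current position with StringScanner#match?, keeps the longest match, and counts each check:

```ruby
require 'strscan'

# Hypothetical token table (name => regex); not Puppet's real token list.
TOKEN_TYPES = {
  NAME:   /[a-z]\w*/,
  NUMBER: /\d+/,
  FARROW: /=>/,
  EQUALS: /=/,
  SPACE:  /\s+/
}.freeze

# Naive longest match: try every token type at the current position and
# keep the longest match. Each returned token costs TOKEN_TYPES.size checks.
def find_token(scanner, stats)
  best_name = nil
  best_len = 0
  TOKEN_TYPES.each do |name, regex|
    stats[:checks] += 1
    len = scanner.match?(regex) # match length at the current position, or nil
    if len && len > best_len
      best_name = name
      best_len = len
    end
  end
  return nil if best_name.nil?

  value = scanner.peek(best_len)
  scanner.pos += best_len
  [best_name, value]
end

# Lex a whole string, dropping whitespace, and report how many checks it took.
def lex(input)
  scanner = StringScanner.new(input)
  stats = { checks: 0 }
  tokens = []
  until scanner.eos?
    token = find_token(scanner, stats)
    raise "no token at offset #{scanner.pos}" if token.nil?
    tokens << token unless token.first == :SPACE
  end
  [tokens, stats[:checks]]
end

tokens, checks = lex('ensure => present')
p tokens # => [[:NAME, "ensure"], [:FARROW, "=>"], [:NAME, "present"]]
p checks # => 25
```

Lexing `ensure => present` finds five tokens, counting the two spaces. Each token costs one check per token type, for 25 checks. Doubling either the input or the token table doubles the count.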


Mon, 28 Jan 2008


A first pass at DTrace


I've never really spent much time optimizing Puppet except in those areas that get particular complaints (and not always then), but now that I'm forced to run Leopard I figured I should see if I can put DTrace to use.

The first pass used the functime.d script, which tells me how long Puppet spends in each function. I couldn't get the file to execute directly, and I also couldn't get it to execute my script for me (which is a pretty good indication that I don't really know how to use DTrace), so I added the ability to pause my test script, giving me time to start dtrace. So, I run my test script, which I'm using to test parse time:

~/puppet/ext/puppet-test --modulepath /Users/luke/Desktop/puppet-stanford/modules/ -s parser -t parse --manifest ~/Desktop/puppet-stanford/master/manifests/site.pp -p

Then I run the dtrace script:

sudo dtrace -s ./functime.d -p 45847 2>&1 | tee functimes.log

This takes a heckuva long time to run (380 seconds or so, vs. about 6 normally), but in the end I get a big file that has histograms for all of the classes and methods, along with a sorted list of how long Puppet spends in each method. E.g., here's a histogram:

Puppet::Parser::Parser                              parse
         value  ------------- Distribution ------------- count
       8388608 |                                         0
      16777216 |                                         1
      33554432 |                                         0
      67108864 |                                         1
     134217728 |@@@@@@@                                  30
     268435456 |@@@@@@@@@                                39
     536870912 |@@@@@@@@@@@                              49
    1073741824 |@@@@@                                    21
    2147483648 |@@@                                      14
    4294967296 |@@                                       10
    8589934592 |@                                        6
   17179869184 |                                         2
   34359738368 |                                         0

And here are a few of the methods:

Class                                    Method                   Calls   Avg time   Total time
Puppet::Parser::Parser                   ast                      25881       1090     28219110
NilClass                                 nil?                    1982238         19     38713614
StringScanner                            check                   2044008         24     50380323
Hash                                     each                     84949       2789    236945697
Puppet::Parser::Parser                   import                       9   29288385    263595467
Puppet::Parser::Parser                   _reduce_132                  9   29289048    263601440
Object                                   catch                    56018       5403    302711262
Puppet::Parser::Lexer                    scan                       173    1752645    303207689
Racc::Parser                             _racc_yyparse_c            173    1752730    303222439
Object                                   __send__                   173    1752803    303234951
Racc::Parser                             yyparse                    173    1753138    303292971
Puppet::Parser::Parser                   parse                      173    1753536    303361785
Array                                    collect                    331     925551    306357434
Array                                    each                     26303      11912    313340970

The first annoying thing to notice is that this script is clearly measuring the total time between method entry and exit, not the time actually spent in each method's own code, which makes it a bit less useful for this kind of testing.

The next thing to notice is that we're calling nil? and check a ton of times, which adds up even though they're individually very cheap.

If we add up the totals for check and nil? (about 50.4 million and 38.7 million), we get roughly 89 million, a bit less than a third of the total run time of the parse method (about 303 million), which is the entry point to all of this code. That means they're having a big impact.

This really isn't anything I couldn't get from normal Ruby profiling, but based on my experience working with Brendan a bit at OSCON last year, I know there's much more available.

My next post on DTrace will hopefully cover how I used it to drill down a bit further.


Mon, 28 Jan 2008


A Better Signature Generator


I just discovered Signature Profiler, a great plugin for creating signatures in Mail.app (it works on both Leopard and Tiger). Finally, I can get rid of the painfully hackish python (!!) plugin I was maintaining, which is good since it apparently didn't work in Leopard anyway. I never could figure out how to make it provide the signatures without a leading space on each line, which was pretty annoying.

This plugin provides plenty of nice options for managing signatures, but the main thing I wanted was to be able to include the output from my long-standing signature generation script (which largely just pulls a random file from a directory of the quotes I've collected over the years).

I also took the opportunity to trim my signature list; some of the quotes were funny in 1997 but not so much now.


Sat, 05 Jan 2008


Git, one month on


I've been using Git for about a month now. Overall, everyone has been right about it -- it's got some heinous usability problems, but man is it kick ass to have distributed version control.

For instance, I've taken a few trips since I switched to Git, and I've committed on an airplane at least twice now. This seems like a small thing, in that I could always wait to commit, but I'm often surprisingly productive on planes, and there are plenty of things you can't actually recover from in SVN without the full repository (e.g., moving directories around).

The cool things about Git don't all require its distributed aspect; for instance, its branching is far superior to SVN's (if you could say SVN even has branching). Last week I found myself three commits into some work that really should have been on a separate branch. With Git, this was really easy to fix: I branched from the current state, then rewound the current branch to remove the commits I didn't want in it.

I was in a branch named indirection, and I decided it made sense to make a new branch named configurations.

Using the git reset man page, this is what I did:

$ git branch configurations
$ git reset --hard HEAD~3
$ git checkout configurations

This left me in the new branch I wanted and left the indirection branch in the state it was in before I made the big changes.

It's clearly not all peaches and cream, though. As I mentioned, there are definite usability issues. It's not so much that you can't figure it out, but that it's seldom what you expect. It doesn't help that the majority of the examples are from Linus's life, and his life is far more complicated than most, in terms of managing repositories.

The mechanism for pulling, fetching, and pushing branches is especially counterintuitive.

Overall, though, I'm very happy with it.


Sun, 23 Sep 2007


Linus on Git responding to KDE


Linus Torvalds posted a lengthy response to someone from the KDE community about using Git with KDE, and it's definitely worth a read:

Practically speaking, you'd generally have one or a few central repositories, yes. But no, it really doesn't have to be a single one. And I'm not just talking about mirroring (which is really easy with a distributed setup), I'm literally talking about things like some people wanting to use the "stable" tree, and not my tree at all, or the vendor trees.

And they are obviously connected, but it doesn't have to be a totally central notion at all.

Think of the git trees as people: some people are more "central" than others, but in the end, the kernel is actually fairly unusual (at least for a big project) in having just one person that is so much in the "center" that everybody knows about him.


Mon, 27 Aug 2007


Giving Git a run-out


Something apparently snapped while I was at OSCON, and I collapsed my distributed source control management quandary down to Git. I think in the end it doesn't matter all that much, since they're so similar in basic functionality, and I think I mostly got tired of sitting on the fence, looking over but not being willing to commit to a specific dSCM.

Once I decided I'd go ahead with Git, my main priority was to get to the point where I could do my development on Puppet in it, which is especially important since it's the only real way for me to figure out if it will work for me, not that I really know what "work for me" means.

There are two crucial steps to testing an SCM for me: Getting Puppet's code into it, with as much history as possible, and making it available for others to have access to.

Getting the code in was moderately easy, but it was made harder by the fact that I created my Subversion repository when SVN was just starting to get popular, so I started without the typical branches/tags/trunk directory layout. Here's the command I used in the end:

git svnimport -A ~/puppet-users -i -v http://reductivelabs.com/svn/puppet/ > /tmp/git.out

I tried git-svn, but it never got past revision 567 or so (which is when I switched to the popular directory structure). In addition, I was never able to actually get a working copy of the repository up to that point.

The puppet-users file contains a mapping from svn-style user names to email addresses:

luke = Luke Kanies <luke@domain.com>
lutter = David Lutterkort <dlutter@domain.com>
mpalmer = Matthew Palmer <mpalmer@domain.org>

I redirect output to a file because the import produces a lot of output (I've got about 2800 revisions), I don't actually care about any of it, and, because I use iTerm, it takes a whole freaking CPU to scroll a terminal.

This basically worked, except that it started at revision 600 (arbitrarily close enough to the time when I changed the directory structure in the repository).

To make the repository shareable, I first just exported it via http, which was pretty easy, but then I was told I needed to use git-server for performance reasons. I built a Puppet module to set it all up, although the server doesn't work as well as I'd like (I really like SVN's auth file, which allows me to control who has access to the 32 repositories I maintain).

I'm getting some gritching from the Australians, and it's not like it's perfect, but at least I know I want something like that.

At the least, this has been a great experiment, and I figure we'll spend a week or so messing around with it. I'm not sure I can afford the time to experiment with all of the competitors; Matt's really pushing on darcs, but... I dunno, it seems niche, and at this point, I'm niche enough for all of us.


Tue, 07 Aug 2007


gitDisplay all 140 possibilities? (y or n)


I guess this is what people meant when they said git was "Unixy":

luke@phage(0) $ git
Display all 140 possibilities? (y or n)
git                     git-get-tar-commit-id   git-rebase
git-add                 git-grep                git-receive-pack
git-add--interactive    git-gui                 git-reflog
git-am                  git-hash-object         git-relink
git-annotate            git-http-fetch          git-remote
git-apply               git-http-push           git-repack
git-applymbox           git-imap-send           git-repo-config
git-applypatch          git-index-pack          git-request-pull
git-archimport          git-init                git-rerere
git-archive             git-init-db             git-reset
git-bisect              git-instaweb            git-rev-list
git-blame               git-local-fetch         git-rev-parse
git-branch              git-log                 git-revert
git-bundle              git-lost-found          git-rm
git-cat-file            git-ls-files            git-runstatus
git-check-ref-format    git-ls-remote           git-send-email
git-checkout            git-ls-tree             git-send-pack
git-checkout-index      git-mailinfo            git-sh-setup
git-cherry              git-mailsplit           git-shell
git-cherry-pick         git-merge               git-shortlog
git-citool              git-merge-base          git-show
git-clean               git-merge-file          git-show-branch
git-clone               git-merge-index         git-show-index
luke@phage(0) $ git

I think I'm going to be sick.


Tue, 17 Jul 2007

