Ruby has a distribution problem
I've been doing a better job of reading development books recently (e.g., Domain Driven Design), and something has really begun to stick out at me. There seems to be a split between those developers who write software that is expected to run in one place and those who write software that is expected to run in many places.
If you, as a developer, know that your software will really only be installed at a single customer (whether that customer is your employer, a consulting client, or whatever), then your life is drastically easier -- you don't usually have to worry about cross-platform issues, and you don't have to worry about different users having different needs, because you only have one user.
Obviously there's no inherent problem with having the simpler life of a developer with only one user, but it seems to me that the Ruby community is, as a group, largely adopting that perspective as the default. This is worrying to me, because I'm building an application that I expect to be installed in thousands of locations (in fact, it's probably already installed in thousands of locations). I'd like to take as much advantage of existing Ruby code as possible, but it's not exactly easy.
For example, rubygems (probably my least favorite ruby software of all time) basically require that you always try to load them, because their design stupidly requires that you know whether a given piece of software is installed via rubygems or some other mechanism. For instance, if you've installed the Facter gem, then this code doesn't work:
ruby -rfacter -e 'Facter.to_hash'
Instead, you have to do this:
ruby -rubygems -rfacter -e 'Facter.to_hash'
The reason is that rubygems installs in a location that Ruby doesn't search by default. The reason for that is that apparently this one guy, somewhere, wanted to have multiple versions of a given package installed at once. Who wants this? Let's just say it's not the guys who are distributing hundreds and thousands of copies of their software.
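One common workaround is to try a plain require first and only load rubygems if that fails. This is a minimal sketch of that idea, not anything Facter or rubygems ships; the helper name `require_with_rubygems_fallback` is made up for illustration. On newer Ruby versions rubygems is already loaded at startup, so the fallback branch is effectively a no-op there.

```ruby
# Hypothetical helper: load a library whether it was installed as a native
# package (already on Ruby's load path) or as a gem.
def require_with_rubygems_fallback(library)
  # First try the normal load path, which is where native packages install.
  require library
  true
rescue LoadError
  # Not found: load rubygems so its install location is searched, then retry.
  # If the retry also fails, the LoadError propagates to the caller.
  require 'rubygems'
  require library
  true
end
```

A caller would write `require_with_rubygems_fallback('facter')` instead of `require 'facter'`, which is exactly the kind of extra ceremony the rest of this post complains about.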
The truth is, most Rubyists don't even seem to use gems this way -- they tend to create a vendor subdirectory in their project, and then install their gems there. This is a clear example of how little they expect to have their projects distributed. These gems might be compiled, they might conflict with installed software, they might require installed software -- you have no idea, because it's an entirely separate repository of packages.
This is basically anathema to how I think about management, yet it's the standard, recommended practice in the Rails community, because it makes it easy to "guarantee" behaviour in a given environment. Of course, your guarantee is only good if no one ever tries to run the software anywhere except an exact duplicate of where you run it.
I tried to have a conversation about this at the Ruby Hoedown last year -- my claim was it was difficult to turn a Rails project into a native package, especially with the tendency toward requiring all kinds of random gems. Quite a few people kind of stared blankly at me and said, multiple times, "I just put it in vendor." Since then, this has become my go-to phrase for describing the Ruby way of solving distribution problems: "I just put it in vendor." I keep waiting for someone to try to put their kernel or web browser in vendor: "We only support the Firefox copy in vendor, sorry."
I don't know if other communities are any better at this. From what I can tell, this is basically how the Java community behaves, too. They have pathologically bad distribution systems, and as a language it seems to be most influenced by consulting shops developing huge, worthless software projects for large enterprises, rather than developing distributed applications that will be installed in thousands of locations.
I'd like to think that Puppet would have some counter-effect to this. It's one of the largest and most sophisticated publicly available Ruby projects, it's already installed in at least hundreds and probably thousands of places, and it does a pretty good job of working nearly everywhere. However, I keep getting blank stares when I talk about this with other Rubyists, half the time I'm called a troll for even bringing it up, and when I explain why Puppet exists to most Rubyists, they just say, "I just put it in vendor", or, maybe, "Why not just use Capistrano?" To that I ask, how do you install Capistrano, but you know what they say to that.
I think Rails is a big part of the problem. Rails is clearly created by a company that will never distribute its software, and the Rails philosophy is again almost pathologically opposed to the idea of turning your software into a package. Imagine trying to make a Rails project LSB compliant -- your database.yml file would need to be in /etc, your log directory would need to be in /var, and your actual code would need to be in /usr. There went all of your fancy Rails "convention over configuration", and you're suddenly fighting Rails instead of using it, and everyone you ask for help just tells you to "put it in vendor".
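To make the split concrete, here is a rough sketch of what a packaged application has to carry around once its files are spread across the filesystem. Everything in it is an assumption for illustration: the module name `PackagedPaths`, the `myapp` directory names, and the `MYAPP_*` environment variables are invented, and this is not how Rails resolves its paths.

```ruby
# Hypothetical path table for an application installed from a native package.
module PackagedPaths
  # Assumed locations: configuration under /etc, logs under /var, code under /usr.
  DEFAULTS = {
    config:   '/etc/myapp/database.yml',
    log_dir:  '/var/log/myapp',
    code_dir: '/usr/share/myapp'
  }.freeze

  # Returns the path table, letting an environment variable such as
  # MYAPP_LOG_DIR override the packaged default for that entry.
  def self.resolve(env = ENV)
    DEFAULTS.each_with_object({}) do |(key, default), paths|
      paths[key] = env.fetch("MYAPP_#{key.to_s.upcase}", default)
    end
  end
end
```

Calling `PackagedPaths.resolve` with no overrides returns the packaged defaults; the point is only that every one of these locations stops being a convention and becomes configuration.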
I'm looking at creating a new application that I'm planning on distributing, and one of my big goals is to be able to distribute the core in one package and various additional pieces of functionality as separate packages. I'll need to simultaneously support as many of my customer platforms as I can and provide a consistent operating environment for my packages. The only way to do this is to have supported operating environments with well-defined dependencies, such as you can almost trivially build in Debian or Red Hat.
For those of you who are thinking, "you could just put it in vendor", or "you could at least use gems": no, I couldn't. Take a trivial example: Say I want to use RRD support in my application (which is likely, in this case). There is Ruby support for RRD, but not in gem form. Even if there were a gem, though, it would require a native RRDTool package, and, of course, gems can't specify dependencies on native packages, so I'd be telling my customers, "well, install X gems and Y packages".
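Here is roughly what that limitation looks like in a gem specification. The gem name `myapp-rrd` and its contents are hypothetical; the calls are standard `Gem::Specification` ones. A dependency on another gem is something rubygems can resolve, while the native library can only be mentioned in `requirements`, which rubygems treats as informational text for the user.

```ruby
require 'rubygems'

# Hypothetical gem that needs both another gem and a native library.
spec = Gem::Specification.new do |s|
  s.name    = 'myapp-rrd'
  s.version = '0.1.0'
  s.summary = 'Illustrative gem wrapping RRD support'
  s.authors = ['Example Author']
  s.files   = []

  # A gem-to-gem dependency: rubygems can install this for you.
  s.add_dependency 'facter'

  # A native dependency: this is only a note shown to the user. Nothing
  # checks for it or installs it.
  s.requirements << 'RRDTool native library and headers'
end
```

Compare that with a Debian or Red Hat package, where the native library is just another entry in the same dependency list.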
Instead, if I use native packages (say, those for Debian and Red Hat, to cover most cases), I can define clear dependencies for all cases. I know Debian provides everything I need, and in the rare case it doesn't, I can provide my own apt repository (and the same for yum). Gems, on the other hand, can really only do Ruby stuff. No, I don't actually want to put glibc in vendor, thanks.
I don't see a solution to this, other than getting more Rubyists distributing their software, but I'd really like to see this issue begin to be approached by the community. I feel like a wolf howling in the wilderness at this point, and it often feels like I'm fighting against my community in order to produce software that hundreds or thousands of people will install, as opposed to just use over the web.
Mon, 05 May 2008 | Tags: ruby, industry, distribution, package
Reductive Labs should fill out the Little 4
Michael Cote and John Willis have been talking for a while about the Little 4 in management software, and it looks like Qlusters is no longer on the list.
I'd love to be able to comfortably say that Reductive Labs deserves to fill that fourth slot (you can have your say too). The truth is, of course, that all of those companies are far larger than Reductive Labs, and they've all successfully gotten investment while we have not (although we haven't tried all that hard). The products of the companies are pretty dissimilar, too -- they're mostly more focused on monitoring rather than what I would call management, as far as I can tell.
On the other hand, Puppet has a lot of traction, and is a clear leader in its space. We've been profitable since almost the beginning (which is to say, profitable enough to pay my meagre wages), and we've got a great and growing community. Now that Andrew Shafer has joined the company full-time as a partner, I do think we're going to start growing, and it's well-timed in terms of how the community is developing.
I do hope we grow this year, but I don't really know how we will. I'm still considering how hard we should be seeking investment, but it seems that VCs are pretty uninterested in infrastructure (or maybe they're just uninterested in me). Really, I'm hoping I can just get a big enough customer base that Andrew and I can build a bigger development team and start doing some of the almost-obvious but really interesting projects to enhance the Puppet ecosystem, like change control applications.
But the summary is, I want to deserve to be in the Little 4, and I think Puppet is popular enough that we just might, but there's still lots more to do.
Mon, 21 Apr 2008 | Tags: industry, software
ArsTechnica Launches Forum on Large-Scale IT
(by Luke) ArsTechnica, my favorite tech news site, has just announced the launch of a forum dedicated to the discussion of large-scale IT:
The forum has been in beta for a few weeks, and already there are some great discussions happening. We envision The Server Room as a place to discuss IT topics that don't already have a dedicated forum, and are of an IT nature. Users are already talking about virtualization, storage, disaster recovery, and systems design. IT hardware discussion is appropriate as a function of the plan.
So go join the Server Room and tell 'em how great Puppet is.
Wed, 09 Apr 2008 | Tags: industry, automation, arstechnica, forum
Podcast with Hyperic
I know it's been a long time since I posted, and there's lots to post about, but it's been a very long month with little time.
Until I get my act together (which likely won't happen until I'm in Melbourne for LCA), here's at least a snippet.
I did a podcast with John Mark Walker of Hyperic a couple of weeks ago when I was in San Francisco for the Velocity summit.
I actually haven't had a chance to listen to it yet, but apparently I do some smack talk or something. Give it a listen.
Mon, 21 Jan 2008 | Tags: industry, podcast, puppet, hyperic, interview
Puppet Blogs
I somehow missed this the first time around, but Blake Barnett has created a Puppet community blog aggregator at http://puppetblogs.com. You can look through it and see that even having only a few blogs makes for interesting reading.
Go out, start your own Puppet blog, and get writing.
Wed, 12 Sep 2007 | Tags: puppet, blogging, industry, community
iLike HJK and Puppet
As I mentioned, John Willis posted about Puppet, and in his post he mentions that iLike uses Puppet. Well, Adam Jacob, one of the partners at HJK Solutions, which is the company that did the Puppet work for iLike, has written up a bit more information:
Puppet enables us to get a huge jump-start on building automated, scaleable, easy to manage infrastructures for our clients. Using puppet, we:
- Automate as much of the routine systems administration tasks as possible.
- Get 10 minute unattended build times from bare metal, most of which is data transfer. Puppet takes it the rest of the way, getting the machines ready to have applications deployed on them. It's down to two and a half minutes for Xen.
- Bootstrap our clients' production environments while building their development environment. I can't stress how cool this really is. Because we are expressing the infrastructure at a higher level, when it comes time to deploy your production systems, it's really a non-event. We just roll out the Puppet Master and an Operating System auto-install environment, and it's finished.
- Cross-pollinate between clients with similar architectures. We work with several different shops using Ruby on Rails, all of whom have very similar infrastructure needs. By using Puppet in all of them, when we solve a problem for one client, we've effectively solved it for the others. I love being able to tell a client that we solved a problem for them, and all it's going to cost is the time it takes for us to add the recipe.
Puppet, today, is a tool that is good enough to handle the vast majority of issues encountered in building scalable infrastructures. Even the places where it falls short are almost always just a matter of it being less elegant than it could be, and the entire community is working on making those parts better.
I'm very happy to see people successfully building businesses around Puppet, and it's great to see that companies who are getting press are depending on Puppet to manage those famous applications.
Fri, 31 Aug 2007 | Tags: puppet, ilike, hjk, industry
The CMDB Is A Consultant's Myth
Update: Looks like the author has wisely taken his post down.
So, based on recommendations of a friend of a friend, I've been casually reading up on ITIL, including adding a few ITIL bloggers (yes, there are ITIL bloggers) to my blogroll.
So, I come across this gem in my feeds today:
The CMDB is, first and foremost, an application system. It is not an infrastructure service (like networking), nor is it a core operating system, nor is it middleware. It is an application to be used in the fulfillment of use cases that add value to the efforts of stakeholders.
(My emphasis added.)
I've long thought that the CMDB is just a bunch of crap. What is the (usually 'the', not 'a') CMDB? Well, it stands for 'configuration management database', but as far as I can tell the term is entirely meaningless. It's basically what the whole ITIL world has concluded they don't understand, wrapped into a thing and named.
If the above is a definition of a CMDB, I'll eat my shorts.
The article is apparently a rant, explaining how everyone else is just a pretender because they aren't all super-architect/developer/sysadmin ninjas like the author is.
Ugh. This stuff just drives me nuts. If the CMDB were something real, like a web server or an LDAP service, then you could talk about the actual product, instead of a bunch of "you're not good enough" crap. Just once I'd love to see someone talk about what the CMDB is, rather than what it isn't, in plain terms. I'm not convinced it's possible, because I'm not convinced anyone knows what it is.
This just seems so much like the ERP debacle of the last 15 years or so, where very large companies all competed for who could waste the most money on a product they didn't understand. "We don't know what it is, but we're sure we need it." Just like the ERP crap, any real solution depends on stand-alone, decoupled systems working through abstract interfaces. That means no huge monolithic applications that no one understands and no one can maintain, and it means replacing existing systems, not just layering on top like drywall mud.
It doesn't hurt that this blogger actually calls his blog "ERP for IT", which should say enough right there. "We don't know what it is, but we sure know it's profitable to the consultants and vendors."
Wed, 15 Aug 2007 | Tags: industry, itil, cmdb, crap
LISA 2006
I'm now at LISA in DC, back from FOSS.in and India. I only got a couple of days home before I headed here, and it's taken some time to adjust. I somehow managed to travel for 33 hours last Wednesday (from 2:30am in Bangalore until 11:30pm in Nashville, whose time zone is 11.5 hours behind Bangalore's).
We had two configuration management workshops this year; I chaired the first one with Narayan Desai of Argonne Nat'l Labs, which was theoretically focused more on tools and practice, and Paul Anderson and Sanjai Narain chaired the second one, which focused on configuration validation, most of it around network configurations rather than systems.
One of the best parts of the conference so far has been that Cory Doctorow did the keynote, and he's already blogged about the best paper this year, which is about protecting your RFIDs from unauthorized access.
The rest of the conference, for me, is going to different talks, hopefully learning a bit but usually half listening and half hacking and surfing in the back of the room.
I am also now confirmed at LinuxConf Australia in January, so I've got yet more travel lined up already.
Fri, 15 Dec 2006 | Tags: sysadmin, industry, travel
Hit the Ground Running With Puppet
LISA is now winding down, and I'm finally done with my responsibilities for the week. I did a BoF on Puppet last night; it wasn't as full as I might have hoped, but I was in direct conflict with Google's "free beer and schwag" BoF, so it's not too surprising. There were a lot of good questions, and I think I had good opportunities to point out the specific benefits of Puppet and talk about some of my long-term goals.
Then today I gave a 15 minute talk on Puppet in LISA's new "Hit the Ground Running". The amount I talked about Puppet this week gave me plenty of opportunity to know what I should be focusing on in a short talk (I actually had to describe three times how Puppet differs from Bcfg2), and I'm pretty happy with the talk I came up with.
I head home tonight, thankfully, and can get back to development full time, finally.
Fri, 15 Dec 2006 | Tags: puppet, industry, travel
~Puppet
Tim O'Reilly's post on operations has got me even more obsessed with why it is that operating system administration is left to the professionals while development is considered acceptable for anyone.
Or rather, why people don't realize that they're maintaining operating systems all the time.
I'll be doing my best to post about this more as time goes on, but I figured I'd start out with how I use Puppet in my daily life, largely unrelated to being a sysadmin.
I have a very complex shell profile, and I try to keep all of my other important configurations in Subversion, since I have so many accounts in so many places and I'm basically retarded without syntax highlighting and all of my cool little aliases and functions. So, I created a Puppet manifest that manages these things for me, and I'm going to walk through what it does and how it works.
My homedir manifest, which I'll call ~puppet, is mostly responsible
for two things: Link my configuration files to where they're supposed
to be, and create any necessary cron jobs.
Most of my configuration files are in ~/etc/<application>, and I
link them back to the appropriate location; e.g.:
lrwxrwxrwx 1 luke luke 19 Feb 13 14:20 .gaim -> /home/luke/etc/gaim
lrwxrwxrwx 1 luke luke 26 Feb 13 14:20 .procmailrc -> /home/luke/etc/procmail/rc
lrwxrwxrwx 1 luke luke 27 Feb 13 14:20 .profile -> /home/luke/etc/profile/init
lrwxrwxrwx 1 luke luke 18 Feb 13 14:20 .ssh -> /home/luke/etc/ssh
lrwxrwxrwx 1 luke luke 15 Feb 6 19:37 .subversion -> etc/subversion/
lrwxrwxrwx 1 luke luke 22 Feb 13 14:20 .vim -> /home/luke/etc/vim/vim
lrwxrwxrwx 1 luke luke 24 Feb 13 14:20 .vimrc -> /home/luke/etc/vim/vimrc
This makes it easy to control everything in subversion and use the same configurations everywhere.
There's a twist in making the links, though; not all of my home
directories have the same path. So, to make these links work, I have to
create a Facter fact that
points to my home directory. The long term solution to this is that I
have ~/lib/ruby in my $RUBYLIB environment variable and the
following code at ~/lib/ruby/facter/home.rb:
require 'facter'

Facter.add("home") do
    setcode do
        ENV['HOME']
    end
end
That's set in my profile, though, which hasn't been linked through yet, so for setup I usually just set 'FACTER_HOME=$HOME', which Facter picks up and sets for me in Puppet. Because I create a bunch of these links in my ~puppet manifest, I create a reusable component to do so:
define homelink(ensure) {
    case $home {
        "": {
            exec { "nohome-$name":
                command => "/bin/echo No home variable",
                logoutput => true
            }
        }
        default: {
            file {
                "$home/$name": ensure => "$ensure"
            }
        }
    }
}
If I've forgotten to set $home, then this warns me; otherwise, it
just saves a very small amount of typing. Here's how I use it:
class profile {
    file {
        "$home/.bashrc": ensure => ".profile"
    }
    homelink {
        ".gaim": ensure => "etc/gaim";
        ".procmailrc": ensure => "etc/procmail/rc";
        ".profile": ensure => "etc/profile/init";
        ".ssh": ensure => "etc/ssh";
        ".vim": ensure => "etc/vim/vim";
        ".vimrc": ensure => "etc/vim/vimrc";
    }
}
As you can see, I stuck them into a profile class to simplify
referring to them. I just include this class on every host, since I
always want these links. The first file element just creates a link
from .bashrc to .profile, so I don't have to care whether I'm a
login shell or not. The homelink elements create each of the links
above.
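For anyone who doesn't read Puppet yet, the behaviour of a single homelink can be approximated in plain Ruby. This is only a sketch of the idea, not how Puppet implements file resources; the function name `ensure_homelink` and its return values are invented for the example.

```ruby
require 'fileutils'

# Hypothetical stand-in for one homelink: make sure home/name is a symlink
# pointing at target, and do nothing if it already is.
def ensure_homelink(home, name, target)
  # Mirror the manifest's warning when the home fact is missing.
  raise ArgumentError, 'No home variable' if home.nil? || home.empty?

  link = File.join(home, name)

  # Already correct: leave it alone, so repeated runs are harmless.
  return :unchanged if File.symlink?(link) && File.readlink(link) == target

  # A symlink pointing somewhere else gets replaced. A regular file in the
  # way is deliberately not removed, so File.symlink raises Errno::EEXIST.
  FileUtils.rm_f(link) if File.symlink?(link)
  File.symlink(target, link)
  :created
end
```

Running `ensure_homelink(ENV['HOME'], '.vimrc', 'etc/vim/vimrc')` twice creates the link the first time and reports `:unchanged` the second, which is the property that makes it safe to apply the manifest on every host, every time.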
The only other work I currently do in my ~puppet manifest is create spam
processing cron jobs, which are only used on my mail server, and
a signature generation cron job, which is only used on machines that I use
as a mail client. Here's what my mailserver class looks like:
class mailserver {
    cron { spamyup:
        minute => [0, 30],
        user => luke,
        command => "/home/luke/bin/spamproc"
    }
    cron { spamclean:
        hour => 2,
        minute => 15,
        user => luke,
        command => "/home/luke/bin/spamproc -a"
    }
}
Every half an hour I want spam processed, and once a day I want spam deleted. Done, portably.
class mailclient {
    cron { "joyent-sig":
        command => "$home/bin/signature --write",
        minute => [0,5,10,15,20,25,30,35,40,45,50,55]
    }
}
I have a large collection of quotes and such that I like to use for random signatures, and some stupid mail clients (*cough* Thunderbird *cough* Evolution) aren't smart enough to consistently be able to use scripts for signature sources, so I change the signature every five minutes. I only run this on my desktop and laptop, not on all the machines I have server accounts on.
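The minute lists in these cron jobs are just "every N minutes" written out by hand. As a small aside, this is how the same lists could be computed in Ruby; the helper name `every_n_minutes` is made up, and nothing here is part of Puppet.

```ruby
# Hypothetical helper: the minutes of the hour at which a job should run
# if it fires every `step` minutes, starting at minute 0.
def every_n_minutes(step)
  raise ArgumentError, 'step must be between 1 and 59' unless (1..59).cover?(step)

  # 0...60 excludes 60, so the last value is the final minute below the hour.
  (0...60).step(step).to_a
end
```

`every_n_minutes(5)` gives the twelve values used for the signature job, and `every_n_minutes(30)` gives the two used for spam processing.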
Lastly, I decide what happens where:
include profile

case $hostname {
    culain: {
        include mailserver, mailclient
    }
    phage: { include mailclient }
}
You can see that everyone gets the profile class, but only specific machines get the mail-related classes.
So, you might say to yourself, "Well, Luke's a sysadmin; I don't have that kind of problem". I would instead say that you've probably structured your computer usage in a way that allows you to avoid the problems that I'm solving, and I would guess that there are things you do that could be significantly simplified through this kind of application of Puppet to your personal computing uses.
Just for posterity, here's the full configuration:
define homelink(ensure) {
    case $home {
        "": {
            exec { "nohome-$name":
                command => "/bin/echo No home variable",
                logoutput => true
            }
        }
        default: {
            file {
                "$home/$name": ensure => "$ensure"
            }
        }
    }
}
class mailserver {
    cron { spamyup:
        minute => [0, 30],
        user => luke,
        command => "/home/luke/bin/spamproc"
    }
    cron { spamclean:
        hour => 2,
        minute => 15,
        user => luke,
        command => "/home/luke/bin/spamproc -a"
    }
}

class profile {
    file {
        "$home/.bashrc": ensure => ".profile"
    }
    homelink {
        ".gaim": ensure => "etc/gaim";
        ".procmailrc": ensure => "etc/procmail/rc";
        ".profile": ensure => "etc/profile/init";
        ".ssh": ensure => "etc/ssh";
        ".vim": ensure => "etc/vim/vim";
        ".vimrc": ensure => "etc/vim/vimrc";
    }
}

class mailclient {
    cron { "joyent-sig":
        command => "$home/bin/signature --write",
        minute => [0,5,10,15,20,25,30,35,40,45,50,55]
    }
}

include profile

case $hostname {
    culain: {
        include mailserver, mailclient
    }
    phage: { include mailclient }
}
Updated to fix library path