<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Madstop &#187; cache</title>
	<atom:link href="http://madstop.com/tag/cache/feed/" rel="self" type="application/rss+xml" />
	<link>http://madstop.com</link>
	<description>Puppet development, configuration management, and less</description>
	<lastBuildDate>Mon, 02 Aug 2010 04:07:53 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Data Lifetimes and Cache Expiration</title>
		<link>http://madstop.com/2008/11/08/data-lifetimes-and-cache-expiration/</link>
		<comments>http://madstop.com/2008/11/08/data-lifetimes-and-cache-expiration/#comments</comments>
		<pubDate>Sat, 08 Nov 2008 23:33:43 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Programmer Therapy]]></category>
		<category><![CDATA[cache]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[therapy]]></category>

		<guid isPermaLink="false">http://madstop.com/?p=34</guid>
		<description><![CDATA[This stuff drives me crazy.   (I can&#8217;t seem to say &#8220;drives me nuts&#8221; any more because of the damn joke.  That, and hanging out with too many Brits.)  I&#8217;m putting this post in &#8216;programmer therapy&#8217; because it&#8217;s written more for &#8230; <a href="http://madstop.com/2008/11/08/data-lifetimes-and-cache-expiration/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>This stuff drives me crazy.   (I can&#8217;t seem to say &#8220;drives me nuts&#8221; any more because of the damn <a href="http://www.coolrunning.com/forums/Forum1/HTML/164425.shtml">joke</a>.  That, and hanging out with too many Brits.)  I&#8217;m putting this post in &#8216;programmer therapy&#8217; because it&#8217;s written more for me than for you, but maybe you&#8217;ll get something out of it.</p>
<p>Anyway, so I&#8217;m once again wrestling with data lifetime in Puppet.  This is one of those problems that always seems licked but then crops up again.</p>
<p>See, there&#8217;s plenty of data in a Puppet transaction whose lifetime should only be that transaction:  file stats, user name to gid mappings, and all kinds of information about the current state of the machine.  We need to collect this information, and we don&#8217;t want to collect it more than once per transaction (e.g., we need a file&#8217;s uid, gid, and mode, but we can get them all from a single Stat instance); but we also don&#8217;t want the data lying around for the next transaction.</p>
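<p>As a rough illustration of the &#8220;stat once, read many&#8221; idea, here is a minimal sketch.  The FileFacts class and its stat_calls counter are hypothetical names made up for this example, not anything in Puppet; only File.stat and the uid, gid, and mode readers on File::Stat are real Ruby.</p>

```ruby
# Hypothetical helper (not part of Puppet): stats a path once and
# answers several questions from that single File::Stat.
class FileFacts
  attr_reader :stat_calls

  def initialize(path)
    @path = path
    @stat = nil
    @stat_calls = 0
  end

  # Memoize the File::Stat so uid, gid, and mode share one lookup.
  def stat
    if @stat.nil?
      @stat_calls += 1
      @stat = File.stat(@path)
    end
    @stat
  end

  def uid
    stat.uid
  end

  def gid
    stat.gid
  end

  # Keep only the permission bits of the mode.
  def mode
    stat.mode & 0o7777
  end
end
```

<p>Note that nothing in this sketch ever throws the memoized stat away, which is exactly the lifetime problem: the cached value is right for one transaction and possibly wrong for the next.</p>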
<p>So Puppet has support for a &#8216;flush&#8217; method throughout most of the RAL:  By default, the transaction calls &#8216;flush&#8217; on each resource, and the resource calls &#8216;flush&#8217; on its provider.  This makes it easy to get rid of data that should be cleared up after the transaction.</p>
<p>Kind of.  See, the *real* reason for the &#8216;flush&#8217; method is actually to flush changes to disk; e.g., the provider might have multiple attributes changed, and then you call &#8216;flush&#8217; on it to make all of those changes at once.  It&#8217;s just that it&#8217;s also a convenient place to clean up data because, well, it&#8217;s the only place to do so.  So sometime in the last few months or years or decades, my brain decided that &#8216;flush&#8217; does both things, but it seemed to hide this conclusion from me until yesterday.</p>
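<p>To make the two jobs of &#8216;flush&#8217; concrete, here is a minimal sketch of the call chain.  SketchProvider and SketchResource are hypothetical stand-ins, and the writes array only pretends to be the disk; none of this is Puppet&#8217;s actual code.</p>

```ruby
# Hypothetical stand-ins for illustration; not Puppet's real classes.
class SketchProvider
  attr_reader :writes, :cache

  def initialize
    @pending = {}  # attribute changes waiting to be written
    @cache = {}    # data gathered during the transaction
    @writes = []   # record of each batch "written to disk"
  end

  # Queue a change instead of writing it immediately.
  def set(attribute, value)
    @pending[attribute] = value
  end

  # Job one: write all pending changes in a single batch.
  # Job two: throw away anything cached during the transaction.
  def flush
    @writes.push(@pending.dup) unless @pending.empty?
    @pending.clear
    @cache.clear
  end
end

class SketchResource
  attr_reader :provider

  def initialize(provider)
    @provider = provider
  end

  # The resource just forwards the call to its provider.
  def flush
    provider.flush
  end
end
```

<p>Setting two attributes and then flushing once produces a single batched write, and the cache is emptied as a side effect of the same call.</p>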
<p>But yesterday I was trying to fix all of the broken tests resulting from my file serving refactoring, and I was finding that, not surprisingly, I kept having these cached stat instances lying around &#8212; but *only* if the file hadn&#8217;t changed.  E.g., consider this code:</p>
<pre><code>
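# First transaction: no changes, so no events, but the stat ends up cached.
assert_events([], resource)
# Change reality behind the resource's back.
File.unlink(file)
# Second transaction: the file is gone, so it should be created again.
assert_events([:file_created], resource)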
</code></pre>
<p>Ignore, please, whether this is a good idea or what; the point is that the first line runs a transaction that results in a cached stat but no changes; and because there are no changes, there&#8217;s nothing to flush to disk; and because there&#8217;s nothing to flush to disk, &#8216;flush&#8217; isn&#8217;t called; and because &#8216;flush&#8217; isn&#8217;t called, the &#8216;stat&#8217; is lying around still.  Which means the next transaction uses the cached stat, but of course, reality has changed in the meantime.</p>
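<p>The chain of &#8220;becauses&#8221; above can be reproduced in a few lines.  This is a minimal sketch under stated assumptions: SketchFileResource and its evaluate method are hypothetical, and caching a simple existence check stands in for the cached stat.</p>

```ruby
# Hypothetical stand-in for a file resource; not Puppet's real class.
class SketchFileResource
  def initialize(path)
    @path = path
    @exists = nil
  end

  # Cache the answer so one transaction only checks the disk once.
  def exists?
    @exists = File.exist?(@path) if @exists.nil?
    @exists
  end

  # Drop the cached answer.
  def flush
    @exists = nil
  end

  # One "transaction": create the file if it is missing, and flush
  # only when something changed (the behavior described above).
  def evaluate
    return [] if exists?
    File.write(@path, '')
    flush
    [:file_created]
  end
end
```

<p>With this sketch, evaluating once against an existing file, deleting the file, and evaluating again returns no events the second time, because the first run cached &#8220;exists&#8221; and never flushed it.  Calling flush by hand before the second run makes it return [:file_created].</p>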
<p>So, I need something that will make it easy for my data to match the lifetimes I want.  I made this <a href="http://github.com/lak/puppet/tree/master/lib/puppet/util/cacher.rb">Cacher Module</a> that purportedly solves this problem for me, but noooo, it solves a different data-lifetime problem:  I have a lot of initialization code that generally only runs once but ends up running many times during testing, so I needed a clean way to remove the initialized data after every test.  So, this module has a single, global boolean that defines whether a given chunk of data is expired or still valid.  That&#8217;s all fine and dandy for one-time configuration data, but it doesn&#8217;t work for transactions.</p>
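<p>For illustration, a global expiration marker might look roughly like this.  This is not the real Cacher module: SketchCacher, cached, and Settings are hypothetical names, and a counter stands in for the boolean so the sketch never has to decide when to reset it.</p>

```ruby
# Hypothetical sketch; the names here are not the real Cacher module's API.
module SketchCacher
  @generation = 0

  class << self
    attr_reader :generation

    # Move the one global marker: everything cached before now is expired.
    def expire
      @generation += 1
    end
  end

  # Each object stores its own values; only the marker is shared.
  def cached(name)
    @cache_values ||= {}
    @cache_generations ||= {}
    if @cache_generations[name] != SketchCacher.generation
      @cache_values[name] = yield
      @cache_generations[name] = SketchCacher.generation
    end
    @cache_values[name]
  end
end

# Example user: one-time initialization that tests want to redo.
class Settings
  include SketchCacher
  attr_reader :loads

  def initialize
    @loads = 0
  end

  def config
    cached(:config) do
      @loads += 1
      { loaded: true }
    end
  end
end
```

<p>Calling SketchCacher.expire after every test forces the next config call to reinitialize, which suits one-time configuration data.  It also expires every cached value everywhere at once, which is why the same trick does not fit per-transaction data.</p>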
<p>So, now I&#8217;m trying to enhance that module to support either the global expiration marker or a per-instance marker, so that I can have all of the resources use a timestamp in their catalog to determine whether their cached data is expired.</p>
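<p>A per-catalog marker could be sketched as follows.  SketchCatalog, SketchResource, and the lookups counter are hypothetical, and the optional argument to expire exists only so the example can be driven deterministically; the idea shown is just that a resource remembers which catalog timestamp its data was cached under.</p>

```ruby
# Hypothetical sketch; these classes only imitate the shape described above.
class SketchCatalog
  attr_reader :expiration

  def initialize
    @expiration = Time.now
  end

  # Start a new transaction: anything cached under the old timestamp is stale.
  def expire(now = Time.now)
    @expiration = now
  end
end

class SketchResource
  attr_reader :lookups

  def initialize(catalog)
    @catalog = catalog
    @cached_under = nil
    @state = nil
    @lookups = 0
  end

  # Recompute only when the catalog's timestamp differs from the one
  # this value was cached under.
  def state
    if @cached_under != @catalog.expiration
      @lookups += 1
      @state = :fresh_from_disk  # placeholder for real work such as File.stat
      @cached_under = @catalog.expiration
    end
    @state
  end
end
```

<p>Expiring one catalog leaves resources in every other catalog alone, which a single global marker cannot do.</p>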
<p>One of the hard problems here is that you don&#8217;t want to find yourself maintaining a global list of anything anywhere.  My first design for the Cacher module involved it keeping a reference to all the cached data, which would have made it easy to clear the cache but would have had lots of code accessing tons of global data; stupid.  Instead, everyone keeps references to their own data, and the only global data is the expiration marker.  This works great for global things.</p>
<p>But resources and catalogs aren&#8217;t global.  Even worse, I&#8217;m (reasonably) using an internal class to check expiration.  So, that internal class needs to have access to the catalog&#8217;s timestamp in order to figure out if its cached data is still valid.  This is automatically ugly &#8212; an instance you shouldn&#8217;t even need to know exists now needs access to a big collection of resources.  Yuck.  Without this, the internal class doesn&#8217;t have a reference to anything except the data it&#8217;s managing, but there&#8217;s just no escaping it needing access to that timestamp somehow, so you either give it a reference, or you pass it one every time you check for a value.</p>
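<p>The second option, passing the marker in on every check, might look like the sketch below.  CachedValue and fetch are hypothetical names, and the marker can be any comparable value such as a catalog timestamp; this is only one way to keep the internal class free of a catalog reference.</p>

```ruby
# Hypothetical internal value holder; not Puppet's actual class.
class CachedValue
  def initialize
    @filled = false
    @cached_under = nil
    @value = nil
  end

  # The caller hands over the current marker on every read, so this
  # object never holds a reference to the catalog itself.
  def fetch(marker)
    unless @filled && @cached_under == marker
      @value = yield
      @cached_under = marker
      @filled = true
    end
    @value
  end
end
```

<p>The cost is that every caller has to know where the marker lives and remember to pass it; the benefit is that the holder still references nothing but the data it manages.</p>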
<p>The *stupid* thing is that this isn&#8217;t the problem I want to solve.  I just want the old fileserving behaviour but now with gooey RESTfulness.  That, plus the fact that I&#8217;ve apparently decided that I can no longer use spackle and sheetrock mud during development, means that I have to fix this even though it wouldn&#8217;t normally be on my critical path.  Yuck.</p>
]]></content:encoded>
			<wfw:commentRss>http://madstop.com/2008/11/08/data-lifetimes-and-cache-expiration/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
	</channel>
</rss>
