I recently went mountain biking in Bend, Oregon, which is a few hours’ drive from my house. As I usually do, I used Strava to record the ride. This was a new trail for me, and I got lost multiple times, so I also used Trailforks to figure out how to get back to my vehicle.
I’ve been thinking a lot about how services like Facebook start out helping us, but at some point the relationship shifts and we’re stuck helping them more. I don’t think Strava has gotten to a stage where the relationship is abusive, and maybe this is weird, but I hope they stay too small to ever get there. Even without that shift, though, I am a little uncomfortable with our relationship.
I was standing on the trail, trying to figure out which way to go and idly watching a coyote search for chipmunks. As I switched between apps, I realized that I was losing a lot by recording in Strava instead of Trailforks. Or rather, I wasn’t; people who want to bike this trail in Bend were.
I log my rides (and my infrequent, hilariously slow runs) in Strava for, um, reasons. I don’t want kudos, and I don’t really look at my historical performance, although I do enjoy being able to study my rides at times. I want the data there, even if I rarely use it. And it conveniently marks my daily exercise goal complete in Streaks. It’s kind of useful, but if it went away tomorrow, I wouldn’t miss the online component much, if at all.
If I instead recorded all of the trail rides in Trailforks, then everyone who came after me could get some value from the information provided by my ride. They could see what route I took, could likely tell based on my speed when I had to walk and when I was able to ride, and they could over time get a sense of what routes are most popular, or even what signage is confusing.
I built an open source company, so I’ve thought a lot about the worth of contributions. A developer’s time on one project can’t be spent on another. Someone who writes documentation for your baby is giving up the opportunity to contribute elsewhere. It’s a conscious choice on the part of the contributor, and a constant interaction between the project and its constituents to keep people coming back.
I think Trailforks really understands the value of my contributions: If you do this, people who ride here will have higher quality data, and probably better rides.
I am super confident that Strava knows the value of my usage: I’ll get feedback from friends and I can track my speeds and feeds. But those aren’t contributions; that’s not something I’m giving up for the greater good. It’s something I am doing for selfish reasons.
The value of my trail ride could be for the greater good, though. Even my road rides and runs could be, as they could help people find routes, but the trail rides especially seem valuable, because the downsides of being lost out there are materially worse than not having the right route for your run.
It’s clear to me that Strava is not seeing my data as a contribution. They’re focused on engagement. That’s not inherently bad - lots of people use and love the app - but it is different. I find it interesting to think of what the experience would look like if that changed.
But after that, I thought: Why can’t I just share the data with both apps?
I mean, to some extent I can. I can just run both of them and let each record its own view of the world. This is what I did with Slopes and Strava in Mammoth, taking the lifts up and riding down. That made a little more sense because neither quite has the correct view of the world - Slopes doesn’t know what bikes are, and Strava doesn’t know what lifts are. It was pretty kludgy, but more importantly, I didn’t run into this conflict because the apps exist to do pretty much the same thing, just for different sports.
I could duplicate here, I assume, but… it seems stupid.
Beyond our relationships to service providers, I’ve been thinking about what it means to own your own data. It sounds awesome, but it’s rarely very useful in practice.
It turns out, I do own the data that I have posted on Strava. Great! So I’ll just share it with Trailforks, too.
Hmm. What would that look like? Can I… download the data, and then upload it to Trailforks? Is it a common data type?
Can I record it separately on my phone and post it to both apps? Is that what truly owning my data would look like?
It’s hard to imagine that world: You use apps that generate data, which by default is yours and only yours. It gets recorded in forms that are easy to share, understand, and manage. If you like, you can then contribute that data to other sites, and in doing so, you get to negotiate with them exactly what rights you’re passing on. Either way you have the data, but now they get a copy, too. If you don’t like their offer, you still get the data, and most likely, given you have all your delicious data, other apps will crop up with a different offer, because they can focus on that rather than all the data collection.
It would look a lot like the text editor I’m using to write this article, Ulysses. It allows me to publish, but is built first and foremost to make it easy to write. Sharing, contributing, engagement, and all of the other online stuff is left to other sites, other apps, like WordPress and Medium. And yeah, those apps do allow both writing and publishing, but it’s a horrible experience, a great way to lose data, and if you only write there then your data is stuck in their system and is pretty hard to get out in a useful way.
The world of writing looks weirdly different from the world of recording rides. And a lot worse.
I’m not in control. Legally I own the data, but, ah, I don’t have it. Strava does.
I would never write directly in Medium, so why am I logging my rides directly in Strava? What am I giving up because of it?
I’m pleased to find that Strava will allow me to give other people the data - the data that I own! - and it turns out that Trailforks knows how to slurp it out of Strava.
So it all ends well: My rides are in both locations, and every mountain bike ride I post to Strava will now be automatically imported into Trailforks. Probably.
But for that brief moment, in Bend, while watching the coyote… I saw what it would take for me to really own my data. I liked it a lot.
I’ve always had a warm place in my heart for filesystems.
I taught myself shell scripting while automating the installation of DiskSuite, Sun’s free but sadistic disk mirroring software. I barely recall the actual work, instead remembering a hallway. I undertook a literal journey to learn programming, a repeated pilgrimage to the desk of a friend who took visible pleasure in explaining to me what I was doing wrong.1 It’s fair to say that if filesystems were less painful in the 90s, I would not be where I am today.
When Sun started advertising ZFS as the (finally!) successor to DiskSuite and the filesystem it was built around, UFS, most of its functionality seemed obviously good - make the computers manage the disks, don’t demand people know up front how big a filesystem should be, don’t fail miserably when the server crashes, little things like that. But what was this data integrity thing? I’m embarrassed to say it took me a while to realize I needed it - who really cares if your filesystem is good at storing data, amirite? - and even longer to understand how it worked.
To explain it, I’m going to have to teach you cryptography. Just a little. You’re welcome to skip ahead if you’ve already got this part covered, but I expect most could use a little, ah, refresher. Step 1 in cryptography guides is usually: “Get a master’s in mathematics from MIT.” I’m hoping to do a bit better than that. Cryptography really is just a form of math, and while we can’t all understand the details (I certainly don’t) we can at least understand the “algorithms happen here” flow diagrams2.
Cryptography is most famous for its privacy utility: You use it to ensure you and only you can read your files and chat messages. It gets more complex once we need to read them on all of our different devices, but most of it is pretty similar in concept. Even more useful is ensuring both you and I can read some text, but no one else can. It’s more complex, but is essentially an extension of that first use.
Privacy is not the only use case for cryptography. It’s also useful for efficient validation. That is, it can be used to see if a file you have today is the same one you had yesterday. Say I sent you a document, and you think it looks wrong; how do we make sure it did not get changed somehow in transit?
Obviously one way to do that is to just send it again. This is not a great solution, because if you did not trust it the first time, why would you trust it the second? That might also be a bad idea if bandwidth is expensive. You generally want a verification mechanism that takes less space than the original file, and less CPU power than directly comparing the two files.
Cryptography provides just such a capability, usually called a ‘hash function’. It’s an algorithm that converts, say, a large text file into a much shorter string. If you want to ensure the file is not changed in some way, just run it again and compare the output. The short strings are easier to compare than the long documents, and you could even read them over the phone to someone so they can check the file on their end. These algorithms generally produce a string of a fixed length, regardless of input - this makes them efficient for long term storage and comparison, and safe to run on any size file. Here’s an example hash from my files:
03f39f4bfad04f6f2cfe09ced161ab740094905c
As you can see, it’s just a long string of gibberish. It’s only useful for comparison; it isn’t meaningful in its own right.
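To make the fixed-length point concrete, here is a small sketch in Python. It uses SHA-256 from the standard library’s hashlib, which is my choice for illustration; the hash above is 40 hex characters, the length of a SHA-1 digest, and nothing here says which function actually produced it. The `fingerprint` helper is a hypothetical name.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string (hypothetical helper)."""
    return hashlib.sha256(data).hexdigest()

short = fingerprint(b"hi")                         # a 2-byte input
big = fingerprint(b"hi" * 1_000_000)               # a 2 MB input
tweaked = fingerprint(b"hi" * 999_999 + b"ho")     # same size, one byte different

print(len(short), len(big))                        # 64 64: output length ignores input size
print(big == fingerprint(b"hi" * 1_000_000))       # True: same input, same hash
print(big == tweaked)                              # False: a tiny change, a different hash
```

Comparing two 64-character strings is the cheap check; you only pay for reading the whole file once, when you hash it.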
What’s critical about these algorithms is that, for all practical purposes, each unique input produces a unique output. If you and I each have a file that hashes to a given string, then we can be confident we have exactly the same file. Of course, this can’t literally be true: We could design a hash function that only had 256 possible outputs, and there are obviously more than 256 possible inputs. This would produce a lot of what are called collisions, when two files hash to the same output, and, ah, is not terribly useful.
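Here’s that deliberately bad 256-output hash as a toy, assuming we build it by keeping only the first byte of a SHA-256 digest. The `tiny_hash` and `find_collision` names are made up for this sketch. With just 256 possible outputs, the pigeonhole principle guarantees a collision within 257 distinct inputs, and in practice one shows up much sooner.

```python
import hashlib

def tiny_hash(data: bytes) -> int:
    """A deliberately weak hash: keep only the first byte of SHA-256, so 256 possible outputs."""
    return hashlib.sha256(data).digest()[0]

def find_collision(limit: int = 1000):
    """Hash distinct inputs until two of them land on the same output."""
    seen = {}  # output value -> first input that produced it
    for i in range(limit):
        data = f"file-{i}".encode()
        value = tiny_hash(data)
        if value in seen:
            return seen[value], data, value
        seen[value] = data
    return None  # unreachable for limit > 256, by the pigeonhole principle

first, second, value = find_collision()
print(f"{first!r} and {second!r} both hash to {value}")
```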
Modern hash functions produce incredibly long outputs. A collision is possible in theory but not in practice: with a 256-bit hash like SHA-256, you’d need to execute the function around 2^128 times before you could expect one. That’s roughly 3.4 × 10^38, a 39-digit number. So, mathematically possible, but you can expect the sun to swallow the earth before the most secure hash functions get compromised. I mean, you can’t. You’ll be gone by then. But your files will still be safe.
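The 2^128 figure is the usual “birthday bound”: for an n-bit hash you expect a collision after roughly the square root of the number of possible outputs, or about 2^(n/2) tries. A quick back-of-the-envelope check, assuming a 256-bit output (the `birthday_bound` helper is hypothetical, and it ignores the small constant factor in the exact estimate):

```python
def birthday_bound(bits: int) -> int:
    """Rough number of hash evaluations before a collision becomes likely: 2^(bits/2)."""
    return 2 ** (bits // 2)

work = birthday_bound(256)
print(work)              # 340282366920938463463374607431768211456
print(f"{work:.2e}")     # 3.40e+38
print(len(str(work)))    # 39 digits
```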
Now that you’re at least as much an expert on cryptography as most of the bitcoin hodlers, why does any of this matter?
We were talking about data integrity.
You’d be right to guess that ZFS uses these hash functions to provide it. It goes further than just validating individual files. A little bit of cryptographic genius called a Merkle tree is the key. These don’t just hash the content on disk for later validation; they build a tree of hashes, where each leaf is the hash of a block of data, each node above it is a hash of its children’s hashes, and so on up to a single root hash. If any part of this system is corrupted - because the disk is broken, or someone changed the content some other way - it’s easy to detect. It’s not just that the individual hash will be different; remember each parent hashes all of its children, so now the parent is wrong. And its parent is wrong, too.
If the content is changed by any mechanism that does not also update the Merkle tree, then it is easy to detect by rehashing all of the content and comparing the results to the stored tree.
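A minimal Merkle tree sketch in Python, to show why one bad block is easy to spot. It assumes a binary tree, SHA-256, and the common convention of duplicating the last hash on odd-sized levels; all of that is my choice for illustration, not how ZFS actually lays out its checksums on disk.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(blocks: list[bytes]) -> bytes:
    """Hash each block (the leaves), then hash pairs of hashes until one root remains."""
    if not blocks:
        return h(b"")
    level = [h(b) for b in blocks]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd-sized level: pair the last hash with itself
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

blocks = [b"block 0", b"block 1", b"block 2", b"block 3"]
stored_root = merkle_root(blocks)  # what we'd keep alongside the data

corrupted = list(blocks)
corrupted[2] = b"block 2, silently flipped"
print(merkle_root(blocks) == stored_root)     # True: nothing changed
print(merkle_root(corrupted) == stored_root)  # False: one bad block changes the root
```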
This is how ZFS validates data integrity. It can write a block to disk, then pull the block and ensure it still matches the hash. When it writes a block, it updates the parallel tree, and when you ask for the block later, it can tell you if the block is still correct. If it’s not, it throws an error instead of handing it back to you.
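The write-then-verify-on-read flow can be sketched as a toy block store. `ChecksummedStore` and `ChecksumError` are hypothetical names, and a flat dictionary of checksums stands in for the real tree; the point is only that a read either returns exactly the bytes that were written, or refuses.

```python
import hashlib

class ChecksumError(Exception):
    """Raised when a block read back from storage no longer matches its checksum."""

class ChecksummedStore:
    """Toy block store: records a checksum per block and verifies it on every read."""

    def __init__(self):
        self._blocks = {}     # block id -> raw bytes (stands in for "the disk")
        self._checksums = {}  # block id -> SHA-256 of what we meant to write

    def write(self, block_id: int, data: bytes) -> None:
        self._blocks[block_id] = data
        self._checksums[block_id] = hashlib.sha256(data).digest()

    def read(self, block_id: int) -> bytes:
        data = self._blocks[block_id]
        if hashlib.sha256(data).digest() != self._checksums[block_id]:
            raise ChecksumError(f"block {block_id} is corrupt")
        return data

store = ChecksummedStore()
store.write(7, b"important data")
print(store.read(7))                  # b'important data'

store._blocks[7] = b"importent data"  # simulate corruption somewhere below the filesystem
try:
    store.read(7)
except ChecksumError as err:
    print("refused:", err)            # refused: block 7 is corrupt
```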
When I first learned of this, it seemed overkill, but over time I remembered just how many ways there are for data to get corrupted. The most obvious one is someone changes it for nefarious reasons, but far more commonly you have a failure somewhere in the writing or reading process. The old spinning disks were error-prone, and the new SSD drives degrade eventually. It’s the complexity of reading and writing that really gets you, though: There are multiple layers of caches, drivers, and connections, any of which could introduce corruption.
For the first time on a normal production system, you could at least detect any of those problems. It’s too bad no one ever used it.3
I know, I know, you came to hear about how you could get all the awesomeness of blockchain without using the blockchain and instead I’m giving lessons on two things you could literally not care less about, cryptography and filesystems. Don’t worry. It gets worse from here.
Long after I learned about and promptly forgot ZFS (after all, it’s not like I was using it), I adopted Git. It’s a version control system, used for storing and managing source code. Every geek knows about it, but most of the world only recently learned of it when Microsoft bought GitHub for $7.5B with a ‘b’. I was an early adopter, switching Puppet to Git in 2008.4 Eventually I even learned how it works. I was titillated and a bit horrified that I had duplicated in Puppet one of the key features that made Git work: A system of storing files that allowed them to be looked up by their content (or rather, a hash of their content). Normally you store files by a name, but if lots of people (or, in Puppet’s case, computers) store the same file, they might not call it the same thing, so Git and Puppet instead stored them by their hash. This ensured we never backed up more than one copy of a file, saving a lot of space, and made it easy to check for changes in files.
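Here’s a rough sketch of storing files by a hash of their content. The `git_blob_id` function follows Git’s documented blob format (SHA-1 over a `blob <size>` header, a NUL byte, then the content), so in a default SHA-1 repository it should match what `git hash-object` prints for the same bytes. The `ContentStore` class is hypothetical and is not Puppet’s actual implementation.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """ID Git gives a file's content: SHA-1 over 'blob <size>' + NUL + the bytes."""
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()

class ContentStore:
    """Toy content-addressed store: files are keyed by the hash of their content, not their name."""

    def __init__(self):
        self.objects = {}

    def put(self, content: bytes) -> str:
        key = git_blob_id(content)
        self.objects.setdefault(key, content)  # identical content is stored only once
        return key

store = ContentStore()
first_id = store.put(b"server 10.0.0.1\n")   # say, /etc/ntp.conf on one machine
second_id = store.put(b"server 10.0.0.1\n")  # the same bytes under another name elsewhere
print(first_id == second_id, len(store.objects))  # True 1
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, Git's empty-blob ID
```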
For Puppet, we just used this to back up files we changed, in case people later wanted to revert.
Git did a lot more than that.
Like ZFS, it builds a Merkle tree of the entire file repository, with a similar goal: To understand what files have changed and how. After all, Git is used to track and share changes to a collection of files. The sharing is a critical component; you can easily copy an entire Git repository to another computer, or another person, and it’s important that they be able to confirm that they have a faithful copy.
Git stores the hash tree alongside all of the files. At any point, you can use the tree to validate every file in your repository. If there are changes (which is pretty much the whole point of a version control system), it can automatically store the new files and update the related tree.
Just like ZFS, one of the key features here is that the Merkle tree allows us to validate every file stored. We can walk the file tree and compare each file to its hash, and then compare the file listing to its own hash, all the way up. Any discrepancy is easily spotted.
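The walk described above can be sketched with nested dictionaries standing in for directories. This is a simplified, hypothetical tree format (a directory’s hash covers its sorted `name hash` lines), not Git’s real tree object encoding, but it shows the two properties that matter: changing one file moves every hash up to the root, and you only need to descend into subtrees whose hashes differ.

```python
import hashlib

def hash_tree(node) -> str:
    """Hash a file (bytes) or a directory (dict of name -> node), Merkle-style.

    A directory's hash covers its sorted (name, child hash) lines, so changing
    any file changes the hash of every directory above it.
    """
    if isinstance(node, bytes):
        return hashlib.sha1(node).hexdigest()
    listing = "".join(
        f"{name} {hash_tree(child)}\n" for name, child in sorted(node.items())
    )
    return hashlib.sha1(listing.encode()).hexdigest()

def find_changes(old, new, path=""):
    """Return paths that differ, descending only into subtrees whose hashes differ."""
    if hash_tree(old) == hash_tree(new):
        return []  # identical hashes: skip this whole subtree
    if isinstance(old, bytes) or isinstance(new, bytes):
        return [path or "/"]
    changed = []
    for name in sorted(set(old) | set(new)):
        child_path = f"{path}/{name}"
        if name not in old or name not in new:
            changed.append(child_path)  # added or removed entry
        else:
            changed.extend(find_changes(old[name], new[name], child_path))
    return changed

repo = {"README": b"hello\n", "src": {"main.py": b"print(1)\n", "util.py": b"x = 1\n"}}
edited = {"README": b"hello\n", "src": {"main.py": b"print(2)\n", "util.py": b"x = 1\n"}}

print(hash_tree(repo) == hash_tree(edited))  # False: the root hash changed
print(find_changes(repo, edited))            # ['/src/main.py']
```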
This is my favorite kind of cleverness: It’s simple in implementation, yet makes Git more flexible and useful. It has power that other version control systems are missing, just because it relies on this basic mechanism for storage and validation.
It would be easy to see the blockchain as a sudden revolution, a dramatic change in what’s possible. Viewed this way, it’s hard to separate the pieces from the whole. If all you see is the big picture, it’s easy not to notice that every individual component has its own history, its own value.
The blockchain was gradual, for both me and the industry. It was not one giant leap forward. It was part of a story, a sequence, and the most interesting aspect - Merkle trees - is decades old in math and now pushing decades old even in popular usage. Most of the interesting features touted in the blockchain come directly from them. Immutability (which it isn’t, really) and trustless systems both derive directly from Merkle trees.
It’s worth understanding that history, to see which stages and steps apply to problems you have. The current cryptocurrency tech stack is built to solve problems I don’t think exist. Certainly they aren’t problems I have.
Unlike the blockchain as a whole, though, the individual technical components have been used for years, even decades, in production. Focusing on the current trend can blind you to the opportunity history demonstrates. I think you’re a lot more likely to find broadly applicable solutions there than in trying to replace currency.
Because I got here from the world of filesystems and version control, I see different benefits than you might if you approach thinking of currencies or exchanges. Or chat messages. That does not make me right or wrong, but it does, at least, mean we’re going to work on different problems.
I expect most of you think this is boring. That’s great. It will give me that much more time to build something.
My brightest memory is learning that of course the ‘echo’ command resets the exit code variable. This was a critical early lesson in how your own debugging can dramatically change the behavior of a program. ↩
When people talk about the futility of trying to ban cryptography, this is what they mean: You can’t ban math. ↩
Yes, I know some people use and love ZFS. But never to the extent it should be. ↩
Resulting in one of our critical community members abandoning Puppet in protest, for some reason. ↩
I am a tool junkie. I love the effortless balance of a well-known chef’s knife, like my hands know what to do all on their own. Heavy usage builds calluses and tunes muscles, its usefulness evidenced by scuff marks and changed infrastructure. Failure leaves blisters or even hospital visits in its wake.
A good tool proves its utility. Knives slowly shrink with sharpening, work pants thin, machines need oil. If they don’t, you’re either not maintaining your tools, or barely using them.
This wear is proof of your usage. Your tools should be scratched. Dented. Aged. Patinas should be acquired from the shop, not factory treatments. Their scars should match your calluses. Tools cannot be precious; precious tools just live on a shelf, then retire to your attic. You should seek that perfect middle ground, where you spend enough money that your kids can inherit them, but not so much that you are squeamish about giving them a job.
Tools only deserve the label if they help you work.
You might say I have strong feelings about them. I’m assuming this love led to my focus as a software entrepreneur on helping people work. Or maybe my experience with tools in the physical world led me to seek them in the digital world, learning to make what I could not buy.
Given my tool fetish, you’d think I’d have a solid grasp of what I mean when I use the word. Apparently, not so much. I was recently pulled up short by a simple question, asked by Jordan Hayles of the Radical Brand Lab: What do you mean by tools?
What do you mean, what do I mean? It’s a simple question, right? The above text gives one example, but I would have thought I could answer it in a bunch of reasonable ways, none of which seem terribly controversial.
But the more I explored, the less simple the question became.
I’ve been describing my goal as building power tools for people. This phrase comes from my time building houses with my dad, and ‘power tools’ just meant the things you plugged in. You know? Because they needed power? It’s a common usage, maybe the word choice here did not mean much.
Except… I’ve spent more than a decade learning product management, describing myself as a product-oriented founder, managing that function in a growing company, and attempting to teach it to others. Yet here I am ignoring both the term and the field entirely. Why am I so quickly dumping my work of the last ten years? Is it just creative branding? Cynicism about my industry?
Why not power products? That’s a motor boat of alliteration: ‘power products for people.’ Awesome, right?
Ok, maybe not.
Product management as we know it began in the consumer goods industry. You’re handed a train car full of dish soap and told to sell it. You’ve got to package it, set pricing, convince a local store to carry it, argue with them about location, move it away from competitors, all that. Every product you see in your local grocery store is loved by a product manager who fights for its shelf space, believes it is beautiful, and wants you to give it a good home.
Tide soap is one of the most commonly stolen consumer goods, but not because it’s soap. The strong brand makes it easy to resell, even allowing it to be used as a stand-in for money in drug deals. I wish I was that good at product management. For all that, it says nothing about the soap.
Product management can also be used for evil. Laser printers had toner cartridges you could just refill. Not very clean, but cheap and reliable to run once you plonked down the cash for the expensive printer. Modern inkjet printers instead use disposable cartridges. To sustain profit margins in a rapidly commoditizing industry, manufacturers started putting rules in place on the cartridges: You had to buy them from the manufacturer, they had to be replaced every year, you could not refill them, you could not print in black and white if any color cartridges were empty.
The user was getting hurt so the vendor could make more money. People got pissed off enough that the US Supreme Court weighed in.
That’s good product management. Well, it’s evil, but you know what I mean. It’s effective. We’re talking big-B revenue effective. Hmm. A moral distinction begins to reveal itself.
These are examples of companies forcing their business model onto their customers. There’s no difference between the dish soap sold at retail and the one sold in bulk, yet they’re separate products, differentiated through packaging, shipping needs, and labeling. You pay much more for the retail package than the wholesale one, primarily because the business model behind them is so different.
But when I think of a tool, these complications are missing. When I use a hammer, it just has to fit my hand and smash stuff. When I pick up my drill, it works with every bit I own, regardless of the logo. The battery and charger are proprietary, but the vendor’s most visible role in my life is color choice. My yellow drill works just fine with bits from the blue or green companies. (You probably visualized brands by my just mentioning colors. That’s still effective here.) It does not matter whether I bought the drill from Home Depot or inherited it from my dad; once in my hands, it just works.
I think this begins to answer the question of what a tool is.
It helps you do your job, without your worrying about the vendor’s needs.
I know that DeWalt and Makita need to make money when they sell me a drill, but I don’t think about it when I’m using their tools. Even after more than two decades without one, I can comfortably recite that “my” hammer is the Estwing 22oz waffle head with a straight claw1, but none of those details mean I need the vendor’s permission to hit a nail with it. I make a decision about the right tool, I buy it, I use it. End of story.
It is small. If you call something a tool, not a product, you’re saying it’s less, it’s not as complete a solution. This can be belittling, insulting, but it does not have to be. It’s also a statement of independence. Of freedom. Of, and this is going to sound crazy, compatibility.
Products have an implicit, ongoing dependence on their vendor. If that’s me, I love it: I want you to pay me all the time, not just once. That ongoing relationship is how I afford to keep improving what I’ve built for you. This can be a great way to ensure we have a long-term, sustainable partnership. But it’s not always a healthy relationship. The more you have to deal with how I make money, the worse the experience is for you.
I think this is what I like about tools. They’re self-contained. Independent. Using them is fundamentally pragmatic, not a lifetime commitment.
That independence has downsides for me as a vendor. You don’t get any of those delicious growth-hacker buzzwords. Your product isn’t “sticky”, there’s no “moat.” Those are examples of my customers being constrained by my business model, and their absence means revenue is harder to build, to protect.
One might argue I’m better off because treating my customers with more respect makes a better business in the long term, and I’d probably agree. This kind of respectful partnership should deliver higher returns than one that traps and mistreats its customers. I think this is often the right answer, but it’s not a popular one. It’s harder to get funding, to get off the ground. I might be accused of not “wanting to build a real company,” or I might have Silicon Valley’s most dire insult hurled at me: “That’s just a lifestyle business”.
Tell that to Adobe. Or Autodesk. These are great tools companies. They are the behemoths we know today because they knuckled down and solved their customers’ problems. They worried about that, rather than how they could extract maximum revenue over time. It was a different time, but people have not changed.
I don’t think that every product is compromised when the vendor’s needs show up in the customer’s life, but I think most are. Some of it is laziness, shoring up product limitations with business model innovations, but a lot of it is strategy, recognizing the value of painting your customer into a corner.
Honestly, some of it is just survival. A lot of those inkjet printers are sold unsustainably cheap, because buyers care only about cost, not value. Some markets are intrinsically dysfunctional, with users and vendors slowly killing each other through bad deals and cynical behavior. But as a vendor, I get to make a choice about what markets to play in, and how to work with my customers.
I am a simple person with a simple dream: I want to build something that helps someone work. I have to make money while doing it, because that’s the nature of the job, but I’m more interested in my customers’ work than my own. I know I need a business model, a go-to-market strategy, a plan for growing and supporting my business. But my customers should not need to care about that, should they? If they like what I’m building, they should be able to buy it, and use it. And tell all their friends how great it is. They should not wake up one day to find they’ve accidentally gotten married to me.
I just want to build tools. And I’m proud of it.
We told with great pleasure the (most likely apocryphal) story that this hammer was illegal in Florida because the metal haft could cut your thumb off. ↩
Last year I decided to write more. Daily, in fact. One of my first actions was to ask Om Malik for advice. I had been following him since he was writing for Business 2.0, which was an actually valuable business magazine in the tech bubble. I am now lucky enough to know him through True Ventures.
When we talked, he shared a story about his first published piece. I did not find it helpful.
The magazine he was writing for (Fortune, I think) said his article was too long. He shortened it. A lot. It was still rejected. He compressed it more. Again. And again. I believe the original 1500 word piece became 300. Om’s punch line was that they told him it was their best-ever first contribution by an author.
I don’t think I’m cut out for writing 300 word articles. It’s not just that that kind of compressed writing is hard. It’s also about the missed opportunity: If you’re reading one of my articles, I want to take full advantage.
But my lack of desire to write short pieces is not why his story wasn’t helpful.
It’s one thing to say: “Write shorter”. Ok. I can see how that makes sense. Everyone knows the line, “I’m sorry this is so long, I didn’t have time to make it shorter.” It’s intuitive, implying that investing in the quality of writing somehow intrinsically shortens it.
But more than a year after this story from Om, I had still not found a way to put this principle into practice.
I could compress writing a bit. Programming taught me that shorter code is often simpler and clearer. Of course, it still took years to be good at it.
Stephen King had taught me that adverbs often indicate a problem. If you can get rid of them without losing meaning, your work is almost always better. One way to do this is to choose more expansive verbs.
I was able to translate my speaking coach’s advice into writing, too: She helped rid my talks of filler words. You might not think this is useful in writing - I never find myself typing out ‘um’1 - but I am over-fond of long phrases that can be easily replaced with a single word.
I tend to over-qualify. This is an opinion piece. You know that. I can skip all the instances of “I think”. One “maybe” is enough; I don’t need “should maybe”. Two examples suffice; I don’t need three and an “etc”.
For example, in this piece, I replaced the phrase “was still not able to learn” with “had not learned”. “I had learned as a programmer” became “programming taught me”. Both of them are clearer, simpler, just… better. In particular, the verbs are much stronger.
Again, it’s not that I had learned nothing over the last year. I just… I knew I was using tactics. Simple rules. My writing was shorter, but… not short. The Hemingway app was still harshly judging all my failures.
Then I got lucky. I ran into Lucy Bellwood - a fellow Reedie! - at an Indie.vc event, and we traded book recommendations while her partner patiently slept on his feet. She recommended a new book to me: Several Short Sentences About Writing, by Verlyn Klinkenborg.
It’s amazing.
Of course, I doubt I’ll ever finish it.
It reads more like poetry than prose. I do not like poetry. Nearly every paragraph is a single sentence. Almost none grow to more than one line.
Many books on writing can be summarized as: Sit down and write. Seriously. Now. Keep doing that.
Not this one.
I can’t summarize it for you. It’s, it’s… dense. But I can share a little I have learned. And how it has cursed me.
I received a new term from it: Transitions. I knew what adverbs were long before Stephen King taught me that they’re suspect. Not to be entirely avoided, but hold your nose when you use them.
I already knew my sentences were too long. Even when I broke them up, I knew I relied too heavily on words that connect them (like the phrase at the beginning of this one). This knowledge wasn’t useful, because I wasn’t able to translate it into methods of fixing them.
The first thing Klinkenborg gave me was a name for these words: Transitions.
Like adverbs, they’re usually a sign that you’ve failed somewhere. That you were lazy.
This labeling did not magically fix my writing, but it gave me something to track. An easy measure for how I was doing.
Then he delivered the kicker: Simple guidance on how to eliminate them. Or at least, reduce them.
Knowing sentences should be shorter is not useful if you don’t know how to get there. What makes a great short sentence, vs a crappy long one?
His answer is incredibly dense. I have to slow down when reading it: You should minimize the distance between the period of your previous sentence and the point of this one.
Those long, complex sentences I like to write aren’t bad because they’re long, or because they have too many phrases. They’re bad because their point is so far away from their start. The reader is left to wander through it, holding out hope for a conclusion.
What are you trying to say? Say that. Immediately. Did you leave something important out? Say that. Now. Keep it up until you’re done.
Your old, complex sentence is now a series of short sentences, in order of importance, each getting right to the point.
Of course, that’s not how the book describes it. My description would likely horrify its author. There’s far more to it than this. But it’s a start. And a huge departure from what I was doing.
It’s also why I can’t write anymore.
The topics I’m working on now are incredibly important to me. They’re hard to reason about, to capture. And while I’m sitting here struggling with the content, the writing itself keeps getting in the way. The form. The structure of the sentences. The line breaks.
Where I put paragraphs.
It’s not that I’m embarrassed. It’s that the process of writing is distracting me from what I am trying to write.
So please. Forgive me a little writer’s block. I promise I’ll get better.
When I do, I hope you’ll also tolerate experimentation and failure in how to put all this to work. Expect wild oscillations in sentence length. Inconsistency between sentences and paragraphs. Confusion across pieces. I’m in that ‘conscious incompetence’ phase of short sentences, and it’s going to be a rough path for a little while.
I’ll come out a better writer, though. And hopefully you’ll gain something from witnessing the process.
I recently updated Om on my progress, sharing how much his story confused me and these tactical lessons I’d just received. Thankfully he appreciated the note, and was happy to see I took his story as a challenge, rather than a judgement. And he didn’t seem fazed in the slightest that it took more than a year to make anything of his help.
Except, obviously, in this case. Not never. Just not usually.
Congress recently required Mark Zuckerberg to defend his lifelong practice of mistreating your private information. Movements to give you control of this critical data took the opportunity to claim they can prevent future such breaches. Blockchain is the new solution in search of a problem, and personal data is in the crosshairs.
But can the blockchain actually help secure your personal data? What would that take? And seriously, what do people mean when they say we should own our own data?
It sounds nice. Too bad it won’t help. The problem is not “ownership” (whatever that even means in a world of infinite digital copies). It’s centralization. Having one person’s data is a small threat, only to that individual. Having everyone’s data is a national crisis.
By now we’re familiar with the huge amounts of data that Facebook, Google, Amazon, and apparently everyone except Apple have on us. But how did they get it? Mostly, we gave it to them, through using their products. What we didn’t give to them, we gave to someone else who then passed it on.
There have been massive breaches at Equifax, Facebook¹, and many others. Even the general public is becoming aware of the real causes. Some of the largest companies in the world exist purely to collect your information and sell access to you based on it. They might not sell your data, but they definitely sell your attention using it.
These are the problems you know about. Don’t worry; it gets worse from here. If you think your birthdate and pictures of your kids are personal, what about your DNA?
Anne Wojcicki was married to a Google founder, and she liked their data accumulation so much she started her own company to build a huge pile of even more personal data. 23andMe does not scrape the internet (or your cheeks) to get your DNA; no, people pay for the privilege of giving it to them. Yes, they offer a service in return, but do they clean house afterward? Hah! No. They keep it. (Hopefully somewhat more safely than Equifax does.)
What’s so wrong about there being a database of DNA from a big chunk of the population? Let’s ask the police.
You might not be afraid of the police. You should consider yourself lucky. I know anyone of color in the US is, and should be. I know I am; I grew up on a commune, and police raided us using helicopters and assault rifles in hopes of busting us for cannabis. I don’t mean to imply that hippies have been as systematically oppressed as African Americans (and certainly not in the South); just that I grew up with my own justified skepticism of exactly what that force was here for.
Even if you don’t fear the police, you should fear the consequences of DNA testing. The science behind most parts of DNA testing is absolutely rock solid². The police work is another matter. Beyond outright fraud used to wrongly convict people, the messy world of testing DNA at crime scenes makes it hard to get correct results. Juries inappropriately treat a complicated test as foolproof. The evidence could be compromised anywhere from the crime scene to police handling to the lab itself. The failure rate even without fraud is high enough that I would not want to trust my life to it.
Not to imply that DNA testing is worthless; quite the contrary. It has been used to exonerate many people who were incorrectly imprisoned and put on death row. It’s not that it always fails, just that you don’t want to find yourself gambling on it against life in prison.
But remember: This is just for cases where someone has a single person’s DNA. Like having just your fingerprint. What happens when someone like 23andMe has a whole database of it?
“If you didn’t do anything wrong, then you have nothing to fear.” Pfft. Yes, it starts with requests for the DNA of individual suspects, but it escalates to database-wide searches for DNA that matches. And by ‘matches’, we don’t mean “is 100% guaranteed”; we mean “eh, it’s pretty close”. A DNA “match” directed the police to someone they thought was a relative of a suspect, who was then brought in for questioning. So I guess as long as you’ve never done anything wrong, and aren’t related to anyone who has ever done anything wrong, you’re fine. Right?
I feel so much better.
I had investors literally laugh at the idea that collecting this data introduced security concerns. They grew up at Google, so it’s not surprising they could not see centralization as a problem. It’s the same story as Equifax, which started out wanting to make it easier to get loans and now has so much power you can’t get one without it.
There is a world of difference between giving someone your data, and allowing someone to include your data in a massive pile of it. Any discussion of the risks of data needs to acknowledge that.
Now we can see that our discussions of owning our own data don’t quite get it right. What we actually want is decentralization of data. We don’t want a single company to have access to this much information about huge groups of people.
And now you see the problem.
New technology can’t break Facebook’s business model. It can’t prevent Google from scraping every web site on the internet and identifying you by connecting everything. Whether you give it to them or not, they’ll know what you look like, where you live, and who you hang out with.
Most importantly, it can’t prevent people from sharing all that data with these services. After all, they’re getting something valuable in return, like connecting with friends and family. Or figuring out their family tree.
The problem is not the centralization. It’s the effectiveness of a business model built on centralization.
So when anyone comes to you and says, “The blockchain will allow you to own your own data!”, ask them in return, “How will you make it such a joy to use that Facebook will go bankrupt?” And please, record the conversation, because I want to see them stammer.
This is fundamentally a product and design problem, but the technofuturists are treating it like a technology problem. “Oh, if only those college students had access to better cryptographic tools they never would have shared that data with Facebook!” 🤯 No. People will stop using Facebook, and 23andMe, and Google, when there are better solutions. And unfortunately, they need to work ten times better, not just a little bit.
So talk to me about the blockchain. I really do want to hear how you’ll use it to help people own their own data, and remove the incentive to centralize all of this data.
But talk to me of products. Of user benefits. Of business models built around all of this.
Because people have to want what you’re selling, and the only way to get that is to build something they want to use. Only then will they be able to own their own data.
I’m a tech founder and recovering SysAdmin. I helped found DevOps and grew up in open source. These days I am convalescing from a decade of physical and mental neglect while running Puppet.