Book Review: Mechanizing Proof


This is a really fascinating, if highly specialized, book about an important topic.

Donald MacKenzie’s Mechanizing Proof: Computing, Risk, and Trust [wc] is a good companion-piece to Olav Lysne’s The Huawei and Snowden Questions [wc]; taken together they provide a grim view of computer security and of the implications of the software-driven world we find ourselves in. When I first read them, last year, I wandered around like Private Hudson in ALIENS: “Game over man! This is some really pretty shit we’re in.” There are three questions regarding software:

  1. Why is it so bad?
  2. Why is it so good?
  3. Given that writing security- or reliability-critical software is manifestly harder than writing plain old “just make it work” applications, and plain old software is pretty bad, how can we make security/reliability-critical software good enough to rely on?

The cynic in me notes that industry’s response to question #3 is to throw patches at software and hope that the problem will go away. Meanwhile, everyone who has thought about the problem for a second realizes that if there’s one bug, there are probably two or more; they are just unknown. Since I got involved in computer security in the late 1980s, we’ve basically seen an entire civilization built on top of beta-test quality code. If that. There are critical pieces of software (e.g.: Microsoft Office) that have been a continuous mine of bugs for over 20 years, with no sign that the vein is running out. That’s partly because software vendors are busy adding new bugs as fast as the old ones can be fixed.

What is needed, everyone eventually realizes, is a way of making software that is good by design. And that’s the topic of Mechanizing Proof. I’m tempted to say it’s an exciting book that reads like a novel, but that would be a lie. I mean, it’s as exciting as some popular pot-boilers and it’s dealing with a reality that is historical record, but it’s a highly specialized topic. That it’s an extremely important topic doesn’t change the fact that it’s highly specialized. It’s a history of the early days of smart people realizing “this stuff is going to be a problem.” And that’s not a minor thing: when John von Neumann says “this stuff is going to be a problem” the wise man will shut up and listen. Which people did – but they don’t, anymore.

The meeting was held in the Hotel Sonnenbichl on the outskirts of Garmisch. The vista from its windows and terrace unrolled across the little town to the towering mountains. The keynote speech opening the conference was given by Alan Perlis of Carnegie Mellon University in Pittsburgh, a leading figure in U.S. computer science. “Our goal this week,” Perlis told the small, select group of participants, “is the conversion of mushyware to firmware, to transmute our products from jello to crystals.” Buxton recalled the tone being set by frank admissions in an after-dinner talk by Robert S. Barton, who had played a leading role in the development of Burroughs’ innovative B-5000 computer. Barton told participants “that we were all guilty, essentially, of concealing the fact that big pieces of software were increasingly disaster areas.”

You see? Old fogeys complaining that software sucks is nothing new; it began shortly after the invention of software and has continued more or less apace. The main change is that nowadays, coders are drinking Cristal and driving Lamborghinis, lighting their shirts on fire with $100 bills in VIP lounges in strip clubs. That is my dim memory of the 90s, anyway.

There was general agreement at Garmisch that “software engineering” was needed. But what was it? As an analogy, an inspiration, a slogan, few would have disagreed, but it remained an empty box awaiting content. Consensus around the slogan largely evaporated when it came to deciding more concretely what software engineering should consist of. The report of the Garmisch meeting noted that “The phrase ‘software engineering’ was chosen deliberately” by the study group that set up the meeting because it was “provocative in implying the need for software manufacture to be based on the types of theoretical foundations and practical disciplines that are traditional in the established branches of engineering.” That formulation, however, immediately indicated the potential for radically different emphases on the relative importance of “practical disciplines” and “theoretical foundations.”

The whole book chronicles the white-knuckled warfare that has obtained between those practical disciplines and the theoretical foundations. One school of thought was that we could devise tools that would prove that a piece of software did what it was supposed to, then apply those tools to the foundations of the code-mass, and build trust upwards through the system. The other school of thought was that those foundations could be well-constructed by experts and code that depended on them would gain a degree of reliability through code re-use. I see this as a dialogue between builders and architects – the architects are responsible for knowing the properties of the components and making designs that factor in engineering overhead, and the builders are responsible for reading the architects’ blueprints and then doing something else entirely. It is a fact that architects often design things where the I-beams don’t line up correctly, or there is no space to run the brown water from the bathroom, and the builders just fudge things back together. It is also a fact that occasionally buildings collapse and kill a lot of people, so why hasn’t that happened with software, yet?

My opinion is that it has happened with software; the collapse just is not dramatic and lethal. It’s merely incredibly expensive. I’m speaking there from the perspective of someone who made a good, lucrative, career out of sheltering people from the consequences of their unwise choice to use Microsoft Windows. It wasn’t strictly their fault, I would say, because there were no reasonably viable alternatives anyway – but let’s just say that I don’t ride amusement park rides that are computer-controlled unless I can see that the controllers are Siemens PLCs and not a Windows box.

MacKenzie is writing history of technology, not popular science, so his book lacks the literary flourish of Simon Singh or Siddhartha Mukherjee, but it’s highly readable and it’s interesting. I’d say it’s just short of fascinating, unless you work in the field or you work in a field affected by software. Of course every field is affected by software, whether you acknowledge it or not: we have moved into an era where your bank’s IT department may screw up and discover that all the money has gone – all of your money has gone – because someone depended on an unpatched version of Apache Struts. That’s not implausible, it’s merely unpleasant. What really sucks is that U.S. Missile Command replaced its old 1960s command/control systems with stuff running on Windows. Probably Windows 95. Do you feel better, now?

I have to confess that I am firmly in the camp that says formal verification is a pipe-dream. That’s because, for formal processes to work, we’d have to be able to describe the business purpose of the software formally, as well. It doesn’t matter if the software is perfect, if it allows the user to transfer negative one million dollars to someone else’s account – the overarching system in which the software works has to have a verifiable explanation of how the world should work, not merely of the software implementing a simulation of that world. All of this stuff seems like details that matter only to a practitioner, but if you know someone who thinks that their ‘soul’ is going to be uploaded into a computer, I’d suggest you start them with The Huawei and Snowden Questions and then break Mechanizing Proof to them gently.
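
To make that concrete, here’s a minimal sketch of the spec-versus-world gap, using the z3 solver’s Python bindings (my choice purely for illustration; the account variables and the one-line “spec” are invented, not anything from the book):

    # A toy "verified" funds transfer, checked with z3 (pip install z3-solver).
    from z3 import Ints, Solver, Not, unsat

    balance_a, balance_b, amount = Ints("balance_a balance_b amount")

    # The code under scrutiny: move `amount` from account A to account B.
    new_a = balance_a - amount
    new_b = balance_b + amount

    # The spec as written: money is conserved. Prove it by asking the
    # solver for a counterexample and getting none.
    s = Solver()
    s.add(Not(new_a + new_b == balance_a + balance_b))
    assert s.check() == unsat  # proved: the code meets its spec

    # Nothing in the spec says amount > 0, so transferring negative one
    # million dollars "verifies" perfectly. The proof is only as good as
    # the formal description of how the world is supposed to work.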

------ divider ------

The Huawei and Snowden Questions is a topic for another day, but let me explain its relevance, briefly. Basically, it’s a shot below the water-line fired by a theoretician who points out that:

  • We cannot reasonably trust the provenance of our software (Huawei)
  • We cannot reasonably trust the operators of our software (Snowden)
  • It is implausible to compose systems of higher trust out of untrustworthy components
  • Game over, man

People have been saying that for decades (I have slides from my old “writing secure software” course that I used to teach in the 90s) but Lysne says it well, and in a particularly timely way.

The debate between security-by-design and discover-flaws-and-fix was a subtext of my career for about 20 years. Alan Paller, the founder of the SANS conference, and I used to stake out that debate every so often – with SANS arguing, professionally, that you could run insecure software on public internets if you just tried hard enough, or something. In fact, SANS has built a massive reputation out of teaching people “patch, patch, patch.” I used to bump into Paller every so often and would snark at him, “so, have we fixed all the bugs, yet?” Of course the problem is that nobody wants to pay the time and resource costs to actually design software to be good. There’s another problem: well-designed software is harder to repurpose by just throwing a few new chunks of code into it – that breaks the design. Bad software is more nimble, which means it’s more commercially successful.

Comments

  1. says

    I should also note that industry’s response has been to automate patching, which has resulted in systems that are even less reliable because they may reboot at random or update the network interface driver and take themselves offline for a while. It means that security gets blamed for unreliable systems, which is not really the problem. The problem is that systems should not need to be patched constantly in order to function.

  2. Dunc says

    I’ve said a number of times that software “engineering” is roughly where European architecture was in the 14th century – aspiring to build soaring cathedrals, but without the knowledge of how to do so safely, with the result that our creations are always falling down, often before they’re even finished.

    I’d also note that we can’t reasonably trust the provenance of our hardware, either.

  3. says

    Dunc@#2:
    You are correct but that is because hardware has become a software process. With all of its warts.

    Remember when Intel made a CPU with a buggy floating-point divide unit? That was a result of failure to do basic regression testing, since they trusted their hardware compiler (which is software!)
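
    The canonical check people ran at the time was a single division – sketched here in Python just for legibility; the wrong answer only ever appeared on the affected silicon:

        # The widely circulated Pentium FDIV test.
        print(4195835.0 / 3145727.0)
        # correct hardware: 1.3338204...
        # flawed Pentium:   1.3337390... -- wrong in the fifth significant
        # digit, traced to missing entries in the divider's lookup table.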

    When I was at Digital they were running this huge distributed silicon optimizer and simulator called DRACULA. It took weeks to run, and that was just layout optimization. I asked if they did any regression testing and the engineer I was talking to said, “Why? This is hardware.” It was surreal: we were literally talking about running software to optimize the layout of the next CPU, on my rack of CPUs – which were running software atop software, atop an operating system, on a virtual machine, on hardware that had been designed by software. That was for the Alpha 21164 CPU. A billion-dollar gamble.

    PS – 14th century is spot on.

  4. GerrardOfTitanServer says

    I’ve also become worried about the trustworthiness of hardware, especially after I learned about the backdoors baked into Intel chips and AMD chips that you can do nothing about. Ditto for basically all modern smartphones too. There’s a hidden chip within the chip, that runs its own full operating system, has its own TCP/IP stack, and even a f’ing JVM. That, to me, is game over. I don’t know how to make the public care enough to get the required legislation to even attempt to fix this mess.

  5. johnson catman says

    . . . to get the required legislation to even attempt to fix this mess.

    You are precious. Ain’t gonna happen.

  6. says

    It is not at all clear what is going to happen, and to airily wave off this notion or to propose another outcome as obvious is absurd.

    This is an unprecedented and very new situation. We have been building out vast software infrastructure in the modern sense for… maybe 2 decades? There is no roadmap, there is no historical precedent to consult.

    We’re seeing similar phenomena in the area of physical infrastructure. Highways, bridges, electrical grids, and so on. There’s, I dunno, a trillion dollars of built-up technical debt there, because we’ve built and built and built to create this awesome new society, but now it turns out that we need $10B in remediation in California alone if we want to be able to keep the lights on when it’s windy, while not setting the state on fire.

    You can point all kinds of fingers, but at the end of the day this was a group effort. Nobody wants to pay what software, or electricity, or anything else, actually costs, so we end up where we are today. Sure, we were helped along by some awful people, but it was a group effort.

    And there is no road map. There is no rule that says society has to survive any of this, and there is no rule that says we cannot. There is no rule that says it’s impossible. There aren’t any rules at all.

  7. cvoinescu says

    It is also a fact that occasionally buildings collapse and kill a lot of people, so why hasn’t that happened with software, yet?

    Does the Boeing 737 MAX count? They essentially replaced balanced placement of the engines with software.

  8. says

    @Andrew Molitor#6:
    Yeah, it’s “interesting”

    I suspect that people who build power systems like Germany’s wonder “why did they use above-ground lines?” The same way I wonder why anyone used Windows for anything but entertainment. You probably remember my paper on security disasters: these things can take several decades to mature, at which point the costs are much higher. If people had ditched Windows at Version 2 (seriously: I used Windows 2!) we wouldn’t be looking at flipping over a complete software ecosystem. That it’s a walking dead zombie is irrelevant – look, it’s smiling!!

  9. says

    cvoinescu@#7:
    Maaaaybe. That was both a management failure and a user interface disaster.

    What’s crazy is if it was Windows there would be people saying “throw a patch at it and let’s board the passengers! Hey ho off we go to the wild blue yonder!”

  10. xohjoh2n says

    @9

    “An update is available. The autopilot will now shut down for 30 minutes to install it. Since you have already chosen to delay the update once, it will now install immediately and you cannot delay it any longer.”

  11. says

    xohjoh2n@#10:
    “An update is available. The autopilot will now shut down for 30 minutes to install it. Since you have already chosen to delay the update once, it will now install immediately and you cannot delay it any longer.”

    Please log into Microsoft True-ID verification system to make sure your software license is up to date! Thank you! We are sure you will love your new version of Windows; we have made the user experience much better. Here’s a neat photo for your screen background. We have sent an activation link to your registered Email address.

  12. xohjoh2n says

    @11

    “The Windows Genuine Advantage Activation Server could not be reached to confirm the status of your installation. Your aircraft will now revert to “safe mode”. Only the cabin light controls will be available until you call this number to purchase a genuine Windows license…”

  13. John Morales says

    I don’t get this hate on Windows. It works sufficiently well.

    (Me, I think users are the problem)

  14. dangerousbeans says

    This sounds like a book I should read, given the recent announcement of ransomware disabling some local hospitals
    Though given that, I’m not sure I want to read it :\

  15. says

    Unrelated: [bd]

    The toughest problem the program is having is matching the timing of the aircraft’s fusion software with its sensors’ software. “As we add different radar modes and as we add different capabilities to the DAS system and to the EOTS system, the timing is misaligned,” and then you have to reboot it. Bogdan said he’s aiming for eight to nine hours between such software failures when a radar or DAS or EOTS needs to be rebooted, which is what legacy aircraft boast. Right now they are at four to five hours between such events. “That’s not a good metric.”

    Elsewhere, this is what the F-35 runs:

    The Integrity DO-178B real-time operating system (RTOS) from Green Hills Software

    That was all in 2016. I’m sure it’s been patched by now.

  16. says

    dangerousbeans@#14:
    That’s a separate but related problem. Program verification is a bit more theoretical than what the hospitals are dealing with. They have a different problem: many medical devices are certified only at a particular operating system release, which means that for them to be vendor-supported they have to be running some ancient unpatched version of Windows. That means they are lunchmeat.

    Proper system administration practice is to isolate legacy systems behind one-way connections, and to segment networks to prevent the spread of attacks. Doing that, however, requires extra time and skill and raises system administration costs – so many local governments, small businesses, and even hospitals do not run their networks in segments and are easy meat for hackers who are able to get a foothold in the systems.
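
    The policy itself is not complicated – here’s a toy sketch (the segment address and function are invented for illustration; in real life this lives in firewall and switch configs, not in Python):

        # A toy "default deny, one-way" policy for a legacy segment.
        import ipaddress

        LEGACY = ipaddress.ip_network("10.9.0.0/24")  # unpatchable legacy devices

        def legacy_flow_allowed(src: str, dst: str) -> bool:
            """Traffic may enter the legacy segment; nothing may leave it."""
            s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
            if s in LEGACY:
                return False    # a compromised legacy box cannot reach outward
            return d in LEGACY  # inbound polling/management is permitted

        print(legacy_flow_allowed("10.1.2.3", "10.9.0.7"))  # True
        print(legacy_flow_allowed("10.9.0.7", "10.1.2.3"))  # False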

    Security is becoming one of those “you can pay me now, or pay me later” kind of things except that it’s really “you can pay me now and pay me later.” So much for all those cost savings that IT was expected to produce.

  17. says

    John Morales@#13:
    I don’t get this hate on Windows. It works sufficiently well.

    Tens of billions of dollars spent annually on security and Windows system administration say you don’t even realize how wrong you are.

  18. dangerousbeans says

    @Marcus
    It also requires people to know that’s what they should be doing, which is a big(er) problem once you get away from dedicated IT companies.
    Plus, like Windows, they are built by successively bolting more functionality onto an existing core, and never stopping to rationalise the mess. Which you can’t do because that breaks things and you can’t afford to break things.

  19. John Morales says

    Sufficiently good. If it weren’t, people would not use it.

    (Heh. I remember the VCR wars; VHS vs. Betamax — same thing)

  20. John Morales says

    dangerousbeans,

    It also requires people to know that’s what they should be doing

    So, there’s some OS that works for people who don’t know what they should be doing?

    (Heh)

  21. Curt Sampson says

    John Morales @13 writes, “It works sufficiently well.”

    Bwahahahah! John, you have a good career as a “manager” ahead of you. And all the rest of you, you see why we have these problems.

    “The first step to fixing a problem is admitting you have a problem.”

    (Oh, and yes, I am a “Linux” guy of sorts, since I stopped being a NetBSD developer back in the early 2000s, but this is not about OS fanboyism. Linux has many of the same problems that Windows has. It’s easier to configure to have fewer problems, sure, but as with almost all software out there, it shows the same lack of focusing on the core things that would make the problems go away.)

  22. lochaber says

    I worked at a hospital previously, and several of the computers I used (running Windows, of course…) were borked all to hell. One of the computers had Internet Explorer, and it had like 3 search bars. Another computer would open a web page if you typed in a URL, but if you clicked anything, you would get redirected to fake Google pages and have several popups made to look like system alerts. I couldn’t dismiss or escape them (I sure as hell wasn’t going to click on them), and had to pull up the task manager and force-quit the browser. Others couldn’t use a firewall, because then they wouldn’t work with the machine that they were expressly there for.

    And then updating… it would take 5-30 minutes just to download the updates, during which you couldn’t do anything else. And then it would often repeat the process again after restarting. One time I was asked to update a computer that hadn’t been updated in some time, and it took the computer out for the whole day while it repeatedly sat there downloading, installing, restarting, rinsing, and repeating.

    It’s been long enough since I’ve owned a Mac that I can’t remember the update procedure, but I think I just installed the OS from CD.
    Been running a Linux at home for a while, and the update is pretty painless – it periodically checks, and changes an icon in the menu bar if there is something to update. I click on it, review the list of stuff to update, enter the password, and it downloads and installs in the background, and I can keep doing whatever I was before.

    I’m hardly a computer person, but I’ve been pretty happy running Linux for the most part. For one, it’s hella cheap. Secondly, almost all of the times I’ve had problems were when I was fiddling with something, usually reading off of a message board or blog post, and skipped the “make sure you back up your system before this next part”. And I like how Linux (and I believe Mac, although they may have changed recently) requires a password to install anything, which is a problem with the default Windows machines I’ve used.

  23. John Morales says

    Curt: Yeah, well, I most recently worked part-time for 13 years in local Government and we ran with Windows.
    Fuck-all downtime. So, what exactly was the purported problem with it?

    And also, if you’re in the business, you know the difference between applications and the OS.
    Dunno how many times I’ve had people whinge about the latter when the issue is with the former.

    (Also, I’m now a self-funded retiree, who always turned down management positions.
    I worked to live, not lived to work. So, no career beckons, since I’m not a big spender, and what I have suffices)

  24. lochaber says

    eh, yeah, but you wanted to know why people didn’t like windows computers.

    I don’t like them, because in prior work environments, more of them than not were various sorts of screwed up. Some of it was browser malware and such, but also a lot of it was the PITA of updating, and the various system messages and popups and what not. And Ransomware…

    I said I’m not really a computer person, but I’ve frequently been the person asked to fix problems and such in past work environments. You can only blame so much on user error. If you need a programmer or an IT expert to use a basic desktop effectively for basic tasks, that’s a poorly designed system.

  25. colinday says

    One problem with software security is the anonymity of the internet. What prevents people from breaking your windows is not the physical properties of the glass, but the difficulty of their getting away with it. Did anyone notice that the failure of a particular support in the Oklahoma City federal building would bring down a lot of the building, and also that the support was susceptible to a fertilizer bomb? Did the architects of the World Trade Center think about what would happen if a jetliner crashed into it? We don’t expect physical engineers to design against malice, but we do expect software engineers to do so.

  26. John Morales says

    lochaber:

    I said I’m not really a computer person, but I’ve frequently been the person asked to fix problems and such in past work environments.

    :)

    Yeah.

  27. Dunc says

    This reminds me that I still need to read Joseph Tainter’s The Collapse of Complex Societies… As I understand it, his core thesis is that societies tend to solve problems by increasing complexity, but complexity (like most things) suffers from the problem of diminishing returns, until it eventually has negative returns, at which point any further attempt to solve problems by increasing complexity only makes the problems worse. Sound familiar?

  28. says

    John Morales @#19

    Sufficiently good. If it weren’t, people would not use it.

    Bullshit. I’m using Windows not because it’s sufficiently good, but only because there are no better alternatives. I hate Windows. Just like I hate Apple products too. I’d sort of prefer Ubuntu Linux, but CorelDRAW doesn’t run on it.

  29. says

    Windows is interesting. I gather that baseline Windows on conservative hardware is excellent and stable.

    Windows supports an insane matrix of both software back-compatibility and hardware diversity, which means they’re letting a huge army of people, some of whom are yahoos, into the control room, with more or less inevitable results. It is a stellar example of what happens when you have a vast array of conflicting requirements.

    It’s probably a pretty good metaphor for politics, if you squint.

  30. says

    Andrew Molitor@#29:
    Windows is interesting. I gather that baseline Windows on conservative hardware is excellent and stable.

    It is now – after being rewritten several times. What I’m puzzling over is that basically everyone took a look at Windows 3 and said, “yes, that is our future.” After experiencing DOS 1-6, they managed to conclude that.

    Most of what we are dealing with regarding Windows since 1994 or so is the sunk cost fallacy. “Sure, it’s shit but if we work hard on it we can make it adequate”

    This is not simply a theoretician’s complaint. One can easily argue that the cost, in terms of software complexity due to the various Windows APIs, has raised the global cost of code to the point where it long ago stopped paying for itself. The multibillion-dollar security industry, and cloud computing, and system administration, are all “externalities” of the choice to continue propping up Windows.

    I had one consulting client that used Windows as an embedded O/S, which meant that they paid $50/node for some tool that made Windows behave like a headless O/S, then they spent $40 on some antivirus and patch management thing, then they spent $100 on Windows, and $100 on Faronics’ Deep Freeze. By the time all was said and done, the cost of making Windows a semi-tolerable piece of shit dwarfed the cost of the hardware, and it was an ongoing expense to maintain it! Meanwhile, you and I both know old farts who have been running servers on BSD that have been stable and un-fucked-with for years. My old BSD server finally got too much mouse poop in it to boot any more (I think mice ate the SATA cable) after 10 years of flawless operation.

    The IT industry ignores operational cost, which is odd, because trading capex for opex is why they are moving to the cloud. The cloud is all about system administration costs and the cost of using garbage operating systems. It’s as if, after 20 years of struggling with Windows, a few people at Amazon finally “got it.” I don’t have enough faint praises to damn AWS with, right now, so I’ll stop there.

  31. cvoinescu says

    The “sunk cost” is not a small obstacle. There’s so much of it! Take the bold decision to start over, and you run into second-system syndrome (Netscape). Keep focused to avoid that, then the sunk cost of your users kicks in, and you never fully transition (Python 3, sadly). And I think you can have both second-system syndrome and zombie-first-system syndrome (IPv6).

    It seems that the knee-jerk response to API complexity is wrappers. Abstraction layer over abstraction layer, often not abstracting the warts very well — so you still need to know how every level behaves to build anything non-trivial (and especially to debug it). When mere wrappers proved not abstract enough, we brought in the virtual machines. I think I’m being conservative when I say that that did not improve things overall.
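
    A toy example of the kind of leak I mean (the function is invented): the wrapper hides exactly one wart of the layer below, and the rest bleed straight through:

        # A leaky wrapper: callers were promised "a dict or None".
        import json

        def load_config(path):
            """Pretends to hide the file layer from the caller."""
            try:
                with open(path) as f:
                    return json.load(f)
            except FileNotFoundError:
                return None
            # PermissionError, IsADirectoryError, UnicodeDecodeError, and
            # json.JSONDecodeError all still escape -- to debug anything
            # non-trivial you must know how every level behaves anyway.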

    Then there’s API change. I kind of lived under a rock for a while, but when I came out and looked around, it seemed we gave up trying to be sensible about that, and just wrote tools to make every piece of software appear to have the entire system to itself, set up with just the right versions of everything. Also, there’s now a thick coat of marketing on a lot of tools and libraries; things that used to be plain no more than a decade ago. I am still coming to terms with these changes. Again, it doesn’t look like the good kind of progress.

  32. xohjoh2n says

    @31

    Sigh. You appear to be describing my life. And when “You’re overthinking things. How hard can it be? We’ll just make it easy by using *this* abstraction.” turns out not to be a valid solution to the real world, who gets to clean up the mess…?

  33. cafebabe says

    Wow, nothing like a beat-up on software quality to bring out the comments!

    So time for a mea culpa. In my time I taught innumerable students “unchecked speculative execution is safe, provided you destroy improperly fetched data”. This turns out to not be entirely correct(tm).
    Or how about “managed execution systems eliminate whole classes of vulnerabilities based on type errors”. Wrong again!
    As a penance I spent the last years of my working life on static analysis of libraries for both main managed execution platforms, trying to find and patch bugs faster than the black hats found and exploited the same bugs. Neither side has won that war.

  34. call me mark says

    Nitpick:

    software vendors are busy adding new bugs as fast as the old ones can be fixed

    s/as fast as/much faster than/
    ;-)

  35. dangerousbeans says

    It’ll be kind of interesting to see how bad this mess gets and how it all collapses
    It all looks pretty unstable in the long term

  36. Some Old Programmer says

    For those that want a horrifying glimpse of the amount of non-OS software that your PC is executing, I recommend you take a look at the article “Open Source Firmware” in the October 2019 issue of Communications of the ACM.
