When I was done interviewing Tom Van Vleck, he suggested I might want to read Mechanizing Proof, by Donald MacKenzie. [wc] Which, I did.
It’s a history of how computer scientists and programmers have confronted the problem of software reliability. Since it’s a history, it doesn’t take “sides” in particular; instead we learn about the various currents of thought, the factions and disagreements – the who, how, and why. Consequently, I found it a bit unsatisfying at the end, but that’s not the author’s fault – dissatisfaction is a familiar feeling for anyone who starts thinking about software reliability.
When it comes to computer security, software reliability is the innermost circle of hell. None of the outer circles are exactly fun, but it just gets worse and worse the more you look at it. As Tom Van Vleck mentioned in the interview, hardware was not as reliable back in the MULTICS days: it was not out of the question that a value you stored in a place in memory might occasionally change unexpectedly. Memory, after all, was implemented with little magnetic rings threaded on wires, flipped between states by pulses of current running through them. When the ancient programmers wrote software, they were writing stuff that had to run on an unreliable system. Computer security programmers are “programming Satan’s computer” – it’s not just unreliable, it’s out to get you. That value you stored may be the return address from a function call, and a hacker may be trying to stuff a new value into that memory in order to make your program do something it was not intended to do.
In other words, computer security programmers have to worry about every possible unintended side-effect – which tremendously magnifies the complexity of things that you’re worrying about as you write code. And, as we know already, writing complex code is hard. Writing secure code is harder than hard. Yet, we continue to want to write code that can function reliably in a landscape full of hostile actions. I believe it was around 1989 when I remarked to one of my co-workers, “whoever invented computer networking shot us all in the foot.” That was around the time I was realizing that what we’d now call “attack surface” (I called it “zone of risk”) was a problem. Here’s what’s crazy, to me: it’s still a problem but we don’t even know how to measure it. Which, per Lord Kelvin, means we don’t know what it is. Pause and let that sink in for a minute: we are running risks that we don’t even have a model for.
That’s not a hypothetical. There are several documented incidents in which software flaws nearly resulted in full-exchange nuclear war between the US and USSR. That was a result of normal glitches, nothing as fancy as some hacker deliberately fiddling with the communications. That would be unthinkable, right? No, it’s US government policy – and apparently Chinese, Russian, North Korean, and Israeli government policy, too. And then there are the people who do that stuff for fun.
Mechanizing Proof is about the way computing responded to this problem. It tracks through the realization that there was a problem, and then through some of the responses to it: formal design, formal verification, provably secure design, etc. MacKenzie is straddling the interface between software development and philosophy of science – how do we know things? – and, like philosophy in general, the discipline of software-as-science falls into a scorched-earth battle of epistemology and skepticism. It’s fascinating, to me, to read this stuff, but someone who’s not interested in software will probably want to steer away from this book. Computer programmers, like myself, who have a nodding familiarity with the corrosive effects of radical skepticism will find a lot of this to be eerily familiar. Computer programmers who lived through the era of “Structured Design” and “General Systems Theory” (1985-1990) will know, as we knew then, that there was a lot of time wasted.

There are a lot of fascinating historical events covered in the book, such as the discovery of the Intel Pentium’s tendency to mis-calculate certain floating-point divisions once you got down to around the 8th decimal place. The error was so small that it remained undetected for a long time, which meant that $400 million worth of CPUs were out, in computers, mis-calculating. It only mattered if you were doing floating-point division, but it said a lot about how significant a mistake could be.
The book is a little less humanizing than something like The Soul Of A New Machine [wc] – it’s an event-driven and timeline-driven story, not a character-driven one. That’s OK, but some of the events MacKenzie writes about could have used a more dramatic presentation. I think my favorite part [mild spoiler alert!] is his description of a meeting that took place at Garmisch, in October 1968. It was where the second generation of computer scientists got together and collectively said, “we have a problem.” MacKenzie somewhat rotates his entire history around that fulcrum. If he were a writer like Siddhartha Mukherjee or James Gleick he would have dramatized that meeting as the focus of the book, but he doesn’t. Probably because, in real life, it wasn’t that important. But it was important:
The breadth and depth of the problems with software development were the focus of what was to become one of the most famous of all computer science conferences, the meeting on “software engineering” held in October, 1968 at Garmisch-Partenkirchen, a resort nestling beneath the mountains of the Bavarian Alps. The idea for the conference had come most centrally from a leading German computer scientist, Professor Fritz Bauer of the Technische Hochschule München. Bauer was not the only one to coin the term “software engineering” – Douglas Ross of the MIT Servomechanisms Laboratory proposed a course on the subject in January, 1968 – but the choice of the term as the conference’s title greatly increased the salience of the idea that, in Bauer’s words:
The time has arrived to switch from home-made software to manufactured software, from tinkering to engineering – twelve years after a similar transition has taken place in the computer hardware field … it is high time to take this step. The use of computers has reached the stage that software production … has become a bottleneck for further expansion.
The context was that a number of significant software programs were being attempted (operating systems such as IBM’s OS/360) and nothing was working: the more programmers that were thrown at a problem, the longer it took. Software was becoming a money-sucking black hole of despair. Programmers working on OS/360 were literally losing their minds.
Buxton recalled the tone being set by frank admissions in an after-dinner talk on the first evening by Robert S. Barton, who had played a leading role in the development of Burroughs’ innovative B-5000 computer. Barton told participants, “that we were all guilty, essentially, of concealing the fact that big pieces of software were increasingly disaster areas.” The confessional atmosphere – “everybody else just chimed in and said ‘Yes!’” – created an unusual degree of rapport and openness.
To some of those involved, the situation was bad enough to merit being called a “software crisis.” Not all agreed. Some Garmisch participants pointed to the large number of computer installations that appeared to work perfectly satisfactorily. K. Kolence, of Boole and Babbage, commented:
I do not like the word “crisis.” It’s a very emotional word. The basic problem is that certain classes of system are placing demands on us which are beyond our capabilities and our theories and methods of design and production at this time. There are many areas where there is no such thing as a crisis – sort routines, payroll applications, for example. It is large systems that are encountering great difficulties. We should not expect the production of such systems to be easy.
MIT’s Douglas Ross countered:
It makes no difference if my legs, arms, brain, and digestive tract are in fine working condition if I am at the moment suffering from a heart attack. I am still very much in a crisis.
When I was a young pup at Trusted Information Systems (a company which crops up significantly in the book) I was coding an “internet firewall” and found myself challenged by some of the characters who crop up in the book. One project that was spun up at TIS was a formal verification attempt to analyze my firewall’s code. As team lead/lead engineer/chief cook/bottle-washer for the firewall product, I supported a group that attempted to prove – I still don’t know what – about my code-base. It was a fascinating experience because, as I realized, there were 10 people who didn’t know how to program at all – let alone do network programming on UNIX in C – and they were trying to analyze my code. At the time I was privately derisive. When I later learned that the results of the analysis had been classified, and ran to hundreds of pages, I stopped pretending any respect for that process.
The rise and fall of formal methods for software correctness is another central story arc in the book. It boiled down to a problem that I encountered, myself, in those days: why take 20 pages to ‘prove’ 3 lines of code? Especially when the code is its own documentation? I remember saying, “I don’t care if you can’t understand it – I can, and the compiler can.” What was being built was a parallel edifice: another body of complex notation assembled by humans, and therefore just as prone to errors. That dynamic became, and remained, the rock on which formal methods ran aground.
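To make the imbalance concrete, here is the flavor of what a Hoare-style proof demands for even a three-line variable swap (a generic textbook sketch, not the notation any particular verification team used):

```latex
% Hoare triple for a three-line swap: precondition, code, postcondition
\{x = a \land y = b\}\;\; t := x;\; x := y;\; y := t \;\;\{x = b \land y = a\}

% Checked by pushing the weakest precondition backward through each line:
%   wp(y := t,\; \{x = b \land y = a\}) = \{x = b \land t = a\}
%   wp(x := y,\; \{x = b \land t = a\}) = \{y = b \land t = a\}
%   wp(t := x,\; \{y = b \land t = a\}) = \{y = b \land x = a\}
% which the stated precondition implies. Three lines of code already
% cost three substitution steps and an entailment -- before loops,
% pointers, or concurrency ever enter the picture.
```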
I enjoyed the book. I can’t recommend it unreservedly because its appeal will tend to be narrower than a popularized treatment’s would be. For an “insider” it’s fascinating stuff.