Computer systems have grown so complex and interconnected that it’s very difficult to identify a failure; back when I started, things were simpler and not as virtualized as they are now.
Virtualizing everything increases complexity, and complexity, per Charles Perrow, becomes non-linear: eventually we have failures that we cannot understand or control, because they are subtle interactions of dependent parts. Consider something like a modern internet connection: it appears to be a wired connection, because that’s what the most affordable routers on the market prefer to work with. But, actually, the wires are just an interface/signalling mechanism into a “cloud” of virtual wires. I used to have a T-1 line into my house (the only broadband option at the time in Verizon Country) and it ‘came’ from Verizon, but I knew it was actually Covad provisioning the service. The wires were copper carrying data back to a box, somewhere, which pretended to have copper coming out the other side but which actually packetized/repacketized the data onto fiber. Of course the fiber was also virtual: layers of software telling higher-level abstractions “there is a fiberoptic cable here” when actually there were more layers of virtualization creating virtual fiberoptic cables, etc.
Here’s what’s interesting: when layered virtual systems go down, because they are simulating a single thing, the way you diagnose a problem is by burrowing down into layers of virtualization until you find something that affects everything within the same scope. So, if my home line is down, and everyone in my area who’s using the same virtual layer is down, then the problem is at or below that virtual layer.
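That burrow-down diagnosis can be sketched as a trivial set intersection. This is my own illustration, not any real ISP tooling, and the layer names are invented: each customer depends on an ordered stack of virtualization layers, and if every affected customer shares some layers, the fault must be at or below the deepest shared one.

```python
# Hypothetical sketch of "burrow down the layers" fault localization.
# Each customer's connection depends on a stack of layers, outermost
# (their own gear) first, innermost (the shared backbone) last.

def suspect_layers(stacks):
    """stacks: one layer-stack per customer currently reporting an
    outage. Returns the layers common to all of them -- the fault is
    at or below the outermost layer in this list."""
    common = set(stacks[0])
    for stack in stacks[1:]:
        common &= set(stack)
    # preserve the ordering of the first stack, outermost first
    return [layer for layer in stacks[0] if layer in common]

me       = ["my-router", "covad-dslam", "verizon-fiber-ring", "backbone"]
neighbor = ["their-router", "covad-dslam", "verizon-fiber-ring", "backbone"]

# Both of us are down, so our routers are exonerated and the DSLAM
# (or something beneath it) is the suspect:
print(suspect_layers([me, neighbor]))
```

If only I am down, the intersection is my whole stack and nothing is exonerated, which is exactly why the first diagnostic question is always "is it just you?"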
Software works the same way: in the early days, you ran your program on a computer. Then, starting in the 1970s, there were virtual machines, where each “process” on a multitasking system thought it was running on the bare metal, but was actually running in a software simulation of its own address space and its own bare metal. Then there were virtual machines (starting in the 90s) where you had a computer running a bunch of processes, each of which thought it was a bare-metal computer and ran its own operating system, which, in turn, ran its own processes. Now, with cloud networks and software-defined networks, you can have a system running a bunch of processes that think they are a cluster of separate computers connected over a network. It’s a very powerful capability, since you can have all the computers and networks you like – so long as nobody pulls the power cable of the one computer on which all the simulated computers and networks are running.
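The power-cable point can be made concrete with a toy model (my own illustration, not any real hypervisor’s API): every virtual machine, every nested VM inside it, and every virtual network peer is ultimately a descendant of one physical box, and they all share its fate.

```python
# Toy model of nested virtualization: a tree of "hosts" where
# everything virtual hangs off one physical root.

class Host:
    def __init__(self, name, parent=None):
        self.name = name
        self.children = []
        if parent:
            parent.children.append(self)

    def all_descendants(self):
        """Yield every virtual host that depends on this one."""
        for child in self.children:
            yield child
            yield from child.all_descendants()

metal = Host("bare-metal")
vm = Host("vm-1", metal)
nested = Host("nested-cluster-node", vm)
peer = Host("virtual-network-peer", vm)

# "Pull the power cable" on the one real machine: every simulated
# computer and network in the tree goes down with it.
lost = [h.name for h in metal.all_descendants()]
print(lost)
```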
I’ve seen some pretty crazy things happen, in the virtual world. One time, a systems administrator accidentally re-initialized the wrong partition because he thought he was running in a virtual machine and was initializing the virtual machine’s storage – in fact, he re-initialized the storage in the host operating system in which all the virtual machines were running. Normally, that would not be a big deal, except that the cluster of virtual machines included some important production servers as well as test/development systems for software engineers. That’s why Charles Perrow and graybeard computer programmers say “do not mix development and production systems.” That piece of advice is ignored constantly by each successive wave of system engineers. They learn. If they are lucky, they learn from watching horrible things happen to their peers.
Unfortunately, the technological fixes have frequently only enabled those who run the commercial airlines, the general aviation community, and the military to run greater risks in search of increased performance. – Charles Perrow
In terms of complexity in computing, this equates to “virtualization and software-defined networks make it easier to build things you cannot comprehend.”
When you read about cloud personal information breaches, the breaches are almost always, at the core, a result of mis-managed complexity: someone built something that they could not understand. To paraphrase Brian Kernighan:
Debugging code is harder than writing it. So, if you are writing code that is close to the edge of your ability, you are writing code that you cannot fix.
Monday the FBI came to install a new wire-tap on my house, and in the process of doing that, they cycled the power. When everything came back up, my internet didn’t work properly any more – between the lot of them, my home router, can-tenna, desktop system, and Verizon’s LTE cloud somehow managed to decree that IPv6 traffic works, but IPv4 traffic does not. So I called Verizon’s tech support and explained the whole thing to them, and all they could do was offer to replace my router. I’m 99.99% sure the problem is something in the virtualization layers that make LTE work (perhaps my IPv6 address is OK but my IPv4 address has been dual-allocated) but the system is so complex that first-tier support doesn’t understand much beyond “cycle the power” and, if that doesn’t work, “replace the thingies.”
Back in 1987 I joked that the future of computing would be that we would have a special mode in our cars where the steering wheel could adopt the channel-change function (very convenient!) and everyone would weave all over the road between lanes as they channel-hopped. I realize now that I was an optimist back then.
When you hear transhumanists talking about “uploading” to a computer, point out to them that humans suck at system administration and that, as complexity continues to increase, there will be bigger, more complex outages. Imagine your soul is uploaded to some future AWS-like cloud service and some junior systems administrator re-initializes you and a whole planet-ful of avatars, because they were too lazy to enforce a separation between production and development systems. Look at the personal information leaks that happen constantly: they will leak your soul, and hackers will be able to spin up instances of you, in dark cloud servers, to torture endlessly for their amusement.
The start-up we were trying to kick off, which did not get funded, was a simplifying management layer for “internet of things” devices. As a side-effect of its operation it solved a large number of provisioning and security problems (by changing the direction of communication from cloud->home to home->cloud) but the hardest part of doing it would have been getting device manufacturers to specify administrative operations. We had meetings with developers and their attitude was “that would constrain us.” They actually said (translating from developer into management-speak) “taking the time to understand what we are doing would constrain us.” Halfway through our fund-raising process, I was starting to get very depressed indeed.
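The direction-flip is the whole trick, and it can be sketched in a few lines. This is a minimal sketch of the idea, not our actual product; the queue standing in for the cloud service and all the names are invented. Instead of the cloud connecting inbound to the device (which needs an open port and forwarding rules on the home router), the device initiates every connection outbound and asks whether there is work waiting.

```python
# Sketch of home->cloud communication: the cloud never connects in;
# it only queues work, and the device polls for it over a connection
# the device itself opened.

import queue

command_queue = queue.Queue()   # stands in for the cloud service

def cloud_enqueue(command):
    """Cloud side: queue a command; no inbound connection to the home."""
    command_queue.put(command)

def device_poll():
    """Device side: initiate the (outbound) contact and fetch work."""
    try:
        return command_queue.get_nowait()
    except queue.Empty:
        return None

cloud_enqueue("update-firmware")
print(device_poll())
```

Because the device only ever dials out, it presents no listening port to attack and no provisioning headache at the home router, which is where the security and provisioning wins came from.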
As part of my column at Techtarget/SearchSecurity, I hunt down and interview interesting people in computing. The last one I did was Tom Van Vleck, who was a systems programmer on MIT’s MULTICS system. Tom has great experience and deep thoughts on system administration, reliability, and complexity (security – my field – is just a side-effect of getting those wrong); it was a delight to interview him. One of the things he said that I loved was that there was a specification for all the stuff that an operator might normally do to the system, and for each of those things, there was a program that did it, and a manual page for that program. Developers did not simply decide “I am going to back up the database now” and throw a bunch of commands at an interpreter – they would run the “back up the database” tool. That tool was responsible for knowing all the constraints under which it was expected to operate, and would only allow the operator to do things that they were supposed to be able to do. In other words: the system was fully specified. Nowadays when you talk to a programmer the discussion goes something like this:
Programmer: “I need administrative access to all the systems.”
Security Person: “Really? Why? To do what?”
Programmer: “All the things.”
Security Person: “What things?”
Programmer: “Whatever things I think up when I think them up.”
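The fully-specified model, as I understood Van Vleck’s description, can be sketched as a registry of named operations with the constraint checks inside the tool, not in the operator’s head. Everything here (the roles, the tool names, the checks) is my own invention for illustration:

```python
# Sketch of "fully specified" administration: every admin action is a
# named tool, the tool enforces its own constraints, and operators can
# run only the tools they are entitled to -- no ad-hoc shell commands.

ALLOWED = {
    "operator": {"backup-database", "rotate-logs"},
    "developer": {"rotate-logs"},
}

def backup_database():
    # The tool itself knows the constraints it operates under
    # (in a real system: which hosts, which hours, which volumes).
    return "database backed up"

TOOLS = {"backup-database": backup_database}

def run_tool(role, tool_name):
    if tool_name not in TOOLS:
        raise ValueError(f"no such specified operation: {tool_name}")
    if tool_name not in ALLOWED.get(role, set()):
        raise PermissionError(f"{role} may not run {tool_name}")
    return TOOLS[tool_name]()

print(run_tool("operator", "backup-database"))
```

The contrast with “whatever things I think up when I think them up” is the point: if an operation isn’t in the registry, it isn’t an operation the system supports, and that’s a feature.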
I have email to handle, and some documents to send to a client. So I thought, “well, I will just fire up my laptop, sync my email over from my desktop, queue up the messages, hook my laptop to the personal hotspot on my iPhone, and send them.” Except Windows’ file-browsing and sharing capability does not appear to work over IPv6, and since IPv4 is not functioning for some reason, I guess I’m going to have to move things on a USB drive. The miracle of virtual layering and cloud computing has reduced me to “sneakernet” in 2018. And I’m not unhappy, because at least I know where my data is going.
New FBI wiretap: The guy said he was from the power company and was installing a new meter. How do I know?