Lost In The Clouds


Back when I worked in security, I regularly encountered things that just left me shaking my head, “why would anyone want to do this?” It made me feel increasingly distanced and out of touch with the industry/community, as the decision-making herd went thundering off over the horizon, ignoring the sign that said “cliff.”

vendors assisting a cloud computing deployment

Initially, cloud computing and outsourcing got their impetus as a result of federal tax guidelines and had nothing to do with technological merit. The math is bogus and always was bogus: if a company wanted to have a lot of storage, they had to incur a capital expense (buying hard drives, racks, computers, etc) and then staffing expenses (system administrators to keep it all working and IT specialists to manage the data) – capital expenses and staffing expenses are realized as direct expenses, whereas renting something is not. If a company buys a computer, they can declare the cost of the computer as an expense, annualized as it depreciates. That is a pain in the neck to track and, starting in the 90’s, boards of directors would ask dumb questions like: “we are a porn manufacturer, what does operating a storage array have to do with our business purpose, which is producing and selling porn?” A sensible CTO would reply: “well, porn is data, boss, and being good at managing data is a crucial part of our business.” That didn’t happen, much, because a lot of the time management had already decided to outsource non-essential stuff. The problem is, they didn’t/don’t understand what “essential” means.

A company is reasonable (though perhaps not wise) to not buy real estate, preferring instead to lease office space. But, as anyone who has ever purchased real estate knows, you have to navigate a lot of trade-offs: your lease payments are basically money down the drain, set against the advantages of not having to maintain a building, and the tax complexities of what happens if the building’s value increases or decreases substantially. Another crucial factor is that leasing dramatically shortens deployment time; building an office building takes years whereas signing a lease takes weeks. So, in corporate terms there is “opportunity cost” – what opportunities are lost in the time that it takes to build the new headquarters? There are also “business risks” to owning stuff: what if it winds up underwater in a flood? Or what if something is wrong with it? Renting is more flexible than buying because you can push certain risks off onto the property-owner (in return for which they get the benefit of owning the property long-term). Renting properties for businesses is a fairly mature/refined market, by which I mean that a lot of possible optimizations have already been made, which is why you should be extremely skeptical of something like WeWork.com, which claims to completely invert the balance by use of some capitalist magic. Let me put that another way: you can tell that Uber is screwing its drivers, because otherwise it wouldn’t be able to compete with conventional taxi companies, which maintain a fleet. In principle, maintaining a fleet is also an optimized process and there’s no way that a cloud of independent third party contractors can maintain a cloud fleet of taxis for less, unless there are corners being cut.

That’s one of the key points I want you to keep in the back of your mind as you read this rant: profits are squeezed out of mature systems mostly by cutting corners.

There is another way to squeeze profits, and that’s by aggregating capabilities. Imagine if Uber could somehow have one meta-driver drive 10,000 cars. Suddenly, one of the big costs in the system can be factored down dramatically, and they can undercut everyone else’s pricing by passing the cost savings on to the customer (in whole or in part).

Let’s get back to cloud computing, specifically, though all of this stuff is about cloud computing, too.

Commentariat(tm) Member Dunc made a comment [stderr] to the effect:

Surely if there’s anybody in the world that can and should maintain their own dedicated IT infrastructure, it’s the military?

Exactly; you would think that if you saw IT as something the military does that is crucial to its mission. But the prevailing trend is to interpret “mission critical” very loosely, so that virtually nothing is actually mission critical if it can possibly be done by an outsourcer. That’s why the US military pays nosebleed amounts of money for civilian organizations to do things like:

  • Operate and maintain air conditioning systems at bases
  • Operate and manage
  • Write software
  • Build and manage the Security Operations Center for the US Marine Corps’ data network
  • Move food around
  • Design and build F-35s
  • Write and maintain operating systems (Microsoft)
  • Design and maintain a CPU that can host software
  • Make guns

They pay top dollar because, in principle, there is a risk to the providers that the military becomes their only customer and then suddenly changes its mind or stops having wars. Back in the day companies like Lockheed independently took it upon themselves to develop new fighter aircraft, then would sell them or the design rights – like a fairly normal business transaction – to the government. The business risks were assumed by the company, and if you had a good product, like a Lockheed P-38 Lightning fighter, you made a hellacious amount of money. Building advanced systems, for a while, began to look like Hollywood: you had to come up with a mixture of blockbusters and flops or you were out of business. And, like Hollywood, the vendors began to game the system by trying to come up with no-risk propositions (which is why Hollywood makes so many “shitty guys in spandex punching each other movies”). The defense contractors pushed the process to the point where there is no risk for them; it’s all the taxpayers’ risk.

And that’s why the military doesn’t run its own IT infrastructure: they are allowed to fail.

It’s implicit in the idea of letting someone else do a dangerous, nasty job, “for less than I could do it myself” that they won’t do it as well as I’d do it myself – because, otherwise, I’d do it myself.

I know guys who can set up petabyte storage arrays, and the only way that the government could affordably gain access to talent like that is to rent it from Oracle or Amazon or Microsoft or to spend a lot of money hiring people who don’t carry guns and who don’t drop bombs on Médecins Sans Frontières hospitals. The military is absolutely not constructed to work like that (though it has since its inception) with a sort of class division between the warrior caste and the worker caste. Or the “war fighters” and the REMFs*; when I was in, in the 80s, there were culture clashes between civilian employees, who ran supply depots and maintained buildings, and the blockheads who carried rifles and thought they were Sun Tzu and Genghis Khan rolled together because they were 19 and had too much testosterone coursing in their veins.

Virtually everything that a business does can be peeled off as non-essential and outsourced. Except for the executives, marketing assholes and HR, who are always convinced they are completely essential. That view is why I kept, during the course of my security career, blowing up at executives who’d say complacent things like: “our business is about gathering customers’ credit ratings, not operating storage arrays, so naturally we’re going to outsource the storage array part.” I’d be shaking my head frantically to clear it, and trying to explain that you can’t just separate stuff like that; there has to be interaction. And that “interaction” is called “governance.” It’s the part where you oversee things and make sure they are not going horribly wrong. You know, like the way the F-35 program has stayed on cost and on schedule because of snappy, perfectionistic, government oversight from the Pentagon.

war clouds

I’m still ranting about cloud computing, but it probably appears like I am talking all around the topic. For example: changing the oil in your car. I haven’t changed the oil in my car since the 1970s when I could barely afford a filter and oil, let alone pay the amortized aggregate cost for a garage with a hydraulic lift and insurance and a guy with a filter wrench. I know people who have never learned the technique of changing a car’s oil, and who have saved themselves some time and never experienced warm motor oil dropping into an eyeball. Those people have made a winning bet as long as Jiffy Lube never goes away, and as long as Jiffy Lube’s technology is fairly interchangeable. This is where the “governance” kicks in: unless you’re stupid, you don’t trust Jiffy Lube 100% and you watch them change the oil and you understand what is happening and you put it in “Park” and get out and do a walk-around before you go out of the lot, to make sure nothing is dripping. “Governance” is the part of the process where you make sure your contractors are doing a good job. It’s your responsibility to make sure that contractors are doing a good job, or you’ve completely given up on a process. Back to data storage: it’s someone’s responsibility somewhere in the JEDI project to do some rational thinking about intangible things other than cost/CPU/hr and cost/terabyte/month. There is reliability, guarantees of reliability, mean time between failure, migration cost, lock-in, and – yes – even IT security.

You have to picture me climbing on some CTO’s desk yelling, “what about ‘being bad at IT security’ is part of a CTO’s portfolio?” I have, literally, encountered CTOs who say things like “we are not a software company, so we don’t do any development.” Oh, really? Hey, dipshit, every configuration file in every switch on your network and every Oracle form and every database row in the HR database you host at ADP is software and you’re responsible for it. It may not be written in BASIC or C++ but it’s written in Cisco IOS or Kubernetes configuration mojo, or whatever. I’d usually end up conciliatory, by pointing out that lock-in is a more severe threat than almost anything to do with IT security, so they should worry about that, more. Then they’d say they didn’t have that problem and I’d sneer, “OH, REALLY? Why don’t you ask your non-software non-developers whether they are using only portable features of AWS, or if they have been drinking deeply of the extra features that come with the platform but which aren’t available on any other cloud service?”

In its most basic form, cloud computing allows you to transfer some risks around, and that’s it. It saves you money if you choose well and can aggregate services with other customers and avoid lock-in, and it allows you to fire those pesky system administrators who used to manage your storage array. Instead of capital expenses for salary and desks and hard drives, you can pay more for something you don’t own which is slower and out of your direct control. Before you outsourced to the cloud you didn’t have a visible governance problem (most IT executives do not wander down to the system administrators on a regular basis and ask, “how’s the backup and recovery plan?”, which is why they get whacked by ransomware); governance is implicit, and it’s the corner that gets cut when you’re working in-house. When you move the stuff outside the walls, you no longer have to worry about the system administrators’ HR issues – you have to worry about your cloud providers’ stability, etc. What I observe is that most organizations’ response to taking on governance for cloud systems is to look at the price and say, “OK where do I sign?”
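To make the “if you choose well” part concrete, here is a minimal rent-vs-buy sketch. Everything in it is an assumption for illustration: the class names, the cost categories, and every dollar figure are hypothetical, and real accounting (taxes, depreciation schedules, discounting) is left out. The one thing it tries to show is that governance and exit costs belong on the rent side of the ledger, not just the monthly fee.

```python
import math
from dataclasses import dataclass


@dataclass
class BuyOption:
    capital_cost: float    # hypothetical up-front hardware spend
    monthly_staff: float   # system administrators
    monthly_upkeep: float  # power, space, spares
    lifetime_months: int   # hardware is replaced after this long

    def total_cost(self, months: int) -> float:
        # Hardware is repurchased at the start of every lifetime window.
        refreshes = math.ceil(months / self.lifetime_months)
        running = months * (self.monthly_staff + self.monthly_upkeep)
        return refreshes * self.capital_cost + running


@dataclass
class RentOption:
    monthly_fee: float         # what the provider bills
    monthly_governance: float  # staff time spent overseeing the provider
    exit_cost: float           # one-time migration cost caused by lock-in

    def total_cost(self, months: int) -> float:
        running = months * (self.monthly_fee + self.monthly_governance)
        return running + self.exit_cost


def break_even_month(buy: BuyOption, rent: RentOption, horizon_months: int = 120):
    """Return the first month at which buying is no more expensive than renting."""
    for month in range(1, horizon_months + 1):
        if buy.total_cost(month) <= rent.total_cost(month):
            return month
    return None  # renting stays cheaper over the whole horizon


# Hypothetical figures, chosen only to exercise the arithmetic.
buy = BuyOption(capital_cost=120_000, monthly_staff=8_000,
                monthly_upkeep=1_000, lifetime_months=60)
rent = RentOption(monthly_fee=10_000, monthly_governance=2_000,
                  exit_cost=30_000)
print(break_even_month(buy, rent))
```

Setting `monthly_governance` and `exit_cost` to zero is the “look at the price and say where do I sign” version of the calculation; with these made-up inputs, renting then looks cheaper for longer.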

The modern military loves to talk about how ‘cyber’ is a new battle-front. At the same time, they want to put all their data in Amazon’s data warehouse, or Microsoft’s, or Bob who has a storage array in his garage’s. They want to say “we don’t develop software” and not have someone say, “then you’re not a fucking ‘cyber warrior’ you’re a button-clicker I could replace with a 2 line shell script.” Imagine if WWII-era tank commanders had expected to simply hop into a fresh Sherman and charge the Germans without knowing the first thing about how the tank’s engine tended to fail and how to repair it, or any tank operations more advanced than pointing the gun, pulling the wossname until it goes BOOM, and shoving the doodads around so it goes back and forth mostly forth. In this matter, I feel like one of the kenjutsu instructors of the samurai of old, who was trying to get them to realize that you do not cut your enemy with your sword, you cut them with your entire life.

------ divider ------

* Rear-Echelon Mother Fucker

Re: that closing line. I really do believe that: the well-rounded warrior (or expert in anything) begins to operate at a strategic level, whereas the technician who only learns a minimum will never surpass being a tactician. There’s nothing wrong with being purely a tactician (it worked fine for Marbot) but it means that some avenues are closed off to you. Battlefield command can be re-factored as a problem in outsourcing to various contractors and sub-contractors, and governance (command). In that framework, it’s suspicious that most armies that have succeeded have not been outsourced – though it’s worked out OK for the outsourcers (ask the Swiss mercenaries)!

… and now you can see why I am so glad to be out of IT security. I was in the field too long and my perspective became so distorted from the current practices that all I wanted to do was scream at people while shaking them with my hands around their throats.

Have you ever argued hard drive performance with a cloud-head? It’s surreal. They’ll tell you about how fast AWS’ drive arrays are and then they don’t understand you when you point out that The Internet, which is between them and their data, has pretty high latency compared to a fucking SATA CABLE.
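The arithmetic behind that argument fits in a few lines. This is a rough sketch with assumed ballpark numbers (a local drive answering in about a tenth of a millisecond, an Internet round trip of about 50 milliseconds, and the same raw throughput on both sides); the function name and every figure are illustrative, not measurements of AWS or of any particular drive.

```python
def serial_read_time_s(num_reads: int, latency_ms: float,
                       payload_kb: float, bandwidth_mb_s: float) -> float:
    """Seconds needed for num_reads dependent reads issued one after another.

    Each read pays the full access latency before any data moves, so a
    fast array does not help if every request crosses a slow path first.
    """
    transfer_s = (payload_kb / 1024) / bandwidth_mb_s
    return num_reads * (latency_ms / 1000 + transfer_s)


# Assumed figures: 10,000 small dependent reads of 4 KB each.
local = serial_read_time_s(10_000, latency_ms=0.1, payload_kb=4, bandwidth_mb_s=500)
remote = serial_read_time_s(10_000, latency_ms=50, payload_kb=4, bandwidth_mb_s=500)
print(f"local: {local:.2f}s  remote: {remote:.2f}s  ratio: {remote / local:.0f}x")
```

With identical bandwidth on both sides, the whole difference comes from the per-request latency term, which is the point the cloud-head is missing. Batching or caching reduces `num_reads`, which is why the read/write mix matters.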

Comments

  1. says

    In computing it has always been a truism that if they’ve centralized their computing, you sell the decentralized computing. If they’re decentralized, you sell the centralization.

    The cloud pitch is just the centralization pitch turned up to 11. What I don’t see is how the pendulum backswing is gonna work. I think Amazon may have done it: you can’t get your shit back out because you have too much shit and they won’t rent you the semitrailers full of disks for the other direction (will they?).

    Some businesses are going to realize that all that Big Data shit isn’t actually accomplishing anything and decentralize again on schedule, but that’s going to be a bitter pill across most of industry.

  2. Sunday Afternoon says

    I work for one of the major HDD companies and our IT folks, to use Marcus’s metaphor, jumped off the cliff some time ago and took our computational physics group with them into AWS.

    I would like to point out one huge benefit for our cpu-based simulation codes (gpu based is another story) – the ability to scale to an impractical local cluster size. Rather than keep a narrow local cluster constantly filled with tasks with a completion time from defining a question measured in weeks, we now have the luxury of scaling to a wide cluster basically on-demand and a completion time measured in days.

  3. Andrew Molitor says

    #2 — yeah, the point as I see it is that the rent vs. buy question is pretty well worked out. In a lot of cases you can just plug in your parameters and the answer pops out, and it definitely says RENT sometimes.

    Avis has a pretty nice business built around servicing a category of problems for which RENT is the right answer.

    AWS makes a great deal of money on problems for which the answer is almost certainly BUY. Flickr, for instance, was recently acquired (again) and was moved to AWS by the new buyers, presumably because “running data centers isn’t our business” when, in fact, it is. SmugMug thinks of itself as something-something photography, but they’re basically wrong. Coping with large numbers of photographs *is* a data problem. They’re giving all their profits to Bezos who is, literally, shooting them into space.

  4. says

    Andrew Molitor@#3:
    Avis has a pretty nice business built around servicing a category of problems for which RENT is the right answer.

    That’s a great example. There are a large number of customers who need a car in a place for a while, but not long enough to justify buying or leasing one. Back when I travelled a lot, I spent a lot on cabs and rental cars and, even then, I was trying to weigh my options: a rental might cost $120 for 2-3 days but how much would I pay taxis/ride shares if I didn’t have a car, or could I walk? (I spent a lot of time in places like LA or Dallas, where walking is not an option and may actually be illegal.) These are decisions that can be made fairly rationally, but that’s only because we think we understand the factors behind the decision pretty well.

    I don’t think a lot of IT executives really ask “do we actually need near-infinite elastic storage?” and I know for a fact that a lot of design teams never ask about the breakdown between local bandwidth and remote and the read/write mix and whether things can be cached, and whether concurrency is required. Back in the early 00’s I built a website where we did a distributed data system based on 4 completely different servers (1: main, 2: secondary, 3: off-site hot backup, 4: complete archive) and since concurrency requirements were not strict, the main copied updates to the secondary and then queued a copy to the off-site backup, and the secondary copied updates to the main and then queued a copy to the off-site backup, and the off-site backup queued a copy to the archive. Some of the transactions were near-concurrent and some were batch and we accepted that it might take “hours” for an update but in practice the whole system came out near real-time because networks are faster now than I thought. The queue manager and concurrency control system operated by analyzing and propagating transactions by following the system log, so it was possible to reconstruct a failed system by playing the syslogs of the off-site backup machine at it. Anyhow, that’s not the point. The point is that it seems to be much easier to spend $10,000/month with Amazon than to own a software architecture that is not predicated on having ultra-scalable hardware all the time. I saw this dynamic back when I was at DEC: we had some customers who would cheerfully spend a gigantic amount of money on a fault-tolerant VAX instead of writing their software to be fail-tolerant and recoverable. It was the same kind of thing. They’d go “we need two FT VAXes! Ring ’em up!” and spend $500,000 to solve a problem that could be fixed with a caching controller and some dual-ported IPI disks and a pair of $7500 DECsystems.
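The four-server layout described above can be sketched in a few dozen lines. This is an illustrative toy, not the original system: the `Node` class, the `accept`/`drain` methods, and `rebuild_from_log` are hypothetical names, in-memory dictionaries stand in for servers, and it assumes (as the comment does) that concurrency requirements are loose, so there is no conflict resolution at all.

```python
from collections import deque


class Node:
    """One server: a key/value store plus an ordered transaction log."""

    def __init__(self, name, forward_to=None):
        self.name = name
        self.data = {}
        self.log = []              # every transaction applied here, in order
        self.outbox = deque()      # queued (target, transaction) copies
        self.forward_to = forward_to

    def apply(self, txn):
        key, value = txn
        self.data[key] = value
        self.log.append(txn)
        # The off-site backup forwards everything it applies to the archive.
        if self.forward_to is not None:
            self.outbox.append((self.forward_to, txn))

    def accept(self, txn, peer, offsite):
        """Handle a client write: apply, copy to the peer, queue for off-site."""
        self.apply(txn)
        peer.apply(txn)                     # near-concurrent copy
        self.outbox.append((offsite, txn))  # batch copy, delivered later

    def drain(self):
        """Deliver queued copies; in real life this could lag by hours."""
        while self.outbox:
            target, txn = self.outbox.popleft()
            target.apply(txn)


def rebuild_from_log(name, log):
    """Reconstruct a failed node by replaying another node's log at it."""
    node = Node(name)
    for txn in log:
        node.apply(txn)
    return node


archive = Node("archive")
offsite = Node("offsite", forward_to=archive)
main = Node("main")
secondary = Node("secondary")

main.accept(("invoice-1", "paid"), peer=secondary, offsite=offsite)
secondary.accept(("invoice-2", "open"), peer=main, offsite=offsite)

for node in (main, secondary, offsite):
    node.drain()

rebuilt_main = rebuild_from_log("main", offsite.log)
print(rebuilt_main.data == main.data)
```

The design choice worth noticing is that recovery falls out of the log for free: nothing here depends on any one machine being fault-tolerant, only on the off-site log being complete.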

    That circles me around to the “big data” problem and I’ll probably have to comment on that someday, but it’s the same thing: people will spend gigantic amounts of money to be able to find stuff in their data – except they’re not willing to pay the cost of actually understanding their data and figuring out what’s worth finding and what’s not. Same problem: some stuff you want on a fast machine with a database, some of it in an inverted index, and some of it in an archive. But, which? “Screw that I don’t know, let’s get a data lake!” says the CTO.

  5. says

    Sunday Afternoon@#2:
    I would like to point out one huge benefit for our cpu-based simulation codes (gpu based is another story) – the ability to scale to an impractical local cluster size. Rather than keep a narrow local cluster constantly filled with tasks with a completion time from defining a question measured in weeks, we now have the luxury of scaling to a wide cluster basically on-demand and a completion time measured in days.

    There are definitely some work-loads for which cloud computing is great. If I needed to build render-farms and the rendering was bursty, it’d be great to be able to bring a bunch more engines online for a while and just forward the bill to someone. Storage can be bursty, too, but tends to be a bit slower in terms of change rate.

    Back in 2002 (I think it was) Bill LeFebvre from CNN did a talk at USENIX about how they scaled their web server farms on 9/11. It involved taking a company credit card, getting in a convoy of cars, and buying everything at CompUSA that could boot FreeBSD then stacking it up in the hallway, hooking it up to the core switches, and dd’ing over a file system image onto the root partition and /usr. (Not even worrying if the partition table made sense; it just had to boot.) That was a fun talk; there were pictures. [usenix]

  6. Dunc says

    “Big data”: optimising the problem of finding needles in haystacks by building the biggest haystack you can. (Not always, obviously there are many legitimate big data problems, but shovelling ads is probably not one of them.)

  7. says

    Dunc@#6:
    “Big data”: optimising the problem of finding needles in haystacks by building the biggest haystack you can.

    I see you are familiar with this topic. ;)

  8. Allison says

    My company has moved pretty much all of its enterprise stuff to cloud computing, with the result that all the applications that have to access the cloud are 5 to 10 times slower than before. It routinely takes 30 seconds or more to update a page (i.e., a view to a database record), whereas before it might have taken 5 seconds on a bad day.

    Plus, everything goes over the Internet, so we’re dependent upon not only the corporate networks working, but our gateways to the Internet (and on the part of the Internet that our data travels through.) The administration of which has all been out-sourced (and they just switched to a different outsourcing company this year.)

    Finally, despite my company making all of us sit through a 30-minute video training course on security, they’ve outsourced the security of pretty much all of our documents (including sales docs, corporate strategy, etc.) to Microsoft. Whose competence in computer security is — um — “legendary.”

    I fail to see how this is an improvement, or even improves the bottom line. I can only conclude that it’s another case of top executives chasing after the latest management fad, wanting to seem “modern” and “up-to-date.”

  9. dangerousbeans says

    They want to say “we don’t develop software” and not have someone say, “then you’re not a fucking ‘cyber warrior’ you’re a button-clicker I could replace with a 2 line shell script.”

    Billion dollar script kiddies with nukes

  10. dashdsrdash says

    Or, to shorten this all down to a phrase I use whenever talking about [clouds, off-shore development…]: If you can outsource your core competency, you don’t have a viable business model.

    Then people say stupid things like “managing databases isn’t our core competency” and I glare at them for a bit and ask them how much of their business they would have left if their databases evaporated tonight and the restitution was a refund of the charges they paid for the last month?

  11. Dunc says

    But database management is hard! [pouts]

    On the other hand, I do think there’s a perfectly decent argument that many core IT functions should be more like utilities, at least for most people… Most people can’t run their business without electricity either, but they don’t generate their own, and their reliability would almost certainly get worse if they tried. It’s the data that matters, not the storage, and most people’s DR is… disastrous.

    Very often, the choice is not between outsourcing or doing the job properly yourself, it’s between outsourcing and hopelessly half-arsing the job yourself. There’s a lot to be said for being able to recognise when you’re out of your depth and you need to pay somebody else to do it right, and that’s as true of IT as it is of construction or medicine.
