The Three Little Pigs


Once upon a time, there was an old mother pig who had three little pigs and not enough food to feed them. So she hit on the idea of sending them off with a little capital (she kept 40% of the equity and a seat on the board) to go build successful websites and get rich and famous and support themselves.

The first little pig named his website something terribly witty (somethingterriblywitty-app.com) and outsourced all of the development. The pig went to a “programmers bid for work” site and spec’d out how the site should look, then asked on some web designer forums which content management system and cloud hosting service were the most popular, and pretty quickly had a “mash up” that was slick and functional enough for a public beta test. He spent the rest of the capital on a marketing ramp-up with clever online ads and a catchy jingle.

The three little pigs sang “DevOps!”

The second little pig had some experience with operations and programming, and used WordPress for the content management, Apache httpd for the web server, MySQL for the database backend, and hosted it all on a Linux instance out at AWS. The second pig knew that reliability was paramount, but so was security, so they read a bunch of “how to” guides about setting up iptables rules on their AMI instance, configured SSH logins with a really long password, and figured out how to ‘chroot’ the webserver. Getting all that done involved a lot of swearing about config files, different versions of the online documentation, and testing. Then the first little pig and the second little pig went out for pho and beer and high-fived each other. “Market Penetration!” yelled the first little pig, and “DevOps!” cheered the second little pig.

The third little pig sat down and did a detailed design, identifying which components of the system had to be reliable, which had to be attack-resistant, and which parts had to be redundant. Then, they separated their database into two instances – one of which had relatively static and unimportant data that could be cached in the front edge system, and the other of which held credit cards, password hashes, and sensitive data. Between the two databases, the third little pig put a proxy running on a very carefully configured system; the proxy only knew how to do specific operations with the back-end database, and was full of error-checking and tamper-detection routines. The system the back-end database ran on was behind a layer-7 firewall that logged traffic and generated alarms if it saw any traffic aimed at the back-end that didn’t come through the proxy.
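The proxy in the third little pig’s design is easiest to picture as a short list of named operations rather than a database connection. Below is a minimal Python sketch of that idea, assuming a hypothetical `cards` table and a single hypothetical operation, `card_last4`; the story does not say what the real proxy’s operations were, and an in-memory SQLite database stands in for the back-end server. The point is the shape: no raw SQL crosses the boundary, inputs are validated before they reach the database, and anything off the allowlist is refused and logged as an alarm.

```python
import re
import sqlite3


def make_backend() -> sqlite3.Connection:
    """Hypothetical back-end "sensitive" database, in memory for the demo."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE cards (username TEXT PRIMARY KEY, number TEXT)")
    db.execute("INSERT INTO cards VALUES ('alice', '4111111111111111')")
    return db


class Guard:
    """Narrow proxy between the front end and the sensitive database.

    Callers name an operation and pass a user name; they never send SQL.
    Each operation returns the minimum the front end needs.
    """

    USERNAME = re.compile(r"[a-z0-9_]{1,32}")

    def __init__(self, backend: sqlite3.Connection) -> None:
        self._db = backend
        self.alarms: list[str] = []  # stand-in for a real alerting channel
        # The complete list of things the front end may ask for.
        self._ops = {"card_last4": self._card_last4}

    def handle(self, op: str, user: str):
        # Anything not on the allowlist is refused and raises an alarm.
        if op not in self._ops:
            self.alarms.append(f"unknown operation {op!r}")
            raise PermissionError(op)
        # Validate input before it gets anywhere near the database.
        if not self.USERNAME.fullmatch(user):
            self.alarms.append(f"malformed user name {user!r}")
            raise ValueError(user)
        return self._ops[op](user)

    def _card_last4(self, user: str):
        # Returns only the last four digits, or None if no card is on file.
        row = self._db.execute(
            "SELECT number FROM cards WHERE username = ?", (user,)
        ).fetchone()
        return None if row is None else row[0][-4:]
```

For example, `Guard(make_backend()).handle("card_last4", "alice")` returns `"1111"`, while asking for an operation such as `"dump_table"` raises `PermissionError` and records an alarm. Adding a capability means adding one entry to `_ops` and one small method, which keeps the guard’s whole attack surface readable at a glance.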

It was a lot of work, and the other two little pigs occasionally stopped by where the third little pig was grunting away, and said the third little pig was “paranoid.” But the third little pig explained that, if they set the system up with a good enough design, it ought to be able to weather a great many attacks without requiring any intervention; the third little pig said that they wanted to set it up once and then spend the rest of their career doing valuable work instead of messing with website issues. Finally, the third little pig decided to minimize the system build on the site, and went with a dramatically stripped-down version of the operating system, running on a little set of SSDs in a RAID5, and configured the SELinux Mandatory Access Control (MAC) layer on the front-end and proxy machines so that the web server process had MAC separation governing its filesystem access, and was incapable of spawning any processes except the components that made up the content management system, and couldn’t issue network connections except to the database (via shared memory, also governed by MAC) or the proxy (ditto).

Figuring this all out took the third little pig several weeks of research and poking, but by the time it was all designed, documented, built, and regression-tested, the third pig had gained a solid understanding of the tools they were using and their capabilities, as well as their failure modes and what those looked like. Finally, the third little pig joined the other two, who were trying to raise an ‘A’ round on Sand Hill Rd and who were bragging about their large user-base and the slick interfaces they had. Both the first and second pig were planning to do something with Big Data, probably Machine Learning (AI), when they got a break from their hectic travel schedules on the lecture circuit.

Now, it happened that a Big Bad Wolf came by the three pigs’ websites and thought, “my, there’s gotta be a whole lot of personal data in there!” So he went to the first little pig’s website and knocked on the door:

Wolf: “Let me in!”
First Little Pig: “Not by the hairs on my chinny chin chin!”

And the Big Bad Wolf replied, “Have you ever heard of an SQL injection attack?”

First Little Pig: “No, what?”

And the Big Bad Wolf took a few seconds to paw through the cut-and-paste PHP mashup code that the first little pig’s lowest-bidder rental coders had used for the site, then thoroughly pwnzored the site and wolfed down the entire customer database. The first little pig was horrified, made a breach announcement, told the site’s users “sorry, we leaked your data,” and filed for Chapter 11 as he ran and hid with the second little pig.
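For readers who haven’t met it, here is the essence of SQL injection, sketched in Python with the standard-library `sqlite3` module rather than the PHP of the story; the `customers` table and its contents are invented for the example. The vulnerable version pastes user input into the SQL text, so a crafted “name” rewrites the query; the parameterized version hands the same input to the driver as data.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (name TEXT, email TEXT)")
db.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("alice", "alice@example.com"), ("bob", "bob@example.com")],
)


def lookup_unsafe(name: str):
    # Vulnerable: user input becomes part of the SQL statement itself.
    return db.execute(
        f"SELECT email FROM customers WHERE name = '{name}'"
    ).fetchall()


def lookup_safe(name: str):
    # Parameterized: the driver passes `name` as a value, never as SQL.
    return db.execute(
        "SELECT email FROM customers WHERE name = ?", (name,)
    ).fetchall()


attack = "x' OR '1'='1"
print(lookup_unsafe(attack))  # the WHERE clause is now always true: every row
print(lookup_safe(attack))    # prints []
```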

The Big Bad Wolf sold all the customer data on the darkweb and leaked a bunch of it (making it look like it was done by Russians), then sauntered down to the second little pig’s site.

Wolf: “Let me in!”
Second and First Little Pig: “Not by the hairs on our chinny chin chins!”

The Big Bad Wolf said, “OK, I’ve inventoried all the stuff you’re using. BRB.” The two pigs were mystified and laughed at the Big Bad Wolf’s departing backside, fist-bumping and high-fiving and cheering “DevOps!” and “Market Presence!” and generally having a good time.

Meanwhile, the Big Bad Wolf remembered what modules the second little pig’s site depended on and – sure enough – along came a vulnerability in one of them (let’s call it Apache STRUTS) and BOOM! the Big Bad Wolf suddenly blitzpwn’d the second little pig’s site and leaked some user credentials. The next thing they knew, the first and second little pigs’ meetings on Sand Hill Rd had all dried up and they weren’t getting invited to speak at any conferences – but they were spending a lot of time talking to lawyers. And the lawyers were decidedly unfriendly; in fact, the lawyers were also wolves.

Finally, the Big Bad Wolf came up to the third little pig’s site and port-knocked:

Wolf: “Let me in!”

The third little pig didn’t say anything; he was working on another project for another company.

Wolf: “I’ll huff and I’ll puff and I’ll pwn your site!”

The Big Bad Wolf started PEEKing and POKEing around at the content management system, and – as soon as there was a flaw in one of its modules – he tried to exploit the flaw. Nothing happened; the exploit crashed because the content management system wasn’t allowed to run shell commands. This activity triggered an alarm on the third little pig’s operations console, so the third little pig pushed a button that blocked the IP address the Big Bad Wolf was attacking from. This enraged the wolf, who tried another attack, and another, and another. The wolf kept hypothesizing things (“maybe that damn pig is using SSH for remote administration!?”) (nope: point-to-point management from the back-end system only) and each hypothesis turned out to be a dead end. Every time the wolf tried something, it turned out that there was a rule, or another layer of blocks, designed right behind the first layer. It was maddening. Eventually, the wolf got bored and fucked right off.

In The Witcher 3, Geralt gets to play the Big Bad Wolf; that’s the third pig’s house.

------ divider ------

In this story, Equifax is perhaps the second little pig. The third little pig would be a MULTICS system like DOCKMASTER.ARPA, which ran without a patch or an incident for decades, because: that was what it was designed to do. When you look at a lot of today’s systems, they are not designed to run without flaws, they are designed to be patched constantly. Not surprisingly, they need constant patches. At this point, I can only invoke Feynman:

The software is checked very carefully in a bottom-up fashion. First, each new line of code is checked, then sections of code or modules with special functions are verified. The scope is increased step by step until the new changes are incorporated into a complete system and checked. This complete output is considered the final product, newly released. But completely independently there is an independent verification group, that takes an adversary attitude to the software development group, and tests and verifies the software as if it were a customer of the delivered product. There is additional verification in using the new programs in simulators, etc. A discovery of an error during verification testing is considered very serious, and its origin studied very carefully to avoid such mistakes in the future. Such unexpected errors have been found only about six times in all the programming and program changing (for new or altered payloads) that has been done. The principle that is followed is that all the verification is not an aspect of program safety, it is merely a test of that safety, in a non-catastrophic verification. Flight safety is to be judged solely on how well the programs do in the verification tests. A failure here generates considerable concern.

That kind of good design is expensive. But: go ask Equifax how expensive it can be to use code that has no fallback/failure-resistant design – especially if you forget to patch it. Even “forgetting to patch” is not the whole problem: a sufficiently motivated enemy just has to be a little faster with their attack than the target is with their patch. When Feynman was writing about the solid rocket boosters in the Challenger disaster, one of the points he made was that:

If a bridge is built to withstand a certain load without the beams permanently deforming, cracking, or breaking, it may be designed for the materials used to actually stand up under three times the load. This “safety factor” is to allow for uncertain excesses of load, or unknown extra loads, or weaknesses in the material that might have unexpected flaws, etc. If now the expected load comes on to the new bridge and a crack appears in a beam, this is a failure of the design. There was no safety factor at all; even though the bridge did not actually collapse because the crack went only one-third of the way through the beam. The O-rings of the Solid Rocket Boosters were not designed to erode. Erosion was a clue that something was wrong. Erosion was not something from which safety can be inferred.

If you’ve caught yourself wondering why computer security is such a sucking pit of despair, it is because the kind of engineering discipline Feynman describes has been turned on its head with regard to computing: instead of expecting things to be good by design, we expect them to be bad and deployed. Then we expect to somehow be able to repeatedly bash them into something good. If we were approaching computing as a design discipline, we’d stop using ${whatever} as soon as ${whatever} was demonstrated to be poorly designed and poorly implemented. Of course, that’s unthinkable. But, why?

I’ve been to this dance many times in the course of my career. One client I discussed some system design with had a problem that begged for the third little pig’s system: they had a database that held customer billing data, customer cryptographic keys, and customer inventory and contact data. As we discussed their system, it turned out that nobody ever had a legitimate reason to get at the cryptographic keys or billing data, but certain necessary operations did involve that information: has the customer paid up, and is the cryptographic key correct? My advice to them was to have multiple databases, with the really sensitive stuff more or less inaccessible behind what (in Trusted Computing land) is called a “trusted guard.” A trusted guard is a piece of carefully designed and coded software that can do only the minimal necessary operations; in the third little pig’s system, I called it a “proxy.” In other words, you might have a proxy that you connect to and give a customer number, and it replies with something like “paid up until ${date}” or “in arrears $####.##”. The people who maintain the billing information do their work on the other side of that proxy, on a private network with its own protections. The same technique can be used for “is the cryptographic key correct?” You could have a proxy to which you send a device-ID and a guess at a key, and it answers only yes or no. Obviously, that proxy would have all sorts of rate-limiters that would trip alarms if it started getting queries for devices that didn’t exist, or queries at an unusual rate.
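Here is a minimal Python sketch of the key-checking guard described above, under stated assumptions: the in-memory `keys` dictionary stands in for the protected database on the far side of the guard, the caller identifies itself with a `source` string (an IP address, say), and the window and threshold values are placeholders rather than recommendations. The class and method names are hypothetical. It answers only yes or no, never returns a key, raises an alarm for queries about devices that don’t exist, and rate-limits each source with a sliding window. A billing guard that answers “paid up until ${date}” would have the same shape with a different single question.

```python
import hmac
import time
from collections import defaultdict, deque


class KeyCheckGuard:
    """Answers one question: is this the right key for this device?

    `keys` (device-ID -> key bytes) stands in for the protected database.
    """

    def __init__(self, keys: dict[str, bytes], max_per_window: int = 5,
                 window_s: float = 60.0, clock=time.monotonic) -> None:
        self._keys = keys
        self._max = max_per_window
        self._window = window_s
        self._clock = clock  # injectable so the window can be tested
        self._recent = defaultdict(deque)  # source -> recent query times
        self.alarms: list[str] = []

    def check(self, source: str, device_id: str, guess: bytes) -> bool:
        now = self._clock()
        recent = self._recent[source]
        # Forget queries that have aged out of the sliding window.
        while recent and now - recent[0] > self._window:
            recent.popleft()
        # Refused queries still count, so a source that keeps hammering stays blocked.
        recent.append(now)
        if len(recent) > self._max:
            self.alarms.append(f"rate limit exceeded by {source}")
            return False
        key = self._keys.get(device_id)
        if key is None:
            # Nobody legitimate asks about devices that don't exist.
            self.alarms.append(f"{source} asked about unknown device {device_id}")
            return False
        # Constant-time comparison; the stored key never leaves the guard.
        return hmac.compare_digest(key, guess)
```

Passing a fake clock (for example `clock=lambda: t[0]` over a mutable list) makes the rate limiter easy to exercise without waiting for real time to pass.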

That’s just an example, but whenever I’ve suggested that sort of thing to a client, they’ve said, “that all sounds really good but…” and then they just stuff everything together into a big SQL database with no separations between any of the tables. If you do a “select * from customers” you get their cryptographic keys, their billing information, their password hash, their contact information, etc. Does that sound familiar? It should, because whenever you hear about a data breach, that’s almost always exactly what happened. The system was:
a) important
b) written lazily
c) buggy

I have rarely seen this done right, but I have seen it, in some strange places. One retailer I did a design review for had brilliantly separated their customer databases into a “web facing customer database” and a “sensitive customer database.” In their design, they didn’t have a proxy that made the linkage; they didn’t need one: they sat down and looked at it and realized that they almost never updated the sensitive customer data, and they had a fairly secure access network for the systems that performed those updates. So they implemented the sensitive customer database so that if there was a fragment of data that needed to go out to the web facing customer database, it got queued up and cross-posted out through a firewall.

If you think about it, you’ll see that once such a system is set up, it doesn’t cost anything more to operate; the web facing database is disposable – it’s an automatically generated subset of the actual customer database. That also improved the transaction/load handling of the entire system – most of the queries were being done against the web-facing copy, read-only, so the real database (which got occasional updates) was less loaded most of the time. So, part of what’s going on here is that if you understand your data, and your data update model/query load, you can distribute the data to where the queries happen and make the whole thing a lot faster, for free, too. The reason I call this a “strange place” is that the retailer was small and its databases were smallish, but it had unusually thoughtful systems designers.
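The retailer’s arrangement can be sketched in a few lines of Python, assuming a hypothetical schema in which only `customer_id`, `name`, and `city` are allowed to leave the sensitive side. Two in-memory SQLite databases stand in for the two systems, and a `queue.Queue` stands in for whatever actually carried fragments through the firewall; the function names are invented for illustration. Updates happen only on the sensitive side, and only an allowlisted projection of each record is queued, so a `SELECT *` against the web-facing copy cannot return a card number that was never sent there.

```python
import sqlite3
from queue import Queue

# Hypothetical allowlist: the only fields that may ever leave the sensitive side.
PUBLISHABLE = ("customer_id", "name", "city")

sensitive = sqlite3.connect(":memory:")
sensitive.execute(
    "CREATE TABLE customers "
    "(customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT, card TEXT)"
)
web = sqlite3.connect(":memory:")
web.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)

outbound: Queue = Queue()  # stands in for the queue that crosses the firewall


def update_customer(customer_id: int, name: str, city: str, card: str) -> None:
    """Runs only on the secure update network."""
    sensitive.execute(
        "INSERT OR REPLACE INTO customers VALUES (?, ?, ?, ?)",
        (customer_id, name, city, card),
    )
    sensitive.commit()
    record = {"customer_id": customer_id, "name": name, "city": city, "card": card}
    # Queue only the publishable fragment; the card number never enters the queue.
    outbound.put({field: record[field] for field in PUBLISHABLE})


def drain_to_web() -> None:
    """Applies queued fragments to the disposable web-facing copy."""
    while not outbound.empty():
        row = outbound.get()
        web.execute(
            "INSERT OR REPLACE INTO customers VALUES (?, ?, ?)",
            tuple(row[field] for field in PUBLISHABLE),
        )
    web.commit()
```

After `update_customer(1, "Ann", "Portland", "4111111111111111")` and `drain_to_web()`, a `SELECT *` on the web side yields only `(1, 'Ann', 'Portland')`. Because the web-facing copy holds nothing that isn’t derived from the sensitive side, it is disposable in exactly the sense described above.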

Probably the canonical example of doing this wrong is the RSA breach of 2011. [ars] The way that happened is a direct consequence of embarrassingly bad design decisions: they had a customer database that contained an inventory of all the keyfob-IDs that each customer had been issued, along with each keyfob’s encryption key. The customer database also had things like point-of-contact, keyfob battery expiration, etc. Some brain rocketeer on RSA’s business side realized that if the sales force could get things like key expiration and contact data, they could make well-timed sales calls and sell more stuff. Rather than have a split database where the sales force got a subset of the customer database to play with, someone just gave the sales force the ability to query the whole database – then, the whole disaster was one piece of malware away.

-- divider --

The screenshot of Geralt blowing down the little piggies’ house is not mine. I actually re-ran that bit of the game over and over until I was able to correctly time a screen-grab in Ansel, and I have a pretty amazing 1.2 GB screenshot of bricks and roof flying as the building explodes. Unfortunately, that’s on my server at home, I’m in Oregon, and I do data separation on my network: there is some data I cannot access remotely, for exactly the kinds of reasons that this posting is about.

Comments

  1. jrkrideau says

    So most systems are designed in the spirit of the Titanic when one should be using the Great Eastern approach?

    In a way, this sort of thing reminds me of crooks I have known who are totally surprised when they are victims of similar crimes.

  2. says

    jrkrideau@#1:
    The Titanic was designed; the design failed because the damage to the ship exceeded the designer’s model (the exact opposite problem from Feynman’s example of “engineering overhead”). At least the designers of the Titanic tried to think of what could go wrong and prepare for it.

    To stick with the ship analogy, most sites are built, floated, and leave harbor. Then, as problems (of whatever sort) are encountered, they try to repair it while they are under way.

  3. Dunc says

    So most systems are designed in the spirit of the Titanic when one should be using the Great Eastern approach?

    Most systems aren’t really designed as such… They’re just sort of thrown together, then shaken up until they kinda look like they more-or-less work.

  4. Some Old Programmer says

    Dunc @4

    Most systems aren’t really designed as such… They’re just sort of thrown together, then shaken up until they kinda look like they more-or-less work.

    Then they bring in the consultants to fix the problems. I made a good living from this–I privately described my position as “janitor”.

    I vividly recall one time when I’d worked on the bugs in a protocol stack for a client for a month or so, then they shipped me off to Europe with a fancy Sun SPARC development system to run the code through their customer’s validation suite. It was there, in front of the customer and seeing the protocol interactions, that the full horror dawned on me. The system was doing a lot of processing on the interrupt, but this protocol was doing a fair amount in the scheduled process as well. They had no interrupt lockouts in the process code–none. A new packet could come in on the input interrupt, write all over the various packet queues and return, introducing weird and wondrous behavior in the scheduled process. This left me outwardly smiling and chatting with the customer, and inwardly cursing a blue streak. That night in my hotel room I shook a large bag of Interrupt Disable/Interrupt Enable pairs over the code.

    In the meantime, the original authors had gone off to write more new code. It’s more interesting, donchaknow.
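    A sketch of the fix in Python, with the obvious caveat that Python has no interrupts: here producer threads stand in for the input interrupt handler, and a `threading.Lock` plays the role of the Interrupt Disable/Interrupt Enable pair. The `PacketQueue` class is hypothetical. A real interrupt handler cannot wait on a lock the way a thread can, which is why kernel code masks interrupts around the critical section instead; what carries over is the rule that every multi-step update to a shared queue must be atomic with respect to the other side.

```python
import threading
from collections import deque


class PacketQueue:
    """Queue shared by producers (standing in for the input interrupt)
    and a consumer (standing in for the scheduled process)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._items: deque = deque()
        self.count = 0  # equals len(self._items) whenever the lock is free

    def push(self, pkt) -> None:
        # The append and the counter update happen as one unit.
        with self._lock:
            self._items.append(pkt)
            self.count += 1

    def pop_all(self) -> list:
        # Take everything queued so far; no producer can slip a packet in
        # between the copy and the clear.
        with self._lock:
            items = list(self._items)
            self._items.clear()
            self.count = 0
        return items
```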

  5. Dunc says

    SOP @5: All together now: “But it works fine on my development machine!”

    Me, I live in a nice, comfortable world of managed code, so I generally don’t have to worry about such things too much. However, the last couple of versions of .Net and C# have been heavily pushing asynchronous processing, so I suspect that we’re going to start seeing a lot of bugs from people who are basically used to scripting (by which I mean “Googling up bits of other people’s scripts and mashing them all together”) suddenly hitting the world of multi-threading without even really being aware of it. Fortunately, most of it isn’t actually doing anything asynchronously…

  6. says

    Dunc@#4:
    Most systems aren’t really designed as such… They’re just sort of thrown together, then shaken up until they kinda look like they more-or-less work.

    Yes. It’s particularly bad with web apps: the way the programming model works, you are pretty much stuck with “code some stuff and hit ‘reload’ in your browser until it works.” There are debuggers (sort of) but mostly they’re useful for simple runtime state exploration and that’s about it. Everything looks sort of like a badly done remote procedure call, because nothing really runs in the same address space (or language, or machine) so you’re using programs to talk to programs over restricted and sometimes baroque programming interfaces. For example, much of the web uses Javascript as object code and SQL as a remote procedure call. So, you’ve got to write programs in parallel across multiple machines – you can’t really stop and inspect the full run-state of your application because it’s scattered all over the place.

  7. says

    Some Old Programmer@#5:
    The system was doing a lot of processing on the interrupt, but this protocol was doing a fair amount in the scheduled process as well. They had no interrupt lockouts in the process code–none. A new packet could come in on the input interrupt, write all over the various packet queues and return, introducing weird and wondrous behavior in the scheduled process.

    Aw, man.
    I should start a “tell your computing horror story” thread. That’s pretty nightmarish.
    So, the whole system behaved predictably under low load but as soon as it got busy the probability went up that it would get sort of random. Well, not exactly random, because it would break in the same couple thousand ways.

  8. says

    Dunc@#6:
    However, the last couple of versions of .Net and C# have been heavily pushing asynchronous processing, so I suspect that we’re going to start seeing a lot of bugs from people who are basically used to scripting (by which I mean “Googling up bits of other people’s scripts and mashing them all together”) suddenly hitting the world of multi-threading without even really being aware of it.

    I hate that so much. It’s pretty easy if you diagram all the states your system has to transition into through the main loop, then just do all your routines in finite states with nice clean transitions. The results work fine in a single process with internal threading, or in an asynchronous loop (as long as you know which states need to serialize). Finite state machines are so easy to extend and easy(ish) to debug. But, yeah, the scripters won’t ever even think of using that technique.
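    A minimal Python sketch of the table-driven finite state machine described above. The connection states and event names are hypothetical; the point is that every legal transition lives in one table, so extending the machine means adding rows, and an event the diagram doesn’t allow fails loudly instead of quietly corrupting state.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    CONNECTING = auto()
    CONNECTED = auto()
    CLOSED = auto()


# Every legal transition in one place: (current state, event) -> next state.
TRANSITIONS = {
    (State.IDLE, "connect"): State.CONNECTING,
    (State.CONNECTING, "ack"): State.CONNECTED,
    (State.CONNECTING, "timeout"): State.IDLE,
    (State.CONNECTED, "close"): State.CLOSED,
}


class Connection:
    def __init__(self) -> None:
        self.state = State.IDLE

    def on(self, event: str) -> State:
        key = (self.state, event)
        if key not in TRANSITIONS:
            # An event the diagram doesn't allow is a bug, not something to guess about.
            raise RuntimeError(f"illegal event {event!r} in state {self.state.name}")
        self.state = TRANSITIONS[key]
        return self.state
```

    If the same machine is driven from several threads, the lock (or the serializing event loop) only needs to wrap `on()`, because that is the only place the state changes.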

    I hear Node.js is super fast, though. (eyeroll)

  9. Dunc says

    I hear Node.js is super fast, though. (eyeroll)

    The idea of running Javascript on the server gives me cold sweats. It’s a fucking horrible language, barely fit for what it was originally intended for, and I absolutely hate the way it’s metastasising all over the place like a particularly invasive cancer.

    I still remember the massive sense of relief when ASP.NET gave me a proper, compiled, statically-typed server-side programming model, and we finally got to throw away all the schonky scripting shit and stop fucking around with building stuff client-side. It’s like the last few years of web development have been a determined effort to rediscover all the horrors of the early years, only with more baroque frameworks. Don’t get me started on the whole business of watching the JSON guys slowly discovering that you actually need all that stuff we used to have in the XML stack, that they were so eager to throw away because it was all too hard… Maybe one day they’ll even come up with a standard for representing date / time data.

  10. bmiller says

    Here is an interesting personal security case study from today. I locked my keys out of my car. Ah… I wonder if the dealer (local) has the code for the keypad alternative locking system? YES! I gave him my VIN from the registration, he gave me the code, and I unlocked my car.

    Great stuff. Until a coworker pointed out the VIN is inscribed in the front window of the car. Someone could write it down, call the dealer, answer yes to the question “is this you” and get access.

    Yikes.