More Competence Porn


Since 2004 or so, I’ve been writing a column over at SearchSecurity [ss], which started out as a point/counterpoint with Bruce Schneier and ended with me interviewing interesting people from all over the field. I’m finally stopping the column this fall, due to “internet security fatigue” triggered by decades of saying the same thing.

It’s a real privilege to interview some of the people I’ve managed to get hold of, and the last interview posted was a special thrill: I interviewed Tom Van Vleck. [multicians] Tom’s been a personal hero of mine since the late 80s, when I encountered some of his commentary on software development. It was around that time that I was asking “why is so much code so bad?” but I hadn’t seen nothin’ yet. You know how every device you carry seems to be constantly updating itself? That’s because the software engineers writing the crap that runs on it are incapable of writing anything solid enough to hold itself together and stagger along for a year or even two. When I started doing development in the 80s, it was expensive to release software, so we tended to do things on an annual cycle. Now, it appears to be weekly – which has severe consequences for system dependability.

The interview is here: [thvv] and it’s (unfortunately) just a slice of a fascinating 25-minute discussion.

Tom Van Vleck: Let me start by saying that my understanding has evolved. One of the things that I worked on was the Multics system administration facilities — and they were quite elaborate. They had many commands and tech manuals describing how to use all those commands, and we built a special-purpose subsystem with different commands for the system administrators and another set of different commands for the system operators. Remember when systems had operators who were trained to operate the computer, instead of having everyone operate their own computers with no training at all?

It was a big mistake, in retrospect. It used to be that after the Multics operating system crashed — in large part due to the hardware, which was much less reliable than it is now — the operator would have to go through a very complex set of recovery steps to get the system back up and all the files happy again. Over time, we realized that every place where the operator had to make a choice and type the right thing was a chance for them to type the wrong thing. Over time, we evolved to a thing where — when the system crashed — you said start it up again, and if it turned out that you had to run some recovery step, the system would decide whether or not to do it, and we designed the recovery steps so they could run twice in a row with no negative effect. We aimed toward a completely lights-out, ‘no chance for mistakes’ interface.

Tom’s comment is on my mind because I was recently involved in an unnecessarily exciting incident response that was a result of an open-ended administrative interface. This was a fairly important system (the resulting data breach included tens of millions of people…) and the system administrator was comfortable setting it up so that a lot of the operational stuff was run from his personal account. It was easier than reaching for an administrative credential. When the time came to set up servers in the AWS cloud, he did it the same way. And he appears to have used the same password for one of his personal accounts on some other website as for the AWS and corporate instances. Compared to what Van Vleck describes above, that was an unrestricted ‘any mistake is possible’ interface.

Ranum: There’s some relationship between administration, reliability and security. It’s always seemed to me that the reason we have security as a field at all is that, since the ’80s, we’ve screwed up system administration so badly that our systems are neither reliable nor secure.

Van Vleck: It’s sort of true. The cause and effect that you paint is arguable. We’re building code that only works sometimes; this is the big mistake. Whether those failures produce insecurity or unreliability is a second-order issue. We’re not building code to a high enough standard of doing what it’s supposed to do.

Some of it is that we’re vague about what we want it to do; some of it is that we ship and install code that we never should have shipped and installed.

When you talk to these ‘old school’ programmers, they see software as the output of a process that begins with a purpose (strategy), then migrates to a plan (strategy) that is considered in terms of its inputs, outputs, and expected behaviors. Then, and only then, is the plan broken up into functional components (strategic design), which get implemented as successively smaller planned and designed modules. If it’s an important design, the modules will generally include administrative interfaces, maintenance interfaces, diagnostic interfaces, test harnesses and test inputs.

Back in the day, when I used to code something, I identified the subsystems and wrote test harnesses that automatically did a regression test as part of my build process. The inputs and outputs shouldn’t change unless I expected them to, so the test harness was responsible for exercising the system with some reasonable operations: insert 50,000 items into the database, counting duplicates; then count the total number of items in the database (should be 50,000 minus duplicates); then delete 12,001 items and read the rest back in order (the order should match the results of ‘sort’ and the item count should be 37,999 minus duplicates). You get the idea: the system should check itself against its own specifications.
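Something like this, in miniature – a hypothetical Python stand-in for that old harness, with a plain set playing the part of the database:

    import random

    # Toy self-checking harness in the spirit described above; the "database"
    # is just a set standing in for a unique index, not the original b+tree.
    def regression_test(seed=1):
        rng = random.Random(seed)
        db = set()
        duplicates = 0

        # Insert 50,000 items, counting how many were rejected as duplicates.
        for _ in range(50_000):
            key = rng.randrange(1_000_000)
            if key in db:
                duplicates += 1
            else:
                db.add(key)
        assert len(db) == 50_000 - duplicates

        # Delete 12,001 of the surviving keys, then read the rest back in order.
        for key in rng.sample(sorted(db), 12_001):
            db.remove(key)
        remaining = sorted(db)                      # "read the rest back in order"
        assert remaining == sorted(set(remaining))  # ordered, no duplicates
        assert len(remaining) == 50_000 - duplicates - 12_001   # 37,999 minus duplicates

    if __name__ == "__main__":
        regression_test()
        print("regression test passed")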

If you don’t have a specification, you don’t have a system, you just have a pile of code.

Go read the rest of the interview, and the rest of the stories on Tom’s page. I used to assign new-hire programmers to read his stuff. This one is a classic [three]. Three questions to ask about every bug you find:

  1. Is this mistake somewhere else also?
  2. What next bug is hidden behind this one?
  3. What should I do to prevent bugs like this?

One hoary old programmer once said, “the premise of ‘software engineering’ today is ‘how to write software, for those who cannot’” – and that’s pretty much right. There was a brief period around 1987 when structured software development (an over-reaction to poor quality) was all the rage, but since August 9, 1995, “more code faster” has been the order of the day. “Better code” would only matter if it were somehow a market proposition. You can remember that the next time Yelp’s app downloads an update (it seems to dump a gigantic bolus of data onto my iPhone about once a week) – this is not just ‘shovelware,’ it’s ‘big shovel’ware.

------ divider ------

August 9, 1995 was the Netscape IPO. I was in a data-centre in Dallas, working on a project, and we watched the ticker go crazy as Netscape’s stock rocketed up and up. Kent L, who was standing next to me, said, “this will change everything.” And it did. Software companies were suddenly no longer valued on the problems they solved, but on the size of their customer base. It ushered in the era of beta-test software masquerading as production. We are now in the final stages of that, in which developers manage production systems – they call it ‘DevOps’ and some of us call it ‘DevOoops.’

Comments

  1. Dunc says

    Fortunately, most software these days doesn’t actually do anything that really matters to anybody…

  2. Marcus says

    Dunc@#1:
    Fortunately, most software these days doesn’t actually do anything that really matters to anybody…

    It’s cheering to think that the militaries of the future will run on software.

  3. Some Old Programmer says

    Marcus@#2:
    It’s cheering to think that the militaries of the future will run on software.

    Uh, cheering or terrifying? I heard scuttlebutt many years ago that US Navy destroyers had a system running Windows – complete with the occasional Blue Screen of Death. I have no way of verifying the information, but given the way military acquisition works, it wouldn’t surprise me if there are a bunch of antiquated systems running on any complex weapons platform designed in the last 20 years.

  4. colinday says

    If you start with 50,000 distinct records and remove 12,001 of them, wouldn’t you end up with 37,999 plus duplicates? If you had exactly one duplicate removed, wouldn’t you have 38,000 records left?

  5. komarov says

    Cheering indeed.

    Error in line 192319403485: Function call Doomsday(Target, Yield) has raised an exception: Yield must be an integer in string format not containing the following: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

    But if my Python skills had anything to do with it you’d never be able to connect to the nuclear silo in the first place because there’d still be an indelible instance of it floating around in memory from the last botched script test.

    Re: Some Old Programmer (#3):

    I remember being told about even older systems running on some ships. Resetting a crashed system apparently involved squeezing into the ship’s bowels and jiggling something to physically restart it. Which isn’t the worst thing, if it is stable enough. Nor is it, in all likelihood, all that uncommon. You can’t simply plug new hardware into a complicated, integrated system and expect it to work. Especially not while we’re stuck in an infinite loop of security / stability patches. So the hardware stays and the software sticks around, too.
    A modern military vessel actually running something like Windows 10 might decide, in medias res, that NOW is the time to install an update, that there SHALL be NO delays, no matter what anyone says. Given the official but never stated purpose of military ships, that, too, might not be the worst thing.

  6. Marcus says

    colinday@#4:
    If you start with 50,000 distinct records and remove 12,001 of them, wouldn’t you end up with 37,999 plus duplicates? If you had exactly one duplicate removed, wouldn’t you have 38,000 records left?

    I’m trying to think if I got that right…. I was using an example from an old b+tree implementation I did back in 1989 so I may be forgetting.

    In an index you sometimes don’t allow duplicates. So if I insert: A, A, B, C, A I will wind up with 3 index entries and two ‘duplicate entry’ errors. Some index schemes allow duplicates, though. You want to check carefully and make sure that the correct results occur when you do duplicates and delete duplicates. What do I have in my index if I insert A, A, B, C, A and then delete ‘A’? Is it (A, B, C) or is it (B, C)? In a classical ISAM or B-tree it’s the latter, although there exist trees that contain duplicate indexes, or that reference-count index entries.

    It’s confusing! In fact, it’s still confusing me! Which is why the correct behavior should be specified and the software should be validated to behave the correct way.
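    For illustration only, here’s a toy Python sketch (nothing to do with the 1989 b+tree) of the two specifications you could choose between:

      # Two made-up index policies, to show why "what happens on duplicates"
      # belongs in the specification.

      class UniqueIndex:
          """Classical behavior: duplicate keys are rejected at insert time."""
          def __init__(self):
              self.entries = {}
          def insert(self, key, value):
              if key in self.entries:
                  raise KeyError("duplicate entry: %r" % key)
              self.entries[key] = value
          def delete(self, key):
              del self.entries[key]
          def keys(self):
              return sorted(self.entries)

      class MultiIndex:
          """Alternative behavior: duplicates allowed; delete removes one entry."""
          def __init__(self):
              self.entries = []
          def insert(self, key, value):
              self.entries.append((key, value))
          def delete(self, key):
              for i, (k, _) in enumerate(self.entries):
                  if k == key:
                      del self.entries[i]
                      return
              raise KeyError(key)
          def keys(self):
              return sorted(k for k, _ in self.entries)

      unique, multi = UniqueIndex(), MultiIndex()
      for position, key in enumerate("AABCA"):
          try:
              unique.insert(key, position)
          except KeyError:
              pass                        # the two 'duplicate entry' errors
          multi.insert(key, position)

      unique.delete("A")
      multi.delete("A")
      print(unique.keys())   # ['B', 'C']            -- the classical answer
      print(multi.keys())    # ['A', 'A', 'B', 'C']  -- distinct keys: (A, B, C)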

    I once saw an implementation of a system that was using a b-tree index (my code, in fact, which is why I saw it) and the programmer had assumed that duplicate indexes were handled one way, but they were handled another. That’s one of the risks of using poorly specified code. I remember one time I used a certain library and started doing a certain operation that was in the manual page and it didn’t work. Finally, I loaded the source for the library into a debugger and started digging around, and discovered that at the bottom of the manual for the library, under Bugs: it said “feature ${that marcus was depending on} is not implemented yet.”

  7. Curt Sampson says

    Over time, we realized that every place where the operator had to make a choice and type the right thing was a chance for them to type the wrong thing.

    Yup, that’s it. Unnecessary flexibility is a huge source of error. But good luck trying to convince someone to redesign even a tiny part of the system so that it can’t break, rather than just having a particular circumstance where it would probably work.

    (And yes, Windows for Submarines is a thing.)

  8. Marcus says

    Some Old Programmer@#3:
    I heard scuttlebutt many years ago that US Navy destroyers had a system running Windows – complete with the occasional Blue Screen of Death.

    It was worse than that.
    It was an Arleigh Burke-class Aegis missile boat, and it had to be towed back to base so they could debug it; the entire command network on the ship had shut down. I know a guy who was involved in that incident, trying to figure out what had happened – they had computer security people involved because naturally someone was screaming it was ${Russia|China|North Korea|Illuminati} hacking it. What appears to have happened is that someone plugged a personal laptop into an open RJ-45 port, searching for internet access, and DHCP requested an address and somehow got the address of one of the ship’s control systems* (which was running on Windows XP) – the XP machine went “whoah! duplicate IP address!” and stopped talking to the network.

    I believe (I have not verified) that the Aegis boats are still running Windows XP because the requirement for forwards compatibility was not in the development spec. There are many, many, things wrong with that whole story. People I have asked have said that it’s far from isolated.

    For a custom application like a warship control system, the system architecture should take into account maintenance and future-proofing. For code, that would almost certainly imply using only open source development environments. I don’t recall when I first started yelling about this but it was probably close to 20 years ago: the US needs a strategic software reserve. [ranum]

    (* That most likely wouldn’t happen. Most likely the guy with the laptop picked an IP address at random and used it, then tried to say ‘it just happened like that’)

  9. Marcus says

    Curt Sampson@#8:
    Unnecessary flexibility is a huge source of error. But good luck trying to convince someone to redesign even a tiny part of the system so that it can’t break, rather than just having a particular circumstance where it would probably work.

    One of the topics Van Vleck and I got into was the options for fsck. Given that pretty much nobody ever uses any option except fsck -y, perhaps that’s the way it should always work. Since everyone always types that, what happens when someone types fsck -P instead? If it’s in a system start-up script, you’ve just hosed everything: something that is usually run non-interactively is now waiting for console input.
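    For the start-up-script case, a hedged sketch of what a ‘no chance for mistakes’ wrapper might look like (the wrapper itself is invented; the exit-code meanings follow e2fsck’s convention):

      import subprocess
      import sys

      # Hypothetical boot-time recovery step: never prompt, always answer 'yes',
      # and treat "already clean" the same as "repaired" so that running it twice
      # in a row is harmless.
      def repair_filesystem(device):
          result = subprocess.run(["fsck", "-y", device])
          return result.returncode in (0, 1)   # 0 = clean, 1 = errors corrected

      if __name__ == "__main__":
          sys.exit(0 if repair_filesystem(sys.argv[1]) else 1)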

  10. Marcus says

    You mean like the nuclear weapons systems still using 8″ floppies?

    I need a Tshirt that says “what if they gave a war and none of the weapons worked?”

    (I suspect sword-makers would experience a surge in work orders)

  11. Curt Sampson says

    In an index you sometimes don’t allow duplicates.

    Actually, this is a classic mistake in thinking that has caused an enormous amount of misery and debugging. In an index you never have duplicates.

    If you’re looking at something in a set and thinking that it’s a duplicate, you’re not looking at the whole thing. If you can distinguish the three a’s in [a, a, b, c, a], you’ve actually got some implicit values there that you’ve not written down. Rewrite that to show those values explicitly, { (0,a), (1,a), (2,b), (3,c), (4,a) }, and it will instantly become clear that you have no “duplicates”; (0,a) and (4,a) are not the same thing.
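    A trivial Python sketch of the same point:

      items = ['a', 'a', 'b', 'c', 'a']
      tuples = set(enumerate(items))   # make the implicit positions explicit
      print(sorted(tuples))            # [(0, 'a'), (1, 'a'), (2, 'b'), (3, 'c'), (4, 'a')]
      print(len(tuples))               # 5 -- nothing here is a duplicate of anything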

    This is exactly the horrible mess SQL fell into when they deviated from the relational model to allow “duplicate” rows. In relational database theory, a tuple’s existence in a relation is a predicate: “there is a user identified by key U#12345, and his first name is “Curt.” What does it mean to say that a second time? Does my name become more “Curt”? If someone deletes that row, are they trying to say that I’m now less “Curt”, but still “Curt,” or did they really want to negate that proposition, in which case they didn’t do it by deleting only one of the two rows?

    By the way, that “hoary old programmer” was Dijkstra. His archive is well worth reading. The quote you’re thinking of is from EWD1036, “On the cruelty of really teaching computing science”:

    A number of these phenomena have been bundled under the name “Software Engineering”. As economics is known as “The Miserable Science”, software engineering should be known as “The Doomed Discipline”, doomed because it cannot even approach its goal since its goal is self-contradictory. Software engineering, of course, presents itself as another worthy cause, but that is eyewash: if you carefully read its literature and analyse what its devotees actually do, you will discover that software engineering has accepted as its charter “How to program if you cannot.”

  12. Curt Sampson says

    Ah, one more point: DevOps really is a good thing, if it’s really properly making the developers responsible for helping operations work properly, as opposed to the usual standard method: “I got my code working on my machine; now I can throw it over the wall to the sysadmins and it’s their problem to make it work on the production servers.”

    “Release-driven development” is the obvious next step after “test-driven development”: start out not by writing the tests, but by writing the release system, so that you know you can get the darn code off a developer’s computer and onto another system and have it still work. (I’ve been doing this for years.) But most developers don’t care and don’t want to spend the effort, and the Operations folks are perfectly happy to install Chef and claim that they’re now doing “DevOps” while still spending days on sysadmin hacks that would never have had to exist in the first place if the developers had designed their systems differently.

    A startup I worked at around 1999-2001 had no separate operations team or sysadmins; we just had a tech team that was responsible for making everything work, end-to-end. Every line of code we wrote was written with the thought in the back of our minds that, if it was fragile, we’d end up being paged in the middle of the night to deal with whatever problem it couldn’t deal with by itself. It was brilliant.

  13. Pierce R. Butler says

    How do you pass “stuff” to the person on your left while holding hands with people on both sides?

  14. Marcus says

    Curt Sampson@#12:
    Actually, this is a classic mistake in thinking that has caused an enormous amount of misery and debugging. In an index you never have duplicates.

    I know that, and you know that, but a lot of developers don’t.

    I hate to seem cynical but I think SQL did that so that developers could continue to do inserts without checking return values. If you insert 10 records and one is a duplicate, it means you can’t do bulk inserts; you have to handle an error condition where there’s a dupe. The horror. To be fair, it does get interesting with race conditions if you don’t want to deal with atomic transactions. (Here I am going back to the days when Ingres SQL was written in a preprocessor that injected C code into your code so it could be compiled, and the database calls looked like some kind of bletcherous remote procedure call.)

    It actually does turn out that for some purposes, it’s easier to let the index enlarge with duplicates, if the index has multiple data pointers to a key. In that case you can implement something like an inverted index with one fewer level of indirection – if you don’t mind making the index uglier and slower. I actually did a bit of heinousness like that once by concatenating fields into a key then using the primary index as a sort of giant bloom filter. Aah, those were the days.

  15. Marcus says

    Curt Sampson@#13:
    DevOps really is a good thing, if it’s really properly making the developers responsible for helping operations work properly, as opposed to the usual standard method: “I got my code working on my machine; now I can throw it over the wall to the sysadmins and it’s their problem to make it work on the production servers.”

    If it’s done right, sure. But I’ve seen more DevOps disasters than successes. It seems to me to be “developers bypassing system administrators and any kind of quality checks and just doing development on production systems.” I do about 5 calls a year with clients (including some major SAAS products that you have doubtless heard of) that no longer have a gap between production and development – the developers do everything. That works – as you say – but only when the developers take a very responsible ownership of the environment. I’d hesitate to try it with more than 20-30 people on the entire product team; once you go past the scale of personal stove-pipes, things get overlooked. (As I mentioned, a multimillion-dollar mistake – probably a bankruptcy-causer – that I recently did response on was a result of DevOps gone wrong.) The cut-out between R&D and production is testing and staging, and it’s really important. One company I am on a TAB for wants to go DevOps, so naturally I asked how many experienced systems administrators were in the dev team. None. OK, so you want to take over deployment and vulnerability management and system administration, and you’ve got a bunch of javascripters who propose to do that?

    There are some things that can be done, and if your developers take production deployment into account (it should be a product requirement anyhow) then that is great. At NFR we had a requirement that all components of the system require no configuration at all. It meant that the system needed to do a bit more work to figure itself out but everything maintained sensible defaults and offered a minimum of knobs for the user to play with. Oddly, that sort of deployment doctrine is the best if you’re deploying into a cloud environment. And, if you do things that way, your server side doesn’t “care” if it’s in a particular environment. What scares me when I look at DevOps is stuff like the developers who put internet-facing SSH on production systems because it’s convenient for them, not because it’s a good idea.*

    (* it may be a good idea but that’s an engineering discussion to have with the system architect, who’d damn well better understand the trade-offs)
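    To make the ‘no configuration at all’ doctrine above concrete, here’s a hypothetical sketch (the names, defaults, and flags are all invented for illustration):

      import argparse
      import socket

      # Every setting has a working default, the component discovers what it can,
      # and only a couple of knobs exist at all.
      def load_config(argv=None):
          parser = argparse.ArgumentParser(description="example sensor component")
          parser.add_argument("--port", type=int, default=2009,
                              help="listen port (the default works out of the box)")
          parser.add_argument("--data-dir", default="/var/lib/sensor",
                              help="spool directory")
          config = parser.parse_args(argv)
          config.node_name = socket.gethostname()   # discovered, never configured
          return config

      if __name__ == "__main__":
          print(load_config())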

  16. Dunc says

    Came here to post that XKCD…

    I hate to seem cynical but I think SQL did that so that developers could continue to do inserts without checking return values.

    Actually, I think it’s even worse – they did that so that developers could “design” tables without bothering to specify keys. I admit, I’m guilty here too, I’ve thrown together a few heaps in my time – but it was (almost) always a design decision, rather than sheer laziness… I’ve seen databases with hundreds of tables, and not one key (either primary or foreign) or index defined. Funnily enough, they mostly didn’t perform very well for some reason. Then there’s the marvellous fun of trying to reverse-engineer the schema without any foreign keys (or documentation)…

    In my experience, most (non-specialist) developers just don’t really grok relational databases – hence the whole “NoSQL” thing, which has always struck me as a way of letting people use databases without understanding anything about them. Most devs also write fucking horrible SQL… On more than one occasion, I’ve reduced procedure run times from hours to seconds simply by understanding the idea of set operations. Kind of a fundamental concept…

    I’ve seen more DevOps disasters than successes. It seems to me to be “developers bypassing system administrators and any kind of quality checks and just doing development on production systems.”

    Uh huh…

    One company I am on a TAB for wants to go DevOps, so naturally I asked how many experienced systems administrators were in the dev team.

    I bet they don’t have a DBA either… As I keep telling people, databases are like puppies – you can’t just stick ’em in a box and forget about them. You wouldn’t believe (OK, actually, you probably would…) the number of times I’ve seen production SQL Server boxes die because they’d been installed with all the defaults (so all the databases were in Full Recovery mode) but had no maintenance plan, so the log files (which were on the system disk, naturally) just grew and grew until they ate the entire disk. Or because nobody had set a maximum memory limit on the SQL Server instance, so it just grabbed more and more memory until there wasn’t enough spare RAM for the OS to even page properly… OK, you can argue that this is at least partly a problem with horrible defaults in the SQL Server installer (and I’d agree!) but still, you shouldn’t be setting up and administering production databases if you haven’t got the first fucking clue about what you’re doing, and those two issues in particular just scream “I literally don’t know the first damn thing about administering SQL Server”.

  17. Curt Sampson says

    In an index you never have duplicates.

    I know that, and you know that, but a lot of developers don’t.

    Which pretty much sums up the problem.

    It actually does turn out that for some purposes, it’s easier to let the index enlarge with duplicates….

    Well, if you’re doing that as an implementation strategy, but the layer managing the index is presenting a proper index to whatever’s above it and dealing with keeping its hidden data logically consistent, that’s fine.

    I hate to seem cynical but I think SQL did that so that developers could continue to do inserts without checking return values.

    And that’s yet more failure.

    If you look (again) at the relational theory, you’re creating a set of propositions. (An RDBMS is really just an inference engine: you put in propositions and, given your queries, it produces further valid statements using the basic rules of logic.) If you are tracking whether an email address currently appears deliverable, and you just failed to deliver a message to joe@example.com, you simply want to store the tuple (address=joe@example.com, delivery-attempt=failed) without worrying about whether there currently exists a proposition about joe@example.com. So-called ‘UPSERT’ (I would call it ‘ENSURE’) is the basic operation you need here. (SQL did finally get MERGE in 2003, but taking thirty years and then coming out with that mess is pretty typical of the SQL world.)
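    A sketch of that idea in SQLite via Python – it uses SQLite’s INSERT ... ON CONFLICT upsert (available since 3.24) rather than standard MERGE, and the table is invented for the example:

      import sqlite3

      db = sqlite3.connect(":memory:")
      db.execute("""CREATE TABLE deliverability (
                        address TEXT PRIMARY KEY,
                        status  TEXT NOT NULL)""")

      def record_attempt(address, status):
          # "ENSURE": state the proposition; whether it already existed is not our problem.
          db.execute("""INSERT INTO deliverability (address, status)
                        VALUES (?, ?)
                        ON CONFLICT (address) DO UPDATE SET status = excluded.status""",
                     (address, status))

      record_attempt("joe@example.com", "failed")
      record_attempt("joe@example.com", "failed")   # saying it twice changes nothing
      print(db.execute("SELECT * FROM deliverability").fetchall())
      # [('joe@example.com', 'failed')]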

    Or perhaps a better example in this context is a relation that blog software should use: (blog-post-id, comment-author, comment-text, comment-timestamp DEFAULT now()). Post a comment with ENSURE comments CONTAINS (thispost, 'Curt J. Sampson', 'Blah blah blah.') and presto! no more duplicate. But no, we ignore the warning from Archer Sterling.

    And @Dunc, you’re right, of course: the NoSQL thing mostly made things worse. Though be careful in your terminology: all those tables did actually have defined keys, albeit not defined by and perhaps not even visible to the developers. As with my example in comment 12, the keys are there whether you know and/or admit it or not.

    And so far we’ve not even touched on the type-checking side of the relational model. When I first discovered that MySQL would happily accept Feb. 29th in a non-leap year as a date, I thought, ‘Hmm. Lazy programmers.’ When I discovered it would accept Feb. 30th, I thought, ‘Uh oh.’ When I discovered it would accept Feb. 0th, well, that was when I collapsed in despair.
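    For contrast, a strict date type rejects all three (a trivial Python check, offered only as a contrast; MySQL’s exact behavior depends on its sql_mode setting):

      from datetime import date

      # What a strict date type does with those three examples: each raises ValueError.
      for y, m, d in [(2019, 2, 29), (2019, 2, 30), (2019, 2, 0)]:
          try:
              date(y, m, d)
              print(f"{y}-{m:02d}-{d:02d}: accepted")
          except ValueError as err:
              print(f"{y}-{m:02d}-{d:02d}: rejected ({err})")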

    I’d expect you’ve seen plenty of DevOps failures; it went from ‘there’s a word for this’ to ‘we’re not doing it, we’ve just pasted the word on it’ much faster than ‘Agile’ did. I did enjoy your story though since it’s quite the opposite of (and no doubt an even bigger disaster than) what I’ve mostly seen, where they just relabel the Ops team ‘DevOps.’ (I know of one company with a CTO managing the developers and independent CIO managing the ‘DevOps’ team.)

    (I suppose it was a lot easier back in even the late 90s, when I first started doing full-time programming professionally (after years as a sysadmin and even more years as a hobbyist programmer) because most of the folks around who had at least some experience in not-huge companies did end up understanding a fair amount of detail about the whole system, end to end. But life, or at least computing, really was simpler then.)

    You’re absolutely right that software ‘[not caring] if it’s in a particular environment’ is the right way to go, anywhere you can manage it. Anything that it does care about is effectively a global variable, after all, and lord knows who will tweak it. But then the Twelve Factor App guys come along with ‘store config in the [process] environment’ because, well, if global variables are great in a single process, a global variable table shared amongst multiple processes and for which you have no revision history must be truly wonderful, right?

    What scares me when I look at DevOps is stuff like the developers who put internet-facing SSH on production systems because it’s convenient for them, not because it’s a good idea.

    Well, look at the bright side: at least they are looking at the usually-ignored side of the security equation. I’m pretty sensitive to convenience right now due to some recent experience in an environment where ‘moar firewalls! and especially make sure to block SSH!’ was the rule, which inevitably led not only to a multitude of (official) ‘bounce’ hosts at which anybody with root access could make use of your handy key agent forwarding but also a plethora of hacked-up and utterly insecure proxies set up by individual developers so that they could just get the access they needed to do their work. I almost wish I’d been the attacker there, given all the places where they (often officially) broke end-to-end security with self-inflicted MITM. The best part was the standard solution for dealing with an HTTPS Git remote failing certificate validation: git config --global http.sslVerify false. (Needless to say, they ended up with dozens of developer machines with that global setting….)

  18. Dunc says

    all those tables did actually have defined keys, albeit not defined by and perhaps not even visible to the developers.

    OK, explicitly defined keys. ;)

  19. Marcus says

    Dunc@#19:
    I bet they don’t have a DBA either…

    A what?

    Seriously, though. Back when I was doing databases on machines that were powered by mighty Motorola 68020 CPUs, you actually did need to think about indexes. I remember the fun I used to have setting up different database layouts on a quiescent system, then doing a standard set of operations and watching the I/O transaction rates. Sometimes a well constructed query or an index in the right place could drop the I/O to nearly zero (yay caching!) for some of my benchmarks. That was how I learned how to rig a benchmark, which was very very handy when I went to Digital. And that’s another whole kettle of pain: people who don’t understand system performance, who write benchmarks. We had one customer that was concerned with a peak insert-rate for a certain application and said “we will buy from whoever does our benchmark fastest.” So I ran the benchmark on a DECstation 3100 with a RAID cluster of RZ23 drives (which were small, but had big caches) so the total amount of I/O being done all went to cache. I mean “don’t turn the machine off suddenly” always applies to transaction systems, right? (The RAID software had a ‘soft updates’ option that I enabled, so most of the writes were overlapping cache updates.) (I admit that this was probably evil. But I’ll cop to “mildly evil.”)

  20. Marcus says

    Curt Sampson@#20:
    Or perhaps a better example in this context is a relation that blog software should use:

    You really really really don’t want to look at how blog software like WordPress uses SQL. Really. It will make you want to hammer nails into your own eyeballs using a hardcover copy of Codd and Date as a hammer.

    ‘moar firewalls! and especially make sure to block SSH!’ was the rule, which inevitably led not only to a multitude of (official) ‘bounce’ hosts at which anybody with root access could make use of your handy key agent forwarding but also a plethora of hacked-up and utterly insecure proxies set up by individual developers so that they could just get the access they needed to do their work.

    When Van Vleck and I were talking about specified interfaces, that’s what that’s intended to overcome. If there’s a need for operators to be able to (say) invoke an emergency backup, or perform an emergency halt, then there are two programs (with manual pages) called “emergency_backup” and “emergency_halt” that have a basic access control atop them, but which are responsible for negotiating the necessary permissions, synchronizing what must be synchronized, and making sure that the things which must be made sure of are made sure of. I suggested that approach to one of my clients that was wrestling with “the developers need root on all the production systems” and they stared at me in puzzlement.
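    Hypothetically, such a wrapper might look something like this (the group name, lock path, and backup command are all invented for the sketch):

      #!/usr/bin/env python3
      """emergency_backup -- the only way operators invoke the emergency backup."""
      import grp
      import os
      import subprocess
      import sys

      ALLOWED_GROUP = "operators"              # invented for the sketch
      LOCKFILE = "/run/emergency_backup.lock"

      def caller_is_operator():
          try:
              return grp.getgrnam(ALLOWED_GROUP).gr_gid in os.getgroups()
          except KeyError:
              return False

      def main():
          if not caller_is_operator():
              sys.exit("emergency_backup: you are not in the %s group" % ALLOWED_GROUP)
          # Idempotent: if a backup is already running, say so and succeed quietly,
          # so that running it twice in a panic does no harm.
          try:
              fd = os.open(LOCKFILE, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
          except FileExistsError:
              print("emergency_backup: a backup is already in progress")
              return 0
          try:
              # No free-form arguments: the operator chooses *whether*, never *how*.
              return subprocess.run(["/usr/local/sbin/do_backup", "--emergency"]).returncode
          finally:
              os.close(fd)
              os.unlink(LOCKFILE)

      if __name__ == "__main__":
          sys.exit(main())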

    Diana Kelley (who is a senior consultant at IBM, which means she is some kind of the shit) [interview] and I used to joke about how we could walk into a client and do a security audit without having to ask any questions at all – all we needed to know was a few target points, like: are they doing Agile, are they doing DevOps. It’s like talking to a race car driver who tells you they don’t need safety gear because they’re so good: you know they’re going to be dead within 6 months to a year.

  21. Curt Sampson says

    Why is everything page’s fault?

    I don’t recall; that memory’s been flushed. But I still demand page fix it.

    (My favourite moments with indexes are the ones where someone has heard about them but doesn’t understand how they work beyond, “Indexes good! Moar indexes!” They go and add several, find that they’re not being used on queries due to low selectivity, and then tweak the SQL to force an index to be used, completely ignoring the fact that they’ve now made their query slower. If you explain to them that “table scan” isn’t a bad word when that’s actually the access method for your query with the least number of block reads, maybe they get it, maybe they don’t. It’s not as bad now that the days of rotating disks have mainly passed, but still….)

  22. thvv says

    When we were working on the Multics operator interface, we used to talk about how a car with 4 steering wheels, one for each wheel, would be much more flexible… and unusable.