Self-Modifying and Machine-Generated Code


In a comment to my previous post, John Morales asked my opinion of self-modifying code.


Answering the question as asked

That’s an easy one:  it’s a maintenance nightmare.  Don’t do it.


More generally

Over the years, and I’m old enough to remember punching Hollerith cards1 and sticking them in sorting machines (wired boards were before my time), I’ve developed a very explicit coding style, principally because a week, a month, or a year from now, I don’t want to be wasting any time trying to puzzle out what the hell I was thinking about.  I always use meaningful identifiers (names of things that programmers make up), although I’ll break down and use abbreviations at block scope; and I avoid all the known anti-patterns (“magic numbers” come easily to mind).  When a function has more than one big thing to do, it’s time for refactoring.  I even prefer Allman-style curly brace placement precisely because it puts more white space in the code and so separates bits of code in ways that are immediately visible.

Back in my newbie days, I made all the newbie mistakes; I even thought that self-modifying code was Really Cool.  The only reason that I’m not still a newbie is that I learned from my mistakes (and I hope that I never stop learning).

Just for fun, I’ll put a cute little self-modifying PDP-8 program at the end of this post.


A related issue:  machine-generated code

This is a completely different thing and not particularly scary.  Programmers know how to write source code, which is just text; and they know how to write text to a file.  No big deal.  Indeed, the two programs that I wrote to generate simple Amtrak timetables and on-time performance statistics spit out complete Web pages2; and my timezone code comes with a couple of utilities that read Zoneinfo data and generate array initializers that get #included in other programs.  That’s all really simple stuff.

It could even be argued that this is what C++ templates are:  when you write a class template or function template, you’re telling the compiler how to write a class or a function for you.  Yes, really.  And if reflection makes its way into C++26, which is highly likely, we’ll have lots more compile-time code generation.3


1If I might stretch the meaning of “program” a bit beyond the breaking point for a moment, my first program was an 026 drum card.  That was back in the Viet Nam era when Sgt. Seymour was Base Fuels Accountant at March Air Force Base.  There was a very complicated daily report that got run overnight at 15th Air Force HQ; and it didn’t take me long to figure out that, in order to get the cards punched right the first time, I had to do it myself.

2It turns out that all I needed was good old HTML1 with no anchors or scripts…easy.

3But this kind of code generation is something that compiler authors do, not something that J. Random Coder like me does.  I know some compiler authors, and they’re way smarter than I am.  If those folks are the big leagues, I’m like an acceptable AAA baseball player when I’m having a good day.


Here’s a bit of machine-language PDP-8 code (not written by me) that sets all the bits in a memory field, including itself, to zero.

A PDP-8 field had 4096 12-bit words, so all the addresses and data are four octal digits.

ADDR  DATA  MNEMONIC
----  ----  --------
0004  1005  TAD 5
0005  3410  DCA I 10
0006  5004  JMP 4
0007  5404  JMP I 4
0010  0011  (data)
0011  2010  ISZ 10

TAD:  Two’s complement add.  Add the contents of the referenced location to the accumulator.

DCA:  Deposit and clear the accumulator.  Store the accumulator in the referenced location and set all the accumulator bits to zero.  The “I” in the mnemonic means “indirect”; and when absolute addresses 108 through 178 are used as indirect addresses, they pre-increment; so the first time the DCA I 10 instruction gets executed, it stores the accumulator in address 128, then 138, and so on.

JMP:  Jump to the referenced location.

ISZ:  Increment, skip if zero.  Add 1 to the referenced location and skip the next instruction if the sum is zero.

That first three-instruction loop just sticks 3410 all over memory until it finally wraps around to location 6 where we continue to location 7 and JMP I 4 to location 3410 and start executing 3410 instructions. 😎  Since DCA clears the accumulator, at this point we’re storing zeros all over memory.  0000 is an AND instruction:  load the bitwise AND of the referenced location and the accumulator into the accumulator.

I’ve forgotten what that final ISZ instruction is for, and I’m not in the mood to puzzle it out, so I’ll leave that as the dreaded exercise for the reader. 😎

You’ll notice that that program is not an algorithm because it doesn’t halt; it just keeps on executing AND 0 instructions.  When all the lights on the front panel stop flashing, you press the STOP switch. 😎

I’ve written a little paper about the PDP-8 if anybody is interested.

Comments

  1. billseymour says

    It just occurred to me that embedded systems programmers who write for Von Neuman architectures might write some self-modifying assembler code when efficiency is important.  If they can save a few memory cycles inside nested loops, then they can get the same performance at a lower clock rate, and so run on less power and extend battery life.  That’s a really big deal.  They do way more testing than most folks do, though.

    IIUC, that wouldn’t actually be possible on Harvard architecture chips (code is code and data is data and never the twain shall meet).

    My one small foray into embedded systems was back in the 1970s 1980s when I was still really a wires-and-pliers guy, and I designed and built an electrocardiograph with a microprocessor in it for Allegheny General Hospital in Pittsburgh.  (I was implementing my boss’ PhD thesis in a more practical way.)  We used TI’s TMS9900 CPU, so that was a Von Neuman machine.

  2. Alan G. Humphrey says

    In my formative years as a programmer, we were taught that self-modifying code was forbidden. This was in the early 1970’s in high school where we had an IBM 1130 (4K core memory and an enormous 1MB hard drive) along with some unit record equipment in house, so patch boards were used for some of those. Fortran IV was the language we learned on that computer, but I was exposed to COBOL in my senior year where remote computing was done by sending our Hollerith cards via automobile to a local college and getting the compiler and test – if compilation was successful – results back the next day.

    Now I contemplate how the changes in computing technology over the past 50 years are so great while keying this comment using the same keyboard layout I used back then is mirrored in our culture and politics. As the saying goes, “The more things change….”

  3. xohjoh2n says

    I *think* the ISZ is there primarily for the side-effect of incrementing (0010) at precisely the point where it holds the value 0007 (and will have just been executed as “AND 7” if I infer the instruction pattern correctly based on what you’ve described.)

    This has the effect that the next DCA we meet doesn’t zero out location 0010, thus preserving our current location in the loop. We then race round executing DCAs which zero the word just preceding the program counter (starting with the ISZ so it only happens once.)

    As far as I can tell we then end up (after a whole loop round) with:

    0000 0000 AND 0
    0001 0000 AND 0
    0002 0000 AND 0
    0003 3410 DCA I 10
    0004 3410 DCA I 10 <=== pc
    0005 0000 AND 0
    0006 0000 AND 0
    0007 0000 AND 0
    0010 0002 AND 2
    0011 0000 AND 0
    0012 0000 AND 0

    So we inc (0010) to 3, clear (0003) and loop round once more until we hit our last remaining DCA. we inc (0010) to 4 and clear that. Now we have memory full of ANDs, but (0010) still contains 4 and I can’t see how that can get zeroed at that point.

  4. xohjoh2n says

    Hmm. Slight tweak: the JMP I 4 sends up off into the DCA weeds, but that keeps hitting and zeroing 0010 which means it will keep zeroing out 0001-0010 but can never proceed further until we loop round top of memory. 0000 still contains a DCA at that point and conveniently this just happens to leave (0010) containing 0007 and lets us hit the ISZ which skips over 0010 and then into the run of DCA as described above.

    The next time we hit top-of-memory 7777 clears 7776, 0000 clears 7777, don’t we then end up executing 7777 from 0010?

    (Oh, I see you’ve explained that at the link. Well, if it’s a no-op on the current model then we will loop round memory (all ANDs now) one final time and clear the last DCA in 0000 when we hit it again, incrementing the last remaining non-zero word in 0010 to zero in the process.)

  5. billseymour says

    xohjoh2n:  when I said that I’d leave the ISZ business “as the dreaded exercise for the reader”, I didn’t mean that anyone had to work on it.  Sorry.

    OTOH, I can easily imagine that you were having fun, in which case, you’re welcome. 😎

    I think you got it right:  your explanation tickled some memory cells from back in the ’80s when I had figured it out myself.

    BTW, my first comment got the dates wrong.  It was in the ’80s, not the ’70s, when I worked at Allegheny General.  If you click on the drawing in the About the Author section to get the whole thing, you’ll see that it’s dated 4-28-80.  I picked that one because it was the smallest.  Most of the others are 17×22 blueprints.  (Yeah, I still have them and still know what they mean.)

  6. billseymour says

    Alan G. Humphrey @2:  yeah, Fortran IV.  We had that on a PDP-11/60 at Allegeheny General, and I used it to write a disassembler for the TMS9900.  Feed it machine code and it spit out a kind of line-by-line assembler (still with absolute addresses).

  7. Peter B says

    Your mention of self-modifying code brought back memories.

    IIRC, I was 21 at the time. A student at the boarding house where I lived was taking a programming class in the TUTAC language. The first “T” meant typical and the final “AC” meant automatic computer. This completely useless ‘computer’ featured 10-digit instructions and 1000 10-digit memory locations. It had the basic add, subtract, multiply, and divide instructions along with some conditional jump instructions.

    It did not have any index registers. Adding all the numbers in a list required modifying instructions.

    Students visited the computer center and learned TUTAC using 70mm microfilm readers. Code bars rated answers and provided help for incorrect answers. Although it was not part of my chemistry degree, I learned TUTAC. It was boring. I skipped ahead by several lesson spools and was presented with a quick review of TUTAC division. I picked the worst possible answer and was quickly given all I needed.

    Concurrently I was self-directed in learning the IBM 1620 CPU. I punched its 5-digit addresses and 12-digit instructions into “self-loading” tab cards. Drum cards were useful because they auto-copied the self-loading boilerplate from card to card.

    After writing a simple math exercise in 1620 machine code I wrote a TUTAC emulator. The director of the computer center had also written an emulator. But it didn’t work after TUTAC Lesson 12 where instruction modification was covered. He had “compiled” TOTAC into subroutine calls. After lesson 12 my version was used.

    I eventually learned IBM 1620 assembly language. My prior stubbornness against assembly language paid off as I had no preconceptions about what assembly language did. I also wrote a trace that punched a card as each 1620 instruction was executed. Much later that trace helped others by finding a bug in an IBM 1620 emulator (division remainder) written for some early Control Data system.

  8. says

    There is a definite use case for self-modifying code — albeit not a very big one.

    British home micros of the 8-bit era did not have fancy graphics chips; they just used bit-mapped displays that were effectively one enormous sprite, and for various practical reasons, these were often laid out in a non-linear fashion rather than being one continuous scanline after another in the direction of the CRT beam.

    Altering an address in an instruction in place each time around a loop was often the fastest way of implementing a loop to draw graphics on the screen. Speed is of the essence when you are racing the aforementioned beam, especially on a processor like the Z-80 which lifts one foot off the floor at a time.

    And the best place to store a value from some address that will eventually need to be written back there, but needs to be overwritten just temporarily — while switching in a bank of paged memory, for example — is the operand field of a later, load-immediate instruction. That way, it doesn’t take up any extra room.

    Also, I’d bet the number of bootstrap loaders with self-modifying code in them is greater than zero; just because you have limited space for code, the whole machine to yourself, and you know you are running from RAM.

Leave a Reply

Your email address will not be published. Required fields are marked *