Self-Modifying and Machine-Generated Code

In a comment to my previous post, John Morales asked my opinion of self-modifying code.


Answering the question as asked

That’s an easy one:  it’s a maintenance nightmare.  Don’t do it.


More generally

Over the years, and I’m old enough to remember punching Hollerith cards¹ and sticking them in sorting machines (wired boards were before my time), I’ve developed a very explicit coding style, principally because a week, a month, or a year from now, I don’t want to be wasting any time trying to puzzle out what the hell I was thinking about.  I always use meaningful identifiers (names of things that programmers make up), although I’ll break down and use abbreviations at block scope; and I avoid all the known anti-patterns (“magic numbers” come easily to mind).  When a function has more than one big thing to do, it’s time for refactoring.  I even prefer Allman-style curly brace placement precisely because it puts more white space in the code and so separates bits of code in ways that are immediately visible.

Back in my newbie days, I made all the newbie mistakes; I even thought that self-modifying code was Really Cool.  The only reason that I’m not still a newbie is that I learned from my mistakes (and I hope that I never stop learning).

Just for fun, I’ll put a cute little self-modifying PDP-8 program at the end of this post.


A related issue:  machine-generated code

This is a completely different thing and not particularly scary.  Programmers know how to write source code, which is just text; and they know how to write text to a file.  No big deal.  Indeed, the two programs that I wrote to generate simple Amtrak timetables and on-time performance statistics spit out complete Web pages²; and my timezone code comes with a couple of utilities that read Zoneinfo data and generate array initializers that get #included in other programs.  That’s all really simple stuff.
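Here’s a minimal sketch of that kind of generator.  It is not the actual utility:  the ZoneOffset record, the function names, and the sample data are all made up for illustration, and it assumes the names contain nothing that needs escaping inside a string literal.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record; a real utility would read Zoneinfo data instead.
struct ZoneOffset
{
    std::string name;
    int utc_offset_minutes;
};

// Write one brace-enclosed entry per line, suitable for pulling into an
// array definition with #include.  Names are assumed to need no escaping.
void write_initializer(std::ostream& out, const std::vector<ZoneOffset>& zones)
{
    out << "// Generated file: do not edit by hand.\n";
    for (const ZoneOffset& z : zones)
    {
        out << "{ \"" << z.name << "\", " << z.utc_offset_minutes << " },\n";
    }
}

// Convenience wrapper that returns the generated text as a string.
std::string initializer_text(const std::vector<ZoneOffset>& zones)
{
    std::ostringstream out;
    write_initializer(out, zones);
    return out.str();
}
```

A consuming program would then put the #include for the generated file between the braces of an array definition.  Passing a std::ofstream to write_initializer instead of a string stream is all it takes to produce that file.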

It could even be argued that this is what C++ templates are:  when you write a class template or function template, you’re telling the compiler how to write a class or a function for you.  Yes, really.  And if reflection makes its way into C++26, which is highly likely, we’ll have lots more compile-time code generation.³
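A tiny illustration of the template point, with a made-up function name:  the source has one definition, and the compiler writes a separate function for each type the template gets used with.

```cpp
#include <string>

// One template in the source; the compiler generates larger_of<int>,
// larger_of<double>, larger_of<std::string>, ... only as they are used.
template <typename T>
T larger_of(const T& a, const T& b)
{
    return (a < b) ? b : a;
}
```

Calling larger_of(2, 3) and larger_of(std::string("abc"), std::string("abd")) in the same program gets you two distinct functions that nobody typed in by hand.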


¹ If I might stretch the meaning of “program” a bit beyond the breaking point for a moment, my first program was an 026 drum card.  That was back in the Viet Nam era when Sgt. Seymour was Base Fuels Accountant at March Air Force Base.  There was a very complicated daily report that got run overnight at 15th Air Force HQ; and it didn’t take me long to figure out that, in order to get the cards punched right the first time, I had to do it myself.

² It turns out that all I needed was good old HTML1 with no anchors or scripts…easy.

³ But this kind of code generation is something that compiler authors do, not something that J. Random Coder like me does.  I know some compiler authors, and they’re way smarter than I am.  If those folks are the big leagues, I’m like an acceptable AAA baseball player when I’m having a good day.


Here’s a bit of machine-language PDP-8 code (not written by me) that sets all the bits in a memory field, including itself, to zero.

A PDP-8 field had 4096 12-bit words, so all the addresses and data are four octal digits.

ADDR  DATA  MNEMONIC
----  ----  --------
0004  1005  TAD 5
0005  3410  DCA I 10
0006  5004  JMP 4
0007  5404  JMP I 4
0010  0011  (data)
0011  2010  ISZ 10

TAD:  Two’s complement add.  Add the contents of the referenced location to the accumulator.

DCA:  Deposit and clear the accumulator.  Store the accumulator in the referenced location and set all the accumulator bits to zero.  The “I” in the mnemonic means “indirect”; and when absolute addresses 10₈ through 17₈ are used as indirect addresses, they pre-increment.  Location 10 starts out holding 0011, so the first time the DCA I 10 instruction gets executed, it stores the accumulator in address 12₈, the next time in 13₈, and so on.

JMP:  Jump to the referenced location.

ISZ:  Increment, skip if zero.  Add 1 to the referenced location and skip the next instruction if the sum is zero.
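For anyone who wants to single-step the listing, the instruction descriptions above can be sketched as a tiny interpreter.  This is a deliberately partial model and not a faithful PDP-8:  it has one memory field, it ignores the link bit (so TAD just drops the carry), and it gives up on JMS, IOT, and the operate instructions.  The Pdp8 and load_listing names are made up.

```cpp
#include <array>
#include <cstdint>

// Partial PDP-8 model: one 4096-word field, no link bit, and only the
// AND, TAD, ISZ, DCA, and JMP instructions.
struct Pdp8
{
    std::array<std::uint16_t, 4096> mem{};  // 12-bit words, all zero at start
    std::uint16_t pc = 0;
    std::uint16_t ac = 0;

    // Execute one instruction; return false for opcodes this sketch omits.
    bool step()
    {
        const std::uint16_t here = pc;
        const std::uint16_t word = mem[here];
        pc = (pc + 1) & 07777;

        // The low seven bits are the offset; the 0200 bit selects the
        // current page instead of page zero.
        std::uint16_t addr = word & 0177;
        if (word & 0200)
        {
            addr |= here & 07600;
        }
        if (word & 0400)  // the "I" (indirect) bit
        {
            if (addr >= 010 && addr <= 017)  // auto-index: pre-increment
            {
                mem[addr] = (mem[addr] + 1) & 07777;
            }
            addr = mem[addr];
        }

        switch (word >> 9)
        {
        case 0:  // AND
            ac &= mem[addr];
            break;
        case 1:  // TAD (carry out of 12 bits is dropped here)
            ac = (ac + mem[addr]) & 07777;
            break;
        case 2:  // ISZ
            mem[addr] = (mem[addr] + 1) & 07777;
            if (mem[addr] == 0)
            {
                pc = (pc + 1) & 07777;
            }
            break;
        case 3:  // DCA
            mem[addr] = ac;
            ac = 0;
            break;
        case 5:  // JMP
            pc = addr;
            break;
        default:  // JMS, IOT, OPR: not modeled
            return false;
        }
        return true;
    }
};

// Load the six words from the listing above and point PC at 0004.
Pdp8 load_listing()
{
    Pdp8 m;
    m.mem[004] = 01005;  // TAD 5
    m.mem[005] = 03410;  // DCA I 10
    m.mem[006] = 05004;  // JMP 4
    m.mem[007] = 05404;  // JMP I 4
    m.mem[010] = 00011;  // (data)
    m.mem[011] = 02010;  // ISZ 10
    m.pc = 004;
    return m;
}
```

Calling step() three times on the loaded machine walks through one pass of the loop:  the accumulator picks up 3410, that value lands in location 0012 while location 0010 becomes 0012, and the program counter goes back to 0004.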

That first three-instruction loop just sticks 3410 all over memory until it finally wraps around to location 6 where we continue to location 7 and JMP I 4 to location 3410 and start executing 3410 instructions. 😎  Since DCA clears the accumulator, at this point we’re storing zeros all over memory.  0000 is an AND instruction:  load the bitwise AND of the referenced location and the accumulator into the accumulator.

I’ve forgotten what that final ISZ instruction is for, and I’m not in the mood to puzzle it out, so I’ll leave that as the dreaded exercise for the reader. 😎

You’ll notice that that program is not an algorithm because it doesn’t halt; it just keeps on executing AND 0 instructions.  When all the lights on the front panel stop flashing, you press the STOP switch. 😎

I’ve written a little paper about the PDP-8 if anybody is interested.

The Meeting is Over

We finished the ISO standards committee meeting that I sponsored a bit before noon on Saturday, then some medical stuff got in the way of this post.  I’ll probably write about that tomorrow or Friday.

I was freaking out most of the week worried about whether everything would go well, in part because my hearing aids give me hardly anything but distortion when trying to listen to amplified sounds, so I spent most days working from home via Zoom.  That was OK because I have a box that I can plug into the headphone jack on my laptop that generates a Bluetooth signal that feeds my hearing aids and gives me good enough quality to understand human speech.  I did check into the hotel Sunday night to help with the setup and to make sure I’d be there for the Monday morning plenary, and again Friday night to help with packing up and to be there for the Saturday plenary.

I guess I needn’t have worried so much because I got quite a few very nice thank-you notes from several movers and shakers on the committee.  Herb Sutter, the WG21 Convenor (the ISO word for chairperson), has a better post about the meeting on his own blog than I could write.  There’s a really good summary of the meeting at the beginning if anybody is wondering how this ISO committee works.  After that, it gets pretty technical and maybe not of much interest to folks who aren’t computer programmers.

I was fortunate to be able to give the committee a bit of payback for everything I’ve gotten from it over the years, but I don’t know that I’ll be able to do that again.

The next face-to-face meeting will be in Wrocław, Poland; and I’ve written about that already with a link to one possible itinerary.  We’ll see whether I’m able to attend.  All the trains I’d ride in the U.S. except the trains between New York and Boston have checked baggage service, all the stations where I’d be changing trains have red cap service (help with luggage), and I get wheelchair assistance at airports*; but I’d be lugging my luggage around on all the European trains.  A lot depends on whether I can get wheelchair assistance from the Frankfurt airport’s terminal two to the airport’s regional train station in terminal one.  The bus from terminal two to terminal one, which I’ve used before, would probably be a major hassle given my current mobility issues.  We’ll see…

Update:  2024-07-06:  I’ve found that I can indeed get wheelchair assistance between the two terminals in Frankfurt.


*I’ve found out that wheelchair assistance at airports gives me license to cut in line at security, immigration, and the boarding gates.  I’m not sure I deserve that; but I guess it makes sense to allow the folks pushing the wheelchairs to spend less time with me and so serve more customers.

My Next Excuse for Riding Trains

I won’t be blogging about riding trains until November when I’ll be traveling to Wrocław, Poland to attend a meeting of the ISO C++ standards committee, but I’m starting to think about it, and I’ve worked up a possible itinerary that includes a three-day conference in Berlin the week before and a one-day conference in Wrocław afterwards.

As I’ve said before, I like to fly Icelandair across the Pond because I like to get off the plane and stretch my legs in Keflavík.  Also, because travel to these meetings is the only thing I spend my fun money on, I’m fortunate to be able to afford Icelandair’s Saga class if I don’t try to afford other stuff that I don’t really want that much anyway.  (Business class on other airlines would probably be out of my price range; and besides, I wouldn’t want to sit in an airplane long enough to get all the way to Europe in one fell swoop.)

Unfortunately, Icelandair serves Berlin only five days per week, so the eastbound trip doesn’t work well.  I’m currently thinking about going a day late and missing almost all of the first day of Meeting C++.  I could fly into Frankfurt instead and take an ICE directly from the airport to Berlin; but then I’d have to return from Frankfurt to get the round-trip air fare; and getting from Wrocław to Frankfurt by train doesn’t look easy.

Update 2024-04-20:  I think I’ve found a way to get from Wrocław to the Frankfurt airport by train, and I like that better.  The link above is to the new version.  The version flying into and out of Berlin is still available here.

Nokia will be sponsoring a one-day conference called code:dive that’s still not officially announced, so I don’t know when or where it’ll be.  My rough itinerary assumes that it’ll be in the same hotel the Monday after the ISO meeting, which could be wrong; so the westbound trip is still subject to change.

Update, 2024-04-17:  I’ve added another option for the first leg that saves a long layover in Chicago.  That train originates in Kansas City and is often delayed on the former MoPac west of St. Louis, so I’m a bit leery about counting on it:  missing the very first connection would likely destroy the whole trip.

I checked out train 318’s arrival times in Chicago and the likelihood of making my connection back to the 1st of October, and it doesn’t look too dangerous; but I’m still not sure I’d want to chance it.  (If I look only at Mondays, which is the weekday I’ll be traveling, I never miss the connection on any of the 29 days; but it’s not clear whether the weekday actually has any effect.)  I probably won’t be making any reservations until the end of July or so, so I’ll check again then.  I’ll probably just stick with the Texas Eagle in any event since that will allow checking a bag all the way to New York.  (The Missouri-Illinois corridor trains don’t have checked baggage service.)

I’ve also been told that Nokia’s code:dive conference will definitely be on Monday, November 25th, but not at the Double Tree where the WG21 meeting will be.

Time Zones in C++

I think I have my timezone class ready for prime time, so I should be just about ready to finish the larger civil time library (which I put on hold because a couple of the classes depend on timezone).

But as I write this, I remember that there’s one thing I haven’t tested yet:  the option of reading TZ and TZ_ROOT environment variables using std::getenv() before the timezone class gets used for anything.  I do that magically with some tricky code that has the odor of the poltergeist anti-pattern; so I need to create the TZ and TZ_ROOT environment variables on my Windows box and make sure that that works.  I don’t see any reason why it won’t, so I’ll say tentatively that I’m done.
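For what it’s worth, the plain (non-magical) version of that lookup can be sketched like this.  It is not the library’s code:  the function names are made up, and the fallback values are just assumptions that happen to suit a typical Linux box.

```cpp
#include <cstdlib>
#include <string>

// Return the value of an environment variable, or a fallback when the
// variable is not set at all.
std::string env_or(const char* name, const std::string& fallback)
{
    const char* value = std::getenv(name);
    return value ? std::string(value) : fallback;
}

// Hypothetical defaults; a real library may well choose differently.
std::string zoneinfo_root()
{
    return env_or("TZ_ROOT", "/usr/share/zoneinfo");
}

std::string local_zone_name()
{
    return env_or("TZ", "localtime");
}
```

std::getenv() returns a null pointer for a variable that doesn’t exist, which is why the check comes before constructing the string.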

Questions for Mac Users

I’ve been slow to finish testing my C++ time zone code because I’ve had other things to do; but I think I have it ready for prime time.

But now, in the spirit of the analysis paralysis anti-pattern 😎 (I’m having fun in my retirement, don’tcha know), I’ve decided that I’d like to make it magically portable to the Mac if I can.

1.  Does the Mac have the Zoneinfo data somewhere?

2.  Do you have POSIX-style environment variables (possibly called TZ_ROOT and TZ) for
  a.  the directory where the Zoneinfo compiled binaries are found, and/or
  b.  your local time zone?

If Macs are at all Linux-like except for having the environment variables, TZ_ROOT could be “/usr/share/zoneinfo”; and TZ could be “localtime” or, in the U.S. central time zone for example, either “America/Chicago” or “CST6CDT,M3.2.0,M11.1.0”.

And for those doing C++ work with some version of GCC,

3.  is there some macro or other predefined identifier that says we’re compiling for a Mac rather than some other POSIX-like O/S?
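On question 3, GCC and Clang both predefine __APPLE__ and __MACH__ when targeting Apple systems, so a check along these lines should work.  This is only a sketch with a made-up function name, and the other branches are there just for contrast.

```cpp
#include <string>

// Report which branch of the conditional compilation was taken.
std::string platform_name()
{
#if defined(__APPLE__) && defined(__MACH__)
    return "Apple (macOS or a sibling)";
#elif defined(_WIN32)
    return "Windows";
#elif defined(__linux__)
    return "Linux";
#else
    return "some other system";
#endif
}
```

Note that __APPLE__ alone doesn’t distinguish macOS from Apple’s other operating systems; for code that only cares about where the Zoneinfo files live, that may not matter.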

I could probably go to an Apple store, fire up a Korn (or other) shell, and find out what I need for myself; but I’m hoping that there’s somebody reading this blog who already knows the answers off the top of their head.

Thanks.

Update:  thanks to robert79 for some good information about Macs.

It looks like they work pretty much like Linux, except that the filesystem has no symbolic link called localtime, so I still have no clue how a program can discover what the local time zone is.  There’s certainly some way to set the time zone through the UI, but I still need to find out how to discover that setting programmatically.  Maybe Google will help.

Update 2:  well, that was easy.  I should have just Googled from the get-go.

It seems that, in POSIX systems generally, /etc/localtime is usually a symbolic link to the real file.  There should also be a file called /etc/timezone.

My Debian Linux box, a VPS actually, is somewhere in England; so /etc/localtime is a symlink to /usr/share/zoneinfo/Europe/London and /etc/timezone is a plain text file that contains just “Europe/London”.
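Given that layout, one way a program could discover the zone name is to read the symlink and keep whatever follows “zoneinfo/”.  The sketch below assumes C++17’s std::filesystem and uses made-up function names; it is not what my library does, and it returns an empty string when /etc/localtime is missing or is a regular file rather than a link.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

// Pull "Area/Location" out of a path such as
// /usr/share/zoneinfo/Europe/London; empty string if "zoneinfo/" is absent.
std::string zone_from_target(const std::string& target)
{
    const std::string marker = "zoneinfo/";
    const std::string::size_type pos = target.find(marker);
    if (pos == std::string::npos)
    {
        return std::string();
    }
    return target.substr(pos + marker.size());
}

// Read /etc/localtime as a symlink; empty string if that fails.
std::string local_zone_from_symlink()
{
    std::error_code ec;
    const std::filesystem::path target =
        std::filesystem::read_symlink("/etc/localtime", ec);
    return ec ? std::string() : zone_from_target(target.string());
}
```

Keeping the string handling in its own function makes it easy to test without touching the filesystem.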

I don’t need any code changes, just some remarks in the documentation that the library will work on a Mac; and I still have to proofread that anyway.

Thanks again to robert79.

Time Zones

I’ve been debugging my timezone stuff somewhat casually, but I think I have it to the point where I’m willing to admit that I wrote it. 😎

This paper also references a trivial library that I’ve mentioned before.

If anybody who cares can think of any cool features that I should add, please let me know, either in the comments or in a private e-mail message.

I can now finish my larger civil time library which contains classes that are intended to be the C++ equivalents of SQL’s datetime types, which in turn will be part of a database access library.  (I’m still having fun in my retirement.)

Some Fun Time Zone Geekiness

There were a few e-mail messages on the FtB backchannel a little while ago.  An FtB blogger who lives in Ireland was wondering when the next FtB Podish-Sortacast will happen (https://freethoughtblogs.com/pharyngula/2024/03/09/in-my-prime/, scroll down a bit), probably just under an hour from when I get this posted.

That got me to thinking about civil time in the Irish Republic.  It winds up that they observe the same time as Great Britain and Northern Ireland, except they get there with rather tortured reasoning.

In the U.K., they switch from “Greenwich Mean Time” (GMT, same as UTC+0) to “British Summer Time” (BST, UTC+1) on the last Sunday in March at 01:00:00 local wall clock time, and they switch back from BST to GMT on the last Sunday in October at 02:00:00 local wall clock time.

In Ireland, they switch from “Irish Standard Time” (IST, UTC+1) to GMT on the last Sunday in October at 02:00, and they switch back from GMT to IST on the last Sunday in March at 01:00.

Yes, really. Go figure. 😎

The POSIX TZ environment variable for Europe/London:  GMT0BST,M3.5.0/1,M10.5.0
The POSIX TZ environment variable for Europe/Dublin:  IST-1GMT0,M10.5.0,M3.5.0/1
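In those strings, a rule like M3.5.0 means month 3, week 5 (which POSIX defines as “the last one”), day 0 (Sunday); and the optional /1 is the local time of the switch, with 02:00 being the default.  Here’s a small sketch of turning such a rule into a day of the month.  The function names are made up, and the day-of-week arithmetic is Sakamoto’s well-known method for the Gregorian calendar.

```cpp
// Day of the week for a Gregorian date, 0 = Sunday (Sakamoto's method).
int day_of_week(int year, int month, int day)
{
    static const int offsets[] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};
    if (month < 3)
    {
        --year;
    }
    return (year + year / 4 - year / 100 + year / 400
            + offsets[month - 1] + day) % 7;
}

int days_in_month(int year, int month)
{
    static const int lengths[] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    const bool leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    return (month == 2 && leap) ? 29 : lengths[month - 1];
}

// Day of the month of the last given weekday (0 = Sunday), which is what
// week number 5 means in an Mm.5.d rule.
int last_weekday_of_month(int year, int month, int weekday)
{
    const int last = days_in_month(year, month);
    const int back = (day_of_week(year, month, last) - weekday + 7) % 7;
    return last - back;
}
```

For 2024, last_weekday_of_month(2024, 3, 0) gives 31 and last_weekday_of_month(2024, 10, 0) gives 27, which are the two Sundays when the clocks changed in both London and Dublin.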

Why did they bother?

A Couple of C++ Quickies

I wrote two fairly trivial libraries to support my development of the timezone class that I’ve mentioned before.

1.  Reading Directories in C++ describes a class that loops through directories.  It’s portable to both POSIX and Windows so that you don’t have to do that in two very different ways.

2.  Getting Zoneinfo Data on Windows describes a simple way to do what the title says along with a couple of functions that let you create and delete symbolic links on Windows.

Both of those are really old news, and neither has any great ideas of my own, so all that code is in the public domain.

The timezone class is finished, but I still need to do some testing before I release the code to the world and, perhaps, embarrass myself. 😎  I hope to have that done later today.

Reading Directories in C++

This one is trivial.

For testing my timezone code, I often need to recurse through all the directories where the Zoneinfo binary files are.  I got tired of rewriting the programs depending on whether I was testing on Linux or Windows, so I decided to write a little framework that would be portable to both operating systems.  It doesn’t solve all the problems (the two systems provide different information about directory entries), but at least I don’t have to completely restructure the main loops. 😎

There are no new ideas here:  how to loop through directories is really old news; and since it’s not particularly original on my part, all the code is in the public domain.

Time Zones in C++

I’m back to working on a database access library in C++ and, at present, I’m developing a library of civil time classes that would mimic SQL’s datetime types.  In the next two or three days, I hope to have a timezone class ready for prime time; and I’ve decided to document it separately from the rest of the civil time library because there’s no reason why it couldn’t be a stand-alone class in its own right.

I don’t have the final code ready to share yet, but I wanted to make the design available while the class is still under development in case there’s anybody out there who would like to suggest any additional features that I haven’t thought of.

Is there anything else you think I need?

Update:  I just found out that Windows’ filesystem does indeed support symbolic links.  (I’m not sure what planet I’ve been living on.)  I’ve also figured out an easy way to create a .zip archive with the symlinks in it* and unzip that on my Windows box.  I guess I have a bit of a redesign to work on. 😎

*I can also create a .tar.gz that’s about one fifth the size of the .zip file, but I don’t see how to get the links as links (rather than copies of the file), and the .zip file is only about 2.5 MB.