I’ll likely be starting my next Amtrak trip on Friday. A few days ago, though, a server for Amtrak’s Positive Train Control system crashed and stayed down for at least three days. That forced the cancellation of basically every train except those on Amtrak’s own Northeast Corridor. Scuttlebutt has it that the server is back up, and trains seem to be departing from Chicago again.
But what really amazes me is that, apparently, there wasn’t a backup. The system I worked on before I retired was actually four complete systems: one for development (DEV), one for system integration testing (SIT), one for customer acceptance testing (CAT), and one for production (PROD). Each of those had three servers: one for the Web component, one for the database, and a third for running background tasks. Each of those servers had at least one backup. The DEV and SIT systems had one backup for each server; the CAT and PROD systems had four of each server, constantly sharing data back and forth so that they were pretty much exact copies of each other. That’s the way you do it. It’s old news and well understood.
Another possibility, which also wouldn’t surprise me, is that security was so lax that a hacker could have brought the whole thing down in a ransomware attack. If that’s what happened, they probably won’t admit it.
In any event, Amtrak could have done better by getting a server from the folks who set up my own little website. They’re much more professional, it would seem.