On Data and Backups


“Hey, Marcus, those hard drive recovery services: how good are they?”

The answer is: pretty good.
The longer-form answer is: it depends on whether the drives you were using are still available on Ebay. How old is your storage?
The snappy answer is: it’s an IQ test. If you need a drive recovery service you don’t understand computers well enough to own one, and you failed the IT IQ test.

Sign of a bad day

Working in computer security, I get that sort of question all the time. The worst is when it’s something like a cryptolocker virus. Cryptolocker takes over your hard drive, encrypts it with a random key that it sends to its controller somewhere out in the cloud, and then you can buy the key back for a bitcoin (about $600 right now).

Hollywood Presbyterian Medical Center paid a $17,000 ransom in bitcoin to a hacker who seized control of the hospital’s computer systems and would give back access only when the money was paid, the hospital’s chief executive said Wednesday.

Apparently Hollywood Presbyterian Medical Center has no idea how to use computers. That should be comforting to their patients. $17,000 in bitcoin gets their data back but doesn’t solve the basic problem that apparently they have no backups, automation of system administration, or disaster recovery plan. When I say “don’t know enough IT to operate a computer” that’s what I’m talking about.

The cryptolocker attack doesn’t apply to hard drive recovery – since the data is overwritten with its encrypted form, there’s nothing a drive recovery company can do for you. But if you have good backups, you’ll just laugh at cryptolocker.

Let me teach you Marcus’ iron laws of data:

  1. If you don’t have three copies of it, you barely have one.
  2. If your data backup approach doesn’t have some scheduled way of making sure your files are preserved, you will neglect them and you will later regret neglecting them.
  3. Hard drives get old.
  4. Segregate disposable from important.
  5. Recovery is much more expensive than backups.

I’m going to walk through those points and lay out my home system backup approach. It’s not expensive, onerous or complicated. And I have only lost data twice in my life (so far!):

  • The first time, when a Seagate 20meg hdd blew up on me and two months’ worth of coding on my first ever consulting project blew up with it. That was when I decided to get serious about not losing data ever again.
  • The second time, when I left a company where I worked and, because I was afraid they might sue me, I turned over all the media I had that held data (Hillary Clinton, that’s how you’re supposed to do it) and asked them to be careful with it. In a fit of pique they dumpstered the drives – including email archives going back to when I first got on the internet in 1981. They were trying to irritate me and they succeeded beyond their wildest imaginings.

Point #1: Have 3 Copies

The first copy is your working copy. In my case, that’s my work desktop (I have a game desktop that is 100% disposable) which has all my emails, code, reports, invoices, 15 years’ worth of digital photography and video, CD-ROM images of rare applications from 1985, etc. It is divided into 3 drives, each of which holds about 4tb. In this case, size doesn’t matter; I’m just bragging – it really is what you do with it that counts.

I have 2 external USB hard drives for each of the drives. That’s a total of 9 drives, in other words. Whenever I need to increase the size of one of the 3 drives in my system I buy 2 more USB hard drives and copy the internal drive to the newer, bigger drives, then replace the internal drive with the larger one, copy a copy back, and move on.

External drive #1 sits, powered off, in a pelican case on a shelf in my office. It is powered off and in a pelican case in honor of my friend Olaf, a fine photographer, who had a system (I built for him back in my case-modding phase!) with pull-out drive racks and all sorts of nice stuff, and a goodly array of USB backup drives that were powered up when the goon down the hall in the building he shared managed to trigger the sprinkler system. Powered up hard drive meets water, data goes “bye!”   Olaf called me and asked “Hey, Marcus, those hard drive recovery services: how good are they?” and I told him:
“Think of this as an opportunity to re-invent your photography.”

Another reason to keep it powered off: if some hacker gets into your system and decides to overwrite your drive (or you get cryptolocker) it can’t touch the powered-off drive sitting on the shelf in the pelican case. This is a Sun Tzu-approved technique I call “security through pure laziness.”

External drive #2 sits in a safe deposit box at my bank, 15 miles away. The nice thing about bank vaults, other than that they’re probably pretty hard to get into, is that they all have fire suppression systems. Any disaster that wipes out both my house and the bank …  well, I live 2000 feet above sea-level in a stable zone, it’d probably be a disaster so bad I’ll be busy for a while and may not even survive. I’ll be ready to re-invent myself after that, I bet.

Point #2: Trigger Your Backup Schedule

About once a month I go to town and get groceries, deposit checks, whatever. Most of the time, I try to synchronize my external drives (remember: I only have one at home at any given time) with the internal drives. Not every drive needs to get synchronized every time (I have a pretty good memory for which ones do), but it doesn’t take very long anyway, unless I’ve been ripping a lot of movies or music, doing photo shoots, or collecting game-play videos (easy to eat 100gb that way).

Directory Toolkit

Make sure you label your backup drives. I use great big labels and a post-it note. The label says what it is: “b-photo” or whatever, and the post-it is the last time I synchronized that particular drive.

It takes more time to describe it than to do it.

Synchronizing drives: do it however you want to. Some operating systems have built-in tools for it. If you’re trapped in Windows-land you can use Microsoft Sync-Toy. I use Funduc’s Directory Toolkit because it has worked fine for me since 1994 or so.

I like to manually synchronize, because of my friend Norm. Norm had a super cool backup system – it was a network attached storage server running FreeBSD and ZFS, which lived in his studio – a separate building – a couple hundred feet from his house. He set up his systems using some UNIX tools to automatically synchronize his desktop with the storage server. It was a very spiffy arrangement, until the time he was installing a new drive and zeroized the file system on the wrong drive and the automatic synchronizer process he’d set up mirrored the empty filesystem onto the storage server. (Hint: rsync is not always your friend)
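
Norm’s failure mode suggests one cheap guard for anyone who does automate: never delete from the backup, and refuse to run at all when the source suddenly looks much emptier than the backup. The following is a minimal sketch in Python under those assumptions, not a description of how rsync or Directory Toolkit behave. The names guarded_sync and count_files and the max_shrink threshold are made up for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def count_files(root: Path) -> int:
    """Count regular files under root."""
    return sum(1 for p in root.rglob("*") if p.is_file())


def guarded_sync(source: Path, backup: Path, max_shrink: float = 0.5) -> int:
    """Copy new or changed files from source to backup; never delete.

    Refuses to run if the source holds far fewer files than the backup,
    which is what a freshly wiped filesystem looks like.
    Returns the number of files copied.
    """
    src_count = count_files(source)
    dst_count = count_files(backup) if backup.exists() else 0
    if dst_count > 0 and src_count < dst_count * max_shrink:
        raise RuntimeError(
            f"source has {src_count} files but backup has {dst_count}; "
            "refusing to sync"
        )

    copied = 0
    for src in source.rglob("*"):
        if not src.is_file():
            continue
        dst = backup / src.relative_to(source)
        if dst.exists():
            s, d = src.stat(), dst.stat()
            # Treat the file as unchanged when sizes match and the source is
            # not more than 2 seconds newer (FAT timestamps are that coarse).
            if s.st_size == d.st_size and s.st_mtime - d.st_mtime <= 2.0:
                continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also copies the timestamps
        copied += 1
    return copied


if __name__ == "__main__":
    # Tiny demonstration in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        work, backup = Path(tmp, "work"), Path(tmp, "backup")
        work.mkdir()
        (work / "report.txt").write_text("draft")
        print(guarded_sync(work, backup), "file(s) copied")
```

Because the sketch never deletes anything, the backup slowly collects files you removed on purpose. That is the price of not mirroring an empty filesystem over your only good copy.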

Another trigger for my backup schedule is my travel schedule. I have a laptop. I’ve segregated my email and working files into a separate drive (a small one, about 200gb) and I have a matching 200gb encrypted partition on my laptop. Before I go somewhere, I synchronize the working file area with the laptop. When I get home, I synchronize it back. A nice side effect of that is that I always have a pretty fresh emergency copy of my working files that’s generally no more out of sync than a couple days.

Point #3: Hard Drives Get Old

Solid state drives get old, hard drives get old, even DVDs get old. My old ASR-33 punch tapes got old. The only way to keep your data from ageing away on you is to keep refreshing it to new media every so often. Then throw the old media away. Keeping a forward-rolling archive means you will never find yourself staring at Ebay listings for TRS-80 external floppy drives, and wondering “how can I read this thing?”  That means: spend the money on the space and don’t worry too much about organizing it. I have a filesystem called “archives” and I’m pretty sure that if I created it after 1992 it’s in there. It just may take me days to find it.

There is nothing worse than coming to depend on a hard drive and discovering it’s erroring. Most drives give some warning before they outright die. If you get a drive error, now is a good time to re-assess your storage needs!

DVDs get old. Remember Olaf, the guy I told you about earlier? He had stacks of DVDs on spindles and tried to recover some of his lost stuff from those. Unfortunately his old DVD writer appeared to be kind of unreliable, and the disks age just sitting there. He had a lot of data errors and corrupted files.

The way serious storage people do things nowadays, they use a file database as an index to a bunch of hierarchical storage. That’s how Amazon can sell space so cheaply – it’s a tiered array of caches starting with system RAM then solid state drives then hard drives and finally petabyte tape robots. The big storage systems checksum the files when they are accessed and make sure the files haven’t lost a bit somewhere. That’s actually a thing! When you are storing petabytes and hundreds of millions of files, the probability that a system RAM error or a drive mis-read will flip a bit goes up – we’re talking astronomical numbers in a mega-storage array and even things with a very low probability happen all the time.

Hard drives try to tell when they have bad sectors but sometimes they can’t and you’ve now got a corrupted file. ZFS checksums every block it stores and, when the pool has redundancy (mirrors, RAID-Z, or extra copies), repairs a bad block from a good copy – and it tells you the drive is throwing errors so you can replace it. My friends who play storage harder than I do use ZFS.
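
If you are not running ZFS, you can get a rough version of the same check by hand: record a checksum for every file on an archive drive, then re-verify later and see what changed. The following is a minimal Python sketch, assuming the files on that drive are not supposed to change between checks. The function names are illustrative, and unlike ZFS this only detects damage; it cannot repair anything.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not have to fit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_corrupted(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return files that are missing or whose checksum no longer matches."""
    bad = []
    for rel, expected in manifest.items():
        path = root / rel
        if not path.is_file() or sha256_of(path) != expected:
            bad.append(rel)
    return bad


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "photo.raw").write_bytes(b"\x00" * 4096)
        manifest = build_manifest(root)
        print("damaged files:", find_corrupted(root, manifest))
```

A file you edited on purpose will also show up as a mismatch, so this fits best on the archive copies rather than the working drive.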

Point #4: Segregate Data

If you’re installing an operating system carve a partition out of the drive and put all the O/S and your apps there. Then carve another partition for your working files. My home hard drive is carved into:

Windows (disposable)
Work files (back up weekly)
Temporary files (back up weekly)
Music (back up once a month)

When I synchronize the drives I synch work files and music and that’s it. If my O/S install takes a bullet (it’s Windows, so that happens about once a year) I don’t have to worry about it – I just blow the contents of that partition away and reinstall Windows then re-populate the other partitions.

If you get your personal stuff mixed in with Windows or whatever you are going to have to start backing up that crap, which means you’re backing up the temporary install files, yadda yadda all the glarp the O/S wants to pull down. Don’t do that. Set “my documents” to point to your own partition and don’t ever deliberately store anything in C:\Windows.

Consider taking segregation to the point of physical separation. My systems are now built with two 100gb Kingston SSDs in a mirrored drive configuration. I boot Windows on SSD and my machine runs like your favorite metaphor for “really fast.” That way, the SSDs can wear out and blow up or whatever and/or if I want to do a Windows upgrade I just pull the old SSDs, do the upgrade onto new SSDs, and when that’s all done I put the old SSDs in an external case and use them for giving movies to friends or upgrading laptops.

The other reason segregating is good is because you can decide what to take with you and what not and when.

Point #5: Recovery Is More Expensive Than Backups

Hard drive recovery is a really cool thing. Unless you’re the person who needs it. Hard core hard drive recovery shops do things like buy duplicates of a damaged drive, then swap the logic boards to see if that’s the problem. If that doesn’t work they go into a clean-box and dismantle the platters and try to move them to a donor drive. Sometimes it even works. It’s really cool when it does.

Let’s look at some costs:

  1. Pelican Case, $32 – keeps the USB external drives dry and safe. Looks cool. You can use a Zero briefcase with cut foam if you’re really styling. Think waterproof.
  2. Triplicate Storage, $350 – That’s the cost of a triplicated 4tb, approximately. Seagate external 4tb expansion drives (right now) are $109. So you need 2 of those and an internal 4tb drive.
  3. Safe Deposit Box, $40/year – I have one of the large ones and it turned out to be handy for storing car titles, deeds, stock certificates, and blackmail materials. Just kidding about the stock certificates.

So for less than $500 you can have 4tb of guaranteed storage. You can skip the safe deposit box if you have a friend you see once a month – just swap the pelican case and you’ll look like a spy or something cool like that. You can be someone else’s offsite backup.
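
The arithmetic behind that “less than $500” figure, as a small Python sketch using only the prices listed above. It assumes nothing needs replacing during the period, which will not hold forever given Point #3.

```python
# Prices as quoted in the cost list above (US dollars).
PELICAN_CASE = 32
TRIPLICATE_STORAGE = 350
SAFE_DEPOSIT_PER_YEAR = 40


def total_cost(years: int) -> int:
    """One-time hardware cost plus the safe deposit box rental for `years`."""
    return PELICAN_CASE + TRIPLICATE_STORAGE + SAFE_DEPOSIT_PER_YEAR * years


if __name__ == "__main__":
    for years in (1, 2, 3):
        print(f"after {years} year(s): ${total_cost(years)}")
```

The first year comes to $422, and the total stays under $500 through the second year.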

The Cloud

If you have better internet bandwidth than I do, you can get cloud backup solutions that just work, and work great. Those services are about $20/month, or $250/year.

I do not know what the automatic cloud sync things do with cryptolocker (which overwrites and encrypts files in place) – my guess is that you get a copy of the encrypted data synchronized to your cloud backup, but then you have the option of going back to retrieve versions from before the attack. You hope.
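
The idea of “going back to versions from before the attack” can be illustrated locally. The following is a hypothetical Python sketch that keeps numbered copies of a file and stores a new one only when the content has changed. It is not how any particular cloud service works, and the names save_version and list_versions are made up for the example.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def file_digest(path: Path) -> str:
    """SHA-256 of a file's contents (reads the whole file; fine for documents)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def list_versions(name: str, versions_dir: Path) -> list[Path]:
    """Stored versions of `name`, oldest first (report.txt.v0001, ...)."""
    return sorted(versions_dir.glob(f"{name}.v*"))


def save_version(src: Path, versions_dir: Path) -> Optional[Path]:
    """Store a numbered copy of src unless it matches the newest stored one.

    Older versions are never overwritten, so a file that gets encrypted in
    place simply becomes one more version after the good ones.
    """
    versions_dir.mkdir(parents=True, exist_ok=True)
    existing = list_versions(src.name, versions_dir)
    if existing and file_digest(existing[-1]) == file_digest(src):
        return None  # nothing changed since the last saved version
    dest = versions_dir / f"{src.name}.v{len(existing) + 1:04d}"
    shutil.copy2(src, dest)
    return dest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        doc = Path(tmp, "report.txt")
        vault = Path(tmp, "versions")
        doc.write_text("draft one")
        save_version(doc, vault)
        doc.write_text("scrambled by malware")
        save_version(doc, vault)
        print(list_versions(doc.name, vault)[0].read_text())
```

This only helps if the versions directory lives somewhere the malware cannot write, such as the powered-off drive in the pelican case.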

One Last Thing: More Segregation

Look ma, a screenshot – do NOT use twofish, as this example does

If you sell a device or give it to someone else, be careful what happens to the existing storage in the device. Back in the early 00’s I did some consulting for a company that produced cell phone storage encryption. We bought a bunch of used phones on Ebay and nearly every one of them had something interesting on it. Only one out of the dozen we got was wiped (and it was just formatted using the O/S’ format command, which is reversible with forensic software). I have purchased used laptops on Ebay and found people’s files in them. It’s disappointing how banal most people’s data is.*

The only way to keep that sort of thing from happening is to destroy the disk or encrypt your stuff and then de-key the encrypted volume when you hand the device to someone else.

On my laptop, I use Truecrypt (Free, but discontinued and getting hard to find) to create a virtual volume that’s the size of my working files partition. That way, I have C:\windows and inside C:\windows\tc I have a 200gb Truecrypt volume. If I ever give someone my laptop all I need to do is delete that volume to free up the space – as long as the recipient hasn’t got the key and the volume is unmounted, I don’t need to worry about wiping the drive.

I do this for data segregation more than security but I do cross a lot of international borders and I don’t like DHS’ habit of occasionally looking on people’s hard drives. So if you dismount the Truecrypt volume, it’s just a big file and if DHS images your disk they’ll think you’re probably into child porn or drugs and they’ll freak out, but they won’t notice the big file right away and you’ve probably passed on through. Virtual volumes are a great way of segregating your data. Since I work in information security, I periodically get annoyingly helpful people lecturing me about Truecrypt’s flaws**, and I have to patiently tell them that – for me – it’s a system administration tool and my way of securing customer data is to keep it off my laptop.


At my first real job, I was a user support consultant at a university, and had taken to hanging out with another guy, Kevin, who worked in the CS department, who was pretty cool. One day a woman came into my office, tears pouring down her face, incoherent. It turns out she had been trying to format a floppy disk with Format B: and had typed Format C: instead. The hard drive had the only copy of her mostly-finished dissertation on it. “Distraught” was an understatement. Kevin had a copy of Norton Utilities and we went over and looked at her hard drive – it turned out that she had stopped the format before it got very far – it had just wiped the file allocation table and a bunch of the data blocks. Kevin used Norton to reconstruct a file allocation table consisting of the blocks of the document, and eventually re-arranged the document into the correct order. He was the great hero of the hard drive that week, for sure! If she’d had backups, Kevin never would have had such a chance to be a star. By the way, when I’m editing a document I really don’t want to possibly lose, I periodically just email myself copies. That way my in-box has a nice sequence of the various revisions of the document.

(* Anyone want a framework for a novel? Someone buys a laptop on Ebay, gets it, and discovers there’s data on it. The data appears to include video of someone being murdered by the police. The hero falls into a web of awfulness as they try to figure out the owner of the laptop, and what happened. Stay tuned…)

(** Infosec practitioners tend to be very big on secret squirrel “NSA backdoor” legends. In the case of Truecrypt, the author appears to have come under pressure from the US Government to put a backdoor in, refused, been threatened with dire consequences, and dropped developing the software entirely.)

Comments

  1. noneya says

    The cloud service I use (crashplan) is $5/month, and has revision history. Sorry, but your 20th century method is absurd. Please travel forward in time to the present.

  2. kestrel says

    This is great. I had been backing up to thumb drives but I can see I need to do more. And I use the cloud but I might be trusting it too much. That’s OK, I can learn new stuff.

  3. Mano Singham says

    Currently I use a MacBook Air and attach an external drive onto it whenever I am using it at my desk (which is pretty much most of the time). Time Machine makes backups every hour, updating whatever has changed within the last hour. I have two external hard drives that I switch every week so that if I lose my laptop and the external drive that is attached, I still have the other one and have lost at most one week’s work.

    Should I be doing even more, other than keeping the spare drive outside the home in case the house burns down with everything in it?

  4. Johnny Vector says

    I have two Time Machine drives that I rotate daily, for primary backup in case my drive fails or I accidentally delete a version of some file I should have kept. Then, in case the house burns down or I get burgled (they’ll take the drives too I imagine), I use one of those cloud services. I use CrashPlan, which runs nicely in the background once you figure out the UI, which is a little baroque. It’s only 60 bucks a year. It took a few days for the first backup, but incrementals are plenty fast. And if you need to recover faster than the network will do, I think they will (for a fee) send you a hard drive with all your files. Although I don’t see that option any more on the website.

  5. Pierce R. Butler says

    … when I’m editing a document I really don’t want to possibly lose, I periodically just email myself copies.

    That must work a treat for backup, but it doesn’t sound all that secure in a privacy sense. (And it reminds me of a really lame Lee Child “Reacher” novel in which the hero’s mysteriously murdered buddy daily snail-mailed his own PO box with key files on flash drives :-P .)

    Some years ago, working in a clinic where patient privacy was a prime priority, I used to recycle hard drives by erasing everything, then copying (say) a Fonts folder over and over again until the drive was full, rinse & repeat & repeat (plus some big copy-pastes to flush temp storage, etc). That probably wouldn’t’ve done more than slow down the NSA, if they really wanted our data, but I never heard of any problems in the actual fact.

    Fwliw, the pffft! disagrees with part of your post:

    A common misconception is that a damaged printed circuit board (PCB) may be simply replaced during recovery procedures by an identical PCB from a healthy drive. While this may work in rare circumstances on hard disk drives manufactured before 2003, it will not work on newer drives.

  6. says

    noneya@#1:
    The cloud service I use (crashplan) is $5/month, and has revision history. Sorry, but your 20th century method is absurd. Please travel forward in time to the present.

    Crashplan’s a great option. You might want to consider having a near-line copy, just in case. Odds are nothing will happen like Amazon’s EBS crash, and I’m sure the people who used megaupload.com for their backups (yes, there were a few!) thought the same.

    Cloud is a great option, I’d assume it’s always on and I’d still sync to a local offline copy.

    With regards to your suggestion regarding time-travel: I’d be thrilled if you’d intercede with Verizon to remove bandwidth-capped LTE wireless as the only broadband option in my area. Until then, why don’t you think things through a bit more thoroughly and give me a call and cry on my shoulder if you ever lose your precious data.

  7. says

    kestrel@#2:
    I had been backing up to thumb drives but I can see I need to do more. And I use the cloud but I might be trusting it too much.

    I wouldn’t rely on flash drives, a seldom-written SSD is better, but …

    I did say “Cloud is OK” because it’s … well, OK. I totally trust Microsoft, Apple, Google, and Oracle to have my best interests at heart all the time totally yes I do.

  8. says

    Mano@#3:
    Currently I use a MacBook Air and attach an external drive onto it whenever I am using it at my desk (which is pretty much most of the time). Time Machine makes backups every hour, updating whatever has changed within the last hour. I have two external hard drives that I switch every week so that if I lose my laptop and the external drive that is attached, I still have the other one and have lost at most one week’s work.

    That’s pretty good, as long as your house doesn’t burn down or get renovated by the DoD. I understand that TimeMachine is an excellent product; several of my friends use it with external SSDs and SSD-enabled HDDs (which tend to do OK with large-block powered-on writes).

    If you drive someplace regularly, you might want to rotate one of your backup time-capsules through that other location.

  9. says

    Pierce R. Butler@#5:
    That must work a treat for backup, but it doesn’t sound all that secure in a privacy sense.

    Yeah, that’s more in the context of “stuff I am writing on my laptop in an airport departure lounge” and I don’t work on client reports or sensitive information in that context (it never gets on/goes on my laptop at all).

    When I was working on chapters for my last book, I spent 2 days trapped at the Goethe Bar in Frankfurt airport – a great opportunity to encourage focused writing – I did use Email backup because I have had Microsoft Word crashes produce an unrecoverable recovery file. Periodically exiting the editor and emailing myself the saved file seemed like a good hedge against losing a day’s work.

    I know a fellow who used to use PGP to email archives to a ‘just in case’ address under a different PGP key than his own. As it happened, “they” never got him and he was able to log in and delete the emails, eventually.

    With respect to donor PCBs – yes – modern hard drives are a problem. The current crop of people experiencing hard drive mechaniacal failures are using older drives. But the newer drives include things like SSD on the logic board – so you’re swapping dirty drive blocks when you swap the drive board.

  10. says

    By the way:
    The NSA rather amusingly recommended hard drive wiping techniques that were just below their capabilities. Peter Gutmann presented a fun paper on that at USENIX in 1996; presumably the NSA has kept up to date. In my cynical moments I floated a proposal to In-Q-Tel for a free cloud storage service with an FBI backdoor buried in the terms of service agreement. There was nervous laughter. Do you read your terms of service? I don’t have to.

  11. Owlmirror says

    Hollywood Presbyterian Medical Center paid a $17,000 ransom in bitcoin to a hacker who seized control of the hospital’s computer systems and would give back access only when the money was paid, the hospital’s chief executive said Wednesday.

    Huh.

    I’m trying to think of everything that is wrong with this scenario — not just no backups, but no malware/virus protection on critical systems; no application of system patches (assuming a known vulnerability was used to gain access); possibly failure to lock down the administrative account(s) (anything I missed?) — and now I’m wondering if maybe someone at the hospital decided to just give themselves a $17,000 bonus.

    But perhaps I am misunderstanding, and the hospital just decided they didn’t need anyone at all administrating the system, leaving all the vulnerabilities as they were when the system was installed.

  12. says

    Owlmirror@#13:
    Yeah, it’s kind of mind-boggling. For one thing, systems that are certified as medical devices (e.g: the controller in an MRI) ought’n’t be connected willy-nilly to public networks – so there’s one category of systems that may be unprotected but (in principle) isolated. Then there are the server systems, which should be protected and isolated.

    Protection implies availability, availability implies redundancy and reliability. Together those are security.

  13. says

    I’ve been using a system very similar to yours for 15 years now, weekly on the local drives (in a fireproof safe) and monthly on the remote drives. I’m probably on my fourth set of backup hard drives now and just about ready to upgrade to 4TB drives.

    I’ve been slow to get a cloud service because of metered internet, but I’ll probably set something like that up eventually.

    With that said, for only $600 I would be tempted to pay Cryptolocker anyway, just to save the time and effort of loading from backup. I think only spite would keep me from doing that.

  14. magicthighz says

    I like to manually synchronize, because of my friend Norm. Norm had a super cool backup system – it was a network attached storage server running FreeBSD and ZFS, which lived in his studio – a separate building – a couple hundred feet from his house. He set up his systems using some UNIX tools to automatically synchronize his desktop with the storage server. It was a very spiffy arrangement, until the time he was installing a new drive and zeroized the file system on the wrong drive and the automatic synchronizer process he’d set up mirrored the empty filesystem onto the storage server. (Hint: rsync is not always your friend)

    If your friend Norm had taken zfs filesystem snapshots he could have just reverted the backup storage to a previous state and recovered his data.

    Automatic and manual synchronisation aren’t mutually exclusive either. I have a cronjob (cron is the automatic scheduler on UNIX and UNIX-like systems) automatically take snapshots once a day, and I manually create snapshots whenever I see fit. The nice thing about them is that they don’t really take up much disk space unless you often add and delete large files.

    Yes, I’m paranoid about my data :P

    I use ZFS storage (on proper systems, with spare drives in case one or more fail, and ECC memory to prevent bitrot), and external drives that are stored securely when I’m not actually backing up data to them.
    People are welcome to use cloud storage for backups, but I myself simply don’t trust cloud storage providers more than I trust myself.

  15. Pierce R. Butler says

    Marcus Ranum @ # 11: … mechaniacal failures …

    Dunno if you did that on purpose, but it’s good enough I’m a-gonna steal it anyway.

    … older drives.

    How many people do you know who use HDs old enough to enter puberty? Fer crysake, in ’03 we useta measure HDs in mere megabytes!

  16. says

    Pierce Butler@#15:
    Other than the phone company and the Department of Defense?
    Hell, some of those guys are probably still running RA-81s.

    I don’t know what the drive recovery guys are doing for a living, these days. Maybe they’re working for the FBI trying to reconstruct the clintonemail.com server’s disk.

  17. Aaron Mason says

    My organisation is considering cloud backup – I’m doing my best to resist and use storage at one of our other (well-connected) sites because of the fact that we have some 900GB of data to restore which, even at 100mbit, will take the best part of a whole day to restore, and that’s not counting our incremental backups. If we store them at one of our sites, and the worst were to happen, I could just pack it into my car and drive it over there at approx. 5x the data rate (or 32x if I tried to do it from where I am).

    Of course, if there’s a cloud storage mob in short driving distance of the office where I could collect them, that would be even better. I could get someone to run down there and collect the backups and bring them in until I can get there to rebuild and restore (if needed).

    For crypto, I have a process where it’ll actually firewall off the offending system if it trips a file block on the server, hopefully stopping an attack in its tracks. That’s my plan now, but in future I’ll be rolling out Software Restriction Policies to combat things that people download.

  18. inquisitiveraven says

    I had a lovely experience with CompuServe once, where a forum server crashed, and when the sysadmins went to restore from backup, they discovered that the last viable backup they had was from three months earlier. Apparently, they’d been doing nightly backups, but they’d failed to verify that they were good backups.

  19. Aaron Mason says

    @inquisitiveraven Ah, good ol’ Schroedinger’s backups – all backups have both succeeded and failed at the same time and collapse into one upon an attempt to restore them.

  20. Curt Sampson says

    …when I’m editing a document I really don’t want to possibly lose, I periodically just email myself copies. That way my in-box has a nice sequence of the various revisions of the document.

    Just store your documents in a local Git repo, synchronising with a private repo on GitHub, or a cheap-o cloud server, or a server at home accessed via a static IP or tunnel, depending on level of paranoia. (If you’re seriously paranoid, commit encrypted files.) Revision control, synchronisation and conflict management (to a greater or lesser degree depending on the file format) will all be taken care of for you, and it’s a lot more secure than email.