I’ve always been suspicious of power.* One of the warning signs I’m especially alert to is when I see language being bent or waterboarded in the interest of obscuring facts, rather than clarifying them. When a new word suddenly begins to take on a heavily-freighted meaning, e.g. “ethnic cleansing” instead of “genocide,” I immediately ask myself “why that word, and not the other perfectly usable words?” The sudden promotion of or carpet-bombing with a new term is often an indicator that someone has decided to start using a new word with a subtly different definition – basically, lying by redefining the truth.
When Edward Snowden disclosed a tremendous amount about how the NSA had been collecting a great deal of domestic traffic, “metadata” suddenly became the word du jour. The word wasn’t being used to make it easier for people to understand; it was misdirection. When you watch Penn and Teller do a show, you watch Teller because Penn’s job is to wave his hands and catch your eye while Teller does the work; that’s misdirection.
Let me give you a quick example to get started: the NSA’s charter originally was foreign surveillance. In principle, NSA was not supposed to be doing domestic surveillance, at all. The first misdirection was to blow past the fact that it’s been doing domestic surveillance and start arguing that it’s not really surveillance because they’re just analyzing “metadata.” So “foreign surveillance” (1970s) morphs into “surveillance of communications that have a foreign end-point” (1980s) into “surveillance of all communications, and only select/flag what has a foreign end-point” (1990s) to “surveillance of all communications because – LOOK A SHINY THING” (2000s).
You’ll also notice that the language is “analyzing” not “collecting” or “looking at” or “surveillance of” – yet more misdirection:
1. close observation, especially of a suspected spy or criminal.
“observation”, “looking at” – the definitions of “surveillance” seem to be tailor-made to not address the question:
Can a machine engage in surveillance?
My answer is “of course it can” because we’re machines, too. If a computer made of meat and eyeballs and neurons can do “surveillance” so can a computer made of silicon and hard drives and networks. The difference is that computers are substantially faster and not (yet) quite as flexible. This question of “do you need to have a brain and eyes to do ‘surveillance’?” is one that the intelligence apparatus has danced around a great deal. Whenever you are reading discussion about surveillance, look closely at the language used around things involving actual human eyeballs. The intelligence apparatus has defined “surveillance” as “a pair of human eyeballs looks at it deliberately and knowingly.” Go back and read what people from the government are saying about surveillance and you’ll realize it’s almost as if they’ve been carefully coached to use a standard set of talking-points that define “surveillance” in a particular way – a way that happens to be particularly convenient for them. We will return to that point, later.
It Doesn’t Count if There Are No Eyeballs
What is “metadata”?
plural but singular or plural in construction meta·da·ta \-ˈdā-tə, -ˈda- also -ˈdä-\
1. data that provides information about other data
Well, that certainly clears things up, doesn’t it?
Let’s look at some “metadata” about messaging; for example, email. Email moves around the internet using an application protocol called SMTP (Simple Mail Transfer Protocol – RFC 821). There are a lot of other protocols involved, but this is what SMTP mail transfers look like if you look inside the traffic going back and forth on TCP port 25:
S: is the server (the computer that is receiving the email on behalf of one of its users).
C: is the client (the computer that is sending the email, either the originating system of the message, or a forwarder).
In old-school SMTP parlance, the fields exchanged in the SMTP protocol are called the “envelope” (per RFC 822). You can see at lines 4 and 6 the client sends “MAIL From:” and “RCPT To:” commands – this is where the protocol communicates who the message is for and from, so that the server can quickly reject or accept the message. In the example above, if the recipient didn’t exist, the protocol would stop at the SMTP stage.
SMTP/RFC822 mavens call that metadata the “envelope” because it serves exactly the same purpose as a paper envelope with a letter in it: it allows the mail system to get the letter to the right place, without having to disclose all of the contents of the letter. At least, that’s the case in paper mail. In internet terms we might have an encrypted envelope inside the SMTP envelope, etc. But: where’s the “metadata”?
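To make the envelope idea concrete, here is a minimal Python sketch that pulls only the envelope out of an SMTP exchange. The transcript is a hypothetical one written in the S:/C: style described above, not a real capture; the host names, the addresses, and the function name `extract_envelope` are all assumptions made for illustration.

```python
# Hypothetical SMTP session in S:/C: notation. All names are placeholders.
TRANSCRIPT = """\
S: 220 mail.example.org SMTP ready
C: HELO client.example.com
S: 250 mail.example.org
C: MAIL From:<bob@example.com>
S: 250 OK
C: RCPT To:<alice@example.org>
S: 250 OK
C: DATA
S: 354 Start mail input; end with <CRLF>.<CRLF>
C: Subject: Test message
C:
C: body
C: .
S: 250 OK
C: QUIT
S: 221 mail.example.org closing connection
"""


def extract_envelope(transcript):
    """Return sender and recipients using only the envelope commands."""
    envelope = {"mail_from": None, "rcpt_to": []}
    in_data = False
    for line in transcript.splitlines():
        side, _, text = line.partition(":")
        text = text.strip()
        if side != "C":
            continue  # server replies carry no envelope fields
        if in_data:
            # Everything between DATA and the lone "." is message content;
            # this function deliberately never inspects it.
            if text == ".":
                in_data = False
            continue
        upper = text.upper()
        if upper.startswith("MAIL FROM:"):
            envelope["mail_from"] = text[len("MAIL FROM:"):].strip().strip("<>")
        elif upper.startswith("RCPT TO:"):
            envelope["rcpt_to"].append(text[len("RCPT TO:"):].strip().strip("<>"))
        elif upper == "DATA":
            in_data = True
    return envelope


ENVELOPE = extract_envelope(TRANSCRIPT)
```

The point of the sketch is that the who-to-whom record falls out of the protocol without touching anything sent after DATA, which is exactly what the paper-envelope analogy promises.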
There’s a spectrum of metadata about that message above, for example:
- A TCP port 25 connection occurred at date/time from client C to server S, and N bytes were transferred from C to S
- The connection was an SMTP transaction
- The SMTP transaction included MAIL From: email@example.com and RCPT To: firstname.lastname@example.org, RCPT To: email@example.com
- Inside the message ‘envelope’ the message data contained Subject: Test message
- Inside the message ‘envelope’ the message data contained a communication from bob to alice including the string “body” – it may be a message about murdering someone
- Inside the message ‘envelope’ the message data appears to be in English North American corporate dialect
Notice that some of the “metadata” can be derived only by “reading” the message content. I quoted “reading” because of that pesky question of whether something can be “read” without eyeballs and a brain. Guess what the NSA’s lawyers would say?
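The spectrum above can be sketched as a small Python function that sorts derived facts by how deep into the traffic a collector had to go to get them. This is an illustration under stated assumptions: the `CapturedSession` record, its field names, the IP addresses, and the byte count are hypothetical, and nothing here describes how any real collection system is built.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CapturedSession:
    """Hypothetical record of one captured SMTP session."""
    client_ip: str
    server_ip: str
    port: int
    byte_count: int
    mail_from: str
    rcpt_to: List[str]
    data: str  # headers, blank line, body: everything sent after DATA


def metadata_by_layer(session):
    """Group derived facts by the layer they were taken from."""
    headers, _, body = session.data.partition("\n\n")
    subject = None
    for line in headers.splitlines():
        name, _, value = line.partition(":")
        if name.strip().lower() == "subject":
            subject = value.strip()
    return {
        "connection": {"port": session.port, "bytes": session.byte_count,
                       "client": session.client_ip, "server": session.server_ip},
        "envelope": {"mail_from": session.mail_from, "rcpt_to": session.rcpt_to},
        "headers": {"subject": subject},
        "content": {"mentions_body": "body" in body,
                    "word_count": len(body.split())},
    }


SESSION = CapturedSession(
    client_ip="192.0.2.10", server_ip="198.51.100.25", port=25, byte_count=412,
    mail_from="bob@example.com", rcpt_to=["alice@example.org"],
    data="Subject: Test message\n\nbody\n",
)
LAYERS = metadata_by_layer(SESSION)

# The layers that could only be filled in by parsing the message payload.
NEEDS_PAYLOAD = [layer for layer in LAYERS if layer in ("headers", "content")]
```

Every entry in the returned dictionary can be called “data about data,” yet the last two layers exist only because the code opened the message and parsed it.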
The message headers contain other interesting things, such as the system of origin, if it’s a multi-hop message. The headers are part of the SMTP data transaction (i.e. not the RFC822 “envelope”) but are they “metadata”? That X-Originating-Ip: field sure must be yummy to the NSA.
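As a rough illustration of how much routing information sits in the headers, the sketch below uses Python’s standard `email` package to read the `Received:` chain and the `X-Originating-Ip:` field from a message. The message itself is hypothetical: the relay names, IP addresses, and timestamps are placeholders, and `routing_metadata` is a name made up for this example.

```python
from email import message_from_string

# Hypothetical multi-hop message; hosts, IPs and dates are placeholders.
RAW_MESSAGE = """\
Received: from relay.example.net (relay.example.net [198.51.100.7])
\tby mail.example.org; Mon, 1 Jan 2018 12:00:05 +0000
Received: from webmail.example.com (webmail.example.com [192.0.2.10])
\tby relay.example.net; Mon, 1 Jan 2018 12:00:01 +0000
X-Originating-Ip: [203.0.113.42]
From: bob@example.com
To: alice@example.org
Subject: Test message

body
"""


def routing_metadata(raw):
    """Extract hop and origin information from the headers alone."""
    msg = message_from_string(raw)
    received = msg.get_all("Received", [])
    return {
        "hop_count": len(received),
        # Each relay prepends its own Received line, so the last one in the
        # list is the earliest hop. Whitespace from header folding is squeezed.
        "first_hop": " ".join(received[-1].split()) if received else None,
        "originating_ip": (msg.get("X-Originating-Ip") or "").strip("[] ") or None,
    }


ROUTING = routing_metadata(RAW_MESSAGE)
```

None of these fields are in the envelope; they travel inside the DATA portion, alongside the body.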
Systems like PRISM must, clearly, go beyond even that. They are looking for string patterns within messages, then tagging messages as “contains text of interest” – that’s the metadata: a computer ‘analyzed’ the message (we can’t say “looked at” or “read” because computers have no eyeballs) and added to the metadata that the message contained Arabic phrases, or whatever.
You can stretch the question of “what is the metadata?” pretty far, especially if you combine it with the definition of “surveillance” that requires human eyeballs connected to a brain: the message can be parsed for language nuances and flagged as “language=english” and firstname.lastname@example.org can be whitelisted as “two hops from a person of interest” and flagged “interest_level=5.” You can, in other words, keep a straight face while doing just about anything to the contents of someone’s traffic as long as there are no eyeballs involved and the analysis is based on computerized scoring of computer-generated metadata.
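Here is a toy Python sketch of that kind of eyeball-free tagging. Everything in it is an assumption for illustration: the contact graph, the stopword test for “english,” and the scoring rule (chosen only so that two hops yields 5, matching the example in the text) are invented, and no claim is made that any agency scores traffic this way.

```python
from collections import deque

# Hypothetical "who has emailed whom" graph.
CONTACTS = {
    "suspect@example.net": {"carol@example.com"},
    "carol@example.com": {"suspect@example.net", "bob@example.com"},
    "bob@example.com": {"carol@example.com", "alice@example.org"},
    "alice@example.org": {"bob@example.com"},
}
ENGLISH_STOPWORDS = {"the", "and", "is", "to", "of", "a"}


def hops_between(graph, start, target):
    """Breadth-first search: fewest contact links from start to target."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None  # no chain of contacts connects the two


def tag_message(sender, body, person_of_interest):
    """Attach machine-generated tags; note that it parses the body to do so."""
    tags = {}
    if any(word in ENGLISH_STOPWORDS for word in body.lower().split()):
        tags["language"] = "english"
    hops = hops_between(CONTACTS, sender, person_of_interest)
    tags["hops_from_poi"] = hops
    # Arbitrary illustrative rule: closer to the person of interest scores higher.
    tags["interest_level"] = 0 if hops is None else max(0, 7 - hops)
    return tags


TAGS = tag_message("bob@example.com", "the meeting is at noon", "suspect@example.net")
```

The output is a handful of tidy key/value pairs that look like harmless metadata, even though producing them required parsing the message text and mapping the sender’s social graph.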
When the intelligence apparatus says they only analyze “metadata” they often specifically refer to pen registers in telephony applications, because a lot of the case-law their actions depend on is based on copper land lines and how they functioned. A pen register, or “Dialed Number Recorder,” is an old device that recorded the phone numbers that a particular line called. They were developed back in the day before phone companies were expected to provide ubiquitous “transactional” records – in other words, pen registers were what the FBI used before they made the phone companies build pen registers into all of their systems. Along with pen registers was a body of legal analysis that determined that while a person’s conversation was private, the number they called wasn’t exactly as private. That was before cell phones and “stingrays” and a new body of legal analysis that argues that since your phone traffic is going over radio it’s being broadcast and therefore you have no reason to expect it to be private. Note the important distinction they make between “broadcasting” and “transmitting” data. Guess which might be protected? Guess which term they use, in order to not invoke those protections?
Do I seem paranoid about this? You have to be the judge. But watch this section of Booz Allen Hamilton executive Mike McConnell (former Director of the NSA) in a debate with Bruce Schneier and Marc Rotenberg: listen to the minute at 52:58 –
McConnell misdirects into a lie, by saying “It is against the law. It is against the law to tap Mark’s telephone. Unless he is guilty of a crime.” – there are two lies in McConnell’s statement:
- He is careful to say “telephone” because the legal protections on land lines are different from those on cell phones.**
- Tapping a phone has nothing to do with whether or not someone is guilty of a crime; the question is whether there is a warrant or not.
Never mind the fact that people obviously use their cell phones all the time, expecting that their conversations are private; the definition of “private” has also morphed and squizzed around: under the new distended definition of “surveillance” your conversation is “private” as long as no human ears attached to a brain hear it deliberately. McConnell – as a former Director of the NSA – is eminently positioned to understand that this is not about telephones: it’s about communications.
Meanwhile “metadata” about other things you might think are private gets collected because, well, it’s a) “only” metadata and b) you don’t really think it’s private, do you? This argument is what I call the “if it’s not nailed down, it’s OK to take it” argument.*** If you drive your car down the street, you can hardly say that your location is private, right? You may think you’re going about your private business but you’re doing it in a car on a public street, so it’s OK for law enforcement to put a GPS tracker on the outside of your car – just as long as it’s not on the inside of your car, which is “private.” But your location isn’t. If you ship packages via the postal service, or UPS or FedEx, the label is public but the contents are private. If you aren’t aware of it, let me break it to you: metadata about packages shipped within the US is collected and turned over to the intelligence/law enforcement apparatus. Why do you think that the USPS finally implemented barcoded package tracking?
In 2007, Bush did a “signing statement” to the postal reform bill to the effect that the PATRIOT Act provisions applied to paper mail, too. Many privacy advocates misunderstand that to mean that the government wants to be able to look at mail – what they really want is the metadata: the To/From/weight and tags of any package sent.
We should be deeply concerned about business records that are being voluntarily turned over to the intelligence community. PATRIOT required that banks turn transactional data over to law enforcement/intelligence agencies. But what about voluntary turn-over? That’s a topic that’s very seldom discussed, for obvious reasons. The FBI famously tried to get public libraries to turn over book access data, and even more famously tried to learn what Monica Lewinsky was reading during her affair with President Clinton. Credit bureaus, as private companies, are selling their customers’ data to the FBI, as are EZ-pass tollways, airport parking companies, and more. Since many of those are regulated private entities, they are particularly susceptible to having their business’ lifeline cut if they don’t play nicely with authority.
The new thing is face recognition databases. If you’re in an airport or a shopping mall, the airport is a private entity, not an agent of the government. They can do whatever they want with the feeds from their cameras. And that’s without even getting into the security checkpoints. If you use a public parking lot in any interesting place, take a look at the license-plate scanners at the exit-point. We’re busy watching Yahoo! and Apple squabbling about FBI National Security Letters, but we don’t even hear about the airports and parking lots because they’re much, much easier to pressure.
As I described here, traffic analysis is the key: once you can simultaneously place a cell phone handoff transaction with a license-plate scan and a high-likelihood facial database hit, you can be pretty sure that someone was someplace at a certain time. Privacy? What were they doing? It depends! If they were in the Apple store, they were probably looking at Apple stuff. But if they were at the same restaurant as a person of interest, arrived at more or less the same time as they did, left at more or less the same time as they did, and swapped emails with them the night before: it was a meeting.
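The correlation described above can be sketched in a few lines of Python. This is a toy model under loud assumptions: the `Sighting` record, the source labels, the ten-minute slack, and the sample data are all hypothetical, and real traffic analysis would be far messier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sighting:
    """Hypothetical sensor hit tying a person to a place at a time."""
    person: str
    place: str
    minute: int  # minutes since midnight
    source: str  # e.g. "cell_handoff", "plate_scan", "face_match"


def visits(sightings):
    """Collapse sightings into (person, place) -> (first_seen, last_seen)."""
    spans = {}
    for s in sightings:
        first, last = spans.get((s.person, s.place), (s.minute, s.minute))
        spans[(s.person, s.place)] = (min(first, s.minute), max(last, s.minute))
    return spans


def possible_meetings(sightings, emailed_pairs, slack=10):
    """Pairs who shared a place, arrived and left together, and had emailed."""
    spans = visits(sightings)
    keys = sorted(spans)
    hits = []
    for i, (p1, place1) in enumerate(keys):
        for p2, place2 in keys[i + 1:]:
            if place1 != place2:
                continue
            arrive1, leave1 = spans[(p1, place1)]
            arrive2, leave2 = spans[(p2, place2)]
            together = abs(arrive1 - arrive2) <= slack and abs(leave1 - leave2) <= slack
            if together and frozenset((p1, p2)) in emailed_pairs:
                hits.append((p1, p2, place1))
    return hits


SIGHTINGS = [
    Sighting("bob", "restaurant", 1140, "plate_scan"),
    Sighting("bob", "restaurant", 1230, "cell_handoff"),
    Sighting("poi", "restaurant", 1145, "face_match"),
    Sighting("poi", "restaurant", 1225, "cell_handoff"),
    Sighting("carol", "apple_store", 1150, "face_match"),
]
MEETINGS = possible_meetings(SIGHTINGS, {frozenset(("bob", "poi"))})
```

No single record in the input says “meeting.” The conclusion comes entirely from lining up independent pieces of so-called metadata.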
The Razor In The Apple
Here’s one way to convince yourself that the surveillance is vastly greater than they are trying to misdirect you into believing it is: just the metadata isn’t conclusive. The metadata is being used for broad analytics and data clustering, so they can determine what events are interesting, and then the actual eyeballs attached to actual brains actually “read” the data. When you consider that the FISA court has almost never rejected a warrant request, and got 1,500 requests last year, you’re looking at the tip of an iceberg, because the FISA requests are the cases where the metadata triggered an alert that some analyst concluded required more data than they already had in the system. Remember: the data was collected but wasn’t “looked at” so one possibility is that the NSA/FBI analyst clicks a “get me a warrant” button on their interface and it unlocks a FISA request and then they can see the data. Another possibility is that the analyst queries the long-term store, which contains the messages and phone conversations, etc, looks at them, and concludes that something more is needed, in which case the event is logged and a more detailed request is sent. My guess would be the situation looks more like the latter, especially with PRISM.
Here’s why: imagine you’re an analyst and your algorithms pop up a sequence of message-IDs that are flagged as containing communications regarding making a bomb, from someone who has a plane ticket to Washington for the inauguration. And you click on the “get more information” button and nothing comes back because the system collected only the metadata, and the actual emails, ticket data, and phone conversations were not collected. Yeah, right. Such a system would be not just useless, it would be irritatingly useless.
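The thought experiment can be written down as two toy Python classes. Both are hypothetical designs invented for this illustration (the class names, the `get_more_information` method, and the sample message are assumptions); the sketch only shows why a store that keeps tags but discards content cannot answer the analyst’s follow-up question.

```python
class MetadataOnlyStore:
    """Hypothetical system that keeps tags and throws the content away."""

    def __init__(self):
        self._index = {}

    def ingest(self, message_id, content, tags):
        self._index[message_id] = set(tags)  # content is discarded here

    def flagged(self, tag):
        return sorted(mid for mid, tags in self._index.items() if tag in tags)

    def get_more_information(self, message_id):
        return None  # nothing was kept, so there is nothing to show


class ArchivingStore(MetadataOnlyStore):
    """Hypothetical system that tags the content and also archives it."""

    def __init__(self):
        super().__init__()
        self._archive = {}

    def ingest(self, message_id, content, tags):
        super().ingest(message_id, content, tags)
        self._archive[message_id] = content  # latent until somebody asks

    def get_more_information(self, message_id):
        return self._archive.get(message_id)


RESULTS = {}
for store in (MetadataOnlyStore(), ArchivingStore()):
    store.ingest("msg-1", "full text of the message", {"text_of_interest"})
    hit = store.flagged("text_of_interest")[0]
    RESULTS[type(store).__name__] = store.get_more_information(hit)
```

Both stores raise the same alert; only the second can back it up, which is the reason to doubt that anyone would build the first.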
They’re collecting it all, tagging it and extracting metadata, and archiving the source content.**** They have redefined “surveillance” as “human eyes looking at the data” so the collected data is just – latent, not really surveillance data. But it’s still there. The system would be useless, otherwise. All the discussion about metadata is misdirection.
There is a lot more than this, obviously. One reason the “cyberwar” angle is important is because it justifies ubiquitous surveillance. Ditto “hacking” – for the last 20 years the intelligence community has been setting up the idea that hacking is a major threat, because that justifies surveillance of all internet traffic, because all internet traffic is potentially hacking. That’s a topic for another column.
I expect some potential argument about my assertion that a machine can engage in surveillance. Here’s another reason: if a bunch of algorithms can weigh and combine metadata about me, and put me on a TSA watchlist, then the algorithms are not merely doing analysis – they’re doing surveillance and (indeed) enforcement. That’s where we are heading right now. Remember the scene at the beginning of “Brazil” where the civil servant squashes the bug, it falls into the printer, and changes “Tuttle” to “Buttle” and thereby ruins someone’s life? Life is imitating art, thanks to the FBI.
RFC 821 – Simple Mail Transfer Protocol, first version
(* Based on my observation, published elsewhere, that power has no value unless it’s abused. Therefore anyone who wants power over others plans – whether they realize it or no – to abuse it. Thus, they are my enemy.)
(** I’ve seen McConnell and other NSA spokespeople, as well as an FBI spokesperson, very carefully choose their words in this manner. I referenced this particular instance because I remembered McConnell pulled it – and got away with it – in this debate. Rotenberg and Schneier are basically decent and honest people, and therefore are fairly easily snookered in this manner.)
(*** And if I can pry it up, it’s not nailed down.)
(**** How the hell else do you think the FBI was able to get things like David Petraeus’ romantic txt messages, years after they were sent?)