I’ve always been suspicious of power.* One of the warning signs I’m especially alert to is language being bent or waterboarded in the interest of obscuring facts, rather than clarifying them. When a new word suddenly begins to take on a heavily freighted meaning, e.g., “ethnic cleansing” instead of “genocide,” I immediately ask myself “why that word, and not the other perfectly usable words?” The sudden promotion of, or carpet-bombing with, a new term is often an indicator that someone has decided to start using a new word with a subtly different definition – basically, lying by redefining the truth.
When Edward Snowden disclosed a tremendous amount about how the NSA had been collecting a great deal of domestic traffic, “metadata” suddenly became the word du jour. The word wasn’t being used to make it easier for people to understand; it was misdirection. When you watch Penn and Teller do a show, you watch Teller because Penn’s job is to wave his hands and catch your eye while Teller does the work; that’s misdirection.
Let me give you a quick example to get started: the NSA’s charter originally was foreign surveillance. In principle, NSA was not supposed to be doing domestic surveillance, at all. The first misdirection was to blow past the fact that it’s been doing domestic surveillance and start arguing that it’s not really surveillance because they’re just analyzing “metadata.” So “foreign surveillance” (1970s) morphs into “surveillance of communications that have a foreign end-point” (1980s) into “surveillance of all communications, and only select/flag what has a foreign end-point” (1990s) to “surveillance of all communications because – LOOK A SHINY THING” (2000s).
You’ll also notice that the language is “analyzing” not “collecting” or “looking at” or “surveillance of” – yet more misdirection:
sur·veil·lance sərˈvāləns/
noun: surveillance
1. close observation, especially of a suspected spy or criminal.
“observation”, “looking at” – the definitions of “surveillance” seem to be tailor-made to not address the question:
Can a machine engage in surveillance?
My answer is “of course it can” because we’re machines, too. If a computer made of meat and eyeballs and neurons can do “surveillance” so can a computer made of silicon and hard drives and networks. The difference is that computers are substantially faster and not (yet) quite as flexible. This question of “do you need to have a brain and eyes to do ‘surveillance’” is one that the intelligence apparatus has danced around a great deal. Whenever you are reading discussion about surveillance, look closely at the language used around things involving actual human eyeballs. The intelligence apparatus has defined “surveillance” as “a pair of human eyeballs looks at it deliberately and knowingly.” Go back and read what people from the government are saying about surveillance and you’ll realize it’s almost as if they’ve been carefully coached to use a standard set of talking-points that define “surveillance” in a particular way – a way that happens to be particularly convenient for them. We will return to that point, later.
It Doesn’t Count if There Are No Eyeballs
What is “metadata”?
noun metadata
plural but singular or plural in construction meta·da·ta \-ˈdā-tə, -ˈda- also -ˈdä-\
1. data that provides information about other data
Well, that certainly clears things up, doesn’t it?
Let’s look at some “metadata” about messaging; for example, email. Email moves around the internet using an application protocol called SMTP (Simple Mail Transfer Protocol – RFC 821). There are a lot of other protocols involved, but this is what SMTP mail transfers look like if you look inside the traffic going back and forth on TCP port 25:
S: is the server (the computer that is receiving the email on behalf of one of its users)
C: is the client (the computer that is sending the email, either the originating system of the message, or a forwarder)
In old-school SMTP parlance, the fields exchanged in the SMTP protocol are called the “envelope” (per RFC 822). You can see at lines 4 and 6 the client sends “MAIL From:” and “RCPT To:” commands – this is where the protocol communicates who the message is for and from, so that the server can quickly reject or accept the message. In the example above, if the recipient didn’t exist, the protocol would stop at the SMTP stage.
SMTP/RFC822 mavens call that metadata the “envelope” because it serves exactly the same purpose as a paper envelope with a letter in it: it allows the mail system to get the letter to the right place, without having to disclose all of the contents of the letter. At least, that’s the case in paper mail. In internet terms we might have an encrypted envelope inside the SMTP envelope, etc. But: where’s the “metadata”?
There’s a spectrum of metadata about that message above, for example:
- A TCP port 25 connection occurred at date/time from client C to server S, and N bytes were transferred from C to S
- The connection was an SMTP transaction
- The SMTP transaction included MAIL From: bob@example.org and RCPT To: alice@example.org, RCPT To: theboss@example.com
- Inside the message ‘envelope’ the message data contained Subject: Test message
- Inside the message ‘envelope’ the message data contained a communication from bob to alice including the string “body” – it may be a message about murdering someone
- Inside the message ‘envelope’ the message data appears to be in English North American corporate dialect
Notice that some of the “metadata” can be derived only by “reading” the message content. I quoted “reading” because of that pesky question of whether something can be “read” without eyeballs and a brain. Guess what the NSA’s lawyers would say?
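To make the tiers in that list concrete, here is a minimal sketch in Python. The session text is invented for illustration and simply reuses the addresses and subject from the list above; it is not a real capture, and the function name `extract_metadata` and the `mentions_body` flag are my own assumptions. It sorts what the client sends into envelope metadata, header metadata, and metadata that can only be derived by "reading" the body.

```python
import re

# Hypothetical SMTP session, invented for illustration (not a real capture).
# "C:" lines come from the client, "S:" lines from the server.
TRANSCRIPT = """\
S: 220 mail.example.org ESMTP
C: HELO client.example.org
S: 250 mail.example.org
C: MAIL From:<bob@example.org>
S: 250 OK
C: RCPT To:<alice@example.org>
S: 250 OK
C: RCPT To:<theboss@example.com>
S: 250 OK
C: DATA
S: 354 End data with <CR><LF>.<CR><LF>
C: From: Bob <bob@example.org>
C: To: Alice <alice@example.org>
C: Subject: Test message
C:
C: This is the body of the message.
C: .
S: 250 OK: queued
C: QUIT
S: 221 Bye
"""


def extract_metadata(transcript):
    """Sort what the client sent into envelope, header, and content-derived metadata."""
    envelope = {"mail_from": None, "rcpt_to": []}
    headers = {}
    body_lines = []
    state = "commands"  # commands -> headers -> body -> commands
    for raw in transcript.splitlines():
        side, _, text = raw.partition(":")
        if side != "C":
            continue  # server replies carry no message metadata in this sketch
        text = text.strip()
        if state == "commands":
            sender = re.match(r"MAIL From:<([^>]*)>", text, re.IGNORECASE)
            recipient = re.match(r"RCPT To:<([^>]*)>", text, re.IGNORECASE)
            if sender:
                envelope["mail_from"] = sender.group(1)
            elif recipient:
                envelope["rcpt_to"].append(recipient.group(1))
            elif text.upper() == "DATA":
                state = "headers"
        elif text == ".":
            state = "commands"  # a lone dot ends the DATA section
        elif state == "headers":
            if text == "":
                state = "body"  # a blank line separates headers from body
            else:
                name, _, value = text.partition(":")
                headers[name.strip()] = value.strip()
        else:
            body_lines.append(text)
    body = "\n".join(body_lines)
    return {
        "envelope": envelope,
        "headers": headers,
        # This tier exists only because the machine "read" the content.
        "content_derived": {"mentions_body": "body" in body.lower()},
    }


if __name__ == "__main__":
    print(extract_metadata(TRANSCRIPT))
```

The point of the sketch is the last tier: nothing in the code distinguishes "metadata" from "content" except which dictionary a value gets filed under.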
The message headers contain other interesting things, such as the system of origin, if it’s a multi-hop message. The headers are part of the SMTP data transaction (i.e., not the RFC822 “envelope”) but are they “metadata”? That X-Originating-Ip: field sure must be yummy to the NSA.
Systems like PRISM must, clearly, go beyond even that. They are looking for string patterns within messages, then tagging messages as “contains text of interest” – that’s the metadata: a computer ‘analyzed’ the message (we can’t say “looked at” or “read” because computers have no eyeballs) and added to the metadata that the message contained Arabic phrases, or whatever.
You can stretch the question of “what is the metadata?” pretty far, especially if you combine it with the definition of “surveillance” that requires human eyeballs connected to a brain: the message can be parsed for language nuances and flagged as “language=english” and bob@example.org can be whitelisted as “two hops from a person of interest” and flagged “interest_level=5.” You can, in other words, keep a straight face while doing just about anything to the contents of someone’s traffic as long as there are no eyeballs involved and the analysis is based on computerized scoring of computer-generated metadata.
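Here is a rough sketch, in Python, of what that kind of eyeball-free tagging might look like. Everything in it is hypothetical: the contact graph, the person of interest, the stopword list, and the scoring rule are invented for illustration, and I am not claiming any real system works this way. It only shows how easily "metadata" like language=english, a hop count, and interest_level=5 can be computed from content and contacts.

```python
import re

# Hypothetical contact graph: who has exchanged mail with whom.
CONTACTS = {
    "bob@example.org": {"alice@example.org"},
    "alice@example.org": {"bob@example.org", "mallory@example.net"},
    "mallory@example.net": {"alice@example.org"},
}
# Hypothetical watchlist.
PERSONS_OF_INTEREST = {"mallory@example.net"}
# Tiny stopword list used as a crude English detector (an assumption, not a real classifier).
ENGLISH_STOPWORDS = {"the", "is", "of", "and", "to", "a", "this"}


def hops_to_interest(sender, contacts, targets, max_hops=3):
    """Breadth-first search: hops from sender to the nearest target, or None."""
    frontier, seen = {sender}, {sender}
    for hops in range(max_hops + 1):
        if frontier & targets:
            return hops
        # Expand one hop outward, skipping addresses already visited.
        frontier = {n for f in frontier for n in contacts.get(f, ())} - seen
        seen |= frontier
    return None


def tag_message(sender, body):
    """Derive tags from content and contacts without any human reading the message."""
    tags = {}
    words = re.findall(r"[a-z']+", body.lower())
    stopword_hits = sum(w in ENGLISH_STOPWORDS for w in words)
    if words and stopword_hits / len(words) >= 0.3:
        tags["language"] = "english"
    hops = hops_to_interest(sender, CONTACTS, PERSONS_OF_INTEREST)
    if hops is not None:
        tags["hops_from_person_of_interest"] = hops
    # Arbitrary scoring rule, chosen only to mirror the example in the text.
    tags["interest_level"] = 5 if hops is not None and hops <= 2 else 0
    return tags


if __name__ == "__main__":
    print(tag_message("bob@example.org", "This is the body of the message."))
```

Note that the language tag comes from the body itself, yet the output is a small dictionary that anyone could describe, with a straight face, as "only metadata."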
When the intelligence apparatus says they only analyze “metadata” they often specifically refer to pen registers in telephony applications, because a lot of the case-law their actions depend on is based on copper land lines and how they functioned. A pen register, or “Dialed Number Recorder” is an old device that recorded the phone numbers that a particular line called. They were developed back in the day, before phone companies were expected to provide ubiquitous “transactional” records – in other words, pen registers were what the FBI used before they made the phone companies build pen registers into all of their systems. Along with pen registers was a body of legal analysis that determined that while a person’s conversation was private, the number they called wasn’t exactly as private. That was before cell phones and “stingrays” and a new body of legal analysis that argues that since your phone traffic is going over radio it’s being broadcast and therefore you have no reason to expect it to be private. Note the important distinction they make between “broadcasting” and “transmitting” data. Guess which might be protected? Guess which term they use, in order to not invoke those protections?
Do I seem paranoid about this? You have to be the judge. But watch this section of Booz Allen Hamilton executive Mike McConnell (former Director of the NSA) in a debate with Bruce Schneier and Marc Rotenberg: listen to the minute at 52:58 –
McConnell misdirects into a lie, by saying “It is against the law. It is against the law to tap Mark’s telephone. Unless he is guilty of a crime.” – there are two lies in McConnell’s statement:
- He is careful to say “telephone” because the legal protections on land lines are different from those on cell phones.**
- Tapping a phone has nothing to do with whether or not someone is guilty of a crime; the question is whether there is a warrant or not.
Never mind the fact that people obviously use their cell phones all the time, expecting that their conversations are private; the definition of “private” has also morphed and squizzed around: under the new distended definition of “surveillance” your conversation is “private” as long as no human ears attached to a brain hear it deliberately. McConnell – as Director of the NSA – is eminently positioned to understand that this is not about telephones: it’s about communications.
Meanwhile “metadata” about other things you might think are private gets collected because, well, it’s a) “only” metadata and b) you don’t really think it’s private, do you? This argument is what I call the “if it’s not nailed down, it’s OK to take it” argument.*** If you drive your car down the street, you can hardly say that your location is private, right? You may think you’re going about your private business but you’re doing it in a car on a public street, so it’s OK for law enforcement to put a GPS tracker on the outside of your car – just as long as it’s not on the inside of your car, which is “private.” But your location isn’t. If you ship packages via the postal service, or UPS or FedEx, the label is public but the contents are private. If you aren’t aware of it, let me break it to you: metadata about packages shipped within the US is collected and turned over to the intelligence/law enforcement apparatus. Why do you think that the USPS finally implemented barcoded package tracking?
In 2007, Bush did a “signing statement” to the postal reform bill to the effect that the PATRIOT Act provisions applied to paper mail, too. Many privacy advocates misunderstand that to mean that the government wants to be able to look at mail – what they really want is the metadata: the To/From/weight and tags of any package sent.
Other Sources
We should be deeply concerned about business records that are being voluntarily turned over to the intelligence community. PATRIOT required that banks turn transactional data over to law enforcement/intelligence agencies. But what about voluntary turn-over? That’s a topic that’s very seldom discussed, for obvious reasons. The FBI famously tried to get public libraries to turn over book access data, and even more famously tried to learn what Monica Lewinsky was reading during her affair with President Clinton. Credit bureaus, as private companies, are selling their customers’ data to the FBI, as are EZ-pass tollways, airport parking companies, and more. Since many of those are regulated private entities, they are particularly susceptible to having their business’ lifeline cut if they don’t play nicely with authority.
The new thing is face recognition databases. If you’re in an airport or a shopping mall, the airport is a private entity, not an agent of the government. They can do whatever they want with the feeds from their cameras. And that’s without even getting into the security checkpoints. If you use a public parking lot in any interesting place, take a look at the license-plate scanners at the exit-point. We’re busy watching Yahoo! and Apple squabbling about FBI National Security Letters, but we don’t even hear about the airports and parking lots because they’re much, much easier to pressure.
As I described here, traffic analysis is the key: once you can simultaneously place a cell phone handoff transaction with a license-plate scan and a high-likelihood facial database hit, you can be pretty sure that someone was someplace at a certain time. Privacy? What were they doing? It depends! If they were in the Apple store, they were probably looking at Apple stuff. But if they were at the same restaurant as a person of interest, arrived at more or less the same time as they did, left at more or less the same time as they did, and swapped emails with them the night before: it was a meeting.
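The correlation described above is not hard to sketch. The Python below is a toy, under loud assumptions: the `Sighting` record, the source names, the 15-minute slack, and the sample data are all invented for illustration. It merges sightings from different sensors into visits, then reports places where two people arrived and left at more or less the same time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Sighting:
    person: str
    place: str
    source: str  # e.g. "cell_handoff", "plate_scan", "face_match"
    time: datetime


def visits(sightings):
    """Collapse sightings into (first seen, last seen, sources) per person and place."""
    out = {}
    for s in sightings:
        key = (s.person, s.place)
        first, last, sources = out.get(key, (s.time, s.time, set()))
        out[key] = (min(first, s.time), max(last, s.time), sources | {s.source})
    return out


def likely_meetings(sightings, a, b, slack=timedelta(minutes=15)):
    """Places where a and b both arrived and both left within `slack` of each other."""
    v = visits(sightings)
    hits = []
    for place in sorted({place for (person, place) in v if person == a}):
        if (b, place) not in v:
            continue
        a_in, a_out, _ = v[(a, place)]
        b_in, b_out, _ = v[(b, place)]
        if abs(a_in - b_in) <= slack and abs(a_out - b_out) <= slack:
            hits.append(place)
    return hits


# Hypothetical sample data; the date and names are arbitrary.
T0 = datetime(2016, 11, 1, 19, 0)
SIGHTINGS = [
    Sighting("subject", "restaurant", "plate_scan", T0),
    Sighting("subject", "restaurant", "cell_handoff", T0 + timedelta(minutes=90)),
    Sighting("poi", "restaurant", "face_match", T0 + timedelta(minutes=5)),
    Sighting("poi", "restaurant", "cell_handoff", T0 + timedelta(minutes=95)),
    Sighting("subject", "apple_store", "face_match", T0 - timedelta(minutes=180)),
    Sighting("poi", "apple_store", "cell_handoff", T0 - timedelta(minutes=600)),
]

if __name__ == "__main__":
    print(likely_meetings(SIGHTINGS, "subject", "poi"))
```

With the sample data, both people pass through the Apple store hours apart, which means nothing, but their restaurant visits line up at both ends, which is what gets flagged.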
The Razor In The Apple
Here’s one way to convince yourself that the surveillance is vastly greater than they are trying to misdirect you into believing it is: just the metadata isn’t conclusive. The metadata is being used for broad analytics and data clustering, so they can determine what events are interesting, and then the actual eyeballs attached to actual brains actually “read” the data. When you consider that the FISA court has almost never rejected a warrant request, and got 1,500 requests last year, you’re looking at the tip of an iceberg, because the FISA requests are the cases where the metadata triggered an alert that some analyst concluded required more data than they already had in the system. Remember: the data was collected but wasn’t “looked at” so one possibility is that the NSA/FBI analyst clicks a “get me a warrant” button on their interface and it unlocks a FISA request and then they can see the data. Another possibility is that the analyst queries the long-term store, which contains the messages and phone conversations, etc, looks at them, and concludes that something more is needed, in which case the event is logged and a more detailed request is sent. My guess would be the situation looks more like the latter, especially with PRISM.
Here’s why: imagine you’re an analyst and your algorithms pop up a sequence of message-IDs that are flagged as containing communications regarding making a bomb, from someone who has a plane ticket to Washington for the inauguration. And you click on the “get more information” button and nothing comes back because the system collected only the metadata, and the actual emails, ticket data, and phone conversations were not collected. Yeah, right. Such a system would be not just useless, it would be irritatingly useless.
They’re collecting it all, tagging it and extracting metadata, and archiving the source content.**** They have redefined “surveillance” as “human eyes looking at the data” so the collected data is just – latent, not really surveillance data. But it’s still there. The system would be useless, otherwise. All the discussion about metadata is misdirection.
There is a lot more than this, obviously. One reason the “cyberwar” angle is important is that it justifies ubiquitous surveillance. Ditto “hacking” – for the last 20 years the intelligence community has been setting up the idea that hacking is a major threat, because that justifies surveillance of all internet traffic, because all internet traffic is potentially hacking. That’s a topic for another column.
I expect some potential argument about my assertion that a machine can engage in surveillance. Here’s another reason: if a bunch of algorithms can weigh and combine metadata about me, and put me on a TSA watchlist, then the algorithms are not merely doing analysis – they’re doing surveillance and (indeed) enforcement. That’s where we are heading right now. Remember the scene at the beginning of “Brazil” where the civil servant squashes the bug, it falls into the printer, and changes “Tuttle” to “Buttle” and thereby ruins someone’s life? Life is imitating art, thanks to the FBI.
RFC 821 – Simple Mail Transfer Protocol, first version
(* Based on my observation, published elsewhere, that power has no value unless it’s abused. Therefore anyone who wants power over others plans – whether they realize it or no – to abuse it. Thus, they are my enemy.)
(** I’ve seen McConnell and other NSA spokespeople, as well as an FBI spokesperson, very carefully choose their words in this manner. I referenced this particular instance because I remembered McConnell pulled it – and got away with it – in this debate. Rotenberg and Schneier are basically decent and honest people, and therefore are fairly easily snookered in this manner.)
(*** And if I can pry it up, it’s not nailed down.)
(**** How the hell else do you think the FBI was able to get things like David Petraeus’ romantic txt messages, years after they were sent?)
Dunc says
Of course, the other way around this is “intelligence sharing”… “Sharing” sounds good, right? Surely nobody could be against sharing intelligence? But it means that the NSA can spy on British citizens, and GCHQ can spy on US citizens, and then they pool the results… Hey presto! Nobody is technically breaching their restrictions on domestic surveillance, but the result is the same.
anat says
In King County, WA people are up in arms because the county bought commercially available info in order to guess who might be the owner of an unregistered pet, and send them a letter threatening them with a fine if they don’t register their pet. Obviously some people got letters despite not owning pets (maybe they buy pet food to donate?) or despite their pets being registered (perhaps under the name of another member of the household). It might have gone quietly if the letter had been worded more softly. What people aren’t talking about? Their information is out there. Anyone can use it for any purpose imaginable.
Dunc says
anat: As Jello Biafra once put it:
For every spy in government
There’s 50 private eyes
Who round up dirt on you to keep on file
Then sell the file
Marcus Ranum says
Dunc@#1:
Of course, the other way around this is “intelligence sharing”…
You are, of course, referring to the “5 eyes” UKUSA treaty group, in which CSE (Canada), GCHQ (UK), NSA (USA), ASD (Australia), and New Zealand all share intelligence. That has been going on for some time. In fact, in some cases, the NSA hosted collection on UK soil, “deniably” collecting communications between England and Ireland, which they then shared back to GCHQ.
I have had the pleasure of chatting about this stuff with Duncan Campbell, who has done good work reporting on some of the weirder goings-on.
https://theintercept.com/2015/08/03/life-unmasking-british-eavesdroppers/
He’s the fellow who disclosed the NSA tower that was right in the path of the microwave communications link between England and Ireland.
It’s another of those ways that the various agencies deliberately bypass inconvenient laws, by parsing around them. It’s as if they’re a bunch of kids, or something, that simply won’t take “no” for an answer from their parents. Except, in this case, the parents deliberately look the other way. So they can pretend to be shocked.
Marcus Ranum says
anat@#2:
What people aren’t talking about? Their information is out there. Anyone can use it for any purpose imaginable.
There was a similar case in Oklahoma, in which the state police proposed to send speeding citations to any owner of a car that went between tollbooths faster than they could be expected to if they were following the speed limit. It’s a trivial application of data reduction but – as you can imagine – there was a tremendous, uh, “shit fit” and the upshot was that: the data is still collected, it’s just not used in that particular way at this particular time.
Marcus Ranum says
Dunc@#3:
There was some discussion a few years ago about a fellow that had purchased a variety of customer databases (this was in the UK) and had then cross-scrubbed them against a few stolen databases collected off pastebin. Then, using a set of Bayesian classifiers, he extracted subsets of the data that matched various profiles – it was damn scary, e.g.: “cancer patients (by cancer)” “people who are likely dead” “people who are pregnant.” The results were extremely accurate – an error rate of less than 1 in 10,000. There are certain signifiers that are highly indicative which can be combined with multiple others.
I believe the project was buried. The originators of the dataset proposed to sell subsets of the data for marketing purposes: that way an insurance company could say “Oh we didn’t do all that correlation – that’d be wrong – we just bought a dataset from a commercial company that specializes in that kind of thing… (flutters eyelashes)” Nowadays the “big data” folks have realized that to avoid regulators you have to do it all in-house.
Pierce R. Butler says
The intelligence apparatus has defined “surveillance” as “a pair of human eyeballs looks at it deliberately and knowingly.”
No doubt the NSA equipment locker contains multiple sets of eyepatches to allow key staff to function within regulations while receiving information they might Need to Know.
militantagnostic says
There is currently a big scandal in Quebec over the provincial police and the Montreal police tracking reporters via their cell phones in an attempt to find the reporters’ sources.
Marcus Ranum says
militantagnostic@#8:
Were the systems put in for some other “appropriate” use? I.e.: terrorism or whatever?
These systems always get put in place for the best reasons. And then they get used for something entirely different. But it’s important to understand that the people who put those systems in place always planned to abuse them. Unless they were so remarkably stupid that it somehow never occurred to them that cell phone tracking could be used to track reporters, only drug dealers. The manufacturers of the systems definitely know they are designed to be abused.
Jake Harban says
The problem is that power always exists. You can’t get rid of it. And preventing the abuse of power requires power.
Marcus Ranum says
Jake Harban@#10:
The problem is that power always exists. You can’t get rid of it.
I’m not aware of any conservation law of power; it seems to me that power is a potential latent in any situation, and it does not always automatically need to be teased out to where it can be grasped and manipulated. Further, it seems to me that it can be gotten rid of, by coupling it with responsibility, or better still by redistributing it so that it’s no longer accessible without checks and balances. The “rule of law” is one example of a fair attempt to redistribute power – though, admittedly, powerful people seem to always be eager to place themselves above the law.
And preventing the abuse of power requires power.
Maybe… First off, the best way to prevent the abuse of power is to distribute it among multiple agendas (that’s a roundabout way of describing democracy). Secondly, I argue that leadership and power are two different things – in some situations it might be necessary to temporarily invest a sort of power in a leader, in the sense that “we are willing to follow your lead as long as we agree with you, during this crisis.” When the crisis is over, the willingness of the people to follow evaporates along with it. So I’d say it’s plausible that a people might temporarily form a power-bloc to disempower someone who was attempting to seize power; that’s exactly what happened in Rome, to Tiberius Gracchus: “Now that the consul has betrayed the state, let every man who wishes to uphold the laws follow me!” and they beat him to death in the Senate. Following that moment of leadership, Scipio Nasica did not attempt to maintain power; it evaporated.