With a great deal of nervousness, Ton Siedsma agreed to an experiment. He would load an app on his smartphone that would send all of its activity metadata for one week to Dimitri Tokmetzis, who works on data journalism projects, and Tokmetzis would in turn forward it to the iMinds research team at Ghent University and to Mike Moolenaar, owner of Risk and Security Experts. All three would analyze the metadata to see what they could learn about Siedsma.
The amount they learned was shocking.
What is metadata? It is not the content of a communication but the envelope in which that content is enclosed.
Metadata is not the actual content of the communication but data about the communication: the numbers he calls or WhatsApps, where his phone is at a particular moment, whom he e-mails, the subjects of those e-mails, and the websites he visits.
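To make the envelope-versus-content distinction concrete, here is a minimal sketch using Python's standard `email` module on a made-up message (the addresses, subject and text are all hypothetical). The header fields it returns are the kind of thing officials call "only metadata"; the body is the content. Strictly speaking, mail servers also exchange a separate SMTP envelope; this sketch only looks at the message's own headers.

```python
from email import message_from_string

# A made-up message; every address and line of text here is hypothetical.
RAW = """\
From: ton@example.org
To: merel@example.org
Subject: Train is late again
Date: Mon, 02 Dec 2013 19:45:00 +0100

See you around nine.
"""

def split_metadata(raw: str) -> tuple[dict[str, str], str]:
    """Return (header fields, body) for a simple, non-multipart message."""
    msg = message_from_string(raw)
    # Header fields: who wrote to whom, when, and about what.
    metadata = {key: value for key, value in msg.items()}
    # Body: what was actually said.
    content = msg.get_payload()
    return metadata, content
```

Calling `split_metadata(RAW)` yields a dictionary with the sender, recipient, subject and date, and, separately, the one-line body. Note how much the header fields alone already say about Ton's evening.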
Thanks to Edward Snowden, government officials from President Obama downwards have been forced to acknowledge that they have been collecting people’s metadata, but they have pooh-poohed this as something minimal, even trivial, that should not raise privacy concerns. Yet this limited experiment shows that metadata reveals almost everything about our lives.
This is what we were able to find out from just one week of metadata from Ton Siedsma’s life. Ton is a recent graduate in his early twenties. He receives e-mails about student housing and part-time jobs, which can be concluded from the subject lines and the senders. He works long hours, in part because of his lengthy train commute. He often doesn’t get home until eight o’clock in the evening. Once home, he continues to work until late.
His girlfriend’s name is Merel. It cannot be said for sure whether the two live together. They send each other an average of a hundred WhatsApp messages a day, mostly when Ton is away from home. Before he gets on the train at Amsterdam Central Station, Merel gives him a call. Ton has a sister named Annemieke. She is still a student: judging by the subject line, one of her e-mails is about her thesis. He celebrated Sinterklaas this year and drew lots to decide who would give gifts to whom.
Ton likes to read sports news on nu.nl, nrc.nl and vk.nl. His main interest is cycling, which he also does himself. He also reads Scandinavian thrillers, or at least that’s what he searches for on Google and Yahoo. Other interests of his are philosophy and religion. We suspect that Ton is Christian. He searches for information about religion expert Karen Armstrong, the Gospel of Thomas, ‘the Messiah book Middle Ages’ and symbolism in churches and cathedrals. He gets a lot of information from Wikipedia.
Ton also has a lighter side. He watches YouTube videos like ‘Jerry Seinfeld: Sweatpants’ and Rick Astley’s ‘Never Gonna Give You Up’. He also watches a video by Roy Donders, a Dutch reality TV sensation. On the Internet, he reads about ‘cats wearing tights’, ‘Disney princesses with beards’ and ‘guitars replaced by dogs’. He also searches for a snuggie, with a certain ‘Batman Lounger Blanket With Sleeves’ catching his eye. Oh, and he’s searching intensively for a good headset (with Bluetooth, if possible).
We also suspect that he sympathises with the Dutch ‘Green Left’ political party. Through his work (more about that later), he’s in regular contact with political parties. Green Left is the only party from which he receives e-mails through his Hotmail account. He has had this account longer than his work account.
The analysts found out a great deal more about his private life, all of which is described in the article. Using the metadata, they were even able to obtain key passwords.
But that’s not all. The analysts from the Belgian iMinds compared Ton’s data with a file containing leaked passwords. In early November, Adobe (the company behind the Acrobat PDF reader, Photoshop and Flash Player) announced that a file containing 150 million user names and passwords had been hacked. While the passwords were encrypted, the password hints were not. The analysts could see that some users had the same password as Ton, and those users’ password hints were ‘punk metal’, ‘astrolux’ and ‘another day in paradise’. ‘This quickly led us to Ton Siedsma’s favourite band, Strung Out, and the password “strungout”,’ the analysts write.
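The trick works because, as was widely reported at the time, identical passwords in the leaked Adobe file produced identical encrypted values, so everyone who shares a password can be grouped together and their plaintext hints read side by side. Here is a minimal sketch of that grouping step. The records, user names and ciphertext strings are made up for illustration, and `hints_for` is a hypothetical helper, not the analysts' actual tooling.

```python
from collections import defaultdict

# Hypothetical leaked records: (user, encrypted_password, plaintext_hint).
# Assumption: identical passwords encrypt to identical strings.
LEAK: list[tuple[str, str, str]] = [
    ("ton", "CIPHER-AAA111", ""),
    ("user_a", "CIPHER-AAA111", "punk metal"),
    ("user_b", "CIPHER-AAA111", "astrolux"),
    ("user_c", "CIPHER-AAA111", "another day in paradise"),
    ("user_d", "CIPHER-BBB222", "my dog"),
]

def hints_for(user: str, records: list[tuple[str, str, str]]) -> list[str]:
    """Collect the non-empty hints of everyone who shares `user`'s ciphertext."""
    by_cipher: dict[str, list[str]] = defaultdict(list)
    target = None
    for name, cipher, hint in records:
        by_cipher[cipher].append(hint)
        if name == user:
            target = cipher
    if target is None:
        return []  # user not in the leak
    return [hint for hint in by_cipher[target] if hint]
```

Here `hints_for("ton", LEAK)` returns the three hints quoted above; the remaining step (recognizing a band from ‘punk metal’ and friends) is ordinary guesswork, which is exactly what the analysts describe.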
With this password, they were able to access Ton’s Twitter, Google and Amazon accounts. The analysts provided a screenshot of the direct messages on Twitter which are normally protected, meaning that they could see with whom Ton communicated in confidence. They also showed a few settings of his Google account. And they could order items using Ton’s Amazon account – something which they didn’t actually do. The analysts simply wanted to show how easy it is to access highly sensitive data with just a little information.
Siedsma’s life has been thoroughly exposed, and all of this was learned from just one week of his metadata. Imagine what the NSA could do with its more sophisticated methods, practically unlimited resources, the force of the government behind it, and unrestricted time to spy on people.
As the people who did the study say, “So the next time you hear a minister, security expert or information officer say ‘Oh, but that’s only metadata,’ think of Ton Siedsma – the guy you now know so much about because he shared just a week of metadata with us.”
This is the type of thing that is rarely, if ever, discussed, and even if it were, most people would not read it.
The following article is similar, although a bit tongue-in-cheek. I assign it for reading and discussion in one of my undergraduate classes. It always generates interest.
Marcus Ranum says
They are LYING.
They collect IT ALL. They only “look at” the metadata. Watch closely for the words they use and you’ll see very finely chosen language in play. On the technical side, they have to be collecting all the content, because the metadata is frequently embedded in the content! We also know that they single out certain metadata as indicators of interest, such as the use of PGP or other encryption, which can only be detected from the content of the message (PGP headers are in the message body, not the SMTP envelope).
Lastly, we can infer they are lying because the system has no purpose otherwise. Imagine how pointless it would be for the metadata analysis to red-flag a message but when the analyst goes to investigate, there’s no content! Duh!
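Marcus’s PGP point can be checked with a few lines of Python. An inline OpenPGP message announces itself with an ASCII-armor line, `-----BEGIN PGP MESSAGE-----`, inside the body, so a filter that never reads the body never sees it. This is a minimal sketch on a made-up message whose "ciphertext" is placeholder text; `flags_encryption` is a hypothetical helper. It only covers inline PGP: with PGP/MIME (RFC 3156), the `Content-Type` header itself also signals encryption.

```python
from email import message_from_string

ARMOR = "-----BEGIN PGP MESSAGE-----"

# Hypothetical inline-PGP message; the armored block is placeholder text.
RAW = """\
From: alice@example.org
To: bob@example.org
Subject: hello
Date: Tue, 03 Dec 2013 10:00:00 +0100

-----BEGIN PGP MESSAGE-----

placeholder-not-real-ciphertext
-----END PGP MESSAGE-----
"""

def flags_encryption(raw: str) -> tuple[bool, bool]:
    """Return (armor seen in headers, armor seen in body)."""
    msg = message_from_string(raw)
    in_headers = any(ARMOR in value for _, value in msg.items())
    body = msg.get_payload()
    in_body = isinstance(body, str) and ARMOR in body
    return in_headers, in_body
```

For this message `flags_encryption(RAW)` gives `(False, True)`: the red flag is visible only to someone who has the content.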
Nicely stated Marcus.
Not only did they learn all this from just one week of his “metadata”, they learned it from just his “metadata”. Had they had similar access to the “metadata” of all his contacts, what they could learn about him (and each of his family, social group, and work colleagues) would probably increase by an order of magnitude.
Also, he knew his data was being collected, was going to be analysed, and the results published. I’d imagine those analysing his activity agreed not to reveal anything too embarrassing, but he knew they would know. However much he tried to ignore that knowledge and behave exactly as he normally would, it had to have some effect.
The collected metadata could be reconstructed into poll results and presented to the public as such. These results would be more accurate than polls: people lie to pollsters, but their actions over time reveal more about what they really believe.
The Associated Press reported on May 24, 2011, that Risen is being called as a witness in the Jeffrey Sterling trial for alleged leaks of classified information. Illegally collected information is reconstructed using only legal means, so that the illegally collected material itself never has to be used in prosecutions. Risen is under pressure from the state to reveal his sources, which would complete a legal case for a prosecution whose basis is otherwise known only through illegal means.
Big Brother is watching you.
Mano Singham says
That article was pretty clever!
No, No, No! This is DATA, not ‘metadata’. The number he calls is a specific data item value. So is his location. So is the name of the person he calls. So are the address of the person he e-mails, the subject, and the addresses of the websites he visits. These are ALL data items with specific values relative to an individual, NOT metadata.
Metadata is ‘data about data’, such as:
telephone number: type AN, length 12, format ‘nnn-nnn-nnnn’
first name of addressee: type AN, length 20
middle name: type AN, length 20
last name: type AN, length 25
Calling it ‘metadata’ is just a government ploy to make it sound more innocuous than it really is; they are trying to pretend that they don’t really collect actual data about an individual, but they do, they do. Let’s be more accurate in our own terminology and stop letting them get away with it by default.
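sailor1031’s strict database sense can be sketched in a few lines: the field descriptions are the metadata, and a particular phone number is data that is checked against them. A minimal sketch; reading ‘AN’ as alphanumeric and encoding ‘nnn-nnn-nnnn’ as a regular expression are assumptions, and `FieldSpec` and `conforms` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class FieldSpec:
    """Metadata: a description of a field, not anyone's actual value."""
    name: str
    kind: str                      # "AN", read here as alphanumeric (recorded, not enforced)
    length: int                    # maximum length in characters
    pattern: Optional[str] = None  # optional regex for the required format

SCHEMA = [
    FieldSpec("telephone number", "AN", 12, r"\d{3}-\d{3}-\d{4}"),
    FieldSpec("first name of addressee", "AN", 20),
    FieldSpec("middle name", "AN", 20),
    FieldSpec("last name", "AN", 25),
]

def conforms(spec: FieldSpec, value: str) -> bool:
    """Check a data value (e.g. one person's number) against its field spec."""
    if len(value) > spec.length:
        return False
    if spec.pattern is not None:
        return re.fullmatch(spec.pattern, value) is not None
    return True
```

In this sense `SCHEMA` says nothing about Ton at all; a value like `"555-123-4567"` passed to `conforms(SCHEMA[0], ...)` is the data, and it is exactly the kind of thing the agencies collect.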
Let’s be clear, though: they only learned his password because he was foolish enough to use the name of his favorite band as his password. I won’t condemn him for it; if I’m being entirely honest, I practice fairly poor password hygiene myself. But if his password had been a random phrase, even one that wasn’t particularly “strong” by modern infosec standards, they never could have guessed it using this approach.
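A random phrase of that kind is easy to generate with Python’s standard `secrets` module, which is intended for security-sensitive choices. A minimal sketch; the twelve-word list is a placeholder, and a real list (for example a diceware-style list of 7,776 words) is assumed. `passphrase` and `entropy_bits` are hypothetical helper names.

```python
import math
import secrets
from collections.abc import Sequence

# Placeholder word list; a real passphrase needs a list of thousands of words.
WORDS = ("anchor", "velvet", "orbit", "lantern", "pebble", "thistle",
         "harbor", "quartz", "meadow", "cobalt", "falcon", "juniper")

def passphrase(n_words: int = 5, words: Sequence[str] = WORDS) -> str:
    """Pick words uniformly at random using a cryptographically secure RNG."""
    return "-".join(secrets.choice(words) for _ in range(n_words))

def entropy_bits(n_words: int, list_size: int) -> float:
    """Bits of entropy for n_words drawn uniformly from list_size words."""
    return n_words * math.log2(list_size)
```

The point is that nothing about such a phrase can be inferred from the owner’s metadata. With a 7,776-word list, `entropy_bits(5, 7776)` is about 64.6 bits, whereas a favourite band name can be found with a handful of guesses.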
They use “metadata” precisely because it’s an ambiguous term. In the sense they’re using it (not a wrong sense, incidentally, just a carefully chosen one), the “content” is the data, so the things that identify that particular piece of content (the sender, receiver, time, etc.) are the metadata. The way you’re using it, sailor1031, is the strict database sense, which is not how it gets used in my field (information management, knowledge engineering [yes, that’s a stupid term, but it gets used], some senses of information architecture, semantic [various things], etc.). As Marcus has pointed out, they are still lying about only collecting the “metadata”.
sailor1031 @7 is right, of course. But we’re going to get used to this semi-false distinction between the message we want carried, or the information we want retrieved, and the ancillary data and meta-data necessary to make it work at all. What we now call meta-data is, in many ways, more revealing than the humdrum message it envelops.