Sexism Poisons Everything

That black hole image was something, wasn’t it? For a few days, we all managed to forget the train wreck that is modern politics and celebrate science in its purest form. Alas, for some people there was one problem with M87’s black hole.

Dr. Katie Bouman, in front of a stack of hard drives.

A woman was involved! Despite the evidence that Dr. Bouman played a crucial role and had the expertise, they decided Andrew Chael had done all the work and she was faking it.

So apparently some (I hope very few) people online are using the fact that I am the primary developer of the eht-imaging software library to launch awful and sexist attacks on my colleague and friend Katie Bouman. Stop.

Our papers used three independent imaging software libraries (…). While I wrote much of the code for one of these pipelines, Katie was a huge contributor to the software; it would have never worked without her contributions and

the work of many others who wrote code, debugged, and figured out how to use the code on challenging EHT data. With a few others, Katie also developed the imaging framework that rigorously tested all three codes and shaped the entire paper;

as a result, this is probably the most vetted image in the history of radio interferometry. I’m thrilled Katie is getting recognition for her work and that she’s inspiring people as an example of women’s leadership in STEM. I’m also thrilled she’s pointing

out that this was a team effort including contributions from many junior scientists, including many women junior scientists. Together, we all make each other’s work better; the number of commits doesn’t tell the full story of who was indispensable.

Amusingly, their attempt to beat back social justice within the sciences kinda backfired.

As openly lesbian, gay, bisexual, transgender, queer, intersex, asexual, and other gender/sexual minority (LGBTQIA+) members of the astronomical community, we strongly believe that there is no place for discrimination based on sexual orientation/preference or gender identity/expression. We want to actively maintain and promote a safe, accepting and supportive environment in all our work places. We invite other LGBTQIA+ members of the astronomical community to join us in being visible and to reach out to those who still feel that it is not yet safe for them to be public.

As experts, TAs, instructors, professors and technical staff, we serve as professional role models every day. Let us also become positive examples of members of the LGBTQIA+ community at large.

We also invite everyone in our community, regardless how you identify yourself, to become an ally and make visible your acceptance of LGBTQIA+ people. We urge you to make visible (and audible) your objections to derogatory comments and “jokes” about LGBTQIA+ people.

In the light of the above statements, we, your fellow students, alumni/ae, faculty, coworkers, and friends, sign this message.

[…]
Andrew Chael, Graduate Student, Harvard-Smithsonian Center for Astrophysics
[…]

Yep, the poster boy for those anti-SJWs is an SJW himself!

So while I appreciate the congratulations on a result that I worked hard on for years, if you are congratulating me because you have a sexist vendetta against Katie, please go away and reconsider your priorities in life. Otherwise, stick around — I hope to start tweeting

more about black holes and other subjects I am passionate about — including space, being a gay astronomer, Ursula K. Le Guin, architecture, and musicals. Thanks for following me, and let me know if you have any questions about the EHT!

If you want a simple reason why I spend far more time talking about sexism than religion, this is it. What has done more harm to the world, religion or sexism? Which of the two depends most heavily on poor arguments and evidence? While religion can do good things once in a while, sexism is prevented from that by definition.

Never mind religion, sexism poisons everything.


… Whoops, I should probably read Pharyngula more often. Ah well, my rant at the end was still worth the effort.

Ridiculously Complex

Things have gotten quiet over here, due to SIGGRAPH. Picture a giant box of computer graphics nerds, crossed with a shit-tonne of cash, and you get the basic idea. And the papers! Many are complicated and math-heavy or detail speculative hardware, sprinkled with the slightly strange. Some, though, are fairly accessible.

This panel on colour, in particular, was a treat. I’ve been fascinated by colour and visual perception for years, and was even lucky enough to do two lectures on the subject. It’s a ridiculously complicated subject! For instance, purple isn’t a real colour.

The visible spectrum of light. Copyright Spigget, CC-BY-SA-3.0.

Ok ok, it’s definitely “real” in the sense that you can have the sensation of it, but there is no single wavelength of light associated with it. To make the colour, you have to combine both red-ish and blue-ish light. That might seem strange; isn’t there a purple-ish section at the back of the rainbow labeled “violet?” Since all the colours of the rainbow are “real” in the single-wavelength sense, a red-blue single wavelength must be real too.

It turns out that’s all a trick of the eye. We detect colour through three kinds of cone-shaped photoreceptors, dubbed “long,” “medium,” and “short.” These vary in what sort of light they’re sensitive to, and overlap a surprising amount.

Figure 2, from Bowmaker & Dartnall 1980. Cone response curves have been colourized to approximately their peak colour response.

Your brain determines the colour by weighing the relative response of the cone cells. Light with a wavelength of 650 nanometres tickles the long cone far more than the medium one, and more still than the short cone, and we’ve labeled that colour “red.” With 440nm light, it’s now the short cone that blasts a signal while the medium and long cones are more reserved, so we slap “blue” on that.

Notice that when we get to 400nm light, our long cones start becoming more active, even as the short ones are less so and the medium ones aren’t doing much? Proportionately, the share of “red” is gaining on the “blue,” and our brain interprets that as a mixture of the two colours. Hence, “violet” has that red-blue sensation even though there’s no light arriving from the red end of the spectrum.

To make things even more confusing, your eye doesn’t fire those cone signals directly back to the brain. Instead, ganglion cells merge the “long” and “medium” signals together, firing faster if there’s more “long” than “medium” and vice-versa. That combined signal is itself combined with the “short” signal, firing faster if there’s more “long”/“medium” than “short.” Finally, all the cone and rod cells are merged, firing more if they’re brighter than nominal. That’s why there’s no such thing as a reddish-green or a yellowish-blue: both would be interpreted as an absence of colour.
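As a rough illustration of that wiring, here is a toy Python sketch of the three merged signals. The function name `opponent_channels` and every cone response below are made up for illustration, not measured values, and the rod cells are left out entirely; real ganglion cells are far messier than three subtractions.

```python
def opponent_channels(long, medium, short):
    """Toy opponent-process model; inputs are cone responses in [0, 1]."""
    red_green = long - medium                    # > 0 leans red, < 0 leans green
    yellow_blue = (long + medium) / 2 - short    # > 0 leans yellow, < 0 leans blue
    brightness = (long + medium + short) / 3     # crude stand-in for the brightness signal (rods omitted)
    return red_green, yellow_blue, brightness

# Hypothetical cone responses, chosen only to show the sign of each channel.
reddish = opponent_channels(0.9, 0.3, 0.05)
bluish = opponent_channels(0.1, 0.2, 0.9)
# Equal "long" and "medium" signals cancel, so no red or green is left over.
balanced = opponent_channels(0.5, 0.5, 0.1)
```

Because red and green share a single signed channel, one signal can't report both at once; the same goes for yellow and blue.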

I could (and have!) go on for an hour or two, and yet barely scratch the surface of how we try to standardize what goes on in our heads. Thus why it was cool to see some experts in the field give their own introduction to colour representation at SIGGRAPH. I recommend tuning in.

 

Continued Fractions

If you’ve followed my work for a while, you’ve probably noted my love of low-discrepancy sequences. Any time I want to do a uniform sample, and I’m not sure when I’ll stop, I’ll reach for an additive recurrence: repeatedly sum an irrational number with itself, check if the sum is bigger than one, and if so chop it down. Dirt easy, super-fast, and most of the time it gives great results.

But finding the best irrational numbers to add has been a bit of a juggle. The Wikipedia page recommends primes, but it also claimed this was the best choice of all:

\frac{\sqrt{5} - 1}{2}

I couldn’t see why. I made a half-hearted attempt at digging through the references, but it got too complicated for me and I was more focused on the results, anyway. So I quickly shelved that and returned to just trusting that they worked.

That is, until this Numberphile video explained them with crystal clarity. Not getting the connection? The worst possible number to use in an additive recurrence is a rational number: it’ll start repeating earlier points and you’ll miss at least half the numbers you could have used. This is precisely like having outward spokes on your flower (no seriously, watch the video), and so you’re also looking for any irrational number that’s poorly approximated by any rational number. And, wouldn’t you know it…

\frac{\sqrt{5} - 1}{2} ~=~ \frac{\sqrt{5} + 1}{2} - 1 ~=~ \phi - 1

… I’ve relied on the Golden Ratio without realising it.
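For anyone who wants to see the difference, here is a minimal Python sketch of an additive recurrence. The names `additive_recurrence`, `largest_gap` and `GOLDEN_STEP` are invented for this example, and the rational step of 0.25 is just an arbitrary bad choice to compare against.

```python
import math

def additive_recurrence(alpha, count, start=0.0):
    """Yield `count` points in [0, 1) by repeatedly adding `alpha` and wrapping."""
    x = start
    for _ in range(count):
        x += alpha
        if x >= 1.0:      # "chop it down" once the sum reaches one
            x -= 1.0
        yield x

GOLDEN_STEP = (math.sqrt(5) - 1) / 2   # phi - 1, roughly 0.618

def largest_gap(points):
    """Largest empty stretch between neighbouring points, wrapping from 1 back to 0."""
    pts = sorted(points)
    gaps = [b - a for a, b in zip(pts, pts[1:])]
    gaps.append(1.0 - pts[-1] + pts[0])
    return max(gaps)

rational = list(additive_recurrence(0.25, 100))        # rational step: keeps revisiting old points
golden = list(additive_recurrence(GOLDEN_STEP, 100))   # irrational step: keeps filling the gaps
```

Comparing `len(set(rational))` with `len(set(golden))`, or the two `largest_gap` values, shows the rational step cycling through a handful of points while the golden-ratio step keeps landing somewhere new.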

Want to play around a bit with continued fractions? I whipped up a bit of Go which allows you to translate any number into the integer sequence behind its fraction. Go ahead, muck with the thing and see what patterns pop out.
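If the Go version isn't handy, the same idea fits in a few lines of Python. This is a sketch rather than the program mentioned above: the function name `continued_fraction`, the `max_terms` cap and the `1e-9` cutoff are all assumptions of this example, and floating-point error means only the first handful of terms can be trusted.

```python
import math

def continued_fraction(x, max_terms=10):
    """Return the leading integers [a0; a1, a2, ...] of the continued fraction of x."""
    terms = []
    for _ in range(max_terms):
        whole = math.floor(x)      # peel off the integer part
        terms.append(whole)
        frac = x - whole
        if frac < 1e-9:            # treat a tiny remainder as "the fraction terminated"
            break
        x = 1.0 / frac             # flip what's left over and repeat
    return terms

golden_terms = continued_fraction((1 + math.sqrt(5)) / 2)   # the Golden Ratio
rational_terms = continued_fraction(3.25)                   # a rational number stops early
```

Rational inputs terminate after a few terms, while the Golden Ratio produces nothing but ones. Small terms mean the truncated fractions converge slowly, which is what "poorly approximated by any rational number" looks like in practice.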

All the President’s Bots

Trump appears cranky. It’s raining in New Jersey, so he can’t golf, er, work, which leaves him with no choice but to hate-watch CNN. Vets are angry with him, his policies are hurting his base, the polls have him at his lowest point since taking office, foreign diplomats view him as a clown, and he has nothing to show for his first six months.

He still has friends, though.

"@1lion: brilliant 3 word response to Hilary's 'I'm With You' slogan. @realDonaldTrump twitter.com/seanhannity/"

Aww, at least one person likes Trump!

@1lion on Twitter: STILL hasn't made a single Tweet.

… or maybe not? As an old Cracked article pointed out, Trump had a habit of quoting Tweets that didn’t exist, from people who just joined Twitter or were obvious bots. It was an easy way to make himself look more popular than he was, and stroke his ego. He put this to rest after winning the presidency, but that appears to be changing.

In a tweet on Saturday, President Donald Trump expressed thanks to Twitter user @Protrump45, an account that posted exclusively positive memes about the president. But the woman whose name was linked to the account told Heavy that her identity was stolen and that she planned to file a police report. The victim asserted that her identity was used to sell pro-Trump merchandise.

Although “Nicole Mincey” was the name displayed on the Twitter page, it was not the name used to create the account. The real name of the victim has been withheld to protect her privacy.

The @Protrump45 account also linked to the website Protrump45.com which specialized in Trump propaganda. All of the articles on the website were posted by other Twitter users, which also turned out to be fakes. Mashable noted that the accounts were suspected of being so-called “bots” used to spread propaganda about Trump. Russia has been accused of using similar tactics with bots during the 2016 campaign.

The “Nicole Mincey” scam was remarkably advanced, backed up by everything from paid articles pretending to be journalism to real-life announcers-for-hire singing her praises.

So the latest thing in the Trump resistance is bot-hunting. It’s pretty easy to do, once you’ve seen someone else do it, and the takedown procedure is also a breeze. It also silences a lot of Trump’s best friends.

If only we could do the same to Trump.

Russian Hacking Videos

In the last part of my series on the DNC hack, I mentioned that I watched a seminar hosted by Crowdstrike on how it was done. Some Google searching didn’t turn up much at first, but it did reveal other videos from Crowdstrike and other security firms. I’m still shaking my head at the view counts of some of these; shouldn’t reporters have swarmed them?

Ah well. If you’d like to see how these security companies viewed the DNC hack, here are some videos to check out.

[Read more…]

Russian Hacking and Bayes’ Theorem, Part 4

Ranum’s turn! Old blog post first.

Joking aside, Putin’s right: the ‘attribution’ to Russia was very very poor compared to what security practitioners are capable of. This “it’s from IP addresses associated with Russia” nonsense that the US intelligence community tried to sell is very thin gruel.

Here’s the Joint Analysis Report which has been the focus of so much ire, as well as a summary paragraph of what the US intelligence community is trying to sell:

Previous JARs have not attributed malicious cyber activity to specific countries or threat actors. However, public attribution of these activities to RIS is supported by technical indicators from the U.S. Intelligence Community, DHS, FBI, the private sector, and other entities. This determination expands upon the Joint Statement released October 7, 2016, from the Department of Homeland Security and the Director of National Intelligence on Election Security.

They aren’t using IP addresses or attack signatures to sell attribution; they’re pooling all the analysis they can get their hands on, public and private. It’s short on details, partly for reasons I explained last time, and partly because it makes little sense to repeat details shared elsewhere.

I agree with most experts that the suggestions given are pretty useless, but that’s because defending against spearphishing is hard. Oh, it’s easy to whitelist IP access and lock down a network, but actually do that and your users will revolt and find workarounds that a network administrator can’t monitor.

The reporting on the Russian hacking consistently fails to take into account the fact that the attacks were pretty obvious, basic phishing emails. That’s right up the alley of a 12-year-old. In fact, let me predict something here, first: eventually some 12-year-old is going to phish some politician as a science fair project and there will be great hue and cry. It really is that easy.

I dunno, there’s a fair bit of creativity involved in trickery. You need to do some research to figure out the target’s infrastructure (so you don’t present them with a Gmail login if they’re using an internal Exchange server); research their social connections (an angry email from their boss is far more likely to get a response); find ways to disguise the URL displayed that neither a human nor browser will notice; construct an SSL certificate that the browser will accept; and it helps if you can find a way around two-factor authentication. The amount of programming is minimal, but so what? Computer scientists tend to value the ability to program above everything else, but systems analysis and design are arguably at least as important.

I wouldn’t be surprised to learn of a 12-year-old capable of expert phishing, any more than I’d be surprised that a 12-year-old had entered college or ran their own business or successfully engineered their own product; look at enough cases, and eventually you’ll see something exceptional.

By the way, there are loads of 12-year-old hackers. Go do a search and be amazed! It’s not that the hackers are especially brilliant, unfortunately – it’s more that computer security is generally that bad.

And yes, the state of computer security is fairly abysmal. Poor password choices (if people use passwords at all), poor algorithms, poor protocols, and so on. This is irrelevant, though; the fact that house break-ins are easy to do doesn’t refute the evidence that someone burgled a house.

Hey, that was quick. Next post!

Hornbeck left off two possibilities, but I could probably (if I exerted myself) go on for several pages of possibilities, in order to make assigning prior probabilities more difficult. But first: Hornbeck has left off at least two cases that I’d estimate as quite likely:

H) Some unknown person or persons did it
I) An unskilled hacker or hackers who had access to ‘professional’ tools did it
J) Marcus Ranum did it

I’d argue the first two are handled by D, “A skilled independent hacking team did it,” but it’s true that I assumed a group was behind the attack. Could the DNC hack be pulled off by an individual? In theory, sure, but in practice the scale suggests more than one person involved. For instance,

That link is only one of almost 9,000 links Fancy Bear used to target almost 4,000 individuals from October 2015 to May 2016. Each one of these URLs contained the email and name of the actual target. […]

SecureWorks was tracking known Fancy Bear command and control domains. One of these lead to a Bitly shortlink, which led to the Bitly account, which led to the thousands of Bitly URLs that were later connected to a variety of attacks, including on the Clinton campaign. With this privileged point of view, for example, the researchers saw Fancy Bear using 213 short links targeting 108 email addresses on the hillaryclinton.com domain, as the company explained in a somewhat overlooked report earlier this summer, and as BuzzFeed reported last week.

That SecureWorks report expands on who was targeted.

In March 2016, CTU researchers identified a spearphishing campaign using Bitly accounts to shorten malicious URLs. The targets were similar to a 2015 TG-4127 campaign — individuals in Russia and the former Soviet states, current and former military and government personnel in the U.S. and Europe, individuals working in the defense and government supply chain, and authors and journalists — but also included email accounts linked to the November 2016 United States presidential election. Specific targets include staff working for or associated with Hillary Clinton’s presidential campaign and the Democratic National Committee (DNC), including individuals managing Clinton’s communications, travel, campaign finances, and advising her on policy.

Even that glosses over details, as that list also includes Colin Powell, John Podesta, and William Rinehart. Also bear in mind that all these people were phished over roughly nine months, sometimes multiple times. While it helps that many of the targets used Gmail, when you add up the research involved to craft a good phish, plus the janitorial work that kicks in after a successful attack (scanning and enumeration, second-stage attack generation, data transfer and conversion), the scale of the attack makes it extremely difficult for an individual to pull off.

Similar reasoning applies to an unskilled person/group using professional tools. The multiple stages to a breach would be easy to screw up, unless you had experience carrying these out; the scale of the phish demands a level of organisation that amateurs shouldn’t be capable of. Is it possible? Sure. Likely? No. And in the end, it’s the likelihood we care about.

Besides, this argument tries to have its cake and eat it too. If spearphishing attacks are so easy to carry out, the difference between “unskilled” and “skilled” is small. Merely pulling off this spearphish would make the attackers experienced pros, no matter what their status was beforehand. The difference between hypotheses D and I is trivial.

There’s even more unconscious bias in Hornbeck’s list: he left Guccifer 2.0 off the list as an option. Here, you have someone who has claimed to be responsible left off the list of priors, because Hornbeck’s subconscious presupposition is that “Russians did it” and he implicitly collapsed the prior probability of “Guccifer 2.0” into “Russians” which may or may not be a warranted assumption, but in order to make that assumption, you have to presuppose Russians did it.

Who is Guccifer 2.0, though? Are they a skilled hacking group (hypothesis D), a Kremlin stooge (A), an unknown person or persons (H), or amateurs playing with professional tools (I)? “Guccifer 2.0 did it” is a composite of existing hypothesis subsets, so it makes more sense to focus on those first then drill down.

I added J) because Hornbeck added himself. And, I added myself (as Hornbeck did) to dishonestly bias the sample: both Hornbeck and I know whether or not we did it. Adding myself as an option is biasing the survey by substituting in knowns with my unknowns, and pretending to my audience that they are unknowns.

Ranum may know he didn’t do it, but I don’t know that. What’s obvious to me may not be to someone else, and I have to account for that if I want to do a good analysis. Besides, including myself fed into the general point that we have to be liberal with our hypotheses.

I) is also a problem for the “Russian hackers” argument. As I described the DNC hack appears to have been done using a widely available PHP remote management tool after some kind of initial loader/breach. If you want a copy of it, you can get it from github. Now, have we just altered the ‘priors’ that it was a Russian?

This is being selective with the evidence. Remember “Home Alone?” Harry and Marv used pretty generic means to break into houses: social engineering to learn about their targets, surveillance to verify that information and add more, and even crowbars on the locks. If that was all you knew about their techniques, you’d have no hope of tracking them down; but as luck would have it, Marv insisted on turning on all the faucets as a distinctive calling card. This allowed the police to link them to earlier burglaries they’d done.

Likewise, if all we knew was that a generic PHP loader was used in the DNC hack, the evidence wouldn’t point strongly in any one direction. Instead, we know the intruders also used a toolkit dubbed “XAgent” or “CHOPSTICK,” which has been consistently used by the same group for nearly a decade. No other group appears to use the same tool. This means we can link the DNC hack to earlier ones, and by pooling all the targets assess which actor would be interested in them. As pointed out earlier, these point pretty strongly to the Kremlin.
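The contrast between a generic tool and a distinctive one can be put in Bayesian terms with a small Python sketch. Every number and hypothesis label here is a placeholder invented for illustration, not an estimate from this series; only the shape of the update matters.

```python
def bayes_update(priors, likelihoods):
    """Posterior P(H | E) for each hypothesis, given P(H) and P(E | H)."""
    unnormalised = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalised.values())
    return {h: weight / total for h, weight in unnormalised.items()}

# Placeholder priors over three made-up hypotheses (illustration only).
priors = {"known_group": 0.4, "other_skilled_team": 0.4, "amateurs": 0.2}

# A generic, widely available tool is about equally likely whoever did it...
generic_tool = {"known_group": 0.5, "other_skilled_team": 0.5, "amateurs": 0.5}
# ...while a toolkit only ever seen with one group is far likelier if that group did it.
distinctive_tool = {"known_group": 0.8, "other_skilled_team": 0.05, "amateurs": 0.05}

after_generic = bayes_update(priors, generic_tool)
after_distinctive = bayes_update(priors, distinctive_tool)
```

With equal likelihoods the posterior simply reproduces the priors, so the generic loader tells us nothing. Most of the probability moves to one hypothesis only when the evidence is much more likely under that hypothesis than under the others.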

I don’t think you can even construct a coherent Bayesian argument around the tools involved because there are possibilities:

  1. Guccifer is a Russian spy whose tradecraft is so good that they used basic off the shelf tools
  2. Guccifer is a Chinese spy who knows that Russian spies like a particular toolset and thought it would be funny to appear to be Russian
  3. Guccifer is an American hacker who used basic off the shelf tools
  4. Guccifer is an American computer security professional who works for an anti-malware company who decided to throw a head-fake at the US intelligence services

Quick story: I listened to Crowdstrike’s presentation on the Russian hack of the DNC, and they claimed XAgent/CHOPSTICK’s source code was private. During the Q&A, though, someone mentioned that another security company claimed to have a copy of the source.

The presenters pointed out that this was probably due to a quirk in Linux attacks. There’s a lot of variance in which kernel and libraries will be installed on any given server, so merely copying over the attack binary is prone to break. Because of this variety, though, it’s common to have a compiler installed on the server. So on Linux, attackers tend to copy over their source code, compile it into a binary, and delete the code.

You can see how this could go wrong, though. If the stub responsible for deleting the original code fails, or the operators are quick, you could salvage the source code of XAgent.

“Could.” Note that you need the perfect set of conditions in place. Even if those did occur, and even if the source code bundle contains Windows or OSX source too (excluding that would reduce the amount of data transferred and increase the odds of compilation slightly), the attack binary for those platforms usually needs to be compiled elsewhere. Compilation environments are highly variable yet leave fingerprints all over the executable, such as compilation language and time-stamps. A halfway-savvy IT security firm (such as FireEye) would pick up on those differences and flag the executable as a new variant, at minimum.

And as time went on, the two code bases would diverge as either XAgent’s originators or the lucky ducks with their own copy start modifying it. Eventually, it would be obvious one toolkit was in the hands of another group. And bear in mind, the first usage of XAgent was about a decade ago. If this is someone using a stolen copy of APT28/Fancy Bear’s tool, they’ve either stolen it recently and done an excellent job of replicating the original build environment, or have faked being Russian for a decade without slipping up.

While the above is theoretically possible, there’s no evidence it’s actually happened; as mentioned, despite years of observation by at least a half-dozen groups capable of detecting this event, only APT28 has been observed using XAgent.* None of Ranum’s options fit XAgent, nor do they fit APT28’s tactics either; from FireEye’s first report (they now have a second, FYI),

Since 2007, APT28 has systematically evolved its malware, using flexible and lasting platforms indicative of plans for long-term use. The coding practices evident in the group’s malware suggest both a high level of skill and an interest in complicating reverse engineering efforts.

APT28 malware, in particular the family of modular backdoors that we call CHOPSTICK, indicates a formal code development environment. Such an environment would almost certainly be required to track and define the various modules that can be included in the backdoor at compile time.

And as a reminder, APT28, a.k.a. Fancy Bear, is one of the groups that hacked into the DNC, and is alleged to be part of the Kremlin.

Ranum does say a lot more in that second blog post, but it’s either similar to what Biddle wrote over at The Intercept or amounts to kicking sand at Bayesian statistics. I’ve covered both angles, so the rest isn’t worth tackling in detail.

* [HJH: On top of that, from what I’m reading APT28 prefers malware-free exploits, which use existing code on Windows computers to do their work. None of it works on Linux, so its source code would never be revealed via the claimed method.]

Sometimes, Bugs are Inevitable

Good point:

“Hacking an election is hard, not because of technology — that’s surprisingly easy — but it’s hard to know what’s going to be effective,” said [Bruce] Schneier. “If you look at the last few elections, 2000 was decided in Florida, 2004 in Ohio, the most recent election in a couple counties in Michigan and Pennsylvania, so deciding exactly where to hack is really hard to know.”

But the system’s decentralization is also a vulnerability. There is no strong central government oversight of the election process or the acquisition of voting hardware or software. Likewise, voter registration, maintenance of voter rolls, and vote counting lack any effective national oversight. There is no single authority with the responsibility for safeguarding elections.

You run into this all the time when designing systems. One or more of the requirements are a dilemma, pitting one need against another. Ease-of-use vs. security, authentication vs. anonymity, you know the type. Fixing a bug related to that requirement may cause three more to pop up, and that may not be your fault. The US election system is tough to hack, because it’s a patchwork of incompatible systems; but it’s also easy to hack, because some patches are less secure than others and the borders between patches lack a clear, consistent interface. Solving this sort of problem usually means trashing the system and starting from scratch, with a long, extensive consultation session.

Oh yeah, and an NSA report provides evidence that Russia hacked some distance into US voting systems. The Intercept also outed their source: the reporters somehow forgot that most colour laser printers output a unique steganographic code while printing. That doesn’t speak highly of them. The practice is decades old, and they should have known this, as the Intercept was founded on sharing sensitive documents.

[HJH 2017-06-19: A minor update here.]