XKeyScore

If you’re like me, you’re probably getting inundated with posts about how the latest revelations show that NSA specifically tracks Tor users and the privacy conscious. I wanted to provide some perspective of how XKeyscore fits into an overall surveillance system before jumping out of our collective pants. As I’ve written about before, the Intelligence Lifecycle (something that the NSA and other Five Eyes know all to well) consists more-or-less of these key phases: Identify, Collect, Process, Analyze, and Disseminate. Some of us are a bit up-in-arms, about Tor users specifically being targeted by the NSA, and while that’s a pretty safe conclusion, I don’t think it takes into account what the full system is really doing.

XKeyscore is part of the “Collect” and “Process” phases of the life cycle where in this case they are collecting your habits and correlating it to an IP address. Greenwald and others will show evidence that the NSA’s goal is to, as they say “collect it all” but this isn’t a literal turn of phrase. It’s true there is a broad collection net, but the NSA is not collecting everything about you. At least not yet. As of right now, the NSA’s collection initiatives lean more towards collecting quantifiable properties which have the highest reward and the lowest storage cost. That’s not as sexy of a phrase to repeat throughout your book tour though.

The conclusion may be (and it’s an obvious one) what you’re seeing of XKeyscore is a tiny fraction of the overall picture. Yes they are paying attention to people that are privacy conscious, yes they are targeting Tor users, yes they are paying attention to people that visit the Tor web page. But as the name implies, this may contribute to an overall “score” to make conclusions about whether you are a high value target or not. What other online habits do you have that they may be paying attention to. Do you have a reddit account subscribed to /r/anarchy or some other subreddit they would consider extremist. Tor users aren’t that special, but this section of the code is a great way to get people nervous.

As someone who has worked on a collection and analysis engine at one time, I can say that one of the first steps during the collection process is tagging useful information, and automatically removing useless information. In this case, tagging Tor users and dropping cat videos. It appears that XKeyscore is using a whitelist of properties to what they consider suspicious activity, which would then be passed on to the “Analysis” phase to help make automated conclusions. The analysis phase is where you get to make predictive conclusions about the properties you have collected so far.

Take the fact that your IP address uses Tor. Add it to a list of extremist subreddits you visit. Multiply it by the number of times you searched for the phrase “how to make a bomb” and now you’re thinking of what the analytics engine of the NSA would look like.

My point is this: If you were the NSA, why wouldn’t ‘you target the privacy aware? People doing “suspicious” (for some definition of the word) activities are going to use the same tools that a “normal” (some other definition) person would. We don’t have a good understanding of what happens to the information after it’s been gathered. We know that XKeyscore will log IP’s that have visited sites of interest or performed searches for “extremist” things like privacy tools. We know that there have been cases where someone’s online activities have been used in court cases. But can’t connect the dots. XKeyscore is just the collection/processing phase and the analytic phase is what’s more important. I think the people of the Tor Project have a pretty decent perspective on this. Their responses have generally just re-iterated that this is exactly the threat model they’ve always planned for and they will keep working on ways to improve and protect its users.