A traffic analysis exercise
3 August 2009
In the 1950’s Lambros D. Callimahos built the Zendian Problem [zip] as an integrated exercise in traffic analysis, cryptanalysis, and communications intelligence operations. The state of cryptology today has moved on, beyond the point where an analyst can rely on plaintext to drive operations. The state of traffic analysis techniques, and the availability of more computing power, requires a new generation of exercises to sharpen the tools and minds of those in the field.
Steven Murdoch and myself have been developing, on (and mostly) off, over the past year an exercise in traffic analysis, and in particular the long term disclosure attacks. The exercise was first presented and used at the Brno FIDIS summer school, and we are now using it as part of an industrial training curriculum.
The exercise consists of an anonymized trace of communications, that were mediated by an anonymity system, that a group of people used to message each other. The message traces are synthetic, but generated based on a real-world social network. Users have favorite communication partners, talk more or less according to the type of relationship and time of day, and may reply to each others messages.
The goal is to apply any disclosure attacks, and de-anonymize and trace as many messages as possible. An oracle is provided that outputs the success rate, and the instructor’s pack includes the original messages as well as the scripts used to simulate the messaging behaviours and the anonymization layer. We tried to keep the exercise and success rates realistic, so do not expect to ever get 100% — significantly better than random is already quite good.
The richness of the messaging behaviour is designed to stress the most advanced statistical disclosure techniques, that make use of social network, replies, and perfect matchings. The literature on statistical disclosure can be found on the exercise page, and an example implementing the simple SDA is provided in the bundle. The family of Disclosure Attacks (devised by Kesdoganet al.) might also be modified and applied to the exercise. Our new attack soon to be presented at PET, using Bayesian Inference, could also be applied:
- George Danezis and Carmela Troncoso. Vida: How to use Bayesian inference to de-anonymize persistent communications. Privacy Enhancing Technologies Symposium (PETS 2009), Seattle, USA.
A couple of caveats: this is an exercise, to help people learn about long term traffic analysis attacks, and allow them to implement the attacks on a rich, but safe, dataset. The objective is to learn.
- It is not a benchmarking tool between attacks. We are not sure that the traffic patterns are typical enough to make sure that when an attack performs better in the setting of our exercise it would perform better on real data.
- It is also not a competition or test. We publish all of the hidden state for instructors, and the random number generator used was not cryptographically strong. The point is not who can get the highest score, but the quality of understanding of the attacks.
Caveats aside, I do hope that the exercise opens a discussion about how we can exchange training or live datasets, formats for evaluating traffic analysis attacks, and a level of standardisation to the interfaces of attack scripts. These will probably be topics for debate over the Privacy Enhancing Technologies Symposium 2009, next week.
Both myself and Steven would be very interested to hear your experiences with the exercise, either if you take it yourself, or give it to a class as an instructor. If you extend the exercise, or generate particular bundles of anonymized datasets, we would also be happy to host them.