The Privacy Enhancing Technologies Symposium for 2008 is getting closer, and many of us are looking forward to a program packed with anonymous communications and traffic analysis research. Paul Syverson and I (George Danezis) will be presenting our recent work on “Bridging and Fingerprinting” (PDF), a family of attacks that affect mix- or onion-routing-based systems in which users do not know all possible relays in the network.

The starting point of our analysis is that it is getting expensive for all clients of anonymity networks (like Tor) to know all anonymizing routers, along with their cryptographic keys and addressing information. There are a few thousand of them right now, and their number will hopefully go up. So would anything bad happen if each client only knew a random subset of those routers, and built anonymous circuits using them? We show that a new class of attacks might become possible.

First there is path or circuit fingerprinting, first introduced in earlier work with Richard Clayton. If Alice builds paths from a small subset of routers that she knows (and that is also known to the adversary), it is likely that the resulting paths and path fragments uniquely identify her: no other client would know all the nodes necessary to build them. If the adversary observes a circuit fragment, they can reduce the set of possible initiators to those clients that know all relays in the fragment — a number smaller than the total number of clients in the network.

Secondly there is bridging, the novel attack we propose. The adversary observes connections or messages going through an honest anonymizing relay, and tries to infer which input corresponds to which output. For any potential connection through the honest node, the resulting path fragment must be known to at least one client in the network. The adversary can therefore eliminate all potential routes that could not have been created by any client, and may end up ‘bridging’ this honest stage.
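
To make the intuition concrete, here is a toy Python sketch (this is not the algorithm from the paper; the client names, relay names and knowledge sets are made up) showing how an observed fragment narrows down the possible initiators, and how impossible routes through an honest relay can be eliminated:

```python
# Toy sketch (not the paper's algorithm): each client only knows a subset
# of the relays, and this partial knowledge leaks through the paths built.
clients = {                                   # made-up partial views
    "alice":   {"R1", "R2", "R3", "R5"},
    "bob":     {"R2", "R4", "R5", "R6"},
    "charlie": {"R1", "R3", "R6", "R7"},
}

def possible_initiators(fragment):
    """Fingerprinting: clients that know every relay in an observed fragment."""
    return {c for c, known in clients.items() if set(fragment) <= known}

print(possible_initiators(["R1", "R3"]))      # {'alice', 'charlie'}

# Bridging: traffic enters honest relay R5 from R2; of the candidate next
# hops, only routes that at least one client could have built survive.
for nxt in ["R4", "R6", "R7"]:
    print(nxt, possible_initiators(["R2", "R5", nxt]))
# R4 {'bob'}   R6 {'bob'}   R7 set()  -> the R7 route can be eliminated
```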

The paper is filled with theory and probabilities describing when these attacks succeed. But what do they mean for anonymous communication designers? First, they are not really a direct threat for low-latency systems like onion routing or Tor, since the effort of collecting the data necessary to perform these attacks is outside the threat model of these systems — an adversary that can perform bridging or fingerprinting is likely to also be able to break those systems by other means. Still, it is important to point out that the only information necessary to perform them is link-level information, not the fine-grained traffic data necessary for timing analysis. They are therefore well within the threat models of mix networks like Mixminion.

Still, this work shows that naively allowing users to know only part of the network reduces the security of anonymous communications. Non-naive strategies for achieving such architectures could include splitting the network, or letting each client learn a random subset of nodes without the adversary finding out which one. The second strategy opens the way for a new type of service: a Private Information Retrieval based directory server. Clients can connect to it and retrieve some router records, without any observer or the service itself learning which ones: this would make both bridging and fingerprinting much harder.

For a long time I have been sceptical about Private Information Retrieval (PIR) schemes and security schemes based on them. My first experience of PIR was in the single-server setting, where the communication and computation complexity make it impractical. Re-reading The Pynchon Gate, I realized that multi-server PIR systems are computationally cheap, bandwidth efficient and relatively simple to implement.
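
As an illustration of why the multi-server setting is so cheap, here is a minimal sketch of a classic two-server XOR-based PIR scheme (in the spirit of what The Pynchon Gate builds on, but not its exact protocol; the database contents and record size are made up):

```python
import secrets

# Minimal two-server XOR-based PIR sketch (illustrative only).  Both servers
# hold the same database of fixed-size records; the client learns record i
# without either server, on its own, learning i.
RECORD_LEN = 16
database = [secrets.token_bytes(RECORD_LEN) for _ in range(8)]   # dummy data

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def server_answer(db, query_bits):
    """Each server XORs together the records selected by its query vector."""
    answer = bytes(RECORD_LEN)
    for record, bit in zip(db, query_bits):
        if bit:
            answer = xor(answer, record)
    return answer

def client_retrieve(i, n):
    q1 = [secrets.randbelow(2) for _ in range(n)]   # uniformly random vector
    q2 = list(q1)
    q2[i] ^= 1                                      # same vector, bit i flipped
    a1 = server_answer(database, q1)                # in reality, two servers
    a2 = server_answer(database, q2)
    return xor(a1, a2)          # everything cancels except record i

assert client_retrieve(3, len(database)) == database[3]
```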

The ‘only’ downside of multi-server PIR is that it is subject to compulsion attacks. A powerful adversary can force the servers, after the fact, to reveal the queries a client made, and from them infer which document was retrieved. This is an inherent limitation of relying on a collection of trusted parties, so it is difficult to eliminate. On the other hand, a system can make the attacker’s task much more expensive and difficult through the use of forward-security mechanisms.

Here is a proposal for achieving forward-secure, compulsion-resistant multi-server PIR: the user contacts the servers one by one, over an encrypted channel providing forward secrecy (OTR would work; so would SSL using signed ephemeral Diffie-Hellman). After the result of the query is returned, the server securely deletes all information about the query, and forgets the session keys associated with the channel. From that point on, an adversary will never be able to retrieve any information about the query or the result, even if they get access to all the secrets on the server.

The user can then proceed to perform the same protocol sequentially with all the other servers participating in the PIR scheme. After the session with each server closes, the user is guaranteed that the query information can never be retrieved in the future. A single honest server, willing to provide strong guarantees against compulsion, is sufficient to guarantee this property, even if all the others log requests and are ready to hand them over to the adversary.

Furthermore, the sequential nature of the requests allows a client to terminate the query early if there is any suspicion that one or more servers are acting under compulsion. This could be detected through a covert channel, a change of key, or unavailability. This technique is a further argument for operators to terminate their services instead of giving in to compulsion.
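
To summarise the proposal, here is a toy, in-process Python sketch of the sequential client flow. None of this is a real network protocol: the Session class, its key and its “suspicious” flag are placeholders standing in for an actual forward-secret channel (OTR, ephemeral-DH TLS) and for whatever signal the client uses to detect compulsion.

```python
import secrets

class Session:
    """Placeholder for a forward-secret channel to one PIR server."""
    def __init__(self, server_name):
        self.server_name = server_name
        self.key = secrets.token_bytes(32)  # stands in for ephemeral session keys
        self.query = None
        self.suspicious = False             # e.g. unexpected key change or outage

    def answer(self, query_share):
        self.query = query_share
        return b"<answer computed as in the XOR-PIR sketch above>"

    def close(self):
        # "Secure deletion": the server forgets the query and the session key,
        # so a compulsion order served later recovers nothing about it.
        self.query = None
        self.key = None

def sequential_retrieve(server_names, query_shares):
    answers = []
    for name, share in zip(server_names, query_shares):
        session = Session(name)
        if session.suspicious:
            # Early termination: stop before the remaining servers ever
            # receive their shares of the query.
            raise RuntimeError(f"aborting, {name} may be under compulsion")
        answers.append(session.answer(share))
        session.close()                     # completed before the next server
    return answers

print(sequential_retrieve(["server-A", "server-B"], [[1, 0, 1], [0, 0, 1]]))
```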

The list of accepted papers for the Privacy Enhancing Technologies Symposium for 2008 (PET) has just been published. The program this year is very heavy on anonymous communication and traffic analysis, cementing the symposium as the de facto venue in these disciplines.

A selection of the papers to appear that make for a very interesting read (thanks to the authors for giving me a copy!) includes:

  • Studying Timing Analysis on the Internet with SubRosa
    Hatim Daginawala and Matthew Wright (University of Texas at Arlington, USA)
  • Perfect Matching Disclosure Attacks
    Carmela Troncoso, Benedikt Gierlichs, Bart Preneel, and Ingrid Verbauwhede (COSIC, K.U. Leuven, Belgium)
  • On the Impact of Social Network Profiling on Anonymity
    Claudia Diaz, Carmela Troncoso (K.U.Leuven, Belgium), and Andrei Serjantov (The Free Haven Project, UK)
  • Chattering Laptops
    Tuomas Aura (Microsoft Research, UK), Janne Lindqvist (Helsinki University of Technology, Finland), Michael Roe (Microsoft Research, UK), and Anish Mohammed (Royal Holloway, University of London, UK)
  • Metrics for Security and Performance in Low-Latency Anonymity Systems
    Steven J. Murdoch and Robert N. M. Watson (Computer Laboratory, University of Cambridge, UK)

I should blog in detail about our own contributions, when the final drafts are in, so that I can provide a link to the full text:

  • Bridging and Fingerprinting: Epistemic Attacks on Route Selection
    George Danezis (Microsoft Research, Cambridge, UK) and Paul Syverson (Naval Research Laboratory, USA)
  • How to Bypass Two Anonymity Revocation Schemes
    George Danezis (Microsoft Research, Cambridge, UK) and Len Sassaman (K.U. Leuven, Belgium)

Let’s not forget that the deadline for the 2008 PET Award is still ahead of us, and good papers from the last two years deserve to be nominated for it!

I was kindly invited by Peeter Laud and Jan Villemson to teach part of the Tartu University course on cryptographic protocols. The theme chosen for the 6 hours of lectures was “Identity and Anonymity”.

It started with an introduction to authentication protocols, illustrated by the state-of-the-art Just Fast Keying, an authenticated Diffie-Hellman exchange with DoS protection and privacy, as well as the PAK protocol for authenticated key exchange using a short password (like a PIN). The second 1.5 hours of the lecture series went into the technical details of Brands’ anonymous credentials, building up from simple zero-knowledge protocols, such as Schnorr identification, all the way to proving that a set of linear relations holds over some attributes.

  • Lecture notes on authentication protocols and anonymous credentials (PDF, PPTX)
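
As a taster of the zero-knowledge building blocks mentioned above, here is a toy Python sketch of Schnorr identification (the group parameters are tiny and purely illustrative; real systems use large standardized groups and constant-time arithmetic):

```python
import secrets

# Toy Schnorr identification: prover convinces verifier it knows x such
# that y = g^x (mod p), without revealing x.
p, q, g = 23, 11, 2          # g has prime order q modulo p (toy values)

x = secrets.randbelow(q)     # prover's secret key
y = pow(g, x, p)             # prover's public key

# 1. Commitment: prover picks a fresh random nonce r and sends t = g^r.
r = secrets.randbelow(q)
t = pow(g, r, p)

# 2. Challenge: verifier sends a random c.
c = secrets.randbelow(q)

# 3. Response: prover sends s = r + c*x (mod q).
s = (r + c * x) % q

# 4. Verification: g^s must equal t * y^c (mod p).
assert pow(g, s, p) == (t * pow(y, c, p)) % p
```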

The second half of the lecture series (3 hours) was devoted to anonymous communications — my pet subject. It contained a review of the Dining Cryptographers protocol for unconditional anonymity, as well as the basic mix constructions used to build practical systems. It also described how onion routing works, how to measure anonymity, and the basics of disclosure attacks on repeated communications.

  • Lecture notes on anonymous communications and traffic analysis (PDF, PPTX)
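
As a small taster of that material, here is a toy three-participant Dining Cryptographers round in Python (the pad length and the message are made up): each pair of participants shares a random pad, everyone announces the XOR of their pads, and the single sender additionally XORs in the message.

```python
import secrets

MSG_LEN = 8
def rand():     return secrets.token_bytes(MSG_LEN)
def xor(a, b):  return bytes(x ^ y for x, y in zip(a, b))

message = b"HELLO!!!"                    # broadcast anonymously by participant 0

k01, k02, k12 = rand(), rand(), rand()   # pairwise shared pads

a0 = xor(xor(k01, k02), message)         # the sender XORs in the message
a1 = xor(k01, k12)
a2 = xor(k02, k12)

# Every pad appears exactly twice, so all pads cancel in the combined XOR:
# the message is revealed, but nothing indicates which participant sent it.
assert xor(xor(a0, a1), a2) == message
```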

I just finished reading:

Anonymous Networking amidst Eavesdroppers (PDF)
by Parvathinathan Venkitasubramaniam, Ting He, and Lang Tong.
Pre-print available as arXiv:0710.4903v1 at arxiv.org, October 2007.

This piece of work has the potential to become a milestone in anonymity research. It puts forward, for the first time, an analytical model of the anonymous communication capacity achievable for a certain degree of quality of service, in terms of latency and throughput.

The key idea behind the anonymity mechanism is that relays transmit according to a schedule that is statistically independent of the messages they receive. This ensures that no traffic characteristics propagate through the network that would allow tracing. When it is time to send a message out, a node checks whether an input message is in the queue, and otherwise sends a dummy. Input messages in the internal queues expire after a certain time to ensure that messages are delivered in a timely fashion (introducing unreliability, and thus a reduction in capacity, instead).
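
Here is a toy sketch of that relaying discipline (my own simplification, not the authors’ exact model; the schedule, arrival times and expiry threshold are made up):

```python
import collections

MAX_AGE = 3            # ticks after which a queued message expires

def run_relay(arrivals, send_times, horizon):
    """arrivals: {tick: message}; send_times: ticks at which the relay fires."""
    queue = collections.deque()          # (arrival_tick, message)
    output = []
    for tick in range(horizon):
        if tick in arrivals:
            queue.append((tick, arrivals[tick]))
        # Drop messages that have waited too long (reliability traded away).
        while queue and tick - queue[0][0] > MAX_AGE:
            queue.popleft()
        if tick in send_times:           # the schedule ignores the inputs
            output.append(queue.popleft()[1] if queue else "DUMMY")
    return output

# The relay fires at the same ticks regardless of when messages arrive.
print(run_relay({1: "m1", 2: "m2"}, send_times={0, 3, 6, 9}, horizon=10))
# -> ['DUMMY', 'm1', 'DUMMY', 'DUMMY']   (m2 expired before it could be sent)
```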

The measure of anonymity used is entropic (but different from previous ones), in that it measures the fraction of uncertainty that remains after the observation. If the adversary has H(S) bits of uncertainty before observing the system, and H(S|Y) bits afterwards, the security metric is defined as a = H(S|Y) / H(S). It is cute that Fano’s inequality allows one to calculate, from this metric, the probability that the adversary identifies the correct set of communicating partners.
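
A quick worked example of the metric (the distributions are invented for illustration, and the conditional entropy here is computed for a single observed value of Y rather than averaged over Y):

```python
from math import log2

def entropy(dist):
    return -sum(p * log2(p) for p in dist if p > 0)

n = 8
prior = [1 / n] * n                        # S uniform over 8 candidate sets
H_S = entropy(prior)                       # = 3 bits

# Suppose the observation leaves a posterior concentrated on 2 candidates.
posterior = [0.5, 0.5] + [0.0] * (n - 2)
H_S_given_Y = entropy(posterior)           # = 1 bit

a = H_S_given_Y / H_S
print(a)                                   # 1/3 of the uncertainty remains
```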

Excellent, highly recommended, work!

I just read a nice paper entitled “P2P: Is Big Brother Watching You?” by Banerjee, Faloutsos and Bhuyan at UC Riverside. They present experiments to determine the probability that a P2P file sharer stumbles upon an IP address thought to be used by anti-P2P entities, potentially to launch lawsuits. Interestingly, without the use of blacklists the probability is very close to 100%, while even simple filtering brings it down to 1%.

What is of interest to the traffic analyst are the techniques used for both surveillance and counter-surveillance. Infiltration of the P2P network can be thought of as an active, Sybil attack, and could potentially be facilitated through the identification of nodes with higher degree or other structural properties. This turns out not to be the case: P2P super-nodes have the same probability of being visited as normal peers.

The second point of interest is the use of anonymity and identification by all parties. Peers use blacklists of IP ranges in order to detect organizations that potentially run surveillance operations. Clearly it would have been of some benefit for those performing surveillance to have access to communication systems that hide their network attachment points. On the other hand, they do try to make it harder to link IP ranges to real-world identities by locating machines in, and routing from and to, parts of the unassigned IP space. As a result, WHOIS lookups do not yield any information about the real-world entities behind the surveillance operations.

I have been reading some papers suggested by Janne Lindqvist on the subject of device identification, an often neglected aspect of traffic analysis.

Remote Physical Device Fingerprinting by Kohno et al. introduced the clock-skew technique for uniquely identifying networked devices that support the TCP timestamp extensions. The key idea is that the skew of each device’s clock is unique, and observing the time drift over a long period allows an adversary to identify it. The technique was extended by Steven Murdoch in his paper Hot or Not: Revealing Hidden Services by their Clock Skew, where he uses it to identify Tor hidden services.
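
A back-of-the-envelope sketch of the skew estimation (the data below are synthetic; real measurements would come from TCP timestamp options and need a fit that is robust to network jitter):

```python
def estimate_skew(local_times, remote_times):
    """Least-squares slope of the (remote - local) offset against local time."""
    offsets = [r - l for l, r in zip(local_times, remote_times)]
    n = len(local_times)
    mean_t = sum(local_times) / n
    mean_o = sum(offsets) / n
    cov = sum((t - mean_t) * (o - mean_o) for t, o in zip(local_times, offsets))
    var = sum((t - mean_t) ** 2 for t in local_times)
    return cov / var                       # seconds of drift per second

# A device whose clock runs 50 ppm fast, observed once a minute for a day.
local = [60.0 * i for i in range(1440)]
remote = [t * (1 + 50e-6) for t in local]
print(f"{estimate_skew(local, remote) * 1e6:.1f} ppm")   # ~50.0 ppm
```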

For simpler devices, which may not have a TCP/IP stack, identification can be done at an even lower level. In their work entitled Implications of Radio Fingerprinting on the Security of Sensor Networks, Rasmussen and Capkun discuss how the shape of the radio transmission signal can be used to uniquely identify many Chipcon CC1000-based sensor nodes. For their identification they use features of the radio transient, which is the window of signal between no transmission and the start of a transmission. By measuring the length, number of peaks, variance of amplitude, and the first wavelet coefficients, they manage to identify the majority (70%) of nodes.
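
A rough sketch of extracting that kind of transient feature (the signal is synthetic and the feature set only loosely follows the paper, which also uses wavelet coefficients):

```python
import numpy as np

def transient_features(signal, threshold=0.1):
    """Crude features of the start-of-transmission transient."""
    above = np.abs(signal) > threshold
    if not above.any():
        return None
    start = int(np.argmax(above))          # first sample above the noise floor
    full_power = 0.9 * np.max(np.abs(signal))
    end = start + int(np.argmax(np.abs(signal[start:]) >= full_power))
    transient = signal[start:end + 1]
    peaks = np.sum((transient[1:-1] > transient[:-2]) &
                   (transient[1:-1] > transient[2:]))      # simple local maxima
    return {
        "length": len(transient),
        "num_peaks": int(peaks),
        "amplitude_variance": float(np.var(transient)),
    }

# Synthetic "transient": silence followed by a ramp-up with some ripple.
t = np.linspace(0, 1, 200)
signal = np.concatenate([np.zeros(50), t * (1 + 0.05 * np.sin(40 * t))])
print(transient_features(signal))
```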

A weakness of the work is the suggestion to use these features to strengthen identification. While this might work in the short term (a similar technique was used to detect first-generation cloned mobile phones), in the long run a strategic adversary should have no trouble faking the transient signatures.

Finally, Corbett et al. present a method to detect the manufacturer of an IEEE 802.11 network device in A Passive Approach to Wireless NIC Identification. The key observation is that the standard supports multiple distinct rates of transmission (1, 2, 5.5 and 11 Mbps), and the switching algorithm between the different rates is not standardized. By inferring it, a good guess can be made as to the vendor of the NIC. Furthermore, no interception is required at the physical layer, since the information about transfer rates is transmitted in the clear. A similar approach is advocated in Passive Data Link Layer 802.11 Wireless Device Driver Fingerprinting by Franklin et al.
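
A toy illustration of the rate-switching idea (the vendor “signatures” and the captured rate sequence are entirely made up):

```python
from collections import Counter

# Hypothetical per-vendor rate-switching signatures: the sets of rate
# transitions (in Mbps) each driver is assumed to perform.
signatures = {
    "vendor_A": {(11, 5.5), (5.5, 2), (2, 1), (1, 2), (2, 5.5), (5.5, 11)},
    "vendor_B": {(11, 2), (2, 11), (11, 1), (1, 11)},
}

def guess_vendor(observed_rates):
    """Score each vendor by how many observed transitions match its signature."""
    transitions = Counter(zip(observed_rates, observed_rates[1:]))
    scores = {
        vendor: sum(n for t, n in transitions.items() if t in allowed)
        for vendor, allowed in signatures.items()
    }
    return max(scores, key=scores.get), scores

# A capture where the NIC steps down and back up one rate at a time.
print(guess_vendor([11, 11, 5.5, 2, 1, 2, 5.5, 11, 11]))
```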

Just finished reading a new paper on Low Cost Traffic Analysis: 

Wiangsripanawan, R., Susilo, W., and Safavi-Naini, R. 2007. Design principles for low latency anonymous network systems secure against timing attacks. In Proceedings of the Fifth Australasian Symposium on ACSW Frontiers – Volume 68 (Ballarat, Australia, January 30 – February 02, 2007).

The authors look afresh at the Low Cost Traffic Analysis attack and how it applies to the Tarzan and MorphMix peer-to-peer anonymity systems. The key observation is that, for the attack to apply, three preconditions need to hold:

  1. A node’s load affects the latency of relayed traffic.
  2. The adversary knows the nodes participating in the protocols.
  3. The adversary must be able to establish a direct connection with all other nodes.

The paper argues that Tarzan’s mimic-based routing structure may invalidate precondition (3). MorphMix, on the other hand, makes it difficult for the adversary to know all nodes in the network (2). As a general rule they advise designers to make comprehensive node discovery difficult, a property that is also in line with the needs of censorship-resistant proposals.

Before getting overly excited about the security of those systems it is worth remembering the inherent unscalability of Tarzan, as well as the recent attacks against MorphMix.

For many years the canonical reference for research in anonymity and anonymous communications has been the Freehaven anonymity bibliography. Claudia Diaz and George Danezis have now written a short introduction to this research field called “A survey of anonymous communication channels”.

The survey includes a discussion of definitions and metrics for anonymity, and then describes systems in order of strength. The focus is on mixing mechanisms and onion routing, but remailers, as well as provable shuffles, are also explained.
