Ed Felten’s blog relays a report from the New York Times explaining the target selection strategies used to detect suspect communities in vast call graphs. The full paper reference, containing some novel data structures to support the processing, is:

Some further work from the same author includes a Bayesian re-framing of the problem and the solution in Deepak Agarwal, Daryl Pregibon, “Enhancing Communities of Interest Using Bayesian Stochastic Blockmodels”, presented at SDM 2004. Others have used a similar paradigm to study email messaging networks, particularly with a view to detecting spam messages (Lisa Johansen, Michael Rowell, Kevin Butler, and Patrick McDaniel. Email Communities of Interest) or to preventing DDoS attacks (Patrick Verkaik, Oliver Spatscheck, Jacobus Van der Merwe, and Alex C. Snoeren. PRIMED: Community-of-Interest-Based DDoS Mitigation). It seems that the COI meme became popular within AT&T, leading to more framework papers like William Aiello, Charles Kalmanek, Patrick McDaniel, Subhabrata Sen, Oliver Spatscheck, and Jacobus Van der Merwe, “Analysis of Communities of Interest in Data Networks”.

A key shortcoming of these papers when it comes to security is that they do not consider a strategic adversary who aims to foil detection. The work by Sudarshan S. Chawathe, “Tracking Hidden Groups Using Communications”, addresses this side of the problem.

I just read a nice paper entitled “P2P: Is Big Brother Watching You?” by Banerjee, Faloutsos and Bhuyan at UC Riverside. They present experiments to determine the probability that a P2P file sharer stumbles upon an IP address thought to be used by anti-P2P entities, potentially to launch lawsuits. Interestingly, without the use of blacklists the probability is very close to 100%, while even simple filtering brings it down to 1%.

What is of interest to the traffic analyst are the techniques used for both surveillance and counter-surveillance. Infiltration of the P2P network can be thought of as an active Sybil attack, and could potentially be facilitated by identifying nodes with higher degree or other structural properties. This seems not to be the case: P2P super-nodes turn out to have the same probability of being visited as normal peers.

The second point of interest is the use of anonymity and identification by all parties. Peers use blacklists of IP ranges in order to detect organizations that may be running surveillance operations. Clearly it would have been of some benefit for those performing surveillance to have access to communication systems that hide their network attachment points. On the other hand, they do try to make it harder to link IP ranges to real-world identities by locating machines in, and routing to and from, part of the unassigned IP space. As a result, WHOIS lookups do not yield any information about the real-world entity behind the surveillance operations.
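The blacklist check that peers perform can be sketched in a few lines of Python. This is only an illustration of range-based filtering, not the code used in the paper: the function name is_blocked is made up, and the ranges below are reserved documentation prefixes standing in for a real blacklist.

```python
import ipaddress

# Hypothetical blacklist: these are reserved documentation ranges,
# used here as stand-ins for ranges attributed to anti-P2P entities.
BLOCKED_RANGES = ["192.0.2.0/24", "198.51.100.0/24"]

def is_blocked(peer_ip, blocked_ranges=BLOCKED_RANGES):
    """Return True if peer_ip falls inside any blacklisted range."""
    addr = ipaddress.ip_address(peer_ip)
    return any(addr in ipaddress.ip_network(r) for r in blocked_ranges)

def filter_peers(peers, blocked_ranges=BLOCKED_RANGES):
    """Keep only the peers that are not covered by the blacklist."""
    return [p for p in peers if not is_blocked(p, blocked_ranges)]
```

A client would run filter_peers over the candidate peer list before opening any connection. The check is only as good as the list itself, which is why ranges without a usable WHOIS entry are hard to attribute.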

I have been reading some papers suggested by Janne Lindqvist on the subject of device identification, an often neglected aspect of traffic analysis.

Remote Physical Device Fingerprinting by Kohno et al. introduced the clock skew technique for uniquely identifying networked devices that support the TCP timestamp extensions. The key idea is that the skew of each device’s clock is unique, and observing the time drift over a long period allows an adversary to identify it. The technique has been extended by Steven Murdoch in his paper Hot or Not: Revealing Hidden Services by their Clock Skew, where he uses it to identify Tor hidden services.
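To make the idea of measuring drift concrete, here is a minimal sketch that estimates skew as the slope of the remote clock's offset against the observer's clock. It assumes the remote timestamps have already been converted to seconds, and it uses an ordinary least-squares fit as a simple stand-in; the fitting procedure in the paper is designed to cope with network delay and is not reproduced here. The function name and the sample format are illustrative.

```python
from statistics import mean

def estimate_skew_ppm(samples):
    """Estimate clock skew in parts per million.

    samples: list of (local_time_s, remote_time_s) pairs, where
    local_time_s is when the observer saw the packet and remote_time_s
    is the sender's timestamp converted to seconds (assumed format).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to fit a slope")
    t0, r0 = samples[0]
    xs = [t - t0 for t, _ in samples]                # elapsed local time
    ys = [(r - r0) - (t - t0) for t, r in samples]   # observed offset
    x_bar, y_bar = mean(xs), mean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    if den == 0:
        raise ValueError("samples must span a non-zero time interval")
    return 1e6 * num / den                           # slope, scaled to ppm
```

Two observations of the same device taken at different times should then produce similar estimates, while different devices should produce different ones. Least squares is sensitive to delay spikes, so a real measurement would need a more robust fit.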

For simpler devices that may not have a TCP/IP stack, identification can be done at an even lower level. In their work entitled Implications of Radio Fingerprinting on the Security of Sensor Networks, Rasmussen and Capkun discuss how the shape of the radio transmission signal can be used to uniquely identify many Chipcon 1000 sensor nodes. For their identification they use features of the radio transient, which is the window of signal between no transmission and the start of a transmission. By measuring the length, number of peaks, variance of amplitude, and the first wavelet coefficients of the transient, they manage to identify the majority (70%) of nodes.
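Three of the features listed above are simple enough to sketch directly. The snippet assumes the transient has already been cut out of the recording and is given as a list of amplitude samples. Locating the transient and computing the wavelet coefficients are left out, and the function name is hypothetical.

```python
from statistics import pvariance

def transient_features(amplitudes):
    """Compute simple features of an already-extracted radio transient.

    amplitudes: list of amplitude samples covering only the transient
    window (assumed to be isolated by an earlier detection step).
    """
    # A peak is a sample strictly larger than both of its neighbours.
    peaks = sum(
        1
        for i in range(1, len(amplitudes) - 1)
        if amplitudes[i - 1] < amplitudes[i] > amplitudes[i + 1]
    )
    return {
        "length": len(amplitudes),           # duration in samples
        "peaks": peaks,                      # number of local maxima
        "variance": pvariance(amplitudes),   # spread of the amplitude
    }
```

A fingerprint would then be a vector of such features per node, compared against fresh measurements with some distance measure.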

A weakness of the work is the suggestion to use these features to strengthen identification. While this might work in the short term (a similar technique was used to detect first generation cloned mobile phones), in the long run a strategic adversary should have no trouble faking the transient’s signatures.

Finally, Corbett et al. present a method to detect the manufacturer of an IEEE 802.11 network device in A Passive Approach to Wireless NIC Identification. The key observation is that the standard supports multiple distinct rates of transmission (1, 2, 5.5 and 11 Mbps) and the algorithm for switching between the different rates is not standardized. By inferring it, a good guess can be made as to the vendor of the NIC. Furthermore, no interception is required at the physical layer, since the information about rates of transfer is transmitted in the clear. A similar approach is advocated in Passive Data Link Layer 802.11 Wireless Device Driver Fingerprinting by Franklin et al.
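One naive way to turn observed rates into something comparable across devices is to count how often a card moves from one rate to another. This is not the analysis used in either paper, only a sketch of the general idea under the assumption that the per-frame rates have already been captured. The function name is made up.

```python
from collections import Counter

def rate_transition_profile(rates):
    """Summarise rate switching as relative frequencies of transitions.

    rates: sequence of per-frame transmission rates in Mbps, in the
    order the frames were observed (assumed to come from one device).
    """
    transitions = Counter(zip(rates, rates[1:]))   # consecutive pairs
    total = sum(transitions.values())
    if total == 0:
        return {}
    return {pair: count / total for pair, count in transitions.items()}
```

Profiles from cards with different switching algorithms should differ, for example in how readily they step down from 11 to 5.5 Mbps. Telling vendors apart reliably would need far more data and a proper classifier.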