WPES10 in real time: Social Network Privacy
4 October 2010
- Using Social Networks to Harvest Email Addresses by Iasonas Polakis, Georgios Kontaxis, Eleni Gessiou, Thanasis Petsas, Evangelos P. Markatos and Spiros Antonatos (Institute of Computer Science, Foundation for Research and Technology Hellas)
- Turning Privacy Leaks into Floods: Surreptitious Discovery of Social Network Friendships and Other Sensitive Binary Attribute Vectors by Arthur Asuncion and Michael Goodrich (University of California, Irvine) (not on-line yet).
The first work, by Polakis et al., looks at how easy it is to harvest email addresses at scale using social networks and search engines, and then use them as targets for spam. The authors also attach contextual social and personal information to each email address to produce more convincing spam and phishing emails. For this they used different techniques on three target platforms.
On Facebook, they use the facility that allows users to find others by email address to harvest personal information. This acts as an oracle that maps harvested email addresses to real-world names and demographic information. Once a Facebook profile is linked, a very convincing phishing email can be crafted, including an email asking the target to befriend the sender. (About 30% of users would befriend a stranger in that manner, a result not reported in the paper.)
A second vector of information is the use of nicknames that stay constant across different sites. They use Twitter to harvest (nickname, email) pairs and then use the Facebook email-to-name oracle to link them to real-world identities. Finally, the authors use a Google Buzz feature to extract emails: every Buzz user ID is also the user's Gmail address, so searching Buzz for particular words yields Gmail addresses along with personal information about the users.
But how effective are the email harvesting techniques? How do you even assess this? The authors check whether the name harvested alongside each address exactly matches the name extracted from Facebook for that address. The first technique yields about 0.3% correct addresses, the second 7%, and the final one 40%, showing that the techniques are practical for linking email addresses to real names.
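As a rough illustration of this kind of assessment, the sketch below computes the fraction of harvested names that exactly match the name returned by a profile lookup. The function name `exact_match_rate` and the sample pairs are hypothetical, and the comparison is a plain string equality with no normalisation; the paper's actual matching procedure may differ.

```python
from typing import Iterable, Tuple


def exact_match_rate(pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (harvested_name, profile_name) pairs that are identical.

    Hypothetical helper: a strict string comparison, with no case folding
    or whitespace normalisation.
    """
    pairs = list(pairs)
    if not pairs:
        return 0.0
    hits = sum(1 for harvested, profile in pairs if harvested == profile)
    return hits / len(pairs)


# Made-up sample data, for illustration only.
sample = [
    ("Alice Smith", "Alice Smith"),   # exact match
    ("Bob Jones", "Robert Jones"),    # same person perhaps, but not an exact match
    ("carol white", "Carol White"),   # differs only in case, still counted as a miss
    ("Dan Brown", "Dan Brown"),       # exact match
]
rate = exact_match_rate(sample)
```

A strict comparison like this undercounts true links (nicknames, casing, middle names), so a rate computed this way is best read as a lower bound.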
The second paper, by Asuncion and Goodrich, looks at how to aggregate information leaked by social networks to construct complete profiles of their users. The aim is to reconstruct the friendship network and to recover attributes even when privacy settings are in use.
The techniques assume you can use an oracle to ask group queries against the social network site to check for a particular attribute. The objective is then to find a scheme that minimises the number of queries. It turns out there is a body of work on combinatorial group testing, including adaptive variants, that is readily applicable to this problem. This is not unlike our work on prying data out of a social network. The social network setting is even narrower, which allows fewer queries to extract attributes (logarithmic in the number of possible profiles, and linear in the number of profiles that have the attribute to be extracted).
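To make the group-testing idea concrete, here is a minimal sketch of adaptive binary splitting, a textbook group-testing strategy and not necessarily the scheme used in the paper. It assumes a hypothetical oracle that takes a group of profile IDs and answers whether at least one of them has the attribute; the `CountingOracle` class simulates such an oracle locally so that queries can be counted.

```python
from typing import Callable, List, Sequence, Set


class CountingOracle:
    """Simulated group-query oracle (an assumption, not a real site API)."""

    def __init__(self, hidden: Set[int]) -> None:
        self.hidden = hidden      # IDs that secretly have the attribute
        self.queries = 0          # number of group queries asked so far

    def __call__(self, group: Sequence[int]) -> bool:
        self.queries += 1
        return any(item in self.hidden for item in group)


def find_positives(items: Sequence[int],
                   oracle: Callable[[Sequence[int]], bool]) -> List[int]:
    """Return every item with the attribute, using adaptive binary splitting."""
    positives: List[int] = []
    stack = [list(items)]
    while stack:
        group = stack.pop()
        # A negative answer rules out the whole group with one query.
        if not group or not oracle(group):
            continue
        if len(group) == 1:
            positives.append(group[0])
            continue
        mid = len(group) // 2
        stack.append(group[mid:])   # right half, examined second
        stack.append(group[:mid])   # left half, examined first
    return positives


oracle = CountingOracle(hidden={3, 500, 777})
found = find_positives(range(1024), oracle)
```

With 1024 candidates and 3 hidden positives, this sketch recovers all three using at most 61 group queries (one for the root plus two per positive group at each of the 10 splitting levels), rather than 1024 individual ones. That is in line with the bound above: linear in the number of positives and logarithmic in the number of candidates.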
The attack is validated by using it to extract friends on Facebook, DNA sequences in mitochondrial databases, and movie preferences in the Netflix database. These techniques are interesting because they are very general. At the same time, it is likely that faster ways exist to extract specific attributes of users in real-world social networks, as there are strong correlations between attributes and the social structure of users.