3 February 2016
(This is an extract from my contribution to Harper, Richard. “Introduction and Overview”, Trust, Computing, and Society. Ed. Richard H. R. Harper. 1st ed. New York: Cambridge University Press, 2014. pp. 3-14. Cambridge Books Online. Web. 03 February 2016. http://dx.doi.org/10.1017/CBO9781139828567.003)
Cryptography has been used for centuries to secure military, diplomatic, and commercial communications that may fall into the hands of enemies and competitors (Kahn 1996). Traditional cryptography concerns itself with a simple problem: Alice wants to send a message to Bob over some communication channel that may be observed by Eve, but without Eve being able to read the content of the message. To do this, Alice and Bob share a short key, say a passphrase or a poem. Alice then uses this key to scramble (or encrypt) the message, using a cipher, and sends the message to Bob. Bob is able to use the shared key to invert the scrambling (or “decrypt”) and recover the message. The hope is that Eve, without the knowledge of the key, will not be able to unscramble the message, thus preserving its confidentiality.
It is important to note that in this traditional setting we have not removed the need for a secure channel. The shared key needs to be exchanged securely, because its compromise would allow Eve to read messages. Yet, the hope is that the key is much shorter than the messages subsequently exchanged, and thus easier to transport securely once (by memorizing it or by better physical security). What about the cipher? Should the method by which the key and the message are combined not be kept secret? In “La Cryptographie Militaire” in 1883, Auguste Kerckhoffs stated a number of principles, including that only the key should be considered secret, not the cipher method itself (Kerckhoffs 1883). Both the reliance on a small key and the fact that other aspects of the system are public is an application of the minimization principle we have already seen in secure system engineering. It is by minimizing what has to be trusted for the security policy to hold that one can build and verify secure systems – in the context of traditional cryptography, in principle, this is just a short key.
Kerckhoffs argues that only the key, not the secrecy of the cipher is in the trusted computing base. But a key property of the cipher is relied on: Eve must not be able to use an encrypted message and knowledge of the cipher to recover the message without access to the secret key. This is very different from previous security assumptions or components of the TCB. It is not about the physical restrictions on Eve, and it is not about the logical operations of the computer software and hardware that could be verified by careful inspection. It comes down to an assumption that Eve cannot solve a somehow difficult mathematical problem. Thus, how can you trust a cipher? How can you trust that the adversary cannot solve a mathematical problem?
To speak the truth, this was not a major concern until relatively recently, compared with the long history of cryptography. Before computers, encoding and decoding had to be performed by hand or using electromechanical machines. Concerns such as usability, speed, cost of the equipment, and lack of decoding errors were the main concerns in choosing a cipher. When it comes to security, it was assumed that if a “clever person” proposes a cipher, then it would take someone much cleverer than them to decode it. It was even sometimes assumed that ciphers were of such complexity that there was “no way” to decode messages without the key. The assumption that other nations may not have a supply of “clever” people may have to do with a colonial ideology of nineteenth and early twentieth centuries. Events leading to the 1950s clearly contradict this: ciphers used by major military powers were often broken by their opponents.
In 1949, Claude Shannon set out to define what a perfect cipher would be. He wanted it to be “impossible” to solve the mathematical problem underlying the cipher (Shannon 1949). The results of this seminal work are mixed. On the positive side, there is a perfect cipher that, no matter how clever an adversary is, cannot be solved – the one-time pad. On the down side, the key of the cipher is as long as the message, must be absolutely random, and can only be used once. Therefore the advantage of short keys, in terms of minimizing their exposure, is lost and the cost of generating keys is high (avoiding bias in generating random keys is harder than expected). Furthermore, Shannon proves that any cipher with smaller keys cannot be perfectly secure. Because the one-time pad is not practical in many cases, how can one trust a cipher with short keys, knowing that its security depends on the complexity of finding a solution? For about thirty years, the United States and the UK followed a very pragmatic approach to this: they kept the cryptological advances of World War II under wraps; they limited the export of cryptographic equipment and know-how through export regulations; and their signal intelligence agencies – the NSA and GCHQ, respectively – became the largest worldwide employers of mathematicians and the largest customers of supercomputers. Additionally, in their roles in eavesdropping on their enemies’ communications, they evaluated the security of the systems used to protect government communications. The assurance in cryptography came at the cost of being the largest organizations that know about cryptography in the world.
The problem with this arrangement is that it relies on a monopoly of knowledge around cryptology. Yet, as we have seen with the advent of commercial telecommunications, cryptography becomes important for nongovernment uses. Even the simplest secure remote authentication mechanism requires some cryptography if it is to be used over insecure channels. Therefore, keeping cryptography under wraps is not an option: in 1977, the NSA approved the IBM design for a public cipher, the Data Encryption Standard (DES), for public use. It was standardized in 1979 by the US National Institute for Standards and Technology (NIST).
The publication of DES launched a wide interest in cryptography in the public academic community. Many people wanted to understand how it works and why it is secure. Yet, the fact that the NSA tweaked its design, for undisclosed reasons, created widespread suspicion in the cipher. The fear was that a subtle flaw was introduced to make decryption easy for intelligence agencies. It is fair to say that many academic cryptographers did not trust DES!
Another important innovation in 1976 was presented by Whitfield Diffie and Martin Hellman in their work “New Directions in Cryptography” (Diffie & Hellman 1976). They show that it is possible to preserve the confidentiality of a conversation over a public channel, without sharing a secret key! This is today known as “Public Key Cryptography,” because it relies on Alice knowing a public key for Bob, shared with anyone in the world, and using it to encrypt a message. Bob has the corresponding private part of the key, and is the only one that can decode messages used with the public key. In 1977, Ron Rivest, Adi Shamir, and Leonard Adleman proposed a further system, the RSA, that also allowed for the equivalent of “digital signatures” (Rivest et al. 1978).
What is different in terms of trusting public key cryptography versus traditional ciphers? Both the Diffie-Hellman system and the RSA system base their security on number theoretic problems. For example, RSA relies on the difficulty of factoring integers with two very large factors (hundreds of digits). Unlike traditional ciphers – such as DES – that rely on many layers of complex problems, public key algorithms base their security on a handful of elegant number theoretic problems.
Number theory, a discipline that G.H. Hardy argued at the beginning of the twentieth century was very pure in terms of its lack of any practical application (Hardy & Snow 1967), quickly became the deciding factor on whether one can trust the most significant innovation in the history of cryptology! As a result, a lot of interest and funding directed academic mathematicians to study whether the mathematical problems underpinning public key cryptography were in fact difficult and how difficult the problems were.
Interestingly, public key cryptography does not eliminate the need to totally trust the keys. Unlike traditional cryptography, there is no need for Bob to share a secret key with Alice to receive confidential communications. Instead, Bob needs to keep the private key secret and not share it with anyone else. Maintaining the confidentiality of private keys is simpler than sharing secret keys safely, but it is far from trivial given their long-term nature. What needs to be shared is Bob’s public key. Furthermore, Alice need to be sure she is using the public key associated with the Bob’s private key; if Eve convinces Alice to use an arbitrary public key to encrypt a message to Bob, then Eve could decrypt all messages.
The need to securely associate public keys with entities has been recognized early on. Diffie and Hellman proposed to publish a book, a bit like the phone register, associating public keys with people. In practice, a public key infrastructure is used to do this: trusted authorities, like Verisign, issue digital certificates to attest that a particular key corresponds to a particular Internet address. These authorities are in charge of ensuring that the identity, the keys, and their association are correct. The digital certificates are “signed” using the signature key of the authorities that anyone can verify.
The use of certificate authorities is not a natural architecture in many cases. If Alice and Bob know each other, they can presumably use another way to ensure Alice knows the correct public key for Bob. Similarly, if a software vendor wants to sign updates for their own software, they can presumably embed the correct public key into it, instead of relying on public key authorities to link their own key with their own identity.
The use of public key infrastructures (PKI) is necessary in case Alice wants to communicate with Bob without them having any previous relationship. In that case Alice, given only a valid name for Bob, can establish a private channel to Bob (as long as it trusts the PKI). This is often confused: the PKI ensures that Alice talks to Bob, but not that Bob is “trustworthy” in any other way. For example, a Web browser can establish a secure channel to a Web service that is compromised or simply belong to the mafia. The secrecy provided by the channel does not, in that case, provide any guarantees as to the operation of the Web service. Recently, PKI services and browsers have tried to augment their services by only issuing certificates to entities that are verified as somehow legitimate.
Deferring the link between identities and public keys to trusted third parties places this third party in a system’s TCB. Can certification authorities be trusted to support your security policy? In some ways, no. As implemented in current browsers, any certification authority (CA) can sign a digital certificate for any site on the Internet (Ellison & Schneier 2000). This means that a rogue national CA (say, from Turkey) can sign certificates for the U.S. State Department, that browsers will believe. In 2011, the Dutch certificate authority Diginotar was hacked, and their secret signature key was stolen (Fox-IT 2012). As a result, fake certificates were issued for a number of sensitive sites. Do CAs have incentives to protect their key? Do they have enough incentives to check the identity of the people or entities behind the certificates they sign?
Cryptographic primitives like ciphers and digital signatures have been combined in a variety of protocols. One of the most famous is the Secure Socket Layer SSL or TLS, which provides encryption to access encrypted Web sites on the Internet (all sites following the https:// protocol). Interestingly, once secure primitives are combined into larger protocols, their composition is not guaranteed to be secure. For example a number of problems have been identified against SSL and TLS that are not related to the weaknesses of the basic ciphers used (Vaudenay 2002).
The observation that cryptographic schemes are brittle and could be insecure even if they rely on secure primitives (as did many deployed protocols) led to a crisis within cryptologic research circles. The school of “provable security” proposes that rigorous proofs of security should accompany any cryptographic protocol to ensure it is secure. In fact “provable security” is a bit of a misnomer: the basic building blocks of cryptography, namely public key schemes and ciphers cannot be proved secure, as Shannon argued. So a security proof is merely a reduction proof: it shows that any weakness in the complex cryptographic scheme can be reduced to a weakness in one of the primitives, or a well-recognized cryptographic hardness assumption. It effectively proves that a complex cryptographic scheme reduces to the security of a small set of cryptographic components, not unlike arguments about a small Trusted Computing Base. Yet, even those proofs of security often work at a certain level of abstraction and often do not include all details of the protocol. Furthermore, not all properties can be described in the logic used to perform the proofs. As a result, even provably secure protocols have been found to have weaknesses (Pfitzmann & Waidner 1992).
So, the question of “How much can you trust cryptography?” has in part itself been reduced to “How much can you trust the correctness of a mathematical proof on a model of the world?” and “How much can one trust that a correct proof in a model applies to the real world?” These are deep epistemological questions, and it is somehow ironic that national, corporate, and personal security depends on them. In addition to these, one may have to trust certificate authorities and assumptions on the hardness of deep mathematical problems. Therefore, it is fair to say that trust in cryptographic mechanisms is an extremely complex social process.
5 December 2015
As many in the UK are fighting a rear-guard action to prevent the most shocking provisions of the IP Bill becoming law (incl. secrecy and loose definitions), I was invited to provide three public policy recommendations for strengthening IT security in the EU. Instead of trying to limit specific powers (such as backdoors) here are some more radical options, more likely to resolve the continuous tug-of-war cyber civil liberties and the security services have been engaging in a while.
1 July 2015
One of my key annoyances while doing work in privacy technologies is the poor support of key cryptography libraries in my favorite Python programming language. Today, I would like to share my work on building petlib, a more or less pythonic, wrapper around the OpenSSL low level crypto and math libraries, as well as numerous example privacy technologies (PETs) that I have implemented as examples.
The needs of someone doing research in the PETs field are quite different from other developers: on one hand we need access to low level primitives (such as block cipher and hash function operations), as well as low level mathematical functions on big integers and elliptic curves on finite fields. A number of available libraries try to hide those primitives from developers behind abstractions such as “signed envelopes” or “secure sockets” — which so not serve those who try to build different abstractions. On the other hand, issues such a tight memory management and absolute control over other low-level aspects of the library are not essential; in fact a clean programming interface that leads to beautifully clear reference code for proposed protocols is preferable.
The petlib library is available for everyone to use, and after installing the OpenSSL prerequisites can be acquired through the python repositories through:
pip install petlib
The petlib library was used as the basis for teaching the labs of the Privacy Enhancing Technologies course at UCL, and thus has extensive documentations, and is properly version controlled, packaged and tested:
- petlib github repository
- petlib installation and programming documentation (read the docs)
- petlib listing on pypi
The best way to get a feel for how the library can be used to build PETs prototypes is to browse the examples in the source tree:
- A toy RSA example
- A simple Schnorr Zero-Knowledge Proof
- An additivelly homomorphic public key encryption scheme
- A generic engine for building zero-knowledge proofs using sigma protocols
- The Groth-Kohlweiss ring signature and zero-cash scheme
- The algebraic MAC scheme by Chase, Meiklejohn and Zaverucha
- An anonymous credential scheme based on the aMAC scheme above
- The Baldimtsi and Lysyanskaya anonymous credentials light scheme
In terms of more real-work research project, we have already used petlib for implementing prototypes for a few projects and labs:
- The centrally banked cryptocurrency framework (with Sarah Meiklejohn)
- A private stats collection system for Tor (with Melis and De Cristofaro)
- Exercises in Privacy Enhancing Technologies (for UCL COMPGA17)
One key missing component from the underlying OpenSSL crypto library is support for computations on pairings of elliptic curves. This limits the types of protocols that can currently be implemented with petlib, until such functionality becomes available in the underlying libraries (please contribute!) Bug reports and pull requests with fixes to the code and documentation are very welcome.
30 June 2015
The course covers principally, and in some detail, engineering aspects of PETs and caters for an audience of CS / engineering students that already understands the basics of information security and cryptography (although these are not hard prerequisites). Students were also provided with a working understanding of legal and compliance aspects of data protection regimes, by guest lecturer Prof. Eleni Kosta (Tilburg); as well as a world class introduction to human aspects of computing and privacy, by Prof. Angela Sasse (UCL). This security & cryptographic engineering focus sets this course apart from related courses.
The taught part of the course runs for 20 hours over 10 weeks, split in 10 topics:
- Introduction and privacy in communication. (01-GA17-IntroComms)
- Anonymous communications & Traffic analysis (02-GA17-Anonymous-Comms)
- Private Computations with homomorphic encryption and secret sharing (03-GA17-Private-Computations)
- Privately checking inputs using Zero-Knowledge Proofs (04-GA17-ZeroKnowlegde)
- Private authorization using selective disclosure credentials (05-GA17-Selective-Disclosure)
- Data anonymization & de-anonymziation attacks (08-GA17-Data-Anonymization)
- Private Storage, queries and lookups (09-GA17-Storage-Retrieval)
- Privacy by design case-studies (10-GA17-Privacy-by-design-case-studies – Copy)
- Guest lectures: Human aspects (Angela Sasse)
- Guest lectures: Data Protection (Eleni Kosta)
Most importantly the course includes 10 hours of labs (20 next year!), split into 5 exercises, that give students (and their teachers!) hands on experience implementing extremely advanced privacy enhancing technologies. More generally the course provides an introduction to solid cryptographic engineering, test-driven development, testing & QA tools and code audits. The programming language used was Python on a Linux environment, with the petlib library that was specially developed for this course.
For each lab exercise students in pairs were provided with a partial code file, and a set of unit tests, and were asked to fill in the remaining code to fulfill the task, and at least make the unit tests pass. The topics of the exercises track the first 5 lecture topics:
- Private communications and basic programming with petlib
- Building a simple mix server and client
- Building a private polling system with homomorphic encryption
- Basics of zero-knowledge proofs of knowledge, equality and linear statement
- A basic selective disclosure authorization credential system
Finally, part of the grading was based on students performing a code review of other groups, looking for code defects leading to security or other bugs.
Overall, I am very proud of the progress everyone made. The course was attended by 16 MSc student and 2 MEng students. Everyone eventually was able to complete all lab assignment — not a given considering the advanced nature of the tasks at hand. It was evident while discussing with student the final exercise, on building a selective disclosure credential, that many had developed an intuitive understanding of how to build solutions based on zero-knowledge protocols, and all had definitely overcome their initial fear of these more advanced concepts in PETs.
I was also very impressed with many students that were able to tackle the hardest questions in the exam. One of those questions, basically asked students to re-invent a variant of the privacy preserving genomic testing protocol we presented at WPES 2014 — and many did successfully. Similarly, they were asked to de-anonymize a mechanism very similar to the 15:15 rule in place in California to “protect” smart meter reading, and again many did so successfully under time constrain and the high pressure environment of exams. As ever, the great engagement from students was the most rewarding part of teaching the course.
All material is available online (see links to slides, and git repositories), and I would be delighted to share / receive any additional exercises by others finding this material relevant to their courses.
I will be participating on a panel this afternoon on “Creating Usable and Secure Software”, in the context of the conference on Digital Citizenship and Surveillance Society. I share a platform with a number of illustrious people — Dave Hrycyszyn, Lola Oyelalo and Blaine Cook — with a much deeper experience in usable software and services development. However, I will attempt to provide some context, and my opinions, on why we can observe a broadly poor state of affairs when it comes to usability of privacy technologies — and hopefully open a discussion on how to overcome roadblocks.
My main two positions will be as follows:
- The political context within which technical security and privacy research and development had to be conducted over the past 40 years greatly contributed to the lack of wide deployment and poor usability of privacy technologies.
- The lack of “knowledge” about methods for developing usable privacy friendly solutions only offer a partial explanation for this poor state of affairs, and has to compete with other roadblocks that systemically undermined the deployment of usable privacy technologies.
First, it is worth reminding ourselves that research into security technologies and strong cryptography specifically, was until recently the prerogative of governments. Public discussion and know-how on this topic was developed seriously after the mid-1980s, and often despite serious pressure from the US and other governments. The technical security community is small and there remain serious technical challenges to providing privacy friendly solutions — solutions that require deep expertise developed over years of practice (requiring funding).
Second, the export control regimes, and also requirements for cooperation with law-enforcement slowed down significantly the blanket deployment of privacy technologies even after the strict export control regime of the 1990s was lifted. What makes a number of privacy technologies unusable — email encryption, instant messaging encryption — is the fact that common clients do not support them transparently and by default — requiring plug-ins, user configuration and manual key management. Thus the lasting impact of these regulation has not been the non-proliferation of strong crypto technologies, but the lack of integration of these into mainstream platforms. It is telling that the current Law Enforcement and Government narrative is not about preventing encryption know-how from spreading, but rather discouraging wide deployment of such technologies without the ability for back-door or front-door access.
Third, there are commercial pressures — which again have been related with government hostility of the wide deployment of privacy technologies. It is easy to forget that governments, are major customers of technologies. Thus they are able to dictate requirements that make it difficult to widely deploy privacy technologies. It is telling that mainstream mail clients — such as Microsoft Outlook — do not transparently support PGP based end-to-end encryption and have instead opted for S/MIME and models that make the use of encryption by individuals rather difficult. In this context one may assume that the key customers of this software — large enterprises and governments — simply never asked for such features, and in fact probably considered such a feature to conflict with other requirements (such as the need to recover mail of employees, backup, …).
These commercial pressures, have changed in the past few years, as large internet companies start relying heavily on serving end-users (search, webmail, social networking). Sadly, these companies have adopted both a business model — ad-based monetization — and a technical architecture — cloud computing — that makes meaningful privacy protection very difficult. In turn the “success” of those architectures has lead to an extreme ease of developing using this model, and an increasing difficulty in providing end-user solutions with appropriate privacy protections — let alone usable ones.
The rise of services has pushed a number of key privacy technologies into not being commercially supported and a key feature, and in effect at best a “common” — with the governance and funding problems this entails. We have recently learned about the systemic under funding of key privacy technologies such as OpenSSL and GPG. Technologies like Tor are mostly funded for their national firewall traversal features, seeing development on anonymity features suffer. Unlike other commons (health, parks, quality assurance in medicines), the state has not stepped in to either help with governance or with funding — all the opposite. For example, standardization efforts have systematically promoted “surveillance by design” instead of best of breed privacy protection; funding for surveillance technology is enormous compared to funding for privacy technologies, and somehow ironically, a number of calls for funding of privacy technologies are in the context of making surveillance more “privacy friendly” — leading to largely non-nonsensical outcomes.
So, lack of “knowledge” about how to develop usable software, while also a contributing factor, has to be seen within the context of the above structural pressures. In parallel, pressures undoubtedly exist when it comes to the discipline of UX which is in itself recent, and constantly involving. Along with serious funding for collaboration on building more usable privacy software (which the Simply Secure project that I am associated with attempts to provide), we need a strategy to counter those systemic pressures to ensure the wide deployment of usable privacy technologies.
I have a rather enlightening chat with Yvo Desmedt after the Cambridge Security Protocol‘s Workshop, who was kind enough to give me an overview and insight into the group key agreement protocols of the ’90s. As part of an on-going conversation on secure chat, I am trying to understand the genealogy, and requirement, for key group agreements to be “contributory” — namely ensure all participants contribute to the final group key, even under malice. It seems that there is also a preference for symmetric schemes, where all participants perform the same operations.
Yvo’s classic Eurocrypt paper [BD94] is the basis of GOTR [LVH13], which manages to complicate it considerably. The original paper had a O(N) broadcasts cost, and was “peer-to-peer”, meaning that everyone just does the same thing. However it did not consider an active adversary, and was also not “contributory” meaning that an insider (active) adversary could force the key to anything they liked. Interestingly, Yvo points me to [BD96], which he thinks is superior to [BD94]. This paper really illustrates that there is no magic to group key agreement: a master key for the group is determined and then propagated using a key-sharing graph. This reduces the cost from O(N) broadcasts to just O(N) point to point messages — which to me seems optimal.
Now, the idea that schemes must be “contributory” (ie. no participant is special in determining the key — no one can force the key to be some specific value) emerged sometime in the late ’90s. The first reference I found to this property is [BW98], where the authors look at the round complexity of key agreement. However they state that “If the group key is generated and distributed by a central trusted party, then it is not necessary to discuss the communication complexity“. Then, they just launch into those schemes with no justification as to the reason centralized distribution may be a problem …
Katz and Young [KY03] also state that “(we exclude here centralized protocols in which a designated group manager is assumed; such asymmetric, non-contributory schemes place an unfairly high burden on one participant who is a single point of failure and who must also be trusted to properly generate keys)“. So it seems that the security issue contributory schemes try to mitigate is a flawed RNG. However this is a marginal threat — if the RNG is bad, than it may be likely the adversary can also corrupt other aspects of the platform to extract keys. In any case the flawed RNG threat can be dealt with by including some entropy from other participants assuming that they do not act in an adaptive malicious manner (if they do they can just leak the key to the adversary). I find it strange that KY03 argue that a single participant must not be burdened, when this results in proposed protocols that burden all participants to a great extent instead.
Around 2000 a number of works spring up that attempt to extend key agreement to dynamic membership setting, including [STW00] and [KPT00]. It is not at all clear to me whether those are in fact superior to running the key exchange multiple times, or even having a central party distributing keys.
Finally, Goldberg and others [GUGC09] propose extensions to OTR for a multi-user setting. These focus on deniability and signatures (and call a generic key exchange protocol) and a shared authentic transcript. This is a fine property, however I am a bit surprised the protocols are (a) so complex to establish ephemeral signatures and (b) so simple if they are to establish transcript consistency. My understanding is that they rely on the channel to offer consistent ordering, and then simply cryptographically ensure it was not tampered by an adversary — however I only read the paper obliquely.
Conclusion: It seems that a lot of the literature on group key exchange is based on the premise that the protocols need to be symmetric and contributory. Yet, I fail to see any justification beyond the fact that centralized schemes are simple and efficient, and no one could possibly write an academic paper about them. All schemes I have seem rely on the honest channel offering ordering, and being reliable. If that is not the case some of them detect it and hard fail (for example the integrity checks fail, with no hint that it is due to missing messages). This means that they assume some ordering happens on the outside of the crypto, which is dubious without some leader election. Few works have dealt with how you determine the group, which would either go the admin way or the voting way (can of worms).
[BD94] Mike Burmester, Yvo Desmedt: A Secure and Efficient Conference Key Distribution System (Extended Abstract). EUROCRYPT 1994: 275-286
[KY03] Katz, Jonathan, and Moti Yung. “Scalable protocols for authenticated group key exchange.” Advances in cryptology-CRYPTO 2003. Springer Berlin Heidelberg, 2003. 110-125.
[BW98] Becker, Klaus, and Uta Wille. “Communication complexity of group key distribution.” Proceedings of the 5th ACM conference on Computer and communications security. ACM, 1998.
[KPT00] Kim, Yongdae, Adrian Perrig, and Gene Tsudik. “Simple and fault-tolerant key agreement for dynamic collaborative groups.” Proceedings of the 7th ACM conference on Computer and communications security. ACM, 2000.
I was very honored to be invited to Asiacrypt 2013 to present some of our work on privacy-friendly computations. It was an opportunity to consolidate a presentation that includes an overview of privacy-friendly billing and aggregation for smart metering. The slides of the presentation are available in Powerpoint 2012 format (and an older ppt format).
The key references providing more technical details on smart-metering privacy are:
- Alfredo Rial, George Danezis: Privacy-preserving smart metering. WPES 2011: 49-60
- Klaus Kursawe, George Danezis, Markulf Kohlweiss: Privacy-Friendly Aggregation for the Smart-Grid. PETS 2011: 175-191
- George Danezis, Markulf Kohlweiss, Alfredo Rial: Differentially Private Billing with Rebates. Information Hiding 2011: 148-162
- George Danezis, Benjamin Livshits: Towards ensuring client-side computational integrity. CCSW 2011: 125-130
- Andres Molina-Markham, George Danezis, Kevin Fu, Prashant J. Shenoy, David E. Irwin: Designing Privacy-Preserving Smart Meters with Low-Cost Microcontrollers. Financial Cryptography 2012: 239-253
- Gilles Barthe, George Danezis, Benjamin Grégoire, César Kunz, Santiago Zanella Béguelin: Verified Computational Differential Privacy with Applications to Smart Metering. CSF 2013: 287-301
- Cedric Fournet, Markulf Kohlweiss, George Danezis and Zhengqin Luo. ZQL: A Compiler for Privacy-Preserving Data Processing. USENIX Security Symposium, Washington DC, 2013.
- George Danezis, Cedric Fournet, Markulf Kohlweiss and Santiago Zanella-Beguelin. Smart Meter Aggregation via Secret-Sharing. ACM SEGS 2013: Smart Energy Grid Security Workshop, Berlin, 2013.
- Carmela Troncoso, George Danezis, Eleni Kosta, Josep Balasch, Bart Preneel: PriPAYD: Privacy-Friendly Pay-As-You-Drive Insurance. IEEE Trans. Dependable Sec. Comput. 8(5): 742-755 (2011)
- George Danezis, Markulf Kohlweiss, Benjamin Livshits, Alfredo Rial: Private Client-Side Profiling with Random Forests and Hidden Markov Models. Privacy Enhancing Technologies 2012: 18-37
28 July 2011
Privacy-friendly Aggregation for the Smart-grid
Klaus Kursawe (Radboud Universiteit Nijmegen) and George Danezis and Markulf Kohlweiss (Microsoft Research)
Privacy in for smart electricity provision seems to be a rising topic, and this year there is a whole session on it at PETS 2011. The first paper (one which I am a coauthor) looks at the problem of gathering aggregate data from groups of smart meters, without allowing any third party to get the the individual measurements. This can be applied as a PET to solve real-world problems such as fraud detection, leakage detection, load estimates, demand response, weather prediction — all of which only require aggregate data (sometimes in real time), not individual measurements.
The key challenge to providing a private aggregation protocols are the specific constraints of smart meters. They are cheap devices, with modest resources, hardly any bandwidth, no ability to communicate, etc. Two specific protocols are presented: the first one allows to compare the sum of meter readings with a reference number (maybe measured from a feeder meter). This protocol allows for fancy proofs of correctness, but it slow in terms of computation and bandwidth (it requires public key operations for each reading). The second protocol is extremely fast and has no communication overhead. In both cases a pragmatic approach to the threat model is followed: we assume that the utilities will be honestly defining groups of meters and facilitating the key management protocol — for the second protocol there is no overhead of public key operations after the initial key setup.
The key highlight from this work is not as much its technical depth (tricks with DC networks and hash function that would not surprise any PETS regular). What is interesting is that the protocols were designed for a real industrial application and now fully integrated on real smart meters and their communication protocols in collaboration with our collaborators at Elster.
Plug-in privacy for Smart Metering billing
Marek Jawurek, Martin Johns, and Florian Kerschbaum (SAP Research)
This second paper looks at the problem of billing for fine-grained time of use tariffs — their energy consumption at different times costs a different rate per unit. This is a very important topic, as correct billing and time of use tariffs are a key driver of fine-grained data collection through smart meters — if we can do billing privately then maybe less personal information may be collected.
Technically the protocols proposed are based on the homomorphic properties of Pedersen commitments: readings are commitments, and you can use multiplication by a constant and addition to compute the bill, and (most importantly) prove that it is correct. The system model is that the meter outputs signed commitments of readings, a privacy component computes the bill and proofs of correctness, and those are sent to the supplier for verification (and printing the bills!).
This is the core of a nice solution for the basic billing case (which is likely to be the common one in smart grids). We have shown in related work that the protocol can be further improved to have zero communication overhead. Since it avoids expensive zero-knowledge proofs it is fast for its proofs and verification. It also provides the basic infrastructure to support further more expressive billing policies and general computations.
27 July 2011
Quantifying Location Privacy: The Case of Sporadic Location Exposure
Reza Shokri and George Theodorakopoulos (EPFL), George Danezis (Microsoft Research), and Jean-Pierre Hubaux and Jean-Yves Le Boudec (EPFL)
This work evaluates the privacy of using location-based services sporadically using a set of location privacy mechanisms. Sporadic services include those that require location infrequently, rather than continuously (think of restaurant suggestions rather than relaying real-time GPS streams). The key novelty of the approach is that the model of location exposure, as well as privacy protection is very general. It encompasses anonymization, generalization and obfuscation of location, use of fake traffic and suppression of location. In turn the analysis relies on advanced models of location and mobility (based on markov chains) and is based on Bayesian inference. The evaluation of different location privacy techniques is done on real-world traces from SF taxis.
I am one of the authors of this work, so of course I think it is awesome! More seriously, it is one of the fist works to combine under a common framework a multitude of location privacy mechanisms, and provide a common evaluation framework for them, to quantify the degree of protection they offer relatively to each other for different adversaries. It is also one of the first systematic applications of Bayesian inference to analyze location privacy — extending the inference paradigm beyond the analysis of network anonymity systems.
Of course this is not the last word. Only a subset of protection techniques and combination of techniques were look at, and other protection mechanisms can be integrated and evaluated in the same framework (the tracing model and threat model can be unchanged). Secondly, the analysis itself may be augmented with side-information — be it commercial transactions or traces of network traffic — that may be giving some information about location, to increase the capabilities of the adversary (or make them more realistic). The model we use, based on markov chains, has the benefit of giving analytically tractable results, but numerical techniques may be used to extend it to be more true to real-life attacks.
Privacy in Mobile Computing for Location-Sharing-Based Services
Igor Bilogrevic and Murtuza Jadliwala (EPFL), Kubra Kalkan (Sabanci University), Jean-Pierre Hubaux (EPFL), and Imad Aad (Nokia)
This paper looks at applications where users need to share their location. For example, two users may want to find out if they are close to each other or where they should meet in order to share a taxi ride. Yet, those users do not want to leak any of their location information to the other users or the service provider. More specifically two users specify a set of ranked prefered location they could meet and the system needs to determine on of those fairly without revealing the current location or other preferences (except the one chosen to meet). This is called the fair rendez-vous problem.
The key contribution of this work is to show that this problem can be set with a set of concrete cryptographic protocols. It also presents an implementation of these algorithms on a real mobile phone to show that it is practical. The cryptographic computations are based on homomorphic encryption schemes as well as interactions with the service (to do multiplication that is not possible with Paillier). The implementation on a mobile phone takes a few seconds on the client and the server, and is paralelizable in the number of users. Untypically, the authors also did a user study: users were asked what their concerns were, and after using the application of the phone they were asked how usable it was, and whether they appriciated the privacy provided by the application.
This is a really nice example of a privacy application, that applies advanced crypto, but also evaluates it on a real platform for performance as well as users’ reaction to it. The obvious extensions to this work would be to generalize it to more complex rendez-vous protocols, as well as other location sharing applications. It is good to see that modern mobile devices can do plenty of crypto in a few seconds, so I am very hopeful we will see more work in this field.
On The Practicality of UHF RFID Fingerprinting: How Real is the RFID Tracking Problem?
Davide Zanetti, Pascal Sachs, and Srdjan Capkun (ETH Zurich)
This paper looks that UHF tags — they are the dumb tags that can be read at about 2m that are attached to things you buy to facilitate stock management or customer aftercare. Interestingly this study looks at how identifiable the tags are at the physical layer, not using the actual tag ID! Therefore these techniques may bypass any privacy protection that attempt to prevent access to the tag ID. It turns our that one can build a unique and reliable ID for a tag from its physical characteristics that can be used to trace people as they move around.
What is new about this work is that the focus was on practicality and cost of extracting a reliable fingerprint (previous approaches relied on expensive equipment and laboratory conditions). The solution was implemented using a cheap software radio (USRP2 device + PC).
I am not quite sure what to conclude from the evaluation on the quality of the fingerprint. It seems that an adversary can place tags within one of 83 to 100 groups. Is this really a good results or not? I guess it depends on the application and the density of tags. Of course if more than one tag is carried, then the adversary could combine fingerprints to identify individuals more easily — if you carry 5 tags you have a 20 bit IDs. Interestingly, there is extensive evaluation of the stability of the tag to temperature and mobility — it turns out that these factors do affect the quality of the fingerprint and further reduce the effective number of unique IDs that can be extracted (down to about 49 classes).
It would be interesting to combine this attack vector with the ideas from the first paper (pretending that the short physical IDs are a version of a privacy protection system) to evaluate the effectivness of tracing a set of individual throughout town.
27 July 2011
I am currently sitting at the PETS 2011 symposium in Waterloo, CA. I will be blogging about selected papers (depending on the sessions I attend) over the next couple of days — authors and other participants are welcome to comment!
The first session is about data mining and privacy.
“How Unique and Traceable are Usernames?”
Daniele Perito, Claude Castelluccia, Mohamed Ali Kaafar, and Pere Manils (INRIA)
The first paper looks at the identifiably of on-line usernames. The authors looked at user names from different sites and assess the extent to which they can be linked together, as well as link them to a real person. Interestingly they used Google Profiles as ground truth, since they allow users to provide links to other accounts. First they assess the uniqueness of pseudonyms based on a probabilistic model: a k-th order markov chain is used to compute the probability of each pseudonym, and its information content (i.e. -log_2 P(username)). The authors show that most of the usernames observed have “high entropy” and should therefore be linkable.
The second phase of the analysis looks at usernames from different services, and attempts to link them even given small modifications to the name. The key dataset used was 300K google profiles, that list (sometimes — they used 10K tuples of usernames) other accounts as well. They then show that the Levenshtein distance (i.e. edit distance) of usernames from the same person is small compared to the distance of two random user names. A classifier is built, based on a threshold of the probabilistic Levenshtein distance, to assess whether a pair of usernames belongs to the same or a different user. The results seem good: about 50% of usernames are linkable with no mistakes.
So what are the interesting avenues for future work here? First, the analysis used is a simple threshold on the edit distance metric. It would be surprising if more advanced classifiers could not be applied to the task. I would definitely try to use random forests for the same task. Second, the technique can be used for good not evil: as users try to migrate between services, they need an effective way to find their contacts — maybe the proposed techniques can help with that.
“Text Classification for Data Loss Prevention” (any public PDF?)
Michael Hart (Symantec Research Labs), Pratyusa Manadhata (HP Labs), and Rob Johnson (Stony Brook University)
The paper looks at the automatic classification of documents as sensitive or not. This is to assist “data loss prevention” systems, that raise an alarm when personal data is about to be leaked (i.e. when it is about to be emailed or stored on-line — mostly by mistake). Traditionally DLP try to describe what is confidential through a set of simple rules, that are not expressive enough to describe and find what is confidential — thus the authors present a machine learning approach to automatically classify documents using training data as sensitive or not. As with all ML techniques there is a fear of mistakes: the technique described tries to minimise errors when it comes to classifying company media (ie. public documents) and internet documents, to prevent the system getting on the way of day to day business activities.
The results were rather interesting: the first SVN classifier looked at unigrams with binary weights to classify documents. Yet, it first had a very high rate of false positives for public documents. It seems the classifiers also had a tendency to classify documents as “secret”. A first solution was to supplement the training set with public documents (from wikipedia), to best described “what is public”. Second, the classifier was tweaked to (in a rather mysterious way to me) by “pushing the decision boundary closer to the true negative”. A further step does 3-category classification as secret, public and non-enterprise, rather than just secret and not-secret.
Overall: They manage to get the false positive / false negative rate down to less than 2%-3% on the largest datasets evaluated. That is nice. The downside of the approach, is common to most work that I have seen using SVNs. It requires a lot of manual tweaking, and further it does not really make much sense — it is possible to evaluate how well the technique performs on a test corpus, but difficult to tell why it works, or what makes it good or better than other approaches. As a resut, even early positive resutls should be considered as preliminary until more extensive evaluation is done (more like medicine rather than engineering). I would personally like to see a probabilistic model based classifier on similar features that integrates the two-step classification process into one model, to really understand what is going on — but then I tend to have a Baysian bias.
P3CA: Private Anomaly Detection Across ISP Networks
Shishir Nagaraja (IIIT Delhi) and Virajith Jalaparti, Matthew Caesar, and Nikita Borisov (University of Illinois at Urbana-Champaign)
The final paper in the session looks at privacy preserving intrusion detection to enable cooperation between internet service providers. ISPs would like to pool data from their networks to detect attacks: either because the volume of communications is abnormal at certain times, or because some frequency component is odd. Cooperation between multiple ISPs is important, but this cooperation should not leak volumes of traffic at each IPS since they are a commercial secret.
Technically, privacy of computations is achieved by using two semi-trusted entities, a coordinator and key holder. All ISPs encrypt their traffic under an additive homomorphic scheme (Paillier) under the keyholder key, and send it to the coordinator. The coordinator is using the key-holder as an oracle to perform a PCA computation to output the top-n eighen vectors and values of traffic. The cryptographic techniques are quite but standard, and involve doing additions, subtraction, multiplication, comparison and normalization of matrices privately though a joint private two-party computation.
Surprisingly, the performance of the scheme is quite good! Using a small cluster, can process a few tens of time slots from hundresds of different ISPs in tens of minutes. A further incremental algorithm allows on-line computations of eighen vector/value pairs in seconds — making real-time use of the privacy preserving algorithm possible (5 minutes of updates takes about 10 seconds to process).
This is a surprising result: my intuition would be that the matrix multiplication would make the approach impractically slow. I would be quite interested to compare the implementation and algorithm used here with a general MPC compiler (under the same honest-but-curious model).