5 November 2015
I laughed out loud when I saw the calls from Andrew Parker, the head of MI5, for a mature debate on surveillance, in particular in relation to the draft investigatory Powers Bill (via Paul Bernal). My reading of the IP Bill is that it will result in, and perhaps intends, closing for ever the democratic debate about what constitutes acceptable state surveillance.
Gagging orders for targeted warrants: interception, equipment interference and communications data. S.43(1-7) impose a gag order in relation to the existence or any other aspects of an interception warrant, except for seeking legal advice. S.44(2)(a) makes it an offense to disclose anything about such a warrant, with a penalty of up to 12 months in jail and / or a fine. Similar provisions exist for “equipment interference”: S.102 makes it an offense for a telecommunication provider disclose anything about a warrant for hacking someone! Similar secrecy provisions apply to notices for handling out communication data (S.66).
These prohibitions may make sense in the context of operational needs for secrecy — such as during investigations. But what about when the warrant expires? What about either interception or equipment interference against subjects, organizations, or others that does not lead to any criminal or other conviction — namely against innocent people and associations? What is the imperative for keeping those secret? The imperative is simply to keep the debate about the surveillance capabilities, the uses of warrants, the selection of targets for surveillance, the prevalence of surveillance, and the techniques used and their proportionality secret — namely to avoid even the possibility of a mature debate in the future.
Gagging orders for retention notices. The previous warrants and notices clearly applied, at least for some time, to operations against specific targets. More interestingly, secrecy is also required when it comes to issued retention notices: S.77, makes disclosing such a notice a civil offence.
What this means is that the secretary of state may issue notices for operators to keep some communication data, but these operators are not allowed to tell anyone! This despite the significant public policy interest on the matter, that has in fact led to numerous challenges against such policies, and the eventual legal challenge of the EU Data Retention Regulation in the European Court of Justice. Of course this may lead to nonsensical outcomes: I could build a service, and deploy it in the UK or elsewhere (remember extra-territoriality S.79) only to be told that a retention notice exists covering my service — which was previously unknown to me due to secrecy, and that I cannot discuss or challenge politically openly due to the same secrecy.
This is in contrast with, for example, the Data Retention directive that provided a strict list of services and categories of data that were to be retained, in the text of the directive — not in secret. Even those provisions were found to not be proportional, so go figure what the gagging order in the IP Bill is. This provision clearly aims to make the IP Bill the last, if any, political discussion on retention, its proportionality, necessity or legitimacy in a democratic society. Once it becomes law, the gagging orders will hide what is retained at all.
Gagging orders for bulk interception and interference. Given the audacity of enabling bulk interception and bulk interference, while maintaining the IP Bill is not about mass surveillance, it is no surprise that gagging orders are also imposed on those asked to facilitate it: S.120(b) states that disclosures should not be made about the existence or facilitation of bulk interception, and S.148 prohibits disclosure of a bulk interference warrant — making it illegal to even discuss that mass hacking might be taking place! Those apply to overseas operators too.
Gagging orders for bulk communications data collection. Bulk acquisition follows the pattern, and a special offence is created in relation to disclosing anything about to it in S.133. Again, this goes way beyond protecting specific operation, since the acquisition is performed in bulk, and cannot betray any specifics. The secrecy order protects the capability to access in bulk certain categories of communication data, which in effect means shielding it from any proper scrutiny as related to its necessity, or appropriateness in the future,or any debate on that matter.
Gagging orders in relation to implementing surveillance capabilities & back doors. Finally, gagging orders apply to “technical capability notices” (as well as “national security notices” — the joker card in this legislation allowing to impose any requirement at all). In S.190(8) specified that such notices should not be disclosed.
This should put to rest any romantics — and there are few, but some, in the midst of computer security and cryptography experts — that think that we will have some kind of debate about the type of back doors; or that we can build privacy-friendly back doors; or that somehow when a new technology presents itself we will have a debate about how strong the privacy it provides should. There will be none of this: secret backdoor notices (I mean “technical capability notices”) will be issued, and enterprising geek that wants to open a debate about them will either know nothing about them, or be breaking the law. There will be no debate about what kind of back doors, of when they should be used — all will be happening in total secrecy.
Keeping surveillance evidence out of courts, and the defense’s hands. S.42(1-4) of the Draft IP Bill prevents anyone involved in interception from ever mentioning it took place as part of any legal proceedings. Note that this section is absolute: it does not have exceptions, for example in relation to the public interest: such as the ability to discuss the benefit or downsides of part interception activities; no exception for talking about this to MPs, or other democratic representatives; or even to exculpate anyone who otherwise would be wrongfully found guilty. Similar provisions (S.120(a)) keep the fruits of bulk interception out of courts.
Secret hearings in secret tribunals and commissioners. There exist provisions from RIPA for secret hearings and appeals in front of secret tribunals. There are also provisions for the commissioners looking at what is doing on. These are so weak, so removed from democratic practice, and so alien to concepts of the rule of law and democratic rule — let alone nonsensical — that I am not going to discuss them further.
In conclusion. For sure the Investigatory Powers Bill future proofs surveillance capabilities: mostly against future democratic scrutiny. Once it becomes law, its “technology” neutral provision can be applied to intercept, collect, back door, hack, even in bulk, while making it illegal to even discover, and as a result discuss or make policy about, interferences with private life the state is up to. The gagging provisions are a clear example that calls for a mature debate around surveillance are mere rhetoric, the securocrats want one last discussion before making any discussion about surveillance simply impossible.
4 November 2015
At last the UK government today published the draft Investigatory Powers Bill, after about a week of carefully crafted briefings aimed at managing opinion, and even dissent. The document comes bundled with a lot of supplementary material, purporting to be from “A Guide” to “Explanatory Notes”. As Richard Clayton advised me a while back: don’t read them! Those are simply smoke-and-mirrors, designed to mislead, provide material for lazy journalists and confuse the reader — the only thing that has legal validity is the law itself on pages 35-227.
The good news is that I read through those 181 pages, and extracted the “juicy bits” from a technology public policy point of view. I am no lawyer, but am not as much interested in the fine print of the law. I am interested in the capabilities that the government wants to grant itself when it comes to, basically, attacking computers and telecommunication systems — with a view to understanding the business of policing and intelligence. So here are my notes…
20 August 2015
This posts presents a quick opinion on a moral debate, that seems to have taken large proportions at this year’s SIGCOMM, the premier computer networking conference, related to the following paper:
Encore: Lightweight Measurement of Web Censorship with Cross-Origin Requests
by Sam Burnett (Georgia Tech) and Nick Feamster (Princeton).
The paper was accepted to be presented, along with a public review by John W. Byers (Boston) that summarizes very well the paper, and then presents an account of the program committee discussions, primarily focused on research ethics.
In a nutshell the paper proposes using unsuspecting users browsing a popular website as measuring relays to detect censorship. The website would send a page to the users’ browser — that may be in a censored jurisdiction — that actively probes potentially blocked content to establish whether it is blocked. Neat tricks to side-step and use cross domain restrictions and permissions may have other applications.
Most of the public review reflected an intense discussion on the program committee (according to insiders) about the ethical implications of fielding such a system (2/3 of the 1 side is devoted to this topic). The substantive worry is that, if such a system were to be deployed the probes may be intercepted and interpreted as willful attempt to bypass censorship, and lead to harm (in “a regime where due process for those seen as requesting censored content may not exist”). Apparently this worry nearly led to the paper being rejected. The review goes on to disavow this use case — on behalf of the reviewers — and even call such measurements unethical.
I find this rather lengthy, unprecedented and quite forceful statement a bit ironic, not to say somewhat short-sighted or even hypocritical. Here is why.
1 July 2015
One of my key annoyances while doing work in privacy technologies is the poor support of key cryptography libraries in my favorite Python programming language. Today, I would like to share my work on building petlib, a more or less pythonic, wrapper around the OpenSSL low level crypto and math libraries, as well as numerous example privacy technologies (PETs) that I have implemented as examples.
The needs of someone doing research in the PETs field are quite different from other developers: on one hand we need access to low level primitives (such as block cipher and hash function operations), as well as low level mathematical functions on big integers and elliptic curves on finite fields. A number of available libraries try to hide those primitives from developers behind abstractions such as “signed envelopes” or “secure sockets” — which so not serve those who try to build different abstractions. On the other hand, issues such a tight memory management and absolute control over other low-level aspects of the library are not essential; in fact a clean programming interface that leads to beautifully clear reference code for proposed protocols is preferable.
The petlib library is available for everyone to use, and after installing the OpenSSL prerequisites can be acquired through the python repositories through:
pip install petlib
The petlib library was used as the basis for teaching the labs of the Privacy Enhancing Technologies course at UCL, and thus has extensive documentations, and is properly version controlled, packaged and tested:
- petlib github repository
- petlib installation and programming documentation (read the docs)
- petlib listing on pypi
The best way to get a feel for how the library can be used to build PETs prototypes is to browse the examples in the source tree:
- A toy RSA example
- A simple Schnorr Zero-Knowledge Proof
- An additivelly homomorphic public key encryption scheme
- A generic engine for building zero-knowledge proofs using sigma protocols
- The Groth-Kohlweiss ring signature and zero-cash scheme
- The algebraic MAC scheme by Chase, Meiklejohn and Zaverucha
- An anonymous credential scheme based on the aMAC scheme above
- The Baldimtsi and Lysyanskaya anonymous credentials light scheme
In terms of more real-work research project, we have already used petlib for implementing prototypes for a few projects and labs:
- The centrally banked cryptocurrency framework (with Sarah Meiklejohn)
- A private stats collection system for Tor (with Melis and De Cristofaro)
- Exercises in Privacy Enhancing Technologies (for UCL COMPGA17)
One key missing component from the underlying OpenSSL crypto library is support for computations on pairings of elliptic curves. This limits the types of protocols that can currently be implemented with petlib, until such functionality becomes available in the underlying libraries (please contribute!) Bug reports and pull requests with fixes to the code and documentation are very welcome.
30 June 2015
The course covers principally, and in some detail, engineering aspects of PETs and caters for an audience of CS / engineering students that already understands the basics of information security and cryptography (although these are not hard prerequisites). Students were also provided with a working understanding of legal and compliance aspects of data protection regimes, by guest lecturer Prof. Eleni Kosta (Tilburg); as well as a world class introduction to human aspects of computing and privacy, by Prof. Angela Sasse (UCL). This security & cryptographic engineering focus sets this course apart from related courses.
The taught part of the course runs for 20 hours over 10 weeks, split in 10 topics:
- Introduction and privacy in communication. (01-GA17-IntroComms)
- Anonymous communications & Traffic analysis (02-GA17-Anonymous-Comms)
- Private Computations with homomorphic encryption and secret sharing (03-GA17-Private-Computations)
- Privately checking inputs using Zero-Knowledge Proofs (04-GA17-ZeroKnowlegde)
- Private authorization using selective disclosure credentials (05-GA17-Selective-Disclosure)
- Data anonymization & de-anonymziation attacks (08-GA17-Data-Anonymization)
- Private Storage, queries and lookups (09-GA17-Storage-Retrieval)
- Privacy by design case-studies (10-GA17-Privacy-by-design-case-studies – Copy)
- Guest lectures: Human aspects (Angela Sasse)
- Guest lectures: Data Protection (Eleni Kosta)
Most importantly the course includes 10 hours of labs (20 next year!), split into 5 exercises, that give students (and their teachers!) hands on experience implementing extremely advanced privacy enhancing technologies. More generally the course provides an introduction to solid cryptographic engineering, test-driven development, testing & QA tools and code audits. The programming language used was Python on a Linux environment, with the petlib library that was specially developed for this course.
For each lab exercise students in pairs were provided with a partial code file, and a set of unit tests, and were asked to fill in the remaining code to fulfill the task, and at least make the unit tests pass. The topics of the exercises track the first 5 lecture topics:
- Private communications and basic programming with petlib
- Building a simple mix server and client
- Building a private polling system with homomorphic encryption
- Basics of zero-knowledge proofs of knowledge, equality and linear statement
- A basic selective disclosure authorization credential system
Finally, part of the grading was based on students performing a code review of other groups, looking for code defects leading to security or other bugs.
Overall, I am very proud of the progress everyone made. The course was attended by 16 MSc student and 2 MEng students. Everyone eventually was able to complete all lab assignment — not a given considering the advanced nature of the tasks at hand. It was evident while discussing with student the final exercise, on building a selective disclosure credential, that many had developed an intuitive understanding of how to build solutions based on zero-knowledge protocols, and all had definitely overcome their initial fear of these more advanced concepts in PETs.
I was also very impressed with many students that were able to tackle the hardest questions in the exam. One of those questions, basically asked students to re-invent a variant of the privacy preserving genomic testing protocol we presented at WPES 2014 — and many did successfully. Similarly, they were asked to de-anonymize a mechanism very similar to the 15:15 rule in place in California to “protect” smart meter reading, and again many did so successfully under time constrain and the high pressure environment of exams. As ever, the great engagement from students was the most rewarding part of teaching the course.
All material is available online (see links to slides, and git repositories), and I would be delighted to share / receive any additional exercises by others finding this material relevant to their courses.
I will be participating on a panel this afternoon on “Creating Usable and Secure Software”, in the context of the conference on Digital Citizenship and Surveillance Society. I share a platform with a number of illustrious people — Dave Hrycyszyn, Lola Oyelalo and Blaine Cook — with a much deeper experience in usable software and services development. However, I will attempt to provide some context, and my opinions, on why we can observe a broadly poor state of affairs when it comes to usability of privacy technologies — and hopefully open a discussion on how to overcome roadblocks.
My main two positions will be as follows:
- The political context within which technical security and privacy research and development had to be conducted over the past 40 years greatly contributed to the lack of wide deployment and poor usability of privacy technologies.
- The lack of “knowledge” about methods for developing usable privacy friendly solutions only offer a partial explanation for this poor state of affairs, and has to compete with other roadblocks that systemically undermined the deployment of usable privacy technologies.
First, it is worth reminding ourselves that research into security technologies and strong cryptography specifically, was until recently the prerogative of governments. Public discussion and know-how on this topic was developed seriously after the mid-1980s, and often despite serious pressure from the US and other governments. The technical security community is small and there remain serious technical challenges to providing privacy friendly solutions — solutions that require deep expertise developed over years of practice (requiring funding).
Second, the export control regimes, and also requirements for cooperation with law-enforcement slowed down significantly the blanket deployment of privacy technologies even after the strict export control regime of the 1990s was lifted. What makes a number of privacy technologies unusable — email encryption, instant messaging encryption — is the fact that common clients do not support them transparently and by default — requiring plug-ins, user configuration and manual key management. Thus the lasting impact of these regulation has not been the non-proliferation of strong crypto technologies, but the lack of integration of these into mainstream platforms. It is telling that the current Law Enforcement and Government narrative is not about preventing encryption know-how from spreading, but rather discouraging wide deployment of such technologies without the ability for back-door or front-door access.
Third, there are commercial pressures — which again have been related with government hostility of the wide deployment of privacy technologies. It is easy to forget that governments, are major customers of technologies. Thus they are able to dictate requirements that make it difficult to widely deploy privacy technologies. It is telling that mainstream mail clients — such as Microsoft Outlook — do not transparently support PGP based end-to-end encryption and have instead opted for S/MIME and models that make the use of encryption by individuals rather difficult. In this context one may assume that the key customers of this software — large enterprises and governments — simply never asked for such features, and in fact probably considered such a feature to conflict with other requirements (such as the need to recover mail of employees, backup, …).
These commercial pressures, have changed in the past few years, as large internet companies start relying heavily on serving end-users (search, webmail, social networking). Sadly, these companies have adopted both a business model — ad-based monetization — and a technical architecture — cloud computing — that makes meaningful privacy protection very difficult. In turn the “success” of those architectures has lead to an extreme ease of developing using this model, and an increasing difficulty in providing end-user solutions with appropriate privacy protections — let alone usable ones.
The rise of services has pushed a number of key privacy technologies into not being commercially supported and a key feature, and in effect at best a “common” — with the governance and funding problems this entails. We have recently learned about the systemic under funding of key privacy technologies such as OpenSSL and GPG. Technologies like Tor are mostly funded for their national firewall traversal features, seeing development on anonymity features suffer. Unlike other commons (health, parks, quality assurance in medicines), the state has not stepped in to either help with governance or with funding — all the opposite. For example, standardization efforts have systematically promoted “surveillance by design” instead of best of breed privacy protection; funding for surveillance technology is enormous compared to funding for privacy technologies, and somehow ironically, a number of calls for funding of privacy technologies are in the context of making surveillance more “privacy friendly” — leading to largely non-nonsensical outcomes.
So, lack of “knowledge” about how to develop usable software, while also a contributing factor, has to be seen within the context of the above structural pressures. In parallel, pressures undoubtedly exist when it comes to the discipline of UX which is in itself recent, and constantly involving. Along with serious funding for collaboration on building more usable privacy software (which the Simply Secure project that I am associated with attempts to provide), we need a strategy to counter those systemic pressures to ensure the wide deployment of usable privacy technologies.
18 December 2014
Last week I had the opportunity to attend a joint US National Academy of Sciences and UK Royal Society event on cyber-security in Washington DC. One of the speakers, a true expert that I respect very much, described how they envision building (more) secure systems, and others in the audience provided their opinion (Chatham House Rule prevents me from disclosing names). The debate was of high quality, however it did strike me that it remained at the level of expert opinion. After 40 years of research in cyber-security, should we not be doing better than expert opinion when it comes to understanding how to engineer secure systems?
First, let me say that I have a great appreciation for craftsmanship and the deep insights that come from years of practice. Therefore when someone with experience tells me to follow a certain course of action to engineer a systems, in the absence of any other evidence, I do listen carefully. However, expert opinion is only one, and in some respects the weakest form of evidence in what researchers in other disciplines have defined as a hierarchy of evidence. Stronger forms of evidence include case studies, case-control and cohort studies, double-blind studies with good sample sizes and significant results, and systematic meta-analyses and reviews.
In security engineering we have quite a few case reports, particularly relating to specific failures, in the form of design flaws and implementation bugs. We also have a set of methodologies as well as techniques and tools that are meant to help with security engineering. Which work, and at what cost? How do they compare with each other? What are the non-security risks (cost, complexity, training, planning) associated with them? There is remarkably little evidence, besides at best expert opinion, at worse flaming, to decide. This is particularly surprising, since a number of very skilled people have spent considerable time advocating for their favorite engineering paradigms in the name of security: static analysis, penetration testing, code reviews, strong typing, security testing, secure design and implementation methodologies, verification, pair-coding, use of specific frameworks, etc. However, besides opinion it is hard to find much evidence of how well these work in reducing security problems.
I performed a quick literature survey, which I add here for my own future benefit:
28 November 2014
It takes quite a bit of institutional commitment and vision to build a strong computer security group. For this reason I am delighted to share here that UCL computer science has in 2014 hired three amazing new faculty members into the Information Security group, bringing the total to nine. Here is the line-up of the UCL Information Security group and teaching the MSc in Information Security:
- Prof. M. Angela Sasse is the head of the Information Security Group and a world expert on usable security and privacy. Her research touches upon the intersection of security mechanisms or security policies and humans — mental models they have, the mistakes they make, and their accurate or false perceptions that lead to security systems working or failing.
- Dr Jens Groth is a cryptographer renowned for his work on novel zero-knowledge proof systems (affectionately known as Groth-Sahai), robust mix systems for anonymous communications and electronic voting and succinct proofs of knowledge. These are crucial building blocks of modern privacy-friendly authentication and private computation protocols.
- Dr Nicolas Courtois is a symmetric key cryptographer, known for pioneering work on algebraic cryptanalysis, extraordinary hacker of real-world cryptographic embedded systems, who has recently developed a keen interest in digital distributed currencies such as bitcoin.
- Prof. David Pym is both an expert on logic and verification, and also applies methods from economics to understand complex security systems and the decision making in organizations that deploy them. He uses stochastic processes, modelling and utility theory to understand the macro-economics of information security.
- Dr Emiliano de Cristofaro researchers privacy and applied cryptography. He has worked on very fast secure set intersection protocols, that are key ingredients of privacy technologies, and is one of the leading experts on protocols for privacy friendly genomics.
- Dr George Danezis (me) researches privacy technologies, anonymous communications, traffic analysis, peer-to-peer security and smart metering security. I have lately developed an interest in applying machine learning techniques to problems in security such as anomaly detection and malware analysis.
- Dr Steven Murdoch (new!) is an world expert on anonymous communications, through his association with the Tor project, banking security and designers of fielded banking authentication mechanisms. He is a media darling when it comes to explaining the problems of real-world deployed cryptographic systems in banking.
- Dr Gianluca Stringhini (new!) is rising star in network security, with a focus on the technical aspects of cyber-crime and cyber-criminal operations. He studies honest and malicious uses of major online services, such as social networks, email services and blogs, and develops techniques to detect and suppress malicious behavior.
- Dr Sarah Meiklejohn (new!) has a amazing dual expertise in theoretical cryptography on the one hand, and digital currencies and security measurements on the other. She has developed techniques to trace stolen bitcoins, built cryptographic compilers, and contributed to fundamental advances in cryptography such as malleable proof systems.
One key difficulty when building a security group is balancing cohesion, to achieve critical mass, with diversity to cover a broad range of areas and ensuring wide expertise to benefit our students and research. I updated an interactive graph illustrating the structure of collaborations amongst the members of the Information Security Group, as well as their joint collaborators and publication venues. It is clear that all nine faculty members both share enough interest, and are complementary enough, to support each other.
Besides the nine fullt-time faculty members with a core focus on security, a number of other excellent colleagues at UCL have a track record of contributions in security, supporting teaching and research. Here is just a handful:
- Prof. Brad Karp is an expert in networking and systems and has made seminal contributions to automatic worm detections and containment.
- Dr David Clark specializes in software engineering with a core interest in information flow techniques for confidentiality, software security and lately malware.
- Dr Earl Barr researches software engineering, and has researched security bugs, and malware as well as ideas for simple key management.
- Prof. Ingemar Cox (part-time at UCL) is a world expert in multimedia security, watermarking and information hiding.
- Prof. Yvo Desmedt (part-time at UCL) is a renowned cryptographer with key contributions in group key exchange, zero-knowledge and all fields of symmetric and asymmetric cryptography.
The full list of other colleagues working in security, including visiting researchers, post-doctoral researchers and research students list many more people — making UCL one of the largest research group in Information Security in Europe.
26 September 2014
I am today attending the first Internet Privacy Engineering Network (IPEN), where the issue of translating Data Protection principles into requirements has been raised a number of times. While this exercise needs to be repeated for each given service or application, it reminded me that I had drafted a number of generic Technical Requirements for Processing PII. These need to be reviewed and validated, but I hope they offer at least proof that the problem can be made tractable.
The paper “I Know Why You Went to the Clinic: Risks and Realization of HTTPS Traffic Analysis” is revisiting the vulnerability of HTTPS encryption to traffic analysis.
The authors note that most secure web connections access directly a specific website that is requested by a user (as opposed to the setting where a user is observed using Tor, where the site is not known). There are challenges in doing this: different websites are usually quite distinct, while many sub-pages in the same website may share a common layout that makes them hard to distinguish through encryption. Additional complications are added by cache, cookies, and the use of Flash. However, this setting may be easier than the Tor case, in that the adversary could realistically enumerate all pages on a website.
The paper chose 10 sites, including health, finance, video rental, and civil liberties. Traces are collected from encrypted requests and replies, and some simple pre-processing is done. In particular, the traffic volumes are collated in bursts and only their sizes are stored. This is done to add robustness to jitter in the traffic. Traces are then clustered using k-means and Gaussians are used to model each of the clusters. The use of the Gaussian distribution provides some further resilience to noise in the size of resources.
A machine learning approach is then followed to select which features distinguish pages: a logistic regression model is used to learn weights that best separate pages. A Hidden Markov Model is then used to track full traces of used navigation through an encrypted web-site.
To evaluate the approach traces were collected using virtual machines. This allows for the full state of a VM to be frozen, to reset the machine state perfectly between trace collection. Traces were collected in sessions where user state was meant to be preserved, and otherwise a fresh VM could be spun to clear the state. This way different conditions of cache on/off, cookies and browsing different sites were used for the collection. Interestingly the effect of cache being on systematically reduced the accuracy of the attack by 10%-20%.
Finally, the paper presents a defense against traffic analysis. It is based on padding the bursts of traffic to some threshold in order to confuse the feature extraction stage. Since many thresholds are possible, and the maximum is chosen, subject to a size “inflation” budget. Other defenses were also tried and compared, and there is a substantial reduction in the accuracy of this attack.