Boing Boing just released a classified GCHQ document that was meant to act as the Sept 2011 guide to open research problems in Data Mining. The intended audience, Heilbronn Institute for Mathematical Research (HIMR), is part of the University of Bristol and composed of mathematicians working for half their time on classified problems with GCHQ.
First off, a quick perusal of the actual publication record of the HIMR makes a sad reading for GCHQ: it seems that very little research on data mining was actually performed post-2011-2014 despite this pitch. I guess this is what you get trying to make pure mathematicians solve core computer science problems.
However, the document presents one of the clearest explanations of GCHQ’s operations and their scale at the time; as well as a very interesting list of open problems, along with salient examples.
Overall, reading this document very much resembles reading the needs of any other organization with big-data, struggling to process it to get any value. The constrains under which they operate (see below), and in particular the limitations to O(n log n) storage per vertex and O(1) per edge event, is a serious threat — but of course this is only for un-selected traffic. So the 5000 or so Tor nodes probably would have a little more space and processing allocated to them, and so would known botnets — I presume.
Secondly, there is clear evidence that timing information is both recognized as being key to correlating events and streams; and it is being recorded and stored at an increasing granularity. There is no smoking gun as of 2011 to say they casually de-anonymize Tor circuits, but the writing is on the wall for the onion routing system. GCHQ at 2011 had all ingredients needed to trace Tor circuits. It would take extra-ordinary incompetence to not have refined their traffic analysis techniques in the past 5 years. The Tor project should do well to not underestimate GCHQ’s capabilities to this point.
Thirdly, one should wonder why we have been waiting for 3 years until such clear documents are finally being published from the Snowden revelations. If those had been the first published, instead of the obscure, misleading and very non-informative slides, it would have saved a lot of time — and may even have engaged the public a bit more than bad powerpoint.
5 December 2015
As many in the UK are fighting a rear-guard action to prevent the most shocking provisions of the IP Bill becoming law (incl. secrecy and loose definitions), I was invited to provide three public policy recommendations for strengthening IT security in the EU. Instead of trying to limit specific powers (such as backdoors) here are some more radical options, more likely to resolve the continuous tug-of-war cyber civil liberties and the security services have been engaging in a while.
7 November 2015
The recently unveiled UK Draft IP Bill imposes all sorts of obligations on telecommunications operators, including obligations to collaborate with warrants to facilitate surveillance, hack, notices to retain data, handing it out in bulk, and even obligations to implement bag doors, as well as gagging orders. Despite their centrality, it is surprisingly difficult to clearly understand who exactly is a “telecommunication operator”, and therefore on whom these obligations apply.
The scope of the legislation would be vastly different if it only applies to traditional telecommunication companies that control physical infrastructure, such as BT or cable companies, versus more widely to any internet service that allows messaging in any form, such as google chat, facebook, whatsapp and tinder (or any other dating app). What if it also applied to general purpose software and hardware companies, or free software projects? As ever, it is unwise to rely on the explanatory notes, or the announcements of politicians to elucidate this question — they have no legal validity. So I turn to the legislation itself, to try to get some insights.
S.193 provides definitions, and specifically S.193(8) to S.193(14) defines telecommunication operators, public and private, telecommunication services and finally telecommunication systems. We will take them in turn. I am always surprised how obscure, subtle, and wide-ranging, such definitions are.
S.193(10) Defines a telecommunications operator as being one of two things: they either offer a telecommunications “service” to persons in the UK; or they control or provide a telecommunication “system” which is at least in part in the UK, or controlled from the UK. Note the choice of subtle difference between a “service” and a “system“, as well as “offer“, “provide” versus “control“.
S.193(11) defined what a telecommunications service is: it is anything that provides, accesses, or facilitates the use of a telecommunication system. Helpfully, it points out that a service may be using a system provided by someone else: presumably this is intended to label as operators those providing services over infrastructure, logical or physical, provided by others; or software and hardware provided by others.
There is a further clarification in S.193(12): something is a telecommunications service if it is involved in the facilitation of the creation, management or storage of communications transmitted by a telecommunication system. Particularly troubling is the mention of “creation”: it might be used to argue that client side applications do facilitate the creation of communications (and their storage), and therefore are a telecommunication service. Their provision thus makes potential creators of software and apps, and for sure those providing web-mail and instant messaging services, telecommunication operators.
Finally, S.193(13) defines as a telecommunications system a system that in any way transmits communications using electric or electromagnetic energy including the communication apparatus (machinery) that is used to do this. The definition is very wide ranging, and includes all communications, except postal (which are dealt separately), and all telecommunication equipment in use.
I am not a lawyer (but neither are most MPs — only about 15% are legally trained).
My reading of the telecommunications operator definition is that it encompasses everyone that is somehow related to communications: their creation, management, storage, transmissions, processing, routing, etc. In my view this covers internet services and phone apps that allow private messaging at least: social network, instant messaging applications, dating websites, on-line games, etc. Of course it also covers trivially traditional telephony, mobile or fixed, Internet Service Providers and cable providers.
It is less clear whether only messaging and internet services, or also suppliers or hardware and software, are covered by this definition. For example, one could argue that a software vendor “provides a telecommunications system (S.193(10)(b))”, if by system we mean the software used to facilitate transmissions. In fact the definition of “system” includes the “apparatus comprised in it” (S.193(13)), namely software and hardware. Following that argument, software and hardware vendors of general computing equipment may be considered telecommunications operators — when their kit is used in the context of telecommunications. If I consider this argument reasonable, probably judges in secret courts, secretaries of state, and judicial commissioners may be convinced.
This ambiguity has far reaching consequences: if an enacted Investigatory Powers Bill, is interpreted to cover suppliers of communications software and hardware, then they may be coerced by notice to provide “interception capabilities” — government backdoors — into their software and hardware and further facilitate “interference warrants” — hacking — against the customers of their products. Operating system manufacturers, and even processor manufacturers may not be safe from this legislation which will discredit any assertion they make about the security of their products in an international market.
5 November 2015
I laughed out loud when I saw the calls from Andrew Parker, the head of MI5, for a mature debate on surveillance, in particular in relation to the draft investigatory Powers Bill (via Paul Bernal). My reading of the IP Bill is that it will result in, and perhaps intends, closing for ever the democratic debate about what constitutes acceptable state surveillance.
Gagging orders for targeted warrants: interception, equipment interference and communications data. S.43(1-7) impose a gag order in relation to the existence or any other aspects of an interception warrant, except for seeking legal advice. S.44(2)(a) makes it an offense to disclose anything about such a warrant, with a penalty of up to 12 months in jail and / or a fine. Similar provisions exist for “equipment interference”: S.102 makes it an offense for a telecommunication provider disclose anything about a warrant for hacking someone! Similar secrecy provisions apply to notices for handling out communication data (S.66).
These prohibitions may make sense in the context of operational needs for secrecy — such as during investigations. But what about when the warrant expires? What about either interception or equipment interference against subjects, organizations, or others that does not lead to any criminal or other conviction — namely against innocent people and associations? What is the imperative for keeping those secret? The imperative is simply to keep the debate about the surveillance capabilities, the uses of warrants, the selection of targets for surveillance, the prevalence of surveillance, and the techniques used and their proportionality secret — namely to avoid even the possibility of a mature debate in the future.
Gagging orders for retention notices. The previous warrants and notices clearly applied, at least for some time, to operations against specific targets. More interestingly, secrecy is also required when it comes to issued retention notices: S.77, makes disclosing such a notice a civil offence.
What this means is that the secretary of state may issue notices for operators to keep some communication data, but these operators are not allowed to tell anyone! This despite the significant public policy interest on the matter, that has in fact led to numerous challenges against such policies, and the eventual legal challenge of the EU Data Retention Regulation in the European Court of Justice. Of course this may lead to nonsensical outcomes: I could build a service, and deploy it in the UK or elsewhere (remember extra-territoriality S.79) only to be told that a retention notice exists covering my service — which was previously unknown to me due to secrecy, and that I cannot discuss or challenge politically openly due to the same secrecy.
This is in contrast with, for example, the Data Retention directive that provided a strict list of services and categories of data that were to be retained, in the text of the directive — not in secret. Even those provisions were found to not be proportional, so go figure what the gagging order in the IP Bill is. This provision clearly aims to make the IP Bill the last, if any, political discussion on retention, its proportionality, necessity or legitimacy in a democratic society. Once it becomes law, the gagging orders will hide what is retained at all.
Gagging orders for bulk interception and interference. Given the audacity of enabling bulk interception and bulk interference, while maintaining the IP Bill is not about mass surveillance, it is no surprise that gagging orders are also imposed on those asked to facilitate it: S.120(b) states that disclosures should not be made about the existence or facilitation of bulk interception, and S.148 prohibits disclosure of a bulk interference warrant — making it illegal to even discuss that mass hacking might be taking place! Those apply to overseas operators too.
Gagging orders for bulk communications data collection. Bulk acquisition follows the pattern, and a special offence is created in relation to disclosing anything about to it in S.133. Again, this goes way beyond protecting specific operation, since the acquisition is performed in bulk, and cannot betray any specifics. The secrecy order protects the capability to access in bulk certain categories of communication data, which in effect means shielding it from any proper scrutiny as related to its necessity, or appropriateness in the future,or any debate on that matter.
Gagging orders in relation to implementing surveillance capabilities & back doors. Finally, gagging orders apply to “technical capability notices” (as well as “national security notices” — the joker card in this legislation allowing to impose any requirement at all). In S.190(8) specified that such notices should not be disclosed.
This should put to rest any romantics — and there are few, but some, in the midst of computer security and cryptography experts — that think that we will have some kind of debate about the type of back doors; or that we can build privacy-friendly back doors; or that somehow when a new technology presents itself we will have a debate about how strong the privacy it provides should. There will be none of this: secret backdoor notices (I mean “technical capability notices”) will be issued, and enterprising geek that wants to open a debate about them will either know nothing about them, or be breaking the law. There will be no debate about what kind of back doors, of when they should be used — all will be happening in total secrecy.
Keeping surveillance evidence out of courts, and the defense’s hands. S.42(1-4) of the Draft IP Bill prevents anyone involved in interception from ever mentioning it took place as part of any legal proceedings. Note that this section is absolute: it does not have exceptions, for example in relation to the public interest: such as the ability to discuss the benefit or downsides of part interception activities; no exception for talking about this to MPs, or other democratic representatives; or even to exculpate anyone who otherwise would be wrongfully found guilty. Similar provisions (S.120(a)) keep the fruits of bulk interception out of courts.
Secret hearings in secret tribunals and commissioners. There exist provisions from RIPA for secret hearings and appeals in front of secret tribunals. There are also provisions for the commissioners looking at what is doing on. These are so weak, so removed from democratic practice, and so alien to concepts of the rule of law and democratic rule — let alone nonsensical — that I am not going to discuss them further.
In conclusion. For sure the Investigatory Powers Bill future proofs surveillance capabilities: mostly against future democratic scrutiny. Once it becomes law, its “technology” neutral provision can be applied to intercept, collect, back door, hack, even in bulk, while making it illegal to even discover, and as a result discuss or make policy about, interferences with private life the state is up to. The gagging provisions are a clear example that calls for a mature debate around surveillance are mere rhetoric, the securocrats want one last discussion before making any discussion about surveillance simply impossible.
4 November 2015
At last the UK government today published the draft Investigatory Powers Bill, after about a week of carefully crafted briefings aimed at managing opinion, and even dissent. The document comes bundled with a lot of supplementary material, purporting to be from “A Guide” to “Explanatory Notes”. As Richard Clayton advised me a while back: don’t read them! Those are simply smoke-and-mirrors, designed to mislead, provide material for lazy journalists and confuse the reader — the only thing that has legal validity is the law itself on pages 35-227.
The good news is that I read through those 181 pages, and extracted the “juicy bits” from a technology public policy point of view. I am no lawyer, but am not as much interested in the fine print of the law. I am interested in the capabilities that the government wants to grant itself when it comes to, basically, attacking computers and telecommunication systems — with a view to understanding the business of policing and intelligence. So here are my notes…
20 August 2015
This posts presents a quick opinion on a moral debate, that seems to have taken large proportions at this year’s SIGCOMM, the premier computer networking conference, related to the following paper:
Encore: Lightweight Measurement of Web Censorship with Cross-Origin Requests
by Sam Burnett (Georgia Tech) and Nick Feamster (Princeton).
The paper was accepted to be presented, along with a public review by John W. Byers (Boston) that summarizes very well the paper, and then presents an account of the program committee discussions, primarily focused on research ethics.
In a nutshell the paper proposes using unsuspecting users browsing a popular website as measuring relays to detect censorship. The website would send a page to the users’ browser — that may be in a censored jurisdiction — that actively probes potentially blocked content to establish whether it is blocked. Neat tricks to side-step and use cross domain restrictions and permissions may have other applications.
Most of the public review reflected an intense discussion on the program committee (according to insiders) about the ethical implications of fielding such a system (2/3 of the 1 side is devoted to this topic). The substantive worry is that, if such a system were to be deployed the probes may be intercepted and interpreted as willful attempt to bypass censorship, and lead to harm (in “a regime where due process for those seen as requesting censored content may not exist”). Apparently this worry nearly led to the paper being rejected. The review goes on to disavow this use case — on behalf of the reviewers — and even call such measurements unethical.
I find this rather lengthy, unprecedented and quite forceful statement a bit ironic, not to say somewhat short-sighted or even hypocritical. Here is why.
I will be participating on a panel this afternoon on “Creating Usable and Secure Software”, in the context of the conference on Digital Citizenship and Surveillance Society. I share a platform with a number of illustrious people — Dave Hrycyszyn, Lola Oyelalo and Blaine Cook — with a much deeper experience in usable software and services development. However, I will attempt to provide some context, and my opinions, on why we can observe a broadly poor state of affairs when it comes to usability of privacy technologies — and hopefully open a discussion on how to overcome roadblocks.
My main two positions will be as follows:
- The political context within which technical security and privacy research and development had to be conducted over the past 40 years greatly contributed to the lack of wide deployment and poor usability of privacy technologies.
- The lack of “knowledge” about methods for developing usable privacy friendly solutions only offer a partial explanation for this poor state of affairs, and has to compete with other roadblocks that systemically undermined the deployment of usable privacy technologies.
First, it is worth reminding ourselves that research into security technologies and strong cryptography specifically, was until recently the prerogative of governments. Public discussion and know-how on this topic was developed seriously after the mid-1980s, and often despite serious pressure from the US and other governments. The technical security community is small and there remain serious technical challenges to providing privacy friendly solutions — solutions that require deep expertise developed over years of practice (requiring funding).
Second, the export control regimes, and also requirements for cooperation with law-enforcement slowed down significantly the blanket deployment of privacy technologies even after the strict export control regime of the 1990s was lifted. What makes a number of privacy technologies unusable — email encryption, instant messaging encryption — is the fact that common clients do not support them transparently and by default — requiring plug-ins, user configuration and manual key management. Thus the lasting impact of these regulation has not been the non-proliferation of strong crypto technologies, but the lack of integration of these into mainstream platforms. It is telling that the current Law Enforcement and Government narrative is not about preventing encryption know-how from spreading, but rather discouraging wide deployment of such technologies without the ability for back-door or front-door access.
Third, there are commercial pressures — which again have been related with government hostility of the wide deployment of privacy technologies. It is easy to forget that governments, are major customers of technologies. Thus they are able to dictate requirements that make it difficult to widely deploy privacy technologies. It is telling that mainstream mail clients — such as Microsoft Outlook — do not transparently support PGP based end-to-end encryption and have instead opted for S/MIME and models that make the use of encryption by individuals rather difficult. In this context one may assume that the key customers of this software — large enterprises and governments — simply never asked for such features, and in fact probably considered such a feature to conflict with other requirements (such as the need to recover mail of employees, backup, …).
These commercial pressures, have changed in the past few years, as large internet companies start relying heavily on serving end-users (search, webmail, social networking). Sadly, these companies have adopted both a business model — ad-based monetization — and a technical architecture — cloud computing — that makes meaningful privacy protection very difficult. In turn the “success” of those architectures has lead to an extreme ease of developing using this model, and an increasing difficulty in providing end-user solutions with appropriate privacy protections — let alone usable ones.
The rise of services has pushed a number of key privacy technologies into not being commercially supported and a key feature, and in effect at best a “common” — with the governance and funding problems this entails. We have recently learned about the systemic under funding of key privacy technologies such as OpenSSL and GPG. Technologies like Tor are mostly funded for their national firewall traversal features, seeing development on anonymity features suffer. Unlike other commons (health, parks, quality assurance in medicines), the state has not stepped in to either help with governance or with funding — all the opposite. For example, standardization efforts have systematically promoted “surveillance by design” instead of best of breed privacy protection; funding for surveillance technology is enormous compared to funding for privacy technologies, and somehow ironically, a number of calls for funding of privacy technologies are in the context of making surveillance more “privacy friendly” — leading to largely non-nonsensical outcomes.
So, lack of “knowledge” about how to develop usable software, while also a contributing factor, has to be seen within the context of the above structural pressures. In parallel, pressures undoubtedly exist when it comes to the discipline of UX which is in itself recent, and constantly involving. Along with serious funding for collaboration on building more usable privacy software (which the Simply Secure project that I am associated with attempts to provide), we need a strategy to counter those systemic pressures to ensure the wide deployment of usable privacy technologies.
One of the rare delights of living and working as a security and privacy researcher in the UK is the bi-yearly schedule of surveillance legislation. Despite being often defeated, like the Phoenix, they only spring back to life at the slightest opportunity. This time round is no different: the PM has announced that secret all party negotiations reached consensus on an emergency bill enabling data retention (after it was deemed illiberal at a European level). It is meant to complete its journey through parliament this week, making an analysis all the more pressing.
First of all, it is important to appreciate that the bill fills the gap left by the traffic data retention directive’s (Directive 2006/24/EC) demise, when it was ruled invalid by the Court of Justice of the European Union. In theory, it should enable the same regime of data retention to continue without addressing in the slightest the civil liberties concerns that lead to the demise of the directive. There is however a problem: traffic data retention makes sense if it is widely implemented. There is no point in services in the UK retaining data, if US services or German services do not — the “bad guys” or anyone who values their privacy, would simply move their operations there.
Partly to deal with the possible lack of data retention abroad, the bill has provisions for the extraterritorial application of some powers, to force retention of interception of traffic. Which means that if you have some presence in the UK you may be asked nicely to retain data or provide wiretaps to UK law-enforcement or spooks. In fact even if you do not you may be asked anyways, and in extremis a public notice may be sufficient to force you to retain certain types of data. It is absolutely not clear to me what this means for foreign providers or technology companies.
The bill gives wide powers to the secretary of state to ask operators to retain any “relevant communications” data he/she wishes — where “relevant” points to the types of data mentioned in the Data Retention Directive (2009). They may impose specific conditions, and also decide to compensate operators for their trouble. One key limitation is that the retention period should not exceed 12 months.
For a blast from the past, a quick reminder of how “communications data” is defined in RIPA — which this bill piggy-backs on:
(4) In this Chapter “communications data” means any of the following—
(a) any traffic data comprised in or attached to a communication (whether by the sender or otherwise) for the purposes of any postal service or telecommunication system by means of which it is being or may be transmitted;
(b) any information which includes none of the contents of a communication (apart from any information falling within paragraph (a)) and is about the use made by any person—
(i) of any postal service or telecommunications service; or
(ii) in connection with the provision to or use by any person of any telecommunications service, of any part of a telecommunication system;
(c) any information not falling within paragraph (a) or (b) that is held or obtained, in relation to persons to whom he provides the service, by a person providing a postal service or telecommunications service.
Back in 2000 this definition was just about sane. At the time you could have email (content = body, comms = headers, in relation = subscriber information) or web requests to public resources, or IRC or usenet — none of which had much data on users. Today, what exactly is meant by category (c) “held or obtained, in relation to persons to whom he provides the service” is rather all encompassing. I am told this means “subscriber information”, ie. the credit card that pays for the email account. But, why not other data that is not explicitly the content of communications? What about your full facebook profile? It is after all the equivalent of “subscriber data”? Why not your OK Cupid profile, with the answers to all questions about your kinky preferences? They are input into a form like other subscriber data, and there is no question OK Cupid does provide a communication service. What is the limit? By perpetuating the fiction that contents of communications are protected by warrant, all other items are now susceptible game for access as communications data.
An interesting detail is that the bill somewhat changes the definition of a telecommunication service to include any service facilitating messaging (communications), or involved in the in “creation, management, or storage of communications transmitted, or that may be transmitted”. I assume this includes relays like Tor, but also cloud storage services that may contain emails, webmail, facebook chat, on-line game chat and the like. Interestingly it also includes all their infrastructure providers, transit providers, storage providers, etc. If a notice comes their way, they will have to help intercept.
21 June 2014
I read today a brief missive about the Russian government’s intent to replace US sourced CPUs, the heart of a modern computer, with domestically produced ones. This is presumably a move to protect their critical infrastructure from hardware back doors, that are difficult to detect or eliminate. This is a good opportunity to share my thoughts on what is at stake in the current debate about the NSA’s and GCHQ’s pervasive surveillance infrastructure, including historic attempts to prevent the development and widespread use of security and cryptology technologies, and their current active compromise of international communications and end-users.
A lot has been written about the right to privacy of American citizens, and to some extent now British subjects. In my opinion, this important domestic issue lies on the insignificant end of the global impact of the Snowden revelations. It is also the only issue that may be resolved through better oversight and stronger privacy guarantees in national laws (with the caveats relating to the “liberal fallacy“).
What is truly at stake is whether a small number of technologically-advanced countries, including the US and the UK, but also others with a domestic technology industry, should be in a position to absolutely dominate the “cyber-space” of smaller nations. About 20 years ago, this may have been a minor concern as few things were critically dependent on IP or mobile networks. Today, most social and economic interactions are mediated through such technologies, or could economically benefit from being so, if only due to “security and privacy concerns”.
19 December 2013
I had the opportunity to speak as part of a panel at the London Crypto Festival on November 30th 2013. My main point was that we have not one, but many ways to protect privacy in on-line services. Therefore consumers and citizens should demand from their service and software providers strong protections for their privacy, and come to expect them. The examples I used are from what I know best, namely smart metering privacy for which we have proposed in the past very credible protocols for privacy friendly billing and statistics.
My Crypto Party Presentation can be found here.