31 October 2013
I am joining a fantastic team of researchers: Angela Sasse heads the group and is doing pioneering work on human aspects of security; Jens Groth is an expert on cryptography, and zero knowledge; Nicolas Courtois is a leading cryptanalyst, and has hit the news many times in the past by demonstrating vulnerabilities in deployed systems. Alongside myself, Emiliano De Cristofaro, who works on applied cryptography and privacy, and David Pym, who has a dual interest in formal methods and economics of security, are also joining the group.
One of my first non-research tasks at UCL is to teach the Computer Security 1 course, which is a broad introduction to the basics of computer security. As a matter of principle, namely that the highest levels of quality of protection are achieved when computer security is discussed in public, I consider the class to be a public event, open to anyone who would like to attend (subject to space restrictions). So if you are based in London and would like access, just let me know.
3 April 2012
Yesterday I was sent articles about the new round of policy proposals related to Internet surveillance and traffic data monitoring (bbc, guardian). What is depressing, but also really revealing, is how similar both the powers sought and the arguments for those powers are to previous attempts, just a few years ago.
In their essence the powers seek to extend the ability of law enforcement agencies to have access to all Internet traffic data, a power that they largely already have when it comes to conventional telecommunications or email services. What are traffic data? Everything that you have not typed as a message: the identity and time of your facebook chats, your facebook likes, the log of the visits to all web-pages, the clicks on on-line polls, the location data your phone sends to access on-line location services, the times and places you were in the same chat room with your friends, your on-line friends, etc. Basically you can think of blanket traffic data retention and access as having a policeman following you around 24 hours a day, 7 days a week, and making notes about where you have been, what you have looked at, who you are talking to, what you are doing, where you are sleeping (and with whom), everything you bought, every political and trade union meeting you went to, and so on, but not actually hearing any of the conversation or seeing what you wrote. Traffic data provide an X-ray of your whole life, and the policy suggests they should be available to law enforcement and the intelligence services without any judicial oversight (only political review or police oversight).
As has been discussed many times before, recording all this information is very expensive, unlikely to ever be totally comprehensive, technically nearly infeasible, and prone to over-collection and abuse. In fact a focus on “more data” diverts attention from careful on-the-ground case work, and turns policing into a computer game you cannot win. A lot of data is already sitting in databases and can be accessed by police; in fact police are under-equipped, under-trained and under-resourced to make use of those investigative resources, let alone more raw data. The focus on ubiquitous surveillance also increases concerns about privacy, and the idea that everything communicated can be used against a person puts a brake on the adoption of information technologies like computational clouds.
On that basis the Conservative and Lib Dem opposition has in the past argued against those measures. They are now in government, so one would think that this debate is not necessary. Yet the internet surveillance plans are back on the table! What is going on here?
This time around, I personally think that the campaign against those measures should also seek to dislodge those in the permanent and non-elected institutions of the state who keep bringing this policy back to the table. I find it very hard to believe that these near carbon copy proposals naturally “re-emerge” despite the prolonged public debates against them. It is much more likely that the policy is simply repackaged and presented anew to every new minister by career civil servants, under permanent pressure from the agencies.
I find it troubling that there is a non-elected set of institutions of the state that have as a permanent policy agenda to undermine civil liberties, despite consistently losing the public debate when specific powers are considered by Parliament. I would call this political subversion. What is even more troubling is that the architects of such policies are hiding under the cloak of secrecy, making it impossible for those outside government and the security services to really call them to account. I appreciate that operational necessities mean that some aspects of the security services’ work must remain secret. Yet, I cannot see why the branch of the security services that aims to directly affect public policy, through pushing a permanent agenda of ubiquitous surveillance, should enjoy that privilege.
Today Theresa May, our home secretary “insisted only data – times, dates, numbers and addresses – not content would be accessible” and that “ordinary people” had nothing to fear. Requesting such information about the communications between the home office, civil servants, and the members of the security services that advised them about this policy may change her opinion about how sensitive such information is. In fact, she may discover that providing such a map of the policy campaign and network of support, gives the opponents of the policy an undeniable advantage. I am very much looking forward to turning my automated social network analysis tools on their call graphs and email logs, and providing all results and intuitions to journalists.
23 October 2011
ACM CCS 2011 just took place this week, so I decided to give a bit more insight into a few processes the program chairs used behind the scenes to manage what is the largest security conference to date. Vitaly Shmatikov (CCS11 Program co-chair) has already given a short introduction to this year’s process: we received 429 full papers that we had to review with 54 PC members. While no hard target was set at the start of the process, we expected to accept around the 60 papers that now form the program of CCS 2011. These are my views and opinions on the process, and they are not automatically shared by anyone else, including Vitaly.
Note: This post describes automated statistics we used to interpret scores of reviews to guide us in assigning more reviews or guiding discussion. All final acceptance decisions were taken the old fashioned way, through qualitative assessment of the reviews and discussion in the PC.
28 July 2011
Privacy-friendly Aggregation for the Smart-grid
Klaus Kursawe (Radboud Universiteit Nijmegen) and George Danezis and Markulf Kohlweiss (Microsoft Research)
Privacy for smart electricity provision seems to be a rising topic, and this year there is a whole session on it at PETS 2011. The first paper (of which I am a coauthor) looks at the problem of gathering aggregate data from groups of smart meters, without allowing any third party to get the individual measurements. This can be applied as a PET to solve real-world problems such as fraud detection, leakage detection, load estimates, demand response and weather prediction, all of which only require aggregate data (sometimes in real time), not individual measurements.
The key challenge in providing private aggregation protocols is the specific constraints of smart meters. They are cheap devices, with modest resources, hardly any bandwidth, no ability to communicate, etc. Two specific protocols are presented: the first one allows the sum of meter readings to be compared with a reference number (maybe measured from a feeder meter). This protocol allows for fancy proofs of correctness, but it is slow in terms of computation and bandwidth (it requires public key operations for each reading). The second protocol is extremely fast and has no communication overhead. In both cases a pragmatic approach to the threat model is followed: we assume that the utilities will honestly define groups of meters and facilitate the key management protocol. For the second protocol there is no overhead of public key operations after the initial key setup.
The key highlight from this work is not so much its technical depth (it uses tricks with DC networks and hash functions that would not surprise any PETS regular). What is interesting is that the protocols were designed for a real industrial application and are now fully integrated on real smart meters and their communication protocols, in collaboration with our partners at Elster.
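The DC-network and hash-function trick mentioned above can be illustrated with a minimal sketch. It is not the protocol from the paper: it assumes that every pair of meters already shares a key, uses SHA-256 as the hash, and works modulo 2^32. All names (`pad`, `mask_reading`, the key strings) are hypothetical.

```python
import hashlib
from itertools import combinations

MODULUS = 2 ** 32  # assumption: readings and pads are integers mod 2^32

def pad(shared_key: bytes, round_id: int) -> int:
    """Derive a pseudorandom pad for one round from a pairwise shared key."""
    digest = hashlib.sha256(shared_key + round_id.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:4], "big")

def mask_reading(meter: int, reading: int, keys: dict, round_id: int) -> int:
    """Blind a reading: for each pair (a, b), meter a adds the pad and meter b subtracts it."""
    masked = reading
    for (a, b), key in keys.items():
        if meter == a:
            masked += pad(key, round_id)
        elif meter == b:
            masked -= pad(key, round_id)
    return masked % MODULUS

# Hypothetical setup: three meters, one shared key per pair (a < b).
meters = [0, 1, 2]
keys = {(a, b): f"key-{a}-{b}".encode() for a, b in combinations(meters, 2)}
readings = {0: 120, 1: 75, 2: 310}  # made-up readings

masked = [mask_reading(m, readings[m], keys, round_id=7) for m in meters]
total = sum(masked) % MODULUS  # the pads cancel, leaving only the sum
```

Each pad is added by one meter and subtracted by another, so an aggregator that sums the masked values learns the total but sees individual readings only in blinded form.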
Plug-in privacy for Smart Metering billing
Marek Jawurek, Martin Johns, and Florian Kerschbaum (SAP Research)
This second paper looks at the problem of billing for fine-grained time-of-use tariffs, where energy consumed at different times is charged at a different rate per unit. This is a very important topic, as correct billing and time-of-use tariffs are a key driver of fine-grained data collection through smart meters. If we can do billing privately then maybe less personal information needs to be collected.
Technically the protocols proposed are based on the homomorphic properties of Pedersen commitments: readings are commitments, and you can use multiplication by a constant and addition to compute the bill, and (most importantly) prove that it is correct. The system model is that the meter outputs signed commitments of readings, a privacy component computes the bill and proofs of correctness, and those are sent to the supplier for verification (and printing the bills!).
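To make the homomorphic billing idea concrete, here is a toy sketch of Pedersen commitments being combined into a bill. The group parameters are an assumption chosen for readability and are far too small for real use; the readings and tariffs are made up, and the signatures on the commitments are left out.

```python
import secrets

# Toy parameters (assumption): p = 2q + 1 with p and q prime; g and h are
# squares mod p, so both generate the subgroup of order q.
P, Q = 2039, 1019
G, H = 4, 9

def commit(value: int, blinding: int) -> int:
    """Pedersen commitment g^value * h^blinding mod p."""
    return (pow(G, value, P) * pow(H, blinding, P)) % P

# Meter side: commit to each reading with a fresh blinding factor.
readings = [3, 5, 2]      # units consumed per time slot (made-up values)
tariffs = [10, 25, 10]    # public price per unit in each slot (made-up values)
blindings = [secrets.randbelow(Q) for _ in readings]
commitments = [commit(m, r) for m, r in zip(readings, blindings)]

# Privacy component: compute the bill and the matching combined blinding factor.
bill = sum(t * m for t, m in zip(tariffs, readings))
bill_blinding = sum(t * r for t, r in zip(tariffs, blindings)) % Q

# Supplier side: raise each commitment to its tariff, multiply, and check
# that the result opens to the claimed bill.
combined = 1
for c, t in zip(commitments, tariffs):
    combined = (combined * pow(c, t, P)) % P
bill_is_valid = combined == commit(bill, bill_blinding)
```

The supplier only ever sees the commitments, the bill and one combined blinding factor, never the individual readings.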
This is the core of a nice solution for the basic billing case (which is likely to be the common one in smart grids). We have shown in related work that the protocol can be further improved to have zero communication overhead. Since it avoids expensive zero-knowledge proofs it is fast for its proofs and verification. It also provides the basic infrastructure to support further more expressive billing policies and general computations.
27 July 2011
An Accurate System-Wide Anonymity Metric for Probabilistic Attacks
Rajiv Bagai, Huabo Lu, Rong Li, and Bin Tang (Wichita State University)
Traditional entropy based anonymity metrics look at the security of single messages. But how can you quantify the security provided by a whole system? The first paper in this session looks at a system-wide definition of anonymity by “counting” the possible number of matchings between inputs and outputs of an anonymity system. Furthermore, the metric extends to the probabilities over perfect matchings to express subtleties of modern anonymity systems. The paper first of all provides a thorough critique of the metric by Edman et al. (there was also previous work on this metric by the Leuven crew).
In a nutshell the proposed system-wide metric associates a probability to each possible matching, and computes the (normalized) entropy over this distribution as a measure of anonymity. The choice of Shannon entropy to summarise quality can be changed to min-entropy or another measure (which is very cool!). One key issue with system-wide metrics is how they express the properties that any individual message receives. Paul Syverson points out that this type of metric expresses more the anonymity capacity of a system, namely how much anonymity the system could provide as a whole. The question of how this capacity for protection is distributed across users may need an extension to those metrics. For anyone who would like to extend metrics to capture this aspect, this paper is a very solid foundation.
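As a small illustration of the idea, the sketch below enumerates the perfect matchings of a tiny system by brute force and computes the normalized Shannon entropy of a distribution over them. The feasibility matrix and the uniform distribution are made-up inputs, and the function names are hypothetical; the paper's own computation may differ.

```python
import math
from itertools import permutations

def matchings(feasible):
    """Enumerate perfect matchings: permutations that use only feasible input/output pairs."""
    n = len(feasible)
    return [p for p in permutations(range(n))
            if all(feasible[i][p[i]] for i in range(n))]

def normalized_entropy(probabilities, n):
    """Shannon entropy of a distribution over matchings, divided by log2(n!). Needs n >= 2."""
    entropy = -sum(p * math.log2(p) for p in probabilities if p > 0)
    return entropy / math.log2(math.factorial(n))

# Hypothetical 3x3 system: feasible[i][j] is True if input i could be output j.
feasible = [
    [True, True, False],
    [True, True, True],
    [False, True, True],
]
possible = matchings(feasible)
uniform = [1 / len(possible)] * len(possible)  # assumption: all matchings equally likely
score = normalized_entropy(uniform, n=3)
```

A score of 1 means every one of the n! matchings is equally likely; a score of 0 means the adversary knows the matching exactly.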
DefenestraTor: Throwing out Windows in Tor
Mashael AlSabah, Kevin Bauer and Ian Goldberg (University of Waterloo), Dirk Grunwald (University of Colorado), and Damon McCoy, Stefan Savage, and Geoffrey Voelker (University of California-San Diego)
This paper looks at performance issues within the Tor network, and in particular the effects of the congestion and flow control protocols. Tor implements a simple end-to-end flow control mechanism at the granularity of circuits and streams. It turns out that the implemented window-based flow control has detrimental effects on performance: it does not protect intermediate routers (which are likely to be the congested ones) from congestion.
Two approaches were followed to solve this problem. First, the window could be changed: a smaller fixed window could be used, although this alone would not solve the problem, or windows could be computed dynamically. Second, the N23 congestion control protocol (used for ATM) could be run over Tor. N23 is simple and guarantees that no packets are dropped, while maintaining a steady flow of data. It is a credit-based system, where packets are sent when credits are available (and consume them), and credits are sent up the network when bandwidth is available.
The evaluation was done under realistic conditions on ExperimenTor. The improvement over the current Tor strategy is significant when it comes to the time to get the first byte, but the time to complete larger (bulk) downloads does suffer (which is part of the point of the protocol).
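The credit mechanism can be sketched for a single hop as follows. This is a simplified illustration of credit-based flow control in general, not the N23 specification or the Tor patch from the paper; the class name `Link` and the parameters `initial_credits` and `credit_batch` are hypothetical.

```python
from collections import deque

class Link:
    """One hop: the sender may only transmit while it holds credits."""

    def __init__(self, initial_credits: int, credit_batch: int):
        self.credits = initial_credits      # cells the sender may still send
        self.credit_batch = credit_batch    # cells forwarded before credits are returned
        self.buffer = deque()               # cells waiting at the receiver
        self.forwarded_since_credit = 0

    def send(self, cell) -> bool:
        """Sender side: transmit one cell if a credit is available."""
        if self.credits == 0:
            return False                    # the sender waits; nothing is dropped
        self.credits -= 1
        self.buffer.append(cell)
        return True

    def forward(self):
        """Receiver side: pass one cell on and return credits in batches."""
        if not self.buffer:
            return None
        cell = self.buffer.popleft()
        self.forwarded_since_credit += 1
        if self.forwarded_since_credit == self.credit_batch:
            self.credits += self.credit_batch   # stands in for a credit message sent upstream
            self.forwarded_since_credit = 0
        return cell

link = Link(initial_credits=4, credit_batch=2)
sent = [link.send(i) for i in range(6)]        # only the first 4 cells are accepted
delivered = [link.forward(), link.forward()]   # forwarding 2 cells frees 2 credits
resumed = link.send(4)                         # the sender can transmit again
```

Because the receiver's buffer can never hold more cells than the credits handed out, a congested router pushes back on its upstream neighbour instead of queueing without bound.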
I am really happy to see research on the intersection of traditional networking and anonymous communications. I have never heard of N23 before (shame on me!), and it seems that it is a good fit for the problem of congestion in anonymity networks (where reliability is not an issue when TCP is used).
Privacy Implications of Performance-Based Peer Selection by Onion Routers: A Real-World Case Study using I2P
Michael Herrmann and Christian Grothoff (Technische Universität München)
This is an attack paper on the I2P network, and in particular on its performance-based peer selection. It uses a denial-of-service attack to influence the selection of peers within the network, and to force a victim to choose corrupt servers.
This is a cute attack that combines denial-of-service, traffic analysis for confirmation you are on the same circuit, and interactions with an infrastructure to attack. This is a very good reminder that anonymity engineering is not simply systems’ work. Every design choice about performance can affect security in dramatic ways. The evaluation was also very sensitive to protecting users: the researchers tried their attack on the real network, but targeted their own circuits (I still want to see details to make sure no other users were affected).
Tor too implements circuit selection on the basis of performance — I am wondering to what extent similar ideas could be applied there …
27 July 2011
Quantifying Location Privacy: The Case of Sporadic Location Exposure
Reza Shokri and George Theodorakopoulos (EPFL), George Danezis (Microsoft Research), and Jean-Pierre Hubaux and Jean-Yves Le Boudec (EPFL)
This work evaluates the privacy of using location-based services sporadically, under a set of location privacy mechanisms. Sporadic services are those that require location infrequently, rather than continuously (think of restaurant suggestions rather than relaying real-time GPS streams). The key novelty of the approach is that the model of location exposure, as well as of privacy protection, is very general. It encompasses anonymization, generalization and obfuscation of location, use of fake traffic and suppression of location. In turn the analysis relies on advanced models of location and mobility (based on Markov chains) and is based on Bayesian inference. The evaluation of different location privacy techniques is done on real-world traces from SF taxis.
I am one of the authors of this work, so of course I think it is awesome! More seriously, it is one of the first works to combine under a common framework a multitude of location privacy mechanisms, and provide a common evaluation framework for them, to quantify the degree of protection they offer relative to each other for different adversaries. It is also one of the first systematic applications of Bayesian inference to analyze location privacy, extending the inference paradigm beyond the analysis of network anonymity systems.
Of course this is not the last word. Only a subset of protection techniques and combinations of techniques were looked at, and other protection mechanisms can be integrated and evaluated in the same framework (the tracing model and threat model can be unchanged). Secondly, the analysis itself may be augmented with side-information, be it commercial transactions or traces of network traffic, that may give some information about location, to increase the capabilities of the adversary (or make them more realistic). The model we use, based on Markov chains, has the benefit of giving analytically tractable results, but numerical techniques may be used to extend it to be more true to real-life attacks.
Privacy in Mobile Computing for Location-Sharing-Based Services
Igor Bilogrevic and Murtuza Jadliwala (EPFL), Kubra Kalkan (Sabanci University), Jean-Pierre Hubaux (EPFL), and Imad Aad (Nokia)
This paper looks at applications where users need to share their location. For example, two users may want to find out if they are close to each other or where they should meet in order to share a taxi ride. Yet, those users do not want to leak any of their location information to the other users or the service provider. More specifically, two users specify a set of ranked preferred locations where they could meet, and the system needs to determine one of those fairly without revealing the current location or other preferences (except the one chosen to meet). This is called the fair rendez-vous problem.
The key contribution of this work is to show that this problem can be solved with a set of concrete cryptographic protocols. It also presents an implementation of these algorithms on a real mobile phone to show that it is practical. The cryptographic computations are based on homomorphic encryption schemes as well as interactions with the service (to do multiplication that is not possible with Paillier). The implementation on a mobile phone takes a few seconds on the client and the server, and is parallelizable in the number of users. Unusually, the authors also did a user study: users were asked what their concerns were, and after using the application on the phone they were asked how usable it was, and whether they appreciated the privacy provided by the application.
This is a really nice example of a privacy application, that applies advanced crypto, but also evaluates it on a real platform for performance as well as users’ reaction to it. The obvious extensions to this work would be to generalize it to more complex rendez-vous protocols, as well as other location sharing applications. It is good to see that modern mobile devices can do plenty of crypto in a few seconds, so I am very hopeful we will see more work in this field.
On The Practicality of UHF RFID Fingerprinting: How Real is the RFID Tracking Problem?
Davide Zanetti, Pascal Sachs, and Srdjan Capkun (ETH Zurich)
This paper looks at UHF tags: the dumb tags, readable at about 2m, that are attached to things you buy to facilitate stock management or customer aftercare. Interestingly this study looks at how identifiable the tags are at the physical layer, not using the actual tag ID! Therefore these techniques may bypass any privacy protection that attempts to prevent access to the tag ID. It turns out that one can build a unique and reliable ID for a tag from its physical characteristics, which can be used to trace people as they move around.
What is new about this work is that the focus was on practicality and cost of extracting a reliable fingerprint (previous approaches relied on expensive equipment and laboratory conditions). The solution was implemented using a cheap software radio (USRP2 device + PC).
I am not quite sure what to conclude from the evaluation of the quality of the fingerprint. It seems that an adversary can place tags within one of 83 to 100 groups. Is this really a good result or not? I guess it depends on the application and the density of tags. Of course if more than one tag is carried, then the adversary could combine fingerprints to identify individuals more easily: if you carry 5 tags you have a 20-bit ID. Interestingly, there is extensive evaluation of the stability of the fingerprint under changes in temperature and mobility. It turns out that these factors do affect the quality of the fingerprint and further reduce the effective number of unique IDs that can be extracted (down to about 49 classes).
It would be interesting to combine this attack vector with the ideas from the first paper (pretending that the short physical IDs are a version of a privacy protection system) to evaluate the effectiveness of tracing a set of individuals throughout town.
27 July 2011
Andreas Pfitzmann sadly passed away last year, and a special panel session is taking place right now at PETS 2011 commemorating his work on anonymous communications and privacy. Andreas’ technical contributions span about 30 years, and as such he can be considered a founding father of the field of anonymous communications. His work in educating policy makers and advocating privacy in the public sphere had a profound impact on German technology policy.
The panel includes a short excerpt from an interview with Andreas, as well as recorded contributions by collaborators (Michael Waidner and Marit Hansen), former students (Anna Krasnova and Hannes Federrath) and people in the PET community (Paul Syverson and Caspar Bowden).
27 July 2011
I am currently sitting at the PETS 2011 symposium in Waterloo, CA. I will be blogging about selected papers (depending on the sessions I attend) over the next couple of days — authors and other participants are welcome to comment!
The first session is about data mining and privacy.
“How Unique and Traceable are Usernames?”
Daniele Perito, Claude Castelluccia, Mohamed Ali Kaafar, and Pere Manils (INRIA)
The first paper looks at the identifiability of on-line usernames. The authors looked at usernames from different sites and assessed the extent to which they can be linked together, as well as linked to a real person. Interestingly they used Google Profiles as ground truth, since they allow users to provide links to other accounts. First they assess the uniqueness of pseudonyms based on a probabilistic model: a k-th order Markov chain is used to compute the probability of each pseudonym, and its information content (i.e. -log_2 P(username)). The authors show that most of the usernames observed have “high entropy” and should therefore be linkable.
The second phase of the analysis looks at usernames from different services, and attempts to link them even given small modifications to the name. The key dataset used was 300K Google profiles, which (sometimes) list other accounts as well; they used 10K tuples of usernames. They then show that the Levenshtein distance (i.e. edit distance) of usernames from the same person is small compared to the distance of two random usernames. A classifier is built, based on a threshold of the probabilistic Levenshtein distance, to assess whether a pair of usernames belongs to the same or a different user. The results seem good: about 50% of usernames are linkable with no mistakes.
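The information content computation can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' model: the training corpus is made up, add-one smoothing is my choice, the `^` and `$` padding markers are hypothetical, and characters never seen in training simply receive the smoothed floor probability.

```python
import math
from collections import Counter

def train(usernames, k):
    """Count k-character contexts and the characters that follow them."""
    context_counts, next_counts = Counter(), Counter()
    for name in usernames:
        padded = "^" * k + name + "$"   # start padding and end marker
        for i in range(k, len(padded)):
            context = padded[i - k:i]
            context_counts[context] += 1
            next_counts[context, padded[i]] += 1
    alphabet = {c for name in usernames for c in name} | {"$"}
    return context_counts, next_counts, alphabet

def information_content(name, model, k):
    """Return -log2 P(name) under the k-th order chain, with add-one smoothing."""
    context_counts, next_counts, alphabet = model
    padded = "^" * k + name + "$"
    bits = 0.0
    for i in range(k, len(padded)):
        context = padded[i - k:i]
        p = (next_counts[context, padded[i]] + 1) / (context_counts[context] + len(alphabet))
        bits -= math.log2(p)
    return bits

# Made-up training set; a real study would use a large username corpus.
corpus = ["alice", "alice1", "alicia", "bob", "bobby", "malice"]
model = train(corpus, k=2)
common = information_content("alice", model, k=2)
rare = information_content("xq7zk", model, k=2)
```

A username built from familiar character sequences scores few bits, while an unusual one scores many and is therefore more likely to be unique to one person.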
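A threshold classifier on edit distance can be sketched in a few lines. Note that this uses the plain Levenshtein distance normalized by length, not the probabilistic variant from the paper, and the threshold of 0.3 is an arbitrary assumption rather than a value from the study.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        previous = current
    return previous[-1]

def same_user(a: str, b: str, threshold: float = 0.3) -> bool:
    """Guess that two usernames belong to one person if they are close enough."""
    distance = levenshtein(a.lower(), b.lower())
    return distance / max(len(a), len(b), 1) <= threshold

linked = same_user("john.smith", "johnsmith")
unlinked = same_user("john.smith", "xx_dragon")
```

Moving the threshold trades false links against missed links, which is the knob behind the "no mistakes" operating point quoted above.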
So what are the interesting avenues for future work here? First, the analysis used is a simple threshold on the edit distance metric. It would be surprising if more advanced classifiers could not be applied to the task. I would definitely try to use random forests for the same task. Second, the technique can be used for good not evil: as users try to migrate between services, they need an effective way to find their contacts — maybe the proposed techniques can help with that.
“Text Classification for Data Loss Prevention” (any public PDF?)
Michael Hart (Symantec Research Labs), Pratyusa Manadhata (HP Labs), and Rob Johnson (Stony Brook University)
The paper looks at the automatic classification of documents as sensitive or not. This is to assist “data loss prevention” systems, which raise an alarm when personal data is about to be leaked (i.e. when it is about to be emailed or stored on-line, mostly by mistake). Traditionally DLP systems try to describe what is confidential through a set of simple rules, which are not expressive enough to describe and find what is confidential. The authors therefore present a machine learning approach to automatically classify documents as sensitive or not, using training data. As with all ML techniques there is a fear of mistakes: the technique described tries to minimise errors when it comes to classifying company media (i.e. public documents) and internet documents, to prevent the system getting in the way of day to day business activities.
The results were rather interesting: the first SVM classifier looked at unigrams with binary weights to classify documents. Yet, it first had a very high rate of false positives for public documents. It seems the classifiers also had a tendency to classify documents as “secret”. A first solution was to supplement the training set with public documents (from Wikipedia), to better describe “what is public”. Second, the classifier was tweaked (in a way that is rather mysterious to me) by “pushing the decision boundary closer to the true negative”. A further step does 3-category classification as secret, public and non-enterprise, rather than just secret and not-secret.
Overall: they manage to get the false positive / false negative rate down to less than 2%-3% on the largest datasets evaluated. That is nice. The downside of the approach is common to most work that I have seen using SVMs: it requires a lot of manual tweaking, and further it does not really make much sense. It is possible to evaluate how well the technique performs on a test corpus, but difficult to tell why it works, or what makes it good or better than other approaches. As a result, even early positive results should be considered as preliminary until more extensive evaluation is done (more like medicine than engineering). I would personally like to see a probabilistic model based classifier on similar features that integrates the two-step classification process into one model, to really understand what is going on. But then I tend to have a Bayesian bias.
P3CA: Private Anomaly Detection Across ISP Networks
Shishir Nagaraja (IIIT Delhi) and Virajith Jalaparti, Matthew Caesar, and Nikita Borisov (University of Illinois at Urbana-Champaign)
The final paper in the session looks at privacy preserving intrusion detection to enable cooperation between internet service providers. ISPs would like to pool data from their networks to detect attacks: either because the volume of communications is abnormal at certain times, or because some frequency component is odd. Cooperation between multiple ISPs is important, but this cooperation should not leak volumes of traffic at each ISP since they are a commercial secret.
Technically, privacy of computations is achieved by using two semi-trusted entities, a coordinator and a key holder. All ISPs encrypt their traffic under an additive homomorphic scheme (Paillier) under the key holder’s key, and send it to the coordinator. The coordinator uses the key holder as an oracle to perform a PCA computation that outputs the top-n eigenvectors and eigenvalues of the traffic. The cryptographic techniques are quite standard, and involve doing additions, subtractions, multiplications, comparisons and normalization of matrices privately through a joint private two-party computation.
Surprisingly, the performance of the scheme is quite good! Using a small cluster, the system can process a few tens of time slots from hundreds of different ISPs in tens of minutes. A further incremental algorithm allows on-line computation of eigenvector/eigenvalue pairs in seconds, making real-time use of the privacy preserving algorithm possible (5 minutes of updates takes about 10 seconds to process).
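The additive homomorphism that the coordinator relies on can be shown with a toy Paillier implementation. This is only a sketch of the encryption scheme, not of the P3CA protocol: the primes are an assumption chosen to keep the numbers small and are completely insecure, and the traffic volumes are made up.

```python
import math
import secrets

# Toy key (assumption): two small primes; real keys use primes of 1024+ bits.
P, Q = 10007, 10009
N = P * Q
N2 = N * N
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p-1, q-1), the private key
MU = pow(LAMBDA, -1, N)                               # valid because g = n + 1 is used

def encrypt(m: int) -> int:
    """Encrypt m (0 <= m < N) as (1 + m*n) * r^n mod n^2 with random r."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return ((1 + m * N) * pow(r, N, N2)) % N2

def decrypt(c: int) -> int:
    """Key holder side: recover m from a ciphertext."""
    return ((pow(c, LAMBDA, N2) - 1) // N * MU) % N

def add(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % N2

def scale(c: int, k: int) -> int:
    """Raising a ciphertext to k multiplies the plaintext by the public constant k."""
    return pow(c, k, N2)

# Each ISP encrypts its (made-up) traffic volume; the coordinator combines them blindly.
volumes = [1200, 3400, 560]
ciphertexts = [encrypt(v) for v in volumes]
encrypted_total = ciphertexts[0]
for c in ciphertexts[1:]:
    encrypted_total = add(encrypted_total, c)
total = decrypt(encrypted_total)
tripled = decrypt(scale(encrypted_total, 3))
```

Addition and multiplication by a public constant come for free; multiplying two encrypted values does not, which is why the coordinator needs the key holder's help for the matrix products in PCA.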
This is a surprising result: my intuition would be that the matrix multiplication would make the approach impractically slow. I would be quite interested to compare the implementation and algorithm used here with a general MPC compiler (under the same honest-but-curious model).
6 July 2011
Shishir Nagaraja has pointed out that our Drac anonymity system is not the first one to consider an anonymity network overlaid on a social network. The performance versus security of routing messages over a social network was already considered in his work entitled ‘anonymity in the wild’.
This is important prior work and we should have cited it properly. It presents an analysis of the anonymity provided by different synthetic social network topologies, as well as real-world data from LiveJournal.
30 March 2011
My team at Microsoft Research has spent the past 6 months grappling with the problem of privacy in next generation energy systems. In parallel with the good honest scientific work I also participated in the UK government consultation on smart metering, in writing and in person, specifically on the issue of privacy. Its conclusions have finally been made public (see DECC’s site and Ofgem’s detailed responses).
First, what is the problem? Smart-meters are to be fitted in most homes, and they provide facilities for recording fine-grained readings of energy consumption. These are to be used for time of use billing, energy advice, the backend settlement process, financial projections of suppliers, fraud detection, customer service, and network management. The problem is that these readings are also personal data, and leak information about the occupancy of households, devices used, habits, etc. So here we have a classic privacy dilemma: where to strike the balance between the social value of sharing data (even mandating such sharing) versus the intrusion into home life?
Or do we? As is often the case when privacy is framed as a balance, what is ignored is that we can use technology to both achieve privacy and extract value from the data. In fact we show no balancing act is necessary. We designed a host of privacy technologies to fulfill the needs of the energy industry (even the rather exotic ones) while preserving extremely high levels of privacy and user control. Let’s look at them in detail:
- We developed a set of protocols to perform computation on private data while maintaining a high degree of integrity and availability. This allows customers to calculate their bills, provide indicators of consumed energy value to be used in settlement, route demand response requests, and do profiling to support network operation or even marketing. Our framework guarantees that the computations only leak their results to third parties, and also that those results are in fact derived from the real meter readings. The raw meter readings are not necessarily shared, but can be used locally on any user client to offer a rich experience, i.e. pretty graphs of consumption and comparison with their neighbours. A non technical overview is available as a white paper, a technical introduction for meter manufacturers is provided, and a preliminary technical report with all the crypto is also online.
- Sometimes it is important to aggregate information from multiple meters without revealing anything about individual readings. The traditional approach has been to give all readings to a trusted third-party that performs the aggregation and only publishes the sum. We show that a set of meters can in fact perform the aggregation without the need for a trusted party. This is simple, efficient and compact — the computations can be done inside the trusted meter or outside along with cryptographic verification. All details are available in our technical report on aggregation.
- Some smart-meters may be deployed in extremely high-security settings. In such places even the final bill or statistics aggregated over time may leak information, and a positive guarantee that the information leakage is limited might be necessary. We developed techniques inspired by differential privacy that inject noise into aggregate readings, guaranteeing that the consumption in any specific time period is masked. Furthermore, we allow customers to recoup the bulk of the costs through an oblivious cryptographic rebate system. Our technical report on differential privacy and rebates in metering is available.
- Finally, proving that protocols are correct is not sufficient, so we explore options for proving that actual implementations of the protocols do in fact provide the necessary security and privacy properties. A report on the certified implementation of variants of the proposed protocols using refinement types is also available.
The project web-page on privacy in metering links to all those and more.
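To give a flavour of how aggregation without a trusted party can work, here is a toy sketch in Python. It is not the protocol from the technical report: it uses plain pairwise additive masks, assumes all meters are honest and online, and leaves out the cryptographic verification entirely. The names `MODULUS`, `pairwise_masks`, `blind` and `aggregate` are made up for this illustration.

```python
import secrets

# Hypothetical modulus: assumed large enough that the true sum never wraps.
MODULUS = 2**32


def pairwise_masks(n_meters, modulus=MODULUS):
    """Return one mask per meter such that all masks sum to 0 mod modulus.

    Each pair of meters (i, j) shares one random value; meter i adds it and
    meter j subtracts it. In a real deployment this value would be agreed
    between the two meters, here it is simply drawn in one place.
    """
    masks = [0] * n_meters
    for i in range(n_meters):
        for j in range(i + 1, n_meters):
            shared = secrets.randbelow(modulus)
            masks[i] = (masks[i] + shared) % modulus
            masks[j] = (masks[j] - shared) % modulus
    return masks


def blind(reading, mask, modulus=MODULUS):
    """What a single meter publishes: its reading hidden by its mask."""
    return (reading + mask) % modulus


def aggregate(blinded_readings, modulus=MODULUS):
    """Anyone can add up the blinded values; the masks cancel out."""
    return sum(blinded_readings) % modulus


# Made-up readings for four meters.
readings = [312, 95, 1204, 47]
masks = pairwise_masks(len(readings))
blinded = [blind(r, m) for r, m in zip(readings, masks)]
total = aggregate(blinded)
print(total == sum(readings))
```

Each blinded value on its own looks random, yet the sum of all of them equals the sum of the real readings, so no party needs to see an individual reading to learn the aggregate.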
So much for the science; what about the engagement with government? On the positive side, our rather limited goal has been achieved: we wanted to put privacy technologies, which provide solutions beyond the dilemmas and balance between privacy and value, on the map. The government response to the consultation takes note, in a limited way, of the potential use of privacy technologies. On page 10 it shyly mentions that:
“2.18. Work is in process to understand the options for aggregating or anonymising smart metering data and whether it is necessary for the data to be accessed by a party that carries out the data minimisation. Privacy enhancing technology can potentially enable anonymised or aggregated data to be provided without any party having access to the personal data itself. The programme will work with industry and academics in order to explore the applicability of privacy enhancing technologies within the smart metering system.”
This is actually a rather fair representation of the capabilities of the technology, even if it is presented as a far-away goal, rather than the concrete protocols we have proved correct and the implementations we have built.
Paragraph 2.18 mentioning privacy technology is a ray of light amidst an otherwise ambivalent government response. On the up side it recognizes energy consumption as private data from the outset, it mandates meters to hold 13 months of consumption and provide local access to it, and it narrowly defines the scope of data that can be gathered without explicit consent and puts them under the data protection regime. On the down side there is confused language about what constitutes personal data (2.17), and the final technical solution involves collecting data in clear through a centralized system (the glorious DCC) and protecting it using access control, a far cry from what we know to be possible in terms of technical privacy protection.
The metering privacy geeks (legal & technical) might also find other interesting nuggets in this report:
- It mentions privacy-by-design, but without support for privacy technologies (except a mention of aggregation in 2.14). This is a damaging trend set by the Ontario report on privacy in the smart grid that takes a purely management approach to privacy in the local smart grid deployment. A response to this trend is provided by Prof. Claudia Diaz and her colleagues that highlights the technical protections necessary to engineer privacy-by-design. This is only the start of this tussle.
- The report seems to suggest that personal data is not personal if it is not readily identifiable by the data controller (sect. 2.17 and 3.7). This is the classic argument of “what is de-identified personal data”. Does it mean the data controller cannot identify it, or anyone in the world? It seems the government is as confused as everyone else on this matter.
- The key outcome of the consultation is that the energy industry needs some data to perform “regulated duties”. This concept was present in the initial consultation, but funnily enough there was no description of what those duties were. It transpired in meetings that Ofgem was not in fact clear about what they were, and a large part of the consultation centered around fleshing those out. A list of those duties is available in Appendix 3 of the report, and is probably welcomed by all (a similar list is available in the NIST privacy reports).
- So (in 3.15) the government concedes that industry must have access to the data necessary to perform its regulated duties by default, yet this data should be subject to the DPA requirements (3.16 for example specifically calls out principle 5, that the data should not be kept longer than necessary). Well, that is a minefield: it is clear that the data is collected for a specified purpose (principle 2). If the other principles are also applied it means that it should not be used without explicit consent for other purposes (*cough*added value services*cough*) and furthermore it should not be excessive for the stated purpose. Well, here we are: our technical reports offer ways in which most of the stated purposes in Appendix 3 could be fulfilled without collecting the data. Is this a contradiction? Not automatically. The government’s view is clearly that our proposed protocols are not yet ready for prime time. Of course, as these technologies become better known and deployed this objection will evaporate. Will the data minimization requirement then mandate the use of privacy technologies? This is a rhetorical question at the moment.
- It is interesting to note that the restrictions limiting the automatic collection of data by suppliers were possibly put in place on the grounds of market competition rather than privacy per se (section 3.32). Automatic collection by suppliers would put them in an advantageous position vis-a-vis third-party providers of value added services. This is an open issue (3.36).
- The government is keen for a local repository of consumption data in the meter (4.6) and the use of geeky toys to visualize it (4.12). This is the setting in which our solutions enable strong privacy guarantees. That is positive, if only half-way.
In conclusion, the debate around privacy in metering has been informed by consumer concerns, privacy concerns, industry needs and technology alternatives. They are all represented in the government response. Yet the final solution is rather conservative: it relies on a centralised conduit for personal information protected by access control layers and management layers. It is far from what we know to be possible with privacy technologies. The argument today is that those technologies are too new, which is questionable given how quickly IT innovations are brought to market. This argument will lose its potency in the long term if we keep developing and deploying privacy-friendly solutions.