Every ten years the UK holds a national census, in which every household is asked to fill in details about its demographics, habits, travel and income. The next one will be the 2011 census.

The Office for National Statistics (ONS) has a statutory duty to ensure that the data released from this census cannot be used to identify any individual or to infer any unknown attribute about them. Techniques for doing so are called statistical disclosure control, and have been the subject of intense study for at least the last 40 years. One would never have guessed this from reading the documents on confidentiality for the next UK census.

To make a long story short: the ONS never considered modern, well-defined notions of privacy, it lacks a reliable evaluation framework to establish the degree of risk of different methods (let alone their utility), and it has proposed disclosure control measures that fall rather short of the state of the art.

Moving households around (a bit)

The consultation is not totally over yet, but the current favorite after two rounds of evaluation seems to be a technique called “Record Swapping”. How does it work? The technique takes the database of all responses to the census and outputs another database that is sufficiently different to avoid identification and inference. Record swapping first categorises all records by household size, sex, broad age, and hard-to-count variables. Then it selects 2-20% of the records, and each of them is paired with a record from the same category. Then the geographical data of each pair of records (yes, right, only the geographical data) are swapped.

This procedure disperses the population geographically a little, so that it is not possible to know whether single cells in tables really provide information about an individual in a region or are the product of a swap from a different region. The advantage is that the totals are the same (since swapping records around does not change sums), the swaps are with “similar” households, and the procedure is simple to implement.
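To make the procedure concrete, here is a minimal sketch of record swapping as described above. It is not the ONS implementation (whose details are not public): the field names, the synthetic records, the `swap_rate` parameter and the way partners are chosen are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict

def stratum(record):
    # The matching key: records are only swapped with others sharing these values.
    return (record["household_size"], record["sex"],
            record["age_band"], record["hard_to_count"])

def record_swap(records, swap_rate=0.1, seed=0):
    """Return a copy of `records` in which some records have exchanged
    their geography with another record from the same stratum."""
    rng = random.Random(seed)
    out = [dict(r) for r in records]          # leave the input untouched
    strata = defaultdict(list)
    for i, r in enumerate(out):
        strata[stratum(r)].append(i)
    for indices in strata.values():
        if len(indices) < 2:
            continue                          # nobody to swap with
        n_selected = round(swap_rate * len(indices))
        for i in rng.sample(indices, n_selected):
            j = rng.choice([k for k in indices if k != i])
            # Only the geographical field moves; everything else stays put.
            out[i]["area"], out[j]["area"] = out[j]["area"], out[i]["area"]
    return out

# Synthetic, made-up records (hypothetical schema).
records = [
    {"id": n, "household_size": 1 + n % 3, "sex": "FM"[n % 2],
     "age_band": ("0-15", "16-64", "65+")[n % 3], "hard_to_count": n % 2,
     "area": f"area-{n % 5}", "occupation": f"job-{n}"}
    for n in range(200)
]
swapped = record_swap(records, swap_rate=0.2)

def area_by_stratum(rs):
    return Counter((r["area"], stratum(r)) for r in rs)

# Counts per area, broken down by the matching variables, are unchanged.
print(area_by_stratum(records) == area_by_stratum(swapped))
```

Note what the sketch leaves alone: every non-geographic field of every record is exactly as the respondent gave it, which is the root of the problems discussed below.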

Record swapping is in line with the census office’s definition of privacy, namely that:

“The Registrars General concluded that the Code of Practice statement can be met in relation to census outputs if no statistics are produced that allow the identification of an individual (or information about an individual) with a high degree of confidence. The Registrars General consider that, as long as there has been systematic perturbation of the data, the guarantee in the Code of Practice would be met.”

Problems with “Record Swapping”

So far a whole process has been followed to evaluate a list of proposed disclosure control measures, present a methodology to evaluate them, shortlist some, and perform more in-depth research about their utility and privacy. There is a lot of repetition in these documents, a few ad-hoc indicators of quality and privacy, and no security analysis whatsoever of inference attacks on the proposed schemes. The subject of “disclosure by differencing” is left as a suggestion for future work in the latest interim report, while the only methods left on the list are Record Swapping and ABS, which has apparently not been tested at all yet.

Why is that a problem? Records include many other potentially identifying fields aside from location. Since the rest of the record stands as it is, and is aggregated into tables with a secret small cell adjustment technique, we cannot really be sure at all that there are no re-identification attacks. (Apparently the details of the technique cannot be divulged for confidentiality reasons, violating even the most basic principle of security engineering! See page 3.)

The utility measures used to assess how acceptable these disclosure control measures will be to data users (Shlomo et al.) are themselves very simplistic and do not offer very tight bounds on possible errors, but I will leave this matter for the statisticians to blog about.

To make the problem worse, this time the ONS is seriously thinking of allowing data users to submit their own queries to the database of statistics. The queries are not likely to be full SQL any time soon, but tables on 3 categories (called cubes) are likely to be allowed. This leaves the door wide open to quite a range of attacks from the literature on inference in statistical databases.
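The simplest such attack is differencing: two queries that are each harmless on their own, but whose difference isolates one person. The sketch below is an illustration only; the records, the field names, the `MIN_CELL` suppression threshold and the `safe_count` interface are hypothetical and not taken from any ONS document.

```python
MIN_CELL = 5  # hypothetical rule: refuse to publish counts below this

def safe_count(records, predicate):
    """Answer a count query, suppressing small cells."""
    n = sum(1 for r in records if predicate(r))
    return n if n >= MIN_CELL else None

# Made-up population of one area: 40 clerks, a quarter of them ill,
# plus one retired farmer whom the attacker knows to be unique.
records = [
    {"area": "X", "age_band": "16-64", "occupation": "clerk", "ill": n % 4 == 0}
    for n in range(40)
]
records.append({"area": "X", "age_band": "65+", "occupation": "farmer", "ill": True})

def is_target(r):
    return r["age_band"] == "65+" and r["occupation"] == "farmer"

# Asking about the target directly is blocked by small cell suppression.
direct = safe_count(records, lambda r: is_target(r) and r["ill"])

# Two large, innocuous-looking counts get through.
with_target = safe_count(records, lambda r: r["ill"])
without_target = safe_count(records, lambda r: r["ill"] and not is_target(r))

# Their difference is the target's private attribute.
print(direct, with_target - without_target)
```

Swapping the farmer's area does not help here unless the swap happens to move him, and the attacker never needs to see a small cell.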

At this point there is absolutely no evidence that the disclosure control scheme is actually secure, which in security engineering means that it is probably not.

How did we get to this situation?

It seems the bulk of the work on disclosure control has been done by the ONS, in conjunction with researchers from the University of Southampton. None of the authors of any of the evaluations has substantial research experience in privacy technology, or in the theoretical computer security that deals with these privacy matters in a systematic way.

What is revealing is the fact that the most relevant related work is never mentioned. It includes:

  • The work of Denning on trackers and inference in statistical databases (1980). Instead the archaic term “differencing” is used.
  • The work of Sweeney and Samarati on linkage attacks and k-anonymity (1997).
  • The work of Dwork on Differential Privacy (2007), which is the most current and strongest definition of privacy for statistical databases.

These works show repeatedly that ad-hoc inference control measures, that only aim to suppress a handful of known and obvious attacks, are systematically bypassed.

In her work on Differential Privacy (which won the 2009 PET Award), Dwork provides clear arguments on why simpler ad-hoc techniques cannot provide the same guarantee of privacy: their results can be combined with side information known to the adversary to facilitate inference. Differential privacy, on the other hand, guarantees that the result of a query to the database, or a published table, reveals only a bounded amount about any individual, even when combined with other such queries or any side information.
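For a flavour of what this looks like in practice, here is a minimal sketch of the Laplace mechanism, the textbook way of answering a counting query with differential privacy. The value of `epsilon` and the example count are arbitrary choices for illustration, and nothing here reflects anything the ONS has proposed.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace distribution centred on 0 with that scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_count(true_count, epsilon, rng):
    """Release a count with epsilon-differential privacy.

    A counting query has sensitivity 1: adding or removing one person
    changes the true answer by at most 1, so the noise scale is 1 / epsilon.
    """
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(2011)   # fixed seed so the sketch is repeatable
true_count = 11             # e.g. people with some attribute in an area
epsilon = 0.5               # smaller epsilon: more noise, more privacy

releases = [private_count(true_count, epsilon, rng) for _ in range(20000)]
mean_release = sum(releases) / len(releases)
print(round(mean_release, 1))
```

Each individual release is noisy enough to hide one person's presence, while the noise has zero mean, so large aggregates remain useful. Answering k such queries costs k times epsilon in total, which is what makes the composition of queries something one can reason about rather than hope for.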

This is a hot topic in research today, and all the details may not be ready for a census in two years’ time. This does not justify the ONS’s ignorance of this field.

The annual reports from the Chief Surveillance Commissioner (2008-2009) and the Interception of Communications Commissioner (2008) just came out. They contain some interesting statistics, buried in the midst of boring self-congratulation on how wonderful the surveillance regime is in the UK.

First of all we get a bit of an idea of how, and how often, the RIPA Part III powers to compel decryption or request keys are used. It seems, from both reports, that any such request has to be approved by NTAC first, before anyone is served. Then a judge rubber-stamps the request, which is served on an individual. These individuals comply or go to jail, the theory goes. In the period 2008-2009:

  • NTAC approved 26 applications to serve a decryption notice (and declined 1).
  • A judge approved 17 notices (and zero were declined).
  • 15 notices were served.
  • 11 individuals failed to comply (the assumption is that 4 of them complied).
  • 7 individuals were charged as a result of their failure to comply.
  • 2 individuals were convicted.

What does all this add up to? About 10% or less conviction rate for failing to comply with a notice (2 / 22, assuming 4 complied). It would of course be of interest to find out if any of those who complied were charged with and convicted of any offences, or whether the requests are just keeping honest people honest.
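The exact percentage depends on which stage of the process is taken as the denominator. The short sketch below uses only the figures listed above, plus the denominator of 22 used in the text; the dictionary keys are labels chosen for this example.

```python
# Figures for 2008-2009 as listed above.
funnel = {
    "approved by NTAC": 26,
    "approved by a judge": 17,
    "served": 15,
    "failed to comply": 11,
    "charged": 7,
}
convicted = 2

# Conviction rate relative to each earlier stage.
rates = {stage: convicted / n for stage, n in funnel.items()}
for stage, rate in rates.items():
    print(f"convicted / {stage}: {rate:.1%}")

# The denominator used in the text.
text_rate = convicted / 22
print(f"convicted / 22: {text_rate:.1%}")
```

Whichever denominator is chosen, only a small minority of notices ended in a conviction for failure to comply.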

It is a real pity that more qualitative information is not provided about the specific cases that reached court, aside from the fact that the powers were used to investigate counter terrorism, child indecency and domestic extremism. Finding out how each case went would be quite worthwhile.

Appendix B of the Surveillance Commissioner’s report has a rough breakdown of the authorisations for property interference as well as surveillance, by type of offence investigated. The trends, and changes, between this period (2008-2009) and the previous period (2007-2008) are very interesting, and again totally unexplained in the text of the report. Some highlights:

  • Most of the authorisations for property interference are related to drugs offences (63% in 2008-2009, and 60% in 2007-2008). That seems pretty stable, and is the single biggest category by an order of magnitude.
  • We used to have a terrorism problem, with about 4.8% of property interference related to it in 2007-2008. It seems we have run out of terrorism to investigate in 2008-2009, and now it only accounts for 0.6% of all cases of property interference. That is nearly an order of magnitude reduction.
  • While terrorism is down, conspiracy investigations are up: 2.8% of authorisations related to them in 2008-2009, versus only 1.5% for the previous year. That may not be unrelated to the shift towards looking at “domestic terrorism”, with the usual silly “conspiracy to cause a nuisance” charges.
  • It is unclear where child indecency fits in any of these categories, despite requiring some property interference, presumably to raid people and seize their computers.

Similar trends are observed when it comes to intrusive surveillance authorised under RIPA Part II. Drugs are bigger than anything else, terrorism is no longer a pretext for surveillance (1 case!) and conspiracy is becoming popular, with a serious increase in surveillance. The investigation of burglaries and robberies using surveillance and property interference is also up. About 2681 property interference authorisations were issued, and 384 intrusive surveillance authorisations were served in 2008-2009. (There were also 16118 directed surveillance authorisations.)

The interception of communication figures look relatively similar. In 2008 about the same number of warrants were issued or active under RIPA (2599 RIPA warrants) for intercepting communications. The fact that the numbers are of the same order of magnitude may suggest that the different authorisations are used as a “bundle” for particular cases. It might also be just a coincidence.

There are no specific figures about access to traffic data (under traffic data retention regimes), but it is estimated that 80% of all requests concern subscriber information, e.g. who is behind this telephone number? This is in line with previous statistics.

What about CHIS, the euphemism for Covert Human Intelligence Source, more commonly known as a “snitch”? There were 3722 CHIS at the end of March 2009, and 4278 were recruited in the year. This means that on average each CHIS is used for a bit less than a year. The variance can of course be significant.
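The "bit less than a year" estimate follows from Little's law (average number in the system = arrival rate × average time in the system), under the assumption that the number of CHIS is roughly in steady state, which the report does not confirm. A quick sketch of the arithmetic:

```python
# Figures from the report, as quoted above.
active_chis = 3722          # in place at the end of March 2009
recruited_per_year = 4278   # recruited during the year

# Little's law, assuming steady state: L = lambda * W, so W = L / lambda.
average_years = active_chis / recruited_per_year
average_months = 12 * average_years

print(f"{average_years:.2f} years, about {average_months:.0f} months")
```

If the population of CHIS is growing or shrinking quickly, the steady-state assumption fails and the estimate is only a rough guide.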

Overall the picture offered is that the UK is a really quiet place. With about 60 million people and only about 3000-4000 cases requiring surveillance authorisations, let alone the laughable 26 applications to coerce decryption, there seems to be more rhetoric about serious crime than there is serious crime. Of course these statistics exclude warrants obtained by MI5 and SIS, who are subject to a different oversight body that is much less keen on publishing statistics. It is not unlikely that a lot of the terrorism and political crimes are investigated there.

The ACLU and the BBC have today posted the first memo, dated 1 August 2002, authorising the use of torture by the CIA against Abu Zubaydah, described as “one of the highest ranking members of Al Qaeda”. Interestingly one of the enablers for passing into an “increased pressure phase” (you have to love these euphemisms) comes down to traffic analysis, as this passage suggests:

[Image: snippet mentioning suspicious chatter]

According to the document “intelligence indicates that there is currently a level of ‘chatter’ equal to that which preceded the September 11 attacks”. It is not comforting at all to know that such automatic processing, as well as subjective interpretation, can be used to start torturing people, in the absence of any other concrete evidence.

Update: Steven Murdoch points to the Washington Post article clarifying that the role of Abu Zubaida was nowhere near as important as initially assumed. The article states that “Abu Zubaida was not even an official member of al-Qaeda”. Worth reading in its entirety.

There is a tendency amongst privacy advocates in the UK to focus on mistakes, or false positives, of ubiquitous surveillance, as well as small scale “disproportionate” uses of surveillance. These two are the key arguments used to fend off plans to increase the level of data collection. 

In the first case the argument is that perfectly honest people might be mistaken for crooks because of the imperfect view that any data collection system provides the authorities. Any automated decisions, the argument goes, will inevitably flag up innocent people, while missing the sought targets, since they will be using an array of evasion tactics to foil it. In its essence, this first criticism is true, but it can easily be countered by a good oversight mechanism, including human judgement in the loop, as well as by pointing out that the bad guys will never have perfect discipline in implementing counter surveillance measures, and if they do it will be at a great cost. Needless to say the false positive / false negative argument has not been very successful, even though it is a good one.

The second argument is based on proportionality: once surveillance powers are in place for one purpose, such as the prevention of serious crime or terrorism, they will inevitably be used for other unforeseen and disproportionate aims. The key recent example is how local UK authorities are using directed surveillance powers to prevent littering and dog fouling. Similar fears have been expressed about traffic data retention, which could be used as part of civil cases, or simply seized for any crime whatsoever using established evidence collection laws. Again, this argument is valid but a good oversight mechanism can take care of those cases, at least in theory.

The reason these arguments are the first to be used, as well as ineffective, is that they start from the premise that institutionally those performing the surveillance are “the good guys”, and their aim is to catch “the bad guys” to protect the public. Sure, in the process mistakes happen, but they are in good faith and are rectified since all the good people are on the same side after all. “Bad apples” misusing their surveillance powers will be weeded out, since institutionally the context in which they use these powers is benevolent, and devoid of malice. One can easily see why privacy advocates in the UK have found it easy to use this assumption, since they mostly lobby politicians and have a close relationship with law enforcement as well as industry, who while admitting isolated mistakes will never admit a systematic privacy problem, let alone systematic malicious use of surveillance powers.

The tide is turning on this argument. In recent months we have witnessed direct interference with the elected political process by the police, namely the raid on the Parliament office of MP Damian Green. As The Register reports “Green’s homes and offices were searched on 27 November following his arrest, on suspicion of leaking embarrassing information from the Home Office.” The information was simply politically embarrassing, not sensitive or national security related. It seems this incident has challenged, in the mainstream, the assumption that those in charge of surveillance will simply act in the public interest, and other cases of mass political surveillance have since come to light:

These are no longer isolated abuses, but systematic operations running for many years, and supported at the highest level of management of both organizations. In its editorial the Guardian put its finger on the key argument against surveillance powers by finally saying out loud: “today’s revelations underline the perils surveillance represent for democracy [...]”. These worries are now being echoed at the highest echelons of the political system, as The Register reports regarding the policing complaints at the recent Climate Camp:

“The problem with incidents of this kind, according to Norman Baker MP, who addressed the meeting on the Climate Camp protest yesterday is that they look suspiciously like police-made law and go hand in hand with the politicisation of the police. He said: “The IPCC exist to investigate allegations of individual misconduct by Police Officers. They are not there to investigate systemic abuses of power, which is what seem to be going on in cases such as the Climate Camp.”

“I am a strong supporter of the Police. But there looks increasingly to be a need for additional oversight into the ways in which they interpret the law.”

Lords recommend PETs

6 February 2009

The House of Lords Constitution Committee has just published a report on Surveillance: Citizens and the State as well as the evidence they heard. As part of their recommendations they push for Privacy Enhancing Technologies to be part of the procurement process of government projects. In particular they say:

485. We recommend that the Government review their procurement processes so as to incorporate design solutions that include privacy-enhancing technologies in new or planned data gathering and processing systems. (paragraph 349)

They also push, albeit in an indirect way, for privacy enhanced identification schemes and ID cards, citing the example of Austria. This is basically a recommendation to implement selective disclosure credential technologies:

478. We recommend that the Government’s development of identification systems should give priority to citizen-oriented considerations. (paragraph 268)

Which refers to:

268. The Information Commissioner’s Office (ICO) drew attention to the use in Austria of a system of identification numbers that allows access to information in different databases “without the need for a single widely known personal identification number that may be misused.” (p 5) The Royal Academy of Engineering (RAE) explained that it is possible for individuals to fulfil their legitimate need or desire to maintain multiple roles or identities in transactions with state or other organisations and to avoid the possibility of those organisations needlessly correlating them. The technology involved in identification can be developed to suit an individual’s preference to keep domestic status and work life separate, where the protection of identity is necessary to avoid abusive relationships or stalking, or where witnesses and children need protection. We recommend that the Government’s development of identification systems should give priority to citizen-oriented considerations.

This is all good news! It is indeed at the procurement phase that such requirements for PETs should be specified and entrenched in the delivery contracts. Negotiating PETs for complex surveillance technologies will also make the cost of recording data just-in-case visible.

A friend of mine recently dropped his phone in water, and found that he lost all his SMS messages for the last month. I advised him to use his subject access rights under the Data Protection Act 1998 and ask his phone company “Three” for the records of calls, SMS messages as well as locations of the phone (just for good measure). The results were quite unexpected.

Here is the answer he got back (edited to protect identities) with some added emphasis:

Dear Mr X,

Thank you for your below email.

Please be advised that we do not disclose details of incoming calls or texts unless required under a Court Order.

Please also be advised that location data does not constitute ‘Personal Data’ as defined under the Data Protection Act 1998 (personal data is information which relates to a living individual who can be identified from that data).

I can confirm that we have no solely automated decision making processes in place. Our credit checking system is not solely automated and requires manual intervention.

If you require details of your outgoing calls or texts (we do not retain the content of text messages) I would be grateful if you would forward proof of your identity and a cheque for £10 made payable to Hutchison 3G UK Ltd. A photocopy of your passport or photo drivers licence would be acceptable proofs of ID. Please send this to:

Data Protection and Privacy Officer
H3G UK Ltd
Star House
20 Grenfell Road
Maidenhead
Berkshire
SL2 2NE

Kind regards

Yours sincerely

Rhian T.
Compliance Executive
Legal
Hutchison 3G UK Ltd

This answer is very surprising. Three does not state that they do not hold the data relating to incoming calls or text messages, but simply that they are not happy to provide them — with no further explanation as to why. Similarly the fact that there is a human in the loop of their credit decision processing (maybe just pressing “OK” at some stage) seems to shield them from the burden of disclosing anything about their processing of the data.

Yet what is most interesting is the statement that location data is not personally identifiable. First, in the case of a phone operator this is simply not true. They hold all the records necessary to link a particular record describing the location of a handset to a physical person. More interestingly still, recent work by myself and collaborators at COSIC, Leuven, focused on showing that even coarse grained anonymized location data can be quickly and efficiently linked back to a physical person. The reference, link and abstract are below for those interested in reading more.

  • Yoni De Mulder, George Danezis, Lejla Batina and Bart Preneel. Identification via Location-Profiling in GSM Networks. Workshop on Privacy in the Electronic Society (WPES 2008), Alexandria, Virginia, USA.

    Abstract: As devices in a cellular network move, they register their new location with cell base stations to allow for the correct forwarding of data. We show it is possible to identify a mobile user from these records of movement within the network and a pre-existing location profile, based on previous movement. Two different identification processes are studied, and their performances are evaluated on real cell location traces. The best of those allows for the identification of around 80% of users. We also study the misidentified users and characterise them using hierarchical clustering techniques. Our findings highlight the difficulty of anonymizing location data, and firmly establish they are personally identifiable.
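To give an intuition for why such matching works, here is a toy sketch that links an unlabelled trace of cell identifiers to the most similar known profile. It is deliberately much simpler than, and not the same as, the identification processes studied in the paper: the cosine similarity over cell visit frequencies, the user names and the cell identifiers are all assumptions made for this illustration.

```python
from collections import Counter
from math import sqrt

def profile(trace):
    """Fraction of observations spent in each cell."""
    counts = Counter(trace)
    total = sum(counts.values())
    return {cell: c / total for cell, c in counts.items()}

def cosine(p, q):
    dot = sum(v * q.get(cell, 0.0) for cell, v in p.items())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

def identify(anonymous_trace, known_profiles):
    """Return the known user whose profile best matches the trace."""
    target = profile(anonymous_trace)
    return max(known_profiles, key=lambda user: cosine(target, known_profiles[user]))

# Profiles built from earlier, labelled movement (made-up data).
known_profiles = {
    "alice": profile(["c1", "c1", "c2", "c3", "c1", "c2"]),
    "bob": profile(["c7", "c8", "c7", "c9", "c7"]),
    "carol": profile(["c2", "c3", "c3", "c4", "c3"]),
}

# A later trace with the identity stripped off.
anonymous_trace = ["c1", "c2", "c1", "c1", "c3"]
best_match = identify(anonymous_trace, known_profiles)
print(best_match)
```

People spend most of their time in a handful of places, so even a crude frequency profile like this one tends to be distinctive.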

[Update: URL of paper is now working.]

The German Big Brother award winners’ list is out, and the Technology award was given to a Pay-as-you-drive insurance company. The nomination and award statement reads:

The Big Brother Award 2007 in the “Technology” category goes to

PTV Planung Transport Verkehr AG,
represented by Dr Hans Hubschneider

for their system for individual rating of car insurances with the so-called “pay-as-you-drive” technology, i.e. a device that records routes and driving behaviour in a car and transmits these data to the insurance company. [...]

It is good to see that for once Privacy Technology proposals are not lagging behind: an interdisciplinary team from K.U.Leuven has already put forward a proposal for a privacy friendly pay-as-you-drive scheme.

Carmela Troncoso, George Danezis, Eleni Kosta, Bart Preneel: Pripayd: privacy friendly pay-as-you-drive insurance. WPES 2007: 99-107

The London Times is reporting that UK government officials are considering a centralized database to hold all traffic data gathered under data retention legislation. While the plan has been criticised as a privacy disaster (which it is), the centralized approach comes as no surprise and is in line with the economics of data retention.

Already in December 2001, when the traffic retention plans were still young, I commented on trends that are likely to follow a mandatory traffic data retention regime. There are three of them:

  1. On-line traffic databases: The idea that traffic data will be stored on tapes archived in a locked room is simply a fantasy. In order to minimize the costs associated with complying with Law Enforcement (LE) requests, service providers have incentives to keep records on live, spinning storage. Similarly an easy to use (and abuse) interface is likely to be provided for technical personnel to process the queries and provide results. This is already the case in telephony: the Ericsson interception GUI manual is publicly available.
  2. Ahead-of-time indexing: Trawling through a huge volume of call data takes time. In order to speed up the process (thus minimizing cost), as well as support legitimate business needs, service providers have incentives to index the call records for efficient retrieval. This means creating tables pointing to records by network identifier of the caller / callee or time of call. This is the initial phase of an analysis process, and effectively creates an efficient decentralized surveillance infrastructure.
  3. Out-sourcing and centralisation: Communication and service providers are not in the business of running complex data retention regimes cheaply and answering requests. As for other business activities that are not at the core of their business model (like cleaning or catering) they are likely to outsource the task to a specialized company. Since the main activity of such a company is processing information, there are enormous economies of scale, most likely leading to (at best) an oligopoly of a few providers. Those providers can at any point be in a “special relationship” with LE, or other government agencies, effectively providing a full feed in real time (as the current proposal suggests).
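The second trend is easy to picture. A minimal sketch of ahead-of-time indexing follows; the `CallRecord` layout, the made-up numbers and the helper names are hypothetical, and a real provider would use a database rather than in-memory dictionaries.

```python
from collections import defaultdict
from typing import NamedTuple

class CallRecord(NamedTuple):
    caller: str
    callee: str
    start: int       # start time, seconds since the epoch
    duration: int    # seconds

def build_index(records):
    """Map each network identifier to the positions of the records it appears in."""
    by_party = defaultdict(list)
    for pos, r in enumerate(records):
        by_party[r.caller].append(pos)
        if r.callee != r.caller:
            by_party[r.callee].append(pos)
    return by_party

def calls_involving(records, index, number):
    # One dictionary lookup instead of a scan over every retained record.
    return [records[pos] for pos in index.get(number, [])]

# Made-up call records.
records = [
    CallRecord("0700-0001", "0700-0002", 1_200_000_000, 60),
    CallRecord("0700-0003", "0700-0001", 1_200_000_500, 15),
    CallRecord("0700-0002", "0700-0003", 1_200_001_000, 300),
]
index = build_index(records)
hits = calls_involving(records, index, "0700-0001")
print(len(hits))
```

Once such an index exists, answering "who did this number talk to?" costs next to nothing, which is precisely why the economics favour building it.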

It is interesting to note that LE as well as agencies have incentives to push as much as possible for those trends to materialize, since they increase the efficiency of the request process, as well as pushing towards centralization, meaning that it is easier to get to the data in bulk [1]. This can be done by imposing on operators ‘quality standards’, maybe even to guarantee the security and privacy of data, that will in effect push up the cost of in-house retention management.

[1] I realize that some people are not so cynical about the agencies being interested in the data in bulk, even if it is acquired illegally. Those people should definitely read the account of the blanket telegraph interception.

It is not often that a technical policy matter, such as Traffic Data Retention, mobilizes the masses and is the subject of popular demonstrations. Yet during the 24th Chaos Communication Congress a sizable crowd took to the streets to protest against the new German Data Retention legislation, due to come into effect on Jan 1st, 2008. A related debate concerns the “Federal Trojan”, a piece of malware controlled by the German federal police, used to gather intelligence from an infected computer as part of an investigation.

The illustration (from indymedia) of this beast is priceless:

[Image: the federal trojan]

Claudia Diaz just forwarded an email by Eric Rescorla pointing to an article in Wired describing how the FBI has been gaining access to telephone traffic data without a warrant. A saucy excerpt:

The revelation is the second this year showing that FBI employees bypassed court order requirements for phone records. In July, the FBI and the Justice Department Inspector General revealed the existence of a joint investigation into an FBI counter-terrorism office, after an audit found that the Communications Analysis Unit sent more than 700 fake emergency letters to phone companies seeking call records. An Inspector General spokeswoman declined to provide the status of that investigation, citing agency policy.

[...]

The message was sent to an employee in the FBI’s Operational Technology Division by a technical surveillance specialist at the FBI’s Minneapolis field office — both names were redacted from the documents. The e-mail describes widespread attempts to bypass court order requirements for cellphone data in the Minneapolis office.

Remarkably, when the technical agent began refusing to cooperate, other agents began calling telephone carriers directly, posing as the technical agent to get customer cellphone records.

The interesting point here is how agents seem to have been abusing the lawful access process, by pretending to be a colleague with legal authority, in order to get phone companies either to hand over records of calls and phone locations, or to turn on surveillance equipment. A similar scandal broke out in Chicago back in 2006, when it became known that insiders in phone companies had been selling phone records to the FBI as well as private entities: the police were then concerned that such information might be used by the mob to out informants.