I am at the Privacy Enhancing Technologies Symposium in Minnesota. As it is customary the conversations with other privacy researchers and developers are extremely engaging. One topic I discussed over beer with Mike Perry and Natalie Eskinazi is whether secure group messaging protocols could be simplified by relaxing some of their security and systems assumptions.

In the past I questioned whether group key agreements need to be contributory of symmetric. In this post I argue that plausible deniability properties can be relaxed, leaderlessness assumptions may be too strong, full asynchrony unnecessary, and high-scalability unlikely to make a difference. In brief, protocol designers should stop looking for the perfect protocol, with all features, and instead design a protocol that is good enough. This is controversial, so I will discuss those one by one.

Plausible deniability within a group?

Off-the-record messaging protocols, from OTR to the protocols underlying signal / whatsapp, provide cryptographic plausible deniability. In a nutshell, this means that when Alice talks to Bob, after a certain point in time Bob will not be able to prove cryptographically to any other party that Alice sent a particular message. This is technically achieved by either publishing the signature keys used after a while, or using symmetric message authentication codes to ensure Bob could have forged any authenticatior.

Before moving to the group setting it is worth thinking what plausible deniability buys you in the 2-party case: It protects Alice from an adversary, that needs cryptographic proof that she said a particular thing to Bob. However, the adversary — at the time of the conversation cannot be Bob. Bob is convinced that it is Alice that sent a message — and thus there is no need for other evidence. Similarly, if Bob is fully trusted by the adversary — for example to keep a tamper proof log of all messages sent and received and all secrets — then the defense is ineffective.

Plausible deniability protects Alice in a very narrow setting: When Alice sent a message to Bob, who at the time was honestly following the protocol, and that in the future decides to work with an adversary (that does not fully trust Bob) to convince the adversary that Alice said something. Even in the 2-party case, this is quite eclectic.

However, this property becomes even more elusive in the multi-party case. Imagine that Alice sends a message to a group of 7 participants, though a channel providing plausible deniability. At the time the message is received by 6 parties, that are all convinced that it came from Alice. For plausible deniability to be meaningful, all those parties at that time should not be adversaries. Then, down the line one or more (incl. Alice) transcripts are leaked, and the those transcripts do not contain any cryptographic authenticators for Alice.

However, for adversaries not performing legal attacks (when a court needs to be convinced of a fact) the lack of authenticators has never been a barrier to believe in a transcript of a conversation. Similarly, courts in the UK, do not rely on cryptographic authenticators to ensure authenticity: a record of a conversation, backed by an expert witness that states that it is unlikely to have been modified (through forensics) is sufficient. Furthermore, in the context of a group chat there will be multiple transcripts, all matching — unless users engaged in a very wide conspiracy to make the discussion look like something else. No software facilitates this, and without this facility, plausible deniability would provide a meaningless protection.

So my conclusion is as follow: a weak form of plausible deniability may be provided by ensuring honest parties delete authenticators (incl. signatures) once they have been checked. This ensures that honest captured devices do not contain cryptographic evidence. However, as we discussed above, dishonest parties can anyway bypass plausible deniability — and thus providing it through crypto tricks increases complexity without providing much protection.

So I advocate for the simple options of using signatures and deleting them upon reception. If your chat partners are bad, and do not delete it at this point, you are in trouble anyway.

Groups will always be small.

People want to have private conversations within groups. However, how large can these groups be until at least one of the participants or their device is compromized? The size of the group is often used in secure group chat protocols as a parameters N, and the cost of the  protocol in terms of number of messages or their size can be O(N) or O(1) or O(logN). However, this is meaningless since N will never become too big — for systems or operational security reasons.

Thus secure group chat protocols that work well for small number of users are likely to be good enough — no need to optimize in terms of N. Optimizations that reduce the constant factors are likely to be as important.

Submitting to a leader & asynchrony

Significant complexities arise in secure group chat protocols from the implicit assumption that all members of the group are equal, they talk through point-to-point messages, and that no infrastructure exists that can be trusted to anything relating to the chat. However, this is not how popular chat programs work: irc, xmpp, signal, whatsapp all rely on servers to mediate communications, and provide some services!

The most important service that can be provided by a central party is ordering: a server or a special participant can assume a “leader” role to order messages, in a total order that others will accept. They may be prevented from doing this arbitrarily: clients can include enough information to ensure that the total order imposed is compatible with causal ordering of messages (as Ximin Luo points out). However, since there is no canonical total ordering, why not rely on the ordering imposed by one of the members of the group (the leader) or a services mediating the communications.

Of course, a key challenge when relying on a leader is what happens when the leader dies or goes off line. When there is a graceful exit the leader can handover to the next leader. When there is a crash-failure things become more complex: group member need to detect the failure and replace the leader. This is complex in systems assuming high degrees of asynchrony, where there is no cut-off time out after which one can be sure another node it dead. In that case paxos or pBFT protocols need to be used to do leader election and consensus. However, if one ditches this assumption, and instead assumes that there exists a perfect failure detector — then nodes can detect failures and replace the leader.

Now, in practice users do not expect a chat session to survive the demise of the service they use (be it skype, whatsapp, or facebook). So it seems that providing such a property for secure group chat is unnecessarily complex.


I would now advocate group secure messaging systems on the following lines:

  • Ditch full plausible deniability; use signatures, and rely on honest parties to delete them upon checking.
  • Design systems for optimal operation with group sizes of 3-50. There is no point in having much larger groups, or killing designs on the basis they do not perform when N goes to infinity.
  • Rely on an infrastructure server or a leader within the group to offer consistent total ordering to all. Assume that others can detect its liveness through a timeout (synchronous), and either elect or rotate another one in. Or simply, stop the chat session once they become unavailable.

You know you live in 2017 when the top headline on national newspapers relates to a ransomware attack on the National Heath Service, the UK Prime minister comments on the matter, and the the security researchers dealing with the outbreak are presented as heroic figures. As ever, The Register, has the most detailed and sophisticated technical article on the matter. But also strangely the most informative in terms of public policy. As if somehow, in our days, technical sophistication is a prerequisite also for sophisticated political comment on those matters. Other news outlets present a caricature, of the bad malware authors, the good security researcher and vendors working around the clock, the valiant government defenders, and a united humanity trying to beat the virus. I want to break that narrative open in this article, and discuss the actual political and social lessons we should be learning. In part to avoid similar disasters in the future.

First off, I am always surprised when such massive systemic outbreaks of malware, are blamed squarely on the author(s) of the malware itself, and the blame game ends there. It is without doubt that the malware author has a great share of responsibility. I personally think it is immoral to deploy ransomware in the wild, deny people access to their data, and seek to benefit from this. It is also a crime in the UK and elsewhere.

However, it is strange that a single author, or a small group of authors, without any major resources can have such a deep and widespread effect on major technological infrastructures. The absurdity becomes clear if we transpose the situation into the world of traditional engineering. Imagine all skyscrapers in major cities had to be evacuated, because a couple of teenagers with rocks were trying to blackmail business owners to pay up, to protect their precious glass windows. The fragility of software and IT systems seems to have no parallel in any other large scale engineering infrastructure — and this is not inherent, but the result of very specific micro-political, geo-political and economic decisions.

Lets take the WannaCrypt outbreak and look at the political and other social decisions that lead to the disaster — besides the agency of the malware authors:

  • The disaster was possible in part, and foremost, because IT systems within the UK critical NHS infrastructure are outdated — and for example rely on Windows XP that is not any more being maintained by Microsoft. Well, actually this is not strictly true: Microsoft does make security updates for Windows XP, but does not provide them for free — and instead Microsoft expects organizations that are locked in the OS to pay up to get patches and stay safe. So two key questions need to be asked …
  • Why is the NHS not upgrading to a new versions of Windows, or any other modern operating system? The answer is simple: line of business applications (LOB: from heath record management, specialist analysis and imaging software, to payroll) may not be compatible with new operating systems. On top of that a number of modern medical devices, such as large X-ray scanners or heart monitors, come with embedded computers running Windows XP — and only Windows XP. There is no way of upgrading them. The MEDJACK cyber-attacks were leveraging this to rampage through hospitals in 2015.
  • Is having LOB software tying you to an outdated OS, or medical devices costing millions that are not upgradeable, a fact of nature? No. It is down to a combination of terrible and naive procurement processes in health organizations, that do not take into account the need and costs if IT and security maintenance — and do not entrench it into the requirements and contracts for services, software and devices. It is also the result of the health software and devices industries being immature and unsophisticated as to the needs to secure IT. They reap the benefits of IT to make money, but without expending much of it to provide quality and security. The tragic state of security of medical devices has built the illustrious career of my friend Prof. Kevin Fu, who has found systemic attacks against implanted heart devices that could kill you, noob security bugs in medical device software, and has written extensively on the poor strategy to tackle these problem. So today’s attacks were a disaster waiting to happen — and expect more unless we learn the right lessons.
  • So given the terrible state of IT that prevents upgrading the OS, why is the NHS not paying up Microsoft to get security patches? That is because the government, and Jeremy Hunt in particular, back in 2014 decided to not pay up the money necessary to keep receiving security updates for Windows XP, despite being aware of the absolute reliance of the NHS on the outdated software. So in effect, a deliberate political decision was taken, at the highest level of the government to leave the NHS open to cyber attack. This is unlikely to be the last Windows XP security bug, so more are presumably to come.
  • Then there is the question of how malware authors, managed to get access to security bugs for windows XP? How did they get the tools necessary to attack such a mature, and rather common system, about 15 years after Windows XP was released, and only after it went out of maintenance? It turn out that the vulnerabilities they used, were in fact hoarded by the NSA as a cyber weapon — which was lost or stolen by hackers or leakers, and released into the wild! (The tool was codenamed EternalBlue). For may years, the computer security research community has been warning that stockpiling vulnerabilities in very common software for cyber-offense purposes, is dangerous. When those cyber weapons are lost, leaked, or even just used, there is proliferation of the technology necessary to attack, which criminals and foreign states can turn against critical infrastructure. This blog commented on the matter as recently as March 8, 2017 in a post entitled “What the CIA hack and leak teaches us about the bankruptcy of current “Cyber” doctrines”. This now feels like an unfortunately fulfilled prophesy, but the NHS attack was just the expected outcome of the US/UK and now common place doctrine around cyber — that contributes to and leverages insecurity rather than security. Alternative public policy options exist of course.

So to summarize, besides the author of the malware, a number of other social and systemic factors contribute to making such cyber attacks possible: from poor security standards in heath informatics industries; poor procurement processes in heath organizations; lack of liability on any of the software vendors (incl. Microsoft) for providing insecure software or devices; cost-cutting from the government on NHS cyber security with no constructive alternatives to mitigate risks; and finally the UK/US cyber-offense doctrine that inevitably leads to proliferation of cyber-weapons and their use on civilian critical infrastructures.

It it those systemic factors that need to change to avoid future failures. Bad people wishing to make money from ransomware, or other badness, will always exist. There is a discipline devoted to preventing this, and it is called security engineering. It is time industry and goverment start taking its advice seriously.