It could be argued that a voluntary program for COVID-19 contact tracing would struggle to achieve its goal.
For instance, past voluntary contact tracing programs for sexually transmitted infections showed inconsistent rates of partner notification, particularly when the relationship between the informer and the informed was weak.
By comparison, mandatory programs — those using contractual controls or notification-by-clinician — were more effective.
Mandating participation would counter the tendency of newly diagnosed patients to leave potentially exposed contacts unnotified, maximizing the potential effectiveness of a contact tracing program.
Image Source: The Race to Trace, Cybersecure Policy Exchange
Privacy concerns with contact tracing apps
The obvious downside is that mandatory contact tracing pushes against users’ tolerance for privacy risk, a concern that is becoming increasingly important to the voting public.
Addressing these concerns is a necessary step toward satisfying the political realities of rolling out a contact tracing program. Data de-identification could be useful here: it transforms data so that others are unlikely to learn anything new about a person from it.
There are several methods for doing this, all of which minimize the amount of personal information flowing through the system.
The value in manipulating user data
Consider the unique identifiers that represent a smartphone device. Typically, a device has only one user, so any personal information associated with its identifier, say in a centralized database, could allow a successful cyberattacker to learn something new about that person.
This represents a threat to data privacy.
A de-identification control would break the connection between the identifier and the user’s personal information. An example would be an identifier generator, similar to an RSA SecurID token, that produces a random, regularly changing identifier for the device. This makes it far more challenging for an attacker to link an observed identifier to a person without also knowing the generator’s algorithm.
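As a rough sketch of such a generator, the code below derives a rotating identifier by applying an HMAC to the current time window under a device-held secret. The rotation interval, secret, and truncation length are illustrative assumptions, not taken from RSA SecurID or any deployed contact tracing protocol.

```python
import hashlib
import hmac
import time

# Illustrative rotation window; a real protocol would choose its own interval.
ROTATION_SECONDS = 15 * 60

def rolling_identifier(device_secret: bytes, now=None) -> str:
    """Derive a short-lived, random-looking identifier from a device-held secret.

    An observer who collects broadcast identifiers cannot link them back to the
    device (or its user) without knowing both the secret and this derivation.
    """
    interval = int((now if now is not None else time.time()) // ROTATION_SECONDS)
    digest = hmac.new(device_secret, interval.to_bytes(8, "big"), hashlib.sha256).digest()
    return digest[:16].hex()  # truncated for broadcast; still infeasible to invert

# The broadcast value changes every interval but reveals nothing stable about the user.
secret = b"per-device secret provisioned at install time"
print(rolling_identifier(secret))
```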
This approach offers protection in the event an attacker has an identifier, but what about a situation where the attacker only has the personal information?
Consider the following scenario, assuming a mandatory contact tracing program.
Take the example of a person who lives in absolute isolation but decides to buy a frozen dessert from an ice-cream truck one day. The driver of that truck later tests positive for COVID-19, triggering an alert to the ice-cream buyer. In this instance, the ice-cream buyer would instantly know who was positively diagnosed.
This would represent a privacy breach. (If this example sounds rare, consider a prisoner with a smuggled phone, or a person in a rural or remote setting.)
Using a differential privacy approach
In this scenario, the algorithm needs to stop the ice-cream buyer from connecting the alert with their single point of contact – the ice-cream seller. Introducing false alerts might achieve this.
This method, similar in spirit to a differential privacy control, reduces the certainty anyone can have that a given alert is associated with a specific person.
The ice-cream buyer would have to allow for the possibility of a false alert and so could not be certain about the ice-cream seller’s health. The seller’s privacy would be preserved.
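A minimal sketch of this idea, assuming the alerting service simply injects spurious alerts at a fixed rate (the rate and function names are illustrative, not taken from any deployed system):

```python
import random

# Illustrative rate of spurious alerts per unexposed user per notification cycle.
FALSE_ALERT_RATE = 0.05

def should_alert(had_real_exposure: bool, rng: random.Random) -> bool:
    """Decide whether a user receives an exposure alert.

    Genuine exposures always trigger an alert, and a small fraction of users
    with no exposure receive one anyway. A recipient therefore cannot be sure
    that any single alert reflects a real contact, which protects the person
    who tested positive.
    """
    return had_real_exposure or rng.random() < FALSE_ALERT_RATE

# Example: out of 10,000 unexposed users, roughly 500 still receive an alert,
# giving the ice-cream seller in the scenario above plausible deniability.
rng = random.Random(42)
spurious = sum(should_alert(False, rng) for _ in range(10_000))
print(spurious)
```

The false-alert rate is a tunable trade-off: a higher rate gives the diagnosed person stronger deniability, but sends more uninfected people for unnecessary tests.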
Taken further, centralized databases protected by differential privacy would be ideal candidates for analysis, given their reduced privacy risk. Large datasets would still exhibit trends in the data (albeit with greater variance), while the personal information within would be protected precisely because it is unreliable.
In this way, attackers seeking to re-identify a particular individual would be frustrated.
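To make that concrete, here is a minimal sketch of a Laplace-noised count release, a standard differential privacy mechanism for counting queries; the epsilon value and example figures are illustrative assumptions.

```python
import numpy as np

EPSILON = 1.0  # illustrative privacy budget; smaller values add more noise

def noisy_count(true_count: int, epsilon: float = EPSILON) -> float:
    """Release a count perturbed with Laplace noise calibrated to sensitivity 1.

    Adding or removing one person changes a count by at most 1, so noise with
    scale 1/epsilon masks any individual's contribution, while counts in the
    hundreds or thousands stay close to their true values.
    """
    return true_count + np.random.laplace(0.0, 1.0 / epsilon)

# Example: weekly exposure counts. The upward trend survives the noise;
# no individual record can be trusted, which is exactly the point.
true_weekly = [320, 410, 560, 730]
print([round(noisy_count(c), 1) for c in true_weekly])
```

The privacy budget is the lever here: a smaller epsilon makes each released count noisier, trading analytic precision for stronger privacy.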
Moreover, many privacy regulations permit the sharing of data that has been de-identified, allowing organizations to extract value from the dataset under fewer consent-based restrictions.
Some big caveats accompany differential privacy controls:
- Firstly, some attackers seek to re-identify a large number of individuals rather than a single individual (think telemarketers). Differential privacy does little to protect against this, because such attackers can tolerate missing or incorrect data and still achieve their goals.
- Secondly, this control suits larger datasets: the “noise” it introduces can obscure trends with smaller statistical effect sizes (see the sketch after this list).
- Lastly, these protective measures are difficult to explain to a large audience, so the chances of getting widespread buy-in are limited.
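To illustrate the second caveat, the sketch below applies the same noise to a small and a large dataset; the counts and epsilon are made up for illustration.

```python
import numpy as np

def noisy_difference(count_a: int, count_b: int, epsilon: float = 1.0) -> float:
    """Difference between two Laplace-noised counts (sensitivity 1 each)."""
    noise = np.random.laplace(0.0, 1.0 / epsilon, size=2)
    return (count_a + noise[0]) - (count_b + noise[1])

# Small dataset: a true gap of 2 is often distorted, and sometimes reversed, by the noise.
print(noisy_difference(11, 9))
# Large dataset: the same relative gap (200) survives essentially intact.
print(noisy_difference(1100, 900))
```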
Should we be concerned with false alerts?
There are concerns that incorporating false alerts would contradict guiding principles prescribed for contact tracing by developers. For example, “Avoid false positives by optimizing for test result certainty over self-reporting.”
It’s likely that this guidance is outdated, published when the demand for testing outstripped the supply. Today, that situation is reversed, and health authorities are promoting repeated testing as a tool for controlling the pandemic. Under that approach, surplus testing capacity could absorb the extra tests prompted by false alerts, providing additional value in the form of stronger patient privacy.