Privacy Risks
As emerging technologies increase connections between people, and as tech companies, governments, and private actors gain access to and control over increasing amounts of data, privacy has become an enormous concern. For example, online communication and the increased connectivity of the Internet of Things have raised a host of privacy and data security issues. Think of home assistants that record everything that goes on in our personal spaces, from our conversations with friends and family to our daily activities, and it soon becomes clear why people are concerned with maintaining their privacy.
What is Privacy?
Privacy means being free from observation of, or intrusion into, our personal lives by others. It is a multifaceted concept. There is privacy of behavior, privacy of thoughts, privacy of the body, locational privacy (not having our location revealed), decisional privacy (not having our decisions and actions interfered with), and informational privacy (the ability to control who has access to personal information, and to what extent).
One way to think about privacy is as limiting what others know, and perhaps even control, about us. For individuals, privacy is instrumental to our ability to be ourselves. It allows each of us to develop our individual identity as an autonomous person. Being in control of our personal information is part of being in control of our own lives and agency. The "others" might be other people, law enforcement, governments, and corporations. At the individual level, privacy is intertwined with our ability to control who has information about us, and what they do with it.
Privacy and Emerging Technologies
We have reasons to care about privacy, not only as private individuals but also as creators of technologies that might impact privacy. At the organizational level, companies often deal with data privacy. This means collecting, storing, and using data and information about people responsibly. This not only means handling data in compliance with laws and regulations, but also in accordance with people’s interests and expectations. In this sense, data privacy is connected to data ethics. This means ‘doing the right thing’ with people’s data, and handling it responsibly—for example, by carefully considering how using people’s data might affect their rights and interests, and society more broadly.
Privacy and Power
Privacy is also closely connected to power. You might think of George Orwell’s 1984, in which Big Brother’s constant surveillance controls people’s behavior. If people know that they are being watched and observed, this will affect their behavior. Having one’s privacy protected is therefore essential to the ability to be oneself, without having one’s behavior influenced by knowing that one is being watched. This can also go the other way—think of how the privacy of internet anonymity has emboldened online trolls.
States or large corporations sometimes violate the privacy of individuals. In 2013, the former NSA contractor Edward Snowden revealed that the U.S. government was using extensive surveillance capabilities to spy on its citizens. This opened people's eyes to the fact that our private communications can be intercepted and used for purposes we might not intend or even be aware of.
More recently, the Chinese government has initiated a national program known as the social credit system. This system aims to collect data about individuals on a massive scale, and to use this data to give people a score as trustworthy or untrustworthy. On this basis, certain infractions have put Chinese citizens on a blacklist, which in turn barred these citizens from purchasing train tickets, renting hotel rooms, or taking out credit. The system relies on private-public partnerships, including commercial technologies such as Alibaba’s Sesame credit. Such collusions of state and private power can create situations in which private communications are constantly subject to potential privacy violations. Another tension to bear in mind in this context is that between fair competition and corporate hegemony, for instance, through data assets.
Identifying Privacy Risks
Sources of Privacy Risks
Privacy is at risk if user data is collected, shared, or processed in ways to which users did not meaningfully consent. To unpack the different types of privacy risks, it is useful to differentiate between risks arising with first- versus third-party data and risks arising with primary versus secondary uses of data.
First-party data is collected directly from users or any other audience with whom you work.
Third-party data is collected from audiences with whom you do not have a direct relationship.
Primary use of data is using data according to the stated purpose for which it was collected.
Secondary use of data is using data beyond the original stated purpose.
Risks Arising with Data Collection and Use
First-party data is data an organization collects directly. The main risks you should be aware of are:
Collecting data without the users’ knowledge. A common example of this is web browser cookies.
Processing data in ways to which users did not meaningfully consent. For example, a payday loan provider may use information like the type of device a customer is using or their email provider to set interest rates.
Third-party data is data collected from audiences with whom you do not have a direct relationship. It may often be received from other organizations who do have direct relationships with the data subjects. Risks of collecting and using third-party data include:
Adherence to privacy standards is unverified. When you do not collect the data yourself, it can be difficult to verify whether the data was collected according to applicable privacy standards.
Data quality is unverified. When you do not collect the data yourself, it can be difficult to determine whether the data is accurate, complete, or representative.
Secondary use of data is use of data beyond the original intent with which it was collected. The most common sources of risk from secondary use are:
Monetizing data. For instance, introducing advertisements on a previously ad-free website that crowd-sources content, or selling user health data that users revealed about themselves to connect to other users with similar health conditions.
Cross-correlating datasets. For example, linking a dataset with personal contact information of your customers to a dataset with purchase information. Cross-correlation can reveal facts about users that they do not want to divulge, such as inferring whether they are pregnant, perhaps before users know themselves.
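To make the cross-correlation risk concrete, here is a minimal sketch in Python with pandas of how joining two innocuous-looking datasets can reveal a sensitive inference. All table and column names, and the data itself, are hypothetical.

```python
# Hypothetical illustration: linking contact data to purchase data can expose
# sensitive inferences about named individuals. All names and values are made up.
import pandas as pd

# First-party dataset: customer contact information.
contacts = pd.DataFrame({
    "customer_id": [101, 102, 103],
    "name": ["A. Jensen", "B. Ortiz", "C. Liu"],
    "postcode": ["1017", "2042", "3051"],
})

# Second dataset: purchase history keyed by the same customer id.
purchases = pd.DataFrame({
    "customer_id": [101, 101, 102, 103],
    "product_category": ["prenatal vitamins", "unscented lotion",
                         "garden tools", "coffee"],
})

# Cross-correlating the two datasets links named individuals to purchase
# patterns from which sensitive facts (such as a possible pregnancy) can be inferred.
linked = contacts.merge(purchases, on="customer_id")
print(linked)
```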
Privacy Regulations
Regulation and standards can help organizations minimize the risks involved in collecting and using personal data and protect users from having their personal data exposed.
Data privacy regulation applies to almost every product using an emerging technology. In addition, many jurisdictions have developed specific regulation applicable to narrower domains.
The map below shows important data privacy regulations for emerging technologies. Note that most of these regulations apply not only to organizations headquartered in the respective countries, but to all companies doing business in the respective jurisdiction. Organizations making their product available to customers globally are therefore well advised to comply with the most stringent rules and regulations.

Some general principles underlying these regulations include:
Transparency: State what data you collect and the reason you are collecting it.
Purpose limitation: Collect data only for specified, explicit purposes, and do not process it in ways incompatible with those purposes.
Data minimization: Choose the approach to achieving your stated purpose that requires the least collection and processing of personal data.
Accuracy: Take all reasonable steps to erase or rectify data that is inaccurate.
Storage limitation: Delete personal data when it is no longer required to achieve your stated purposes (see the sketch after this list).
Confidentiality: Process data in a way that ensures security of personal data.
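As a concrete illustration of the storage-limitation principle, here is a minimal sketch of a retention job that deletes personal records once they are older than a defined retention period. The table name, column name, and 30-day period are hypothetical; actual retention periods depend on your stated purposes and applicable regulation.

```python
# Minimal sketch of a storage-limitation job (hypothetical schema and retention period).
import sqlite3
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)  # assumed retention period, for illustration only

def purge_expired_records(db_path: str) -> int:
    """Delete personal records collected before the retention cutoff."""
    cutoff = (datetime.now(timezone.utc) - RETENTION).isoformat()
    with sqlite3.connect(db_path) as conn:
        cur = conn.execute(
            "DELETE FROM user_events WHERE collected_at < ?",  # hypothetical table/column
            (cutoff,),
        )
        return cur.rowcount  # number of expired records removed
```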
ISO Privacy Standards
The most important standard to be aware of is the ISO/IEC 27701 standard. It provides guidance for implementing a privacy information management system. It builds on the requirements in ISO/IEC 27001, the information security management system standard, and the code of practice for information security controls in ISO/IEC 27002.
Implementing a management system compliant with these standards will enable you to meet the privacy and information security requirements set forth in GDPR and other data protection regulations. It is therefore a good place to start to embed privacy practices in your organization.
Privacy Risk Identification Techniques
Three basic techniques for identifying privacy risks are mapping the presence of private user data in the organization, tracking personal user data from collection to use, and modeling customer personas to identify non-obvious privacy risks.
Map the presence of personally identifiable information (PII). Identify all points in your workflow where PII is collected or processed, and conduct a risk assessment for each piece of information (a minimal inventory sketch follows after this list).
Track customer data. Track each piece of personal user information. Watch out particularly for situations where you receive data from third parties, share data with third parties, and cross-correlate data.
Model customer personas. Sometimes, it is not obvious what information users consider most important and want kept private. Identify user groups and engage them in interviews or focus groups on privacy. This can help you understand users' goals, concerns, and level of technical expertise. Use the stakeholder-mapping methodology to map the privacy expectations and technical expertise of users.
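The first two techniques can be supported by a simple, regularly maintained PII inventory. Below is a minimal sketch of such an inventory in Python; all fields, data flows, and risk ratings are hypothetical examples.

```python
# Minimal sketch of a PII inventory used for mapping and tracking personal data.
# All entries are hypothetical examples.
from dataclasses import dataclass, field

@dataclass
class PiiRecord:
    name: str                     # the piece of personal information
    collected_at: str             # where in the workflow it enters the system
    purpose: str                  # primary, stated purpose for collection
    shared_with: list = field(default_factory=list)  # third parties receiving it
    risk: str = "unassessed"      # outcome of the risk assessment

pii_inventory = [
    PiiRecord("email_address", "signup form", "account recovery",
              shared_with=["newsletter provider"], risk="medium"),
    PiiRecord("precise_location", "mobile app telemetry", "none documented",
              risk="high"),
]

# Flag entries with no documented purpose or with third-party sharing for review.
for record in pii_inventory:
    if record.purpose == "none documented" or record.shared_with:
        print(f"Review: {record.name} ({record.risk} risk)")
```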
Privacy Tradeoffs
Handling privacy risks is more than a matter of complying with regulations. Sometimes, protecting privacy requires difficult tradeoffs with other values, particularly:
Security
Public health
Convenience
Efficiency

Tradeoffs with Security
Measures that strengthen data protection to increase privacy can at times conflict with different forms of security. Recall the case where Apple refused to build a backdoor to the iPhone for the FBI because, in the wrong hands, such a backdoor had the potential to undermine the security of hundreds of millions of people who used Apple products. The FBI was working to protect people from a terrorist threat. Apple was working to protect people's security in a different sense. But protecting user privacy, in this case, had the flip side of appearing to enable terrorism. Similar concerns are often raised about technologies such as blockchain, which can help users gain more privacy but are also used by cybercriminals for precisely that reason.
Tradeoffs with Public Health
To contain a public health crisis, it may be necessary to track and trace people, which may infringe on their privacy.
Public health crises confronted governments and companies with the challenge of determining when, if ever, it is appropriate to lift data privacy protections. With scientists rushing to develop medicines, rapid access to data was of the essence. In addition, some effective responses for containing a public health crisis rely on extensive government surveillance tools to track and isolate infected persons and those with whom they have been in contact. Quick access to large amounts of data can be vital for machine-learning forecasters predicting the trajectory of a crisis. Intrusions on people's privacy can save lives.
But privacy advocates are often concerned about the precedent this might set. In early 2020, the World Economic Forum released a statement urging companies to maintain proper AI oversight. WEF's head of AI and machine learning, Kay Firth-Butterfield, warned that "we need to keep in mind that the big ethical challenges around privacy, accountability, bias, and transparency of artificial intelligence remain."
Tradeoffs with Convenience and Efficiency
Protecting privacy often comes with measures for data minimization, which means that service providers collect only the data necessary for the basic functioning of a service. However, data minimization can affect the potential effectiveness of a service and the convenience with which it is used.
Tailoring services to individuals based on their personal characteristics might make services like targeted advertising significantly more accurate and efficient. But this also uses information about people’s tastes and preferences in a way that compromises their privacy and autonomy interests.
Sacrificing some of our privacy might save us time and give us access to things we wouldn't otherwise be able to reach. The majority of people who use the Internet have little understanding of who has access to their data. Even those of us who have some idea typically treat our personal information like a currency: we're willing to give up some of it for convenience. In fact, in November 2019, Pew Research reported that "roughly six-in-ten U.S. adults say they do not think it is possible to go through daily life without having data collected about them by companies or the government."
Tradeoffs between privacy and convenience also occur when we use wearables to track our health and fitness. Wearables like Fitbit collect vast amounts of health-related data from their users. For machine learning algorithms that are used for data-analysis, it is often useful to have access to a great range of datasets to produce the best results for the user. Yet, greater access to these datasets can come with privacy risks.
Another context in which tradeoffs with privacy occur is home convenience. For example, Ring and Nest doorbell systems capture video of both your front door and your neighborhood. Ring has been actively collaborating with local law enforcement, which has raised many eyebrows about privacy protection at the local level.
Mitigating Privacy Risks
Framework for Selecting Mitigating Strategies
There is a dazzling array of tools and techniques for protecting user privacy, from approaches to obtaining meaningful consent from users to sophisticated encryption tools. Often, there will be multiple ways of addressing a privacy risk. Yet there is a hierarchy among options for improving privacy. Here is a three-step framework to cut through the noise.
Minimize data collection and sharing. This is the foundation of managing privacy risk. The minimization requirement is aimed at avoiding gratuitous privacy risks.
Protect data. For all user data that needs to be collected or shared, organizations should make use of state-of-the-art techniques for protecting data. This due-diligence requirement is aimed at minimizing privacy risks that cannot be avoided in the first place.
Opt-in and obtain informed consent. Remaining privacy risks should be communicated transparently to users to enable them to make an informed choice whether to take a privacy risk. This requirement aims at giving users agency in taking privacy risks.
1. Minimization of Collecting and Sharing Private Data
Not collecting private information in the first place is the most obvious way of reducing privacy risk. For every piece of personal user information, ask yourself if you really need this information to build your product or provide your service. Data sharing between organizations should also be minimized, because sharing makes it difficult to track how data is used. Just because a user consented to sharing their data with one entity does not mean that this consent extends to third parties.
We should not underestimate the complexities in deciding whether to collect or share personal user data. One issue is that what is necessary is not clear-cut. Rather, applying the criteria requires making ethical judgment calls.
Consider Facebook. Before Facebook came to the market, online chat rooms and networks allowed people to pick a user name, which did not have to be their real name and usually was not. Facebook deliberately required users to create profiles under their real names. That was a strategic decision to create a new type of social network. Rather than helping strangers encounter each other under pseudonyms, Facebook wanted to make it easy for people who were friends in the real world to find each other. Was it necessary for Facebook to adopt this policy? The issue is not black and white. The ability to find friends by their real names was part of Facebook's value proposition from the start. Yet other social-media platforms function without this requirement.
Moreover, product teams know that features of products often change over time. It is often only through user requests after a product has been launched that organizations discover the functionalities that users value most. Even very mature products using emerging technologies are under constant development. The expectation to pivot makes it tempting to collect more data than strictly necessary for current purposes to keep options open.
Finally, there is sometimes a tradeoff between minimizing the collection of personal data and fair treatment. For instance, a bank might take the necessity requirement too far by making credit decisions based on the socio-economic profile of people living in a customer's postcode, rather than collecting detailed information about that customer's individual financial situation.
2. Protection of User Data
For all user data that needs to be collected or shared, organizations should make use of state-of-the-art techniques for protecting data. The specific techniques change quickly. Three types of techniques to protect user data are anonymization, encryption and zero-knowledge protocols. Each of these is discussed in more depth below, along with their associated advantages and disadvantages.
Anonymization
Anonymization permanently removes all data that might identify a subject. It is a form of de-identification: a process to prevent someone's personal identity from being revealed. For instance, you might delete the name, email address, and other columns that hold identifiable information from an employee database. A close cousin of anonymization is pseudonymization, which disguises rather than removes data that might identify a subject. For instance, in an employee database you might replace names with numbers and keep a record of the mapping of names to numbers elsewhere. Another closely related concept is the use of synthetic data: data generated by a computer simulation that approximates personally identifiable data but is fully algorithmically generated.
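The following is a minimal sketch, using pandas and a hypothetical employee table, of the difference between anonymization and pseudonymization.

```python
# Minimal sketch of anonymization vs. pseudonymization (hypothetical employee table).
import pandas as pd

employees = pd.DataFrame({
    "name": ["A. Jensen", "B. Ortiz"],
    "email": ["aj@example.com", "bo@example.com"],
    "business_unit": ["Finance", "R&D"],
    "salary": [58000, 61000],
})

# Anonymization: permanently drop the columns that identify individuals.
anonymized = employees.drop(columns=["name", "email"])

# Pseudonymization: replace identifiers with opaque keys and keep the mapping
# in a separate, access-controlled location.
mapping = {name: f"emp-{i:04d}" for i, name in enumerate(employees["name"])}
pseudonymized = (employees.assign(name=employees["name"].map(mapping))
                          .drop(columns=["email"]))
# Anyone who obtains `mapping` can re-identify the pseudonymized rows.
```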
The advantage of anonymization is that it is less likely that data can be traced back to an individual if exposed, reducing the risk of violating a user’s privacy. Note, however, that combining information from apparently innocuous columns such as business unit, gender, and age may be sufficient to single out individuals. Moreover, anonymized data may reveal identifiable information if it is linked to another database. For instance, the Ministry of Road Transport in India sold information about vehicle owners to private companies. Through registration numbers available in the dataset, the database can be linked to another database containing driving license records and insurance information.
Encryption
Encryption protects information from access by unauthorized people. Data can be encrypted both at rest and in transit.
Consider the encryption of data at rest. Regular (symmetric) encryption uses an algorithm and a key to encode a message; decrypting the resulting ciphertext, translating the data back into its original form, requires the same algorithm and key.
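As a minimal sketch of encrypting data at rest, the example below uses the third-party Python `cryptography` package and its Fernet symmetric scheme; assume the key is stored in a secrets manager rather than alongside the data.

```python
# Minimal sketch of symmetric encryption of data at rest, using the
# third-party `cryptography` package (Fernet).
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # in practice, store and retrieve this from a secrets manager
fernet = Fernet(key)

ciphertext = fernet.encrypt(b"date_of_birth=1990-04-02")
# Decryption requires the same key; without it, the ciphertext is unreadable.
plaintext = fernet.decrypt(ciphertext)
assert plaintext == b"date_of_birth=1990-04-02"
```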
For encryption in transit, VPNs can be used to hide IP addresses and other identifiable information in all network traffic, and to encrypt data; proxies can be used to do the same with web traffic.
Homomorphic encryption is a technique to analyze and manipulate data while it is still encrypted. Only the results are decrypted. It enables you to work on data without sharing it in its unencrypted form.
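To give a flavor of computing on encrypted data, the toy below exploits the fact that textbook RSA is multiplicatively homomorphic: ciphertexts can be multiplied, and only the result is decrypted. This is not a secure or practical homomorphic encryption scheme; production systems use dedicated schemes and libraries.

```python
# Toy illustration of the homomorphic idea using textbook RSA (insecure; for
# illustration only). Standard small textbook parameters: p=61, q=53.
n, e, d = 3233, 17, 2753

def enc(m): return pow(m, e, n)
def dec(c): return pow(c, d, n)

c1, c2 = enc(4), enc(5)
# The party holding only ciphertexts multiplies them without ever seeing 4 or 5.
c_product = (c1 * c2) % n
assert dec(c_product) == 20   # only the final result is decrypted
```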
Encryption of data in transit and at rest is now a standard requirement in privacy regulation and standards. The latest encryption techniques should be used wherever available to protect data from being exposed to unauthorized people. These techniques can reduce the risk of data being exposed inadvertently. However, homomorphic encryption aside, data that cannot be decrypted in any way is useless. Each point at which data is processed in an unencrypted form is an attack vector.
Zero-Knowledge Protocols
Zero-knowledge protocols allow you to prove that you have a piece of information without revealing the information itself. These protocols can be used to communicate facts about personal data without revealing the actual data. For instance, rather than sharing bank account statements, zero-knowledge protocols enable users to prove they have a certain level of income without revealing any further details about their finances. A related technique is differential privacy, which allows the public sharing of information about a dataset by describing patterns within the dataset while withholding information about the individuals in it.
Zero-knowledge protocols are useful to transfer relevant information to third parties while minimizing the exposure of private user data. They are a useful addition in the privacy toolbox. However, their application is limited to cases where third parties require less sensitive information about a user than the organization would need to share as proof.
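For a sense of how such a protocol works, here is a toy sketch of the Schnorr identification protocol, a classic zero-knowledge scheme in which the prover demonstrates knowledge of a secret exponent without revealing it. The group parameters are tiny and insecure, chosen only for illustration; real deployments use large standardized groups and non-interactive variants.

```python
# Toy Schnorr identification protocol: prove knowledge of x with y = g^x mod p
# without revealing x. Tiny, insecure parameters, for illustration only.
import secrets

p, q, g = 467, 233, 4              # p = 2q + 1; g generates the order-q subgroup

x = secrets.randbelow(q - 1) + 1   # prover's secret
y = pow(g, x, p)                   # public value known to the verifier

# Round 1 (prover): commit to a fresh random nonce.
r = secrets.randbelow(q)
t = pow(g, r, p)

# Round 2 (verifier): issue a random challenge.
c = secrets.randbelow(q)

# Round 3 (prover): respond using the secret, which is never transmitted.
s = (r + c * x) % q

# Verification: g^s == t * y^c (mod p) holds exactly when the prover knows x.
assert pow(g, s, p) == (t * pow(y, c, p)) % p
```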
3. Opt-In and Consent
Informed consent is consent given based upon a clear appreciation and understanding of the facts, implications, and consequences of an action. Only once the previous strategies are exhausted should organizations rely on eliciting informed consent from users about what data is collected about them and how this data is used. But users typically lack a sophisticated understanding about what data an organization may collect about them, and how that data is used. In fact, research shows users lack the knowledge to make informed decisions about privacy options.
This raises a big problem for privacy strategies relying on informed consent: There is insufficient comprehension and willingness in the consent process for users to give informed consent for the collection and management of their personal information. Therefore, obtaining informed consent cannot replace the previous strategies, but should be used only to address remaining privacy risks once the previous strategies are exhausted.
As a result, organizations should go beyond merely ensuring that users have accepted a data-sharing agreement. To make consent more meaningful, the policy you propose to users should:
Live up to high ethical standards like minimizing data collection and data protection.
Ask for consent in a way that gives users agency about what happens with their data.
Connect to the framework for an ethics risk assessment, in that organizations should be wary of taking risks where they are the beneficiaries and decision makers, but not exposed to the risk, whereas users bear the risk but have no decision-making powers.
Establishing meaningful consent and giving users options aims at transforming the risk role of users to make them co-decision-makers about what happens with their data.