computing on encrypted data
enables cooperation without data sharing
we all need data. the “new oil” fuels industry 4.0 and machine learning. scientists and politicians, businesses and media need data to make well-informed decisions. free access to information is becoming a human right. connecting datasets turns data into meaningful and valuable information.
but data also renders us transparent and vulnerable. the free exchange of data is opposed by the right to privacy. data sovereignty and data protection are paramount. data leaks are a public relations nightmare. large data lakes are a prime target for hackers. often, we simply cannot pool data centrally or share it with others.
fortunately, mathematics has an answer to this problem. with modern cryptography, data may remain completely on-premises while still enabling collaboration. participants join decentralised peer-to-peer networks without trusted third parties or central data pools. they never have to reveal the data in their custody. instead, they retain full control over how, when and by whom their data is used. privacy and security are guaranteed by military-grade encryption.
use cases
machine learning – train neural networks on multiple confidential data sets
clinical research and mobile health – analyze populations without sharing patient data
public sector – combine the information of different departments without the need for access to each other’s databases
pandemic control – COVID-19 contact tracing is a very successful example of privacy-preserving computation across millions of smart devices
industry benchmarking and consulting – compare KPIs and compute best practices without leaking business secrets between competitors
autonomous driving and smart homes – service networks without exposing user data
supply chain resilience and deep analytics – learn about interdependencies in trustless networks
secure computing comes in several variants
secure multiparty computation
secure multiparty computation (SMPC) is the gold standard of privacy-preserving computing. several parties form a completely decentralised peer-to-peer network. they exchange encrypted messages (“shares”) which reveal nothing about their private data. only the result of the computation becomes known to the parties.
there are mathematical proofs of correctness and security. some protocols can even guarantee security if all but one party are “corrupt” and try to circumvent the protocol.
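to make the idea of “shares” concrete, here is a minimal sketch of additive secret sharing, one common building block of SMPC protocols. it is a toy Python example, not a full protocol: the prime modulus, the three-party setup and the salary figures are illustrative assumptions.

```python
import secrets

PRIME = 2**61 - 1   # public modulus; all arithmetic is done modulo this prime

def share(value, n_parties):
    """Split a private value into n additive shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    """Recombine shares; any subset smaller than all of them looks uniformly random."""
    return sum(shares) % PRIME

# three parties each hold a private salary and want the total without revealing their own
salaries = [52_000, 61_000, 47_000]
all_shares = [share(s, 3) for s in salaries]

# party i receives the i-th share of every input and adds them locally
partial_sums = [sum(column) % PRIME for column in zip(*all_shares)]

# only the combination of all partial results reveals anything -- namely the agreed output
assert reconstruct(partial_sums) == sum(salaries)
print(reconstruct(partial_sums))   # 160000
```

each party only ever sees numbers that look uniformly random; combining all partial results reveals nothing beyond the agreed output.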
homomorphic encryption
homomorphic encryption (HE) works with a central party, e.g. a cloud provider. however, unlike in regular cloud computing, the central party does NOT need to be trusted by the data owners.
the individual data owners encrypt their data on-premises before uploading it to the cloud. the cloud infrastructure then computes “blindly” on the encrypted data. finally, the result is decrypted by the data owners.
the advantage is highly efficient and fast computing on cloud infrastructure. however, not every computation can be performed efficiently on encrypted data, so the supported use cases or implementations may be limited.
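as an illustration of this encrypt-compute-decrypt flow, the sketch below implements the Paillier cryptosystem, which is additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of the plaintexts. the key sizes are deliberately tiny and insecure, chosen only to show the principle.

```python
from math import gcd
import secrets

# toy Paillier keys with tiny primes -- for illustration only, NOT secure
p, q = 293, 433                # real deployments use primes of 1024 bits and more
n = p * q                      # public modulus
n_sq = n * n
g = n + 1                      # standard choice of generator
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)    # lambda = lcm(p - 1, q - 1)
mu = pow((pow(g, lam, n_sq) - 1) // n, -1, n)   # precomputed decryption factor

def encrypt(m):
    """Encrypt an integer m < n under the public key (n, g)."""
    r = secrets.randbelow(n - 1) + 1
    while gcd(r, n) != 1:
        r = secrets.randbelow(n - 1) + 1
    return (pow(g, m, n_sq) * pow(r, n, n_sq)) % n_sq

def decrypt(c):
    """Decrypt a ciphertext c with the private key (lam, mu)."""
    return ((pow(c, lam, n_sq) - 1) // n * mu) % n

# two data owners encrypt locally; the untrusted cloud multiplies the ciphertexts,
# which corresponds to ADDING the underlying plaintexts -- without ever seeing them
a, b = encrypt(17), encrypt(25)
blind_sum = (a * b) % n_sq
assert decrypt(blind_sum) == 42
print(decrypt(blind_sum))   # 42
```

a real deployment would use a vetted library and much larger keys. fully homomorphic schemes additionally support multiplication on ciphertexts, which widens the range of computations at the cost of performance.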
federated learning
machine learning depends on large datasets for training. with federated learning (FL), these datasets need not be shared. instead, the machine learning model is trained at the sites where the data is stored. only the trained model is shared.
federated learning is a very active area of research. high performance computing is steadily pushing the boundaries of what is possible on infrastructure available today.
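the core loop of federated averaging can be sketched in a few lines of Python. the example below trains a toy linear model across three simulated clients; the data, model and hyperparameters are illustrative assumptions, not a production setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client trains on its own data; the raw data never leaves the site."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient of the mean squared error
        w -= lr * grad
    return w

# three simulated clients, each holding a private dataset with the same hidden relation
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    clients.append((X, y))

# federated averaging: only model weights travel between server and clients
global_w = np.zeros(2)
for _ in range(20):
    local_ws = [local_update(global_w, X, y) for X, y in clients]
    global_w = np.mean(local_ws, axis=0)        # the server averages the local models

print(global_w)   # approaches the hidden weights [2, -1]
```

in practice, the averaging step is often combined with the techniques above, e.g. secure aggregation based on SMPC, so that not even the individual model updates are revealed to the central server.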