In practice, this data is often held by different parties in a variety of sources. If this data is to be linked together and analysed in a way that is secure and respects privacy, it is essential for the algorithms to be trained without the need for mutual sharing of the underlying data from the various sources. Within this project, personal medical data related to lung cancer is the starting point. New proof-of-concept solutions are being developed, implemented and tested using synthetic data.
Use of artificial intelligence (AI)
Various AI innovations are being utilised within the project, such as multi-party computation (MPC) and federated learning (FL). MPC is a collection of cryptographic techniques that allow multiple parties to carry out joint analyses of their data without having to share that data with each other or with a third party. This allows both simple analyses and AI algorithms to be applied without infringing privacy rules. FL resolves the privacy problem by taking the analyses to the data instead of bringing the data to the analyses. The analyses are broken down into small partial calculations that are carried out locally by the various parties. After a local computation has been executed, only its results or intermediate output are shared with one or more external parties. As a result, the sensitive data items are not shared with anyone and remain at the source. Because both MPC and FL let an algorithm learn from the combined data of all parties, they can yield a far more precise algorithm while protecting the data at its source.
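As a minimal sketch of the MPC idea described above, the Python example below uses additive secret sharing, one basic MPC building block, to let three parties compute the sum of their private counts. It is an illustration only and not the project's implementation: the function names `share` and `secure_sum`, the modulus and the example counts are assumptions made for this sketch, and a real deployment would exchange the shares over secure channels between separate machines.

```python
import secrets

# Modulus for the shares (an illustrative choice for this sketch).
PRIME = 2**61 - 1


def share(value, n_parties):
    """Split an integer into n additive shares that sum to value modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def secure_sum(private_values):
    """Simulate parties that jointly compute a sum without revealing their inputs."""
    n = len(private_values)
    # all_shares[i][j] is the share that party i sends to party j.
    all_shares = [share(v, n) for v in private_values]
    # Each party j adds up the shares it received; one share alone looks random.
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)]
    # Only the partial sums are published; together they reveal just the total.
    return sum(partial_sums) % PRIME


# Hypothetical patient counts held by three different organisations.
counts = [120, 75, 310]
total = secure_sum(counts)
print(total)
```

Each individual share is a uniformly random number, so no party learns another party's count; only the combined total becomes known.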
What challenge does it solve?
The project aims to make data analyses possible in a secure and scalable way that respects privacy. In this case the focus is on lung cancer. Combining two AI innovations, multi-party computation and federated learning, is intended to generate a better understanding of the factors affecting the impact of treatment and the survival rates of cancer patients.
The same technique can be used for all kinds of other analyses where conditions for privacy, confidentiality and security apply.
What will the use case teach us?
The intended result is scalable and privacy-friendly open source tooling based on multi-party computation and federated learning. The project is demonstrating that these tools can be used for lung cancer to obtain new insights based on multiple data sources. The solutions developed will be generically usable for all kinds of applications in which AI algorithms have to be trained using sensitive data from several parties.
In addition, lessons learned on the process and implementation aspects will be shared.
This project is part of TNO’s Appl.AI programme and it is partly financed from the kickstart fund that the NL AIC received from the government for research and development of AI applications. SELECTED stands for ‘Secure Learning for oncology on vertically partitioned data’. It is being developed by TNO in collaboration with the parallel project LANCELOT, which is partly funded by Holland High Tech and carried out together with the Netherlands Comprehensive Cancer Organisation (IKNL) and Janssen.
- A number of new multi-party computation components and solutions have been published as open source; these can be used to develop secure analyses on distributed datasets.
SELECTED contributed to the secure Kaplan-Meier implementation, an MPC solution to securely compute log-rank statistics associated with the Kaplan-Meier estimator, an essential technique for survival analysis that is frequently used in the medical domain.
In addition, the parallel project LANCELOT contributed to the secure logistic regression model within the open source secure learning package, in which AI models can be securely trained using MPC. Logistic regression has been chosen as the AI model to be used in the lung cancer use case.
- A joint flyer between IKNL and TNO, presented at Health RI 2021, on our work on secure survival analysis.
- In a special issue of ERCIM News on privacy-preserving computation, TNO and IKNL jointly wrote a short article on the use of multi-party computation for Kaplan-Meier and its application in the medical domain.
- A scientific paper written jointly by TNO and IKNL on their results using multi-party computation to train the Cox Proportional Hazards Model on distributed datasets without violating privacy. This is a widely used model in the medical field to determine the influence of certain factors (such as tumour characteristics and patient characteristics) on the survival rate. The paper has been accepted by BMC Medical Informatics and Decision Making, and will be published there in 2022.
- A memo and an implementation on the application of federated learning to train generalized linear models (GLMs) on vertically partitioned data, together with ideas on how MPC could be coupled to this solution to make it more secure. This work is not yet public; the plan is to publish it in 2022, e.g. via a paper.
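To make the Kaplan-Meier estimator named in the results above concrete, here is a minimal, non-secure Python sketch of the plain estimator. At each distinct event time t with d events among n patients still at risk, the survival estimate is multiplied by (1 - d / n). This is not the secure MPC implementation; it only shows the quantity such a solution would compute on protected data. The function name `kaplan_meier` and the toy follow-up data are assumptions made for illustration.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival)] at each distinct event time.

    durations: follow-up time per patient.
    events: 1 if the event was observed at that time, 0 if the patient was censored.
    """
    event_times = sorted({t for t, e in zip(durations, events) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Patients still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Observed events exactly at time t.
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical follow-up data for five patients (time in months).
durations = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(durations, events)
for time, surv in curve:
    print(time, round(surv, 3))
```

With this toy data the survival estimate drops to roughly 0.8, 0.6 and 0.3 at times 2, 3 and 5.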
We aim to:
- Integrate MPC techniques with the FL algorithms for generalized linear models (developed in 2021) that are applicable to vertically partitioned data, so that less information needs to be shared between organisations and potential information leakage is reduced. This helps to gain valuable insights into how MPC and FL solutions can be integrated.
- Publish about the new secure techniques developed in the project applied to the lung cancer use case, based on results in the accompanying LANCELOT project with IKNL and Janssen that will be achieved in 2022.
- Present the results at different venues (to be determined).
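The vertically partitioned setting mentioned above, in which each organisation holds different attributes of the same patients, can be sketched as follows. This is a simplified illustration under stated assumptions and not the project's algorithm: it trains a logistic regression (one example of a GLM) with plain gradient descent, every party keeps its own feature columns and weights, and only partial scores and residuals are exchanged. The class `Party`, the functions `train` and `predict`, and all data are hypothetical. The residuals exchanged here are sent in the clear and can still leak information, which is the kind of exchange that MPC could protect.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class Party:
    """Holds some feature columns for all patients and the weights for those columns."""

    def __init__(self, features):
        self.features = features  # one row per patient, never leaves the party
        self.weights = [0.0] * len(features[0])

    def partial_scores(self):
        # Local computation: this party's contribution to each patient's score.
        return [sum(w * x for w, x in zip(self.weights, row)) for row in self.features]

    def update(self, residuals, lr):
        # Local gradient step using only the shared residuals and local features.
        n = len(self.features)
        for j in range(len(self.weights)):
            grad = sum(r * row[j] for r, row in zip(residuals, self.features)) / n
            self.weights[j] -= lr * grad


def predict(parties):
    partials = [p.partial_scores() for p in parties]
    return [sigmoid(sum(col)) for col in zip(*partials)]


def train(parties, labels, rounds=2000, lr=0.5):
    for _ in range(rounds):
        # The label holder combines the partial scores into predictions.
        predictions = predict(parties)
        residuals = [p - y for p, y in zip(predictions, labels)]
        for party in parties:
            party.update(residuals, lr)
    return parties


# Hypothetical data: party A holds one attribute, party B another plus a constant.
party_a = Party([[0.1], [0.4], [0.8], [0.9]])
party_b = Party([[0.2, 1.0], [0.1, 1.0], [0.9, 1.0], [0.7, 1.0]])
labels = [0, 0, 1, 1]

train([party_a, party_b], labels)
predictions = predict([party_a, party_b])
print([round(p, 2) for p in predictions])
```

The raw feature values never leave a party in this sketch; only the per-patient partial scores and residuals are communicated in each round.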