Privacy model and synthetic data helped a lot

AMdEX has reached many milestones through its use cases. In a series of articles, we look back on the results and showcase the user scenarios. This article describes the UNL use case.

The association Universities of the Netherlands (UNL) is an umbrella organisation that acts on behalf of the Dutch universities. Dr. Thomas van Binsbergen is Assistant Professor at the Complex Cyber Infrastructure Group, Computer Science Institute, University of Amsterdam.

What was the objective of the use case?

“The UNL collects data about university employees for the benefit of the universities and to report to the Dutch Ministry of Education. This involves trade-offs, not only between the privacy of university personnel and the kinds of analyses that are required, but also between the control achieved through manual intervention and the efficiency of automated decision making. Our objective was to investigate these trade-offs and to see whether the compute-to-data and third-party computation data exchange archetypes can offer solutions that are feasible from an organisational and technical perspective.”

What were the biggest challenges?

“It proved difficult to reach agreements that do sufficient justice to the different interests, such as privacy/GDPR and analytical power. A particular challenge is quality control, which cannot be done remotely within our system without sharing sensitive data. Furthermore, there is tension between the level of detail in the data supplied and the quality of the analysis results. Also, different technical solutions offer more or less accuracy and more or less privacy.”

What were the main lessons learned?

“The compute-to-data concept is certainly very appealing, as it gives control to the data provider and still enables automation of data processing. Automation of control has been achieved by applying concepts such as k-anonymity (a privacy model) and synthetic data (generated by algorithms). For some scenarios, the distributed data does need to be brought together to support the required analysis (e.g., certain regressions). For these scenarios, secure multi-party computation is a great privacy-enhancing technology.”
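To make the k-anonymity privacy model mentioned above concrete, here is a minimal sketch in Python. It is not the AMdEX or UNL implementation. The record fields (`faculty`, `age_band`, `salary_scale`), the function names, and the choice of k are all hypothetical and serve only as illustration. A dataset is k-anonymous when every combination of quasi-identifier values is shared by at least k records.

```python
from collections import Counter


def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs at least k times."""
    return all(n >= k for n in group_sizes(records, quasi_identifiers).values())


def suppress_small_groups(records, quasi_identifiers, k):
    """Drop records whose quasi-identifier combination occurs fewer than k times.

    Suppression is only one way to reach k-anonymity; generalising values
    (e.g. wider age bands) is another. It is used here for brevity.
    """
    sizes = group_sizes(records, quasi_identifiers)
    return [
        r for r in records
        if sizes[tuple(r[q] for q in quasi_identifiers)] >= k
    ]


# Hypothetical employee records, invented for this example.
records = [
    {"faculty": "Science", "age_band": "30-39", "salary_scale": 11},
    {"faculty": "Science", "age_band": "30-39", "salary_scale": 12},
    {"faculty": "Science", "age_band": "30-39", "salary_scale": 10},
    {"faculty": "Law", "age_band": "50-59", "salary_scale": 14},
]
quasi_identifiers = ["faculty", "age_band"]

before = is_k_anonymous(records, quasi_identifiers, k=3)
released = suppress_small_groups(records, quasi_identifiers, k=3)
after = is_k_anonymous(released, quasi_identifiers, k=3)
```

In this sketch the single Law record would make the person identifiable, so it is withheld before release. A check of this kind can run at the data provider without manual intervention, which is one way to read "automation of control".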

How can the results of this experiment be applied further?

“UNL is planning a pilot with AMdEX partners, including SURF, the collaborative organisation for IT in Dutch education and research. Three or more universities will participate. The pilot will be used to facilitate the discussion on the agreements to be made between the universities on how data will be exchanged, considering the various trade-offs. The pilot should make the questions concrete and the effects of decisions tangible.”

What user scenarios did you encounter in the use case?

“The first scenario is Manual Approval. Each consortium member can submit data processing requests (i.e., act as a data consumer). All data providers (the universities) can accept or deny such requests. Scenario 2 is Automatic Processing and Clearing. Manual approval by the data custodian is not necessary, as the AMdEX infrastructure can automatically approve requests for synthetic data. Scenario 3 involves a Trusted Third Party (TTP), which is neither data user nor data owner, but a service provider. The TTP is a member of the consortium and the dataspace.

The three scenarios we encountered will be described at length in our deliverable.”
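The difference between the first two scenarios can be sketched as a small decision function. This is a hedged illustration and not the AMdEX infrastructure: the `Request` type, the `output_kind` labels, and the `decide` function are hypothetical, and the sketch assumes that each provider decides only about its own data. The Trusted Third Party of scenario 3 is not modelled.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    APPROVED = "approved"
    DENIED = "denied"
    PENDING = "pending"


@dataclass(frozen=True)
class Request:
    consumer: str     # consortium member submitting the request
    output_kind: str  # hypothetical label, e.g. "synthetic" or "aggregate"


def decide(request, provider, manual_votes):
    """Return the decision for one provider's data.

    manual_votes maps a provider name to True (accept) or False (deny);
    a missing entry means the provider has not responded yet.
    """
    # Scenario 2: requests for synthetic data are approved automatically.
    if request.output_kind == "synthetic":
        return Decision.APPROVED
    # Scenario 1: anything else waits for the provider's manual decision.
    vote = manual_votes.get(provider)
    if vote is None:
        return Decision.PENDING
    return Decision.APPROVED if vote else Decision.DENIED


# Hypothetical consumer and provider names, invented for this example.
votes = {"University A": True, "University B": False}
synthetic_request = Request(consumer="University C", output_kind="synthetic")
aggregate_request = Request(consumer="University C", output_kind="aggregate")

auto = decide(synthetic_request, "University B", votes)
accepted = decide(aggregate_request, "University A", votes)
denied = decide(aggregate_request, "University B", votes)
waiting = decide(aggregate_request, "University D", votes)
```

Keeping the automatic rule and the manual vote in one function makes the trade-off visible: the more request types the agreements allow the first branch to cover, the less manual intervention is needed.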

What are the recommendations for future research or experiments?

“We would like to experiment with the automatic enforcement of (parts of) the agreements made during the pilot phase. We’re particularly interested in seeing how the agreements can be used to configure privacy-enhancing techniques such as synthetic data generation and secure multi-party computation.”

Deliverables UNL use case

The deliverables are described fully in the Reference Architecture Report.