Introducing SAPU, the platform for sensitive data analysis


On April 28, 2023, NCC Estonia organized a webinar where SAPU, an environment for processing sensitive data, was introduced. In addition to Estonian stakeholders, invited guests from Denmark and Bulgaria also participated in the webinar.


HPC centres often store valuable research data that other scientists and institutes could use to carry out research and answer questions. This data is often sensitive, or potentially sensitive in certain circumstances. Here a tension arises between the security and research worlds. Security demands that the data be locked in a box accessible only to very specific people. Research, on the other hand, wants to make the data available to everyone in order to share scientific knowledge and solve problems. By accepting some risks, and by replacing some aspects of the physical boundary with legal ones, it is possible to use technology to achieve something similar to the locked box.


Specialists from the HPC centre of the University of Tartu are currently developing SAPU, a data analysis platform with good potential to enable secure processing of sensitive data. SAPU is an isolated computing environment whose communication with the outside world is secured through a VPN, a firewall, and an intrusion detection system that monitors network traffic and raises an alert when suspicious activity is discovered. SAPU thus enables analysts and programmers to work on sensitive data while reducing the risk of unauthorized copying, transfer, or retrieval of data from the machines, and it provides a higher class of security than a standard high-performance cluster would.


In principle, SAPU runs on a virtual machine that has no access to the outside world, and the machine can be accessed only through a virtual desktop environment. After the data owners or their technical representatives have moved the necessary data into the machine, the data analysts are given access to it. Analysts can move files using object storage, which saves everything that is moved. Moving files out, however, requires approval from the data owners.
SAPU has already gained popularity among geneticists and other Estonian researchers working with sensitive data.
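
The file transfer rule described above can be illustrated with a small Python sketch. It is only a toy model and not SAPU's actual interface: the class `TransferGate`, its methods, and the file names are all hypothetical, and it assumes that inbound moves need no approval while outbound moves are released only after the data owner has approved them.

```python
from dataclasses import dataclass, field


@dataclass
class TransferGate:
    """Hypothetical model of the transfer rule; not SAPU's real API."""

    log: list = field(default_factory=list)      # record of every move attempt
    approved: set = field(default_factory=set)   # files cleared for export

    def move_in(self, name: str) -> None:
        # Assumption: inbound files are recorded and need no approval.
        self.log.append(("in", name))

    def approve(self, name: str) -> None:
        # Called on behalf of the data owner.
        self.approved.add(name)

    def move_out(self, name: str) -> bool:
        # Every outbound request is recorded; it succeeds only if approved.
        self.log.append(("out-request", name))
        return name in self.approved


gate = TransferGate()
gate.move_in("analysis_script.py")
blocked = gate.move_out("results.csv")    # no approval yet
gate.approve("results.csv")
released = gate.move_out("results.csv")   # now approved by the data owner
```

In this sketch the first export attempt is refused and the second succeeds, while both attempts remain in the log alongside the inbound move.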

Although SAPU is still in beta, with some challenges yet to be resolved, it has the potential to become an essential solution for HPC centres that store and process sensitive data.


More information: https://docs.hpc.ut.ee/services/SAPU/

