The era of Big Data has opened a new chapter in science, medicine, and business by allowing knowledge to be extracted from vast quantities of collected data on a scale never possible before. In High Energy Physics, the discovery of the Higgs boson would not have been possible without a carefully designed architecture for processing the petabytes of collision data recorded at the Large Hadron Collider (LHC).
One aspect common to all data-intensive applications is the streaming of recorded data from a remote storage location. This often strains the network and forces each compute node to introduce complex logic for aggressive caching in order to hide latency, which in turn substantially increases the memory footprint of the application running on that node. The HIOS project aims to provide a scalable solution for such data-intensive workloads by introducing heterogeneous I/O units directly into compute clusters, allowing the aggressive caching functionality to be offloaded onto these units. Removing this complicated logic from the compute nodes reduces the memory footprint of data-intensive applications. Furthermore, the project will investigate the possibility of offloading additional logic, such as encoding/decoding, serialisation, and I/O-specific processing, onto such units.
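As a rough illustration of this offloading idea, the sketch below models a block cache that could run on a separate I/O unit, so that the compute node itself holds no cached data. All names here (`IOUnitCache`, `ComputeNode`, the block-fetch callback) are hypothetical and purely illustrative; they are not part of HIOS.

```python
from collections import OrderedDict

class IOUnitCache:
    """Illustrative LRU block cache that would run on a heterogeneous
    I/O unit, keeping hot data blocks out of the compute node's memory."""

    def __init__(self, fetch, capacity=4):
        self._fetch = fetch            # callable that streams a block from remote storage
        self._capacity = capacity
        self._blocks = OrderedDict()   # block_id -> bytes, in recency order
        self.remote_reads = 0          # how often remote storage was actually hit

    def read(self, block_id):
        if block_id in self._blocks:
            self._blocks.move_to_end(block_id)   # cache hit: mark as recently used
            return self._blocks[block_id]
        data = self._fetch(block_id)             # cache miss: stream from remote storage
        self.remote_reads += 1
        self._blocks[block_id] = data
        if len(self._blocks) > self._capacity:
            self._blocks.popitem(last=False)     # evict the least recently used block
        return data

class ComputeNode:
    """Compute node that reads through the I/O unit instead of caching locally."""

    def __init__(self, io_unit):
        self._io = io_unit

    def process(self, block_ids):
        # Placeholder workload: just measure the size of each block read.
        return [len(self._io.read(b)) for b in block_ids]
```

In this toy setup the compute node only issues reads; the caching state, eviction policy, and any future encoding or serialisation logic would all live behind the I/O unit's interface.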
An integral part of the project will be integrating the developed units with existing High Performance Computing (HPC) facilities. One of its main outcomes will be a reduction in the time required to extract insights from large quantities of acquired data, which in turn directly benefits society and scientific discovery.