Achieving efficient many-to-many communication on a given network topology is a challenging task when many data streams from different sources have to be scattered concurrently to many destinations with low variance in arrival times. In such scenarios, it is critical to saturate, but not congest, the bisection bandwidth of the network topology in order to achieve a good aggregate throughput. When there are many concurrent point-to-point connections, the communication pattern needs to be scheduled dynamically and at fine granularity to avoid network congestion (links, switches), overload of a node's incoming links, and receive buffer overflow. Motivated by the use case of the Compressed Baryonic Matter (CBM) experiment, we study the performance and variance of such communication patterns on a Cray XC40 with different routing schemes and scheduling approaches. We present a distributed Data Flow Scheduler (DFS) that reduces the variance of arrival times from all sources by at least a factor of 30 and increases the achieved aggregate bandwidth by up to 50%.

The stream processing paradigm aims at the sustained, continuous processing of incoming data at the rate of all incoming data streams. The paradigm is successfully used in time-critical applications such as sensor data processing,1 simulation and prototyping,2 and real-time query processing. The problem is especially challenging when hard real-time constraints have to be met, as in latency-sensitive5,6 and life-critical7 applications. Some applications need to collect and aggregate stream data based on feature-, sensor-, or time-dependent constraints before it can be processed in (mini-)batches, as in the case of the CBM experiment.3,4

The CBM experiment is currently being set up to explore the QCD phase diagram9 in the region of high baryon densities. High-energy nucleus-nucleus collisions are observed by many sensors surrounding the experiment, generating hundreds of data streams with an aggregate data rate of a few terabytes per second. The packets of measured data signals arrive at multiple input nodes and are analyzed in aggregated time-slices at compute nodes. To build the time-slices, each sensor contributes, via its input node, its measurements corresponding to that time-slice. Subsequent time-slices are assigned to different compute nodes in a round-robin fashion for analysis, to sustain the incoming data rate and to spread the load equally. The data distribution and time-slice building are done in a parallel cluster application called FLESnet, which is part of the First-Level Event Selector (FLES)10 of CBM.

Our goal is to achieve a high aggregate bandwidth between input and compute nodes while maintaining a short duration and low variance of time-slice building (receiving all contributions) in a scalable system. A high aggregate bandwidth can only be achieved by utilizing all communication links at all times; a short duration and low variance of time-slice building can only be achieved with uniform usage of the network. The building of subsequent time-slices is parallel in space and sequential but overlapping in time.
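To make the data path concrete, the following is a minimal C++ sketch of the round-robin mapping of time-slices to compute nodes and of the completeness check that concludes time-slice building. All names here (`Contribution`, `TimesliceBuilder`, `compute_node_for`, the fixed in-flight window) are hypothetical illustrations under simplified assumptions, not FLESnet's actual API; the window models the limited number of time-slices that can be in flight before receive buffers would overflow.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// One input node's measurement data for one time-slice (illustrative type).
struct Contribution {
    uint64_t timeslice;             // index of the time-slice this data belongs to
    int input_node;                 // source input node of the measurements
    std::vector<uint8_t> data;      // payload (empty in this toy example)
};

// Round-robin assignment: subsequent time-slices go to different compute
// nodes, spreading the analysis load equally across the cluster.
int compute_node_for(uint64_t timeslice, int num_compute_nodes) {
    return static_cast<int>(timeslice % num_compute_nodes);
}

// A time-slice is complete once every input node has delivered its
// contribution; only then can the compute node start the analysis.
struct TimesliceBuilder {
    int num_inputs;
    std::vector<int> received;      // contribution count per in-flight slot

    TimesliceBuilder(int inputs, size_t window)
        : num_inputs(inputs), received(window, 0) {}

    // Record one contribution; returns true when the slice is complete.
    bool add(const Contribution& c) {
        size_t slot = c.timeslice % received.size();
        if (++received[slot] == num_inputs) {
            received[slot] = 0;     // recycle the slot for a later time-slice
            return true;
        }
        return false;
    }
};

int main() {
    const int inputs = 4, computes = 3;
    TimesliceBuilder builder(inputs, /*window=*/8);

    for (uint64_t ts = 0; ts < 6; ++ts) {
        printf("time-slice %llu -> compute node %d\n",
               (unsigned long long)ts, compute_node_for(ts, computes));
        for (int in = 0; in < inputs; ++in) {
            if (builder.add({ts, in, {}}))
                printf("  all %d contributions received\n", inputs);
        }
    }
}
```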
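One classic way to approximate the uniform network usage described above is a synchronized shift schedule, in which round r pairs input node i with compute node (i + r) mod C, so every receiver sees exactly one sender per round and incast congestion is avoided by construction. The sketch below shows this static pattern purely for illustration; it is not the distributed DFS itself, which schedules transfers dynamically at run time.

```cpp
#include <cstdio>

int main() {
    const int inputs = 4;    // I input nodes
    const int computes = 4;  // C compute nodes (equal here for clarity)

    // Round r: input i sends its pending contribution to compute (i + r) % C.
    // Each compute node receives from exactly one input node per round, so
    // no receiver's incoming link is ever oversubscribed.
    for (int r = 0; r < computes; ++r) {
        printf("round %d:", r);
        for (int i = 0; i < inputs; ++i)
            printf("  in%d->cn%d", i, (i + r) % computes);
        printf("\n");
    }
    // After C rounds, every input has sent to every compute node exactly once.
}
```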