The rCUDA technology: an inexpensive way to improve the performance of GPU-based clusters

Speaker

Associate Professor Federico Silla, Universidad Politecnica de Valencia (UPV), Spain, 14-04-2015

Abstract

The use of GPUs to accelerate general-purpose scientific and engineering applications is mainstream nowadays, but their adoption in current high performance computing clusters is hampered mainly by the trend of including accelerators in every node of a cluster, which has several drawbacks. First, in addition to raising acquisition costs, accelerators also increase maintenance and space costs. Second, energy consumption rises, as GPUs are known to be power-hungry devices. Third, the GPUs in such a cluster may show a relatively low utilization rate, since it is quite unlikely that all the accelerators in the cluster will be in use all the time: very few applications feature such an extreme degree of data concurrency. Consequently, reducing the number of GPUs installed in the cluster and virtualizing them emerges as an appealing strategy for dealing with all these drawbacks at once, as the nodes equipped with GPUs become servers that provide GPU services to all the nodes in the cluster.

In this talk we introduce the rCUDA remote GPU virtualization framework, which has been shown to be the only one that supports the most recent CUDA versions, in addition to leveraging the InfiniBand interconnect for the sake of performance. rCUDA not only increases the number of jobs executed per time unit in a cluster, but it can also improve per-job performance by enabling a single application to exploit all of the GPUs in the cluster (up to 64 have been tested). We also present the latest developments within this framework.

Bio

Prof. Federico Silla received the MS and PhD degrees from the Technical University of Valencia (UPV), Spain. He is currently an associate professor at the Department of Computer Engineering (DISCA) at that university. His research is mainly carried out within the Parallel Architectures Group of the Technical University of Valencia, although he is also an external contributor to the Advanced Computer Architecture research group at the Department of Computer Engineering at the University of Heidelberg. Furthermore, he worked for two years at Intel Corporation, developing on-chip networks. His research addresses high performance on-chip and off-chip interconnection networks as well as distributed memory systems and remote GPU virtualization mechanisms. His published papers give him an h-index of 23 according to Google Scholar. He has coordinated the rCUDA remote GPU virtualization project since it began in 2008, and he also leads the development of other virtualization technologies. As for teaching, he teaches the Computer Networks and High Performance Interconnects courses at the Computer Engineering School of the Technical University of Valencia.

Slides

Will be available soon
