Compiler-Managed Replication of CUDA Kernels for Reliable Execution of GPGPU Applications

Kaya, E.; Öz, I.

Please use this identifier to cite or link to this item: https://hdl.handle.net/11147/14403

Full metadata record

DC Field	Value	Language
dc.contributor.author	Kaya, E.	-
dc.contributor.author	Öz, I.	-
dc.date.accessioned	2024-05-05T14:59:31Z	-
dc.date.available	2024-05-05T14:59:31Z	-
dc.date.issued	2024	-
dc.identifier.issn	0218-1266	-
dc.identifier.uri	https://doi.org/10.1142/S0218126624502542	-
dc.description	Kaya, Ercument/0000-0001-5073-8159; Oz, Isil/0000-0002-8310-1143	en_US
dc.description.abstract	As Graphics Processing Units (GPUs) evolve to run general-purpose computations in addition to inherently fault-tolerant graphics programs, soft error reliability becomes a first-class concern in program design. In particular, safety-critical systems that use GPU devices need to employ fault-tolerance techniques to recover from errors in hardware components. While software-level redundancy approaches, based on replicating the application code, offer high reliability for safe program execution, redundancy must exploit the parallel execution units of the target architecture so that the redundant computations do not hurt performance. In this work, we propose redundancy approaches that use the parallel GPU cores and implement a compiler-level redundancy framework that enables the programmer to configure the target GPGPU program for redundant execution. We run redundant executions of GPGPU programs from the PolyBench benchmark suite by applying our kernel-level redundancy approaches and evaluate their performance with respect to the parallelism level of the programs. Our results reveal that redundancy approaches that exploit the parallelism offered by GPU cores yield higher performance for redundant executions, while programs that already make use of the parallel GPU cores in their original form suffer from overhead caused by contention among redundant threads. © World Scientific Publishing Company.	en_US
dc.description.sponsorship	Scientific and Technological Research Council of Turkey (TUBITAK) [119E011]	en_US
dc.description.sponsorship	This work was supported by the Scientific and Technological Research Council of Turkey (TUBITAK), Grant No: 119E011. We thank Martin Ruefenacht for his valuable comments on the paper.	en_US
dc.language.iso	en	en_US
dc.publisher	World Scientific	en_US
dc.relation.ispartof	Journal of Circuits, Systems and Computers	en_US
dc.rights	info:eu-repo/semantics/closedAccess	en_US
dc.subject	compiler support	en_US
dc.subject	GPU computing	en_US
dc.subject	redundancy	en_US
dc.subject	soft errors	en_US
dc.title	Compiler-Managed Replication of CUDA Kernels for Reliable Execution of GPGPU Applications	en_US
dc.type	Article	en_US
dc.authorid	Kaya, Ercument/0000-0001-5073-8159	-
dc.authorid	Oz, Isil/0000-0002-8310-1143	-
dc.department	Izmir Institute of Technology	en_US
dc.identifier.volume	33	en_US
dc.identifier.issue	14	en_US
dc.identifier.wos	WOS:001205493700001	-
dc.identifier.scopus	2-s2.0-85190833876	-
dc.relation.publicationcategory	Article - International Peer-Reviewed Journal - Institutional Faculty Member	en_US
dc.identifier.doi	10.1142/S0218126624502542	-
dc.authorscopusid	57727235800	-
dc.authorscopusid	37097877800	-
dc.identifier.wosquality	Q4	-
dc.identifier.scopusquality	Q3	-
dc.description.woscitationindex	Science Citation Index Expanded	-
item.openairecristype	http://purl.org/coar/resource_type/c_18cf	-
item.languageiso639-1	en	-
item.openairetype	Article	-
item.grantfulltext	none	-
item.fulltext	No Fulltext	-
item.cerifentitytype	Publications	-
Appears in Collections:	Scopus Indexed Publications Collection; WoS Indexed Publications Collection

Page view(s): 174 (checked on Mar 31, 2025)
