Mixed-proxy extensions for the NVIDIA PTX memory consistency model

Lustig, Daniel, Cooksey, Simon, Giroux, Olivier (2022) Mixed-proxy extensions for the NVIDIA PTX memory consistency model. In: ISCA '22: Proceedings of the 49th Annual International Symposium on Computer Architecture. Association for Computing Machinery ISBN 978-1-4503-8610-4. (doi:10.1145/3470496.3533045) (KAR id:96841)

Full text: PDF (publisher version), 504kB. Language: English
Official URL: https://doi.org/10.1145/3470496.3533045

Abstract

In recent years, there has been a trend towards the use of accelerators and architectural specialization to continue scaling performance in spite of a slowing of Moore's Law. GPUs have always relied on dedicated hardware for graphics workloads, but modern GPUs now also incorporate compute-domain accelerators such as NVIDIA's Tensor Cores for machine learning. For these accelerators to be successfully integrated into a general-purpose programming language such as C++ or CUDA, there must be a forward- and backward-compatible API for the functionality they provide. To the extent that all of these accelerators interact with program threads through memory, they should be incorporated into the GPU's memory consistency model. Unfortunately, the use of accelerators and/or special non-coherent paths into memory produces non-standard memory behavior that existing GPU memory models cannot capture.

In this work, we describe the "proxy" extensions added to version 7.5 of NVIDIA's PTX ISA for GPUs. A proxy is an extra tag abstractly applied to every memory or fence operation. Proxies generalize the notion of address translation and specialized non-coherent cache hierarchies into an abstraction that cleanly describes the resulting non-standard behavior. The goal of proxies is to facilitate integration of these specialized memory accesses into the general-purpose PTX programming model in a fully composable manner. This paper characterizes the behaviors that proxies can capture, the microarchitectural intuition behind them, the necessary updates to the formal memory model, and the tooling that we built to ensure that the resulting model is sound and meets the needs of the business-critical workloads that proxies are designed to support.

Item Type:	Conference proceeding
DOI/Identification number:	10.1145/3470496.3533045
Subjects:	Q Science > QA Mathematics (inc Computing science) > QA 75 Electronic computers. Computer science; Q Science > QA Mathematics (inc Computing science) > QA 76 Software, computer programming
Institutional Unit:	Schools > School of Computing
Former Institutional Unit:	Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Computing
Funders:	University of Kent (https://ror.org/00xkeyj56)
Depositing User:	Simon Cooksey
Date Deposited:	09 Sep 2022 10:26 UTC
Last Modified:	29 Apr 2026 08:57 UTC
Resource URI:	https://kar.kent.ac.uk/id/eprint/96841 (The current URI for this page, for reference purposes)

University of Kent Author Information

Cooksey, Simon.
