Bugfix stream-mask override, support old CUDA, and start Hopper support

Use a different callback to intercept the TMD/QMD later in the launch pipeline.

Major improvements:
- Fix bug with next mask not overriding stream mask on CUDA 11.0+
- Add CUDA 6.5-10.2 support for next- and global-granularity partitioning masks on x86_64 and aarch64 Jetson
- Remove libdl dependency
- Partially support TMD/QMD Version 4 (Hopper)

Minor improvements:
- Check for sufficient CUDA version before attempting to apply a next-granularity partitioning mask
- Only check for sufficient CUDA version on the first call to `libsmctrl_set_next_mask()` or `libsmctrl_set_global_mask()`, rather than checking every time (lowers overhead)
- Check that TMD version is sufficient before modifying it
- Improve documentation

Issues:
- Partitioning mask bits have a different meaning in TMD/QMD Version 4 and require floorsweeping and remapping information to construct properly. This information will be forthcoming in future releases of libsmctrl and nvdebug.
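
The "check only on the first call" item above could be sketched roughly as follows. This is an illustrative sketch, not the code from this commit: `query_cuda_version()`, `check_version_once()`, `set_next_mask_sketch()`, and `MIN_VERSION` are hypothetical names, and the version query is stubbed so the example builds without CUDA. It assumes the driver's usual version encoding of 1000 * major + 10 * minor (so CUDA 6.5 is 6050).

```c
/* Counts how often the (stubbed) version query runs, so the caching is observable. */
int query_count = 0;

/* Hypothetical stand-in for a driver query such as cuDriverGetVersion().
 * Returns a fixed value, assumed to be encoded as 1000 * major + 10 * minor. */
int query_cuda_version(void) {
    query_count++;
    return 11000; /* pretend the driver reports CUDA 11.0 */
}

/* Assumed minimum supported version for this sketch: CUDA 6.5. */
#define MIN_VERSION 6050

/* Returns 1 if the version is sufficient, 0 otherwise.
 * The query runs only on the first call; later calls reuse the cached result.
 * Note: this simple form is not thread-safe. */
int check_version_once(void) {
    static int cached = -1; /* -1 means "not checked yet" */
    if (cached < 0)
        cached = (query_cuda_version() >= MIN_VERSION);
    return cached;
}

/* Hypothetical mask setter: rejects the request if the version is too old. */
int set_next_mask_sketch(unsigned long long mask) {
    if (!check_version_once())
        return -1;
    (void)mask; /* a real implementation would store the mask for the next launch */
    return 0;
}
```

Calling `set_next_mask_sketch()` repeatedly triggers the version query once; every later call pays only for a comparison against the cached flag.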
author: Joshua Bakita <jbakita@cs.unc.edu> 2024-12-19 14:20:38 -0500
committer: Joshua Bakita <jbakita@cs.unc.edu> 2024-12-19 14:48:21 -0500
commit: d052c2df34ab41ba285f70965663e5a0832f6ac9 (patch)
tree: 0a761be3f62910275da8a2cad546a8902073b1e9 /Makefile
parent: aa63a02efa5fc8701f0c3418704bbbc2051c1042 (diff)
1 files changed, 1 insertions, 1 deletions
diff --git a/Makefile b/Makefile
index 0e9ee3a..0d9b9f6 100644
--- a/Makefile
+++ b/Makefile
@@ -3,7 +3,7 @@ CXX = g++
 NVCC ?= nvcc
 # -fPIC is needed in all cases, as we may be linked into another shared library
 CFLAGS = -fPIC
-LDFLAGS = -lcuda -I/usr/local/cuda/include -ldl
+LDFLAGS = -lcuda -I/usr/local/cuda/include
 .PHONY: clean tests