| author | Joshua Bakita <jbakita@cs.unc.edu> | 2024-12-19 14:20:38 -0500 |
|---|---|---|
| committer | Joshua Bakita <jbakita@cs.unc.edu> | 2024-12-19 14:48:21 -0500 |
| commit | d052c2df34ab41ba285f70965663e5a0832f6ac9 (patch) | |
| tree | 0a761be3f62910275da8a2cad546a8902073b1e9 /Makefile | |
| parent | aa63a02efa5fc8701f0c3418704bbbc2051c1042 (diff) | |
Bugfix stream-mask override, support old CUDA, and start Hopper support
Use a different callback to intercept the TMD/QMD later in the
launch pipeline.
Major improvements:
- Fix bug with next mask not overriding stream mask on CUDA 11.0+
- Add CUDA 6.5-10.2 support for next- and global-granularity
partitioning masks on x86_64 and aarch64 Jetson
- Remove libdl dependency
- Partially support TMD/QMD Version 4 (Hopper)
Minor improvements:
- Check for sufficient CUDA version before attempting to
apply a next-granularity partitioning mask
- Only check for sufficient CUDA version on the first call to
`libsmctrl_set_next_mask()` or `libsmctrl_set_global_mask()`,
rather than checking every time (lowers overheads)
- Check that TMD version is sufficient before modifying it
- Improve documentation
Issues:
- Partitioning mask bits have a different meaning in TMD/QMD
Version 4, and constructing them properly requires floorsweeping
and remapping information. This information will be forthcoming in
future releases of libsmctrl and nvdebug.
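The first-call-only version check listed under the minor improvements can be illustrated with a small caching sketch. This is not the code from this commit: `query_driver_version()`, `MIN_VERSION`, and `cuda_version_sufficient()` are hypothetical names, and the stub stands in for a real driver query such as `cuDriverGetVersion()`, which reports versions as 1000 * major + 10 * minor. The sketch is also not thread-safe.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a driver query such as cuDriverGetVersion().
 * It counts its calls so the effect of caching can be observed. */
static int query_count = 0;

static int query_driver_version(void) {
    query_count++;
    return 11000; /* pretend the driver reports CUDA 11.0 */
}

/* Assumed minimum for this sketch: CUDA 6.5, encoded as 1000*6 + 10*5 */
#define MIN_VERSION 6050

/* Cached result: -1 = not yet checked, 0 = too old, 1 = sufficient */
static int version_ok = -1;

/* Queries the driver only on the first call; later calls reuse the
 * cached answer and skip the query entirely. */
static bool cuda_version_sufficient(void) {
    if (version_ok < 0)
        version_ok = (query_driver_version() >= MIN_VERSION) ? 1 : 0;
    return version_ok == 1;
}
```

Calling `cuda_version_sufficient()` at the top of each mask-setting function would then cost one integer comparison after the first call, rather than a driver query every time.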
Diffstat (limited to 'Makefile')
| -rw-r--r-- | Makefile | 2 |
1 files changed, 1 insertions, 1 deletions
@@ -3,7 +3,7 @@ CXX = g++
| Old line | Old content | New line | New content |
|---|---|---|---|
| 3 | NVCC ?= nvcc | 3 | NVCC ?= nvcc |
| 4 | # -fPIC is needed in all cases, as we may be linked into another shared library | 4 | # -fPIC is needed in all cases, as we may be linked into another shared library |
| 5 | CFLAGS = -fPIC | 5 | CFLAGS = -fPIC |
| 6 | LDFLAGS = -lcuda -I/usr/local/cuda/include -ldl | 6 | LDFLAGS = -lcuda -I/usr/local/cuda/include |
| 7 | | 7 | |
| 8 | .PHONY: clean tests | 8 | .PHONY: clean tests |
| 9 | | 9 | |
