libsmctrl.git, branch master

De-duplicate CUDA version checks and omit when building with CUDA > 6.5

2025-05-05T07:13:30+00:00

Code built with CUDA > 6.5 cannot run on CUDA 6.5 or older, so the
check added unnecessary overhead.

Verified that builds with CUDA 6.5 and CUDA 10.2 generate the correct
code, and that global and next masking work on a GTX 1060 3 GB with
either build while using CUDA 10.2 at runtime.
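
The reasoning above can be sketched in C as follows. This is only an
illustration under stated assumptions, not the library's actual code:
`MIN_CUDA_VERSION`, `query_runtime_version()`, and `cuda_version_ok()`
are hypothetical names, the fallback `CUDA_VERSION` definition stands in
for the macro that cuda.h normally provides, and the runtime query is a
stub rather than a real driver call.

```c
#include <stdbool.h>

/* Assumption: stand-in for the CUDA_VERSION macro from cuda.h, which
 * encodes the build-time version as major * 1000 + minor * 10. */
#ifndef CUDA_VERSION
#define CUDA_VERSION 10020 /* CUDA 10.2 */
#endif

/* Hypothetical name for the oldest version of interest (CUDA 6.5). */
#define MIN_CUDA_VERSION 6050

/* Hypothetical stub for a runtime version query; counts its calls so
 * that the cost of the runtime path is visible. */
int runtime_version_queries = 0;
int query_runtime_version(void)
{
    runtime_version_queries++;
    return 10020;
}

/* Single shared helper: callers no longer duplicate the check. */
bool cuda_version_ok(void)
{
#if CUDA_VERSION > MIN_CUDA_VERSION
    /* A binary built against a newer CUDA cannot run on 6.5 or older,
     * so the runtime query is compiled out entirely. */
    return true;
#else
    return query_runtime_version() >= MIN_CUDA_VERSION;
#endif
}
```

With the default stand-in value, `cuda_version_ok()` returns true without
ever calling the stub; building with `-DCUDA_VERSION=6050` restores the
runtime path.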

Support stream masking on CUDA 12.7 (x86) and 12.8 (x86)

2025-04-07T20:45:50+00:00

Makefile improvements

2025-03-20T20:46:21+00:00

- Add an "all" build target
- Fix build if libcuda.so is not on linker search path
- Do not assume that nvcc is available on $PATH
- Allow specifying CFLAGS and LDFLAGS when running make
- Allow passing non-standard CUDA build locations to make

Suggested usage if CUDA is installed in a non-standard location,
say, /playpen/jbakita/CUDA/cuda-archive/cuda-12.2:

make CUDA=/playpen/jbakita/CUDA/cuda-archive/cuda-12.2

Correctly pass arguments to Python's ctypes find_library() function

2024-12-23T07:15:31+00:00

The prefix "lib" should not be included, per the documentation.

This was not caught in local tests as the fallback was always used
locally.

Patch courtesy of Guanbin Xu.

CRITICAL: Remove stray brackets breaking build

2024-12-23T07:12:48+00:00

Remove unused variables from Makefile

2024-12-19T20:04:26+00:00

Make automatically provides CXX and CC, and these manual
definitions were being ignored.

Also fix a missing space in one of the messages from the tests.

Bugfix stream-mask override, support old CUDA, and start Hopper support

2024-12-19T19:48:21+00:00

Use a different callback to intercept the TMD/QMD later in the
launch pipeline.

Major improvements:
- Fix bug with next mask not overriding stream mask on CUDA 11.0+
- Add CUDA 6.5-10.2 support for next- and global-granularity
  partitioning masks on x86_64 and aarch64 Jetson
- Remove libdl dependency
- Partially support TMD/QMD Version 4 (Hopper)

Minor improvements:
- Check for sufficient CUDA version before attempting to
  apply a next-granularity partitioning mask
- Only check for sufficient CUDA version on the first call to
  `libsmctrl_set_next_mask()` or `libsmctrl_set_global_mask()`,
  rather than checking every time (lowers overheads)
- Check that TMD version is sufficient before modifying it
- Improve documentation
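
The "check only on the first call" item above can be sketched as a
cached check. This is a minimal illustration, not the library's
implementation: `query_driver_version()`, `cuda_version_supported()`,
the stubbed version number, and the 6050 threshold are all hypothetical,
and the sketch is not thread-safe.

```c
#include <stdbool.h>

/* Tri-state so that "not yet checked" is distinct from either result. */
enum check_state { UNCHECKED = 0, SUPPORTED, UNSUPPORTED };

/* Hypothetical stub for the (comparatively slow) version query; the
 * counter shows how many times it actually runs. */
int version_queries = 0;
int query_driver_version(void)
{
    version_queries++;
    return 11040; /* assumed value, encoded as major * 1000 + minor * 10 */
}

/* Performs the version query once and reuses the cached result on
 * every later call. */
bool cuda_version_supported(void)
{
    static enum check_state state = UNCHECKED;

    if (state == UNCHECKED)
        state = (query_driver_version() >= 6050) ? SUPPORTED : UNSUPPORTED;
    return state == SUPPORTED;
}
```

Calling `cuda_version_supported()` repeatedly leaves `version_queries` at
1, which is where the overhead reduction would come from.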

Issues:
- Partitioning mask bits have a different meaning in TMD/QMD
  Version 4 and require floorsweeping and remapping information to
  properly construct. This information will be forthcoming in
  future releases of libsmctrl and nvdebug.

Support CUDA 12.2, 12.5, and 12.6 on Jetson aarch64

2024-12-19T18:36:40+00:00

Also test and note that stream masking on CUDA 6.5 seems impossible.

Check for read() errors on procfs files during libsmctrl_get_gpc_info()

2024-12-19T18:32:30+00:00

Also update a comment
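
As an illustration of this kind of check, the sketch below reads a small
file and reports read() failures to the caller. It is an assumption
about the general pattern, not the code in `libsmctrl_get_gpc_info()`;
the helper name `read_small_file()` and its negative-errno return
convention are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Reads up to len - 1 bytes from path into buf and NUL-terminates it.
 * Returns the number of bytes read, or a negative errno value if
 * open() or read() fails. */
ssize_t read_small_file(const char *path, char *buf, size_t len)
{
    ssize_t n;
    int fd, err;

    if (buf == NULL || len == 0)
        return -EINVAL;

    fd = open(path, O_RDONLY);
    if (fd == -1)
        return -errno;

    /* Retry if a signal interrupts the call before any data arrives. */
    do {
        n = read(fd, buf, len - 1);
    } while (n == -1 && errno == EINTR);

    if (n == -1) {
        err = errno; /* save before close() can overwrite it */
        close(fd);
        return -err;
    }

    close(fd);
    buf[n] = '\0';
    return n;
}
```

A caller would treat any negative return as an error instead of parsing
whatever happens to be in the buffer. A single read() is assumed to be
enough for the short files involved; larger files would need a loop.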

Fix a potential bug with stream masking on CUDA 12.6 on aarch64 Jetson

2024-12-18T20:49:48+00:00

Commit 3f9bda39 made an error by using the pre-CUDA-12 mask
structure layout on CUDA 12.6 on aarch64 Jetson. Switch to the
CUDA 12+ layout (as used on x86_64).

Tests work either way on the Jetson Orin, so this change is not
strictly required, but seems advisable to support potential large
(PCIe-attached?) GPUs on Jetson/DRIVE platforms.