libsmctrl.git/Makefile, branch ecrts25-ae

Support installation on Linux4Tegra

2025-07-03T17:49:18+00:00

- Use alternate path if nvidia-cuda-mps-control is not found.
- Work around missing CUDA_MPS_PIPE_DIRECTORY default in L4T.
- Install to L4T-specific paths if they exist.

Note that on L4T systems the user must manually set
CUDA_MPS_PIPE_DIRECTORY before calling nvidia-cuda-mps-control, or
before running non-nvtaskset-launched applications with MPS.

Tested on Jetson AGX Orin with L4T 36.3.0 and cuda-compat-12-6.
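
The missing-default workaround noted above can be pictured with a
minimal C sketch. It assumes the fallback is /tmp/nvidia-mps, the pipe
directory NVIDIA documents as the default on desktop Linux; the helper
name mps_pipe_dir() is hypothetical, and the actual workaround in this
commit may differ.

```c
#include <stdlib.h>

/* Assumption: the documented desktop-Linux default for the MPS pipe
 * directory. L4T does not provide this default on its own. */
#define DEFAULT_MPS_PIPE_DIR "/tmp/nvidia-mps"

/* Hypothetical helper: return the user's CUDA_MPS_PIPE_DIRECTORY if it
 * is set to a non-empty value, otherwise the assumed fallback. */
static const char *mps_pipe_dir(void)
{
    const char *dir = getenv("CUDA_MPS_PIPE_DIRECTORY");
    return (dir != NULL && dir[0] != '\0') ? dir : DEFAULT_MPS_PIPE_DIR;
}
```

A caller would pass the returned path to whatever needs to reach the
MPS control pipe; a value exported by the user always takes priority.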

Remove dependency on patchelf and use /usr/lib instead of /lib for install

2025-06-17T20:34:24+00:00

patchelf seems unreliable (--add-needed was not working in version
0.9), and it can be replaced with a simple sed command.

On the original test system, /lib was a symlink to /usr/lib, but
this does not seem to be the case in general, and CUDA is installed
in /usr/lib.

Rewrite nvtaskset and implementation of partitioning for unmodified tasks

2025-06-17T18:01:49+00:00

Rather than requiring libsmctrl.so to be preloaded, we now wrap
libcuda.so.1. All CUDA-using applications will load libcuda.so.1,
ensuring that our wrapper will always be dynamically loaded,
regardless of whether LD_PRELOAD is enabled or whether a program has
been statically linked. All that is needed is for the location of our
"fake" libcuda.so.1 to be within the loader search path.
This can be done by setting LD_LIBRARY_PATH, or by installing
our wrapper into /lib/x86_64-linux-gnu.

The mask can still be set via the LIBSMCTRL_MASK environment
variable, but the easier-to-use nvtaskset tool is now the
recommended way to view or change the supreme TPC mask for any
CUDA-using application. This allows launching a program on the
first two GPCs via a command as simple as:
    ./nvtaskset -g 0-1 ./a_program a_program_args
(Note that use of the -g option requires the nvdebug kernel module
to first be loaded.)
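
To make the "0-1" argument above concrete, here is a minimal C sketch
that turns an ID list into a bitmask with one bit per listed ID. The
taskset-style syntax with commas and ranges is an assumption beyond the
single "0-1" example, and parse_id_list() is a hypothetical helper, not
nvtaskset's actual parser. Translating GPC IDs into a TPC mask needs
topology information from nvdebug and is not shown.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse a list such as "0-1" or "0,2-3" into a
 * bitmask with bit i set for every listed ID i. Returns 0 on success,
 * or -1 on malformed input or an ID of 64 or more. */
static int parse_id_list(const char *s, uint64_t *out)
{
    uint64_t mask = 0;

    if (s == NULL || *s == '\0')
        return -1;
    while (*s != '\0') {
        char *end;
        unsigned long lo, hi, i;

        if (*s < '0' || *s > '9')
            return -1;
        lo = strtoul(s, &end, 10);
        hi = lo;
        if (*end == '-') {            /* range form: lo-hi */
            s = end + 1;
            if (*s < '0' || *s > '9')
                return -1;
            hi = strtoul(s, &end, 10);
        }
        if (lo > hi || hi >= 64)
            return -1;
        for (i = lo; i <= hi; i++)
            mask |= (uint64_t)1 << i;
        if (*end == ',') {
            end++;
            if (*end == '\0')         /* reject a trailing comma */
                return -1;
        } else if (*end != '\0') {
            return -1;
        }
        s = end;
    }
    *out = mask;
    return 0;
}
```

With this sketch, "0-1" yields 0x3 and "0,2-3" yields 0xD.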

These changes support the final version of the ECRTS'25 paper.

Note that nvtaskset does not yet fully support multi-GPU systems.

Bugfixes:
- Fix crash that would occur if both libsmctrl.so and libsmctrl.a
  were built into an application.
- Correctly use GPU ID when initializing a context in
  `libsmctrl_test_gpc_info`.
- Include `nvtaskset` as a prerequisite for
  `libsmctrl_test_supreme_mask`.
- Fix malfunction of `libsmctrl_test_gpc_info` if
  CUDA_VISIBLE_DEVICES is set.

Other minor changes:
- Adds make target to run all the tests.
- Fixes typos in comments.
- Enables -Wall build option.
- Upgrades supreme mask from 64 to 128 bits.
- Removes `detect_parker_soc()` from the global namespace.
- Adjusts test messages to be more succinct.
- Updates README with overview of how to partition unmodified
  applications, more details on the tests, and information on the
  new ECRTS'25 paper.
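
As an illustration of the 64-to-128-bit supreme mask upgrade listed
above, the following C sketch stores a 128-bit mask as two 64-bit
words. The type mask128_t and its helpers are hypothetical; the library
may represent the mask differently (for example, with a compiler's
native 128-bit integer type).

```c
#include <stdint.h>

/* Hypothetical portable 128-bit mask: bit i lives in word i / 64. */
typedef struct {
    uint64_t w[2];
} mask128_t;

/* Set one bit; out-of-range bit indices are ignored. */
static void mask128_set(mask128_t *m, unsigned bit)
{
    if (bit < 128)
        m->w[bit / 64] |= (uint64_t)1 << (bit % 64);
}

/* Return 1 if the bit is set, 0 if it is clear or out of range. */
static int mask128_test(const mask128_t *m, unsigned bit)
{
    return bit < 128 && ((m->w[bit / 64] >> (bit % 64)) & 1);
}
```

Bits 64 and above land in the second word, which is what a 64-bit mask
could not express.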

Major update for ECRTS'25: fix TPC to GPU mapping and add a "supreme" mask

2025-05-09T10:03:11+00:00

These updates are featured in the paper:

J. Bakita and J. H. Anderson, “Hardware Compute Partitioning on
NVIDIA GPUs for Composable Systems”, Proceedings of the 37th
Euromicro Conference on Real-Time Systems (ECRTS), to appear,
Jul 2025.

They:
1. Fix reported GPC to TPC mappings (requires nvdebug update).
2. Add support for a "supreme" mask, which overrides all others and
   can be set on a per-process basis via an environment variable,
   and optionally modified at runtime via the nvtaskset utility.
3. Add test for the supreme mask.

Makefile improvements

2025-03-20T20:46:21+00:00

- Add an "all" build target
- Fix build if libcuda.so is not on linker search path
- Do not assume that nvcc is available on $PATH
- Allow specifying CFLAGS and LDFLAGS when running make
- Allow passing non-standard CUDA build locations to make

Suggested usage if CUDA is installed in a non-standard location,
say, /playpen/jbakita/CUDA/cuda-archive/cuda-12.2:

make CUDA=/playpen/jbakita/CUDA/cuda-archive/cuda-12.2

Remove unused variables from Makefile

2024-12-19T20:04:26+00:00

Make automatically provides CXX and CC, and these manual
definitions were being ignored.

Also fix a missing space in one of the messages from the tests.

Bugfix stream-mask override, support old CUDA, and start Hopper support

2024-12-19T19:48:21+00:00

Use a different callback to intercept the TMD/QMD later in the
launch pipeline.

Major improvements:
- Fix bug with next mask not overriding stream mask on CUDA 11.0+
- Add CUDA 6.5-10.2 support for next- and global-granularity
  partitioning masks on x86_64 and aarch64 Jetson
- Remove libdl dependency
- Partially support TMD/QMD Version 4 (Hopper)

Minor improvements:
- Check for sufficient CUDA version before attempting to
  apply a next-granularity partitioning mask
- Only check for sufficient CUDA version on the first call to
  `libsmctrl_set_next_mask()` or `libsmctrl_set_global_mask()`,
  rather than checking every time (lowers overheads)
- Check that TMD version is sufficient before modifying it
- Improve documentation
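
The check-once behavior described in the list above can be sketched in
C as follows. This is an assumption-laden illustration, not the
library's code: query_cuda_version() is a hypothetical stand-in for the
real driver query, the version encoding (1000 * major + 10 * minor)
follows the CUDA driver API convention, and the minimum of 6050 is
inferred from the CUDA 6.5 support mentioned in this commit. The sketch
is not thread-safe.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the real, comparatively expensive driver
 * query. It counts its calls so the caching can be observed. */
static int query_count = 0;

static int query_cuda_version(void)
{
    query_count++;
    return 12020; /* placeholder value: CUDA 12.2 */
}

/* Assumed minimum: CUDA 6.5, encoded as 1000 * major + 10 * minor. */
#define MIN_CUDA_VERSION 6050

/* Query the version on the first call only; later calls reuse the
 * cached verdict, which keeps per-launch overhead low. */
static bool cuda_version_ok(void)
{
    static int cached = -1; /* -1 means "not checked yet" */

    if (cached < 0)
        cached = query_cuda_version() >= MIN_CUDA_VERSION;
    return cached != 0;
}
```

Calling cuda_version_ok() repeatedly triggers the underlying query only
once.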

Issues:
- Partitioning mask bits have a different meaning in TMD/QMD
  Version 4 and require floorsweeping and remapping information to
  properly construct. This information will be forthcoming in
  future releases of libsmctrl and nvdebug.

Add build and use instructions to the README

2024-02-19T20:37:46+00:00

Also allow building with an alternate version of g++ for backwards
compatibility.

Add test that higher-granularity masks override lower-granularity ones

2024-02-14T20:36:25+00:00

Stream-level masks should always override globally-set masks.
Next-kernel masks should always override both stream-level masks
and globally-set masks.

Tests reveal an issue with the next-kernel mask not overriding the
stream mask on CUDA 11.0+. CUDA appears to apply the per-stream
mask to the QMD/TMD after `launchCallback()` is triggered, making
it impossible to override as currently implemented.
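
The intended precedence described above (next-kernel over stream-level
over global) can be written as a small C sketch. The opt_mask_t type
and effective_mask() are hypothetical names for illustration; the
library itself applies masks by modifying the TMD/QMD rather than
through a function like this.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical optional mask: 'set' records whether a mask was given
 * at this granularity. */
typedef struct {
    bool set;
    uint64_t mask;
} opt_mask_t;

/* Pick the mask that should apply to a kernel launch: the next-kernel
 * mask wins if present, then the stream-level mask, then the global
 * mask. */
static uint64_t effective_mask(opt_mask_t next, opt_mask_t stream,
                               uint64_t global)
{
    if (next.set)
        return next.mask;
    if (stream.set)
        return stream.mask;
    return global;
}
```

The failing test corresponds to the first branch: on CUDA 11.0+, the
stream mask was observed to win even when a next-kernel mask was set.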

Add a README and tests for stream masking and next masking

2023-11-29T23:24:25+00:00

Also rewrite the global masking test to be much more thorough.