libsmctrl.git, branch ecrts25-ae

Support installation on Linux4Tegra

2025-07-03T17:49:18+00:00

- Use alternate path if nvidia-cuda-mps-control is not found.
- Work around missing CUDA_MPS_PIPE_DIRECTORY default in L4T.
- Install to L4T-specific paths if they exist.

Note that on L4T systems, the user must manually set
CUDA_MPS_PIPE_DIRECTORY before calling nvidia-cuda-mps-control, or
before running non-nvtaskset-launched applications with MPS.
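
A minimal sketch of the manual step described in the note, for a
launcher script written in Python. The helper name
`with_mps_pipe_dir` is hypothetical. The value /tmp/nvidia-mps is an
assumption: it is the documented default on non-L4T Linux, and it
must match whatever directory the MPS daemon actually uses.

```python
import os

# Assumption: /tmp/nvidia-mps is the documented default on non-L4T
# Linux. Change it if the MPS daemon uses a different directory.
ASSUMED_PIPE_DIR = "/tmp/nvidia-mps"


def with_mps_pipe_dir(env=None, pipe_dir=ASSUMED_PIPE_DIR):
    """Return a copy of env with CUDA_MPS_PIPE_DIRECTORY set if missing."""
    new_env = dict(os.environ if env is None else env)
    # Respect a value that the user has already exported.
    new_env.setdefault("CUDA_MPS_PIPE_DIRECTORY", pipe_dir)
    return new_env
```

The returned dictionary can be passed as the `env` argument of
`subprocess.run()` when starting nvidia-cuda-mps-control or a
CUDA-using application.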

Tested on Jetson AGX Orin with L4T 36.3.0 and cuda-compat-12-6.

Speed up nvtaskset by skipping CUDA context creation if possible

2025-06-26T00:43:38+00:00

The GPU needs to be on before the GPC-to-TPC mapping registers can
be read. The easiest way to power on the GPU is to create a CUDA
context, but this is fairly expensive.

In nvtaskset, some GPU-using task will likely already be running,
and so we can skip CUDA context creation in the common case.

Bug fixes:
- Delete the temporary CUDA context created by nvtaskset after
  it is done with it. Fixes bug where nvtaskset would leak this
  context into any program it launches.

Additional changes:
- Style fixes in libsmctrl.c
- Remove superfluous newlines from error() calls in nvtaskset.c

Remove dependency on patchelf and use /usr/lib instead of /lib for install

2025-06-17T20:34:24+00:00

patchelf seems unreliable (--add-needed was not working in version
0.9), and it can be replaced with a simple sed command.

On the original test system, /lib was a symlink to /usr/lib, but
this does not seem to be the case in general, and CUDA is installed
in /usr/lib.

Rewrite nvtaskset and implementation of partitioning for unmodified tasks

2025-06-17T18:01:49+00:00

Rather than requiring libsmctrl.so to be preloaded, we now wrap
libcuda.so.1. All CUDA-using applications will load libcuda.so.1,
ensuring that our wrapper will always be dynamically loaded, no
matter if LD_PRELOAD is enabled, or if a program has been statically
linked. All that needs to be done is to put the location of our
"fake" libcuda.so.1 within the loader search path.
This can be done by setting LD_LIBRARY_PATH, or by installing
our wrapper into /lib/x86_64-linux-gnu.
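
As an illustration of the LD_LIBRARY_PATH option, the sketch below
builds an environment in which a wrapper directory is searched first.
The helper name `prepend_library_path` and any directory passed to it
are hypothetical, not part of libsmctrl.

```python
import os


def prepend_library_path(wrapper_dir, env=None):
    """Return a copy of env with wrapper_dir first in LD_LIBRARY_PATH."""
    new_env = dict(os.environ if env is None else env)
    existing = new_env.get("LD_LIBRARY_PATH", "")
    # Avoid a trailing ":"; the loader treats an empty entry as the
    # current working directory.
    if existing:
        new_env["LD_LIBRARY_PATH"] = wrapper_dir + ":" + existing
    else:
        new_env["LD_LIBRARY_PATH"] = wrapper_dir
    return new_env
```

Putting the wrapper directory first matters because the loader uses
the first libcuda.so.1 it finds.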

The mask can still be set via the LIBSMCTRL_MASK environment
variable, but the easier-to-use nvtaskset tool is now the
recommended way to view or change the supreme TPC mask for any
CUDA-using application. This allows launching a program on the
first two GPCs via a command as simple as:
    ./nvtaskset -g 0-1 ./a_program a_program_args
(Note that use of the -g option requires the nvdebug kernel module
to first be loaded.)
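
To make the `0-1` notation concrete, here is a sketch of how a list
of GPC indices can be expanded into a bitmask of selected GPCs. It
assumes a taskset-style list syntax (comma-separated indices and
inclusive ranges); the actual nvtaskset parser may differ. Both
function names are hypothetical. Translating GPCs into a TPC mask
needs the mapping from nvdebug and is not shown.

```python
def parse_index_list(spec):
    """Expand a list such as "0-1,3" into a sorted list of indices."""
    indices = set()
    for part in spec.split(","):
        if "-" in part:
            # Inclusive range, e.g. "0-1" selects indices 0 and 1.
            first, last = part.split("-", 1)
            indices.update(range(int(first), int(last) + 1))
        else:
            indices.add(int(part))
    return sorted(indices)


def indices_to_bitmask(indices):
    """Set bit i for every selected index i."""
    mask = 0
    for i in indices:
        mask |= 1 << i
    return mask
```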

These changes support the final version of the ECRTS'25 paper.

Note that nvtaskset does not yet fully support multi-GPU systems.

Bugfixes:
- Fix crash that would occur if both libsmctrl.so and libsmctrl.a
  were built into an application.
- Correctly use GPU ID when initializing a context in
  `libsmctrl_test_gpc_info`.
- Include `nvtaskset` as a prerequisite for
  `libsmctrl_test_supreme_mask`.
- Fix malfunction of `libsmctrl_test_gpc_info` if
  CUDA_VISIBLE_DEVICES is set.

Other minor changes:
- Adds make target to run all the tests.
- Fixes typos in comments.
- Enables -Wall build option.
- Upgrades supreme mask from 64 to 128 bits.
- Removes `detect_parker_soc()` from the global namespace.
- Adjusts test messages to be more succinct.
- Updates README with overview of how to partition unmodified
  applications, more details on the tests, and information on the
  new ECRTS'25 paper.

Correct how path of libsmctrl.so is specified in LD_PRELOAD by nvtaskset

2025-05-20T15:01:09+00:00

Indicate that the loader should search the standard library paths
by omitting the "./" prefix.
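
The distinction comes from the rule in ld.so(8): a library name that
contains a slash is treated as a pathname, while a name without one
is looked up in the loader's search paths. A minimal sketch of that
rule (the function name is hypothetical):

```python
def searched_by_loader(name):
    """Return True if the loader would search its library paths for name."""
    # Per ld.so(8), a name containing "/" is used as a pathname and
    # no search of the standard library paths takes place.
    return "/" not in name
```

So `searched_by_loader("libsmctrl.so")` is true, whereas
`"./libsmctrl.so"` is only resolved relative to the current working
directory.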

Major update for ECRTS'25: fix TPC to GPU mapping and add a "supreme" mask

2025-05-09T10:03:11+00:00

These updates are featured in the paper:

J. Bakita and J. H. Anderson, “Hardware Compute Partitioning on
NVIDIA GPUs for Composable Systems”, Proceedings of the 37th
Euromicro Conference on Real-Time Systems (ECRTS), to appear,
Jul 2025.

They:
1. Fix reported GPC to TPC mappings (requires nvdebug update).
2. Add support for a "supreme" mask, which overrides all others and
   can be set on a per-process basis via an environment variable,
   and optionally modified at runtime via the nvtaskset utility.
3. Add test for the supreme mask.

De-duplicate CUDA version checks and omit when building with CUDA > 6.5

2025-05-05T07:13:30+00:00

Code built with CUDA > 6.5 cannot run on CUDA 6.5 or older, so the
check added unnecessary overhead.

Verified that building with CUDA 6.5 and with CUDA 10.2 generates
the correct code, and that the global and next masks work on a
GTX 1060 3 GB with either build while using CUDA 10.2 at runtime.

Support stream masking on CUDA 12.7 (x86) and 12.8 (x86)

2025-04-07T20:45:50+00:00

Makefile improvements

2025-03-20T20:46:21+00:00

- Add an "all" build target
- Fix build if libcuda.so is not on linker search path
- Do not assume that nvcc is available on $PATH
- Allow specifying CFLAGS and LDFLAGS when running make
- Allow passing non-standard CUDA build locations to make

Suggested usage if CUDA is installed in a non-standard location,
say, /playpen/jbakita/CUDA/cuda-archive/cuda-12.2:

make CUDA=/playpen/jbakita/CUDA/cuda-archive/cuda-12.2

Correctly pass arguments to Python's ctypes find_library() function

2024-12-23T07:15:31+00:00

The prefix "lib" should not be included, per the documentation.

This was not caught in local tests as the fallback was always used
locally.
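
A sketch of the corrected lookup pattern, with the fallback that
masked the bug. Python's documentation specifies that
`find_library()` takes the library name without a prefix such as
"lib", a suffix such as ".so", or a version number. The helper name
and the fallback path used below are hypothetical.

```python
import ctypes.util


def find_shared_library(name, fallback):
    """Locate a library by bare name, or return fallback if not found."""
    # Pass "cuda", not "libcuda" or "libcuda.so".
    located = ctypes.util.find_library(name)
    return located if located is not None else fallback
```

With a prefixed name, `find_library()` returns None and the fallback
is silently used every time, which is why the error did not show up
in local tests.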

Patch courtesy of Guanbin Xu.