libsmctrl.git - Library to enable intra-context SM/TPC partitioning on NVIDIA GPUs

	Commit message	Author	Age
*	Makefile improvements	Joshua Bakita	2025-03-20
	- Add an "all" build target
	- Fix build if libcuda.so is not on linker search path
	- Do not assume that nvcc is available on $PATH
	- Allow specifying CFLAGS and LDFLAGS when running make
	- Allow passing non-standard CUDA build locations to make
	Suggested usage if CUDA is installed in a non-standard location, say, /playpen/jbakita/CUDA/cuda-archive/cuda-12.2:
	make CUDA=/playpen/jbakita/CUDA/cuda-archive/cuda-12.2
*	Remove unused variables from Makefile	Joshua Bakita	2024-12-19
	Make automatically provides CXX and CC, and these manual definitions were being ignored. Also fix a missing space in one of the messages from the tests.
*	Bugfix stream-mask override, support old CUDA, and start Hopper support	Joshua Bakita	2024-12-19
	Use a different callback to intercept the TMD/QMD later in the launch pipeline.
	Major improvements:
	- Fix bug with next mask not overriding stream mask on CUDA 11.0+
	- Add CUDA 6.5-10.2 support for next- and global-granularity partitioning masks on x86_64 and aarch64 Jetson
	- Remove libdl dependency
	- Partially support TMD/QMD Version 4 (Hopper)
	Minor improvements:
	- Check for sufficient CUDA version before attempting to apply a next-granularity partitioning mask
	- Only check for sufficient CUDA version on the first call to `libsmctrl_set_next_mask()` or `libsmctrl_set_global_mask()`, rather than checking every time (lowers overheads)
	- Check that TMD version is sufficient before modifying it
	- Improve documentation
	Issues:
	- Partitioning mask bits have a different meaning in TMD/QMD Version 4 and require floorsweeping and remapping information to properly construct. This information will be forthcoming in future releases of libsmctrl and nvdebug.
*	Add build and use instructions to the README	Joshua Bakita	2024-02-19
	Also allow building with an alternate version of g++ for backwards compatibility.
*	Add test that higher-granularity masks override lower-granularity ones	Joshua Bakita	2024-02-14
	Stream-level masks should always override globally-set masks. Next-kernel masks should always override both stream-level masks and globally-set masks.
	Tests reveal an issue with the next-kernel mask not overriding the stream mask on CUDA 11.0+. CUDA appears to apply the per-stream mask to the QMD/TMD after `launchCallback()` is triggered, making it impossible to override as currently implemented.
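The precedence rule this commit tests (next-kernel mask over stream-level mask over global mask) can be modelled in a few lines. This is an illustrative sketch only, not code from libsmctrl: `effective_mask` and its parameters are hypothetical names, masks are treated as plain integers, and `None` stands in for "no mask set at this granularity".

```python
from typing import Optional


def effective_mask(global_mask: int,
                   stream_mask: Optional[int] = None,
                   next_mask: Optional[int] = None) -> int:
    """Pick the mask from the most specific granularity that is set.

    Hypothetical model of the intended precedence; it does not touch
    CUDA or the TMD/QMD and only illustrates the ordering.
    """
    if next_mask is not None:
        # A next-kernel mask overrides both stream-level and global masks.
        return next_mask
    if stream_mask is not None:
        # A stream-level mask overrides the global mask.
        return stream_mask
    # With nothing more specific set, the global mask applies.
    return global_mask
```

In this model, `effective_mask(0x1, stream_mask=0x2, next_mask=0x4)` returns `0x4`. The failure reported for CUDA 11.0+ would correspond to the stream-level value `0x2` taking effect instead.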
*	Add a README and tests for stream masking and next masking	Joshua Bakita	2023-11-29
	Also rewrite the global masking test to be much more thorough.
*	Build on CUDA 11.8+; Adds libdl dependency	Joshua Bakita	2023-11-29
	By default, nvcc links against a stub version of libcuda.so, which is missing a required symbol starting around CUDA 11.8. Use libdl to resolve the symbol at runtime instead.
*	Add test for libsmctrl_set_global_mask()	Joshua Bakita	2023-10-17
	Also use static linking for tests, to avoid a need to set LD_LIBRARY_PATH to include the libsmctrl directory.
*	Initial reimplementation of libsmctrl as a library	Joshua Bakita	2023-03-02
	- Tested working with cuda_scheduling_examiner
	- Supports everything described in the accepted RTAS'23 paper
	- Can be used as either a shared or statically linked library
	- Documented in libsmctrl.h