libsmctrl.git - Library to enable intra-context SM/TPC partitioning on NVIDIA GPUs

	Commit message	Author	Age
*	Major update for ECRTS'25: fix TPC to GPU mapping and add a "supreme" mask	Joshua Bakita	2025-05-09
	These updates are featured in the paper: J. Bakita and J. H. Anderson, “Hardware Compute Partitioning on NVIDIA GPUs for Composable Systems”, Proceedings of the 37th Euromicro Conference on Real-Time Systems (ECRTS), to appear, Jul 2025. They:
	1. Fix reported GPC to TPC mappings (requires nvdebug update).
	2. Add support for a "supreme" mask, which overrides all others and can be set on a per-process basis via an environment variable, and optionally modified at runtime via the nvtaskset utility.
	3. Add test for the supreme mask.
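The override behaviour described above can be pictured as a simple priority lookup. The following is only a sketch, not the library's code: `effective_mask` and its parameters are hypothetical names, "override" is assumed to mean "replace" rather than "combine", and the ordering below the supreme mask (next over stream over global) is an assumption inferred from the commit messages in this log.

```python
def effective_mask(global_mask=None, stream_mask=None,
                   next_mask=None, supreme_mask=None):
    """Return the highest-priority mask that has been set.

    None means "this mask was never set". The priority order used here
    (supreme > next > stream > global) is an assumption, not a statement
    about libsmctrl's implementation.
    """
    for candidate in (supreme_mask, next_mask, stream_mask, global_mask):
        if candidate is not None:
            return candidate
    return 0  # no mask set at all: no bits set


# A supreme mask, when present, wins over everything else.
print(hex(effective_mask(global_mask=0x1, stream_mask=0x2, next_mask=0x4)))
print(hex(effective_mask(global_mask=0x1, stream_mask=0x2, next_mask=0x4,
                         supreme_mask=0x8)))
```

Note that a mask explicitly set to zero still counts as set in this sketch, so it overrides lower-priority masks.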
*	Bugfix stream-mask override, support old CUDA, and start Hopper support	Joshua Bakita	2024-12-19
	Use a different callback to intercept the TMD/QMD later in the launch pipeline.
	Major improvements:
	- Fix bug with next mask not overriding stream mask on CUDA 11.0+
	- Add CUDA 6.5-10.2 support for next- and global-granularity partitioning masks on x86_64 and aarch64 Jetson
	- Remove libdl dependency
	- Partially support TMD/QMD Version 4 (Hopper)
	Minor improvements:
	- Check for sufficient CUDA version before attempting to apply a next-granularity partitioning mask
	- Only check for sufficient CUDA version on the first call to `libsmctrl_set_next_mask()` or `libsmctrl_set_global_mask()`, rather than checking every time (lowers overheads)
	- Check that TMD version is sufficient before modifying it
	- Improve documentation
	Issues:
	- Partitioning mask bits have a different meaning in TMD/QMD Version 4 and require floorsweeping and remapping information to properly construct. This information will be forthcoming in future releases of libsmctrl and nvdebug.
*	Support stream masking on CUDA 12.3 (x86) and 12.5 (x86)	Joshua Bakita	2024-11-26
*	Support stream masking on CUDA 12.4 (x86) and 12.6 (x86, aarch64)	Joshua Bakita	2024-11-26
	Credit to Nordine Feddal for testing CUDA 12.4 on 550.544.14.
*	Fix stream masking on many platforms and support >64-bit stream masks	Joshua Bakita	2023-11-29
	Previously, the library did not distinguish between aarch64 and x86_64 stream offsets, causing incorrect offsets to be used in many circumstances. This has now been fixed. A new function, libsmctrl_set_stream_mask_ext(), has also been added, which supports masking up to 128 TPCs (rather than just 64).
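To make the 64-versus-128 TPC limit concrete, the sketch below builds a mask wider than 64 bits and splits it into two 64-bit words. It does not call libsmctrl: `disable_all_but` and `split_halves` are hypothetical helpers, and the convention that a set bit disables the corresponding TPC is an assumption to check against libsmctrl.h.

```python
MASK_BITS = 128            # width handled by the extended stream-mask function
WORD = (1 << 64) - 1       # all ones in a single 64-bit word


def disable_all_but(enabled_tpcs, num_tpcs):
    """Build a mask with a set bit for every TPC NOT in enabled_tpcs.

    Assumed convention: a set bit disables that TPC.
    """
    if not 0 < num_tpcs <= MASK_BITS:
        raise ValueError("num_tpcs must be between 1 and 128")
    mask = (1 << num_tpcs) - 1          # start with every TPC disabled
    for tpc in enabled_tpcs:
        if not 0 <= tpc < num_tpcs:
            raise ValueError(f"TPC index {tpc} out of range")
        mask &= ~(1 << tpc)             # clear the bit to enable this TPC
    return mask


def split_halves(mask):
    """Split a mask of up to 128 bits into (low, high) 64-bit words."""
    return mask & WORD, (mask >> 64) & WORD


# On a hypothetical 72-TPC GPU, enable only TPC 0 and TPC 70.
low, high = split_halves(disable_all_but({0, 70}, 72))
print(hex(low), hex(high))
```

TPC 70 lives in the high word, so a 64-bit mask alone could not address it.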
*	Include stdint.h in libsmctrl.h	Joshua Bakita	2023-10-16
	stdint.h is necessary for the declarations in libsmctrl.h. Without it, programs that did not include stdint.h themselves would fail to compile.
*	Fix libsmctrl_set_stream_mask() on the TX2 with CUDA 9.0 + cleanup	Joshua Bakita	2023-10-16
	This function was previously unreliable when using CUDA 9.0 on the Jetson TX2. Also update some version comments and remove `set_sm_mask()`, a legacy partitioning function that's no longer used.
*	Introduce pysmctrl: A python interface to libsmctrl	Joshua Bakita	2023-03-16
	Initially supports the GPU information functions via:
	- pysmctrl.get_gpc_info(dev_id)
	- pysmctrl.get_tpc_info(dev_id)
	- pysmctrl.get_tpc_info_cuda(cuda_dev_id)
	All functions are extensively documented. See pysmctrl/__init__.py for details. Device partitioning functions have yet to be mapped into Python, as these will require more testing.
	As part of this:
	- libsmctrl_get_*_info() functions have been modified to consistently return positive error codes.
	- libsmctrl_get_tpc_info() now uses nvdebug-style device numbering and uses libsmctrl_get_gpc_info() under the covers. This should be more reliable.
	- libsmctrl_get_tpc_info_cuda() has been introduced as an improved version of the old libsmctrl_get_tpc_info() function. This continues to use CUDA-style device numbering, but is now resilient to CUDA failures.
	- Various minor style improvements in libsmctrl.c
*	Correct a sign-extension issue in libsmctrl_get_gpc_info()	Joshua Bakita	2023-03-15
	This function previously yielded invalid results for GPUs with more than 31 TPCs.
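The general class of bug can be reproduced in a few lines. This is an illustration only, not the library's code: it uses Python's ctypes to mimic C integer widths, and it assumes the faulty pattern was a shift performed in a 32-bit signed int and then widened to a 64-bit mask. The helper names are hypothetical.

```python
import ctypes


def widen_signed(bit_index):
    """Mimic C code that shifts in a 32-bit signed int, then widens to uint64_t.

    (In C, shifting 1 into the sign bit of an int is formally undefined;
    this models the wraparound that common compilers produce.)
    """
    as_int32 = ctypes.c_int32(1 << bit_index).value  # goes negative at bit 31
    return ctypes.c_uint64(as_int32).value           # sign bit copied into bits 32-63


def widen_unsigned(bit_index):
    """Mimic C code that performs the shift in 64 bits from the start."""
    return ctypes.c_uint64(1 << bit_index).value


for idx in (30, 31):
    print(idx, hex(widen_signed(idx)), hex(widen_unsigned(idx)))
```

For index 30 both variants agree. For index 31 the signed variant yields 0xffffffff80000000 instead of 0x80000000, which would mark 32 extra bits in a 64-bit mask.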
*	Initial reimplementation of libsmctrl as a library	Joshua Bakita	2023-03-02
	- Tested working with cuda_scheduling_examiner
	- Supports everything described in the accepted RTAS'23 paper
	- Can be used as either a shared or statically linked library
	- Documented in libsmctrl.h