path: root/libsmctrl.h
Commit message [author, date]
* Major update for ECRTS'25: fix TPC to GPU mapping and add a "supreme" mask [Joshua Bakita, 2025-05-09]
  These updates are featured in the paper:
  J. Bakita and J. H. Anderson, “Hardware Compute Partitioning on NVIDIA GPUs for Composable Systems”, Proceedings of the 37th Euromicro Conference on Real-Time Systems (ECRTS), to appear, Jul 2025.
  They:
  1. Fix reported GPC to TPC mappings (requires nvdebug update).
  2. Add support for a "supreme" mask, which overrides all others and can be set on a per-process basis via an environment variable, and optionally modified at runtime via the nvtaskset utility.
  3. Add test for the supreme mask.
* Bugfix stream-mask override, support old CUDA, and start Hopper support [Joshua Bakita, 2024-12-19]
  Use a different callback to intercept the TMD/QMD later in the launch pipeline.
  Major improvements:
  - Fix bug with next mask not overriding stream mask on CUDA 11.0+
  - Add CUDA 6.5-10.2 support for next- and global-granularity partitioning masks on x86_64 and aarch64 Jetson
  - Remove libdl dependency
  - Partially support TMD/QMD Version 4 (Hopper)
  Minor improvements:
  - Check for sufficient CUDA version before attempting to apply a next-granularity partitioning mask
  - Only check for sufficient CUDA version on the first call to `libsmctrl_set_next_mask()` or `libsmctrl_set_global_mask()`, rather than checking every time (lowers overheads)
  - Check that TMD version is sufficient before modifying it
  - Improve documentation
  Issues:
  - Partitioning mask bits have a different meaning in TMD/QMD Version 4 and require floorsweeping and remapping information to construct properly. This information will be forthcoming in future releases of libsmctrl and nvdebug.
* Support stream masking on CUDA 12.3 (x86) and 12.5 (x86) [Joshua Bakita, 2024-11-26]
* Support stream masking on CUDA 12.4 (x86) and 12.6 (x86, aarch64) [Joshua Bakita, 2024-11-26]
  Credit to Nordine Feddal for testing CUDA 12.4 on 550.544.14.
* Fix stream masking on many platforms and support >64-bit stream masks [Joshua Bakita, 2023-11-29]
  Previously, the library did not distinguish between aarch64 and x86_64 stream offsets, causing incorrect offsets to be used in many circumstances. This has now been fixed.
  A new function, libsmctrl_set_stream_mask_ext(), has also been added. It supports masking up to 128 TPCs (rather than just 64).
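
The need for a wider mask can be illustrated with a short sketch. This is not libsmctrl's implementation: the type `tpc_mask128` and the helpers `tpc_mask128_set()` and `tpc_mask128_test()` are hypothetical names chosen for illustration, and nothing here assumes the argument type that libsmctrl_set_stream_mask_ext() actually takes. The sketch only shows that a single 64-bit word cannot address TPC indices 64 through 127, whereas a pair of 64-bit words can. Whether a set bit enables or disables a TPC is left to the documentation in libsmctrl.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 128-bit TPC mask: bit i of the pair corresponds to TPC i. */
typedef struct {
    uint64_t lo; /* TPCs 0-63 */
    uint64_t hi; /* TPCs 64-127 */
} tpc_mask128;

/* Return a copy of `mask` with the bit for TPC `tpc` set. */
tpc_mask128 tpc_mask128_set(tpc_mask128 mask, unsigned tpc)
{
    assert(tpc < 128);
    if (tpc < 64)
        mask.lo |= UINT64_C(1) << tpc;
    else
        mask.hi |= UINT64_C(1) << (tpc - 64);
    return mask;
}

/* Report whether the bit for TPC `tpc` is set in `mask`. */
bool tpc_mask128_test(tpc_mask128 mask, unsigned tpc)
{
    assert(tpc < 128);
    if (tpc < 64)
        return (mask.lo >> tpc) & 1;
    return (mask.hi >> (tpc - 64)) & 1;
}
```

Setting TPC 100, for instance, touches only the `hi` word (bit 36), which a plain 64-bit mask has no room to represent.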
* Include stdint.h in libsmctrl.h [Joshua Bakita, 2023-10-16]
  Necessary for declarations of included functions. Absence would result in a compilation error for programs omitting this include.
* Fix libsmctrl_set_stream_mask() on the TX2 with CUDA 9.0 + cleanup [Joshua Bakita, 2023-10-16]
  This function was previously unreliable when using CUDA 9.0 on the Jetson TX2.
  Also update some version comments and remove `set_sm_mask()`, a legacy partitioning function that's no longer used.
* Introduce pysmctrl: A python interface to libsmctrl [Joshua Bakita, 2023-03-16]
  Initially supports the GPU information functions via:
  - pysmctrl.get_gpc_info(dev_id)
  - pysmctrl.get_tpc_info(dev_id)
  - pysmctrl.get_tpc_info_cuda(cuda_dev_id)
  All functions are extensively documented. See pysmctrl/__init__.py for details.
  Device partitioning functions have yet to be mapped into Python, as these will require more testing.
  As part of this:
  - libsmctrl_get_*_info() functions have been modified to consistently return positive error codes.
  - libsmctrl_get_tpc_info() now uses nvdebug-style device numbering and uses libsmctrl_get_gpc_info() under the covers. This should be more reliable.
  - libsmctrl_get_tpc_info_cuda() has been introduced as an improved version of the old libsmctrl_get_tpc_info() function. This continues to use CUDA-style device numbering, but is now resilient to CUDA failures.
  - Various minor style improvements in libsmctrl.c
* Correct a sign-extension issue in libsmctrl_get_gpc_info() [Joshua Bakita, 2023-03-15]
  This function would previously yield invalid results for GPUs with more than 31 TPCs.
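
The commit does not show the faulty code, but the symptom (wrong results once a mask needs bit 31) is characteristic of sign extension in C. The sketch below reproduces that class of bug under stated assumptions: `widen_mask_buggy()`, `widen_mask_fixed()`, and `tpc_bit()` are hypothetical names, and none of this code is taken from libsmctrl_get_gpc_info().

```c
#include <assert.h>
#include <stdint.h>

/* Buggy pattern: the 32-bit value passes through a signed type, so a set
 * bit 31 is replicated into bits 32-63 when the value is widened.
 * (The unsigned-to-signed conversion is implementation-defined in C;
 * mainstream compilers wrap to two's complement.) */
uint64_t widen_mask_buggy(uint32_t raw)
{
    int32_t as_signed = (int32_t)raw; /* bit 31 becomes the sign bit */
    return (uint64_t)as_signed;       /* sign-extends to 64 bits */
}

/* Fixed pattern: widen the unsigned value directly, which zero-extends. */
uint64_t widen_mask_fixed(uint32_t raw)
{
    return (uint64_t)raw;
}

/* Build a single-TPC bit in 64-bit arithmetic from the start, so that
 * index 31 and above never go through a signed 32-bit intermediate. */
uint64_t tpc_bit(unsigned tpc)
{
    assert(tpc < 64);
    return UINT64_C(1) << tpc;
}
```

With 31 or fewer TPCs, bit 31 is never set and both widening functions agree, which would explain why a bug of this kind only surfaces on larger GPUs.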
* Initial reimplementation of libsmctrl as a library [Joshua Bakita, 2023-03-02]
  - Tested working with cuda_scheduling_examiner
  - Supports everything described in the accepted RTAS'23 paper
  - Can be used as either a shared or statically linked library
  - Documented in libsmctrl.h