| Commit message (Collapse) | Author | Age |
|
These updates are featured in the paper:
J. Bakita and J. H. Anderson, “Hardware Compute Partitioning on
NVIDIA GPUs for Composable Systems”, Proceedings of the 37th
Euromicro Conference on Real-Time Systems (ECRTS), to appear,
Jul 2025.
They:
1. Fix reported GPC-to-TPC mappings (requires nvdebug update).
2. Add support for a "supreme" mask, which overrides all others and
can be set on a per-process basis via an environment variable,
and optionally modified at runtime via the nvtaskset utility.
3. Add test for the supreme mask.
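As an illustration of item 2, the following is a minimal sketch of how a per-process override mask could be read from the environment and given precedence over another mask. The variable name `EXAMPLE_SUPREME_MASK` and all helper names are hypothetical, chosen for this sketch only; they are not taken from libsmctrl, and the mask is treated as an opaque 64-bit value.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical variable name, chosen for this sketch only. */
#define EXAMPLE_SUPREME_ENV "EXAMPLE_SUPREME_MASK"

/* Parse a mask from text. Base 0 lets strtoull accept decimal,
 * 0x-prefixed hex, and 0-prefixed octal. Returns true on success. */
static bool parse_mask(const char *text, uint64_t *out)
{
    if (text == NULL || *text == '\0')
        return false;

    char *end = NULL;
    errno = 0;
    unsigned long long value = strtoull(text, &end, 0);
    if (errno != 0 || *end != '\0')
        return false;               /* out of range or trailing junk */

    *out = (uint64_t)value;
    return true;
}

/* Read the override from the environment; false if unset or malformed. */
static bool read_mask_from_env(const char *name, uint64_t *out)
{
    return parse_mask(getenv(name), out);
}

/* Pick the mask to apply: the override wins whenever it is present. */
static uint64_t choose_mask(bool have_supreme, uint64_t supreme,
                            uint64_t other_mask)
{
    return have_supreme ? supreme : other_mask;
}
```

A caller would invoke `read_mask_from_env(EXAMPLE_SUPREME_ENV, &mask)` once at startup and pass the result to `choose_mask()` whenever a mask is about to be applied.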
|
Use a different callback to intercept the TMD/QMD later in the
launch pipeline.
Major improvements:
- Fix bug with next mask not overriding stream mask on CUDA 11.0+
- Add CUDA 6.5-10.2 support for next- and global-granularity
partitioning masks on x86_64 and aarch64 Jetson
- Remove libdl dependency
- Partially support TMD/QMD Version 4 (Hopper)
Minor improvements:
- Check for sufficient CUDA version before attempting to
apply a next-granularity partitioning mask
- Only check for sufficient CUDA version on the first call to
`libsmctrl_set_next_mask()` or `libsmctrl_set_global_mask()`,
rather than checking every time (lowers overheads)
- Check that TMD version is sufficient before modifying it
- Improve documentation
Issues:
- Partitioning mask bits have a different meaning in TMD/QMD
Version 4 and require floorsweeping and remapping information to
properly construct. This information will be forthcoming in
future releases of libsmctrl and nvdebug.
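The "check only on the first call" improvement listed above follows a common caching pattern, sketched below. This is not libsmctrl's code: `query_runtime_version()` is a hypothetical stand-in for an expensive version query and returns a fixed value, encoded as major * 1000 + minor * 10 (the encoding `cuDriverGetVersion()` uses).

```c
#include <stdbool.h>

/* Counts how often the stand-in version query runs, so the caching
 * behaviour can be observed. */
static int version_queries = 0;

/* Hypothetical stand-in for an expensive runtime version query.
 * Returns a fixed version, 11.0, encoded as major * 1000 + minor * 10. */
static int query_runtime_version(void)
{
    version_queries++;
    return 11000;
}

/* Return whether the runtime meets min_version, querying only once. */
static bool version_is_sufficient(int min_version)
{
    static bool checked = false;
    static int cached_version = 0;

    if (!checked) {
        cached_version = query_runtime_version();
        checked = true;
    }
    return cached_version >= min_version;
}
```

Repeated calls to `version_is_sufficient()` reuse the cached value, so the query cost is paid once per process. The sketch is not thread-safe; a multithreaded caller would need a lock or a once-initialization primitive.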
|
Credit to Nordine Feddal for testing CUDA 12.4 on 550.544.14.
|
Previously, stream offsets for aarch64 and x86_64 were not
distinguished, causing incorrect offsets to be used in many
circumstances. This has now been fixed.
A new function, libsmctrl_set_stream_mask_ext(), has also been added,
which supports masking up to 128 TPCs (rather than just 64).
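To show what moving from 64 to 128 TPCs means for a bitmask, here is a small sketch that stores a 128-bit mask as two 64-bit halves. The struct and helper names are hypothetical and do not reflect how libsmctrl_set_stream_mask_ext() takes its argument; whether a set bit enables or disables a TPC is also left to the documentation in libsmctrl.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative 128-bit mask stored as two 64-bit halves.
 * Bit i of `lo` stands for TPC i; bit i of `hi` stands for TPC 64 + i. */
struct tpc_mask128 {
    uint64_t lo;
    uint64_t hi;
};

/* Set the bit for TPC index `tpc` (0..127). Returns false if out of range. */
static bool tpc_mask128_set(struct tpc_mask128 *mask, unsigned tpc)
{
    if (tpc >= 128)
        return false;
    if (tpc < 64)
        mask->lo |= UINT64_C(1) << tpc;
    else
        mask->hi |= UINT64_C(1) << (tpc - 64);
    return true;
}

/* Report whether the bit for TPC index `tpc` is set. */
static bool tpc_mask128_test(const struct tpc_mask128 *mask, unsigned tpc)
{
    if (tpc >= 128)
        return false;
    return tpc < 64 ? (mask->lo >> tpc) & 1u
                    : (mask->hi >> (tpc - 64)) & 1u;
}
```

With a single 64-bit word, TPC indices 64 and above have no bit to occupy, which is the limitation the wider mask removes.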
|
Necessary for declarations of included functions. Absence would
result in a compilation error for programs omitting this include.
|
This function was previously unreliable when using CUDA 9.0 on the
Jetson TX2.
Also update some version comments and remove `set_sm_mask()`, a
legacy partitioning function that's no longer used.
|
Initially supports the GPU information functions via:
- pysmctrl.get_gpc_info(dev_id)
- pysmctrl.get_tpc_info(dev_id)
- pysmctrl.get_tpc_info_cuda(cuda_dev_id)
All functions are extensively documented. See pysmctrl/__init__.py
for details.
Device partitioning functions have yet to be mapped into Python, as
these will require more testing.
As part of this:
- libsmctrl_get_*_info() functions have been modified to consistently
return positive error codes.
- libsmctrl_get_tpc_info() now uses nvdebug-style device numbering and
uses libsmctrl_get_gpc_info() under the covers. This should be more
reliable.
- libsmctrl_get_tpc_info_cuda() has been introduced as an improved
version of the old libsmctrl_get_tpc_info() function. This continues
to use CUDA-style device numbering, but is now resilient to CUDA
failures.
- Various minor style improvements in libsmctrl.c
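The "consistently return positive error codes" convention mentioned above can be sketched as follows. The function, its device table, and the choice of errno values are all assumptions made for illustration; the real libsmctrl_get_*_info() signatures and codes are documented in libsmctrl.h.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical query in the "0 on success, positive error code on failure"
 * style. The device table is made up for the example. */
static int example_get_unit_count(int dev_id, uint32_t *count)
{
    static const uint32_t units_per_device[] = { 6, 4 };
    const int num_devices = 2;

    if (count == NULL)
        return EINVAL;              /* positive, never -EINVAL */
    if (dev_id < 0 || dev_id >= num_devices)
        return ENODEV;
    *count = units_per_device[dev_id];
    return 0;
}
```

With a single sign convention, a caller can test `if (err)` and, when errno values are used as in this sketch, pass the result straight to `strerror()`.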
|
This function would previously yield invalid results for
GPUs with more than 31 TPCs.
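The message does not state the cause of the 31-TPC limit. One common way a C bitmask routine breaks at that point is building the mask from the `int` constant `1`, so the sketch below shows a 64-bit-safe construction. It is an illustration with a hypothetical helper name, not the actual fix.

```c
#include <stdint.h>

/* Build a mask with the low `count` bits set, for 0 <= count <= 64.
 * Using a 64-bit constant avoids the undefined behaviour of shifting the
 * 32-bit int constant 1 by 31 or more positions. */
static uint64_t low_bits_mask(unsigned count)
{
    if (count >= 64)
        return UINT64_MAX;          /* shifting by 64 would also be undefined */
    return (UINT64_C(1) << count) - 1;
}
```

For example, `low_bits_mask(40)` yields a value with bits 0 through 39 set, which a 32-bit `int` cannot represent.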
|
- Tested working with cuda_scheduling_examiner
- Supports everything described in the accepted RTAS'23 paper
- Can be used as either a shared or statically linked library
- Documented in libsmctrl.h
|