Commit message
|
Code built with CUDA > 6.5 cannot run on CUDA 6.5 or older, so the
check added unnecessary overhead.
Tested that builds under CUDA 6.5 and CUDA 10.2 generate the correct
code, and that global and next-kernel masking work on a GTX 1060 3 GB
with either build while using CUDA 10.2 at runtime.
|
- Add an "all" build target
- Fix build if libcuda.so is not on linker search path
- Do not assume that nvcc is available on $PATH
- Allow specifying CFLAGS and LDFLAGS when running make
- Allow passing non-standard CUDA build locations to make
Suggested usage if CUDA is installed in a non-standard location,
say, /playpen/jbakita/CUDA/cuda-archive/cuda-12.2:
make CUDA=/playpen/jbakita/CUDA/cuda-archive/cuda-12.2
|
The prefix "lib" should not be included, per the documentation.
This was not caught in local tests as the fallback was always used
locally.
Patch courtesy of Guanbin Xu <xugb@mail.ustc.edu.cn>.
|
Make automatically provides CXX and CC, and these manual
definitions were being ignored.
Also fix a missing space in one of the messages from the tests.
|
Use a different callback to intercept the TMD/QMD later in the
launch pipeline.
Major improvements:
- Fix bug with next mask not overriding stream mask on CUDA 11.0+
- Add CUDA 6.5-10.2 support for next- and global-granularity
partitioning masks on x86_64 and aarch64 Jetson
- Remove libdl dependency
- Partially support TMD/QMD Version 4 (Hopper)
Minor improvements:
- Check for sufficient CUDA version before attempting to apply a
  next-granularity partitioning mask
- Only check for sufficient CUDA version on the first call to
`libsmctrl_set_next_mask()` or `libsmctrl_set_global_mask()`,
rather than checking every time (lowers overheads)
- Check that TMD version is sufficient before modifying it
- Improve documentation
Issues:
- Partitioning mask bits have a different meaning in TMD/QMD
Version 4 and require floorsweeping and remapping information to
properly construct. This information will be forthcoming in
future releases of libsmctrl and nvdebug.
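The "check only on the first call" optimization can be sketched as follows. This is an illustrative reconstruction, not libsmctrl's actual internals; the function names and the hardcoded version are assumptions.

```c
#include <stdbool.h>

/* Placeholder for a real driver query (e.g. via cuDriverGetVersion);
 * hardcoded here so the sketch is self-contained. */
static int query_cuda_version(void) {
    return 11040; /* pretend the runtime is CUDA 11.4 */
}

/* Query the CUDA version only on the first call and cache the result,
 * so subsequent mask-setting calls skip the check entirely. */
static bool cuda_version_at_least(int minimum) {
    static int cached_version = -1; /* -1 means "not yet queried" */
    if (cached_version < 0)
        cached_version = query_cuda_version();
    return cached_version >= minimum;
}
```

The `static` local makes the cached value persist across calls, so the (comparatively expensive) driver query runs at most once per process.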
|
Also test and note that stream masking on CUDA 6.5 seems impossible.
|
Also update a comment
|
Commit 3f9bda39 made an error by using the pre-CUDA-12 mask
structure layout on CUDA 12.6 on aarch64 Jetson. Switch to the
CUDA 12+ layout (as used on x86_64).
Tests work either way on the Jetson Orin, so this change is not
strictly required, but seems advisable to support potential large
(PCIe-attached?) GPUs on Jetson/DRIVE platforms.
|
Credit to Nordine Feddal for testing CUDA 12.4 on 550.544.14.
|
Also allow building with an alternate version of g++ for backwards
compatibility.
|
Stream-level masks should always override globally-set masks.
Next-kernel masks should always override both stream-level masks
and globally-set masks.
Tests reveal an issue with the next-kernel mask not overriding the
stream mask on CUDA 11.0+. CUDA appears to apply the per-stream
mask to the QMD/TMD after `launchCallback()` is triggered, making
it impossible to override as currently implemented.
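The intended precedence can be sketched as a simple resolver. This is a hypothetical illustration only (libsmctrl applies masks inside the launch callback, not through a function like this); here a zero mask is treated as "unset".

```c
#include <stdint.h>

/* Resolve which TPC mask applies to a launch: the next-kernel mask,
 * if set, wins over the per-stream mask, which in turn wins over the
 * global mask. A value of 0 means "unset". */
static uint64_t effective_mask(uint64_t global_mask, uint64_t stream_mask,
                               uint64_t next_mask) {
    if (next_mask)
        return next_mask;   /* highest priority, applies to one launch */
    if (stream_mask)
        return stream_mask; /* per-stream override of the global default */
    return global_mask;     /* fallback for all launches */
}
```

The bug described above is equivalent to CUDA 11.0+ rewriting the result of this resolution with the stream mask after libsmctrl's callback has already run.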
|
Also rewrite the global masking test to be much more thorough.
|
Previously did not distinguish between aarch64 and x86_64 stream
offsets, causing incorrect offsets to be used in many circumstances.
This has now been fixed.
A new function, libsmctrl_set_stream_mask_ext(), has been added that
supports masking up to 128 TPCs (rather than just 64).
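A 128-TPC mask can be represented as two 64-bit halves. The sketch below shows the bit bookkeeping involved; the struct layout and names are assumptions for illustration, not the library's actual type.

```c
#include <stdint.h>

/* Hypothetical 128-bit TPC mask split across two 64-bit words:
 * TPCs 0-63 in `lower`, TPCs 64-127 in `upper`. Following libsmctrl's
 * convention, a set bit disables the corresponding TPC. */
struct tpc_mask128 {
    uint64_t lower;
    uint64_t upper;
};

static void mask_disable_tpc(struct tpc_mask128 *m, unsigned tpc) {
    if (tpc < 64)
        m->lower |= 1ULL << tpc;
    else if (tpc < 128)
        m->upper |= 1ULL << (tpc - 64);
    /* TPC indices >= 128 are silently ignored in this sketch */
}
```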
|
nvcc links against a stub version of libcuda.so by default which is
missing a required symbol starting around CUDA 11.8. Use libdl to
resolve the symbol at runtime instead.
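The technique is ordinary dlsym()-based lookup. As a self-contained illustration, the sketch below resolves printf from the process's already-loaded C library; libsmctrl presumably does the same for the CUDA symbol missing from the stub.

```c
#define _GNU_SOURCE /* for RTLD_DEFAULT */
#include <dlfcn.h>
#include <stdio.h>

/* Look up a symbol in the process's already-loaded libraries at
 * runtime instead of binding to it at link time. This is how a symbol
 * absent from the nvcc stub libcuda.so can still be found in the real
 * driver library once the program is running. */
static void *resolve_symbol(const char *name) {
    void *sym = dlsym(RTLD_DEFAULT, name);
    if (!sym)
        fprintf(stderr, "failed to resolve %s: %s\n", name, dlerror());
    return sym;
}
```

Note that on glibc older than 2.34 this requires linking with -ldl.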
|
Also use static linking for tests, to avoid a need to set
LD_LIBRARY_PATH to include the libsmctrl directory.
|
Also improve documentation and abort with an error message if
attempting to set a global SM mask on an unsupported CUDA version.
(Would crash/corrupt state before.)
Also uncomment a line which errantly disabled global masking on
CUDA 10.2 on aarch64.
Tested with CUDA 10.2 on:
- x86_64 (GTX 1060 3GB, driver 440.100, jbakita-old.cs.unc.edu)
- aarch64 (Jetson TX2, driver r32.5, grizzly.cs.unc.edu)
|
Necessary for declarations of included functions. Absence would
result in a compilation error for programs omitting this include.
|
This function was previously unreliable when using CUDA 9.0 on the
Jetson TX2.
Also update some version comments and remove `set_sm_mask()`---a
legacy partitioning function that's no longer used.
|
Initially supports the GPU information functions via:
- pysmctrl.get_gpc_info(dev_id)
- pysmctrl.get_tpc_info(dev_id)
- pysmctrl.get_tpc_info_cuda(cuda_dev_id)
All functions are extensively documented. See pysmctrl/__init__.py
for details.
Device partitioning functions have yet to be mapped into Python, as
these will require more testing.
As part of this:
- libsmctrl_get_*_info() functions have been modified to consistently
return positive error codes.
- libsmctrl_get_tpc_info() now uses nvdebug-style device numbering and
uses libsmctrl_get_gpc_info() under the covers. This should be more
reliable.
- libsmctrl_get_tpc_info_cuda() has been introduced as an improved
  version of the old libsmctrl_get_tpc_info() function. It continues
  to use CUDA-style device numbering, but is now resilient to CUDA
  failures.
- Various minor style improvements in libsmctrl.c
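The positive-error-code convention can be illustrated with a hypothetical getter; the function name and the placeholder TPC count are inventions for this sketch, not part of the library's API.

```c
#include <errno.h>

/* Hypothetical getter following the convention: return 0 on success
 * and a positive errno-style code on failure, never a negative value,
 * so callers (including the Python bindings) only need to test one
 * sign convention. */
static int example_get_tpc_count(int dev_id, unsigned *count_out) {
    if (!count_out)
        return EINVAL; /* positive code, not -EINVAL */
    if (dev_id < 0)
        return ENODEV;
    *count_out = 28; /* placeholder value for the sketch */
    return 0;
}
```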
|
This function previously yielded invalid results for GPUs with more
than 31 TPCs.
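A likely cause of such a bug is 32-bit shift overflow; building the mask with 64-bit arithmetic avoids it. The following is a generic sketch of the pitfall, not the library's actual code:

```c
#include <stdint.h>

/* Build a mask with the low `num_tpcs` bits set. Using 1ULL keeps the
 * shift in 64-bit arithmetic; with a plain int literal (1 << n), any
 * n >= 31 overflows or is undefined, yielding invalid masks for GPUs
 * with more than 31 TPCs. */
static uint64_t tpc_mask_all(unsigned num_tpcs) {
    if (num_tpcs >= 64)
        return ~0ULL; /* shifting by the full width is also undefined */
    return (1ULL << num_tpcs) - 1;
}
```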
|
- Tested working with cuda_scheduling_examiner
- Supports everything described in the accepted RTAS'23 paper
- Can be used as either a shared or statically-linked library
- Documented in libsmctrl.h