<feed xmlns='http://www.w3.org/2005/Atom'>
<title>libsmctrl.git, branch master</title>
<subtitle>Library to enable intra-context SM/TPC partitioning on NVIDIA GPUs</subtitle>
<link rel='alternate' type='text/html' href='http://rtsrv.cs.unc.edu/cgit/cgit.cgi/libsmctrl.git/'/>
<entry>
<title>De-duplicate CUDA version checks and omit when building with CUDA &gt; 6.5</title>
<updated>2025-05-05T07:13:30+00:00</updated>
<author>
<name>Joshua Bakita</name>
<email>bakitajoshua@gmail.com</email>
</author>
<published>2025-05-05T07:13:30+00:00</published>
<link rel='alternate' type='text/html' href='http://rtsrv.cs.unc.edu/cgit/cgit.cgi/libsmctrl.git/commit/?id=c250928930cb5c95bffc878913301f9a5d4efcb7'/>
<id>c250928930cb5c95bffc878913301f9a5d4efcb7</id>
<content type='text'>
Code built with CUDA &gt; 6.5 cannot run on CUDA 6.5 or older, so the
check added unnecessary overhead.

Verified that builds with CUDA 6.5 and CUDA 10.2 generate the correct
code, and tested global and next masks on a GTX 1060 3 GB with either
build while using CUDA 10.2 at runtime.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Code built with CUDA &gt; 6.5 cannot run on CUDA 6.5 or older, so the
check added unnecessary overhead.

Verified that builds with CUDA 6.5 and CUDA 10.2 generate the correct
code, and tested global and next masks on a GTX 1060 3 GB with either
build while using CUDA 10.2 at runtime.
</pre>
</div>
</content>
</entry>
<entry>
<title>Support stream masking on CUDA 12.7 (x86) and 12.8 (x86)</title>
<updated>2025-04-07T20:45:50+00:00</updated>
<author>
<name>Joshua Bakita</name>
<email>bakitajoshua@gmail.com</email>
</author>
<published>2025-04-07T20:45:50+00:00</published>
<link rel='alternate' type='text/html' href='http://rtsrv.cs.unc.edu/cgit/cgit.cgi/libsmctrl.git/commit/?id=72ba87e277572eddb25784563faa3eac111c9556'/>
<id>72ba87e277572eddb25784563faa3eac111c9556</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>Makefile improvements</title>
<updated>2025-03-20T20:46:21+00:00</updated>
<author>
<name>Joshua Bakita</name>
<email>jbakita@cs.unc.edu</email>
</author>
<published>2025-03-20T20:28:52+00:00</published>
<link rel='alternate' type='text/html' href='http://rtsrv.cs.unc.edu/cgit/cgit.cgi/libsmctrl.git/commit/?id=39c57bca3cbb42b1939a28377d8ef6cfab872450'/>
<id>39c57bca3cbb42b1939a28377d8ef6cfab872450</id>
<content type='text'>
- Add an "all" build target
- Fix build if libcuda.so is not on the linker search path
- Do not assume that nvcc is available on $PATH
- Allow specifying CFLAGS and LDFLAGS when running make
- Allow passing non-standard CUDA build locations to make

Suggested usage if CUDA is installed in a non-standard location,
say, /playpen/jbakita/CUDA/cuda-archive/cuda-12.2:

make CUDA=/playpen/jbakita/CUDA/cuda-archive/cuda-12.2
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
- Add an "all" build target
- Fix build if libcuda.so is not on the linker search path
- Do not assume that nvcc is available on $PATH
- Allow specifying CFLAGS and LDFLAGS when running make
- Allow passing non-standard CUDA build locations to make

Suggested usage if CUDA is installed in a non-standard location,
say, /playpen/jbakita/CUDA/cuda-archive/cuda-12.2:

make CUDA=/playpen/jbakita/CUDA/cuda-archive/cuda-12.2
</pre>
</div>
</content>
</entry>
<entry>
<title>Correctly pass arguments to Python's ctypes find_library() function</title>
<updated>2024-12-23T07:15:31+00:00</updated>
<author>
<name>Joshua Bakita</name>
<email>jbakita@cs.unc.edu</email>
</author>
<published>2024-12-23T07:15:31+00:00</published>
<link rel='alternate' type='text/html' href='http://rtsrv.cs.unc.edu/cgit/cgit.cgi/libsmctrl.git/commit/?id=5173f55d57a944c4f6a3e018892fc1da55e15571'/>
<id>5173f55d57a944c4f6a3e018892fc1da55e15571</id>
<content type='text'>
The prefix "lib" should not be included, per the documentation.

This was not caught in local testing because the fallback path was
always used.

Patch courtesy of Guanbin Xu &lt;xugb@mail.ustc.edu.cn&gt;.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The prefix "lib" should not be included, per the documentation.

This was not caught in local testing because the fallback path was
always used.

Patch courtesy of Guanbin Xu &lt;xugb@mail.ustc.edu.cn&gt;.
</pre>
</div>
</content>
</entry>
<entry>
<title>CRITICAL: Remove stray brackets breaking build</title>
<updated>2024-12-23T07:12:48+00:00</updated>
<author>
<name>Joshua Bakita</name>
<email>jbakita@cs.unc.edu</email>
</author>
<published>2024-12-23T07:12:48+00:00</published>
<link rel='alternate' type='text/html' href='http://rtsrv.cs.unc.edu/cgit/cgit.cgi/libsmctrl.git/commit/?id=35f0f1eaf6a99886075a6ee12cc8e0e178aa8308'/>
<id>35f0f1eaf6a99886075a6ee12cc8e0e178aa8308</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>Remove unused variables from Makefile</title>
<updated>2024-12-19T20:04:26+00:00</updated>
<author>
<name>Joshua Bakita</name>
<email>jbakita@cs.unc.edu</email>
</author>
<published>2024-12-19T19:59:15+00:00</published>
<link rel='alternate' type='text/html' href='http://rtsrv.cs.unc.edu/cgit/cgit.cgi/libsmctrl.git/commit/?id=32f89528085f3cdb183e50461ffcd109d1f9a58d'/>
<id>32f89528085f3cdb183e50461ffcd109d1f9a58d</id>
<content type='text'>
Make automatically provides CXX and CC, and these manual
definitions were being ignored.

Also fix a missing space in one of the messages from the tests.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Make automatically provides CXX and CC, and these manual
definitions were being ignored.

Also fix a missing space in one of the messages from the tests.
</pre>
</div>
</content>
</entry>
<entry>
<title>Bugfix stream-mask override, support old CUDA, and start Hopper support</title>
<updated>2024-12-19T19:48:21+00:00</updated>
<author>
<name>Joshua Bakita</name>
<email>jbakita@cs.unc.edu</email>
</author>
<published>2024-12-19T19:20:38+00:00</published>
<link rel='alternate' type='text/html' href='http://rtsrv.cs.unc.edu/cgit/cgit.cgi/libsmctrl.git/commit/?id=d052c2df34ab41ba285f70965663e5a0832f6ac9'/>
<id>d052c2df34ab41ba285f70965663e5a0832f6ac9</id>
<content type='text'>
Use a different callback to intercept the TMD/QMD later in the
launch pipeline.

Major improvements:
- Fix bug where the next mask did not override the stream mask on CUDA 11.0+
- Add CUDA 6.5-10.2 support for next- and global-granularity
  partitioning masks on x86_64 and aarch64 Jetson
- Remove libdl dependency
- Partially support TMD/QMD Version 4 (Hopper)

Minor improvements:
- Check for sufficient CUDA version before attempting to
  apply a next-granularity partitioning mask
- Only check for sufficient CUDA version on the first call to
  `libsmctrl_set_next_mask()` or `libsmctrl_set_global_mask()`,
  rather than checking every time (lowers overheads)
- Check that TMD version is sufficient before modifying it
- Improve documentation

Issues:
- Partitioning mask bits have a different meaning in TMD/QMD
  Version 4 and require floorsweeping and remapping information to
  properly construct. This information will be forthcoming in
  future releases of libsmctrl and nvdebug.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Use a different callback to intercept the TMD/QMD later in the
launch pipeline.

Major improvements:
- Fix bug where the next mask did not override the stream mask on CUDA 11.0+
- Add CUDA 6.5-10.2 support for next- and global-granularity
  partitioning masks on x86_64 and aarch64 Jetson
- Remove libdl dependency
- Partially support TMD/QMD Version 4 (Hopper)

Minor improvements:
- Check for sufficient CUDA version before attempting to
  apply a next-granularity partitioning mask
- Only check for sufficient CUDA version on the first call to
  `libsmctrl_set_next_mask()` or `libsmctrl_set_global_mask()`,
  rather than checking every time (lowers overheads)
- Check that TMD version is sufficient before modifying it
- Improve documentation

Issues:
- Partitioning mask bits have a different meaning in TMD/QMD
  Version 4 and require floorsweeping and remapping information to
  properly construct. This information will be forthcoming in
  future releases of libsmctrl and nvdebug.
</pre>
</div>
</content>
</entry>
<entry>
<title>Support CUDA 12.2, 12.5, and 12.6 on Jetson aarch64</title>
<updated>2024-12-19T18:36:40+00:00</updated>
<author>
<name>Joshua Bakita</name>
<email>jbakita@cs.unc.edu</email>
</author>
<published>2024-12-19T18:36:40+00:00</published>
<link rel='alternate' type='text/html' href='http://rtsrv.cs.unc.edu/cgit/cgit.cgi/libsmctrl.git/commit/?id=aa63a02efa5fc8701f0c3418704bbbc2051c1042'/>
<id>aa63a02efa5fc8701f0c3418704bbbc2051c1042</id>
<content type='text'>
Also test and document that stream masking appears to be impossible on CUDA 6.5.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Also test and document that stream masking appears to be impossible on CUDA 6.5.
</pre>
</div>
</content>
</entry>
<entry>
<title>Check for read() errors on procfs files during libsmctrl_get_gpc_info()</title>
<updated>2024-12-19T18:32:30+00:00</updated>
<author>
<name>Joshua Bakita</name>
<email>jbakita@cs.unc.edu</email>
</author>
<published>2024-12-19T18:32:30+00:00</published>
<link rel='alternate' type='text/html' href='http://rtsrv.cs.unc.edu/cgit/cgit.cgi/libsmctrl.git/commit/?id=147c69f31f25c3dc79b7943a0c56c171fe306682'/>
<id>147c69f31f25c3dc79b7943a0c56c171fe306682</id>
<content type='text'>
Also update a comment
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Also update a comment
</pre>
</div>
</content>
</entry>
<entry>
<title>Fix a potential bug with stream masking on CUDA 12.6 on aarch64 Jetson</title>
<updated>2024-12-18T20:49:48+00:00</updated>
<author>
<name>Joshua Bakita</name>
<email>jbakita@cs.unc.edu</email>
</author>
<published>2024-12-18T20:37:27+00:00</published>
<link rel='alternate' type='text/html' href='http://rtsrv.cs.unc.edu/cgit/cgit.cgi/libsmctrl.git/commit/?id=de331f42ab6efd7739c9d02594bc796421779a5f'/>
<id>de331f42ab6efd7739c9d02594bc796421779a5f</id>
<content type='text'>
Commit 3f9bda39 made an error by using the pre-CUDA-12 mask
structure layout on CUDA 12.6 on aarch64 Jetson. Switch to the
CUDA 12+ layout (as used on x86_64).

Tests pass either way on the Jetson Orin, so this change is not
strictly required, but seems advisable to support potential large
(PCIe-attached?) GPUs on Jetson/DRIVE platforms.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 3f9bda39 made an error by using the pre-CUDA-12 mask
structure layout on CUDA 12.6 on aarch64 Jetson. Switch to the
CUDA 12+ layout (as used on x86_64).

Tests pass either way on the Jetson Orin, so this change is not
strictly required, but seems advisable to support potential large
(PCIe-attached?) GPUs on Jetson/DRIVE platforms.
</pre>
</div>
</content>
</entry>
</feed>
