nvdebug.git

	Commit message	Author	Age
*	Ampere: disable/enable_channel, preempt/switch_to_tsg, and resubmit_runlist	Joshua Bakita	2024-09-19
	Modifies the user API for switching TSGs. For example, to switch to TSG 1 on runlist 0 of GPU X on a pre-Ampere GPU, use `echo 1 > /proc/gpuX/runlist0/switch_to_tsg` instead of `echo 1 > /proc/gpuX/switch_to_tsg`.
	Feature changes:
	- switch_to_tsg only makes sense on a per-runlist level. Previously it always operated on runlist 0; moving the API to the per-runlist paths allows it to operate on any runlist.
	- On Ampere+, channel and TSG IDs are per-runlist rather than GPU-global. Consequently, the disable_channel/enable_channel and preempt_tsg APIs have moved from GPU-global to per-runlist paths on Ampere+.
	Bug fixes:
	- `preempt_runlist()` is now supported on Maxwell and Pascal.
	- `resubmit_runlist()` now detects GPUs that are too old.
	- MAX_CHID corrected from 512 to 511 and documented.
	- switch_to_tsg now includes a runlist resubmit, which appears to be necessary on Turing+ GPUs.
	Tested on GK104 (Quadro K5000), GM204 (GTX 970), GP106 (GTX 1060 3GB), GP104 (GTX 1080 Ti), GP10B (Jetson TX2), GV11B (Jetson Xavier), GV100 (Titan V), TU102 (RTX 2080 Ti), and AD102 (RTX 6000 Ada).
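A minimal userspace sketch of driving the per-runlist API described above. It assumes only the path layout given in this commit message (`/proc/gpuX/runlistY/switch_to_tsg`); the helpers `runlist_proc_path()`, `write_proc_id()`, and `switch_to_tsg()` are hypothetical and are not part of nvdebug.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build "/proc/gpu<gpu>/runlist<runlist>/<file>".
 * Returns 0 on success, or -1 on invalid arguments or truncation. */
static int runlist_proc_path(char *buf, size_t len, int gpu, int runlist,
                             const char *file)
{
    int n;

    if (gpu < 0 || runlist < 0 || file == NULL || file[0] == '\0' ||
        strchr(file, '/') != NULL)
        return -1;
    n = snprintf(buf, len, "/proc/gpu%d/runlist%d/%s", gpu, runlist, file);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: the C equivalent of `echo <id> > path`.
 * Returns 0 on success, or -1 if the file cannot be opened or written. */
static int write_proc_id(const char *path, unsigned int id)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (f == NULL)
        return -1;
    ok = fprintf(f, "%u\n", id) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Switch to TSG `tsg` on runlist `runlist` of GPU `gpu`
 * (needs the module loaded and sufficient privileges). */
static int switch_to_tsg(int gpu, int runlist, unsigned int tsg)
{
    char path[64];

    if (runlist_proc_path(path, sizeof(path), gpu, runlist, "switch_to_tsg") != 0)
        return -1;
    return write_proc_id(path, tsg);
}
```

Under these assumptions, `switch_to_tsg(0, 0, 1)` does the same thing as `echo 1 > /proc/gpu0/runlist0/switch_to_tsg`.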
*	Cleanup in nvdebug_entry.c	Joshua Bakita	2024-09-16
	- Fix pointer corruption when `compat_ops()` is called more than once on the same struct.
	- Add support for detecting Jetson Orin on newer releases of L4T.
	- Reorder some initialization steps so that the order matches for both PCIe and platform-bus devices.
	- Remove a duplicate check in `nvdebug_exit()`.
	Tested on ga10b (Jetson Orin) with L4T r36.3.
*	Support printing the runlist and channels on Ampere+ GPUs	Joshua Bakita	2024-09-16
	Modifies the user API from `cat /proc/gpuX/runlist0` to `cat /proc/gpuX/runlist0/runlist` to support runlist-scoped registers.
	- Count the number of runlists via Ampere-style PTOP parsing.
	- Create a ProcFS directory for each runlist, and create the runlist-printing file in this directory.
	- Document the newly added and reformatted Runlist RAM and Channel RAM registers.
	- Add a helper function, `get_runlist_ram()`, to obtain the location of each runlist's registers.
	- Support printing Ampere-style Channel RAM entries.
	Tested on Jetson Orin (ga10b), A100, H100, and AD102 (RTX 6000 Ada).
*	Documentation and style cleanup. No functional changes.	Joshua Bakita	2024-09-10
*	Add Hopper and Blackwell support to bus.c	Joshua Bakita	2024-09-10
	- Handle relocation of the PRAMIN window configuration register
	- Handle the new format for BAR2 configuration
	- Catch unreadable PRAMIN configuration
	Tested on A100, H100, and AD102 (RTX 6000 Ada).
*	Style and documentation cleanup	Joshua Bakita	2024-04-23
	- Document topology registers (PTOP) on Ampere+
	- Document graphics copy engine configuration registers
	- Move resubmit_runlist range checks into runlist.c
	- Miscellaneous spacing, typo, and minor documentation fixes
*	Document Turing- Channel RAM and improve channel detail printer	Joshua Bakita	2024-04-23
*	Fix page-table traversal for version 1 page tables	Joshua Bakita	2024-04-22
	- Correct V1 page table defines using information from kern_gmmu_fmt_gm10x.c in NVIDIA's open-gpu-kernel-modules repo.
	- Verify page table format in search_v1_page_directory()
	- Better, more controllable logging in mmu.c
	Tested on GM204 (GTX 970).
*	Add /proc/gpu#/resubmit_runlist API	Joshua Bakita	2024-04-21
	Resubmits the runlist in an identical configuration. This causes the runlist scheduler to:
	1. Reload and cache timeslice and scale values from TSGs.
	2. Restart scheduling from the head of the runlist [may cause a preempt to be scheduled for the currently-running task (?)].
	3. Address (?) an erratum on Turing where re-enabled channels are not always detected.
	The above behavior was tested on GV100 and partially tested on TU102.
*	Better logging and errno-defined errors in bus.c	Joshua Bakita	2024-04-21
*	Match aperture when doing page table searches	Joshua Bakita	2024-04-21
	Previously, addresses in any aperture would match. This can produce very confusing results when attempting to verify that a mapping correctly exists.
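A simplified sketch of aperture-aware matching, assuming each page-table entry can be reduced to an aperture tag plus an address. The `pt_entry` struct, the `aperture` enum, and `find_entry()` are hypothetical stand-ins, not nvdebug's actual types or its search_page_directory() implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical aperture tags; the real encodings are defined in nvdebug.h. */
enum aperture { AP_INVALID = 0, AP_VID_MEM, AP_SYS_MEM };

struct pt_entry {
    enum aperture aperture;
    uint64_t addr;
};

/* Return the index of the first entry whose address AND aperture match,
 * or -1 if none does. Matching on address alone would also accept, e.g.,
 * a SYS_MEM entry when searching for a VID_MEM address. */
static long find_entry(const struct pt_entry *entries, size_t n,
                       enum aperture want_ap, uint64_t want_addr)
{
    for (size_t i = 0; i < n; i++) {
        if (entries[i].aperture == want_ap && entries[i].addr == want_addr)
            return (long)i;
    }
    return -1;
}
```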
*	Always dereference the post-I/O-MMU-translated address in mmu.c	Joshua Bakita	2024-04-21
	Previously, the code attempted to dereference addresses that had not been translated by the I/O MMU. This resulted in accessing effectively arbitrary physical addresses and caused erroneous results, particularly from search_page_directory().
*	Misc non-functional cleanup and documentation	Joshua Bakita	2024-04-13
*	Add /proc/gpu#/local_memory API for getting VRAM size	Joshua Bakita	2024-04-13
*	Linux 5.17+ support and allow including nvdebug.h independently	Joshua Bakita	2024-04-11
	- Move Linux-specific functions to nvdebug_linux.h and nvdebug_linux.c
	- Work around PDE_DATA() having been renamed to pde_data() on Linux 5.17+
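One common way to handle such a rename is a version-gated alias. The sketch below is only an illustration, not necessarily what nvdebug_linux.h does. The kernel-only part is guarded by `__KERNEL__`. The userspace helper `uses_lowercase_pde_data()` is hypothetical and mirrors the same version test so that it can be exercised outside the kernel.

```c
#ifdef __KERNEL__
#include <linux/proc_fs.h>
#include <linux/version.h>
#if LINUX_VERSION_CODE < KERNEL_VERSION(5, 17, 0)
/* Before 5.17 the accessor was spelled PDE_DATA(); map the new name onto it. */
#define pde_data(inode) PDE_DATA(inode)
#endif
#endif /* __KERNEL__ */

#ifndef KERNEL_VERSION
/* Same packing as KERNEL_VERSION() in recent <linux/version.h>:
 * major << 16, minor << 8, patch clamped to 255. */
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + ((c) > 255 ? 255 : (c)))
#endif

/* Hypothetical helper: nonzero if a kernel with this LINUX_VERSION_CODE
 * provides pde_data() rather than PDE_DATA(). */
static int uses_lowercase_pde_data(unsigned int version_code)
{
    return version_code >= (unsigned int)KERNEL_VERSION(5, 17, 0);
}
```

With the alias in place, module code can call `pde_data(inode)` unconditionally on both older and newer kernels.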
*	Support page directories outside PRAMIN or in SYS_MEM	Joshua Bakita	2024-04-11
	- Re-read the PRAMIN configuration after an update to verify that the change applied.
	- Return a page_dir_config_t from `get_bar2_pdb()` rather than just an address and page table version.
	- Less verbose logging for MMU-related functions by default.
	- Perform all conversion from SYS_MEM/VID_MEM addresses to kernel addresses inside the translation functions, via the new function `pd_deref()`.
	- Support use of an I/O MMU, page tables/directories outside the current PRAMIN window, and page tables/directories arbitrarily located in SYS_MEM or VID_MEM on different levels of the same tree.
	- Heavily improve documentation and add references for Version 1 and Version 0 page tables.
	- Improve logging in `runlist.c` to include runlist and chip IDs.
	- Update all users of search_page_directory* to use the new API.
	- Remove now-unused supporting functions from `mmu.c`.
	Tested on GTX 970, GTX 1060 3GB, Jetson TX2, Titan V, Jetson Xavier, and RTX 2080 Ti.
*	Correctly check return code from vram2PRAMIN()	Joshua Bakita	2024-04-09
	Blindly using an invalid return address resulted in undefined behavior, as addresses outside the page table were traversed as though they were part of it.
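The bug class fixed here can be reproduced in miniature: a translation helper that signals failure through its return value must have that value checked before the result is used as an address. The sketch assumes a linear window exposing `window_size` bytes of VRAM starting at `window_base`. `vram_to_window_offset()` and `read_via_window()` are hypothetical and are not nvdebug's vram2PRAMIN(), whose signature this commit does not show.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical: translate a VRAM address into an offset within a window
 * that currently exposes [window_base, window_base + window_size).
 * Returns the offset (>= 0), or -ERANGE if the address is not visible. */
static int64_t vram_to_window_offset(uint64_t window_base, uint64_t window_size,
                                     uint64_t vram_addr)
{
    if (vram_addr < window_base || vram_addr - window_base >= window_size)
        return -ERANGE;
    return (int64_t)(vram_addr - window_base);
}

/* Caller pattern: check for a negative return before using the result,
 * rather than treating an error code as an offset into the window. */
static int read_via_window(const uint8_t *window, uint64_t window_base,
                           uint64_t window_size, uint64_t vram_addr,
                           uint8_t *out)
{
    int64_t off = vram_to_window_offset(window_base, window_size, vram_addr);

    if (off < 0)
        return (int)off; /* propagate -ERANGE */
    *out = window[off];
    return 0;
}
```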
*	Improve debugging messages for reverse page translation	Joshua Bakita	2024-04-09
*	Fix an off-by-one error in V2 reverse page table lookups	Joshua Bakita	2024-04-09
	This would occasionally manifest as an inability to find the runlist page in BAR2, as only part of the page table was being traversed. Also includes non-functional changes to documentation, scoping, and structure layout.
*	Correctly handle startup errors and fix gpc*_mask APIs	Joshua Bakita	2024-04-09
	- Do not create gpc_mask files on pre-Maxwell GPUs (tested and found unavailable on the K5000s)
	- Use correct register offsets for gpc_mask files on Ampere+ GPUs
	- Document GPC and TPC count and fuse registers.
	- Correctly handle errors for creation of all ProcFS files
	- Remove unnecessary error-handling temp variables in nvdebug_entry
	- Misc naming, comment, and layout cleanup
*	Return const pointers to string constants.	Joshua Bakita	2024-04-09
	Also update how Instance Pointers are aligned in the runlist output to make them more easily distinguishable from other fields.
*	Correctly use Volta-based runlist layout on the GV100 GPU	Joshua Bakita	2024-04-08
	Fixes a bug that caused the runlist output to be garbled on the GV100 GPU (the Titan V).
*	Heavily refactor runlist code for correctness and Turing support	Joshua Bakita	2024-04-08
	- Support differently formatted runlist registers on Turing
	- Support different runlist register offsets on Turing
	- Fix incorrect indenting when printing the runlist
	- Fix the `preempt_tsg` and `switch_to_tsg` API implementations to interface correctly with the hardware. Previously, they tried to disable scheduling for the last-updated runlist pointer, which was nonsense and just an artifact of my early misunderstanding of how the NV_PFIFO_RUNLIST* registers worked.
	- Remove misused NV_PFIFO_RUNLIST and NV_PFIFO_RUNLIST_BASE registers
	- Refactor `runlist.c` to use the APIs from `bus.c`
*	Add Jetson TX1 support	Joshua Bakita	2024-04-08
*	Put PRAMIN-pointer and BAR2-page-table-PRAMIN-pointer logic into bus.c	Joshua Bakita	2024-04-08
	Derived from logic in `runlist.c` and `mmu.c`. The new functions are not directly used in this commit.
*	Rework LCE<->PCE and GRCE->LCE configuration printing API [archive/saman63-wip]	Joshua Bakita	2024-04-08
	Rather than up to dozens of individual files each exposing part of a copy engine's configuration, have one file that exposes a unified view of the full topology.
	Example new output on RTX 2080 Ti:
	    $ cat /proc/gpu0/copy_topology
	    GRCE0 -> LCE04
	    GRCE1 -> LCE03
	    LCE02 -> PCE02
	    LCE03 -> PCE03
	    LCE04 -> PCE01
	Old output:
	    $ tail -n 1 /proc/gpu0/lce_for_pce*
	    ==> /proc/gpu0/lce_for_pce0 <==
	    0xf
	    ==> /proc/gpu0/lce_for_pce1 <==
	    0x4
	    ==> /proc/gpu0/lce_for_pce2 <==
	    0x2
	    ==> /proc/gpu0/lce_for_pce3 <==
	    0x3
	    $ tail -n 1 /proc/gpu0/shared_lce_for_grce*
	    ==> /proc/gpu0/shared_lce_for_grce0 <==
	    0x4
	    ==> /proc/gpu0/shared_lce_for_grce1 <==
	    0x3
	Specifically:
	- Add `copy_topology` API
	- Remove `shared_lce_for_grce#` and `lce_for_pce#` APIs
	- Move logic from `nvdebug_entry.c` to `copy_topology_procfs.c`
	- Do not print PCE or Shared LCE configuration if flagged absent
	- Refer to LCE0 and LCE1 as GRCE0 and GRCE1
	- Print by LCE ID, which is more helpful when attempting to trace how a given copy runlist maps to a physical copy engine.
	- Document two errata with CE registers
	Tested working on Pascal Integrated, Pascal, Volta Integrated, Volta, Turing, and Ampere Integrated on Linux 4.9 through 5.10.
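The relationship between the old per-file values and the new `copy_topology` output can be checked with a small userspace reconstruction. It assumes, as the RTX 2080 Ti example suggests, that an LCE value of 0xf marks a PCE with no LCE assigned, that LCE IDs stay below that marker, and that the GRCE index is the position in the `shared_lce_for_grce#` list. `format_copy_topology()` and its parameters are hypothetical; the real logic in `copy_topology_procfs.c` reads these values from GPU registers.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumed "no LCE assigned" marker, per the lce_for_pce0 = 0xf example. */
#define LCE_ABSENT 0xfu

/* Hypothetical: render GRCE->LCE and LCE->PCE pairs in the copy_topology style.
 * shared_lce_for_grce[g] is the LCE shared with GRCE g; lce_for_pce[p] is the
 * LCE that PCE p serves. Returns 0 on success, -1 if buf is too small. */
static int format_copy_topology(char *buf, size_t len,
                                const unsigned int *shared_lce_for_grce,
                                size_t n_grce,
                                const unsigned int *lce_for_pce, size_t n_pce)
{
    size_t pos = 0;
    int n;

    if (len == 0)
        return -1;
    memset(buf, 0, len);

    /* GRCE lines first, skipping GRCEs flagged absent. */
    for (size_t g = 0; g < n_grce; g++) {
        if (shared_lce_for_grce[g] == LCE_ABSENT)
            continue;
        n = snprintf(buf + pos, len - pos, "GRCE%zu -> LCE%02u\n", g,
                     shared_lce_for_grce[g]);
        if (n < 0 || (size_t)n >= len - pos)
            return -1;
        pos += (size_t)n;
    }

    /* Then every LCE->PCE pair, ordered by LCE ID. Absent PCEs never match
     * because the loop stops below the LCE_ABSENT marker. */
    for (unsigned int lce = 0; lce < LCE_ABSENT; lce++) {
        for (size_t p = 0; p < n_pce; p++) {
            if (lce_for_pce[p] != lce)
                continue;
            n = snprintf(buf + pos, len - pos, "LCE%02u -> PCE%02zu\n", lce, p);
            if (n < 0 || (size_t)n >= len - pos)
                return -1;
            pos += (size_t)n;
        }
    }
    return 0;
}
```

Feeding in the old values (`lce_for_pce` = 0xf, 0x4, 0x2, 0x3 and `shared_lce_for_grce` = 0x4, 0x3) yields the five lines of the new example output, in the same order.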
*	Expand support for printing LCE<->PCE and GRCE->LCE configuration [rtas24-ae]	Joshua J Bakita	2023-11-08
	Tested working on Pascal, Volta, Volta Integrated, Turing, Ampere, and Ada. Also clean up minor spacing issues, remove an errantly added file (nvdebug.mod), and fix some inconsistencies with upstream.
*	Created new read function in device_info for GRCE mappings and Pascal LCE mappings	Saman Sahebi	2023-10-29
*	patched issues with GPU compatibility for CE_MAP	Saman Sahebi	2023-10-29
*	implemented GRCE to PCE mapping	Saman Sahebi	2023-10-29
*	Updated for loop formatting and fixed a casting error	Saman Sahebi	2023-10-29
*	added offsets for lce mapping in nvdebug.h and code to read lce for each pce in nvdebug_entry.c	Saman Sahebi	2023-10-29
*	Add nvdebug.mod to .gitignore	Joshua Bakita	2023-10-29
*	Support printing device info on Ampere+ GPUs. By Benjamin Hadad IV	Joshua Bakita	2023-10-29
	commit c3d6f2c852eb046e9d4f4f1e6527b52c746b2693
	Author: Joshua Bakita <bakitajoshua@gmail.com>
	Date: Sun Oct 29 14:37:51 2023 -0400
	    Print Ampere+ device_info fields with correct offsets/widths
	    Everything has now been checked against how nvgpu handles it.
	commit b70849d1ce67a58f9f69b37dc62122f789f4cdf7
	Author: Joshua Bakita <jbakita@cs.unc.edu>
	Date: Wed Sep 20 14:27:38 2023 -0400
	    Rearrange, fix an off-by-one error, and remove an unused define
	    The code in nvdebug.h has been rearranged to enable an easier merge against the jbakita-wip branch.
	commit 51f808e092846a60ea6c88ea3a1d2e349c92977b
	Author: Joshua Bakita <jbakita@cs.unc.edu>
	Date: Wed Sep 20 13:09:17 2023 -0400
	    Bug fixes and cleanup for new device_info logic
	    - Update comments to match new structure
	    - Make show() function idempotent
	    - Skip empty table entries without aborting
	    - Include names for new engine types
	    - Add warning log messages for skipped table entries
	    - Remove non-functional runlist file creation logic for Ampere+
	commit 1d7adc3be1aef5ac9c144bb24008fd8cc5d688a5
	Author: Benjamin Hadad IV <bh4@unc.edu>
	Date: Sat Aug 19 12:47:18 2023 -0400
	    Debugging changes made to restore functionality following refactoring.
	    - Debugged data display errors.
	    - Debugged crash bugs.
	    - Debugged memory issue.
	commit 9e6cc03cdf736fbd817ed53fa9a7f506bc91a244
	Author: Benjamin Hadad IV <bh4@unc.edu>
	Date: Wed Aug 16 22:00:20 2023 -0400
	    A variety of changes have been made as part of the code review.
	    - Functions have been consolidated.
	    - Code was clarified and tidied up overall.
	    - Unnecessary elements were removed.
	commit 845960fc1b15995fdbd6d61c384567652a150bc4
	Author: Benjamin Hadad IV <bh4@unc.edu>
	Date: Fri Jul 28 11:39:28 2023 -0400
	    Refactored various systems and debugged minor issues
	    - Added device_info_iter
	    - Merged functions in device_info_procfs.c
	    - Separated device_info data structs by version in nvdebug.h
	    - Fixed issue with device_info runlist ID data
	commit 8a57aaeba41c43233c323d7e0fc8bf1a81ebc65e
	Author: Benjamin Hadad IV <bh4@unc.edu>
	Date: Fri Jul 21 11:32:51 2023 -0400
	    I have updated the ptop_device_info_t comment in nvdebug.h.
	commit 33c915f08f5dc63674b158ecc18897494256a6d0
	Author: Benjamin Hadad IV <bh4@unc.edu>
	Date: Wed Jul 19 13:02:52 2023 -0400
	    Debugged device_info functionality
	    - Fixed device_info crash bugs
	    - Made further edits to display functionality
	    - Refactored code to enhance readability
	commit bfb4dcf0e78954c0163f3a06a5a088c4d1b437a8
	Author: Benjamin Hadad IV <bh4@unc.edu>
	Date: Thu Jul 13 12:13:17 2023 -0400
	    This commit is to update the repo for display during a meeting.
	    - Added an Ampere version of the device info data.
	    - Added Ampere versions of auxiliary functions.
	    - Modified display functions to accommodate Ampere data.
	    - Made other various small modifications.
	commit 068e7f4e7208d6c9250ad72208e0b36fd9fdf2f6
	Merge: 3725b15 073e897
	Author: Benjamin Hadad IV <bh4@unc.edu>
	Date: Mon Jul 10 12:39:12 2023 -0400
	    Merge branch 'jbakita-wip' of ssh://rtsrv.cs.unc.edu/public/nvdebug into wip
	    I am merging Mr. Bakita's changes (046d7d2) into this repository.
	commit 3725b15d5da3e06ef202045d710aa5f15eb72fcc
	Author: Benjamin Hadad IV <bh4@unc.edu>
	Date: Mon Jul 3 04:30:54 2023 -0400
	    I modified nvdebug.h for Ampere.
*	Rather than abort, print placeholders for missing runlist channels	Joshua Bakita	2023-10-29
	Sometimes such "malformed" runlists appear on the TX2, yet they seem to work fine, so support printing them in full.
*	Support PRAMIN-based runlist access fallback (optional; on by default)	Joshua Bakita	2023-10-29
	Using this may be hazardous: we don't know whether some of the GPU drivers use this after initial bring-up. If they do, and we race with them in setting it, or we unexpectedly change it under them, arbitrary state corruption could occur. This is only entirely safe to use if you don't trust the GPU state after the first use of this fallback. In limited experiments with the `nvgpu` (Tegra) and `nvidia` (closed-source discrete) drivers, no ill side effects have yet been observed, but still please use with caution.
*	Zero-out unused/invalid fields in g_nvdebug_device on Tegra devices	Joshua Bakita	2023-10-29
	Previously, unloading the module could cause a segfault on Tegra, as pcid would be uninitialized and possibly non-zero, causing us to attempt PCIe-device-style deinitialization on a non-PCIe device.
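A reduced illustration of why zeroing matters here, assuming the teardown path chooses PCIe-style deinitialization whenever the PCI device pointer is non-NULL. `struct gpu_entry`, `teardown_kind()`, and `init_platform_entry()` are hypothetical simplifications, not the actual layout of g_nvdebug_device.

```c
#include <stddef.h>

struct pci_dev; /* opaque stand-in for the kernel's PCI device type */

/* Hypothetical, heavily simplified per-GPU record. */
struct gpu_entry {
    struct pci_dev *pcid; /* set only for PCIe-attached GPUs */
    void *platform_dev;   /* set only for platform-bus (e.g. Tegra) GPUs */
};

enum teardown { TEARDOWN_PLATFORM, TEARDOWN_PCIE };

/* Teardown dispatch keyed on pcid: a stale, non-NULL pcid would wrongly
 * select PCIe-style deinitialization. */
static enum teardown teardown_kind(const struct gpu_entry *e)
{
    return e->pcid != NULL ? TEARDOWN_PCIE : TEARDOWN_PLATFORM;
}

/* Initialize a platform-bus entry. The compound literal zero-initializes
 * every member that is not named, so pcid is guaranteed to be NULL. */
static void init_platform_entry(struct gpu_entry *e, void *pdev)
{
    *e = (struct gpu_entry){ .platform_dev = pdev };
}
```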
*	Include <linux/seq_file.h> in nvdebug.h and sort includes	Joshua Bakita	2023-10-29
	Fixes the build for some kernel versions where this is no longer transitively included.
*	Update includes to L4T r32.7.4 and drop nvgpu/gk20a.h dependency	Joshua Bakita	2023-10-29
	Also add instructions for updating `include/`. These files are now only needed to build on Linux 4.9-based Tegra platforms.
*	Only log interrupts if explicitly enabled	Joshua Bakita	2023-10-05
*	Add architecture decode for Ada, Hopper, and Blackwell	Joshua Bakita	2023-09-01
*	Improve copy engine register documentation in nvdebug.h + cleanup	Joshua Bakita	2023-07-20
*	Fail reads which return the flag value for a non-existent register	Joshua Bakita	2023-07-18
	Also check for read success in get_runlist_iter().
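A minimal sketch of a read wrapper that turns a "no such register" flag value into an error instead of passing it on as data. The mask/value pair below is a placeholder chosen for illustration; this commit does not state which flag value(s) nvdebug actually compares against, and `checked_reg_read()` is hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/* Placeholder flag pattern: substitute the value(s) the GPU really returns
 * for a non-existent register. */
#define BAD_REG_MASK  0xffff0000u
#define BAD_REG_VALUE 0xbadf0000u

/* Hypothetical wrapper around a 32-bit register read whose raw result is
 * `raw`. On success, stores the value in *out and returns 0. If the value
 * matches the flag pattern, fails with -EIO and leaves *out untouched. */
static int checked_reg_read(uint32_t raw, uint32_t *out)
{
    if ((raw & BAD_REG_MASK) == BAD_REG_VALUE)
        return -EIO;
    *out = raw;
    return 0;
}
```

Callers such as an iterator setup routine would then bail out on a negative return rather than decoding the flag value as if it were real register contents.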
*	Fix addressing, zero-init, and compatibility bugs in a3fe378	Joshua Bakita	2023-07-03
	- Include the missing dereference needed to finish getting the address of the control register range
	- Add zero-initialization to the proc_ops structure in compat_ops to ensure that all intentionally unset fields remain unset
	- Set .llseek in all the file_operations structures, as recent kernels require this to be explicitly set
*	Hacky support for Linux 5.6+ and the Jetson AGX Orin	Joshua Bakita	2023-06-29
	Works around a change in the parameters to proc initialization functions via a hacky function which rewrites the layout. This also required making all the struct file_operations writable.
	Also start reducing dependency on nvgpu headers.
	Known issues:
	- An incorrect message is printed in the log after the module is loaded. It is unclear whether this is because the register detection logic is broken or because the layout of the data at NV_MC_BOOT_0 has changed.
	- Not tested
*	Fix deinitialization on non-PCIe devices	Joshua Bakita	2023-06-28
*	Include nvgpu headers	Joshua Bakita	2023-06-28
	These are needed to build on NVIDIA's Jetson boards for the time being. Only a couple of structs are required, so it should be fairly easy to remove this dependency at some point in the future.
*	Quick dump of current state for Ben to review.	Joshua Bakita	2023-06-22
*	Comment fixup and abort if runlist is stored in VRAM	Joshua Bakita	2021-10-03
*	Add APIs to enable/disable a channel and switch to or preempt a specific TSG	Joshua Bakita	2021-09-23
	Adds:
	- /proc/preempt_tsg, which takes a TSG ID
	- /proc/disable_channel, which takes a channel ID
	- /proc/enable_channel, which takes a channel ID
	- /proc/switch_to_tsg, which takes a TSG ID
	Also significantly expands the documentation and structs available in nvdebug.h.
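A userspace sketch of feeding a channel ID to the original GPU-global files listed above. It assumes channel IDs run from 0 to MAX_CHID inclusive, which is what the later correction of MAX_CHID from 512 to 511 implies. `parse_chid()` and `write_chid()` are hypothetical, and whether nvdebug validates its input this way is not stated in this commit.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_CHID 511 /* assumed largest valid channel ID (inclusive) */

/* Hypothetical: parse a decimal channel ID (optionally newline-terminated,
 * as `echo` produces) and reject malformed or out-of-range input. */
static int parse_chid(const char *s, unsigned int *chid)
{
    char *end;
    unsigned long v;

    errno = 0;
    v = strtoul(s, &end, 10);
    if (errno != 0 || end == s || (*end != '\0' && *end != '\n'))
        return -EINVAL;
    if (v > MAX_CHID)
        return -ERANGE; /* also catches "-1", which strtoul maps to ULONG_MAX */
    *chid = (unsigned int)v;
    return 0;
}

/* Equivalent of `echo <chid> > path`, e.g. path = "/proc/disable_channel". */
static int write_chid(const char *path, unsigned int chid)
{
    FILE *f;

    if (chid > MAX_CHID)
        return -ERANGE;
    f = fopen(path, "w");
    if (f == NULL)
        return -errno;
    if (fprintf(f, "%u\n", chid) < 0) {
        fclose(f);
        return -EIO;
    }
    return fclose(f) == 0 ? 0 : -EIO;
}
```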