Commit message | Author | Age
* Delete no-longer-needed nvgpu headers (HEAD, master, jbakita-wip) | Joshua Bakita | 2024-09-25
    The dependency on these was removed in commit 8340d234.
* Remove dependency on Jetson (nvgpu) driver internals | Joshua Bakita | 2024-09-25
    For integrated (Jetson) GPUs:
    - Directly retrieve and map GPU register region 0
    - Directly check GPU power-on state before a register read/write
    - Resume the GPU as needed for a register read/write

    Most nvgpu APIs can now be called on TX2+ integrated GPUs without first having to start some task on the GPU to make it non-suspended.

    Tested on Jetson TX1, TX2, Xavier, and Orin.
* Add a README | Joshua Bakita | 2024-09-25
    See also the RTAS'23 and RTAS'24 papers.
* Correct an off-by-one error in addr_to_pramin_mut() | Joshua Bakita | 2024-09-25
    Not known to cause any current bugs, but could cause the returned address to be inaccessible.
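The commit does not show the faulty comparison. As a hedged illustration of this class of bug, the sketch below checks whether an access lies inside the PRAMIN window using a half-open interval, so the first byte past the end of the window is rejected. The helper name, its signature, and the 1 MiB window size are assumptions for this sketch (the size follows public reverse-engineering documentation such as envytools), not nvdebug's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the PRAMIN window exposes 1 MiB of VRAM. */
#define EXAMPLE_PRAMIN_WINDOW_SIZE 0x100000ULL

/* Hypothetical helper: returns true if [addr, addr + len) lies entirely
 * inside the window [window_base, window_base + EXAMPLE_PRAMIN_WINDOW_SIZE). */
static bool example_in_pramin_window(uint64_t window_base, uint64_t addr,
				     uint64_t len)
{
	uint64_t off;

	if (addr < window_base || len > EXAMPLE_PRAMIN_WINDOW_SIZE)
		return false;
	/* Compare offsets rather than end addresses to avoid overflow. */
	off = addr - window_base;
	return off <= EXAMPLE_PRAMIN_WINDOW_SIZE - len;
}
```

A check written as `addr <= window_base + EXAMPLE_PRAMIN_WINDOW_SIZE` would accept `window_base + EXAMPLE_PRAMIN_WINDOW_SIZE`, which lies one byte past the window and so would yield an address that is not actually mapped.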
* Add IDs and names of new Hopper+ engines | Joshua Bakita | 2024-09-25
    Draws from NVIDIA's open-gpu-kernel-modules project.
* Return an error, rather than a flag value, from `nvdebug_reg32_read()` | Joshua Bakita | 2024-09-19
    This is used to back APIs like `num_gpcs`. Better to return an error to the caller, rather than -1 (which may be confused for an actual result).
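A minimal sketch of the calling convention this commit moves toward, assuming a read function that returns 0 or a negative errno and passes the register value back through a pointer. The callback type, the register offset, the field mask, and the mock readers are all hypothetical; the point is that a `num_gpcs`-style wrapper can then propagate the error instead of reporting a bogus count.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical read callback: returns 0 and stores the register value in
 * *out, or returns a negative errno if the read failed. */
typedef int (*example_read32_fn)(uint32_t offset, uint32_t *out);

/* Hypothetical register offset and field mask, for illustration only. */
#define EXAMPLE_GPC_COUNT_REG  0x1234u
#define EXAMPLE_GPC_COUNT_MASK 0x1fu

/* Returns the GPC count (>= 0) or a negative errno. A failed read cannot be
 * confused with a real count, as a returned flag value of -1 could be. */
static int example_num_gpcs(example_read32_fn read32)
{
	uint32_t val;
	int err = read32(EXAMPLE_GPC_COUNT_REG, &val);

	if (err)
		return err;
	return (int)(val & EXAMPLE_GPC_COUNT_MASK);
}

/* Mock register reads for demonstration. */
static int example_read_ok(uint32_t offset, uint32_t *out)
{
	(void)offset;
	*out = 0x27; /* masked with 0x1f, this gives 7 */
	return 0;
}

static int example_read_fail(uint32_t offset, uint32_t *out)
{
	(void)offset;
	(void)out;
	return -EIO;
}
```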
* Correctly check for read errors in the nvdebug_read* functions | Joshua Bakita | 2024-09-19
    Follows how NVIDIA's open-source GPU driver checks for bad reads.
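A hedged sketch of what such a check can look like. It assumes, to the best of our understanding of NVIDIA's open-gpu-kernel-modules, that an all-ones value means the read never reached the GPU (for example, the device fell off the bus) and that values of the form 0xBADxxxxx are priv-ring error signatures. The constant and function names are hypothetical, and nvdebug's exact checks may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed error signatures (hypothetical names):
 *  - all ones: the read did not reach the GPU;
 *  - 0xBADxxxxx: the priv ring rejected the access. */
#define EXAMPLE_READ_ALL_ONES 0xffffffffu
#define EXAMPLE_PRI_ERR_MASK  0xfff00000u
#define EXAMPLE_PRI_ERR_CODE  0xbad00000u

/* Returns true if a 32-bit register read result looks like an error. */
static bool example_read_failed(uint32_t val)
{
	return val == EXAMPLE_READ_ALL_ONES ||
	       (val & EXAMPLE_PRI_ERR_MASK) == EXAMPLE_PRI_ERR_CODE;
}
```

Such a heuristic can misfire on a register that legitimately holds one of these values, which is one reason to report the failure as an error rather than as a magic return value.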
* Ampere: disable/enable_channel, preempt/switch_to_tsg, and resubmit_runlist | Joshua Bakita | 2024-09-19
    **Modifies the user API from `echo 1 > /proc/gpuX/switch_to_tsg` to `echo 1 > /proc/gpuX/runlist0/switch_to_tsg` to switch to TSG 1 on runlist 0 on GPU X for pre-Ampere GPUs (for example).**

    Feature changes:
    - switch_to_tsg only makes sense on a per-runlist level. Before, this always operated on runlist0; this commit allows operating on any runlist by moving the API to the per-runlist paths.
    - On Ampere+, channel and TSG IDs are per-runlist, and no longer GPU-global. Consequently, the disable/enable_channel and preempt_tsg APIs have been moved from GPU-global to per-runlist paths on Ampere+.

    Bug fixes:
    - `preempt_runlist()` is now supported on Maxwell and Pascal.
    - `resubmit_runlist()` detects too-old GPUs.
    - MAX_CHID corrected from 512 to 511 and documented.
    - switch_to_tsg now includes a runlist resubmit, which appears to be necessary on Turing+ GPUs.

    Tested on GK104 (Quadro K5000), GM204 (GTX 970), GP106 (GTX 1060 3GB), GP104 (GTX 1080 Ti), GP10B (Jetson TX2), GV11B (Jetson Xavier), GV100 (Titan V), TU102 (RTX 2080 Ti), and AD102 (RTX 6000 Ada).
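As a hedged userspace sketch of the new per-runlist path, the helpers below perform the equivalent of `echo 1 > /proc/gpuX/runlist0/switch_to_tsg`. The function names are hypothetical; only the path layout comes from the commit message. Writing the file requires the nvdebug module to be loaded and may require elevated privileges.

```c
#include <stdio.h>
#include <string.h>

/* Build /proc/gpu<gpu>/runlist<runlist>/switch_to_tsg into buf.
 * Returns 0 on success, or -1 if the path did not fit. */
static int example_switch_to_tsg_path(char *buf, size_t len, int gpu,
				      int runlist)
{
	int n = snprintf(buf, len, "/proc/gpu%d/runlist%d/switch_to_tsg",
			 gpu, runlist);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Equivalent of `echo <tsg> > /proc/gpuX/runlistY/switch_to_tsg`.
 * Returns 0 on success, or -1 on any failure. */
static int example_switch_to_tsg(int gpu, int runlist, int tsg)
{
	char path[64];
	FILE *f;
	int ok;

	if (example_switch_to_tsg_path(path, sizeof(path), gpu, runlist))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%d\n", tsg) > 0;
	/* fclose() flushes the buffered write, so a rejected write can
	 * surface here rather than at fprintf(). */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```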
* Cleanup in nvdebug_entry.c | Joshua Bakita | 2024-09-16
    - Fix pointer corruption when `compat_ops()` is called more than once on the same struct.
    - Add support for detecting Jetson Orin on newer releases of L4T.
    - Reorder some initialization steps such that the order matches for both PCIe- and platform-bus devices.
    - Remove a duplicate check in `nvdebug_exit()`.

    Tested on ga10b (Jetson Orin) with L4T r36.3.
* Support printing the runlist and channels on Ampere+ GPUs | Joshua Bakita | 2024-09-16
    **Modifies the user API from `cat /proc/gpuX/runlist0` to `cat /proc/gpuX/runlist0/runlist` to support runlist-scoped registers**

    - Count number of runlists via Ampere-style PTOP parsing.
    - Create a ProcFS directory for each runlist, and create the runlist printing file in this directory.
    - Document the newly-added/-formatted Runlist RAM and Channel RAM registers.
    - Add a helper function `get_runlist_ram()` to obtain the location of each runlist's registers.
    - Support printing Ampere-style Channel RAM entries.

    Tested on Jetson Orin (ga10b), A100, H100, and AD102 (RTX 6000 Ada).
* Documentation and style cleanup. No functional changes. | Joshua Bakita | 2024-09-10
* Add Hopper and Blackwell support to bus.c | Joshua Bakita | 2024-09-10
    - Handle relocation of PRAMIN window configuration register
    - Handle new format for BAR2 configuration
    - Catch unreadable PRAMIN configuration

    Tested on A100, H100, and AD102 (RTX 6000 Ada).
* Style and documentation cleanup | Joshua Bakita | 2024-04-23
    - Document topology registers (PTOP) on Ampere+
    - Document graphics copy engine configuration registers
    - Move resubmit_runlist range checks into runlist.c
    - Miscellaneous spacing, typo, and minor documentation fixes
* Document Turing- Channel Ram and improve channel detail printer | Joshua Bakita | 2024-04-23
* Fix page-table traversal for version 1 page tables | Joshua Bakita | 2024-04-22
    - Correct V1 page table defines using information from kern_gmmu_fmt_gm10x.c in NVIDIA's open-gpu-kernel-modules repo.
    - Verify page table format in search_v1_page_directory()
    - Better, more controllable logging in mmu.c

    Tested on GM204 (GTX 970).
* Add /proc/gpu#/resubmit_runlist API | Joshua Bakita | 2024-04-21
    Resubmits the runlist in an identical configuration. Causes the runlist scheduler to:
    1. Reload and cache timeslice and scale values from TSGs.
    2. Restart scheduling from the head of the runlist [may cause a preempt to be scheduled for the currently-running task (?)].
    3. Address (?) an erratum on Turing where re-enabled channels are not always detected.

    Above behavior tested on GV100 and partially tested on TU102.
* Better logging and errno-defined errors in bus.c | Joshua Bakita | 2024-04-21
* Match aperture when doing page table searches | Joshua Bakita | 2024-04-21
    Previously, addresses of any aperture would match. This can result in very confusing results when attempting to verify that a mapping correctly exists.
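A minimal sketch of the matching rule, assuming a simplified entry that records a physical address and an aperture. The enum, struct, and function names are hypothetical (real aperture encodings depend on the page-table version); the point is that VRAM address 0x1000 and system-memory address 0x1000 are different physical locations, so both fields must agree.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical aperture values, for illustration only. */
enum example_aperture {
	EXAMPLE_APERTURE_INVALID = 0,
	EXAMPLE_APERTURE_VID_MEM,
	EXAMPLE_APERTURE_SYS_MEM_COHERENT,
	EXAMPLE_APERTURE_SYS_MEM_NONCOHERENT,
};

/* Hypothetical, simplified page-table entry. */
struct example_pte {
	uint64_t addr;                  /* physical address the entry targets */
	enum example_aperture aperture; /* which memory that address is in */
};

/* Match only valid entries whose address AND aperture both agree. */
static bool example_pte_matches(const struct example_pte *pte, uint64_t addr,
				enum example_aperture ap)
{
	return pte->aperture != EXAMPLE_APERTURE_INVALID &&
	       pte->aperture == ap && pte->addr == addr;
}
```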
* Always dereference the post-I/O-MMU-translated address in mmu.c | Joshua Bakita | 2024-04-21
    Previously, the code attempted to dereference addresses that had not been translated by the I/O MMU. This resulted in accessing effectively arbitrary physical addresses and caused erroneous results, particularly from search_page_directory().
* Misc non-functional cleanup and documentation | Joshua Bakita | 2024-04-13
* Add /proc/gpu#/local_memory API for getting VRAM size | Joshua Bakita | 2024-04-13
* Linux 5.17+ support and allow including nvdebug.h independently | Joshua Bakita | 2024-04-11
    - Move Linux-specific functions to nvdebug_linux.h and .c
    - Work around PDE_DATA() being renamed to pde_data() on Linux 5.17+
* Support page directories outside PRAMIN or in SYS_MEM | Joshua Bakita | 2024-04-11
    - Re-read PRAMIN configuration after update to verify change applies
    - Return a page_dir_config_t rather than just an address and page table version from `get_bar2_pdb()`.
    - Less verbose logging for MMU-related functions by default.
    - Perform all conversion from SYS_MEM/VID_MEM addresses to kernel addresses inside the translation functions, via the new function `pd_deref()`.
    - Support use of an I/O MMU, page tables/directories outside the current PRAMIN window, and page tables/directories arbitrarily located in SYS_MEM or VID_MEM on different levels of the same tree.
    - Heavily improve documentation and add references for Version 1 and Version 0 page tables.
    - Improve logging in `runlist.c` to include runlist and chip IDs.
    - Update all users of search_page_directory* to use the new API.
    - Remove now-unused supporting functions from `mmu.c`.

    Tested on GTX 970, GTX 1060 3GB, Jetson TX2, Titan V, Jetson Xavier, and RTX 2080 Ti.
* Correctly check return code from vram2PRAMIN() | Joshua Bakita | 2024-04-09
    Blindly using an invalid return address was resulting in undefined behavior due to traversal of non-page-table addresses as though they were part of the page table.
* Improve debugging messages for reverse page translation | Joshua Bakita | 2024-04-09
* Fix an off-by-one error in V2 reverse page table lookups | Joshua Bakita | 2024-04-09
    This would occasionally manifest as an inability to find the runlist page in BAR2, as only part of the page table was being traversed.

    Also includes non-functional changes to documentation, scoping, and structure layout.
* Correctly handle startup errors and fix gpc*_mask APIs | Joshua Bakita | 2024-04-09
    - Do not create gpc*_mask files on pre-Maxwell GPUs (tested unavailable on the K5000s)
    - Use correct register offsets for gpc*_mask files on Ampere+ GPUs
    - Document GPC and TPC count and fuse registers.
    - Correctly handle errors for creation of all ProcFS files
    - Remove unnecessary error-handling temp variables in nvdebug_entry
    - Misc naming, comment, and layout cleanup
* Return const pointers to string constants. | Joshua Bakita | 2024-04-09
    Also update how Instance Pointers are aligned in the runlist output to make them more easily distinguishable from other fields.
* Correctly use Volta-based runlist layout on the GV100 GPU | Joshua Bakita | 2024-04-08
    Fixes a bug that caused the runlist output to be garbled on the GV100 GPU (the Titan V).
* Heavily refactor runlist code for correctness and Turing support | Joshua Bakita | 2024-04-08
    - Support differently-formatted runlist registers on Turing
    - Support different runlist register offsets on Turing
    - Fix incorrect indenting when printing the runlist
    - Fix `preempt_tsg` and `switch_to_tsg` API implementations to correctly interface with the hardware (previously, they would try to disable scheduling for the last-updated runlist pointer, which was nonsense, and just an artifact of my early misunderstandings of how the NV_PFIFO_RUNLIST* registers worked).
    - Remove misused NV_PFIFO_RUNLIST and NV_PFIFO_RUNLIST_BASE registers
    - Refactor `runlist.c` to use the APIs from `bus.c`
* Add Jetson TX1 support | Joshua Bakita | 2024-04-08
* Put PRAMIN-pointer and BAR2-page-table-PRAMIN-pointer logic into bus.c | Joshua Bakita | 2024-04-08
    Derived from logic in `runlist.c` and `mmu.c`. The new functions are not directly used in this commit.
* Rework LCE<->PCE and GRCE->LCE configuration printing API (archive/saman63-wip) | Joshua Bakita | 2024-04-08
    Rather than up to dozens of individual files exposing part of each copy engine's configuration, have one file which exposes a unified view of the full topology.

    Example new output on RTX 2080 Ti:

        $ cat /proc/gpu0/copy_topology
        GRCE0 -> LCE04
        GRCE1 -> LCE03
        LCE02 -> PCE02
        LCE03 -> PCE03
        LCE04 -> PCE01

    Old output:

        $ tail -n 1 /proc/gpu0/lce_for_pce*
        ==> /proc/gpu0/lce_for_pce0 <==
        0xf
        ==> /proc/gpu0/lce_for_pce1 <==
        0x4
        ==> /proc/gpu0/lce_for_pce2 <==
        0x2
        ==> /proc/gpu0/lce_for_pce3 <==
        0x3
        $ tail -n 1 /proc/gpu0/shared_lce_for_grce*
        ==> /proc/gpu0/shared_lce_for_grce0 <==
        0x4
        ==> /proc/gpu0/shared_lce_for_grce1 <==
        0x3

    Specifically:
    - Add `copy_topology` API
    - Remove `shared_lce_for_grce#` and `lce_for_pce#` APIs
    - Move logic from `nvdebug_entry.c` to `copy_topology_procfs.c`
    - Do not print PCE or Shared LCE configuration if flagged absent
    - Refer to LCE0 and LCE1 as GRCE0 and GRCE1
    - Print by LCE ID, which is more helpful when attempting to trace how a given copy runlist maps to a physical copy engine.
    - Document two errata with CE registers

    Tested working on Pascal Integrated, Pascal, Volta Integrated, Volta, Turing, and Ampere Integrated on Linux 4.9 through 5.10.
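The move from per-PCE files to one per-LCE view is essentially an inversion of the `lce_for_pce` table. The sketch below is a hedged userspace illustration of that inversion, not the code in `copy_topology_procfs.c`. It assumes a register value of 0xf means "no LCE" (as `lce_for_pce0` shows in the old output) and, for simplicity, applies the same flag to the GRCE table; the function and constant names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumption: 0xf marks an absent mapping. */
#define EXAMPLE_LCE_NONE 0xfu

/* shared_lce_for_grce[g] is the LCE that GRCE g shares;
 * lce_for_pce[p] is the LCE that PCE p serves. Output is ordered by LCE ID. */
static void example_print_copy_topology(FILE *out,
					const uint32_t *shared_lce_for_grce,
					int num_grce,
					const uint32_t *lce_for_pce,
					int num_pce, int num_lce)
{
	for (int g = 0; g < num_grce; g++)
		if (shared_lce_for_grce[g] != EXAMPLE_LCE_NONE)
			fprintf(out, "GRCE%d -> LCE%02u\n", g,
				(unsigned)shared_lce_for_grce[g]);
	/* Invert the per-PCE table so lines are grouped by LCE. */
	for (int lce = 0; lce < num_lce; lce++)
		for (int p = 0; p < num_pce; p++)
			if (lce_for_pce[p] == (uint32_t)lce)
				fprintf(out, "LCE%02d -> PCE%02d\n", lce, p);
}
```

Given the register values from the old output (GRCE: 0x4, 0x3; PCE: 0xf, 0x4, 0x2, 0x3) and five LCEs, this prints the same five lines as the RTX 2080 Ti `copy_topology` example.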
* Expand support for printing LCE<->PCE and GRCE->LCE configuration (rtas24-ae) | Joshua J Bakita | 2023-11-08
    Tested working on Pascal, Volta, Volta Integrated, Turing, Ampere, and Ada.

    Also clean up minor spacing issues and an errantly added file (nvdebug.mod), and fix some inconsistencies with upstream.
* Created new read function in device_info for GRCE mappings and Pascal LCE mappings | Saman Sahebi | 2023-10-29
* patched issues with GPU compatibility for CE_MAP | Saman Sahebi | 2023-10-29
* implemented GRCE to PCE mapping | Saman Sahebi | 2023-10-29
* Updated for loop formatting and fixed a casting error | Saman Sahebi | 2023-10-29
* added offsets for lce mapping in nvdebug.h and code to read lce for each pce in nvdebug_entry.c | Saman Sahebi | 2023-10-29
* Add nvdebug.mod to .gitignore | Joshua Bakita | 2023-10-29
* Support printing device info on Ampere+ GPUs. By Benjamin Hadad IV | Joshua Bakita | 2023-10-29
    commit c3d6f2c852eb046e9d4f4f1e6527b52c746b2693
    Author: Joshua Bakita <bakitajoshua@gmail.com>
    Date:   Sun Oct 29 14:37:51 2023 -0400

        Print Ampere+ device_info fields with correct offsets/widths

        Everything now has been checked against how nvgpu handles it

    commit b70849d1ce67a58f9f69b37dc62122f789f4cdf7
    Author: Joshua Bakita <jbakita@cs.unc.edu>
    Date:   Wed Sep 20 14:27:38 2023 -0400

        Rearrange, fix an off-by-one error, and remove an unused define

        The code in nvdebug.h has been rearranged to enable an easier merge against the jbakita-wip branch.

    commit 51f808e092846a60ea6c88ea3a1d2e349c92977b
    Author: Joshua Bakita <jbakita@cs.unc.edu>
    Date:   Wed Sep 20 13:09:17 2023 -0400

        Bug fixes and cleanup for new device_info logic

        - Update comments to match new structure
        - Make show() function idempotent
        - Skip empty table entries without aborting
        - Include names for new engine types
        - Add warning log messages for skipped table entries
        - Remove non-functional runlist file creation logic for Ampere+

    commit 1d7adc3be1aef5ac9c144bb24008fd8cc5d688a5
    Author: Benjamin Hadad IV <bh4@unc.edu>
    Date:   Sat Aug 19 12:47:18 2023 -0400

        Debugging changes made to restore functionality following refactoring.

        - Debugged data display errors.
        - Debugged crash bugs.
        - Debugged memory issue.

    commit 9e6cc03cdf736fbd817ed53fa9a7f506bc91a244
    Author: Benjamin Hadad IV <bh4@unc.edu>
    Date:   Wed Aug 16 22:00:20 2023 -0400

        A variety of changes have been made as part of the code review.

        - Functions have been consolidated.
        - Code was clarified and tidied up overall.
        - Unnecessary elements were removed.

    commit 845960fc1b15995fdbd6d61c384567652a150bc4
    Author: Benjamin Hadad IV <bh4@unc.edu>
    Date:   Fri Jul 28 11:39:28 2023 -0400

        Refactored various systems and debugged minor issues

        - Added device_info_iter
        - Merged functions in device_info_procfs.c
        - Separated device_info data structs by version in nvdebug.h
        - Fixed issue with device_info runlist ID data

    commit 8a57aaeba41c43233c323d7e0fc8bf1a81ebc65e
    Author: Benjamin Hadad IV <bh4@unc.edu>
    Date:   Fri Jul 21 11:32:51 2023 -0400

        I have updated the ptop_device_info_t comment in nvdebug.h.

    commit 33c915f08f5dc63674b158ecc18897494256a6d0
    Author: Benjamin Hadad IV <bh4@unc.edu>
    Date:   Wed Jul 19 13:02:52 2023 -0400

        Debugged device_info functionality

        - Fixed device_info crash bugs
        - Made further edits to display functionality
        - Refactored code to enhance readability

    commit bfb4dcf0e78954c0163f3a06a5a088c4d1b437a8
    Author: Benjamin Hadad IV <bh4@unc.edu>
    Date:   Thu Jul 13 12:13:17 2023 -0400

        This commit is to update the repo for display during a meeting.

        - Added an Ampere version of the device info data.
        - Added Ampere versions of auxiliary functions.
        - Modified display functions to accommodate Ampere data.
        - Made other various small modifications.

    commit 068e7f4e7208d6c9250ad72208e0b36fd9fdf2f6
    Merge: 3725b15 073e897
    Author: Benjamin Hadad IV <bh4@unc.edu>
    Date:   Mon Jul 10 12:39:12 2023 -0400

        Merge branch 'jbakita-wip' of ssh://rtsrv.cs.unc.edu/public/nvdebug into wip

        I am merging Mr. Bakita's changes (046d7d2) into this repository.

    commit 3725b15d5da3e06ef202045d710aa5f15eb72fcc
    Author: Benjamin Hadad IV <bh4@unc.edu>
    Date:   Mon Jul 3 04:30:54 2023 -0400

        I modified nvdebug.h for Ampere.
* Rather than abort, print placeholders for missing runlist channels | Joshua Bakita | 2023-10-29
    Sometimes such "malformed" runlists appear on the TX2, yet they seem to work fine, so support printing them in full.
* Support PRAMIN-based runlist access fallback (optional; on by default) | Joshua Bakita | 2023-10-29
    Using this may be hazardous: we don't know whether some of the GPU drivers use this after initial bring-up. If they do, and we race with them in setting it, or we unexpectedly change it under them, arbitrary state corruption could occur. This is only entirely safe to use if you don't trust the GPU state after the first use of this fallback.

    In limited experiments against the `nvgpu` (Tegra) and `nvidia` (closed-source discrete) drivers, no ill side effects have yet been observed, but still please use with caution.
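The hazard comes from the fact that PRAMIN is a single, GPU-global window into VRAM: reaching an arbitrary VRAM address means reprogramming where that window points. The sketch below shows the address arithmetic under assumptions taken from nouveau and envytools for pre-Hopper GPUs (Hopper+ relocated the window configuration register, per the bus.c commit above): the window appears at BAR0 offset 0x700000, and the register at BAR0 offset 0x1700 holds the VRAM base shifted right by 16. All names are hypothetical, and the sketch ignores the register's target-aperture bits.

```c
#include <stdint.h>

/* Assumed pre-Hopper layout (from nouveau/envytools). */
#define EXAMPLE_PRAMIN_BAR0_OFFSET 0x700000u
#define EXAMPLE_BAR0_WINDOW_REG    0x1700u
#define EXAMPLE_WINDOW_SHIFT       16 /* window base has 64 KiB granularity */

struct example_pramin_access {
	uint32_t window_reg_val; /* value to write to EXAMPLE_BAR0_WINDOW_REG */
	uint32_t bar0_offset;    /* where vram_addr then appears in BAR0 */
};

/* Point the window at the 64 KiB-aligned block containing vram_addr and
 * compute where that byte shows up inside BAR0. */
static struct example_pramin_access example_pramin_for(uint64_t vram_addr)
{
	struct example_pramin_access a;

	a.window_reg_val = (uint32_t)(vram_addr >> EXAMPLE_WINDOW_SHIFT);
	a.bar0_offset = EXAMPLE_PRAMIN_BAR0_OFFSET +
			(uint32_t)(vram_addr & ((1u << EXAMPLE_WINDOW_SHIFT) - 1));
	return a;
}
```

Writing `window_reg_val` changes state that another driver may also depend on, which is the race the commit message warns about.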
* Zero-out unused/invalid fields in g_nvdebug_device on Tegra devices | Joshua Bakita | 2023-10-29
    Previously, unloading the module could cause a segfault on Tegra, as pcid would be uninitialized and possibly non-zero, causing us to attempt PCIe-device-style deinitialization on a non-PCIe device.
* Include <linux/seq_file.h> in nvdebug.h and sort includes | Joshua Bakita | 2023-10-29
    Fixes the build for some kernel versions where this is no longer transitively included.
* Update includes to L4T r32.7.4 and drop nvgpu/gk20a.h dependency | Joshua Bakita | 2023-10-29
    Also add instructions for updating `include/`. These files are now only needed to build on Linux 4.9-based Tegra platforms.
* Only log interrupts if explicitly enabled | Joshua Bakita | 2023-10-05
* Add architecture decode for Ada, Hopper, and Blackwell | Joshua Bakita | 2023-09-01
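A hedged sketch of an architecture decode that returns pointers to string constants as `const char *`, so the compiler rejects any attempt to write through them. The numbering is an assumption based on the architecture values used by nouveau and NVIDIA's open-gpu-kernel-modules (for example GV100 = 0x140 gives 0x14, GH100 = 0x180 gives 0x18, AD102 = 0x192 gives 0x19); the function name is hypothetical and nvdebug's own table may differ.

```c
#include <string.h>

/* Map an NVIDIA architecture ID (chip ID >> 4) to a marketing name. */
static const char *example_arch_name(unsigned int arch)
{
	switch (arch) {
	case 0x0e: case 0x0f: case 0x10:
		return "Kepler";
	case 0x11: case 0x12:
		return "Maxwell";
	case 0x13:
		return "Pascal";
	case 0x14: case 0x15:
		return "Volta";
	case 0x16:
		return "Turing";
	case 0x17:
		return "Ampere";
	case 0x18:
		return "Hopper";
	case 0x19:
		return "Ada";
	case 0x1a: case 0x1b:
		return "Blackwell";
	default:
		return "Unknown";
	}
}
```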
* Improve copy engine register documentation in nvdebug.h + cleanup | Joshua Bakita | 2023-07-20
* Fail reads which return the flag value for a non-existent register | Joshua Bakita | 2023-07-18
    Also check for read success in get_runlist_iter().