| Commit message (Collapse) | Author | Age |
|
|
|
| |
See also the RTAS'23 and RTAS'24 papers.
|
|
|
|
|
| |
Not known to cause any current bugs, but could cause the returned
address to be inaccessible.
|
|
|
|
| |
Draws from NVIDIA's open-gpu-kernel-modules project.
|
|
|
|
|
|
| |
This is used to back APIs like `num_gpcs`. Better to return an error
to the caller, rather than -1 (which may be confused for an actual
result).
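One way to keep an error distinct from a real count is to return a status code and hand the value back through an out-parameter. The following is a minimal userspace sketch of that pattern; `read_gpc_count()` and `fake_reg_read()` are hypothetical names chosen for illustration and are not taken from the nvdebug sources.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for a register read that can fail. */
static int fake_reg_read(int should_fail, uint32_t *val)
{
	if (should_fail)
		return -EIO;
	*val = 6; /* pretend the hardware reports 6 GPCs */
	return 0;
}

/* Returns 0 on success or a negative errno on failure. *count is only
 * written on success, so an error code can never be read as a count. */
int read_gpc_count(int should_fail, uint32_t *count)
{
	uint32_t raw;
	int err = fake_reg_read(should_fail, &raw);

	if (err)
		return err;
	*count = raw;
	return 0;
}
```

With this shape, a caller that ignores the return value still never sees a sentinel such as -1 in the count itself.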
|
|
|
|
| |
Follows how NVIDIA's open-source GPU driver checks for bad reads.
|
| |
**Modifies the user API from `echo 1 > /proc/gpuX/switch_to_tsg` to
`echo 1 > /proc/gpuX/runlist0/switch_to_tsg` to switch to TSG 1 on
runlist 0 on GPU X for pre-Ampere GPUs (for example).**
Feature changes:
- switch_to_tsg only makes sense on a per-runlist level. Before, this
always operated on runlist0; this commit allows operating on any
runlist by moving the API to the per-runlist paths.
- On Ampere+, channel and TSG IDs are per-runlist, and no longer
GPU-global. Consequently, the disable/enable_channel and
preempt_tsg APIs have been moved from GPU-global to per-runlist
paths on Ampere+.
Bug fixes:
- `preempt_runlist()` is now supported on Maxwell and Pascal.
- `resubmit_runlist()` detects too-old GPUs.
- MAX_CHID corrected from 512 to 511 and documented.
- switch_to_tsg now includes a runlist resubmit, which appears to be
necessary on Turing+ GPUs.
Tested on GK104 (Quadro K5000), GM204 (GTX 970), GP106 (GTX 1060 3GB),
GP104 (GTX 1080 Ti), GP10B (Jetson TX2), GV11B (Jetson Xavier), GV100
(Titan V), TU102 (RTX 2080 Ti), and AD102 (RTX 6000 Ada).
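For tooling that drives this interface, the per-runlist path can be built from the GPU and runlist indices. This is a small userspace sketch under that assumption; the helper name `switch_to_tsg_path()` is hypothetical, and only the path layout comes from the commit message above.

```c
#include <stdio.h>
#include <stddef.h>

/* Writes "/proc/gpu<gpu>/runlist<runlist>/switch_to_tsg" into buf.
 * Returns 0 on success, or -1 if an index is negative or buf is too small. */
int switch_to_tsg_path(char *buf, size_t len, int gpu, int runlist)
{
	int n;

	if (gpu < 0 || runlist < 0)
		return -1;
	n = snprintf(buf, len, "/proc/gpu%d/runlist%d/switch_to_tsg",
		     gpu, runlist);
	/* snprintf reports the length it wanted; treat truncation as failure */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Writing a TSG ID into the resulting file, as in the `echo 1 > ...` form above, is left to the caller.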
|
|
|
|
|
|
|
|
|
|
|
| |
- Fix pointer corruption when `compat_ops()` is called more than
once on the same struct.
- Add support for detecting Jetson Orin on newer releases of L4T.
- Reorder some initialization steps such that the order matches for
both PCIe- and platform-bus devices.
- Remove a duplicate check in `nvdebug_exit()`.
Tested on ga10b (Jetson Orin) with L4T r36.3.
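The double-call hazard can be illustrated in userspace. The sketch below is not the fix used in nvdebug; it shows one possible guard, assuming a hypothetical two-slot structure whose in-place rewrite is a swap, so that an unguarded second call would undo the first.

```c
#include <stddef.h>

#define MAX_CONVERTED 32

/* Hypothetical two-field layout standing in for an ops structure. */
struct ops_layout {
	void *slot_a;
	void *slot_b;
};

/* Addresses of structures that have already been rewritten. */
static const struct ops_layout *converted[MAX_CONVERTED];
static size_t n_converted;

/* Swaps the two slots in place, standing in for a layout rewrite.
 * The lookup makes a repeated call on the same struct a no-op. */
struct ops_layout *convert_once(struct ops_layout *ops)
{
	size_t i;
	void *tmp;

	for (i = 0; i < n_converted; i++)
		if (converted[i] == ops)
			return ops; /* already rewritten: leave untouched */
	if (n_converted == MAX_CONVERTED)
		return NULL; /* registry full: refuse rather than guess */
	tmp = ops->slot_a;
	ops->slot_a = ops->slot_b;
	ops->slot_b = tmp;
	converted[n_converted++] = ops;
	return ops;
}
```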
|
| |
**Modifies the user API from `cat /proc/gpuX/runlist0` to
`cat /proc/gpuX/runlist0/runlist` to support runlist-scoped
registers**
- Count number of runlists via Ampere-style PTOP parsing.
- Create a ProcFS directory for each runlist, and create the runlist
printing file in this directory.
- Document the newly-added/-formatted Runlist RAM and Channel RAM
registers.
- Add a helper function `get_runlist_ram()` to obtain the location
of each runlist's registers.
- Support printing Ampere-style Channel RAM entries.
Tested on Jetson Orin (ga10b), A100, H100, and AD102 (RTX 6000 Ada)
|
| |
|
|
|
|
|
|
|
|
| |
- Handle relocation of PRAMIN window configuration register
- Handle new format for BAR2 configuration
- Catch unreadable PRAMIN configuration
Tested on A100, H100, and AD102 (RTX 6000 Ada).
|
|
|
|
|
|
|
| |
- Document topology registers (PTOP) on Ampere+
- Document graphics copy engine configuration registers
- Move resubmit_runlist range checks into runlist.c
- Miscellaneous spacing, typo, and minor documentation fixes
|
| |
|
|
|
|
|
|
|
|
|
| |
- Correct V1 page table defines using information from
kern_gmmu_fmt_gm10x.c in NVIDIA's open-gpu-kernel-modules repo.
- Verify page table format in search_v1_page_directory()
- Better, more controllable logging in mmu.c
Tested on GM204 (GTX 970).
|
|
|
|
|
|
|
|
|
|
|
|
| |
Resubmits the runlist in an identical configuration. Causes the
runlist scheduler to:
1. Reload and cache timeslice and scale values from TSGs.
2. Restart scheduling from the head of the runlist [may cause a
preempt to be scheduled for the currently-running task (?)].
3. Address (?) an erratum on Turing where re-enabled channels are
not always detected.
Above behavior tested on GV100 and partially tested on TU102.
|
| |
|
|
|
|
|
|
| |
Previously, addresses of any aperture would match. This can result
in very confusing results when attempting to verify that a mapping
correctly exists.
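The difference can be sketched as a comparison that requires both the address and the aperture to agree. The types and names below (`enum aperture`, `struct mapping`, `mapping_matches()`) are hypothetical and only illustrate the idea; they are not the structures used in `mmu.c`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical aperture tags for this sketch. */
enum aperture { APERTURE_VID_MEM, APERTURE_SYS_MEM };

/* Hypothetical decoded page table entry target. */
struct mapping {
	uint64_t addr;
	enum aperture aperture;
};

/* A match requires the same address in the same aperture. Comparing
 * addr alone would also accept an equal address in a different memory. */
bool mapping_matches(const struct mapping *entry, uint64_t addr,
		     enum aperture aperture)
{
	return entry->addr == addr && entry->aperture == aperture;
}
```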
|
|
|
|
|
|
|
| |
Previously attempted to dereference non-I/O-MMU translated
addresses, which resulted in accessing effectively arbitrary
physical addresses, causing erroneous results from
search_page_directory() in particular.
|
| |
|
| |
|
|
|
|
|
| |
- Move Linux-specific functions to nvdebug_linux.h and .c
- Work around PDE_DATA() being renamed to pde_data() on Linux 5.17+
|
| |
- Re-read PRAMIN configuration after update to verify change applies
- Return a page_dir_config_t rather than just an address and page
table version from `get_bar2_pdb()`.
- Less verbose logging for MMU-related functions by default.
- Perform all conversion from SYS_MEM/VID_MEM addresses to kernel
addresses inside the translation functions, via the new function
`pd_deref()`.
- Support use of an I/O MMU, page tables/directories outside the
current PRAMIN window, and page tables/directories arbitrarily
located in SYS_MEM or VID_MEM on different levels of the same tree.
- Heavily improve documentation and add references for Version 1 and
Version 0 page tables.
- Improve logging in `runlist.c` to include runlist and chip IDs.
- Update all users of search_page_directory* to use the new API.
- Remove now-unused supporting functions from `mmu.c`.
Tested on GTX 970, GTX 1060 3GB, Jetson TX2, Titan V, Jetson Xavier,
and RTX 2080 Ti.
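The re-read step in the first item above follows a general write-then-verify pattern. Below is a minimal sketch against a simulated register; `struct fake_reg`, `reg_write()`, `reg_read()`, and `set_and_verify()` are hypothetical stand-ins rather than the accessors nvdebug uses.

```c
#include <errno.h>
#include <stdint.h>

/* Simulated register: writes are silently dropped unless writable is set. */
struct fake_reg {
	uint32_t value;
	int writable;
};

static void reg_write(struct fake_reg *r, uint32_t v)
{
	if (r->writable)
		r->value = v;
}

static uint32_t reg_read(const struct fake_reg *r)
{
	return r->value;
}

/* Writes v, then reads the register back. Returns 0 if the new value
 * is visible and -EIO if the write did not take effect. */
int set_and_verify(struct fake_reg *r, uint32_t v)
{
	reg_write(r, v);
	return reg_read(r) == v ? 0 : -EIO;
}
```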
|
|
|
|
|
|
| |
Blindly using an invalid return address was resulting in undefined
behavior due to traversal of non-page-table addresses as though they
were part of the page table.
|
| |
|
|
|
|
|
|
|
|
| |
This would occasionally manifest as an inability to find the runlist
page in BAR2, as only part of the page table was being traversed.
Also includes non-functional changes to documentation, scoping, and
structure layout.
|
|
|
|
|
|
|
|
|
|
| |
- Do not create gpc*_mask files on pre-Maxwell GPUs (tested
unavailable on the K5000s)
- Use correct register offsets for gpc*_mask files on Ampere+ GPUs
- Document GPC and TPC count and fuse registers.
- Correctly handle errors for creation of all ProcFS files
- Remove unnecessary error-handling temp variables in nvdebug_entry
- Misc naming, comment, and layout cleanup
|
|
|
|
|
| |
Also update how Instance Pointers are aligned in the runlist output
to make them more easily distinguishable from other fields.
|
|
|
|
|
| |
Fixes a bug that caused the runlist output to be garbled on the GV100
GPU (the Titan V).
|
| |
- Support differently-formatted runlist registers on Turing
- Support different runlist register offsets on Turing
- Fix incorrect indenting when printing the runlist
- Fix `preempt_tsg` and `switch_to_tsg` API implementations to
correctly interface with the hardware (previously, they would try
to disable scheduling for the last-updated runlist pointer, which
was nonsense, and just an artifact of my early misunderstandings
of how the NV_PFIFO_RUNLIST* registers worked).
- Remove misused NV_PFIFO_RUNLIST and NV_PFIFO_RUNLIST_BASE registers
- Refactor `runlist.c` to use the APIs from `bus.c`
|
| |
|
|
|
|
|
| |
Derived from logic in `runlist.c` and `mmu.c`. The new functions
are not directly used in this commit.
|
| |
Rather than up to dozens of individual files exposing part of each
copy engine's configuration, have one file which exposes a unified
view of the full topology. Example new output on RTX 2080 Ti:
$ cat /proc/gpu0/copy_topology
GRCE0 -> LCE04
GRCE1 -> LCE03
LCE02 -> PCE02
LCE03 -> PCE03
LCE04 -> PCE01
Old output:
$ tail -n 1 /proc/gpu0/lce_for_pce*
==> /proc/gpu0/lce_for_pce0 <==
0xf
==> /proc/gpu0/lce_for_pce1 <==
0x4
==> /proc/gpu0/lce_for_pce2 <==
0x2
==> /proc/gpu0/lce_for_pce3 <==
0x3
$ tail -n 1 /proc/gpu0/shared_lce_for_grce*
==> /proc/gpu0/shared_lce_for_grce0 <==
0x4
==> /proc/gpu0/shared_lce_for_grce1 <==
0x3
Specifically:
- Add `copy_topology` API
- Remove `shared_lce_for_grce#` and `lce_for_pce#` APIs
- Move logic from `nvdebug_entry.c` to `copy_topology_procfs.c`
- Do not print PCE or Shared LCE configuration if flagged absent
- Refer to LCE0 and LCE1 as GRCE0 and GRCE1
- Print by LCE ID, which is more helpful when attempting to trace
how a given copy runlist maps to a physical copy engine.
- Document two errata with CE registers
Tested working on Pascal Integrated, Pascal, Volta Integrated,
Volta, Turing, and Ampere Integrated on Linux 4.9 through 5.10.
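To show how the old per-file values relate to the unified view, the sketch below rebuilds the new RTX 2080 Ti output from the old one in userspace. It rests on two assumptions that the commit message does not state outright: that 0xf is the "absent" marker for both kinds of entry, and that plain LCE IDs start at 2 because LCE0 and LCE1 are shown as GRCE0 and GRCE1. The function name `format_copy_topology()` is hypothetical, and how the real file labels a PCE mapped to LCE0 or LCE1 is not covered here.

```c
#include <stdio.h>
#include <stddef.h>

#define LCE_ABSENT 0xfu      /* assumption: 0xf flags "no LCE assigned" */
#define FIRST_PLAIN_LCE 2u   /* assumption: LCE0/LCE1 appear as GRCE0/GRCE1 */

/* Formats GRCE -> LCE lines, then LCE -> PCE lines ordered by LCE ID.
 * Returns 0 on success, or -1 if buf is too small. */
int format_copy_topology(char *buf, size_t len,
			 const unsigned *shared_lce_for_grce, unsigned n_grce,
			 const unsigned *lce_for_pce, unsigned n_pce)
{
	size_t off = 0;
	unsigned grce, lce, pce;
	int n;

	if (len == 0)
		return -1;
	buf[0] = '\0';
	for (grce = 0; grce < n_grce; grce++) {
		if (shared_lce_for_grce[grce] == LCE_ABSENT)
			continue; /* flagged absent: print nothing */
		n = snprintf(buf + off, len - off, "GRCE%u -> LCE%02u\n",
			     grce, shared_lce_for_grce[grce]);
		if (n < 0 || (size_t)n >= len - off)
			return -1;
		off += (size_t)n;
	}
	/* Outer loop over LCE IDs so that the output is sorted by LCE. */
	for (lce = FIRST_PLAIN_LCE; lce < LCE_ABSENT; lce++) {
		for (pce = 0; pce < n_pce; pce++) {
			if (lce_for_pce[pce] != lce)
				continue;
			n = snprintf(buf + off, len - off,
				     "LCE%02u -> PCE%02u\n", lce, pce);
			if (n < 0 || (size_t)n >= len - off)
				return -1;
			off += (size_t)n;
		}
	}
	return 0;
}
```

Feeding in the old values shown above (`lce_for_pce` = 0xf, 0x4, 0x2, 0x3 and `shared_lce_for_grce` = 0x4, 0x3) yields the five lines of the new `copy_topology` example.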
|
|
|
|
|
|
|
|
| |
Tested working on Pascal, Volta, Volta Integrated, Turing, Ampere,
and Ada.
Also clean up minor spacing issues, an errantly added file
(nvdebug.mod), and fix some inconsistencies with upstream.
|
|
|
|
| |
mappings
|
| |
|
| |
|
| |
|
|
|
|
| |
in nvdebug_entry.c
|
| |
|
| |
commit c3d6f2c852eb046e9d4f4f1e6527b52c746b2693
Author: Joshua Bakita <bakitajoshua@gmail.com>
Date: Sun Oct 29 14:37:51 2023 -0400
Print Ampere+ device_info fields with correct offsets/widths
Everything now has been checked against how nvgpu handles it
commit b70849d1ce67a58f9f69b37dc62122f789f4cdf7
Author: Joshua Bakita <jbakita@cs.unc.edu>
Date: Wed Sep 20 14:27:38 2023 -0400
Rearrange, fix an off-by-one error, and remove an unused define
The code in nvdebug.h has been rearranged to enable an easier merge
against the jbakita-wip branch.
commit 51f808e092846a60ea6c88ea3a1d2e349c92977b
Author: Joshua Bakita <jbakita@cs.unc.edu>
Date: Wed Sep 20 13:09:17 2023 -0400
Bug fixes and cleanup for new device_info logic
- Update comments to match new structure
- Make show() function idempotent
- Skip empty table entries without aborting
- Include names for new engine types
- Add warning log messages for skipped table entries
- Remove non-functional runlist file creation logic for Ampere+
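The "skip without aborting" item can be pictured as a loop that uses `continue` where an earlier version might have stopped. This is a simplified userspace sketch; treating an all-zero word as an empty entry is an assumption made for the example, and `count_valid_entries()` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts non-empty entries and reports how many were skipped.
 * Assumption for this sketch: an all-zero word marks an empty entry. */
size_t count_valid_entries(const uint32_t *table, size_t n, size_t *skipped)
{
	size_t i, valid = 0;

	*skipped = 0;
	for (i = 0; i < n; i++) {
		if (table[i] == 0) {
			(*skipped)++; /* a warning would be logged here */
			continue;     /* keep going: later entries may be valid */
		}
		valid++;
	}
	return valid;
}
```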
commit 1d7adc3be1aef5ac9c144bb24008fd8cc5d688a5
Author: Benjamin Hadad IV <bh4@unc.edu>
Date: Sat Aug 19 12:47:18 2023 -0400
Debugging changes made to restore functionality following refactoring.
- Debugged data display errors.
- Debugged crash bugs.
- Debugged memory issue.
commit 9e6cc03cdf736fbd817ed53fa9a7f506bc91a244
Author: Benjamin Hadad IV <bh4@unc.edu>
Date: Wed Aug 16 22:00:20 2023 -0400
A variety of changes have been made as part of the code review.
- Functions have been consolidated.
- Code was clarified and tidied up overall.
- Unnecessary elements were removed.
commit 845960fc1b15995fdbd6d61c384567652a150bc4
Author: Benjamin Hadad IV <bh4@unc.edu>
Date: Fri Jul 28 11:39:28 2023 -0400
Refactored various systems and debugged minor issues
- Added device_info_iter
- Merged functions in device_info_procfs.c
- Separated device_info data structs by version in nvdebug.h
- Fixed issue with device_info runlist ID data
commit 8a57aaeba41c43233c323d7e0fc8bf1a81ebc65e
Author: Benjamin Hadad IV <bh4@unc.edu>
Date: Fri Jul 21 11:32:51 2023 -0400
I have updated the ptop_device_info_t comment in nvdebug.h.
commit 33c915f08f5dc63674b158ecc18897494256a6d0
Author: Benjamin Hadad IV <bh4@unc.edu>
Date: Wed Jul 19 13:02:52 2023 -0400
Debugged device_info functionality
- Fixed device_info crash bugs
- Made further edits to display functionality
- Refactored code to enhance readability
commit bfb4dcf0e78954c0163f3a06a5a088c4d1b437a8
Author: Benjamin Hadad IV <bh4@unc.edu>
Date: Thu Jul 13 12:13:17 2023 -0400
This commit is to update the repo for display during a meeting.
- Added an Ampere version of the device info data.
- Added Ampere versions of auxiliary functions.
- Modified display functions to accommodate Ampere data.
- Made other various small modifications.
commit 068e7f4e7208d6c9250ad72208e0b36fd9fdf2f6
Merge: 3725b15 073e897
Author: Benjamin Hadad IV <bh4@unc.edu>
Date: Mon Jul 10 12:39:12 2023 -0400
Merge branch 'jbakita-wip' of ssh://rtsrv.cs.unc.edu/public/nvdebug into wip
I am merging Mr. Bakita's changes (046d7d2) into this repository.
commit 3725b15d5da3e06ef202045d710aa5f15eb72fcc
Author: Benjamin Hadad IV <bh4@unc.edu>
Date: Mon Jul 3 04:30:54 2023 -0400
I modified nvdebug.h for Ampere.
|
|
|
|
|
| |
Sometimes such "malformed" runlists appear on the TX2, yet they
seem to work fine, so support printing them in full.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Using this may be hazardous---we don't know if some of the GPU drivers
use this after initial bring-up. If they do, and we race with them in
setting it, or we unexpectedly change it under them, arbitrary state
corruption could occur.
This is only entirely safe to use if you don't trust the GPU state
after the first use of this fallback. In limited experiments vs the
`nvgpu` (Tegra) and `nvidia` (closed-source discrete) drivers, no
ill side effects have yet been observed, but still please use with
caution.
|
|
|
|
|
|
| |
Previously, unloading the module could cause a segfault on Tegra,
as pcid would be uninitialized and possibly non-zero, causing us to
attempt PCIe-device-style deinitialization on a non-PCIe device.
|
|
|
|
|
| |
Fixes the build for some kernel versions where this is no longer
transitively included.
|
|
|
|
|
| |
Also add instructions for updating `include/`. These files are now
only needed to build on Linux 4.9-based Tegra platforms.
|
| |
|
| |
|
| |
|
|
|
|
| |
Also check for read success in get_runlist_iter().
|
|
|
|
|
|
|
|
|
| |
- Include missing dereference to finish getting the address of
the control register range
- Add zero-initialization to the proc_ops structure in compat_ops to
ensure that all intentionally unset fields remain unset
- Set .llseek in all the file_operations structures, as recent
kernels require this to be explicitly set
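The zero-initialization item reflects a general C idiom: assigning a compound literal with designated initializers sets every unnamed member to zero or NULL. The sketch below uses a hypothetical `struct demo_ops` rather than the kernel's `struct proc_ops`.

```c
#include <stddef.h>

/* Hypothetical three-callback structure standing in for an ops table. */
struct demo_ops {
	int (*open)(void);
	int (*read)(void);
	int (*release)(void);
};

static int demo_open(void)
{
	return 0;
}

/* Overwrites *ops so that only .open is set. Members not named in the
 * initializer are guaranteed to be NULL, whatever *ops held before. */
void init_demo_ops(struct demo_ops *ops)
{
	*ops = (struct demo_ops){ .open = demo_open };
}
```

Without the full assignment, setting only `ops->open` would leave whatever stale pointers were already in the other fields.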
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Works around change in parameters to proc initialization functions
via a hacky function which rewrites the layout. This also required
making all the struct file_operations writable.
Also start reducing dependency on nvgpu headers.
Known issues:
- Incorrect message printed in log after module is loaded. Unclear
if this is because the register detection logic is broken, or if
the layout of the data at NV_MC_BOOT_0 has changed.
- Not tested
|