**Modifies the user API from `cat /proc/gpuX/runlist0` to
`cat /proc/gpuX/runlist0/runlist` to support runlist-scoped
registers**
- Count number of runlists via Ampere-style PTOP parsing.
- Create a ProcFS directory for each runlist, and create the runlist
printing file in this directory.
- Document the newly-added/-formatted Runlist RAM and Channel RAM
registers.
- Add a helper function `get_runlist_ram()` to obtain the location
of each runlist's registers.
- Support printing Ampere-style Channel RAM entries.
Tested on Jetson Orin (ga10b), A100, H100, and AD102 (RTX 6000 Ada)
- Handle relocation of PRAMIN window configuration register
- Handle new format for BAR2 configuration
- Catch unreadable PRAMIN configuration
Tested on A100, H100, and AD102 (RTX 6000 Ada).
- Document topology registers (PTOP) on Ampere+
- Document graphics copy engine configuration registers
- Move resubmit_runlist range checks into runlist.c
- Miscellaneous spacing, typo, and minor documentation fixes
- Correct V1 page table defines using information from
kern_gmmu_fmt_gm10x.c in NVIDIA's open-gpu-kernel-modules repo.
- Verify page table format in search_v1_page_directory()
- Better, more controllable logging in mmu.c
Tested on GM204 (GTX 970).
Resubmits the runlist in an identical configuration. Causes the
runlist scheduler to:
1. Reload and cache timeslice and scale values from TSGs.
2. Restart scheduling from the head of the runlist [may cause a
preempt to be scheduled for the currently-running task (?)].
3. Address (?) an erratum on Turing where re-enabled channels are
not always detected.
Above behavior tested on GV100 and partially tested on TU102.
Previously, addresses of any aperture would match. This can result
in very confusing results when attempting to verify that a mapping
correctly exists.
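A minimal sketch of an aperture-aware match, assuming a hypothetical `struct pte_view` and `enum aperture` (neither is nvdebug's actual type). The point is that the address and the aperture must both agree before a mapping counts as found.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical aperture tags, named after the VID_MEM/SYS_MEM terms used
 * elsewhere in this log; not nvdebug's real definitions. */
enum aperture {
	AP_INVALID,
	AP_VID_MEM,
	AP_SYS_MEM_COHERENT,
	AP_SYS_MEM_NONCOHERENT,
};

/* Hypothetical view of one page-table entry: target address plus aperture. */
struct pte_view {
	uint64_t addr;
	enum aperture target;
};

/* Only report a match when BOTH the address and the aperture agree.
 * Comparing the address alone would let, e.g., a SYS_MEM page at 0x1000
 * satisfy a search for VID_MEM address 0x1000. */
static bool pte_matches(const struct pte_view *pte, uint64_t addr,
                        enum aperture target)
{
	return pte->target == target && pte->addr == addr;
}
```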
- Move Linux-specific functions to nvdebug_linux.h and .c
- Work around PDE_DATA() being renamed to pde_data() on Linux 5.17+
- Re-read PRAMIN configuration after update to verify change applies
- Return a page_dir_config_t rather than just an address and page
table version from `get_bar2_pdb()`.
- Less verbose logging for MMU-related functions by default.
- Perform all conversion from SYS_MEM/VID_MEM addresses to kernel
addresses inside the translation functions, via the new function
`pd_deref()`.
- Support use of an I/O MMU, page tables/directories outside the
current PRAMIN window, and page tables/directories arbitrarily
located in SYS_MEM or VID_MEM on different levels of the same tree.
- Heavily improve documentation and add references for Version 1 and
Version 0 page tables.
- Improve logging in `runlist.c` to include runlist and chip IDs.
- Update all users of search_page_directory* to use the new API.
- Remove now-unused supporting functions from `mmu.c`.
Tested on GTX 970, GTX 1060 3GB, Jetson TX2, Titan V, Jetson Xavier,
and RTX 2080 Ti.
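A rough sketch of what centralizing SYS_MEM/VID_MEM-to-kernel address conversion in one function can look like. Everything here is hypothetical: `pd_deref_sketch()`, `struct xlate_ctx`, and the flat `sysmem_base` buffer standing in for an I/O MMU mapping. The 1 MiB PRAMIN window size is also an assumption. VID_MEM addresses resolve only while they fall inside the current window, so a NULL return is the cue for a caller to move the window and retry.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical aperture tags for a page-directory pointer. */
enum pd_target { PD_INVALID, PD_VID_MEM, PD_SYS_MEM };

/* Assumed size of the VID_MEM window exposed through PRAMIN. */
#define PRAMIN_WINDOW_SIZE (1ull << 20)

/* Hypothetical translation context. A real driver would derive these from
 * its BAR0 mapping and the DMA/IOMMU layer; here they are plain fields. */
struct xlate_ctx {
	uint8_t *pramin_base;      /* CPU-visible base of the PRAMIN window */
	uint64_t window_vram_base; /* VID_MEM address the window points at */
	uint8_t *sysmem_base;      /* stand-in for an IOVA -> kernel mapping */
	uint64_t sysmem_size;
};

/* Turn a (target, address) pair into a CPU pointer, or NULL if the address
 * is not reachable with the current configuration. */
static void *pd_deref_sketch(const struct xlate_ctx *ctx,
                             enum pd_target target, uint64_t addr)
{
	switch (target) {
	case PD_VID_MEM:
		/* Must lie inside [window_vram_base, window_vram_base + size) */
		if (addr < ctx->window_vram_base ||
		    addr - ctx->window_vram_base >= PRAMIN_WINDOW_SIZE)
			return NULL;
		return ctx->pramin_base + (addr - ctx->window_vram_base);
	case PD_SYS_MEM:
		if (addr >= ctx->sysmem_size)
			return NULL;
		return ctx->sysmem_base + addr;
	default:
		return NULL;
	}
}
```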
Blindly using an invalid return address was resulting in undefined
behavior due to traversal of non-page-table addresses as though they
were part of the page table.
This would occasionally manifest as an inability to find the runlist
page in BAR2, as only part of the page table was being traversed.
Also includes non-functional changes to documentation, scoping, and
structure layout.
- Do not create gpc*_mask files on pre-Maxwell GPUs (tested as
unavailable on the K5000s)
- Use correct register offsets for gpc*_mask files on Ampere+ GPUs
- Document GPC and TPC count and fuse registers.
- Correctly handle errors for creation of all ProcFS files
- Remove unnecessary error-handling temp variables in nvdebug_entry
- Misc naming, comment, and layout cleanup
Also update how Instance Pointers are aligned in the runlist output
to make them more easily distinguishable from other fields.
Fixes a bug that caused the runlist output to be garbled on the GV100
GPU (the Titan V).
- Support differently-formatted runlist registers on Turing
- Support different runlist register offsets on Turing
- Fix incorrect indenting when printing the runlist
- Fix `preempt_tsg` and `switch_to_tsg` API implementations to
correctly interface with the hardware (previously, they would try
to disable scheduling for the last-updated runlist pointer, which
was nonsense, and just an artifact of my early misunderstandings
of how the NV_PFIFO_RUNLIST* registers worked).
- Remove misused NV_PFIFO_RUNLIST and NV_PFIFO_RUNLIST_BASE registers
- Refactor `runlist.c` to use the APIs from `bus.c`
Derived from logic in `runlist.c` and `mmu.c`. The new functions
are not directly used in this commit.
Rather than up to dozens of individual files exposing part of each
copy engine's configuration, have one file which exposes a unified
view of the full topology. Example new output on RTX 2080 Ti:
$ cat /proc/gpu0/copy_topology
GRCE0 -> LCE04
GRCE1 -> LCE03
LCE02 -> PCE02
LCE03 -> PCE03
LCE04 -> PCE01
Old output:
$ tail -n 1 /proc/gpu0/lce_for_pce*
==> /proc/gpu0/lce_for_pce0 <==
0xf
==> /proc/gpu0/lce_for_pce1 <==
0x4
==> /proc/gpu0/lce_for_pce2 <==
0x2
==> /proc/gpu0/lce_for_pce3 <==
0x3
$ tail -n 1 /proc/gpu0/shared_lce_for_grce*
==> /proc/gpu0/shared_lce_for_grce0 <==
0x4
==> /proc/gpu0/shared_lce_for_grce1 <==
0x3
Specifically:
- Add `copy_topology` API
- Remove `shared_lce_for_grce#` and `lce_for_pce#` APIs
- Move logic from `nvdebug_entry.c` to `copy_topology_procfs.c`
- Do not print PCE or Shared LCE configuration if flagged absent
- Refer to LCE0 and LCE1 as GRCE0 and GRCE1
- Print by LCE ID, which is more helpful when attempting to trace
how a given copy runlist maps to a physical copy engine.
- Document two errata with CE registers
Tested working on Pascal Integrated, Pascal, Volta Integrated,
Volta, Turing, and Ampere Integrated on Linux 4.9 through 5.10.
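The new output is essentially an inversion of the per-PCE registers, ordered by LCE ID. The sketch below reproduces the RTX 2080 Ti example from the raw values shown in the old output, assuming that a PCE reading 0xf has no LCE assigned (inferred from PCE0 not appearing in the new output). `render_copy_topology()` is a hypothetical name, not the function in `copy_topology_procfs.c`.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: an LCE field of 0xf means the PCE has no LCE assigned. */
#define PCE_UNASSIGNED 0xf

/*
 * Render a unified copy-engine topology into `out`, NUL-terminated.
 *   lce_for_pce[p] : LCE ID driving PCE p, or PCE_UNASSIGNED
 *   shared_lce[g]  : LCE shared by GRCE g (the shared_lce_for_grce value)
 * LCE IDs below num_grce are printed as GRCE#. Returns the number of
 * characters written, or -1 if `out` is too small.
 */
static int render_copy_topology(char *out, size_t len,
                                const uint8_t *lce_for_pce, unsigned num_pce,
                                const uint8_t *shared_lce, unsigned num_grce,
                                unsigned num_lce)
{
	size_t pos = 0;
	int n;

	if (len == 0)
		return -1;
	out[0] = '\0';

	/* GRCEs first: each is backed by another LCE's PCE. */
	for (unsigned g = 0; g < num_grce; g++) {
		n = snprintf(out + pos, len - pos, "GRCE%u -> LCE%02u\n",
		             g, (unsigned)shared_lce[g]);
		if (n < 0 || (size_t)n >= len - pos)
			return -1;
		pos += (size_t)n;
	}
	/* Then every LCE -> PCE link, ordered by LCE ID, then PCE ID. */
	for (unsigned lce = 0; lce < num_lce; lce++) {
		for (unsigned pce = 0; pce < num_pce; pce++) {
			if (lce_for_pce[pce] == PCE_UNASSIGNED || lce_for_pce[pce] != lce)
				continue;
			if (lce < num_grce)
				n = snprintf(out + pos, len - pos, "GRCE%u -> PCE%02u\n", lce, pce);
			else
				n = snprintf(out + pos, len - pos, "LCE%02u -> PCE%02u\n", lce, pce);
			if (n < 0 || (size_t)n >= len - pos)
				return -1;
			pos += (size_t)n;
		}
	}
	return (int)pos;
}
```

A ProcFS implementation would more likely emit each line with `seq_printf()`; the flat buffer just keeps the sketch testable outside the kernel.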
Tested working on Pascal, Volta, Volta Integrated, Turing, Ampere,
and Ada.
Also clean up minor spacing issues, an errantly added file
(nvdebug.mod), and fix some inconsistencies with upstream.
mappings
in nvdebug_entry.c
commit c3d6f2c852eb046e9d4f4f1e6527b52c746b2693
Author: Joshua Bakita <bakitajoshua@gmail.com>
Date: Sun Oct 29 14:37:51 2023 -0400
Print Ampere+ device_info fields with correct offsets/widths
Everything now has been checked against how nvgpu handles it
commit b70849d1ce67a58f9f69b37dc62122f789f4cdf7
Author: Joshua Bakita <jbakita@cs.unc.edu>
Date: Wed Sep 20 14:27:38 2023 -0400
Rearrange, fix an off-by-one error, and remove an unused define
The code in nvdebug.h has been rearranged to enable an easier merge
against the jbakita-wip branch.
commit 51f808e092846a60ea6c88ea3a1d2e349c92977b
Author: Joshua Bakita <jbakita@cs.unc.edu>
Date: Wed Sep 20 13:09:17 2023 -0400
Bug fixes and cleanup for new device_info logic
- Update comments to match new structure
- Make show() function idempotent
- Skip empty table entries without aborting
- Include names for new engine types
- Add warning log messages for skipped table entries
- Remove non-functional runlist file creation logic for Ampere+
commit 1d7adc3be1aef5ac9c144bb24008fd8cc5d688a5
Author: Benjamin Hadad IV <bh4@unc.edu>
Date: Sat Aug 19 12:47:18 2023 -0400
Debugging changes made to restore functionality following refactoring.
- Debugged data display errors.
- Debugged crash bugs.
- Debugged memory issue.
commit 9e6cc03cdf736fbd817ed53fa9a7f506bc91a244
Author: Benjamin Hadad IV <bh4@unc.edu>
Date: Wed Aug 16 22:00:20 2023 -0400
A variety of changes have been made as part of the code review.
- Functions have been consolidated.
- Code was clarified and tidied up overall.
- Unnecessary elements were removed.
commit 845960fc1b15995fdbd6d61c384567652a150bc4
Author: Benjamin Hadad IV <bh4@unc.edu>
Date: Fri Jul 28 11:39:28 2023 -0400
Refactored various systems and debugged minor issues
- Added device_info_iter
- Merged functions in device_info_procfs.c
- Separated device_info data structs by version in nvdebug.h
- Fixed issue with device_info runlist ID data
commit 8a57aaeba41c43233c323d7e0fc8bf1a81ebc65e
Author: Benjamin Hadad IV <bh4@unc.edu>
Date: Fri Jul 21 11:32:51 2023 -0400
I have updated the ptop_device_info_t comment in nvdebug.h.
commit 33c915f08f5dc63674b158ecc18897494256a6d0
Author: Benjamin Hadad IV <bh4@unc.edu>
Date: Wed Jul 19 13:02:52 2023 -0400
Debugged device_info functionality
- Fixed device_info crash bugs
- Made further edits to display functionality
- Refactored code to enhance readability
commit bfb4dcf0e78954c0163f3a06a5a088c4d1b437a8
Author: Benjamin Hadad IV <bh4@unc.edu>
Date: Thu Jul 13 12:13:17 2023 -0400
This commit is to update the repo for display during a meeting.
- Added an Ampere version of the device info data.
- Added Ampere versions of auxiliary functions.
- Modified display functions to accommodate Ampere data.
- Made other various small modifications.
commit 068e7f4e7208d6c9250ad72208e0b36fd9fdf2f6
Merge: 3725b15 073e897
Author: Benjamin Hadad IV <bh4@unc.edu>
Date: Mon Jul 10 12:39:12 2023 -0400
Merge branch 'jbakita-wip' of ssh://rtsrv.cs.unc.edu/public/nvdebug into wip
I am merging Mr. Bakita's changes (046d7d2) into this repository.
commit 3725b15d5da3e06ef202045d710aa5f15eb72fcc
Author: Benjamin Hadad IV <bh4@unc.edu>
Date: Mon Jul 3 04:30:54 2023 -0400
I modified nvdebug.h for Ampere.
Fixes the build for some kernel versions where this is no longer
transitively included.
Also add instructions for updating `include/`. These files are now
only needed to build on Linux 4.9-based Tegra platforms.
Also check for read success in get_runlist_iter().
- Include the missing dereference to finish getting the address of
the control register range
- Add zero-initialization to the proc_ops structure in copat_ops to
ensure that all intentionally unset fields remain unset
- Set .llseek in all the file_operations structures, as recent
kernels require this to be explicitly set
Works around a change in the parameters to proc initialization functions
via a hacky function which rewrites the layout. This also required
making all the struct file_operations writable.
Also start reducing dependency on nvgpu headers.
Known issues:
- Incorrect message printed in log after module is loaded. Unclear
if this is because the register detection logic is broken, or if
the layout of the data at NV_MC_BOOT_0 has changed.
- Not tested
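For context, Linux 5.6 switched `proc_create()` and related functions from `struct file_operations` to `struct proc_ops`. A field-by-field conversion, written as user-space C with deliberately tiny hypothetical stand-ins for both structs, might look like the sketch below. It copies into a separate struct, whereas the commit rewrites the existing structures in place (hence making them writable). The compound literal zero-initializes every member it does not name.

```c
#include <stddef.h>

/* Hypothetical callback types, far simpler than the kernel's. */
typedef long (*read_fn)(void *file, char *buf, size_t len);
typedef int (*open_fn)(void *inode, void *file);

/* Tiny stand-ins for struct file_operations and struct proc_ops. */
struct mini_file_operations {
	void *owner;
	open_fn open;
	read_fn read;
};

struct mini_proc_ops {
	unsigned int proc_flags; /* no file_operations equivalent; stays 0 */
	open_fn proc_open;
	read_fn proc_read;
};

/* Build the new-style table from the old one. Members not named in the
 * compound literal are zero-initialized (NULL for pointers). */
static void fops_to_proc_ops(const struct mini_file_operations *f,
                             struct mini_proc_ops *p)
{
	*p = (struct mini_proc_ops){
		.proc_open = f->open,
		.proc_read = f->read,
	};
}

/* Example callbacks, used only to demonstrate the conversion. */
static int example_open(void *inode, void *file)
{
	(void)inode;
	(void)file;
	return 0;
}

static long example_read(void *file, char *buf, size_t len)
{
	(void)file;
	(void)buf;
	return (long)len;
}
```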
Adds:
- /proc/preempt_tsg which takes a TSG ID
- /proc/disable_channel which takes a channel ID
- /proc/enable_channel which takes a channel ID
- /proc/switch_to_tsg which takes a TSG ID
Also significantly expands documentation and structs available in
nvdebug.h.
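Each of these files receives an ID written from user space, for example `echo 3 > /proc/preempt_tsg`. Below is a user-space sketch of the parsing and range checking such a write handler needs. `parse_id()` is hypothetical, and in-kernel code would more likely call `kstrtouint_from_user()`. The maximum ID is a parameter because the valid range depends on the GPU. With base 0, a leading `0x` selects hex and a leading `0` selects octal, as with `strtoul()`.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Parse an ID written to a control file. Accepts decimal, 0x-prefixed
 * hex, or 0-prefixed octal, optionally followed by one newline (as
 * `echo` produces). Returns 0 and sets *id on success, -EINVAL for
 * malformed input, or -ERANGE if the value exceeds max_id. */
static int parse_id(const char *buf, size_t len, unsigned max_id,
                    unsigned *id)
{
	char tmp[16];
	char *end;
	unsigned long val;

	if (len == 0 || len >= sizeof(tmp))
		return -EINVAL;
	memcpy(tmp, buf, len);
	tmp[len] = '\0';
	if (tmp[len - 1] == '\n')
		tmp[len - 1] = '\0';

	errno = 0;
	val = strtoul(tmp, &end, 0);
	if (end == tmp || *end != '\0')
		return -EINVAL; /* no digits, or trailing garbage */
	if (errno == ERANGE || val > max_id)
		return -ERANGE;
	*id = (unsigned)val;
	return 0;
}
```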
- The sequence file infrastructure prior to kernel version 4.19
has a bug in the retry code when the write buffer overflows that
causes our private iterator state to be corrupted. Work around
this by tracking some info out-of-band.
- Now supports including detailed channel status information from
channel RAM when printing the runlist.
- Adds helper function to probe for and return struct gk20a*.
`cat /proc/runlist` to print the current runlist.
Also break nvdebug.c into nvdebug_entry.c, runlist.c, and
runlist_procfs.c.
Supports accessing and printing the runlist on the Jetson Xavier to
dmesg. May work on other Jetson boards. Currently requires the nvgpu
headers from NVIDIA's Linux4Tegra (L4T) source tree.