| Commit message (Collapse) | Author | Age |
|
| |
- Do not create gpc*_mask files on pre-Maxwell GPUs (tested
unavailable on the K5000s)
- Use correct register offsets for gpc*_mask files on Ampere+ GPUs
- Document GPC and TPC count and fuse registers.
- Correctly handle errors for creation of all ProcFS files
- Remove unnecessary error-handling temp variables in nvdebug_entry
- Misc naming, comment, and layout cleanup
|
|
|
|
|
| |
Also update how Instance Pointers are aligned in the runlist output
to make them more easily distinguishable from other fields.
|
|
|
|
|
| |
Fixes a bug that caused the runlist output to be garbled on the GV100
GPU (the Titan V).
|
| |
- Support differently-formatted runlist registers on Turing
- Support different runlist register offsets on Turing
- Fix incorrect indenting when printing the runlist
- Fix `preempt_tsg` and `switch_to_tsg` API implementations to
correctly interface with the hardware (previously, they would try
to disable scheduling for the last-updated runlist pointer, which
was nonsense, and just an artifact of my early misunderstandings
of how the NV_PFIFO_RUNLIST* registers worked).
- Remove misused NV_PFIFO_RUNLIST and NV_PFIFO_RUNLIST_BASE registers
- Refactor `runlist.c` to use the APIs from `bus.c`
|
| |
|
|
|
|
|
| |
Derived from logic in `runlist.c` and `mmu.c`. The new functions
are not directly used in this commit.
|
| |
Rather than up to dozens of individual files exposing part of each
copy engine's configuration, have one file which exposes a unified
view of the full topology. Example new output on RTX 2080 Ti:
$ cat /proc/gpu0/copy_topology
GRCE0 -> LCE04
GRCE1 -> LCE03
LCE02 -> PCE02
LCE03 -> PCE03
LCE04 -> PCE01
Old output:
$ tail -n 1 /proc/gpu0/lce_for_pce*
==> /proc/gpu0/lce_for_pce0 <==
0xf
==> /proc/gpu0/lce_for_pce1 <==
0x4
==> /proc/gpu0/lce_for_pce2 <==
0x2
==> /proc/gpu0/lce_for_pce3 <==
0x3
$ tail -n 1 /proc/gpu0/shared_lce_for_grce*
==> /proc/gpu0/shared_lce_for_grce0 <==
0x4
==> /proc/gpu0/shared_lce_for_grce1 <==
0x3
Specifically:
- Add `copy_topology` API
- Remove `shared_lce_for_grce#` and `lce_for_pce#` APIs
- Move logic from `nvdebug_entry.c` to `copy_topology_procfs.c`
- Do not print PCE or Shared LCE configuration if flagged absent
- Refer to LCE0 and LCE1 as GRCE0 and GRCE1
- Print by LCE ID, which is more helpful when attempting to trace
how a given copy runlist maps to a physical copy engine.
- Document two errata with CE registers
Tested working on Pascal Integrated, Pascal, Volta Integrated,
Volta, Turing, and Ampere Integrated on Linux 4.9 through 5.10.
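The tracing use case described above can be sketched in userspace C. This is a minimal illustration, not part of nvdebug: it assumes each line of `copy_topology` has the form `SRC -> DST` as in the example output, and that arrows can be chained (GRCE to LCE to PCE). The names `struct edge`, `parse_topology`, and `resolve` are hypothetical helpers introduced only for this sketch.

```c
#include <stdio.h>
#include <string.h>

#define MAX_EDGES 32

/* One "SRC -> DST" line of copy_topology output (assumed format) */
struct edge {
    char src[16];
    char dst[16];
};

/* Parse whitespace-separated "SRC -> DST" entries from text.
 * Returns the number of edges stored in edges[]. */
static int parse_topology(const char *text, struct edge *edges, int max_edges)
{
    int n = 0;
    int consumed = 0;
    while (n < max_edges &&
           sscanf(text, " %15s -> %15s%n",
                  edges[n].src, edges[n].dst, &consumed) == 2) {
        text += consumed;
        n++;
    }
    return n;
}

/* Follow edges starting at name until no outgoing edge remains, and
 * copy the final name into out. The hop count is bounded by n so a
 * cyclic input cannot loop forever. */
static void resolve(const struct edge *edges, int n, const char *name,
                    char *out, size_t out_len)
{
    snprintf(out, out_len, "%s", name);
    for (int hop = 0; hop < n; hop++) {
        int found = 0;
        for (int i = 0; i < n; i++) {
            if (strcmp(edges[i].src, out) == 0) {
                snprintf(out, out_len, "%s", edges[i].dst);
                found = 1;
                break;
            }
        }
        if (!found)
            break;
    }
}
```

Fed the RTX 2080 Ti example output above, resolving `GRCE0` follows `GRCE0 -> LCE04` and then `LCE04 -> PCE01`, so under these assumptions it ends at `PCE01`.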
|
| |
Tested working on Pascal, Volta, Volta Integrated, Turing, Ampere,
and Ada.
Also clean up minor spacing issues, an errantly added file
(nvdebug.mod), and fix some inconsistencies with upstream.
|
|
|
|
| |
mappings
|
| |
|
| |
|
| |
|
|
|
|
| |
in nvdebug_entry.c
|
| |
|
| |
commit c3d6f2c852eb046e9d4f4f1e6527b52c746b2693
Author: Joshua Bakita <bakitajoshua@gmail.com>
Date: Sun Oct 29 14:37:51 2023 -0400
Print Ampere+ device_info fields with correct offsets/widths
Everything now has been checked against how nvgpu handles it
commit b70849d1ce67a58f9f69b37dc62122f789f4cdf7
Author: Joshua Bakita <jbakita@cs.unc.edu>
Date: Wed Sep 20 14:27:38 2023 -0400
Rearrange, fix an off-by-one error, and remove an unused define
The code in nvdebug.h has been rearranged to enable an easier merge
against the jbakita-wip branch.
commit 51f808e092846a60ea6c88ea3a1d2e349c92977b
Author: Joshua Bakita <jbakita@cs.unc.edu>
Date: Wed Sep 20 13:09:17 2023 -0400
Bug fixes and cleanup for new device_info logic
- Update comments to match new structure
- Make show() function idempotent
- Skip empty table entries without aborting
- Include names for new engine types
- Add warning log messages for skipped table entries
- Remove non-functional runlist file creation logic for Ampere+
commit 1d7adc3be1aef5ac9c144bb24008fd8cc5d688a5
Author: Benjamin Hadad IV <bh4@unc.edu>
Date: Sat Aug 19 12:47:18 2023 -0400
Debugging changes made to restore functionality following refactoring.
- Debugged data display errors.
- Debugged crash bugs.
- Debugged memory issue.
commit 9e6cc03cdf736fbd817ed53fa9a7f506bc91a244
Author: Benjamin Hadad IV <bh4@unc.edu>
Date: Wed Aug 16 22:00:20 2023 -0400
A variety of changes have been made as part of the code review.
- Functions have been consolidated.
- Code was clarified and tidied up overall.
- Unnecessary elements were removed.
commit 845960fc1b15995fdbd6d61c384567652a150bc4
Author: Benjamin Hadad IV <bh4@unc.edu>
Date: Fri Jul 28 11:39:28 2023 -0400
Refactored various systems and debugged minor issues
- Added device_info_iter
- Merged functions in device_info_procfs.c
- Separated device_info data structs by version in nvdebug.h
- Fixed issue with device_info runlist ID data
commit 8a57aaeba41c43233c323d7e0fc8bf1a81ebc65e
Author: Benjamin Hadad IV <bh4@unc.edu>
Date: Fri Jul 21 11:32:51 2023 -0400
I have updated the ptop_device_info_t comment in nvdebug.h.
commit 33c915f08f5dc63674b158ecc18897494256a6d0
Author: Benjamin Hadad IV <bh4@unc.edu>
Date: Wed Jul 19 13:02:52 2023 -0400
Debugged device_info functionality
- Fixed device_info crash bugs
- Made further edits to display functionality
- Refactored code to enhance readability
commit bfb4dcf0e78954c0163f3a06a5a088c4d1b437a8
Author: Benjamin Hadad IV <bh4@unc.edu>
Date: Thu Jul 13 12:13:17 2023 -0400
This commit is to update the repo for display during a meeting.
- Added an Ampere version of the device info data.
- Added Ampere versions of auxiliary functions.
- Modified display functions to accommodate Ampere data.
- Made other various small modifications.
commit 068e7f4e7208d6c9250ad72208e0b36fd9fdf2f6
Merge: 3725b15 073e897
Author: Benjamin Hadad IV <bh4@unc.edu>
Date: Mon Jul 10 12:39:12 2023 -0400
Merge branch 'jbakita-wip' of ssh://rtsrv.cs.unc.edu/public/nvdebug into wip
I am merging Mr. Bakita's changes (046d7d2) into this repository.
commit 3725b15d5da3e06ef202045d710aa5f15eb72fcc
Author: Benjamin Hadad IV <bh4@unc.edu>
Date: Mon Jul 3 04:30:54 2023 -0400
I modified nvdebug.h for Ampere.
|
|
|
|
|
| |
Sometimes such "malformed" runlists appear on the TX2, yet they
seem to work fine, so support printing them in full.
|
| |
Using this may be hazardous---we don't know if some of the GPU drivers
use this after initial bring-up. If they do, and we race with them in
setting it, or we unexpectedly change it under them, arbitrary state
corruption could occur.
This is only entirely safe if you treat the GPU state as untrusted
after the first use of this fallback. In limited experiments vs the
`nvgpu` (Tegra) and `nvidia` (closed-source discrete) drivers, no
ill side effects have yet been observed, but still please use with
caution.
|
|
|
|
|
|
| |
Previously, unloading the module could cause a segfault on Tegra,
as pcid would be uninitialized and possibly non-zero, causing us to
attempt PCIe-device-style deinitialization on a non-PCIe device.
|
|
|
|
|
| |
Fixes the build for some kernel versions where this is no longer
transitively included.
|
|
|
|
|
| |
Also add instructions for updating `include/`. These files are now
only needed to build on Linux 4.9-based Tegra platforms.
|
| |
|
| |
|
| |
|
|
|
|
| |
Also check for read success in get_runlist_iter().
|
| |
- Include missing dereference to finish getting the address of
the control register range
- Add zero-initialization to the proc_ops structure in copat_ops to
ensure that all intentionally unset fields remain unset
- Set .llseek in all the file_operations structures, as recent
kernels require this to be explicitly set
|
| |
Works around change in parameters to proc initialization functions
via a hacky function which rewrites the layout. This also required
making all the struct file_operations writable.
Also start reducing dependency on nvgpu headers.
Known issues:
- Incorrect message printed in log after module is loaded. Unclear
if this is because the register detection logic is broken, or if
the layout of the data at NV_MC_BOOT_0 has changed.
- Not tested
|
| |
|
|
|
|
|
|
| |
These are needed to build on NVIDIA's Jetson boards for the time
being. Only a couple structs are required, so it should be fairly
easy to remove this dependency at some point in the future.
|
| |
|
| |
|
| |
Adds:
- /proc/preempt_tsg which takes a TSG ID
- /proc/disable_channel which takes a channel ID
- /proc/enable_channel which takes a channel ID
- /proc/switch_to_tsg which takes a TSG ID
Also significantly expands documentation and structs available in
nvdebug.h.
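A userspace caller for these files might look like the sketch below. The commit message only says each file "takes" an ID, so writing the ID as a decimal string is an assumption here, as is the helper name `write_id`. Root privileges are likely required to write the files.

```c
#include <stdio.h>

/* Write id as a decimal string plus newline to path.
 * ASSUMPTION: the procfs files accept a decimal ID; the source does
 * not state the expected format.
 * Returns 0 on success, -1 on failure. */
static int write_id(const char *path, unsigned int id)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fprintf(f, "%u\n", id) > 0;
    /* fclose() flushes the buffer, so a write rejected by the kernel
     * is reported here rather than by fprintf() */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

For example, `write_id("/proc/preempt_tsg", 5)` would request preemption of TSG 5, and `write_id("/proc/disable_channel", 12)` would disable channel 12, provided the decimal-format assumption holds.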
|
| |
|
| |
- The sequence file infrastructure prior to kernel version 4.19
has a bug in the retry code when the write buffer overflows that
causes our private iterator state to be corrupted. Work around
this by tracking some info out-of-band.
- Now supports including detailed channel status information from
channel RAM when printing the runlist.
- Adds helper function to probe for and return struct gk20a*.
|
|
|
|
|
|
|
| |
`cat /proc/runlist` to print the current runlist.
Also break nvdebug.c into nvdebug_entry.c, runlist.c, and
runlist_procfs.c.
|
|
Supports accessing and printing the runlist on the Jetson Xavier to
dmesg. May work on other Jetson boards. Currently requires the nvgpu
headers from NVIDIA's Linux4Tegra (L4T) source tree.
|