# NVIDIA GPU Preemption

MVP: Preempt current work on the GPU on the Jetson Xavier

Summary of approach: Create a new runlist that excludes the current work and point the GPU at it

1. Obtain the current runlist
2. Copy the runlist to a new location, skipping the TSG of the preemption target
3. Write the new runlist address to NV_PFIFO_RUNLIST, which will preempt the current work
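
Steps 1 and 2 can be sketched in plain C as below. This is a minimal sketch under an assumed, simplified entry layout: 16-byte entries, where a TSG header entry carries its ID and the number of channel entries that follow it. The `rl_entry` struct, the field positions, and all function names are illustrative assumptions rather than nvgpu definitions, so check them against the runlist format of the target chip.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ASSUMED layout: every runlist entry is four 32-bit words. */
struct rl_entry {
    uint32_t w[4];
};

/* ASSUMED: bit 0 of word 0 is the entry type (1 = TSG header, 0 = channel). */
int rl_is_tsg(const struct rl_entry *e)
{
    return (e->w[0] & 1u) == 1u;
}

/* ASSUMED: low 8 bits of word 1 hold the number of channel entries
 * that follow a TSG header. */
uint32_t rl_tsg_len(const struct rl_entry *e)
{
    return e->w[1] & 0xffu;
}

/* ASSUMED: low 12 bits of word 2 hold the TSG ID. */
uint32_t rl_tsg_id(const struct rl_entry *e)
{
    return e->w[2] & 0xfffu;
}

/*
 * Copy src[0..n_src) into dst, dropping the TSG header with ID skip_tsgid
 * together with the channel entries that belong to it.
 * dst must have room for n_src entries. Returns the entry count written.
 */
size_t runlist_copy_skip_tsg(const struct rl_entry *src, size_t n_src,
                             struct rl_entry *dst, uint32_t skip_tsgid)
{
    size_t i = 0;
    size_t out = 0;

    assert(src != NULL && dst != NULL);

    while (i < n_src) {
        size_t span = 1; /* the entry itself */
        int is_tsg = rl_is_tsg(&src[i]);

        if (is_tsg) {
            span += rl_tsg_len(&src[i]); /* plus its channel entries */
        }
        if (span > n_src - i) {
            span = n_src - i; /* clamp a malformed trailing TSG */
        }
        if (!(is_tsg && rl_tsg_id(&src[i]) == skip_tsgid)) {
            memcpy(&dst[out], &src[i], span * sizeof(*src));
            out += span;
        }
        i += span;
    }
    return out;
}
```

The destination buffer would then be the one whose address is written in step 3, with the returned count as the new runlist length.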

It's unclear whether this approach has lower overhead than that of Capodieci et al.
See Alternate 1 below, which is our new priority.

Notes:
- Each TSG (timeslice group) corresponds to one context (?)
- Runlist base must be 4k aligned
- The nvgpu driver finds its gk20a struct from an inode: container_of on the inode yields the enclosing struct nvgpu_os_linux, which holds the gk20a struct
- gk20a_writel is nvgpu_writel. Its declaration is: `void nvgpu_writel(struct gk20a *g, u32 reg_addr, u32 value);`
- gk20a_readl is nvgpu_readl. Its declaration is: `u32 nvgpu_readl(struct gk20a *g, u32 reg_addr);`
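
The 4k alignment note can be made concrete with a few helpers. This is a minimal sketch: the helper names are hypothetical, and it assumes that the hardware takes the runlist base in 4k units (the address shifted right by 12), which should be verified against the register reference before use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RUNLIST_BASE_ALIGN 4096u /* 4k alignment, per the notes above */
#define RUNLIST_BASE_SHIFT 12u   /* log2(4096) */

/* True when addr is usable as a runlist base. */
bool runlist_base_is_aligned(uint64_t addr)
{
    return (addr & (RUNLIST_BASE_ALIGN - 1u)) == 0u;
}

/* Round addr up to the next 4k boundary (no-op if already aligned). */
uint64_t runlist_base_align_up(uint64_t addr)
{
    return (addr + RUNLIST_BASE_ALIGN - 1u) &
           ~(uint64_t)(RUNLIST_BASE_ALIGN - 1u);
}

/*
 * ASSUMPTION: the base pointer field stores the address in 4k units.
 * The result is what would be passed as `value` to
 * nvgpu_writel(g, reg_addr, value) for the runlist base register.
 */
uint32_t runlist_base_ptr_field(uint64_t addr)
{
    assert(runlist_base_is_aligned(addr));
    return (uint32_t)(addr >> RUNLIST_BASE_SHIFT);
}
```

Allocating the copied runlist with one extra page and passing the raw address through `runlist_base_align_up` is one simple way to satisfy the constraint.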

## Other approaches:

### Alternate 1:
    "2. Disable all channels in the containing TSG by writing ENABLE_CLR to TRUE
        in their channel RAM entries in NV_PCCSR_CHANNEL (see dev_fifo.ref).
     3. Initiate a preempt of the TSG via NV_PFIFO_PREEMPT or
        NV_PFIFO_RUNLIST_PREEMPT." (PBDMA, "Recovery procedure")
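
The two quoted steps reduce to a handful of register values, sketched below. The register offsets and bit positions are assumptions recalled from the nvgpu hardware headers and must be verified for the target chip. The helper names are hypothetical, and the read-modify-write on the channel register is likewise an assumption about how the write should be performed.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED offsets and fields; verify against the hardware headers. */
#define PCCSR_CHANNEL_BASE            0x00800004u
#define PCCSR_CHANNEL_STRIDE          8u
#define PCCSR_CHANNEL_ENABLE_CLR_TRUE (1u << 11)
#define PFIFO_PREEMPT                 0x00002634u
#define PFIFO_PREEMPT_TYPE_TSG        (1u << 24)
#define PFIFO_PREEMPT_ID_MASK         0xfffu

/* Address of the NV_PCCSR_CHANNEL entry for channel chid. */
uint32_t pccsr_channel_reg(uint32_t chid)
{
    return PCCSR_CHANNEL_BASE + chid * PCCSR_CHANNEL_STRIDE;
}

/* Value to write back to disable a channel, given its current value. */
uint32_t pccsr_channel_with_enable_clr(uint32_t current)
{
    return current | PCCSR_CHANNEL_ENABLE_CLR_TRUE;
}

/* Value to write to NV_PFIFO_PREEMPT to preempt TSG tsgid. */
uint32_t pfifo_preempt_tsg_value(uint32_t tsgid)
{
    assert(tsgid <= PFIFO_PREEMPT_ID_MASK);
    return PFIFO_PREEMPT_TYPE_TSG | (tsgid & PFIFO_PREEMPT_ID_MASK);
}

/*
 * Intended use inside the driver (not compiled here):
 *
 *   for each channel chid in the TSG:
 *       u32 r = pccsr_channel_reg(chid);
 *       nvgpu_writel(g, r,
 *                    pccsr_channel_with_enable_clr(nvgpu_readl(g, r)));
 *   nvgpu_writel(g, PFIFO_PREEMPT, pfifo_preempt_tsg_value(tsgid));
 */
```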

### Alternate 2:
 "3. Initiate a preempt of the engine by writing the bit associated with its
     runlist to NV_PFIFO_RUNLIST_PREEMPT.  This allows us to begin the preempt
     process prior to doing the slow register reads needed to determine whether
     the context has hit any interrupts or is hung.  Do not poll
     NV_PFIFO_RUNLIST_PREEMPT for the preempt to complete." (FIFO, "Context TSG tear-down procedure")
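
"The bit associated with its runlist" can be read as one bit per runlist ID. The sketch below assumes that mapping (bit N for runlist N) and an assumed register offset; both need checking against the register reference, and the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED offset of NV_PFIFO_RUNLIST_PREEMPT; verify before use. */
#define PFIFO_RUNLIST_PREEMPT 0x00002638u

/* ASSUMPTION: runlist N is controlled by bit N of the register. */
uint32_t runlist_preempt_mask(unsigned int runlist_id)
{
    assert(runlist_id < 32u);
    return 1u << runlist_id;
}

/*
 * Intended use inside the driver (not compiled here). Per the quoted
 * procedure, the register is written once and not polled afterwards:
 *
 *   nvgpu_writel(g, PFIFO_RUNLIST_PREEMPT,
 *                runlist_preempt_mask(runlist_id));
 */
```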

See `nvdebug.c` and `nvdebug.h` for implementation details.