author | Konsta Holtta <kholtta@nvidia.com> | 2018-05-08 10:02:25 -0400 |
---|---|---|
committer | mobile promotions <svcmobile_promotions@nvidia.com> | 2018-05-11 12:53:35 -0400 |
commit | 07310de8c1d043b0e5efdcf2d38c28c432b1c9ce (patch) | |
tree | 46c3193d7b16f901ebda079c8ada4f4e704aa9ab /include/trace/events/gk20a.h | |
parent | a7288b58676f14a847592b6d6dcbe9080dfb9edb (diff) |
gpu: nvgpu: poll watchdog status actively
Read GP_GET and GET from hardware every time the poll timer expires,
instead of only when the watchdog timer expires. Restart the watchdog
timer if the get pointers have advanced since the previous read. This
way stuck channels are detected sooner.
Previously it could take up to twice the watchdog timeout limit for a
stuck channel to be recovered; with this change, a coarse sliding
window is used. The polling period is still 100 ms.
The difference is illustrated in the following diagram:
time  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
get   a b b b b b b b b b b b b b b b b b b b b
prev  - n n n n n n n n n A n n n n n n n n n S
next  - A s s s s s s s s s S
"time" represents wall time in polling units; 0 is submit time. For
simplicity, watchdog timeout is ten units. "get" is the GP_GET that
advances a little from a and then gets stuck at b. "prev" is the
previous behaviour, "next" is after this patch. "A" is when the channel
is detected as advanced, and "S" when it's found stuck and recovered;
small "s" is when it's found stuck but the time limit has not yet
expired, and "n" is when the hw state is not read.
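The "next" row can be reproduced with a small standalone model. This is
only a sketch under stated assumptions, not the driver code: every name
below (wdt_state, wdt_start, wdt_poll, wdt_replay_diagram) is
hypothetical, time is counted in polling units, and any change in the
pointers is treated as progress rather than a strict increase.

```c
#include <assert.h>
#include <stdint.h>

/* All names here are hypothetical; this is not the nvgpu driver code. */
enum wdt_result {
	WDT_ADVANCED,      /* "A": pointers moved, window restarted */
	WDT_STUCK_PENDING, /* "s": no progress, limit not yet reached */
	WDT_STUCK_EXPIRED  /* "S": no progress for a full window */
};

struct wdt_state {
	uint32_t gp_get;    /* GP_GET seen at the last poll */
	uint32_t get;       /* GET seen at the last poll */
	unsigned int limit; /* timeout, in poll periods */
	unsigned int left;  /* polls remaining in the current window */
};

/* Arm the watchdog at submit time with the current pointer values. */
void wdt_start(struct wdt_state *w, uint32_t gp_get, uint32_t get,
	       unsigned int limit)
{
	assert(limit > 0);
	w->gp_get = gp_get;
	w->get = get;
	w->limit = limit;
	w->left = limit;
}

/* Called on every poll timer expiry with freshly read pointer values. */
enum wdt_result wdt_poll(struct wdt_state *w, uint32_t gp_get, uint32_t get)
{
	/* Assumption: any change counts as progress (pointers may wrap). */
	if (gp_get != w->gp_get || get != w->get) {
		w->gp_get = gp_get;
		w->get = get;
		w->left = w->limit; /* restart the sliding window */
		return WDT_ADVANCED;
	}
	if (w->left > 0)
		w->left--;
	return w->left == 0 ? WDT_STUCK_EXPIRED : WDT_STUCK_PENDING;
}

/*
 * Replay the diagram: GP_GET is 'a' at submit (tick 0) and 'b' from
 * tick 1 on; the limit is ten polls. Returns the tick at which the
 * channel is declared stuck, or 0 if that does not happen within
 * max_ticks.
 */
unsigned int wdt_replay_diagram(unsigned int max_ticks)
{
	struct wdt_state w;
	unsigned int t;

	wdt_start(&w, 0xa, 0, 10);
	for (t = 1; t <= max_ticks; t++) {
		if (wdt_poll(&w, 0xb, 0) == WDT_STUCK_EXPIRED)
			return t;
	}
	return 0;
}
```

With these inputs, wdt_replay_diagram(20) returns 11, which is where "S"
appears in the "next" row: the advance is seen at tick 1, and ten polls
without progress follow.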
Bug 1700277
Bug 1982826
Change-Id: Ie2921920d5396cee652729c6a7162b740d7a1f06
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1710554
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Diffstat (limited to 'include/trace/events/gk20a.h')
0 files changed, 0 insertions, 0 deletions