author Seema Khowala <seemaj@nvidia.com> 2019-03-01 14:48:32 -0500
committer mobile promotions <svcmobile_promotions@nvidia.com> 2019-05-02 05:43:42 -0400
commit 889271dc04a1912d25f1f1ff35c3e4cb67be415e
tree 15f14756baacb126e81f95c4d2ced49029b7ed12 /drivers
parent dd282e229a46a23c1e0260435e05fa8ab47529f6
gpu: nvgpu: change err to info print if failing eng id is -1

For handle_sched_error, change the err print to an info print when the failing engine id is returned as -1, i.e. FIFO_INVAL_ENGINE_ID, since no engine was found busy doing ctxsw. The ctxsw may have already finished for the context for which the ctxsw timeout intr was triggered.

Possible Causes:

a) On hitting engine reset, h/w drops the ctxsw_status to INVALID in the fifo_engine_status register. Also, while the engine is held in reset, h/w passes busy/idle straight through. The fifo_engine_status registers are correct in that there is no context switch outstanding, as the CTXSW is aborted when reset is asserted. This is just a side effect of how gv100 and earlier versions of ctxsw_timeout behave. With gv11b and later, h/w snaps the context at the point of error so that s/w can see the tsg_id which caused the HW timeout.

b) If engines are not busy and the ctxsw state is valid, then the intr occurred in the past; if the ctxsw state has moved on to VALID from LOAD or SAVE, whatever timed out eventually finished anyway. The problem is that s/w cannot conclude which context caused the problem, as more switches may have occurred before the intr is handled.

Bug 2092051
Bug 2429295
Bug 2484211
Bug 1890287

Change-Id: Ia79bee6e860fb179ee39024c963671d4f8245227
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2030866
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry-picked from d27f875d2c7839d3b1ec7db80d83594509ff2ea8 in dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2076126
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>

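
As an illustration of the behavior described in the commit message, here is a minimal, self-contained sketch of the early exit. It is not the driver code: every `sketch_` name, the `CTXSW_*` enum, and the rule used to pick a "failing" engine (busy and in LOAD, SAVE or SWITCH) are assumptions made for this example, and `printf` stands in for the driver's info and error prints. The meaning given to the return value is also an assumption.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for FIFO_INVAL_ENGINE_ID (the "-1" engine id). */
#define SKETCH_INVAL_ENGINE_ID UINT32_MAX

/* Hypothetical model of the ctxsw state reported per engine. */
enum sketch_ctxsw_state {
	CTXSW_INVALID,
	CTXSW_VALID,
	CTXSW_LOAD,
	CTXSW_SAVE,
	CTXSW_SWITCH
};

struct sketch_engine_status {
	bool busy;
	enum sketch_ctxsw_state ctxsw;
};

/*
 * Return the index of the first engine that is busy and in the middle of a
 * context switch, or SKETCH_INVAL_ENGINE_ID if there is none (assumed rule).
 */
static uint32_t sketch_find_failing_engine(const struct sketch_engine_status *st,
					   uint32_t count)
{
	for (uint32_t i = 0; i < count; i++) {
		bool switching = st[i].ctxsw == CTXSW_LOAD ||
				 st[i].ctxsw == CTXSW_SAVE ||
				 st[i].ctxsw == CTXSW_SWITCH;

		if (st[i].busy && switching)
			return i;
	}
	return SKETCH_INVAL_ENGINE_ID;
}

/*
 * Returns false when no engine is found busy doing ctxsw (info-level message
 * only), true when the sketch would continue to the error path.
 */
static bool sketch_handle_sched_error(const struct sketch_engine_status *st,
				      uint32_t count, uint32_t sched_error)
{
	uint32_t engine_id = sketch_find_failing_engine(st, count);

	if (engine_id == SKETCH_INVAL_ENGINE_ID) {
		/* Either cause (a) or (b) above: nothing is switching any more. */
		printf("info: fifo sched error: 0x%08x, no engine busy doing ctxsw\n",
		       (unsigned int)sched_error);
		return false;
	}

	printf("error: fifo sched error: 0x%08x, engine %u stuck in ctxsw\n",
	       (unsigned int)sched_error, (unsigned int)engine_id);
	return true;
}
```

With all engines idle, or with their ctxsw state at INVALID or VALID, the sketch takes the info path; an engine that is busy in SAVE takes the error path.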
Diffstat (limited to 'drivers')
-rw-r--r-- drivers/gpu/nvgpu/gk20a/fifo_gk20a.c 28
1 files changed, 28 insertions, 0 deletions
diff --git a/drivers/gpu/nvgpu/gk20a/fifo_gk20a.c b/drivers/gpu/nvgpu/gk20a/fifo_gk20a.c
index dbed9880..78f777ae 100644
--- a/drivers/gpu/nvgpu/gk20a/fifo_gk20a.c
+++ b/drivers/gpu/nvgpu/gk20a/fifo_gk20a.c
@@ -2410,6 +2410,34 @@ bool gk20a_fifo_handle_sched_error(struct gk20a *g)
 	sched_error = gk20a_readl(g, fifo_intr_sched_error_r());
 
 	engine_id = gk20a_fifo_get_failing_engine_data(g, &id, &is_tsg);
+	/*
+	 * Could not find the engine
+	 * Possible Causes:
+	 * a)
+	 * On hitting engine reset, h/w drops the ctxsw_status to INVALID in
+	 * fifo_engine_status register. Also while the engine is held in reset
+	 * h/w passes busy/idle straight through. fifo_engine_status registers
+	 * are correct in that there is no context switch outstanding
+	 * as the CTXSW is aborted when reset is asserted.
+	 * This is just a side effect of how gv100 and earlier versions of
+	 * ctxsw_timeout behave.
+	 * With gv11b and later, h/w snaps the context at the point of error
+	 * so that s/w can see the tsg_id which caused the HW timeout.
+	 * b)
+	 * If engines are not busy and ctxsw state is valid then intr occurred
+	 * in the past and if the ctxsw state has moved on to VALID from LOAD
+	 * or SAVE, it means that whatever timed out eventually finished
+	 * anyways. The problem with this is that s/w cannot conclude which
+	 * context caused the problem as maybe more switches occurred before
+	 * intr is handled.
+	 */
+	if (engine_id == FIFO_INVAL_ENGINE_ID) {
+		nvgpu_info(g, "fifo sched error: 0x%08x, failed to find engine "
+				"that is busy doing ctxsw. "
+				"May be ctxsw already happened", sched_error);
+		ret = false;
+		goto err;
+	}
 
 	/* could not find the engine - should never happen */
 	if (!gk20a_fifo_is_valid_engine_id(g, engine_id)) {