path: root/drivers/gpu/nvgpu/gk20a/channel_gk20a.c
Commit log (newest first):

* gpu: nvgpu: APIs to enable/disable TSG (Deepak Nibade, 2016-01-13)

Export the following APIs for TSGs:
- gk20a_enable_tsg() - enable only the TSG
- gk20a_disable_tsg() - disable only the TSG
- gk20a_enable_channel_tsg() - if the channel is part of a TSG, enable the TSG; otherwise enable the channel
- gk20a_disable_channel_tsg() - if the channel is part of a TSG, disable the TSG; otherwise disable the channel

Bug 200156699

Change-Id: Icdaca35235c3f323687f839fe32c6c5fe964b230
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/927131
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>

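A minimal sketch of the dispatch these APIs imply; the struct layout, the sentinel value, and the stub helpers are illustrative assumptions, not the real nvgpu definitions:

```c
#define NVGPU_INVALID_TSG_ID (~0U)	/* assumed sentinel for "no TSG" */

struct channel_gk20a {
	unsigned int tsgid;
};

/* Stubs standing in for the single-object helpers the commit exports. */
static void gk20a_enable_tsg(struct channel_gk20a *ch) { /* enable TSG only */ }
static void gk20a_enable_channel(struct channel_gk20a *ch) { /* enable channel only */ }

/* If the channel is part of a TSG, enable the TSG; otherwise the channel. */
static void gk20a_enable_channel_tsg(struct channel_gk20a *ch)
{
	if (ch->tsgid != NVGPU_INVALID_TSG_ID)
		gk20a_enable_tsg(ch);
	else
		gk20a_enable_channel(ch);
}
```
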
* gpu: nvgpu: stop timer on failing channel (Deepak Nibade, 2016-01-11)

In gk20a_channel_timeout_handler(), the following deadlock scenario is possible:

thread 1:
- takes the global lock g->ch_wdt_lock
- identifies the timed-out channel (as ch1)
- checks the engine status, which is stuck
- identifies the failing channel on the engine as ch2
- needs to trigger recovery with ch2
- as part of recovery, calls channel_abort() for ch2
- in channel_abort(), waits to cancel the timer workqueue
- but the timer workqueue for ch2 never completes, due to thread 2

thread 2:
- ch2 has already timed out
- to process this, it waits for the global lock g->ch_wdt_lock
- this lock needs to be released by thread 1

To fix this, cancel the timer (through a flag) of ch2 (the failing channel on the engine) before triggering recovery on that channel.

Bug 200164753

Change-Id: Idb42d01c8440a53f43cb5e87e41f1c283f7e8fcf
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/929924
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>

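A user-space analogue of the flag-based cancellation (names illustrative): the recovery path flips a flag instead of flushing the worker, and the worker bails out on its own, so neither side blocks on g->ch_wdt_lock held by the other:

```c
#include <stdatomic.h>
#include <stdbool.h>

struct channel_wdt {
	atomic_bool active;
};

/* Recovery path: mark the watchdog cancelled, but do NOT wait for the
 * worker -- synchronously flushing it is exactly what deadlocked before. */
static void wdt_cancel_by_flag(struct channel_wdt *wdt)
{
	atomic_store(&wdt->active, false);
}

/* Worker: bail out before touching the global lock if cancelled. */
static void wdt_worker(struct channel_wdt *wdt)
{
	if (!atomic_load(&wdt->active))
		return;
	/* ... take g->ch_wdt_lock and process the timeout ... */
}
```
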
* gpu: nvgpu: disable ctxsw instead of all engines activity (Deepak Nibade, 2016-01-11)

In gk20a_channel_timeout_handler(), we currently disable all engine activity before checking for fence completion and before identifying the timed-out channel. But disabling all engine activity could be overkill for this process.

Also, as part of disabling engine activity we preempt the channel on the engine. It is possible that the channel preemption times out, since the channel has already timed out, and this can lead to races and deadlock.

Hence, instead of disabling all engine activity, just disable the context switch, which should do the same trick.

Bug 1716062

Change-Id: I596515ed670a2e134f7bcd9758488a4aa0bf16f7
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/929421
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>

* gpu: nvgpu: add high priority channel interleave (Peter Pipkorn, 2016-01-11)

Interleave all high priority channels between all other channels. This reduces the latency for high priority work when a lot of lower priority work is present, imposing an upper bound on the latency.

Change the default high priority timeslice from 5.2 ms to 3.0 ms in the process, to prevent long-running high priority apps from hogging the GPU too much.

Introduce a new debugfs node to enable/disable high priority channel interleaving. It is currently enabled by default.

Add a new runlist max-length register, used for allocating a suitably sized runlist. Limit the number of interleaved channels to 32.

This change reduces the maximum time a lower priority job runs (one timeslice) before we check that high priority jobs are running.

Tested with gles2_context_priority (still passes). Basic sanity testing was done with graphics_submit (one app at high priority), plus more functional testing using lots of parallel runs of:

NVRM_GPU_CHANNEL_PRIORITY=3 ./gles2_expensive_draw --drawsperframe 20000 --triangles 50 --runtime 30 --finish

together with multiple:

NVRM_GPU_CHANNEL_PRIORITY=2 ./gles2_expensive_draw --drawsperframe 20000 --triangles 50 --runtime 30 --finish

Previous to this change, the relative performance between high priority and normal priority work came down to the timeslice value. This means that with many low priority channels, high priority work would still drop quite a lot. With this change, high priority work gets roughly half of the entire GPU time, so after the initial drop it is less likely to degrade further as more apps run on the system.

This change makes a large step towards real priority levels. It is not perfect and there are no guarantees on anything, but it is a step forward without any additional CPU overhead or other complications. It will also serve as a baseline to judge other algorithms against.

Support for priorities with TSG is future work. Support for interleaving mid + high priority channels, instead of just high, is also future work.

Bug 1419900

Change-Id: I0f7d0ce83b6598fe86000577d72e14d312fdad98
Signed-off-by: Peter Pipkorn <ppipkorn@nvidia.com>
Reviewed-on: http://git-master/r/805961
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>

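A sketch of the interleaving idea under simplified assumptions (plain channel ids, no TSG handling, no interleave-count cap): each lower-priority entry is preceded by the full high-priority set, so high-priority work waits at most one low-priority timeslice:

```c
#include <stddef.h>

/* Builds out[] (capped at max) and returns the entry count. */
static size_t build_interleaved_runlist(const unsigned int *high, size_t nhigh,
					const unsigned int *normal, size_t nnormal,
					unsigned int *out, size_t max)
{
	size_t n = 0;

	if (nnormal == 0) {			/* nothing to interleave with */
		for (size_t j = 0; j < nhigh && n < max; j++)
			out[n++] = high[j];
		return n;
	}

	for (size_t i = 0; i < nnormal; i++) {
		for (size_t j = 0; j < nhigh; j++)	/* high-priority block */
			if (n < max)
				out[n++] = high[j];
		if (n < max)
			out[n++] = normal[i];	/* one low-priority slice */
	}
	return n;
}
```
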
* gpu: nvgpu: enable semaphore acquire timeout (Richard Zhao, 2016-01-10)

This detects dead semaphore acquires. The worst case is when ACQUIRE_SWITCH is disabled: the semaphore acquire will poll and consume full GPU timeslices.

The timeout value is set to half of the channel WDT.

Bug 1636800

Change-Id: Ida6ccc534006a191513edf47e7b82d4b5b758684
Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Reviewed-on: http://git-master/r/928827
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vladislav Buzov <vbuzov@nvidia.com>

* gpu: nvgpu: preempt before adjusting fences (Deepak Nibade, 2015-12-10)

The current sequence in gk20a_disable_channel() is:
- disable the channel in gk20a_channel_abort()
- adjust the pending fences in gk20a_channel_abort()
- preempt the channel

But this leads to scenarios where the syncpoint has min > max.

Hence, to fix this, make the sequence in gk20a_disable_channel():
- disable the channel in gk20a_channel_abort()
- preempt the channel in gk20a_channel_abort()
- adjust the pending fences in gk20a_channel_abort()

If gk20a_channel_abort() is called from another API where preemption is not needed, use the channel_preempt flag and do not preempt the channel in those cases.

Bug 1683059

Change-Id: I4d46d4294cf8597ae5f05f79dfe1b95c4187f2e3
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/921290
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>

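The reordered abort in sketch form; the helpers are placeholders for the real disable/preempt/fence code:

```c
struct channel;	/* opaque here; stubs stand in for the real helpers */

static void channel_disable_hw(struct channel *ch) { }
static void channel_preempt_hw(struct channel *ch) { }
static void channel_release_pending_fences(struct channel *ch) { }

static void channel_abort(struct channel *ch, int channel_preempt)
{
	channel_disable_hw(ch);
	if (channel_preempt)		/* callers not needing preemption skip this */
		channel_preempt_hw(ch);
	channel_release_pending_fences(ch);	/* syncpt min catches up to max */
}
```
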
* gpu: nvgpu: call channel_update() always in channel_abort() (Deepak Nibade, 2015-12-10)

In gk20a_channel_abort(), we disable the channel and adjust the fences, and then call gk20a_channel_update() only if we released the post-fence semaphore.

Instead of this, always call gk20a_channel_update(), to ensure that all the clean-up after fence completion is always done.

Bug 1683059

Change-Id: Ife00ba43e808c0833264d79c98c74417ffadf802
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/842999
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>

* gpu: nvgpu: Immediate channel release (Terje Bergstrom, 2015-12-10)

When closing a channel, disable and preempt it immediately instead of waiting for it to finish all work.

Bug 1683059

Change-Id: Ia5f5fc6a072dc3ddb1e9bf63534814ff0a60b5b4
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/836746

* gpu: nvgpu: create sync_fence only if needed (Deepak Nibade, 2015-12-08)

Currently, we create a sync_fence (from nvhost_sync_create_fence()) for every submit. But not all submits request a sync_fence. Also, the nvhost_sync_create_fence() API takes about one third of the total submit path.

Hence, to optimize, allocate a sync_fence only when the user explicitly asks for it, i.e. when both NVGPU_SUBMIT_GPFIFO_FLAGS_FENCE_GET and NVGPU_SUBMIT_GPFIFO_FLAGS_SYNC_FENCE are set.

Also, in the CDE path from gk20a_prepare_compressible_read(), we reuse the existing fence stored in "state", which can result in not returning a sync_fence_fd when the user asked for one. Hence, force allocation of the sync_fence when the job submission comes from the CDE path.

Bug 200141116

Change-Id: Ia921701bf0e2432d6b8a5e8b7d91160e7f52db1e
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/812845
(cherry picked from commit 5fd47015eeed00352cc8473eff969a66c94fee98)
Reviewed-on: http://git-master/r/837662
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Sachin Nikam <snikam@nvidia.com>

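A sketch of the resulting allocation decision; the bit values below are illustrative, not the real UAPI numbers:

```c
#include <stdbool.h>

/* Bit positions are assumptions for the sketch. */
#define NVGPU_SUBMIT_GPFIFO_FLAGS_FENCE_GET	(1u << 1)
#define NVGPU_SUBMIT_GPFIFO_FLAGS_SYNC_FENCE	(1u << 3)

static bool need_sync_fence(unsigned int flags, bool from_cde)
{
	if (from_cde)
		return true;	/* CDE reuses a stored fence; force allocation */
	return (flags & NVGPU_SUBMIT_GPFIFO_FLAGS_FENCE_GET) &&
	       (flags & NVGPU_SUBMIT_GPFIFO_FLAGS_SYNC_FENCE);
}
```
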
* gpu: nvgpu: move check_gp_put() and update_gp_get() to worker (Deepak Nibade, 2015-12-08)

We currently call check_gp_put() and update_gp_get() in the submit path, and the two checks take about 5 us:
- check_gp_put() - 3.5 us
- update_gp_get() - 1.5 us

This book-keeping can be moved to gk20a_channel_update() to save some submit time.

Note that check_gp_put() needs to be done inside the submit lock.

Bug 200141116

Change-Id: I276400111be0421eb673695e2f2899ff52e344b4
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/839232
(cherry picked from commit 289617e8bf01bde9aab45dfa3a1c6a1241e6eb78)
Reviewed-on: http://git-master/r/839713
GVS: Gerrit_Virtual_Submit
Reviewed-by: Sachin Nikam <snikam@nvidia.com>

* gpu: nvgpu: IOCTL to disable watchdog per-channel (Deepak Nibade, 2015-11-30)

Add the IOCTL NVGPU_IOCTL_CHANNEL_WDT to disable/enable the watchdog per-channel.

Also, if the watchdog is disabled, we currently schedule the worker with the MAX timeout. Instead of this, do not schedule any worker at all when the watchdog is disabled.

Bug 1683059
Bug 1700277

Change-Id: I7f6bec84adeedb74e014ed6d1471317b854df84c
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/837962
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>

* gpu: nvgpu: fix possible memory leak (Deepak Nibade, 2015-11-30)

Fix a possible memory leak in an error condition.

Coverity id: 20022

Bug 1703084

Change-Id: Ie1085a6e9c206ba0d71fe6677986df2a357bb5d1
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/838776
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Sachin Nikam <snikam@nvidia.com>

* gpu: nvgpu: fix Coverity issues (Deepak Nibade, 2015-11-25)

Fix the following Coverity issues:
- operands not affecting result (id = 12845)
- logically dead code (id = 12890)
- dereference after null check (id = 12968)
- unsigned compared to 0 (id = 13176)
- resource leak (id = 13338, 18673)
- unused pointer value (id = 13916)

Bug 1703084

Change-Id: I2f401dd93126af27748c53fa1b3a59cb154af36b
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/835143
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Sachin Nikam <snikam@nvidia.com>

* gpu: nvgpu: set aggressive_sync_destroy at runtime (Deepak Nibade, 2015-11-23)

We currently set the "aggressive_destroy" flag to destroy sync objects statically and for each sync object.

Move this flag to the per-platform structure so that it can be set per-platform for all sync objects. Also, set the default value of this flag to "false", and set it to "true" once more than 64 channels are in use.

Bug 200141116

Change-Id: I1bc271df4f468a4087a06a27c7289ee0ec3ef29c
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/822041
(cherry picked from commit 98741e7e88066648f4f14490c76b61dbff745103)
Reviewed-on: http://git-master/r/835800
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>

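A sketch of the runtime switch, with an assumed field name on the platform struct:

```c
struct gk20a_platform {
	int aggressive_sync_destroy;	/* stand-in for the per-platform flag */
};

/* Called wherever channel usage changes; 64 is the threshold from the
 * commit message. */
static void update_aggressive_sync_destroy(struct gk20a_platform *platform,
					   unsigned int channels_in_use)
{
	platform->aggressive_sync_destroy = channels_in_use > 64;
}
```
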
* gpu: nvgpu: rework private command buffer free path (Deepak Nibade, 2015-11-23)

We currently allocate private command buffers (wait_cmd and incr_cmd) before submitting a job, but we never free them explicitly. When the private command queue of the channel is full, we then try to recycle/remove free command buffers. But this recycling happens during the submit path, and hence that particular submit takes much longer.

Rework this as below:
- add references to the command buffers in the job structure
- when a job completes, free the command buffers explicitly
- remove the code to recycle buffers, since it should not be needed now

Note that command buffers need to be freed in the order of their allocation. Ensure this with an error print before freeing a command buffer entry.

Bug 200141116
Bug 1698667

Change-Id: Id4b69429d7ad966307e0d122a71ad55076684307
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/827638
(cherry picked from commit c6cefd69b71c9b70d6df5343b13dfcfb3fa99598)
Reviewed-on: http://git-master/r/835802
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>

* gpu: nvgpu: support skipping buffer refcounting in submit (Deepak Nibade, 2015-11-23)

In the job submission path, we always take a refcount on all the mapped buffers to safeguard against the case where user space releases a buffer early.

But if user space itself does proper buffer management, the kernel need not take refcounts on all the buffers - which is also an overhead in the submit path.

Hence, provide a new submit flag NVGPU_SUBMIT_GPFIFO_FLAGS_SKIP_BUFFER_REFCOUNTING to optionally skip taking refcounts on all the buffers.

Also, if we do not take the refcounts, then there is no need to drop any refcounts in gk20a_channel_update() either.

Bug 1698667
Bug 200141116

Change-Id: I81bb7a03240300b691c70bcec04ea1badd5934f4
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/824718
(cherry picked from commit 8c8978fa303ec4e6db0233becdbdcbad4a248173)
Reviewed-on: http://git-master/r/835801
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>

* gpu: nvgpu: remove temporary gpfifo allocation in submit path (Deepak Nibade, 2015-11-23)

In the GPU job submit path gk20a_ioctl_channel_submit_gpfifo(), we currently allocate a temporary gpfifo, copy the user space gpfifo content into this temporary buffer, and then copy the temp buffer content into the channel's gpfifo. The allocation/copy/free of the temporary buffer adds overhead to every submit.

Rewrite this sequence such that gk20a_submit_channel_gpfifo() can receive either a pre-filled gpfifo or a pointer to the user-provided args, and then copy the user-provided gpfifo directly into the channel's gpfifo.

Also, if command buffer tracing is enabled, we still need to copy the user-provided gpfifo into a temporary buffer for reading, but that should not cause overhead in the real-world use case.

Bug 200141116

Change-Id: I7166c9271da2694059da9853ab8839e98457b941
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/823386
(cherry picked from commit 3e0702db006c262dd8737a567b8e06f7ff005e2c)
Reviewed-on: http://git-master/r/835799
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>

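A kernel-style sketch of the dual-source copy; the helper name and signature are hypothetical, and ring wrap-around plus argument validation are omitted:

```c
#include <linux/errno.h>
#include <linux/string.h>
#include <linux/types.h>
#include <linux/uaccess.h>

struct nvgpu_gpfifo {
	u32 entry0;
	u32 entry1;
};

/* Exactly one of kern_gpfifo / user_gpfifo is expected to be set. */
static int copy_gpfifo_entries(struct nvgpu_gpfifo *dst,
			       const struct nvgpu_gpfifo *kern_gpfifo,
			       const void __user *user_gpfifo,
			       unsigned int num_entries)
{
	size_t len = (size_t)num_entries * sizeof(*dst);

	if (kern_gpfifo)			/* pre-filled by a kernel caller */
		memcpy(dst, kern_gpfifo, len);
	else if (copy_from_user(dst, user_gpfifo, len))
		return -EFAULT;			/* direct copy, no temp buffer */
	return 0;
}
```
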
* gpu: nvgpu: remove shift initial value 3 when setting runlist timeslice (Richard Zhao, 2015-11-12)

Bug 1695718

Change-Id: I595694e4a92ccaf8b8cd389e215d28709ed98c50
Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Reviewed-on: http://git-master/r/827829
(cherry picked from commit be5397109404b632c51a57758ee0dcdd2d2f5cc7)
Reviewed-on: http://git-master/r/831556
GVS: Gerrit_Virtual_Submit
Reviewed-by: Kirill Artamonov <kartamonov@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>

* gpu: nvgpu: use platform data for ptimer source rate (Seshendra Gadagottu, 2015-11-04)

Instead of depending on the clock framework, use platform data for the ptimer source rate.

Remove the ptimerscaling10x platform data, and use the ptimer source frequency to calculate the ptimer scaling factor.

Reviewed-on: http://git-master/r/819030
(cherry picked from commit dd291334d54dab80cab7eb1656dffc48a59610b4)
Change-Id: I7638ce9875a6e440bbfc2ba2da0d0b094b2700ff
Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: http://git-master/r/827300
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>

* gpu: nvgpu: IOCTL to set TSG timeslice (Deepak Nibade, 2015-11-03)

Add a new IOCTL NVGPU_IOCTL_TSG_SET_PRIORITY to allow setting the timeslice for an entire TSG.

Return an error from the channel-specific IOCTL_CHANNEL_SET_PRIORITY if the channel is part of a TSG.

Separate out an API gk20a_channel_get_timescale_from_timeslice() to get timeslice_timeout and scale from the timeslice period. Use this API to get timeslice_timeout and scale for the TSG and store them in the tsg_gk20a structure. Then trigger a runlist update so that the new timeslice values are re-written to the runlist for the TSG.

Bug 200146615

Change-Id: I555467d034f81b372b31372f0835d72b1c159508
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/824206
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>

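A sketch of what a gk20a_channel_get_timescale_from_timeslice()-style split could look like: shift the value down until it fits the timeout field, counting the shifts in scale. The 8-bit timeout and 4-bit scale field widths are assumptions here, and the real code also scales the period by the ptimer factor first:

```c
static void timeslice_to_timeout_scale(unsigned int timeslice,
				       unsigned int *timeout,
				       unsigned int *scale)
{
	unsigned int value = timeslice, shift = 0;

	while (value >= (1u << 8)) {	/* assumed 8-bit timeout field */
		value >>= 1;
		shift++;
	}
	if (shift > 15) {		/* clamp to the max encodable period */
		shift = 15;
		value = (1u << 8) - 1;
	}
	*timeout = value;
	*scale = shift;
}
```
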
* gpu: nvgpu: Get rid of legacy gpfifo type (Janne Hellsten, 2015-10-27)

Get rid of the duplicate gpfifo struct to emphasize the fact that nvgpu_gpfifo is the only memory layout for gpfifo entries that works. This is the same layout that the HW uses.

Also, add a local pointer to the gpfifo memory in gk20a_submit_channel_gpfifo to get rid of repeated typecasts.

Bug 1592391
Bug 1550886

Change-Id: I5432859ef8e7c1aab5907e44098994d7bb807f50
Signed-off-by: Janne Hellsten <jhellsten@nvidia.com>
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/677341
(cherry picked from commit 724c8c6228af81dd440e825bddf545dd6b2b8bd7)
Reviewed-on: http://git-master/r/822548
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Arto Merilainen <amerilainen@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>

* gpu: nvgpu: Disable only channel at zcull bind (Terje Bergstrom, 2015-10-23)

At zcull bind we disable the whole GR engine. This is unnecessary, so instead disable only the channel and make sure it is unloaded.

This also introduces an API in fifo_gk20a.c to do the channel disable.

gr_gk20a_ctx_zcull_setup() was always passed true as its last parameter, so remove the parameter.

Change-Id: I7ae6e101ec7d1ab3f6ee4e9bcc442d23dbd21247
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/787570

* gpu: nvgpu: Protect sync by an own lock (Terje Bergstrom, 2015-10-22)

Protect creation and deletion of sync by its own mutex. This prevents a deadlock in channel abort when the abort is called from the submit path.

Bug 200147887

Change-Id: I5d6308b773c1d1a6a89d4590e2e74c74d691f79d
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/821127

* gpu: nvgpu: restart timer instead of cancel (Deepak Nibade, 2015-10-20)

In gk20a_fifo_handle_sched_error(), we currently cancel the timeout on all the channels. But this could cause us to miss a stuck channel; hence, instead of cancelling, restart the timeout of the channel on which it is already active.

Bug 200133289

Change-Id: I40e7e0e5394911fc110ab6fde39592b885dfaf7d
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/816133
Reviewed-by: Ishan Mittal <imittal@nvidia.com>
Tested-by: Ishan Mittal <imittal@nvidia.com>

* gpu: nvgpu: fix deadlock on timeout lock (Deepak Nibade, 2015-10-20)

In gk20a_channel_timeout_stop(), we take the channel's timeout lock and then cancel the timeout worker thread. The timeout worker thread also tries to acquire the same timeout lock. Hence, while cancelling the timeout in gk20a_channel_timeout_stop(), if the timeout handler is already scheduled, we will have a deadlock.

Fix this by moving cancel_delayed_work_sync() out of the locks.

Bug 200133289
Bug 1695481

Change-Id: Iea78770180b483a63e5e176efba27831174e9dde
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/815922
Reviewed-by: Ishan Mittal <imittal@nvidia.com>
Tested-by: Ishan Mittal <imittal@nvidia.com>

* Revert "gpu: nvgpu: WAR for bad GPFIFO entries from userspace" (Terje Bergstrom, 2015-10-08)

This reverts commit aeb74fc7952718ffab6281c687951499510c4333. User space was fixed not to send zero-length GPFIFO entries.

Bug 1662670

Change-Id: Iec6bf1870a19db4e8daa2ed4512650b92a37ba92
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/811783

* gpu: nvgpu: cancel all wdt timeouts while handling SCHED errors (Deepak Nibade, 2015-10-07)

A SCHED error might cause multiple channels' watchdogs to trigger simultaneously. Hence, to avoid this conflict, cancel the watchdog timeout on all channels before recovering from SCHED errors.

Also, define an API gk20a_channel_timeout_stop_all_channels() to cancel the wdt timeout on all channels.

Bug 200133289

Change-Id: I8324c397891f0a711327b77d0677cd6718af6d01
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/810959
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>

* gpu: nvgpu: make wdt timeout per-platform (Deepak Nibade, 2015-10-07)

The channel watchdog timeout is currently set to a constant value of 5 s. Make this timeout platform-specific, and set it to 5 s for gm20b and 7 s for gk20a.

Bug 200133289

Change-Id: I6e7f0fed93a8d5b197ae46807131311196c6636f
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/810956
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>

* gpu: nvgpu: set correct timeslice value (Kirill Artamonov, 2015-10-06)

Scale the timeslice register value based on the platform-specific ptimer scaling coefficient.

Expose the timeslice values through debugfs to simplify performance tuning.

bug 1605552
bug 1603226

Change-Id: I49f86f22d58d26a366ee1b5f5a9ab9d7f896ad25
Signed-off-by: Kirill Artamonov <kartamonov@nvidia.com>
Reviewed-on: http://git-master/r/800007
(cherry picked from commit 00c85ef24cf28ffaa81eb53fff7edef1c699220a)
Reviewed-on: http://git-master/r/808251
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>

* gpu: nvgpu: implement per-channel watchdog (Deepak Nibade, 2015-09-28)

Implement a per-channel watchdog/timer as per the rules below:
- start the timer when submitting the first job on a channel, or if no timer is already running
- cancel the timer when a job completes
- restart the timer if there is any incomplete job left in the channel's queue
- trigger the appropriate recovery method as part of the timeout handling mechanism

Handle a timeout as follows:
- get the timed-out channel and the job data
- disable activity on all engines
- check if the fence is really pending
- get information on the failing engine
- if no engine is failing, just abort the channel
- if an engine is failing, trigger the recovery

Also, add a flag "ch_wdt_enabled" to enable/disable the channel watchdog mechanism. The watchdog can also be disabled using the global flag "timeouts_enabled".

Set the watchdog time to 5 s using the macro NVGPU_CHANNEL_WATCHDOG_DEFAULT_TIMEOUT_MS.

Bug 200133289

Change-Id: I401cf14dd34a210bc429f31bd5216a361edf1237
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/797072
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>

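The start/cancel/restart rules above, sketched with illustrative names; the real code arms a delayed worker rather than setting a flag:

```c
#include <stdbool.h>

struct channel {
	bool wdt_running;
	unsigned int pending_jobs;
};

/* Stubs for (re)arming and cancelling the 5 s delayed worker. */
static void wdt_start(struct channel *ch) { ch->wdt_running = true; }
static void wdt_cancel(struct channel *ch) { ch->wdt_running = false; }

static void on_job_submitted(struct channel *ch)
{
	ch->pending_jobs++;
	if (!ch->wdt_running)		/* first job, or timer not running */
		wdt_start(ch);
}

static void on_job_completed(struct channel *ch)
{
	ch->pending_jobs--;
	wdt_cancel(ch);
	if (ch->pending_jobs > 0)	/* incomplete work left: re-arm */
		wdt_start(ch);
}
```
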
* gpu: nvgpu: fix runlist update timeout handling (Vijayakumar, 2015-09-16)

bug 1625901

1) disable ELPG before doing a GR reset when a runlist update times out
2) add a mutex for GR reset to avoid multiple threads resetting GR
3) protect GR reset with the FECS mutex so that no one else submits methods

Change-Id: I02993fd1eabe6875ab1c58a40a06e6c79fcdeeae
Signed-off-by: Vijayakumar <vsubbu@nvidia.com>
Reviewed-on: http://git-master/r/793643
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>

* gpu: nvgpu: disable channel before adjusting syncpoints (Deepak Nibade, 2015-09-11)

As per the current sequence in gk20a_channel_abort(), we first balance the syncpoint values associated with the failing channel, and then abort it.

Reverse this sequence so that we first disable the channel and only then balance the syncpoints.

Bug 200133289

Change-Id: I5a748afce437e728a5ff6c8a030a75d0f627c622
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/797071
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>

* gpu: nvgpu: cyclestats snapshot permissions rework (Leonid Moiseichuk, 2015-09-04)

The cyclestats snapshot feature is expected on new devices. The detection code was isolated into a separate function, and a run-time check was added to validate/allow ioctl calls on the current GPU.

Bug 1674079

Change-Id: Icc2f1e5cc50d39b395d31d5292c314f99d67f3eb
Signed-off-by: Leonid Moiseichuk <lmoiseichuk@nvidia.com>
Reviewed-on: http://git-master/r/781697
(cherry picked from commit bdd23136b182c933841f91dd2829061e278a46d4)
Reviewed-on: http://git-master/r/793630
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>

* gpu: nvgpu: WAR for bad GPFIFO entries from userspace (Alex Waterman, 2015-06-11)

Userspace sometimes sends GPFIFO entries with a zero length. This is a problem when the address of the zero-length pushbuffer is larger than 32 bits: the high bits are interpreted as an opcode and either trigger an operation that should not happen or are treated as invalid.

Oddly, this WAR is only necessary in simulation.

Change-Id: I8be007c50f46d3e35c6a0e8512be88a8e68ee739
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/753379
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/755814
Reviewed-by: Automatic_Commit_Validation_User

* gpu: nvgpu: Do not pre-allocate cmdbuf entries (Terje Bergstrom, 2015-06-09)

Do not preallocate cmdbuf tracking entries. Allocate them only when needed.

Bug 200104160

Change-Id: I12f8392723c301a368af1e280893ff993480477f
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/743953
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/755148

* gpu: nvgpu: add per-channel refcounting (Konsta Holtta, 2015-06-09)

Add reference counting for channels, and wait for the reference count to reach 0 in gk20a_channel_free() before actually freeing the channel.

Also, change free-channel tracking a bit by employing a list of free channels, which simplifies the procedure of finding available channels with reference counting.

Each use of a channel must have a reference taken before use, or held by the caller. Taking a reference on a wild channel pointer may fail if the channel is either not opened or in the process of being closed.

Also, add safeguards for protecting against accidental use of closed channels; specifically, set ch->g = NULL in channel free. This makes it obvious when a freed channel is attempted to be used.

The last user of a channel might be the deferred interrupt handler, so wait for deferred interrupts to be processed twice in the channel free procedure: once for providing last notifications to the channel, and once to make sure there are no stale pointers left after referencing the channel has been denied.

Finally, fix some races in the channel and TSG force-reset IOCTL path by pausing the channel scheduler in gk20a_fifo_recover_ch() and gk20a_fifo_recover_tsg() while the affected engines are identified, the appropriate MMU faults triggered, and the MMU faults handled. In this case, make sure that the MMU fault handler does not attempt to query the hardware about the failing channel or TSG ids. This should make channel recovery safer also in the regular (i.e., not in the interrupt handler) context.

Bug 1530226
Bug 1597493
Bug 1625901
Bug 200076344
Bug 200071810

Change-Id: Ib274876908e18219c64ea41e50ca443df81d957b
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Signed-off-by: Sami Kiminki <skiminki@nvidia.com>
Reviewed-on: http://git-master/r/448463
(cherry picked from commit 3f03aeae64ef2af4829e06f5f63062e8ebd21353)
Reviewed-on: http://git-master/r/755147
Reviewed-by: Automatic_Commit_Validation_User

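A user-space analogue of the get/put scheme using pthreads; the names and the condvar-based wait are illustrative, but the shape matches the commit: _get fails once closing has begun, and free waits for the count to reach zero before NULLing ch->g:

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct gk20a;

struct channel {
	pthread_mutex_t ref_lock;
	pthread_cond_t ref_zero;
	int ref_count;
	bool referenceable;	/* cleared when close begins */
	struct gk20a *g;	/* NULLed on free to trap stale users */
};

/* Fails if the channel is not opened or is being closed. */
static bool channel_get(struct channel *ch)
{
	bool ok;

	pthread_mutex_lock(&ch->ref_lock);
	ok = ch->referenceable;
	if (ok)
		ch->ref_count++;
	pthread_mutex_unlock(&ch->ref_lock);
	return ok;
}

static void channel_put(struct channel *ch)
{
	pthread_mutex_lock(&ch->ref_lock);
	if (--ch->ref_count == 0)
		pthread_cond_signal(&ch->ref_zero);	/* wake the freeing thread */
	pthread_mutex_unlock(&ch->ref_lock);
}

/* Free path: deny new references, then wait out the existing ones. */
static void channel_wait_until_unreferenced(struct channel *ch)
{
	pthread_mutex_lock(&ch->ref_lock);
	ch->referenceable = false;
	while (ch->ref_count > 0)
		pthread_cond_wait(&ch->ref_zero, &ch->ref_lock);
	pthread_mutex_unlock(&ch->ref_lock);
	ch->g = NULL;
}
```
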
* gpu: nvgpu: Use vmalloc only when size >4K (Terje Bergstrom, 2015-06-06)

When the allocation size is 4K or below, we should use kmalloc; vmalloc should be used only for larger allocations. Introduce nvgpu_alloc, which checks the size and decides which API to use.

Change-Id: I593110467cd319851b27e57d1bfe8d228d3f2909
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/743974
(cherry picked from commit 7f56aa1f0ecafbfde7286353b60e25e494674d26)
Reviewed-on: http://git-master/r/753276
Reviewed-by: Automatic_Commit_Validation_User

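A sketch of the helper following the usual kernel pattern; the exact nvgpu_alloc signature is an assumption:

```c
#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/types.h>
#include <linux/vmalloc.h>

static void *nvgpu_alloc(size_t size, bool clear)
{
	if (size > PAGE_SIZE)	/* large: take the virtually-contiguous path */
		return clear ? vzalloc(size) : vmalloc(size);
	/* 4K or below: physically contiguous kmalloc is cheaper */
	return clear ? kzalloc(size, GFP_KERNEL) : kmalloc(size, GFP_KERNEL);
}
```
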
* gpu: nvgpu: cyclestats mode E snapshots support (Leonid Moiseichuk, 2015-06-06)

This is the kernel supporting code for cyclestats mode E. Cyclestats mode E is implemented following the Windows design in user space and requires the following operations to be implemented:
- attach a client to the shared hardware buffer of a device
- detach a client from the shared hardware buffer
- flush, i.e. copy available data from the hardware buffer to private client buffers according to the perfmon IDs assigned to clients
- perfmon ID management for user-space clients
- a NVGPU_GPU_FLAGS_SUPPORT_CYCLE_STATS_SNAPSHOT capability, added here

Bug 1573150

Change-Id: I9e09f0fbb2be5a95c47e6d80a2e23fa839b46f9a
Signed-off-by: Leonid Moiseichuk <lmoiseichuk@nvidia.com>
Reviewed-on: http://git-master/r/740653
(cherry picked from commit 79fe89fd4cea39d8ab9dbef0558cd806ddfda87f)
Reviewed-on: http://git-master/r/753274
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>

* gpu:nvgpu: update channel_setup_ramfc interface (Seshendra Gadagottu, 2015-06-02)

Pass a flags parameter to channel_setup_ramfc to indicate the nvgpu_alloc_gpfifo_args characteristics.

Bug 1645628

Change-Id: Ia40b37c5c7b208d459aa84f1b022036dd5e1b599
Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: http://git-master/r/744526
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>

* gpu: nvgpu: fix channel leak with immediate close (Deepak Nibade, 2015-05-19)

If a GPU channel is closed immediately after opening, without performing any operation on it, we leak that channel. E.g. the command below leaks a channel:

echo > /dev/nvhost-gpu

Fix this leak by releasing the channel before returning.

Change-Id: I2598e3cabec6996cb1cf8066a1e6d7d5864ae02b
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/743235
(cherry picked from commit 428771509b4431ebe88e38061b495cabc5192327)
Reviewed-on: http://git-master/r/744279
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Sachin Nikam <snikam@nvidia.com>

* gpu: nvgpu: check sync existence in channel update (Konsta Holtta, 2015-05-18)

The channel sync object can get deleted before all channel updates have finished if the channel is freed before them, so work around a null dereference by testing whether the sync exists. Channel and/or c->sync refcounting would be necessary for a proper fix.

Bug 200076344

Change-Id: Ica8ef2df9cd95cfa593cd4f41768dbb6641357b2
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/734266
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>

* gpu: nvgpu: Do not send WFI when finishing channel (Terje Bergstrom, 2015-05-05)

The channel teardown process sends a WFI method to ensure that all work has been completed. But we also preempt the channel a while later, which likewise ensures that all work is completed.

Remove the code for submitting WFI, and rely on preemption to handle idling the pipe.

Change-Id: I2af029184440ee73e70d377f15690ddaf9b8599f
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/735067
Reviewed-on: http://git-master/r/737527
Reviewed-by: Alexander Van Brunt <avanbrunt@nvidia.com>
Tested-by: Alexander Van Brunt <avanbrunt@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User

* gpu: nvgpu: Reconfigure instance block with syncpt (Terje Bergstrom, 2015-05-05)

Re-setup RAMFC once a sync point id has been allocated for a channel.

Change-Id: Idbac406bea1c94c89ef587dda08fddc740c1fadb
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/711302
Reviewed-on: http://git-master/r/737526
Reviewed-by: Alexander Van Brunt <avanbrunt@nvidia.com>
Tested-by: Alexander Van Brunt <avanbrunt@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User

* gpu: nvgpu: Sem wakeup to post event (Terje Bergstrom, 2015-04-04)

Post a channel event whenever we do a wakeup due to a semaphore.

Change-Id: Id1765123de93bcbc0822af7926d7f4e9919ffe10
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/726420

* gpu: nvgpu: Fix the GPFIFO submit condition (Gagan Grover, 2015-04-04)

GPFIFO HW with fifo size N can actually accept only N-1 entries. Also, the kernel can insert two extra entries, before and after the user GPFIFOs. So a GPFIFO of size N can accept at most N-3 user gpfifos.

Bug 1613125

Change-Id: I173526afb70dddf1b2b9ec0a99890335c81f0e02
Signed-off-by: Gagan Grover <ggrover@nvidia.com>
Reviewed-on: http://git-master/r/725380
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>

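The capacity rule as arithmetic (hypothetical helper name; assumes fifo_size_n >= 3):

```c
#include <stdbool.h>

/* HW accepts N-1 entries; the kernel may add one entry before and one
 * after the user's, leaving N-3 slots for user gpfifos. */
static bool gpfifo_submit_fits(unsigned int fifo_size_n,
			       unsigned int already_queued,
			       unsigned int user_entries)
{
	return user_entries + already_queued <= fifo_size_n - 3;
}
```
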
* gpu: nvgpu: Increase GPFIFO priv cmd queue size (Terje Bergstrom, 2015-04-04)

We are doing two sync point increments, so we need to allocate space for both of them.

Change-Id: I663abb18a930eb3955379d5f8dd1b08c8fa56897
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/719884

* gpu: nvgpu: improve error print in check_gp_put() (Deepak Nibade, 2015-04-04)

Improve the error print in check_gp_put(): dump both the put pointer stored in the channel's gpfifo structure and the put pointer that we read from memory.

Bug 200089835

Change-Id: I7e854398811330a43d9d9bf86bf29c5986ab22b7
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/722536
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>

* gpu: nvgpu: Use common allocator for GPFIFO (Terje Bergstrom, 2015-04-04)

Reduce the amount of duplicate code around memory allocation by using common helpers, and a common data structure for storing the results of allocations.

Bug 1605769

Change-Id: I81701427ae29b298039a77f1634af9c14237812e
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/719872

* gpu: nvgpu: Use common allocator for cmd queue (Terje Bergstrom, 2015-04-04)

Reduce the amount of duplicate code around memory allocation by using common helpers, and a common data structure for storing the results of allocations.

Bug 1605769

Change-Id: If93063acbbfaa92aef530208241988427b5df8eb
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/719871

* gpu: nvgpu: Use gk20a_mem_phys instead of sg_phys (Terje Bergstrom, 2015-04-04)

A couple of places still used sg_phys directly. Use the new gk20a_mem_phys() to make the code shorter.

Change-Id: I6eb9b14e0c14a27ec39bacd06ab24e31e99769ca
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/717502
