nvgpu.git - Tegra GPU Driver. Originally from nv-tegra.nvidia.com/linux-nvgpu.git.

	Commit message (Collapse)	Author	Age
*	gpu: nvgpu: API to post channel events	Deepak Nibade	2016-01-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add new API gk20a_channel_post_event() which adds channel event and also calls wake_up() for channel's semaphore wq Bug 200156699 Change-Id: If56f1bf8edcce79c9248809f8476ed853b7d2d9d Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/927132 Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: APIs to enable/disable TSG	Deepak Nibade	2016-01-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	export below APIs for TSGs : gk20a_enable_tsg() - enable only TSG gk20a_disable_tsg() - disable only TSG gk20a_enable_channel_tsg() - if channel is part of TSG, enable TSG otherwise enable channel gk20a_disable_channel_tsg() - if channel is part of TSG, disable TSG otherwise disable channel Bug 200156699 Change-Id: Icdaca35235c3f323687f839fe32c6c5fe964b230 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/927131 Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: API to extract context id	Deepak Nibade	2016-01-13
\| \| \| \| \| \| \| \| \| \| \| \| \|	Add new API gr_gk20a_get_ctx_id() to get/extract context id from GR context Bug 200156699 Change-Id: If0e8887a9a6b139cd795bf03f5def64fd664d12b Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/927130 GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: APIs to suspend/resume single SM	Deepak Nibade	2016-01-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add below APIs to suspend or resume single SM : gk20a_suspend_single_sm() gk20a_resume_single_sm() Also, update gk20a_suspend_all_sms() to make it more generic by passing global_esr_mask and check_errors flag as parameter Bug 200156699 Change-Id: If40f4bcae74a8132673b4dca10b7d9898f23c164 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/925884 Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: support preprocessing of SM exceptions	Deepak Nibade	2016-01-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Support preprocessing of SM exceptions if API pointer pre_process_sm_exception() is defined Also, expose some common APIs Bug 200156699 Change-Id: I1303642c1c4403c520b62efb6fd83e95eaeb519b Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/925883 Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: stop timer on failing channel	Deepak Nibade	2016-01-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In gk20a_channel_timeout_handler(), below deadlock scenario is possible : thread 1: - take global lock g->ch_wdt_lock - identify timed out channel (as ch1) - check engine status which is stuck - identify failing channel on engine as ch2 - we need to trigger recovery with ch2 - as part of recovery, call channel_abort() for ch2 - in channel_abort(), we wait to cancel the timer wq - but timer wq for ch2 never completes due to thread 2 thread 2: - ch2 has already timed out - to process, we wait for global lock g->ch_wdt_lock - this lock needs to be released by thread 1 To fix this, cancel the timer (through flag) of ch2 (failing channel on engine) before triggering recovery on that channel Bug 200164753 Change-Id: Idb42d01c8440a53f43cb5e87e41f1c283f7e8fcf Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/929924 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: disable ctxsw instead of all engines activity	Deepak Nibade	2016-01-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In gk20a_channel_timeout_handler(), we currently disable all engine activity before checking for fence completion and before we identify timed out channel But disabling all engine activity could be overkill for this process. Also, as part of disabling engine activity we preempt the channel on engine. But it is possible that channel preemption times out since channel has already timed out And this can lead to races and deadlock Hence, instead of disabling all engine activity, just disable the context switch which should also do the same trick Bug 1716062 Change-Id: I596515ed670a2e134f7bcd9758488a4aa0bf16f7 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/929421 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: add high priority channel interleave	Peter Pipkorn	2016-01-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Interleave all high priority channels between all other channels. This reduces the latency for high priority work when there are a lot of lower priority work present, imposing an upper bound on the latency. Change the default high priority timeslice from 5.2ms to 3.0 in the process, to prevent long running high priority apps from hogging the GPU too much. Introduce a new debugfs node to enable/disable high priority channel interleaving. It is currently enabled by default. Adds new runlist length max register, used for allocating suitable sized runlist. Limit the number of interleaved channels to 32. This change reduces the maximum time a lower priority job is running (one timeslice) before we check that high priority jobs are running. Tested with gles2_context_priority (still passes) Basic sanity testing is done with graphics_submit (one app is high priority) Also more functional testing using lots of parallel runs with: NVRM_GPU_CHANNEL_PRIORITY=3 ./gles2_expensive_draw –drawsperframe 20000 –triangles 50 –runtime 30 –finish plus multiple: NVRM_GPU_CHANNEL_PRIORITY=2 ./gles2_expensive_draw –drawsperframe 20000 –triangles 50 –runtime 30 -finish Previous to this change, the relative performance between high priority work and normal priority work comes down to timeslice value. This means that when there are many low priority channels, the high priority work will still drop quite a lot. But with this change, the high priority work will roughly get about half the entire GPU time, meaning that after the initial lower performance, it is less likely to get lower in performance due to more apps running on the system. This change makes a large step towards real priority levels. It is not perfect and there are no guarantees on anything, but it is a step forwards without any additional CPU overhead or other complications. It will also serve as a baseline to judge other algorithms against. Support for priorities with TSG is future work. Support for interleave mid + high priority channels, instead of just high, is also future work. Bug 1419900 Change-Id: I0f7d0ce83b6598fe86000577d72e14d312fdad98 Signed-off-by: Peter Pipkorn <ppipkorn@nvidia.com> Reviewed-on: http://git-master/r/805961 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: enable semaphore acquire timeout	Richard Zhao	2016-01-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It'll detect dead semaphore acquire. The worst case is when ACQUIRE_SWITCH is disabled, semaphore acquire will poll and consume full gpu timeslicees. The timeout value is set to half of channel WDT. Bug 1636800 Change-Id: Ida6ccc534006a191513edf47e7b82d4b5b758684 Signed-off-by: Richard Zhao <rizhao@nvidia.com> Reviewed-on: http://git-master/r/928827 GVS: Gerrit_Virtual_Submit Reviewed-by: Vladislav Buzov <vbuzov@nvidia.com>
*	gpu: nvgpu: vgpu: add regops support	Richard Zhao	2016-01-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Added new RM Server command for regops. JIRA VFND-1128 Bug 1700139 Change-Id: Ia1cc63e993c29c91f87440c241077fa91edb9e53 Signed-off-by: Richard Zhao <rizhao@nvidia.com> Reviewed-on: http://git-master/r/923235 (cherry picked from commit 7de22e42cfd2e419ad64178b9f1f1ee16273bd03) Reviewed-on: http://git-master/r/841330 Reviewed-by: Aingara Paramakuru <aparamakuru@nvidia.com> Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: Vladislav Buzov <vbuzov@nvidia.com>
*	gpu: nvgpu: vgpu: add SM exception support	Richard Zhao	2016-01-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When TEGRA_VGPU_GR_INTR_SM_EXCEPTION comes, post debugger event. Bug 1594604 JIRA VFND-1120 Change-Id: I7229c3994220a7c6f117d38a1af2e766187a47c6 Signed-off-by: Richard Zhao <rizhao@nvidia.com> Reviewed-on: http://git-master/r/923234 (cherry picked from commit bdd414d9366133380a202d88b1a50038b70c068d) Reviewed-on: http://git-master/r/840646 Reviewed-by: Aingara Paramakuru <aparamakuru@nvidia.com> Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: Vladislav Buzov <vbuzov@nvidia.com>
*	gpu: nvgpu: vgpu: add set sm debug mode support	Richard Zhao	2016-01-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	JIRA VFND-1006 Bug 1594604 Change-Id: If6eb7ae22b5b0557faddd3d68deb791abb24bec4 Signed-off-by: Richard Zhao <rizhao@nvidia.com> Reviewed-on: http://git-master/r/923233 (cherry picked from commit 9e14ca393c3044be702c50524a9ef3a2c3a6270c) Reviewed-on: http://git-master/r/841866 Reviewed-by: Aingara Paramakuru <aparamakuru@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: Vladislav Buzov <vbuzov@nvidia.com>
*	gpu: nvgpu: abstract set sm debug mode	Richard Zhao	2016-01-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add new operation g->ops.gr.set_sm_debug_mode and move native implementation to gr_gk20a.c It's preparing for adding vgpu set sm debug mode hook. JIRA VFND-1006 Bug 1594604 Change-Id: Ia5ca06a86085a690e70bfa9c62f57ec3830ea933 Signed-off-by: Richard Zhao <rizhao@nvidia.com> Reviewed-on: http://git-master/r/923232 (cherry picked from commit 032552b54c570952d1e36c08191e9f70b9c59447) Reviewed-on: http://git-master/r/835614 Reviewed-by: Aingara Paramakuru <aparamakuru@nvidia.com> Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: Vladislav Buzov <vbuzov@nvidia.com>
*	Revert "gpu: nvgpu: Enable ELPG when disabled due to reset"	Seshendra Gadagottu	2016-01-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit f6ab5bd17d16f3605b78c3c2ee80513d5823c594. Fix for graphics_submit regresssion. Bug 200164812 Change-Id: I5e37b8263758ee389cdba3ec6e3758afbdd9c910 Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com> Reviewed-on: http://git-master/r/929605 Tested-by: Hoang Pham <hopham@nvidia.com> Reviewed-by: Hoang Pham <hopham@nvidia.com>
*	gpu: nvgpu: Control comptagline assignment from kernel	Terje Bergstrom	2016-01-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On Maxwell comptaglines are assigned per 128k, but preferred big page size for graphics is 64k. Bit 16 of GPU VA is used for determining which half of comptagline is used. This creates problems if user space wants to map a page multiple times and to arbitrary GPU VA. In one mapping the page might be mapped to lower half of 128k comptagline, and in another mapping the page might be mapped to upper half. Turn on mode where MSB of comptagline in PTE is used instead of bit 16 for determining the comptagline lower/upper half selection. Bug 1704834 Change-Id: If87e8f6ac0fc9c5624e80fa1ba2ceeb02781355b Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com> Reviewed-on: http://git-master/r/924322 Reviewed-by: Alex Waterman <alexw@nvidia.com>
*	gpu: nvgpu: disable secure allocations on linsim	Deepak Nibade	2016-01-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Disable all secure allocations on linsim by returning an error from gk20a_tegra_secure_page_alloc() With this failure, no more secure allocations will be done from nvgpu Bug 200163671 Change-Id: I26604e45a684dde29c092dc34cc89259f5de5d91 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/928280 Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Reviewed-by: Bharat Nihalani <bnihalani@nvidia.com> Reviewed-by: Shridhar Rasal <srasal@nvidia.com>
*	gpu: nvgpu: Enable ELPG when disabled due to reset	Mahantesh Kumbar	2016-01-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Enable ELPG back whenever ELPG disable is done due to reset or recovery. Otherwise elpg_refcnt mismatch doesn’t engage ELPG correctly Bug 200156347 Change-Id: Ic01f85b9e1eff10cfb9cb180b50b045f67d4b33c Signed-off-by: Mahantesh Kumbar <mkumbar@nvidia.com> Reviewed-on: http://git-master/r/925763 Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: gm20b: use jiffies for wait on PMU	Vijayakumar	2015-12-30
\| \| \| \| \| \| \| \| \| \| \|	bug 200157852 Change-Id: Ib5ab6ed5f3d8356efd527ce5ff6e4134ac60da7d Signed-off-by: Vijayakumar <vsubbu@nvidia.com> Reviewed-on: http://git-master/r/921711 Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit Reviewed-by: Shridhar Rasal <srasal@nvidia.com>
*	gpu: nvgpu: Add comptag offset to part mappings	Terje Bergstrom	2015-12-14
\| \| \| \| \| \| \| \| \| \|	Add offset to comptags when mapping partial buffers. Bug 1704834 Change-Id: I3405b465bb1373bcc79eb5ecbd93dd1b866abfb4 Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com> Reviewed-on: http://git-master/r/837401
*	gpu: nvgpu: Add 3 functions to regops interface.	Ashutosh Jain	2015-12-14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change adds the following IOCTLS: - NVGPU_GPU_IOCTL_RESUME_FROM_PAUSE - NVGPU_GPU_IOCTL_TRIGGER_SUSPEND - NVGPU_GPU_IOCTL_CLEAR_SM_ERRORS Bug 1619430 Change-Id: Iac37d515a753d8b799e631224eae2fa168b43e2c Signed-off-by: ashutosh jain <ashutoshj@nvidia.com> Reviewed-on: http://git-master/r/921378 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: update last debug ioctl	Chris Dragan	2015-12-14
\| \| \| \| \| \| \| \| \| \| \| \| \|	Update last IOCTL number to point to GET_TIMEOUT to allow its use. Bug 1706457 Change-Id: I9c8cc4e972fc25e14ca5aff075eca72bc1807a0b Signed-off-by: Chris Dragan <kdragan@nvidia.com> (cherry-picked from commit f73444b0f92737919dafd1623a2dde60c467b25b) Reviewed-on: http://git-master/r/921453 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: make local function static	Amit Sharma	2015-12-14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fixed the following sparse warning by restricting the scope of API's within file. - gr_gk20a.c:7273: warning: symbol 'gr_gk20a_bpt_reg_info' was not declared. Should it be static? - gr_gm20b.c:1053: warning: symbol 'gr_gm20b_bpt_reg_info' was not declared. Should it be static? Bug 200088648 Change-Id: I63bba55b1432e4284c9074d2729a176f1767a83a Signed-off-by: Amit Sharma <amisharma@nvidia.com> Reviewed-on: http://git-master/r/842260 Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: Sachin Nikam <snikam@nvidia.com>
*	gpu: nvgpu: fix possible NULL dereference	Deepak Nibade	2015-12-14
\| \| \| \| \| \| \| \| \| \| \| \| \|	Coverity id : 20260 Bug 1416640 Change-Id: I6ca8df2ed001df99ad46e476b1fe4de9f1346786 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/922362 Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit Reviewed-by: Sachin Nikam <snikam@nvidia.com>
*	gpu: nvgpu: fix dereference after null check	Deepak Nibade	2015-12-11
\| \| \| \| \| \| \| \| \| \| \| \|	Coverity id : 20234 Bug 1703084 Change-Id: I7b82a44dde39bf6b811ae144815c0c45468aa740 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/921326 Reviewed-by: Sachin Nikam <snikam@nvidia.com> Tested-by: Sachin Nikam <snikam@nvidia.com>
*	gpu: nvgpu: preempt before adjusting fences	Deepak Nibade	2015-12-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Current sequence in gk20a_disable_channel() is - disable channel in gk20a_channel_abort() - adjust pending fence in gk20a_channel_abort() - preempt channel But this leads to scenarios where syncpoint has min > max value Hence to fix this, make sequence in gk20a_disable_channel() - disable channel in gk20a_channel_abort() - preempt channel in gk20a_channel_abort() - adjust pending fence in gk20a_channel_abort() If gk20a_channel_abort() is called from other API where preemption is not needed, then use channel_preempt flag and do not preempt channel in those cases Bug 1683059 Change-Id: I4d46d4294cf8597ae5f05f79dfe1b95c4187f2e3 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/921290 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: call channel_update() always in channel_abort()	Deepak Nibade	2015-12-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In gk20a_channel_abort(), we disable the channel and adjust the fences. And then we call gk20a_channel_update() only if we released the post-fence semaphore Instead of this, call gk20a_channel_update() always to ensure that all the clean up after fence completion is done always Bug 1683059 Change-Id: Ife00ba43e808c0833264d79c98c74417ffadf802 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/842999 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: Immediate channel release	Terje Bergstrom	2015-12-10
\| \| \| \| \| \| \| \| \| \| \| \|	When closing channel, disable and preempt it immediately instead of waiting for it to finish all work. Bug 1683059 Change-Id: Ia5f5fc6a072dc3ddb1e9bf63534814ff0a60b5b4 Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com> Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/836746
*	gpu: nvgpu: optimize map calls in patch smpc	Konsta Holtta	2015-12-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Copy the necessary ctx patching prologues and epilogues from patch_write to gr_gk20a_exec_ctx_ops next to mapping of gr ctx, around a loop that applies patches multiple times, in order to optimize the number of maps/unmaps on the channel's patch context. Bug 200075565 Change-Id: I7125be5c778192d639f0bbed1731bb900c7015da Signed-off-by: Konsta Holtta <kholtta@nvidia.com> (cherry picked from commit TODO-FILL-THIS-WHEN-MERGED) Reviewed-on: http://git-master/r/839203 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: add API to extract GPU timeout mode	Chris Dragan	2015-12-09
\| \| \| \| \| \| \| \| \| \| \|	Bug 1706457 Change-Id: Iab76bcb7cabc55d99b5acd932716d30da6f01b46 Signed-off-by: Chris Dragan <kdragan@nvidia.com> Reviewed-on: http://git-master/r/835852 Reviewed-on: http://git-master/r/836454 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: Add grad slowdown HW register masks	Terje Bergstrom	2015-12-08
\| \| \| \| \| \| \| \| \| \|	Change-Id: I8d64c36d569e79cad3648bad248624290319ac2d Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com> Reviewed-on: http://git-master/r/839367 Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
*	gpu: nvgpu: create sync_fence only if needed	Deepak Nibade	2015-12-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, we create sync_fence (from nvhost_sync_create_fence()) for every submit But not all submits request for a sync_fence. Also, nvhost_sync_create_fence() API takes about 1/3rd of the total submit path. Hence to optimize, we can allocate sync_fence only when user explicitly asks for it using (NVGPU_SUBMIT_GPFIFO_FLAGS_FENCE_GET && NVGPU_SUBMIT_GPFIFO_FLAGS_SYNC_FENCE) Also, in CDE path from gk20a_prepare_compressible_read(), we reuse existing fence stored in "state" and that can result into not returning sync_fence_fd when user asked for it Hence, force allocation of sync_fence when job submission comes from CDE path Bug 200141116 Change-Id: Ia921701bf0e2432d6b8a5e8b7d91160e7f52db1e Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/812845 (cherry picked from commit 5fd47015eeed00352cc8473eff969a66c94fee98) Reviewed-on: http://git-master/r/837662 Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit Reviewed-by: Sachin Nikam <snikam@nvidia.com>
*	gpu: nvgpu: move check_gp_put() and update_gp_get() to worker	Deepak Nibade	2015-12-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We currently call check_gp_put() and update_gp_get() in submit path and this takes about 5uS for both checks check_gp_put() - 3.5 uS update_gp_get() - 1.5 uS But this book-keeping can be moved to gk20a_channel_update() to save some submit time Note that check_gp_put() needs to be done inside submit lock Bug 200141116 Change-Id: I276400111be0421eb673695e2f2899ff52e344b4 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/839232 (cherry picked from commit 289617e8bf01bde9aab45dfa3a1c6a1241e6eb78) Reviewed-on: http://git-master/r/839713 GVS: Gerrit_Virtual_Submit Reviewed-by: Sachin Nikam <snikam@nvidia.com>
*	gpu: nvgpu: gm20b: Add tile caching registers	Terje Bergstrom	2015-12-07
\| \| \| \| \| \| \| \| \| \|	Add tile caching related registers to access map. Bug 1692373 Change-Id: I4516812dd571bed3be2dfa2b210abe3177e794fe Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com> Reviewed-on: http://git-master/r/812354
*	gpu: nvgpu: Make access map chip specific	Terje Bergstrom	2015-12-07
\| \| \| \| \| \| \| \|	Bug 1692373 Change-Id: Ie3fc3e02fa7b0636da464d6ee1c28da7a4543ec2 Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com> Reviewed-on: http://git-master/r/812353
*	gpu: nvgpu: Handle interrupt even if no channel	Terje Bergstrom	2015-12-05
\| \| \| \| \| \| \| \| \| \| \| \|	If we could not find the channel for GR interrupt we skipped resetting it. Change the logic to do the reset, and find the channel via FIFO engine status. Bug 1706962 Change-Id: I8a0d283b36d88ea1cec541dbf90db7929104ad7c Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com> Reviewed-on: http://git-master/r/839465
*	gpu: nvgpu: Wait for pause for SMs	sujeet baranwal	2015-12-04
\| \| \| \| \| \| \| \| \| \| \| \|	SM locking & register reads Order has been changed. Also, functions have been implemented based on gk20a and gm20b. Change-Id: Iaf720d088130f84c4b2ca318d9860194c07966e1 Signed-off-by: sujeet baranwal <sbaranwal@nvidia.com> Signed-off-by: ashutosh jain <ashutoshj@nvidia.com> Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com> Reviewed-on: http://git-master/r/837236
*	gpu: nvgpu: vgpu: add set mmu debug mode support	Richard Zhao	2015-12-04
\| \| \| \| \| \| \| \| \| \| \| \| \|	JIRA VFND-1005 Bug 1594604 Change-Id: Ic159a1aff9cee508194f1f5dff7a16eb0e47ad64 Signed-off-by: Richard Zhao <rizhao@nvidia.com> Reviewed-on: http://git-master/r/833498 Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: Aingara Paramakuru <aparamakuru@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	drm/tegra: Support NVIDIA downstream	Arto Merilainen	2015-12-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch modifies TegraDRM to compile on NVIDIA downstream kernel. Currently buffers are pinned to virtual drm device. In upstream this is ok since the buffers will be remapped into the TegraDRM maintained IOMMU domain. However, NVIDIA downstream kernel relies on DMA mapping API and hence requires that the buffers are pinned to the real hardware device. In addition, this patch modifies code to use downstream power-management APIs if the downstream option is enabled. Bug 1698151 Change-Id: I1cb7a1a0ad0ae767c48bfecb608d899e484d6b40 Signed-off-by: Arto Merilainen <amerilainen@nvidia.com> Reviewed-on: http://git-master/r/823640
*	gpu: nvgpu: IOCTL to disable watchdog per-channel	Deepak Nibade	2015-11-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add IOCTL NVGPU_IOCTL_CHANNEL_WDT to disable/enable watchdog per-channel Also, if watchdog is disabled, we currently schedule the worker with MAX timeout. Instead of this, do not schedule any worker if watchdog is disabled Bug 1683059 Bug 1700277 Change-Id: I7f6bec84adeedb74e014ed6d1471317b854df84c Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/837962 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: fix possible memory leak	Deepak Nibade	2015-11-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix possible memory leak in error condition Coverity id : 20022 Bug 1703084 Change-Id: Ie1085a6e9c206ba0d71fe6677986df2a357bb5d1 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/838776 Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit Reviewed-by: Sachin Nikam <snikam@nvidia.com>
*	gpu: nvgpu: gm20b: use jiffies for wait on PMU	Vijayakumar	2015-11-25
\| \| \| \| \| \| \| \| \| \|	bug 200153970 Signed-off-by: Vijayakumar <vsubbu@nvidia.com> Change-Id: Ia5f616269bfeb834540bf4da6ecfc6e399682819 Reviewed-on: http://git-master/r/836966 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: fix Coverity issues	Deepak Nibade	2015-11-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- operands not affecting result (id = 12845) - logically dead code (id = 12890) - dereference after null check (id = 12968) - unsigned compared to 0 (id = 13176) - resource leak (id = 13338, 18673) - unused pointer value (id = 13916) Bug 1703084 Change-Id: I2f401dd93126af27748c53fa1b3a59cb154af36b Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/835143 Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: Sachin Nikam <snikam@nvidia.com>
*	gpu: nvgpu: ucode update with ELPG hang WAR	Mahantesh Kumbar	2015-11-24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- UCODE WAR to disable ELCG during a brief time instant during ELPG entry and exit. - UCODE app version - 20120791 Bug 1696192 Change-Id: Ia6ddf5cd86f3024d40dfa75ec610ba0d1dd4f1fe Signed-off-by: Mahantesh Kumbar <mkumbar@nvidia.com> Reviewed-on: http://git-master/r/835894 (cherry picked from commit 7bb55ae2b1a59f062f2875d1eebd113d66c2af14) Reviewed-on: http://git-master/r/836577 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: abstract set mmu debug mode	Richard Zhao	2015-11-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add new operaton g->ops.mm.set_debug_mode and let other places that set debug mode call this callback. It's preparing for adding vgpu set mmu debug mode hook. JIRA VFND-1005 Bug 1594604 Change-Id: I1d227a0c0f96adb0035ae16ae1f4fbfa739bf0a7 Signed-off-by: Richard Zhao <rizhao@nvidia.com> Reviewed-on: http://git-master/r/833497 Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit Reviewed-by: Vladislav Buzov <vbuzov@nvidia.com>
*	gpu: nvgpu: correct register setting for debug mode	Richard Zhao	2015-11-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	correct register settings for both set mmu debug mode and set sm debug mode. JIRA VFND-1005 Bug 1594604 Change-Id: I1d4b1d4b4cdd9d24d3b00481e0e22c4217f5a4b3 Signed-off-by: Richard Zhao <rizhao@nvidia.com> Reviewed-on: http://git-master/r/833490 Reviewed-by: Aingara Paramakuru <aparamakuru@nvidia.com> Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: Vladislav Buzov <vbuzov@nvidia.com>
*	gpu: nvgpu: set aggressive_sync_destroy at runtime	Deepak Nibade	2015-11-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We currently set "aggressive_destroy" flag to destroy sync object statically and for each sync object Move this flag to per-platform structure so that it can be set per-platform for all the sync objects Also, set the default value of this flag as "false" and set it to "true" once we have more than 64 channels in use Bug 200141116 Change-Id: I1bc271df4f468a4087a06a27c7289ee0ec3ef29c Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/822041 (cherry picked from commit 98741e7e88066648f4f14490c76b61dbff745103) Reviewed-on: http://git-master/r/835800 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: update name for gpu debugfs node	Mahantesh Kumbar	2015-11-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Create constant name for gpu debugfs node across all chip. Bug n/a Change-Id: I359b82b5389c49d8fe2a31ace49ff6daa1edfb10 Signed-off-by: Mahantesh Kumbar <mkumbar@nvidia.com> Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com> Reviewed-on: http://git-master/r/805397 Signed-off-by: Seema Khowala <seemaj@nvidia.com> (cherry-picked from commit 17a3882cde09412c68f7a0ee4765f45be1a51c45) Reviewed-on: http://git-master/r/817014
*	gpu: nvgpu: rework private command buffer free path	Deepak Nibade	2015-11-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We currently allocate private command buffers (wait_cmd and incr_cmd) before submitting the job but we never free them explicitly. When private command queue of the channel is full, we then try to recycle/remove free command buffers. But this recycling happens during submit path, and hence that particular submit path takes much longer Rework this as below : - add reference of command buffers to job structure - when job completes, free the command buffers explicitly - remove the code to recycle buffers since it should not be needed now Note that command buffers need to be freed in order of their allocation. Ensure this with error print before freeing the command buffer entry Bug 200141116 Bug 1698667 Change-Id: Id4b69429d7ad966307e0d122a71ad55076684307 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/827638 (cherry picked from commit c6cefd69b71c9b70d6df5343b13dfcfb3fa99598) Reviewed-on: http://git-master/r/835802 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: support skipping buffer refcounting in submit	Deepak Nibade	2015-11-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In job submission path, we always take refcount on all the mapped buffers to safeguard against case where user space releases the buffer early But in case user space itself is doing proper buffer management, kernel need not take refcounts on all the buffers - which is also a overhead in submit path Hence, provide a new submit flag NVGPU_SUBMIT_GPFIFO_FLAGS_SKIP_BUFFER_REFCOUNTING to optionally skip taking refcounts on all the buffers Also, if we do not take refcounts, then no need to drop any refcounts in gk20a_channel_update() as well Bug 1698667 Bug 200141116 Change-Id: I81bb7a03240300b691c70bcec04ea1badd5934f4 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/824718 (cherry picked from commit 8c8978fa303ec4e6db0233becdbdcbad4a248173) Reviewed-on: http://git-master/r/835801 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: remove temporary gpfifo allocation in submit path	Deepak Nibade	2015-11-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In GPU job submit path gk20a_ioctl_channel_submit_gpfifo(), we currently allocate a temporary gpfifo, copy user space gpfifo content into this temporary buffer, and then copy temp buffer content into channel's gpfifo. Allocation/copy/free of temporary buffer adds additional overhead Rewrite this sequence such that gk20a_submit_channel_gpfifo() can receive either a pre-filled gpfifo or pointer to user provided args. And then we can direclty copy the user provided gpfifo into the channel's gpfifo Also, if command buffer tracing is enabled, we still need to copy user provided gpfifo into temporaty buffer for reading But that should not cause overhead in real world use case Bug 200141116 Change-Id: I7166c9271da2694059da9853ab8839e98457b941 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/823386 (cherry picked from commit 3e0702db006c262dd8737a567b8e06f7ff005e2c) Reviewed-on: http://git-master/r/835799 GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>