nvgpu.git - Tegra GPU Driver. Originally from nv-tegra.nvidia.com/linux-nvgpu.git.

	Commit message	Author	Age
*	gpu: nvgpu: Store pending sema waits	Alex Waterman	2016-12-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Store pending sema waits so that they can be explicitly handled when the driver dies. If the sema_wait is freed before the pending wait is either handled or canceled problems occur. Internally the sync_fence_wait_async() function uses the kernel timers. That uses a linked list of possible events. That means every so often the kernel iterates through this list. If the list node that is in the sync_fence_waiter struct is freed before it can be removed from the pending timers list then the kernel timers list can be corrupted. When the kernel then iterates through this list crashes and other related problems can happen. Bug 1816516 Bug 1807277 Change-Id: Iddc4be64583c19bfdd2d88b9098aafc6ae5c6475 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1250025 (cherry picked from commit 01889e21bd31dbd7ee85313e98079138ed1d63be) Reviewed-on: http://git-master/r/1261920 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
*	gpu: nvgpu: Fix signed comparison bugs	Terje Bergstrom	2016-11-17
	Fix small problems related to signed versus unsigned comparisons throughout the driver. Bump up the warning level to prevent such problems from occurring in the future. Change-Id: I8ff5efb419f664e8a2aedadd6515ae4d18502ae0 Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com> Reviewed-on: http://git-master/r/1252068 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
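The class of bug this commit targets can be illustrated with a minimal C sketch. It is not taken from the driver, and both function names are hypothetical. When an int is compared with an unsigned int, C converts the int to unsigned first, so a negative value turns into a very large positive one. GCC and Clang report such comparisons under -Wsign-compare.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not from nvgpu. 'value' is converted to
 * unsigned int before the comparison, so -1 becomes UINT_MAX and
 * the result is false even though -1 is mathematically below limit. */
static bool less_than_buggy(int value, unsigned int limit)
{
	return value < limit;
}

/* Handle the negative case explicitly, then compare as unsigned. */
static bool less_than_fixed(int value, unsigned int limit)
{
	return value < 0 || (unsigned int)value < limit;
}
```

With value = -1 and limit = 10, the buggy version returns false while the fixed version returns true.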
*	gpu: nvgpu: make deferred clean-up conditional	Sachit Kadle	2016-10-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change makes the invocation of the deferred job clean-up mechanism conditional. For submissions that require job tracking, deferred clean-up is only required if any of the following conditions are met: 1) Channel's deterministic flag is not set 2) Rail-gating is enabled 3) Channel WDT is enabled 4) Buffer refcounting is enabled 5) Dependency on Sync Framework In case deferred clean-up is not needed, we clean-up a single job tracking resource in the submit path. For deterministic channels, we do not allow deferred clean-up to occur and fail any submits that require it. Bug 1795076 Change-Id: I4021dffe8a71aa58f12db6b58518d3f4021f3313 Signed-off-by: Sachit Kadle <skadle@nvidia.com> Reviewed-on: http://git-master/r/1220920 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> (cherry picked from commit b09f7589d5ad3c496e7350f1ed583a4fe2db574a) Reviewed-on: http://git-master/r/1223941 GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: use inplace allocation in sync framework	Sachit Kadle	2016-10-20
	This change is the first of a series of changes to support the usage of pre-allocated job tracking resources in the submit path. With this change, we still maintain a dynamically-allocated joblist, but make the necessary changes in the channel_sync & fence framework to use in-place allocations. Specifically, we: 1) Update channel sync framework routines to take in pre-allocated priv_cmd_entry(s) & gk20a_fence(s) rather than dynamically allocating themselves 2) Move allocation of priv_cmd_entry(s) & gk20a_fence(s) to gk20a_submit_prepare_syncs 3) Modify fence framework to have separate allocation and init APIs. We expose allocation as a separate API, so the client can allocate the object before passing it into the channel sync framework. 4) Fix clean_up logic in channel sync framework Bug 1795076 Change-Id: I96db457683cd207fd029c31c45f548f98055e844 Signed-off-by: Sachit Kadle <skadle@nvidia.com> Reviewed-on: http://git-master/r/1206725 (cherry picked from commit 9d196fd10db6c2f934c2a53b1fc0500eb4626624) Reviewed-on: http://git-master/r/1223933 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
*	gpu: nvgpu: Fix invalid test for signaled sync_fences	Alex Waterman	2016-09-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix a check that was backwards for signaled sync_fences. This would cause the code to not wait on some sync_fences that had not already signaled and wait on other fences that had signaled. Bug 1787348 Reviewed-on: http://git-master/r/1204710 (cherry picked from commit 75b94bb30f79c3a7a9992773dc8a93b507121006) Change-Id: I00b0f8a373a9954a5ad9ab31aff6423e91574153 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1221044 GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: Greatly simplify the semaphore detection	Alex Waterman	2016-09-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Greatly simplify and make more robust the gpu semaphore detection in sync_fences. Instead of using a magic number use the parent timeline of sync_pts. This will also work with multi-GPU setups using nvgpu since the timeline ops pointer will be the same across all instances of nvgpu. Bug 1732449 Reviewed-on: http://git-master/r/1203834 (cherry picked from commit 66eeb577eae5d10741fd15f3659e843c70792cd6) Change-Id: I4c6619d70b5531e2676e18d1330724e8f8b9bcb3 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1221042 GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: Optimize sync fence creation	Alex Waterman	2016-09-16
	Only create sync-fences in the semaphore synchronization path when they are actually needed (i.e. requested by userspace). Bug 1795076 Reviewed-on: http://git-master/r/1201564 (cherry picked from commit dc52d424a839e6c064c02b7f02905dd6a59a50af) Change-Id: Ieac6aef415678d4ea982683a955897c64959436e Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1221041 GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: Add debugging to the semaphore code	Alex Waterman	2016-08-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add GPU debugging to the semaphore code. Bug 1732449 JIRA DNVGPU-12 Change-Id: I98466570cf8d234b49a7f85d88c834648ddaaaee Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1198594 (cherry picked from commit 420809cc31fcdddde32b8e59721676c67b45f592) Reviewed-on: http://git-master/r/1153671 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
*	gpu: nvgpu: Revamp semaphore support	Alex Waterman	2016-06-28
	Revamp the support the nvgpu driver has for semaphores. The original problem with nvgpu's semaphore support is that it required a SW based wait for every semaphore release. This was because for every fence that gk20a_channel_semaphore_wait_fd() waited on, a new semaphore was created. This semaphore would then get released by SW when the fence signaled. This meant that for every release there was necessarily a sync_fence_wait_async() call which could block. The latency of this SW wait was enough to cause massive degradation in performance. To fix this a fast path was implemented. When a fence is passed to gk20a_channel_semaphore_wait_fd() that is backed by a GPU semaphore, a semaphore acquire is directly used to block the GPU. No longer is a sync_fence_wait_async() performed nor is there an extra semaphore created. To implement this fast path the semaphore memory had to be shared between channels. Previously, since a new semaphore was created every time through gk20a_channel_semaphore_wait_fd(), what address space a semaphore was mapped into was irrelevant. However, when using the fast path a semaphore may be released on one address space but acquired in another. Sharing the semaphore memory was done by making a fixed GPU mapping in all channels. This mapping points to the semaphore memory (the so-called semaphore sea). This global fixed mapping is read-only to make sure no semaphores can be incremented (i.e. released) by a malicious channel. Each channel then gets a RW mapping of its own semaphore. This way a channel may only acquire other channels' semaphores but may both acquire and release its own semaphore. The gk20a fence code was updated to allow introspection of the GPU backed fences. This allows detection of when the fast path can be taken.
If the fast path cannot be used (for example when a fence is sync-pt backed) the original slow path is still present. This gets used when the GPU needs to wait on an event from something which only understands how to use sync-pts. Bug 1732449 JIRA DNVGPU-12 Change-Id: Ic0fea74994da5819a771deac726bb0d47a33c2de Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1133792 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: Balance curly braces	Alex Waterman	2016-06-13
	In some of the conditionally compiled code in the nvgpu driver there are places where the code looks like: #if LINUX_VERSION_CODE < KERNEL_VERSION(3,18,0) some-loop { #else a-diff-loop { #endif /* Some code... */ } This leaves unbalanced curly braces: two open braces for one close brace. This messes up some editors' syntax highlighting and auto-indentation features. This patch puts in the extra brace. It's not necessary for compiling code but it makes some editors much happier. Change-Id: Ida28bc001cc840fe52a43982db934d49c07cc7d3 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1153668 Reviewed-by: Konsta Holtta <kholtta@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: suppress prints in submit path	Deepak Nibade	2016-05-24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When we run out of gpfifo space or private command buffer space, we have error spew like below : __gk20a_channel_syncpt_incr: not enough priv cmd buffer space gk20a_submit_channel_gpfifo: fail Dumping these prints to UART cause increase in submit latencies But on these failures, we return -ENOSPC to UMD and then UMD retries the submit, hence it might be unnecessary to dump these prints Hence, remove the error prints of insufficient space and use gk20a_dbg_fn() instead of gk20a_err() to print failure in gk20a_submit_channel_gpfifo() Bug 200202653 Change-Id: I49efd7c6c554dd4fbfa4e66d196eb352e69f92c6 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/1152378 Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: use mem_desc for semaphores	Konsta Holtta	2016-05-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Replace manual buffer allocation and cpu_va pointer accesses with gk20a_gmmu_{alloc,free}() and gk20a_mem_{rd,wr}() using a struct mem_desc in gk20a_semaphore_pool, for buffer aperture flexibility. JIRA DNVGPU-23 Change-Id: I394c38f407a9da02480bfd35062a892eec242ea3 Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1146684 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: use mem_desc in priv_cmd_entry	Konsta Holtta	2016-05-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Replace the plain cpu pointer accesses with gk20a_mem_wr32(), and use a reference to the underlying mem_desc (within priv_cmd_queue) paired with an offset, for buffer aperture flexibility. JIRA DNVGPU-21 JIRA DNVGPU-23 Change-Id: I317672c94bb682bb895f9ed3e8116729c8bb7f4b Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1145922 Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: implement sync refcounting	Deepak Nibade	2016-04-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We currently free sync when we find job list empty If aggressive_sync is set to true, we try to free sync during channel unbind() call But we rarely free sync from channel_unbind() call since freeing it when job list is empty is aggressive enough Hence remove sync free code from channel_unbind() Implement refcounting for sync: - get a refcount while submitting a job (and allocate sync if it is not allocated already) - put a refcount while freeing the job - if refcount==0 and if aggressive_sync_destroy is set, free the sync - if aggressive_sync_destroy is not set, we will free the sync during channel close time Bug 200187553 Change-Id: I74e24adb15dc26a375ebca1fdd017b3ad6d57b61 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/1120410 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
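The get/put scheme listed in this commit can be sketched in a few lines of C. This is a single-threaded illustration only: the struct and function names are hypothetical, and a real driver would need atomic counters and locking around allocation and teardown.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-channel sync state. */
struct chan_sketch {
	int sync_refs;                 /* jobs currently holding the sync */
	bool sync_allocated;           /* whether the sync object exists */
	bool aggressive_sync_destroy;  /* free the sync as soon as it is unused */
};

/* Called when a job is submitted: allocate on first use, take a ref. */
static void sync_get(struct chan_sketch *c)
{
	if (!c->sync_allocated)
		c->sync_allocated = true;
	c->sync_refs++;
}

/* Called when a job is freed: drop a ref and free only if the count
 * reaches zero and aggressive destroy is enabled. Otherwise the sync
 * stays allocated until the channel is closed. */
static void sync_put(struct chan_sketch *c)
{
	c->sync_refs--;
	if (c->sync_refs == 0 && c->aggressive_sync_destroy)
		c->sync_allocated = false;
}
```

With aggressive_sync_destroy set, the sync is released when the last job drops its reference. With it cleared, the sync survives until channel close.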
*	gpu: nvgpu: support kernel-3.10 version	Deepak Nibade	2016-04-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make necessary changes to support nvgpu on kernel-3.10 This includes below changes - PROBE_PREFER_ASYNCHRONOUS is defined only for K3.10 - Fence handling and struct sync_fence is different between K3.10 and K3.18 - variable status in struct sync_fence is atomic on K3.18 whereas it is int on K3.10 - if SOC == T132, set soc_name = "tegra13x" - ioremap_cache() is not defined on K3.10 ARM versions, hence use ioremap_cached() Bug 200188753 Change-Id: I18d77eb1404e15054e8510d67c9a61c0f1883e2b Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/1121092 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: Use device instead of platform_device	Terje Bergstrom	2016-04-08
\| \| \| \| \| \| \| \| \|	Use struct device instead of struct platform_device wherever possible. This allows adding other bus types later. Change-Id: I1657287a68d85a542cdbdd8a00d1902c3d6e00ed Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com> Reviewed-on: http://git-master/r/1120466
*	host: move nvhost to its own git repo	Alex Van Brunt	2016-04-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Move nvhost out of the common Linux repo and into its own git repo. By doing this, the same nvhost driver can work on different Linux kernel versions. Previously android/sync.h was referenced relative to kernel/drivers/video/tegra/host. However, host moved to a completely different part of the tree. Instead, reference it relative to kernel/include. bug 1749413 Change-Id: Ic7f94093c712e5b64c9b3b660d6fce5d18e59bc0 Signed-off-by: Alex Van Brunt <avanbrunt@nvidia.com> Reviewed-on: http://git-master/r/1120544 Reviewed-by: Arto Merilainen <amerilainen@nvidia.com> Tested-by: Arto Merilainen <amerilainen@nvidia.com>
*	gpu: nvgpu: fix a sync_fence leak	Yunbo Wang	2016-03-14
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fixes a bug where reference to sync_fence is not closed before return. Bug 200171146 Change-Id: If174eb124bd69692bab4cc8629a103517d7cfef1 Signed-off-by: Yunbo Wang <yunbow@nvidia.com> Reviewed-on: http://git-master/r/1029844 Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: Alex Waterman <alexw@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: Eric Miao <emiao@nvidia.com> Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: Increase semaphore count	Alex Waterman	2016-01-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Increase the semaphore count per channel. Some channels were running out of semaphores. The original limit was 255 (256 fits in 1 page, but the 0th semaphore is used to return error codes from the allocator). Easy fix was to simply increase the number of semaphores each channel is allocated to 1024. Bug 1604892 Change-Id: I163e24b8d42a3dc1bb9b418dadc0c8532aff9adb Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/935911 Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> GVS: Gerrit_Virtual_Submit
*	gpu: nvgpu: Fix semaphore race condition	Alex Waterman	2016-01-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A race condition existed in gk20a_channel_semaphore_wait_fd(). In some instances the semaphore underlying the sync_fence being waited on would have already signaled. This would cause the subsequent sync_fence_wait_async() call to return 1 and do nothing. Normally, the sync_fence_wait_async() call would release the newly created semaphore but in the above case that would not happen and hang any channel waiting on that semaphore. To fix this problem if sync_fence_wait_async() returns 1 immediately release the newly created semaphore. Bug 1604892 Change-Id: I1f5e811695bb099f71b7762835aba4a7e27362ec Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/935910 Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> GVS: Gerrit_Virtual_Submit
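The fix described here can be sketched with stand-in types. Nothing below is the driver's code: fake_wait_async() only mimics the return convention the commit message describes for sync_fence_wait_async() (1 when the fence has already signaled, so the callback never runs), and the other names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct fake_fence { bool signaled; };
struct fake_sema  { bool released; };

/* Stand-in for sync_fence_wait_async(): returns 1 if the fence has
 * already signaled (no callback will fire), 0 if a waiter was queued. */
static int fake_wait_async(struct fake_fence *f)
{
	return f->signaled ? 1 : 0;
}

static void fake_sema_release(struct fake_sema *s)
{
	s->released = true;
}

/* Sketch of the fixed wait path: if the fence had already signaled,
 * release the newly created semaphore immediately, since no callback
 * will ever do it. */
static int wait_fd_sketch(struct fake_fence *f, struct fake_sema *s)
{
	int ret = fake_wait_async(f);

	if (ret == 1) {
		fake_sema_release(s);
		return 0;
	}
	return ret;
}
```

Without the ret == 1 branch, a channel waiting on the semaphore would hang whenever the fence had signaled before the wait was registered.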
*	gpu: nvgpu: return ENOSPC if no private command buffer space	Deepak Nibade	2016-01-13
	If we run out of gpfifo space or private command buffer space, we currently return EAGAIN as the error code. Instead of EAGAIN, return ENOSPC so that the caller (user space) can read the error code and retry. As jobs are processed, some space may be freed, so such retries could succeed. Bug 1715291 Change-Id: I9a2ed7134d2496b383899b3c02c0e70452b26115 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/929402 Reviewed-by: Sachin Nikam <snikam@nvidia.com> Tested-by: Sachin Nikam <snikam@nvidia.com>
*	gpu: nvgpu: create sync_fence only if needed	Deepak Nibade	2015-12-08
	Currently, we create a sync_fence (from nvhost_sync_create_fence()) for every submit, but not all submits request a sync_fence. Also, the nvhost_sync_create_fence() API takes about one third of the total submit path. Hence, to optimize, we can allocate a sync_fence only when the user explicitly asks for it using (NVGPU_SUBMIT_GPFIFO_FLAGS_FENCE_GET && NVGPU_SUBMIT_GPFIFO_FLAGS_SYNC_FENCE). Also, in the CDE path from gk20a_prepare_compressible_read(), we reuse the existing fence stored in "state", and that can result in not returning a sync_fence_fd when the user asked for it. Hence, force allocation of a sync_fence when the job submission comes from the CDE path. Bug 200141116 Change-Id: Ia921701bf0e2432d6b8a5e8b7d91160e7f52db1e Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/812845 (cherry picked from commit 5fd47015eeed00352cc8473eff969a66c94fee98) Reviewed-on: http://git-master/r/837662 Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit Reviewed-by: Sachin Nikam <snikam@nvidia.com>
*	gpu: nvgpu: set aggressive_sync_destroy at runtime	Deepak Nibade	2015-11-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We currently set "aggressive_destroy" flag to destroy sync object statically and for each sync object Move this flag to per-platform structure so that it can be set per-platform for all the sync objects Also, set the default value of this flag as "false" and set it to "true" once we have more than 64 channels in use Bug 200141116 Change-Id: I1bc271df4f468a4087a06a27c7289ee0ec3ef29c Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/822041 (cherry picked from commit 98741e7e88066648f4f14490c76b61dbff745103) Reviewed-on: http://git-master/r/835800 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: fix memory corrupt	Richard Zhao	2015-08-11
	Replace sprintf with snprintf in gk20a_channel_syncpt_create(), since the sync point name can be long. Bug 1638853 Change-Id: Ie305d04edfbb299c8b1241eca52101439bb4a6c6 Signed-off-by: Richard Zhao <rizhao@nvidia.com> Reviewed-on: http://git-master/r/769113 Reviewed-on: http://git-master/r/776424 Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: Aingara Paramakuru <aparamakuru@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: Vladislav Buzov <vbuzov@nvidia.com>
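A minimal sketch of why the bounded call matters, using standard C snprintf(). The helper name and the "%s_%d" name format are assumptions made for illustration, not the driver's actual format.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: compose a sync point name from a device name
 * and a channel index. snprintf() never writes more than 'size' bytes
 * (including the terminating NUL), so an over-long name is truncated
 * instead of overrunning the buffer. The return value is the length
 * the full name would have had. */
static int make_syncpt_name(char *buf, size_t size, const char *dev, int chid)
{
	return snprintf(buf, size, "%s_%d", dev, chid);
}
```

A caller can detect truncation by checking whether the return value is greater than or equal to the buffer size.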
*	gpu: nvgpu: remove gk20a_busy() from channel_syncpt_incr()	Deepak Nibade	2015-08-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	gk20a_busy() is already called on all the paths to __gk20a_channel_syncpt_incr() i.e. in gk20a_submit_channel_gpfifo() hence remove the redundant gk20a_busy() call since it causes deadlock scenario with VPR resize use case Bug 200128257 Bug 1645760 Bug 200114947 Bug 200124519 Change-Id: I4cd47b7e7cdc92aaeda17256a99f2ba93833a3b3 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/778341 (cherry picked from commit 5a5dc5b5a9d38a5e8d5c1ca29dc6de425c00b605) Reviewed-on: http://git-master/r/779070 Reviewed-by: Sachin Nikam <snikam@nvidia.com>
*	gpu: nvgpu: remove gk20a_busy() from channel_syncpt_update()	Deepak Nibade	2015-08-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	gk20a_busy() was added to gk20a_channel_syncpt_update() for possible case of channel deletion But API to delete a channel (i.e. gk20a_free_channel()) is already called in paths which ensure gk20a_busy() is called before deleting the channel Hence, remove redundant gk20a_busy()/idle() calls This also fixes a deadlock scenario with VPR resize use case Bug 200128257 Bug 1645760 Bug 200114947 Bug 200124519 Change-Id: I05dc739b3be88af2ba22b0a667e5004d8100bf6f Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/778340 (cherry picked from commit 306282aa950201cf1ae91a5cc48d75719b179d19) Reviewed-on: http://git-master/r/779069 Reviewed-by: Sachin Nikam <snikam@nvidia.com>
*	gpu: nvgpu: add per-channel refcounting	Konsta Holtta	2015-06-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add reference counting for channels, and wait for reference count to get to 0 in gk20a_channel_free() before actually freeing the channel. Also, change free channel tracking a bit by employing a list of free channels, which simplifies the procedure of finding available channels with reference counting. Each use of a channel must have a reference taken before use or held by the caller. Taking a reference of a wild channel pointer may fail, if the channel is either not opened or in a process of being closed. Also, add safeguards for protecting accidental use of closed channels, specifically, by setting ch->g = NULL in channel free. This will make it obvious if freed channel is attempted to be used. The last user of a channel might be the deferred interrupt handler, so wait for deferred interrupts to be processed twice in the channel free procedure: once for providing last notifications to the channel and once to make sure there are no stale pointers left after referencing to the channel has been denied. Finally, fix some races in channel and TSG force reset IOCTL path, by pausing the channel scheduler in gk20a_fifo_recover_ch() and gk20a_fifo_recover_tsg(), while the affected engines have been identified, the appropriate MMU faults triggered, and the MMU faults handled. In this case, make sure that the MMU fault does not attempt to query the hardware about the failing channel or TSG ids. This should make channel recovery more safe also in the regular (i.e., not in the interrupt handler) context. 
Bug 1530226 Bug 1597493 Bug 1625901 Bug 200076344 Bug 200071810 Change-Id: Ib274876908e18219c64ea41e50ca443df81d957b Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com> Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Signed-off-by: Sami Kiminki <skiminki@nvidia.com> Reviewed-on: http://git-master/r/448463 (cherry picked from commit 3f03aeae64ef2af4829e06f5f63062e8ebd21353) Reviewed-on: http://git-master/r/755147 Reviewed-by: Automatic_Commit_Validation_User
*	gpu: nvgpu: use updated APIs to get/put syncpoint	Deepak Nibade	2015-06-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pass host1x device pointer for below APIs : nvhost_get_syncpt_host_managed() nvhost_syncpt_put_ref_ext() Also, compose a name for syncpoints in GPU driver itself. This name will be created as combination of device name and channel index Bug 1611482 Change-Id: Id1ddd0e87e0272ddb0758713d6b6c2544bc36bf4 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/751908 Reviewed-by: Arto Merilainen <amerilainen@nvidia.com> Tested-by: Arto Merilainen <amerilainen@nvidia.com>
*	gpu: nvgpu: Fix invalid GPFIFO entries	Alex Waterman	2015-06-04
	With the addition of the buddy allocator, push buffers are often allocated by the kernel in high GVA memory regions. These addresses, when written into a GPFIFO entry, have bits set in entry1 of the GPFIFO command. As a result, if no length is set, then these address bits will be interpreted as opcodes by the GPU. The bug fixed by this patch was caused by a wait_cmd being inserted into the GPFIFO with an address of a pushbuffer above 4GB and a zero length. This occurred because the code that creates the wait_cmd was able to return what appeared to be a valid priv_cmd_entry even though there was nothing in that command. This bug does not appear before the buddy allocator because the FFF allocator always starts allocating from low addresses. As such, when a channel's GPFIFO is allocated it gets an address below 32 bits. Then, because no higher address bits are set, entry1 of the GPFIFO is simply 0 and the GPU treats the command as a no-op. Change-Id: I9c1e600c368b55626e99f6f712f1821148bbb76d Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/752080 Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
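The effect described above can be reproduced with a small packing sketch. The entry layout used here (low address word in entry0, upper address bits in the low bits of entry1, length shifted up by 10) is an assumption for illustration and should be checked against the hardware headers. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-word GPFIFO entry. */
struct gpfifo_entry_sketch {
	uint32_t entry0;
	uint32_t entry1;
};

/* Assumed position of the length field within entry1. */
#define SKETCH_LENGTH_SHIFT 10

static struct gpfifo_entry_sketch pack_entry(uint64_t gpu_va, uint32_t len_words)
{
	struct gpfifo_entry_sketch e;

	e.entry0 = (uint32_t)gpu_va;                 /* low 32 address bits */
	e.entry1 = (uint32_t)(gpu_va >> 32) |        /* high address bits */
		   (len_words << SKETCH_LENGTH_SHIFT);
	return e;
}
```

Under this layout a zero-length entry below 4GB has entry1 equal to 0, which matches the no-op behaviour the commit describes, while the same zero-length entry above 4GB leaves stray address bits in entry1.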
*	gpu: nvgpu: drop syncpoint refcount instead of direct free	Deepak Nibade	2015-06-01
\| \| \| \| \| \| \| \| \| \| \| \| \|	Drop host1x syncpoint refcount with nvhost_syncpt_put_ref_ext() instead of freeing it directly Bug 1646883 Change-Id: Ib213e58031a9302e683f8d13ebb4e1f913206464 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/747150 GVS: Gerrit_Virtual_Submit Reviewed-by: Arto Merilainen <amerilainen@nvidia.com>
*	gpu: nvgpu: Reconfigure instance block with syncpt	Terje Bergstrom	2015-05-05
\| \| \| \| \| \| \| \| \| \| \| \|	Resetup RAMFC once sync point id is allocated for a channel. Change-Id: Idbac406bea1c94c89ef587dda08fddc740c1fadb Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com> Reviewed-on: http://git-master/r/711302 Reviewed-on: http://git-master/r/737526 Reviewed-by: Alexander Van Brunt <avanbrunt@nvidia.com> Tested-by: Alexander Van Brunt <avanbrunt@nvidia.com> Reviewed-by: Automatic_Commit_Validation_User
*	gpu: nvgpu: use correct API to check valid syncpt	Deepak Nibade	2015-04-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Below check assumes that available syncpt range starts from 0 id >= nvhost_syncpt_nb_pts_ext() Instead of using this API, use nvhost_syncpt_is_valid_pt_ext() which validates the syncpt id against both upper and lower boundaries Bug 1611482 Change-Id: I7c4465a2bc84b63fefaa17c64f02582885924c5e Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/711211 Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Reviewed-by: Arto Merilainen <amerilainen@nvidia.com>
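The difference between the two checks can be sketched as follows. These helpers are hypothetical and do not reproduce the nvhost API. They assume the valid ids form a contiguous range starting at some base that need not be 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Upper-bound-only check: correct only if the range starts at id 0. */
static bool syncpt_valid_upper_only(unsigned int id, unsigned int nb_pts)
{
	return id < nb_pts;
}

/* Check against both boundaries of an assumed range [base, base + nb_pts). */
static bool syncpt_valid_both(unsigned int id, unsigned int base,
			      unsigned int nb_pts)
{
	return id >= base && id < base + nb_pts;
}
```

With base = 8 and nb_pts = 16, id 3 slips through the upper-only check although it lies below the range, and id 20 is rejected although it lies inside it.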
*	gpu: nvgpu: Reset sync point at alloc/free	Terje Bergstrom	2015-04-04
\| \| \| \| \| \| \| \|	Change-Id: I8753e47ef4d3f4b3645ed6c6e604449d81d3da4b Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com> Reviewed-on: http://git-master/r/709061 (cherry picked from commit cc07f316334b88cc18070fba9dd9149ba193bd38) Reviewed-on: http://git-master/r/707980
*	gpu: gk20a: Removing erroneous increment statement	Ishan Mittal	2015-03-20
\| \| \| \| \| \| \| \| \| \| \| \| \|	This must have occurred while rebasing dev-kernel-3.10 over kernel 3.18. This change corrects the mistake. Change-Id: I11fbc11105a032198828e8bc31da5ab92af0ffdb Signed-off-by: Ishan Mittal <imittal@nvidia.com> Reviewed-on: http://git-master/r/720240 Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: Bharat Nihalani <bnihalani@nvidia.com>
*	host/gpu: Upgrade to new fence-based sync implementation	Dan Willemsen	2015-03-18
\| \| \| \|	Signed-off-by: Dan Willemsen <dwillemsen@nvidia.com>
*	Revert "gpu: nvgpu: Enable syncpt reclaim only on gm20b"	Timo Alho	2015-03-18
\| \| \| \| \| \| \| \| \| \|	This reverts commit 8eefb93c21934b101d7f423c38d9ea384a45fad6. Bug 1585422 Change-Id: I217e0ffe6c230ee3c63d9aec1c48ce9c41770468 Signed-off-by: Timo Alho <talho@nvidia.com> Reviewed-on: http://git-master/r/659426
*	gpu: nvgpu: Enable syncpt reclaim only on gm20b	Terje Bergstrom	2015-03-18
\| \| \| \| \| \| \| \| \| \| \| \| \|	gm20b has more channels than sync points. We use aggressive reclaim of sync points to offset that. Disable aggressive reclaim for gk20a because it is not needed there. Bug 1583849 Change-Id: I2a74b0504150a54cb8a97016effe20c5d905ac95 Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com> Reviewed-on: http://git-master/r/657095 Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
*	gpu: nvgpu: fix sparse warnings	Deepak Nibade	2015-03-18
	Fix below sparse warnings : warning: Using plain integer as NULL pointer warning: symbol <variable/function> was not declared. Should it be static? warning: Initializer entry defined twice Also, remove dead functions Bug 1573254 Change-Id: I29d71ecc01c841233cf6b26c9088ca8874773469 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/593363 Reviewed-by: Amit Sharma (SW-TEGRA) <amisharma@nvidia.com> Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: Sachin Nikam <snikam@nvidia.com>
*	gpu: nvgpu: Remove usage of KEPLER_C syncpt incr	Terje Bergstrom	2015-03-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Using KEPLER_C for doing sync point increment has side effects. It adds a SetObject method, which changes channel state that not all user space accounts for. Bug 1462255 Bug 1497928 Bug 1559462 Change-Id: I5c422ad8ca94fba15cad9bd232f7a10d94aa0973 Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com> Reviewed-on: http://git-master/r/554478 Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit
*	gpu: nvgpu: Fix semaphore refcounting	Arto Merilainen	2015-03-18
\| \| \| \| \| \| \| \| \| \|	This patch fixes a refcounting issue in semaphore handling. Change-Id: I03327c60ed6923a90663f0b845566e81af4b94d4 Signed-off-by: Arto Merilainen <amerilainen@nvidia.com> Reviewed-on: http://git-master/r/453056 Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: Add gk20a_fence type	Lauri Peltonen	2015-03-18
	When moving compression state tracking and compbit management ops to kernel, we need to attach a fence to dma-buf metadata, along with the compbit state. To make in-kernel fence management easier, introduce a new gk20a_fence abstraction. A gk20a_fence may be backed by a semaphore or a syncpoint (id, value) pair. If the kernel is configured with CONFIG_SYNC, it will also contain a sync_fence. The gk20a_fence can easily be converted back to a syncpoint (id, value) pair or sync FD when we need to return it to user space. Change gk20a_submit_channel_gpfifo to return a gk20a_fence instead of nvhost_fence. This is to facilitate work submission initiated from kernel. Bug 1509620 Change-Id: I6154764a279dba83f5e91ba9e0cb5e227ca08e1b Signed-off-by: Lauri Peltonen <lpeltonen@nvidia.com> Reviewed-on: http://git-master/r/439846 Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: Double syncpoint increments	Arto Merilainen	2015-03-18
\| \| \| \| \| \| \| \| \| \| \| \| \|	gm20b/gm20x requires incrementing syncpoints twice to ensure that the data has reached memory in all cases. This patch modifies the increment push buffer to account for this requirement. Bug 1491360 Change-Id: I5c2899b26ce0e1cdf9408bb9aaa576fc3054480f Signed-off-by: Arto Merilainen <amerilainen@nvidia.com> Reviewed-on: http://git-master/r/437675 Reviewed-by: Automatic_Commit_Validation_User
*	gpu: nvgpu: remove redundant busy()/idle() calls	Deepak Nibade	2015-03-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The gk20a_busy() call in channel_syncpt_incr() and the corresponding gk20a_idle() call in channel_update() are redundant since they are already encapsulated inside another pair of busy/idle calls. This busy/idle pair will be called only from submit_gpfifo(), and submit_gpfifo() already has its own busy/idle which it preserves for the whole path, hence this redundant pair can be removed. Also, this prevents a deadlock scenario while do_idle() is in progress, as follows: - in submit_gpfifo() we first call gk20a_busy(), which acquires the busy read semaphore - in do_idle() we acquire the busy write semaphore and wait for current jobs to finish - now submit_gpfifo() encounters the second gk20a_busy() and requests the busy read semaphore again - this results in a deadlock where do_idle() is waiting for submit_gpfifo() to complete and submit_gpfifo() is waiting for the busy lock held by do_idle(), and hence it cannot complete. Bug 1529160 Change-Id: I96e4368352f693e93524f0f61689b4447e5331ea Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/434191 (cherry picked from commit c4315c6caa42bab72ba6017c7ded25f4e9363dec) Reviewed-on: http://git-master/r/435132 Reviewed-by: Sachin Nikam <snikam@nvidia.com> Tested-by: Sachin Nikam <snikam@nvidia.com>
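The deadlock steps in this message can be replayed with a small single-threaded model. This is only a sketch under stated assumptions: the `sketch_*` names are hypothetical, and the lock is modeled as writer-preferring, meaning a queued writer blocks new readers, which is what the scenario above relies on. It does not use the kernel's rw semaphore API.

```c
#include <stdbool.h>

/* Hypothetical model of a writer-preferring read/write lock. */
struct sketch_rwsem {
	int readers;
	bool writer_waiting;
};

/* A new reader is refused while a writer is queued. */
static bool sketch_try_read(struct sketch_rwsem *s)
{
	if (s->writer_waiting)
		return false;
	s->readers++;
	return true;
}

static void sketch_release_read(struct sketch_rwsem *s)
{
	s->readers--;
}

/* The writer queues itself; it succeeds only when no readers remain. */
static bool sketch_try_write(struct sketch_rwsem *s)
{
	s->writer_waiting = true;
	if (s->readers > 0)
		return false;
	s->writer_waiting = false;
	return true;
}

/* Replays the scenario; returns true if neither side can make progress. */
static bool sketch_submit_deadlocks(bool nested_busy)
{
	struct sketch_rwsem busy_lock = { 0, false };

	sketch_try_read(&busy_lock);		/* submit: first busy() */
	if (sketch_try_write(&busy_lock))	/* do_idle: waits for submit */
		return false;
	if (nested_busy && !sketch_try_read(&busy_lock))
		return true;	/* submit waits on do_idle, do_idle on submit */
	sketch_release_read(&busy_lock);	/* submit completes */
	return !sketch_try_write(&busy_lock);	/* do_idle can now proceed */
}
```

In this model `sketch_submit_deadlocks(true)` reports a deadlock and `sketch_submit_deadlocks(false)` does not, which mirrors why dropping the nested busy/idle pair removes the problem.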
*	gpu: nvgpu: Add semaphore based gk20a_channel_sync	Lauri Peltonen	2015-03-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add semaphore implementation of the gk20a_channel_sync interface. Each channel has one semaphore pool, which is mapped as read-write to the channel vm. We allocate one or two semaphores from the pool for each submit. The first semaphore is only needed if we need to wait for an opaque sync fd. In that case, we allocate the semaphore, and ask the GPU to wait for its value to become 1 (semaphore acquire method). We also queue a kernel work that waits on the fence fd, and subsequently releases the semaphore (sets its value to 1) so that the command buffer can proceed. The second semaphore is used on every submit, and is used for work completion tracking. The GPU sets its value to 1 when the command buffer has been processed. The channel jobs need to hold references to both semaphores so that their backing semaphore pool slots are not reused while the job is in flight. Therefore gk20a_channel_fence will keep a reference to the semaphore that it represents (channel fences are stored in the job structure). This means that we must diligently close and dup the gk20a_channel_fence objects to avoid leaking semaphores. Bug 1450122 Bug 1445450 Change-Id: Ib61091a1b7632fa36efe0289011040ef7c4ae8f8 Signed-off-by: Lauri Peltonen <lpeltonen@nvidia.com> Reviewed-on: http://git-master/r/374844 GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
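The reference-holding rule in this message, where a pool slot must not be reused while a job still refers to it, can be sketched with a tiny refcounted pool. All names (`sketch_pool`, `sketch_sema_*`) and the pool size are hypothetical; the real pool lives in memory mapped into the channel vm and the GPU writes the value, neither of which is modeled here.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_POOL_SLOTS 4	/* arbitrary size for the sketch */

struct sketch_sema {
	uint32_t value;		/* 0 = pending, 1 = released */
	int refcount;		/* 0 = slot is free for reuse */
};

struct sketch_pool {
	struct sketch_sema slots[SKETCH_POOL_SLOTS];
};

/* Take the first free slot; returns NULL when every slot is still held. */
static struct sketch_sema *sketch_sema_alloc(struct sketch_pool *p)
{
	for (size_t i = 0; i < SKETCH_POOL_SLOTS; i++) {
		if (p->slots[i].refcount == 0) {
			p->slots[i].value = 0;
			p->slots[i].refcount = 1;
			return &p->slots[i];
		}
	}
	return NULL;
}

/* A job or fence that keeps the semaphore takes its own reference. */
static void sketch_sema_get(struct sketch_sema *s)
{
	s->refcount++;
}

/* Dropping the last reference makes the slot reusable. */
static void sketch_sema_put(struct sketch_sema *s)
{
	s->refcount--;
}

/* Stands in for the GPU (or the kernel work) setting the value to 1. */
static void sketch_sema_release(struct sketch_sema *s)
{
	s->value = 1;
}

static int sketch_sema_is_released(const struct sketch_sema *s)
{
	return s->value == 1;
}
```

In this sketch a forgotten `sketch_sema_put()` leaves the slot permanently occupied, which is the leak the message warns about when fences are not closed diligently.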
*	gpu: nvgpu: Allow suppressing WFI on submit	Terje Bergstrom	2015-03-18
\| \| \| \| \| \| \| \| \| \| \|	Allow suppressing WFI when submitting work and requesting a fence back. Bug 1491545 Change-Id: Ic3d061bb4f116cf7ea68dbd6a1b2ace9f11d0ab5 Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com> Reviewed-on: http://git-master/r/390457
*	gpu: nvgpu: gk20a: fix syncpt names for gk20a	Deepak Nibade	2015-03-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	nvhost_get_syncpt_host_managed() creates the syncpt name based on the platform_device pointer passed to it. Passing host1x's pointer to this API results in setting gk20a syncpt names as "host1x_0", which is conflicting. Hence, to restore the names, pass gk20a's device pointer, which gives syncpt names as "gk20a.0_0". Also, add a check for the syncpt received. Bug 1305024 Change-Id: I4ff96c7c9ebff2dca385c5787a85b4a9451b9514 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/410121 Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: Arto Merilainen <amerilainen@nvidia.com> Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: Register as subdomain of host1x	Terje Bergstrom	2015-03-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add gk20a as a sub power domain of host1x. This enforces keeping host1x on when using gk20a. Bug 200003112 Change-Id: I08db595bc7b819d86d33fb98af0d8fb4de369463 Signed-off-by: Arto Merilainen <amerilainen@nvidia.com> Reviewed-on: http://git-master/r/407006 (cherry picked from commit 009812b3e510518740e9c7e89b8b8b80439fe26a) Reviewed-on: http://git-master/r/408013 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	Revert "gpu: nvgpu: Keep host1x on when GPU on"	Matt Pedro	2015-03-18
\| \| \| \| \| \| \| \| \| \| \|	This reverts commit 20d48a759b032116e3092e1df76518065da59879. Change-Id: I93718a314b70ee9284a83ca69964883e670ad78d Signed-off-by: Matt Pedro <mapedro@nvidia.com> Reviewed-on: http://git-master/r/407969 Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: Get host1x device from DTS	Arto Merilainen	2015-03-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently the gpu driver assumes that the GPU is a child of host1x. This is an invalid assumption and therefore we need to get the host1x device from device tree based on nvidia,host1x property. Bug 1311528 Bug 1434573 Change-Id: I097e39369aaa15ab6652cd23f353f88f7c2b9c48 Signed-off-by: Arto Merilainen <amerilainen@nvidia.com> Reviewed-on: http://git-master/r/395664 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: gk20a: set syncpt_aggressive_destroy	Deepak Nibade	2015-03-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Set the flag syncpt_aggressive_destroy to enable destroying syncpts aggressively. Bug 1305024 Change-Id: Iedff9c6bdb6bbe02057972733126ce685daa8d7f Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/400234 Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: Shridhar Rasal <srasal@nvidia.com> Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>