nvgpu.git - Tegra GPU Driver. Originally from nv-tegra.nvidia.com/linux-nvgpu.git.

	Commit message (Collapse)	Author	Age
*	gpu: nvgpu: update trace for sched params	Thomas Fleury	2016-05-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use an inline function instead of a macro to "expand" all channel parameters. Jira EVLR-244 Jira EVLR-318 Change-Id: I4e8c5ee6bc9da36564af171be809f50dd2dfd439 Signed-off-by: Thomas Fleury <tfleury@nvidia.com> Reviewed-on: http://git-master/r/1150050 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: fix engine reset in FECS trace	Thomas Fleury	2016-05-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In virtualization case, VM server is the only one allowed to write to ctxsw ring buffer. It will also generate an event in case of engine reset. Only generate a tracepoint on Guest OS side. EVLR-314 Change-Id: I2cb09780a9b41237fe196ef1f3515923f36a24a4 Signed-off-by: Thomas Fleury <tfleury@nvidia.com> Reviewed-on: http://git-master/r/1130743 (cherry picked from commit 4bbf9538e2a3375eb86b2feea6c605c3eec2ca40) Reviewed-on: http://git-master/r/1133614 (cherry picked from commit 2076d944db41b37143c27795b3cffd88e99e0b00) Reviewed-on: http://git-master/r/1150046 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: add ctxsw channel reset event	Thomas Fleury	2016-05-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Generate a ctxsw channel reset when engine needs to be reset. This event is generated by the driver, while other events are generated by FECS. JIRA ELVR-314 Change-Id: I7791cf3e538782464c37c442c871acb177484566 Signed-off-by: Thomas Fleury <tfleury@nvidia.com> Reviewed-on: http://git-master/r/1129029 (cherry picked from commit 114038a1a5d9e8941bc53f3e95115b01dd1f8c6e) Reviewed-on: http://git-master/r/1134379 (cherry picked from commit 15fa2a7b48a0937dfd449ca0c4ed5ad3a863d6bf) Reviewed-on: http://git-master/r/1123916 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: use mem_desc in priv_cmd_entry	Konsta Holtta	2016-05-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Replace the plain cpu pointer accesses with gk20a_mem_wr32(), and use a reference to the underlying mem_desc (within priv_cmd_queue) paired with an offset, for buffer aperture flexibility. JIRA DNVGPU-21 JIRA DNVGPU-23 Change-Id: I317672c94bb682bb895f9ed3e8116729c8bb7f4b Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1145922 Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: refactor gk20a_mem_{wr,rd} for vidmem	Konsta Holtta	2016-05-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	To support vidmem, pass g and mem_desc to the buffer memory accessor functions. This allows the functions to select the memory access method based on the buffer aperture instead of using the cpu pointer directly (like until now). The selection and aperture support will be in another patch; this patch only refactors these accessors, but keeps the underlying functionality as-is. gk20a_mem_{rd,wr}32() work as previously; add also gk20a_mem_{rd,wr}() for byte-indexed accesses, gk20a_mem_{rd,wr}_n() for memcpy()-like functionality, and gk20a_memset() for filling buffers with a constant. The 8 and 16 bit accessor functions are removed. vmap()/vunmap() pairs are abstracted to gk20a_mem_{begin,end}() to support other types of mappings or conditions where mapping the buffer is unnecessary or different. Several function arguments that would access these buffers are also changed to take a mem_desc instead of a plain cpu pointer. Some relevant occasions are changed to use the accessor functions instead of cpu pointers without them (e.g., memcpying to and from), but the majority of direct accesses will be adjusted later, when the buffers are moved to support vidmem. JIRA DNVGPU-23 Change-Id: I3dd22e14290c4ab742d42e2dd327ebeb5cd3f25a Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1121143 Reviewed-by: Ken Adams <kadams@nvidia.com> Tested-by: Ken Adams <kadams@nvidia.com>
*	gpu: nvgpu: separate IOCTL to set preemption mode	Deepak Nibade	2016-05-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add separate IOCTL NVGPU_IOCTL_CHANNEL_SET_PREEMPTION_MODE to allow setting preemption modes from UMD Define preemption modes in nvgpu.h and use them everywhere Remove mode definitions from mm_gk20a.h Also, we support setting only one preemption mode in a channel But it is possible to have multiple preemption modes (one from graphics and one from compute) set simultaneously Hence, update struct gr_ctx_desc to include two separate preemption modes (graphics_preempt_mode and compute_preempt_mode) API NVGPU_IOCTL_CHANNEL_SET_PREEMPTION_MODE also supports setting two separate preemption modes i.e. one for graphics and one for compute Make necessary changes in code to support two preemption modes Bug 1646259 Change-Id: Ia1dea19e609ba8cc0de2f39ab6c0c4cd6b0a752c Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/1131805 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	nvgpu: vgpu: create fifo.force_reset_ch in gpu_ops	Haley Teng	2016-05-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	gk20a_fifo_force_reset_ch() does not support vgpu now, so we need to create a function pointer in gpu_ops and assign it differently for vgpu and non-vgpu. Bug 200184349 Change-Id: I5f8f4f731b4b970c4ff8de65531f25568e7691b6 Signed-off-by: Haley Teng <hteng@nvidia.com> Reviewed-on: http://git-master/r/1130420 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: update priv_cmdbuff computation	Alex Waterman	2016-05-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Update the priv_cmdbuff computation to take into account the amount of memory semaphores take. Since semaphores always require more memory than sync-pts the sync-pt computation has been dropped. Bug 1732449 JIRA DNVGPU-12 Change-Id: Ic05c26b4d1ed9cbd03d3239655c4607bb418396c Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1141420 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: Add trace and debugfs for sched params	Thomas Fleury	2016-05-05
\| \| \| \| \| \| \| \| \| \| \| \|	JIRA EVLR-244 JIRA EVLR-318 Change-Id: Ie95f42212dadcf2d0c1737eeb28812afb03b712f Signed-off-by: Thomas Fleury <tfleury@nvidia.com> Reviewed-on: http://git-master/r/1120603 GVS: Gerrit_Virtual_Submit Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: Ken Adams <kadams@nvidia.com>
*	gpu: nvgpu: cancel clean up before update_fn_work	Deepak Nibade	2016-05-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In gk20a_channel_suspend(), we first cancel worker thread update_fn_work and then cancel job clean up worker But it is possible that we re-schedule update_fn_work from job cleanup worker after it was cancelled And in that case worker update_fn_work might run after we poweroff GPU Fix this by first cancelling job clean up worker and then update_fn_work worker Bug 200187905 Change-Id: Ia192c515702f14becf60d92c6471d8c0e892551e Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/1120426 (cherry picked from commit b132b6c5aa9ab88c6733f36906f1b874114ec72d) Reviewed-on: http://git-master/r/1134888 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: return from worker if gpu is not up	Deepak Nibade	2016-05-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	During GPU shutdown path, it is possible that we shut down the GPU while worker thread is still running gk20a_channel_update() Hence before accessing gp_put/get, check if GPU is up or not Bug 200166139 Change-Id: Iba3ec173041a84527c4700a93f20564a842cfb01 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/935193 (cherry picked from commit c81ea5fe383c44e872754b363968af57d84225ac) Reviewed-on: http://git-master/r/1121917 Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com> Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: Flush FB before checking semaphores	Alex Waterman	2016-04-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Before checking semaphore values to determine if jobs have been completed flush the FB. If this is not done, despite the sempahore memory being mapped as volatile in the GMMU, outstanding writes can still be pending. Bug 1732449 JIRA DNVGPU-12 Change-Id: I67b596cd23a5465af05d6d173641a579cb7f168c Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1133787 Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: Schedule channel update when jobs actually finish	Alex Waterman	2016-04-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Do not schedule channel update call backs unless a job is actually finished. This saves a lot of call backs to the CDE code that don't do anything when semaphores are enabled. Bug 1732449 JIRA DNVGPU-12 Change-Id: I2f9a78498b08ebca44ee6a5171931a07721767f1 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1133786 GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: fix sparse warnings	Deepak Nibade	2016-04-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix below sparse warnings : drivers/gpu/nvgpu/gk20a/gk20a.c:764:5: warning: symbol 'gk20a_pm_finalize_poweron' was not declared. Should it be static? drivers/gpu/nvgpu/gk20a/channel_gk20a.c:2504:14: warning: symbol 'gk20a_event_id_poll' was not declared. Should it be static? drivers/gpu/nvgpu/gk20a/channel_gk20a.c:2538:5: warning: symbol 'gk20a_event_id_release' was not declared. Should it be static? Bug 200067946 Bug 200088648 Change-Id: I5c23e7ee09c1a18fe2eeff12f80a3c2bf73120ef Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/1128060 Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: Sri Krishna Chowdary <schowdary@nvidia.com>
*	gpu: nvgpu: make jobs_lock more fine grained	Deepak Nibade	2016-04-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	While processing all the jobs in gk20a_channel_clean_up_jobs(), We currently acquire jobs_lock, traverse the list, clean up the jobs, and then release the lock But in this case we might hold the lock for too long blocking the submit path Hence make jobs_lock more fine grained by restricting it for list accesses only Bug 200187553 Change-Id: If82af8ff386f7bc29061cfd57fdda7df62f11c17 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/1120412 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: remove submit lock	Deepak Nibade	2016-04-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove submit lock since we have moved to use more fine-grained locks Remove API check_gp_put() since we cannot call it in submit path due to latencies and we cannot call it in gk20a_channel_clean_up_jobs() anymore since it will fail there without the lock Bug 200187553 Change-Id: I05b9fa95c9009000e13232d8fa567336eeee11c6 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/1120411 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: implement sync refcounting	Deepak Nibade	2016-04-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We currently free sync when we find job list empty If aggressive_sync is set to true, we try to free sync during channel unbind() call But we rarely free sync from channel_unbind() call since freeing it when job list is empty is aggressive enough Hence remove sync free code from channel_unbind() Implement refcounting for sync: - get a refcount while submitting a job (and allocate sync if it is not allocated already) - put a refcount while freeing the job - if refcount==0 and if aggressive_sync_destroy is set, free the sync - if aggressive_sync_destroy is not set, we will free the sync during channel close time Bug 200187553 Change-Id: I74e24adb15dc26a375ebca1fdd017b3ad6d57b61 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/1120410 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: add lock for fences	Deepak Nibade	2016-04-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	All pre/post fence accesses in last_submit are currently protected by submit lock In order to remove the submit lock, move all fence accesses under own lock i.e. fence_lock Bug 200187553 Change-Id: I0132d1933dc92db8c5ed8c9311e49a030aa2d38c Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/1120409 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: support binding multiple channels to a debug session	Deepak Nibade	2016-04-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We currently bind only one channel to a debug session But some use cases might need multiple channels bound to same debug session Add this support by adding a list of channels to debug session. List structure is implemented as struct dbg_session_channel_data List node dbg_s_list_node is currently defined in struct dbg_session_gk20a. But this is inefficient when we need to add debug session to multiple channels Hence add new reference structure dbg_session_data to store dbg_session pointer and list entry For each NVGPU_DBG_GPU_IOCTL_BIND_CHANNEL call, create two reference structure dbg_session_channel_data for channel and dbg_session_data for debug session and bind them together Define API nvgpu_dbg_gpu_get_session_channel() which will get first channel in the list of debug session Use this API wherever we refer to channel bound to debug session Remove dbg_sessions define in struct gk20a since it is not being used anywhere Add new API NVGPU_DBG_GPU_IOCTL_UNBIND_CHANNEL to support unbinding of channel from debug sesssion Bug 200156699 Change-Id: I3bfa6f9cd5b90e7254a75c7e64ac893739776b7f Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/1120331 GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: Use sysmem aperture for SoC memory	Terje Bergstrom	2016-04-15
\| \| \| \| \| \| \| \| \|	In Tegra GPU, SoC memory has to be accessed as vidmem. In discrete GPU, it has to be accessed as sysmem. Change-Id: I4efe71b54a9a32f0bf1f02ec4016ed74405a14c5 Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com> Reviewed-on: http://git-master/r/1120468
*	gpu: nvgpu: Support GPUs with no physical mode	Terje Bergstrom	2016-04-13
\| \| \| \| \| \| \| \| \| \| \|	Support GPUs which cannot choose between SMMU and physical addressing. Change-Id: If3256fa1bc795a84d039ad3aa63ebdccf5cc0afb Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com> Reviewed-on: http://git-master/r/1120469 GVS: Gerrit_Virtual_Submit Reviewed-by: Alex Waterman <alexw@nvidia.com>
*	gpu: nvgpu: Use device instead of platform_device	Terje Bergstrom	2016-04-08
\| \| \| \| \| \| \| \| \|	Use struct device instead of struct platform_device wherever possible. This allows adding other bus types later. Change-Id: I1657287a68d85a542cdbdd8a00d1902c3d6e00ed Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com> Reviewed-on: http://git-master/r/1120466
*	gpu: nvgpu: add trace event for channel reset	Thomas Fleury	2016-04-07
\| \| \| \| \| \| \| \|	Change-Id: I319e877978b7f483108ef8f67c05702b71709f62 Signed-off-by: Thomas Fleury <tfleury@nvidia.com> Reviewed-on: http://git-master/r/1120501 GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: post BPT_INT/PAUSE and BLOCKING_SYNC events	Deepak Nibade	2016-04-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Post EVENT_ID_BPT_INT when bpt.int is pending Post EVENT_ID_BPT_PAUSE when bpt.pause is pending Post EVENT_ID_BLOCKING_SYNC whenever there is non-stalling semaphore interrupt indicating work completion from GR/CE2 engine Bug 200089620 Change-Id: I91b7bf48f8585f0d318298fc0c4a66d42055f0a7 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/1112274 (cherry picked from commit d2b744b1f9acac56435cd7e7ab9a7a845579ef24) Reviewed-on: http://git-master/r/1120321 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: APIs to post event id events	Deepak Nibade	2016-04-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add below channel and TSG APIs to post events on event_id interface gk20a_channel_event_id_post_event() gk20a_tsg_event_id_post_event() Bug 200089620 Change-Id: I0cfadc9ffdb880b2410f97758fad47905c620db1 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/1112267 (cherry picked from commit 9f50d7da4500af4dbf4dabe7916eda6fc220f4fb) Reviewed-on: http://git-master/r/1120320 Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit Reviewed-by: Bharat Nihalani <bnihalani@nvidia.com>
*	gpu: nvgpu: add TSG support to channel event id	Deepak Nibade	2016-04-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add NVGPU_IOCTL_TSG_EVENT_ID_CTRL API for channel event id support to TSGs This API will accept an event_id (like BPT.INT or BPT.PAUSE), a command to enable the event, and return a file descriptor on which we can raise the event (if cmd=enable) Events generated for TSGs will reuse file operations "gk20a_event_id_ops" Bug 200089620 Change-Id: I2f563c6d3a0988eb670caac2d3c7c6795724792c Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/1030776 (cherry picked from commit 72b61fa266279038f013e582be80c21808e1038d) Reviewed-on: http://git-master/r/1120319 Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit Reviewed-by: Bharat Nihalani <bnihalani@nvidia.com>
*	gpu: nvgpu: add channel event id support	Deepak Nibade	2016-04-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With NVGPU_IOCTL_CHANNEL_EVENTS_CTRL, nvgpu can raise events to User space. But user space cannot distinguish between various types of events. To overcome this, we need finer-grained API to deliver various events to user space. Remove old API NVGPU_IOCTL_CHANNEL_EVENTS_CTRL, and all the support for this API (we can remove this since User space has not started using this API at all) Add new API NVGPU_IOCTL_CHANNEL_EVENT_ID_CTRL which will accept an event_id (like BPT.INT or BPT.PAUSE), a command to enable the event, and return a file descriptor on which we can raise the event (if cmd=enable) Event is disabled when file descriptor is closed Add file operations "gk20a_event_id_ops" to support polling on event fd Also add API gk20a_channel_get_event_data_from_id() to get event_data of event from its id Bug 200089620 Change-Id: I5288f19f38ff49448c46338c33b2a927c9e02254 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/1030775 (cherry picked from commit 5721ce2735950440bedc2b86f851db08ed593275) Reviewed-on: http://git-master/r/1120318 Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: Bharat Nihalani <bnihalani@nvidia.com>
*	gpu: nvgpu: dump gpu status on channel wdt	Seshendra Gadagottu	2016-03-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	To get more useful debug info, dump gpu status on channel watchdog timeout. Bug 200163782 Change-Id: Ie160e3a65650b87f812e0c1d1d9b54814b45a1d7 Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com> Reviewed-on: http://git-master/r/1115439 Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: Add support for FECS ctxsw tracing	Anton Vorontsov	2016-03-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	bug 1648908 This commit adds support for FECS ctxsw tracing. Code is compiled conditionnaly under CONFIG_GK20_CTXSW_TRACE. This feature requires an updated FECS ucode that writes one record to a ring buffer on each context switch. On RM/Kernel side, the GPU driver reads records from the master ring buffer and generates trace entries into a user-facing VM ring buffer. For each record in the master ring buffer, RM/Kernel has to retrieve the vmid+pid of the user process that submitted related work. Features currently implemented: - master ring buffer allocation - debugfs to dump master ring buffer - FECS record per context switch (with both current and new contexts) - dedicated device for ctxsw tracing (access to VM ring buffer) - SOF generation (and access to PTIMER) - VM ring buffer allocation, and reconfiguration - enable/disable tracing at user level - event-based trace filtering - context_ptr to vmid+pid mapping - read system call for ctxsw dev - mmap system call for ctxsw dev (direct access to VM ring buffer) - poll system call for ctxsw dev - save/restore register on ELPG/CG6 - separate user ring from FECS ring handling Features requiring ucode changes: - enable/disable tracing at FECS level - actual busy time on engine (bug 1642354) - master ring buffer threshold interrupt (P1) - API for GPU to CPU timestamp conversion (P1) - vmid/pid/uid based filtering (P1) Change-Id: I8e39c648221ee0fa09d5df8524b03dca83fe24f3 Signed-off-by: Thomas Fleury <tfleury@nvidia.com> Reviewed-on: http://git-master/r/1022737 GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: add support to set channel timeslice	Aingara Paramakuru	2016-03-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As part of improving GPU scheduling, userspace can now set a channel's timeslice, within reasonable limits imposed by the kernel driver. JIRA VFND-1312 Bug 1729664 Change-Id: I4c3430c43437889b8685f12988d4b967bb7877bb Signed-off-by: Aingara Paramakuru <aparamakuru@nvidia.com> Reviewed-on: http://git-master/r/1020917 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: enable semaphore acquire timeout only when timeouts_enabled is set	Richard Zhao	2016-03-22
\| \| \| \| \| \| \| \| \| \|	Bug 1727687 Change-Id: I7a7a4a2011b029474122fdbfbeb02b6302a5902b Signed-off-by: Richard Zhao <rizhao@nvidia.com> Reviewed-on: http://git-master/r/1011486 GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: improve channel interleave support	Aingara Paramakuru	2016-03-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, only "high" priority bare channels were interleaved between all other bare channels and TSGs. This patch decouples priority from interleaving and introduces 3 levels for interleaving a bare channel or TSG: high, medium, and low. The levels define the number of times a channel or TSG will appear on a runlist (see nvgpu.h for details). By default, all bare channels and TSGs are set to interleave level low. Userspace can then request the interleave level to be increased via the CHANNEL_SET_RUNLIST_INTERLEAVE ioctl (TSG-specific ioctl will be added later). As timeslice settings will soon be coming from userspace, the default timeslice for "high" priority channels has been restored. JIRA VFND-1302 Bug 1729664 Change-Id: I178bc1cecda23f5002fec6d791e6dcaedfa05c0c Signed-off-by: Aingara Paramakuru <aparamakuru@nvidia.com> Reviewed-on: http://git-master/r/1014962 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: validate wait notification offset	Konsta Holtta	2016-03-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make sure that the notification object fits within the supplied buffer. Bug 1739182 Change-Id: Ifb66f848e3758438f37645be6f534f5b60260214 Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1026431 (cherry picked from commit 2484c47f123c717030aa00253446e8756e1a0807) Reviewed-on: http://git-master/r/1030875 Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: validate error notifier offset	Konsta Holtta	2016-03-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make sure that the notifier object fits within the supplied buffer. Bug 1739183 Bug 1739932 Change-Id: I713574ce797ffc23cec10b5114f469dbadc68f1e Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1026410 (cherry picked from commit f476b93eb19b962b8760457102448bd533efc54d) Reviewed-on: http://git-master/r/1028737 Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: move clean up of jobs to separate worker	Deepak Nibade	2016-02-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We currently clean up the jobs in gk20a_channel_update() which is called from nvhost worker thread Instead of doing this, schedule another delayed worker thread clean_up_work to clean up the jobs (with delay of 1 jiffies) Keep update_gp_get() in channel_update() and not in delayed worker since this will help in better book keeping of gp_get Also, this scheduling will help delay job clean-up so that more number of jobs are batched for clean up and hence less time is consumed by worker Bug 1718092 Change-Id: If3b94b6aab93c92da4cf0d1c74aaba756f4cd838 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/931701 GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: vgpu: add channel_set_priority support	Richard Zhao	2016-01-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- add gops.fifo.channel_set_priority and move current code as native callback. - implement the callback for vgpu Bug 1701079 Change-Id: If1cd13ea4478d11d578da2f682598e0c4522bcaf Signed-off-by: Richard Zhao <rizhao@nvidia.com> Reviewed-on: http://git-master/r/932829 Reviewed-by: Aingara Paramakuru <aparamakuru@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: move resetup_ramfc() out of sync_lock	Deepak Nibade	2016-01-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We currently have this sequence : - acquire sync_lock - sync_create - resetup_ramfc() - release sync_lock but this can lead to deadlock in case resetup_ramfc() triggers below stack : - resetup_ramfc() - channel_preempt() - preemption fails - trigger recovery - channel_abort() - acquire sync_lock Fix this by moving resetup_ramfc() out of sync_lock. resetup_ramfc() is still protected by submit_lock and hence we cannot free sync after allocation and before resetup Bug 200165811 Change-Id: Iebf74d950d6f6902b6d180c2cd8cd2d50493062c Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/931726 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: fix tsg bugs	Richard Zhao	2016-01-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- correct runlist entry type for tsg - consider tsg when preempt channel Bug 1617046 Change-Id: Ie067df17fb53ae91c49403637a5f35fc3710e0b3 Signed-off-by: Richard Zhao <rizhao@nvidia.com> Reviewed-on: http://git-master/r/926571 GVS: Gerrit_Virtual_Submit Reviewed-by: Aingara Paramakuru <aparamakuru@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: return ENOSPC if no private command buffer space	Deepak Nibade	2016-01-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we run out of gpfifo space or private command buffer space, we currently return EAGAIN as error code Instead of EAGAIN, return ENOSPC as error code so that caller (user space) can read the error code and do some re-trials As the jobs are processed, it is possible to free up some space. And hence such re-trials could succeed Bug 1715291 Change-Id: I9a2ed7134d2496b383899b3c02c0e70452b26115 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/929402 Reviewed-by: Sachin Nikam <snikam@nvidia.com> Tested-by: Sachin Nikam <snikam@nvidia.com>
*	gpu: nvgpu: enable wdt for each channel open	Deepak Nibade	2016-01-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We currently enable per-channel wdt flag at channel initialization time only But if process disables channel's wdt via per-channel IOCTL, and closes the channel without re-enabling it, we leave the wdt disabled on that channel And if same channel is assigned to some other process, then that process might have wdt disabled already Fix this by setting ch->wdt_enabled = true during gk20a_open_new_channel() Bug 200165797 Change-Id: I3ab482ce7cfbcbbd2178041f01f97457ff24f7bb Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/931128 Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: Sachin Nikam <snikam@nvidia.com> Tested-by: Sachin Nikam <snikam@nvidia.com>
*	gpu: nvgpu: API to post channel events	Deepak Nibade	2016-01-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add new API gk20a_channel_post_event() which adds channel event and also calls wake_up() for channel's semaphore wq Bug 200156699 Change-Id: If56f1bf8edcce79c9248809f8476ed853b7d2d9d Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/927132 Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: APIs to enable/disable TSG	Deepak Nibade	2016-01-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	export below APIs for TSGs : gk20a_enable_tsg() - enable only TSG gk20a_disable_tsg() - disable only TSG gk20a_enable_channel_tsg() - if channel is part of TSG, enable TSG otherwise enable channel gk20a_disable_channel_tsg() - if channel is part of TSG, disable TSG otherwise disable channel Bug 200156699 Change-Id: Icdaca35235c3f323687f839fe32c6c5fe964b230 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/927131 Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: stop timer on failing channel	Deepak Nibade	2016-01-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In gk20a_channel_timeout_handler(), below deadlock scenario is possible : thread 1: - take global lock g->ch_wdt_lock - identify timed out channel (as ch1) - check engine status which is stuck - identify failing channel on engine as ch2 - we need to trigger recovery with ch2 - as part of recovery, call channel_abort() for ch2 - in channel_abort(), we wait to cancel the timer wq - but timer wq for ch2 never completes due to thread 2 thread 2: - ch2 has already timed out - to process, we wait for global lock g->ch_wdt_lock - this lock needs to be released by thread 1 To fix this, cancel the timer (through flag) of ch2 (failing channel on engine) before triggering recovery on that channel Bug 200164753 Change-Id: Idb42d01c8440a53f43cb5e87e41f1c283f7e8fcf Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/929924 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: disable ctxsw instead of all engines activity	Deepak Nibade	2016-01-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In gk20a_channel_timeout_handler(), we currently disable all engine activity before checking for fence completion and before we identify timed out channel But disabling all engine activity could be overkill for this process. Also, as part of disabling engine activity we preempt the channel on engine. But it is possible that channel preemption times out since channel has already timed out And this can lead to races and deadlock Hence, instead of disabling all engine activity, just disable the context switch which should also do the same trick Bug 1716062 Change-Id: I596515ed670a2e134f7bcd9758488a4aa0bf16f7 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/929421 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: add high priority channel interleave	Peter Pipkorn	2016-01-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Interleave all high priority channels between all other channels. This reduces the latency for high priority work when there are a lot of lower priority work present, imposing an upper bound on the latency. Change the default high priority timeslice from 5.2ms to 3.0 in the process, to prevent long running high priority apps from hogging the GPU too much. Introduce a new debugfs node to enable/disable high priority channel interleaving. It is currently enabled by default. Adds new runlist length max register, used for allocating suitable sized runlist. Limit the number of interleaved channels to 32. This change reduces the maximum time a lower priority job is running (one timeslice) before we check that high priority jobs are running. Tested with gles2_context_priority (still passes) Basic sanity testing is done with graphics_submit (one app is high priority) Also more functional testing using lots of parallel runs with: NVRM_GPU_CHANNEL_PRIORITY=3 ./gles2_expensive_draw –drawsperframe 20000 –triangles 50 –runtime 30 –finish plus multiple: NVRM_GPU_CHANNEL_PRIORITY=2 ./gles2_expensive_draw –drawsperframe 20000 –triangles 50 –runtime 30 -finish Previous to this change, the relative performance between high priority work and normal priority work comes down to timeslice value. This means that when there are many low priority channels, the high priority work will still drop quite a lot. But with this change, the high priority work will roughly get about half the entire GPU time, meaning that after the initial lower performance, it is less likely to get lower in performance due to more apps running on the system. This change makes a large step towards real priority levels. It is not perfect and there are no guarantees on anything, but it is a step forwards without any additional CPU overhead or other complications. It will also serve as a baseline to judge other algorithms against. Support for priorities with TSG is future work. Support for interleave mid + high priority channels, instead of just high, is also future work. Bug 1419900 Change-Id: I0f7d0ce83b6598fe86000577d72e14d312fdad98 Signed-off-by: Peter Pipkorn <ppipkorn@nvidia.com> Reviewed-on: http://git-master/r/805961 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: enable semaphore acquire timeout	Richard Zhao	2016-01-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It'll detect dead semaphore acquire. The worst case is when ACQUIRE_SWITCH is disabled, semaphore acquire will poll and consume full gpu timeslicees. The timeout value is set to half of channel WDT. Bug 1636800 Change-Id: Ida6ccc534006a191513edf47e7b82d4b5b758684 Signed-off-by: Richard Zhao <rizhao@nvidia.com> Reviewed-on: http://git-master/r/928827 GVS: Gerrit_Virtual_Submit Reviewed-by: Vladislav Buzov <vbuzov@nvidia.com>
*	gpu: nvgpu: preempt before adjusting fences	Deepak Nibade	2015-12-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Current sequence in gk20a_disable_channel() is - disable channel in gk20a_channel_abort() - adjust pending fence in gk20a_channel_abort() - preempt channel But this leads to scenarios where syncpoint has min > max value Hence to fix this, make sequence in gk20a_disable_channel() - disable channel in gk20a_channel_abort() - preempt channel in gk20a_channel_abort() - adjust pending fence in gk20a_channel_abort() If gk20a_channel_abort() is called from other API where preemption is not needed, then use channel_preempt flag and do not preempt channel in those cases Bug 1683059 Change-Id: I4d46d4294cf8597ae5f05f79dfe1b95c4187f2e3 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/921290 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: call channel_update() always in channel_abort()	Deepak Nibade	2015-12-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In gk20a_channel_abort(), we disable the channel and adjust the fences. And then we call gk20a_channel_update() only if we released the post-fence semaphore Instead of this, call gk20a_channel_update() always to ensure that all the clean up after fence completion is done always Bug 1683059 Change-Id: Ife00ba43e808c0833264d79c98c74417ffadf802 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/842999 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: Immediate channel release	Terje Bergstrom	2015-12-10
\| \| \| \| \| \| \| \| \| \| \| \|	When closing channel, disable and preempt it immediately instead of waiting for it to finish all work. Bug 1683059 Change-Id: Ia5f5fc6a072dc3ddb1e9bf63534814ff0a60b5b4 Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com> Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/836746
*	gpu: nvgpu: create sync_fence only if needed	Deepak Nibade	2015-12-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, we create sync_fence (from nvhost_sync_create_fence()) for every submit But not all submits request for a sync_fence. Also, nvhost_sync_create_fence() API takes about 1/3rd of the total submit path. Hence to optimize, we can allocate sync_fence only when user explicitly asks for it using (NVGPU_SUBMIT_GPFIFO_FLAGS_FENCE_GET && NVGPU_SUBMIT_GPFIFO_FLAGS_SYNC_FENCE) Also, in CDE path from gk20a_prepare_compressible_read(), we reuse existing fence stored in "state" and that can result into not returning sync_fence_fd when user asked for it Hence, force allocation of sync_fence when job submission comes from CDE path Bug 200141116 Change-Id: Ia921701bf0e2432d6b8a5e8b7d91160e7f52db1e Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/812845 (cherry picked from commit 5fd47015eeed00352cc8473eff969a66c94fee98) Reviewed-on: http://git-master/r/837662 Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit Reviewed-by: Sachin Nikam <snikam@nvidia.com>