| Commit message | Author | Age |
|
Fix small problems related to signed versus unsigned comparisons
throughout the driver. Bump up the warning level to prevent such
problems from occurring in the future.
Change-Id: I8ff5efb419f664e8a2aedadd6515ae4d18502ae0
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/1252068
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
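The pitfall behind this change can be sketched in a few lines of C. The helper names below are hypothetical and do not come from the driver. The first function writes out the conversion that C applies implicitly when an int is compared with an unsigned int, which is the kind of comparison that a warning such as GCC's -Wsign-compare reports.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * What "offset < limit" means when offset is int and limit is unsigned int:
 * offset is converted to unsigned int first, so -1 becomes UINT_MAX.
 * The cast is written out here only to make that conversion visible.
 */
bool below_limit_mixed(int offset, unsigned int limit)
{
    return (unsigned int)offset < limit;
}

/* Sign-safe version: handle negative values before converting. */
bool below_limit_fixed(int offset, unsigned int limit)
{
    if (offset < 0)
        return true;
    return (unsigned int)offset < limit;
}

/* Usage: the two versions disagree only for negative offsets. */
void below_limit_demo(void)
{
    assert(!below_limit_mixed(-1, 16u));
    assert(below_limit_fixed(-1, 16u));
}
```

Raising the warning level turns the implicit form of the first function into a build-time diagnostic, so the sign handling has to be made explicit as in the second.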
|
When a buffer's IOVA is zero, treat that as an error condition instead of
ignoring it and continuing.
Change-Id: I2ede9921945645f526b0600f61f7e5ed19af6d73
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/1249963
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Seema Khowala <seemaj@nvidia.com>
|
In CDE GPU CONFIGURATION the result is computed using 32-bit
arithmetic and returned as a 64-bit unsigned integer. Cast the intermediate
result to u64 to prevent unintentional overflow.
Change-Id: Iebe53e2b17c1aaa498245a52962c3dbad7ce893e
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/1249962
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Seema Khowala <seemaj@nvidia.com>
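A minimal sketch of the overflow described above, assuming a platform where int is no wider than 32 bits. The function names are illustrative only and are not taken from the CDE code.

```c
#include <assert.h>
#include <stdint.h>

/* Product is computed in 32 bits and wraps before it is widened on return. */
uint64_t size_wrapping(uint32_t width, uint32_t height)
{
    return width * height;
}

/* Casting one operand first makes the multiplication itself 64-bit. */
uint64_t size_widened(uint32_t width, uint32_t height)
{
    return (uint64_t)width * height;
}

/* Usage: 65536 * 65536 = 2^32, which does not fit in 32 bits. */
void size_demo(void)
{
    assert(size_wrapping(65536u, 65536u) == 0u);
    assert(size_widened(65536u, 65536u) == 4294967296ULL);
}
```

The 64-bit return type alone does not help, because the widening happens only after the 32-bit multiplication has already wrapped.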
|
Add support for pre-allocation of job tracking resources
with a new (extended) ioctl. The goal is to avoid dynamic memory
allocation in the submit path. This patch does the following:
1) Introduces a new ioctl, NVGPU_IOCTL_CHANNEL_ALLOC_GPFIFO_EX,
which enables pre-allocation of tracking resources per job:
a) 2x priv_cmd_entry
b) 2x gk20a_fence
2) Implements a circular ring buffer for job
tracking to avoid lock contention between producer
(submitter) and consumer (clean-up)
Bug 1795076
Change-Id: I6b52e5c575871107ff380f9a5790f440a6969347
Signed-off-by: Sachit Kadle <skadle@nvidia.com>
Reviewed-on: http://git-master/r/1203300
(cherry picked from commit 9fd270c22b860935dffe244753dabd87454bef39)
Reviewed-on: http://git-master/r/1223934
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
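The ring buffer idea from point 2 can be sketched as follows. This is not the driver's implementation: the struct layout, the capacity and all function names are assumptions made for illustration, and the sketch is single-threaded. A real producer/consumer pair running concurrently would also need memory barriers or atomics around the put and get indices.

```c
#include <assert.h>
#include <stddef.h>

#define JOB_RING_SIZE 8u    /* hypothetical capacity; must be a power of two */

struct job {
    int id;                 /* stand-in for the per-job tracking data */
};

struct job_ring {
    struct job slots[JOB_RING_SIZE];  /* pre-allocated, no malloc on submit */
    unsigned int put;       /* advanced only by the producer (submitter) */
    unsigned int get;       /* advanced only by the consumer (clean-up) */
};

void job_ring_init(struct job_ring *ring)
{
    ring->put = 0;
    ring->get = 0;
}

/* Producer: return the next free slot, or NULL when the ring is full. */
struct job *job_ring_reserve(struct job_ring *ring)
{
    if (ring->put - ring->get == JOB_RING_SIZE)
        return NULL;
    return &ring->slots[ring->put % JOB_RING_SIZE];
}

/* Producer: publish the slot returned by job_ring_reserve(). */
void job_ring_commit(struct job_ring *ring)
{
    assert(ring->put - ring->get < JOB_RING_SIZE);
    ring->put++;
}

/* Consumer: return the oldest published job, or NULL when empty. */
struct job *job_ring_peek(struct job_ring *ring)
{
    if (ring->put == ring->get)
        return NULL;
    return &ring->slots[ring->get % JOB_RING_SIZE];
}

/* Consumer: hand the oldest slot back to the producer. */
void job_ring_release(struct job_ring *ring)
{
    assert(ring->put != ring->get);
    ring->get++;
}

/* Convenience for examples: fill and publish one job; -1 when full. */
int job_ring_submit(struct job_ring *ring, int id)
{
    struct job *job = job_ring_reserve(ring);

    if (!job)
        return -1;
    job->id = id;
    job_ring_commit(ring);
    return 0;
}
```

Because each index is written by only one side, the submitter and the clean-up path never modify the same variable, which is what makes it possible to avoid a shared lock.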
|
Suppress the error message when nvgpu tries to load a VBIOS overlay but
one is not found. This situation is normal. This is done by turning
gk20a_request_firmware() into the nvgpu generic function
nvgpu_request_firmware(), and adding a NO_WARN flag to it.
Also introduce a NO_SOC flag to suppress the attempt to load firmware
from the SoC specific directory in addition to the chip specific
directory. Use it for dGPU firmware files.
Bug 200236777
Change-Id: I0294d3308f029a6a6d3c2effa579d5f69a91e418
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/1223840
(cherry picked from commit cca44c3f010f15918cdd2259c15170ba1917828a)
Reviewed-on: http://git-master/r/1233353
GVS: Gerrit_Virtual_Submit
|
We have completely different versions of probe for
the nvgpu and pci devices.
Extract the common steps into an nvgpu_probe() function
and place it in a new file, nvgpu_common.c.
Divide the work of nvgpu_probe() into further smaller
functions.
Do platform specific things (like irq handling,
memresource management, power management) only in the
individual probes and then call nvgpu_probe() to
complete the common initialization.
Move all debugfs initialization to the common gk20a_debug_init().
This also helps to bring up all debug nodes on the pci device.
Pass the debugfs_symlink name as a parameter to gk20a_debug_init().
This allows us to set separate debugfs symlinks for the nvgpu
and pci devices.
For railgating, cde and ce debugfs, check whether the
platform supports them.
Copy vidmem_is_vidmem from the platform to the mm structure
and set it to true for the pci device.
Return from gk20a_scale_init() if we don't have either a
governor or a qos_notifier.
Fix gk20a_alloc_debugfs_init() and gk20a_secure_page_alloc()
to receive a device pointer instead of a platform_device.
Export gk20a_railgating_debugfs_init() so that we can call
it from gk20a_debug_init().
Jira DNVGPU-56
Jira DNVGPU-58
Change-Id: I3cc048082b0a1e57415a9fb8bfb9eec0f0a280cd
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1204207
(cherry picked from commit add6bb0a3d5bd98131bbe6f62d4358d4d722b0fe)
Reviewed-on: http://git-master/r/1204462
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
This is done to improve the GPU submit time, which
is critical for compute use-cases.
Bug 200215465
Bug 1804898
Conflicts:
drivers/gpu/nvgpu/gk20a/channel_gk20a.c
Change-Id: Ic4884ee4eac910b92b84a47fdc1b2e9f26b2f1f0
Signed-off-by: Bharat Nihalani <bnihalani@nvidia.com>
Reviewed-on: http://git-master/r/1199860
Reviewed-on: http://git-master/r/1209834
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
Added an interface to allow the kernel to create privileged CE channels for
page migration and clearing support between sysmem and vidmem.
JIRA DNVGPU-53
Change-Id: I3e18d18403809c9e64fa45d40b6c4e3844992506
Signed-off-by: Lakshmanan M <lm@nvidia.com>
Reviewed-on: http://git-master/r/1173085
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
|
For devices that have vidmem available, use the vidmem allocator in
gk20a_gmmu_alloc{,attr,_map,_map_attr}. For others, use sysmem.
Because not all of the buffers have been tested to work in vidmem yet,
rename calls to gk20a_gmmu_alloc{,attr,_map,_map_attr} to have _sys at
the end to declare explicitly that sysmem is used. Enabling vidmem for
each now is a matter of removing "_sys" from the function call.
Jira DNVGPU-18
Change-Id: Ibe42f67eff2c2b68c36582e978ace419dc815dc5
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1176805
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
Propagate the buffer aperture flag in gk20a_locked_gmmu_map up so that
buffers represented as a mem_desc and present in vidmem can be mapped to
gpu.
JIRA DNVGPU-18
JIRA DNVGPU-76
Change-Id: I46cf87e27229123016727339b9349d5e2c835b3e
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1169308
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
Use struct device instead of struct platform_device wherever
possible. This allows adding other bus types later.
Change-Id: I1657287a68d85a542cdbdd8a00d1902c3d6e00ed
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/1120466
|
Currently, we create a sync_fence (from nvhost_sync_create_fence())
for every submit.
But not all submits request a sync_fence.
Also, the nvhost_sync_create_fence() API takes about 1/3rd of the total
submit path time.
Hence, to optimize, allocate a sync_fence
only when the user explicitly asks for it using
(NVGPU_SUBMIT_GPFIFO_FLAGS_FENCE_GET &&
NVGPU_SUBMIT_GPFIFO_FLAGS_SYNC_FENCE).
Also, in the CDE path from gk20a_prepare_compressible_read(),
we reuse the existing fence stored in "state", and that can
result in not returning a sync_fence_fd when the user asked
for it.
Hence, force allocation of a sync_fence when the job submission
comes from the CDE path.
Bug 200141116
Change-Id: Ia921701bf0e2432d6b8a5e8b7d91160e7f52db1e
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/812845
(cherry picked from commit 5fd47015eeed00352cc8473eff969a66c94fee98)
Reviewed-on: http://git-master/r/837662
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
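The decision described above can be sketched as a small predicate. The EX_ macros below are placeholders standing in for NVGPU_SUBMIT_GPFIFO_FLAGS_FENCE_GET and NVGPU_SUBMIT_GPFIFO_FLAGS_SYNC_FENCE. Their bit values are assumptions, and need_sync_fence() is a hypothetical helper, not a function from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values; the real flag values come from the nvgpu headers. */
#define EX_FLAGS_FENCE_GET   (1u << 0)
#define EX_FLAGS_SYNC_FENCE  (1u << 1)

/*
 * Allocate a sync_fence only when the caller set both flags, or when the
 * submit comes from the CDE path, where allocation is forced.
 */
bool need_sync_fence(uint32_t flags, bool from_cde_path)
{
    const uint32_t wanted = EX_FLAGS_FENCE_GET | EX_FLAGS_SYNC_FENCE;

    if (from_cde_path)
        return true;
    return (flags & wanted) == wanted;
}
```

Testing against the combined mask, and not against each bit separately, keeps the "both flags" requirement in one place.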
|
In the GPU job submit path gk20a_ioctl_channel_submit_gpfifo(),
we currently allocate a temporary gpfifo, copy the user space
gpfifo content into this temporary buffer, and then copy the
temp buffer content into the channel's gpfifo.
Allocation/copy/free of the temporary buffer adds additional
overhead.
Rewrite this sequence such that gk20a_submit_channel_gpfifo()
can receive either a pre-filled gpfifo or a pointer to
user provided args.
Then we can directly copy the user provided gpfifo
into the channel's gpfifo.
Also, if command buffer tracing is enabled, we still need
to copy the user provided gpfifo into a temporary buffer for reading.
But that should not cause overhead in real world use cases.
Bug 200141116
Change-Id: I7166c9271da2694059da9853ab8839e98457b941
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/823386
(cherry picked from commit 3e0702db006c262dd8737a567b8e06f7ff005e2c)
Reviewed-on: http://git-master/r/835799
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
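A sketch of the direct copy, under two assumptions that the message does not state: that the channel's gpfifo is a circular array, and that an entry is two 32-bit words. In kernel code the source would be user memory and the copy would go through copy_from_user(); plain memcpy() stands in for it here so the example runs in user space. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct gpfifo_entry {       /* hypothetical two-word entry layout */
    uint32_t entry0;
    uint32_t entry1;
};

/*
 * Copy 'count' entries from 'src' straight into the ring at index 'put',
 * wrapping to the start of the ring when the end is reached. No temporary
 * buffer is allocated.
 */
void gpfifo_copy_in(struct gpfifo_entry *ring, uint32_t ring_entries,
                    uint32_t put, const struct gpfifo_entry *src,
                    uint32_t count)
{
    uint32_t first;

    assert(put < ring_entries);
    assert(count <= ring_entries);

    first = ring_entries - put;     /* entries that fit before the wrap */
    if (first > count)
        first = count;

    memcpy(&ring[put], src, first * sizeof(*src));
    memcpy(&ring[0], src + first, (count - first) * sizeof(*src));
}
```

At most two copies are needed per submit, one up to the end of the ring and one from its start, compared with the allocate, copy, copy, free sequence of the old path.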
|
Allocate a separate VM for CDE channels instead of using the system
(PMU) vm, and make it much bigger than the PMU's to fit the maximum
number of CDE channels there.
Bug 1566740
Change-Id: I4f487c40c9ec79cc9ffb880b0ecd3f47eb450336
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/815149
Reviewed-by: Automatic_Commit_Validation_User
|
Add CPU dcache flush after populating scatterBuffer so that the GPU
will see the buffer contents.
Bug 1679453
Change-Id: I564394ed1fcff4d08d753e753bd3243b460d76df
Signed-off-by: Jussi Rasanen <jrasanen@nvidia.com>
Reviewed-on: http://git-master/r/805197
(cherry picked from commit d6a5513745aa77c84ac5408a62f72f24839ef439)
Reviewed-on: http://git-master/r/808246
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
For a CDE channel, accesses from the T1 (Tex) unit need to be promoted to
128B aligned accesses; otherwise a HW deadlock occurs. The GPU driver makes
changes in the FECS header, which FECS uses to configure the T1 promotions
to aligned 128B accesses.
Bug 200096226
Change-Id: I8a8deaf6fb91f4bbceacd491db7eb6f7bca5001b
Signed-off-by: sujeet baranwal <sbaranwal@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Signed-off-by: sujeet baranwal <sbaranwal@nvidia.com>
Reviewed-on: http://git-master/r/804625
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
Add support for CDE scatter buffers. When the bus addresses for
surfaces are not contiguous as seen by the GPU (e.g., when SMMU is
bypassed), CDE swizzling needs additional per-page information. This
information is populated in a scatter buffer when required.
Bug 1604102
Change-Id: I3384e2cfb5d5f628ed0f21375bdac8e36b77ae4f
Signed-off-by: Jussi Rasanen <jrasanen@nvidia.com>
Reviewed-on: http://git-master/r/789436
Reviewed-on: http://git-master/r/791243
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
This reverts commit 882975f7f1b4e050be79b0a047a2daa8b53a9187.
Change-Id: I4940fc9f7a837840be1ea8e42d58d603235d88d5
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/804616
|
For a CDE channel, accesses from the T1 (Tex) unit need to be promoted to
128B aligned accesses; otherwise a HW deadlock occurs. The GPU driver makes
changes in the FECS header, which FECS uses to configure the T1 promotions
to aligned 128B accesses.
Bug 200096226
Change-Id: Ic006b2c7035bbeabe1081aeed968a6c6d11f9995
Signed-off-by: sujeet baranwal <sbaranwal@nvidia.com>
Reviewed-on: http://git-master/r/802327
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
Add gpu_ops for CDE, and add get_program_numbers function pointer for
determining horizontal and vertical CDE swizzler programs. This allows
different GPUs to have their own specific requirements for choosing
the CDE firmware programs.
Bug 1604102
Change-Id: Ib37c13abb017c8eb1c32adc8cbc6b5984488222e
Signed-off-by: Sami Kiminki <skiminki@nvidia.com>
Reviewed-on: http://git-master/r/784899
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
In gk20a_cde_remove_ctx(), current sequence is as below
- gk20a_channel_close()
- gk20a_deinit_cde_img()
- gk20a_free_obj_ctx()
But gk20a_free_obj_ctx() needs a reference to the channel, and hence
the crash below is seen:
[ 3901.466223] Unable to handle kernel paging request at virtual address
00001624
...
[ 3901.535218] PC is at gk20a_free_obj_ctx+0x14/0xb0
[ 3901.539910] LR is at gk20a_deinit_cde_img+0xd8/0x12c
Fix this by closing the channel after gk20a_deinit_cde_img()
Bug 1625901
Change-Id: Ic2dc5af933b6d6ef8982c2b9f0caa28df204051f
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/770322
GVS: Gerrit_Virtual_Submit
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
|
Implement support for privileged pages. Use them for kernel allocated buffers.
Change-Id: I720fc441008077b8e2ed218a7a685b8aab2258f0
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/761919
|
Add batch support for mapping and unmapping. Batching essentially
helps transform some per-map/unmap overhead to per-batch overhead,
namely gk20a_busy()/gk20a_idle() calls, GPU L2 flushes, and GPU TLB
invalidates. Batching with size 64 has been measured to yield >20x
speed-up in low-level fixed-address mapping microbenchmarks.
Bug 1614735
Bug 1623949
Change-Id: Ie22b9caea5a7c3fc68a968d1b7f8488dfce72085
Signed-off-by: Sami Kiminki <skiminki@nvidia.com>
Reviewed-on: http://git-master/r/733231
(cherry picked from commit de4a7cfb93e8228a4a0c6a2815755a8df4531c91)
Reviewed-on: http://git-master/r/763812
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
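The way batching moves a per-map cost to a per-batch cost can be sketched with a counter model. Only the TLB invalidate is modelled, and the struct and function names are hypothetical, not the driver's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct vm_stats {
    unsigned int maps;
    unsigned int tlb_invalidates;
};

struct map_batch {
    bool need_tlb_invalidate;   /* set when a deferred invalidate is owed */
};

void map_batch_start(struct map_batch *batch)
{
    batch->need_tlb_invalidate = false;
}

/* Map one buffer; pass batch == NULL for the unbatched behaviour. */
void map_one(struct vm_stats *vm, struct map_batch *batch)
{
    assert(vm != NULL);
    vm->maps++;
    if (batch)
        batch->need_tlb_invalidate = true;  /* defer to batch finish */
    else
        vm->tlb_invalidates++;              /* pay the cost right away */
}

/* Pay the deferred cost once for the whole batch. */
void map_batch_finish(struct vm_stats *vm, struct map_batch *batch)
{
    if (batch->need_tlb_invalidate) {
        vm->tlb_invalidates++;
        batch->need_tlb_invalidate = false;
    }
}

/* Helper for examples: perform 'count' maps in a row. */
void map_many(struct vm_stats *vm, struct map_batch *batch, unsigned int count)
{
    unsigned int i;

    for (i = 0; i < count; i++)
        map_one(vm, batch);
}
```

With a batch of 64 maps, this model performs one invalidate where the unbatched path performs 64. The same pattern would apply to the gk20a_busy()/gk20a_idle() calls and L2 flushes named in the message.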
|
Add reference counting for channels, and wait for reference count to
get to 0 in gk20a_channel_free() before actually freeing the channel.
Also, change free channel tracking a bit by employing a list of free
channels, which simplifies the procedure of finding available channels
with reference counting.
Each use of a channel must have a reference taken before use or held
by the caller. Taking a reference to a wild channel pointer may fail if
the channel is either not opened or in the process of being closed. Also,
add safeguards against accidental use of closed channels,
specifically by setting ch->g = NULL in channel free. This makes it
obvious if a freed channel is used.
The last user of a channel might be the deferred interrupt handler,
so wait for deferred interrupts to be processed twice in the channel
free procedure: once for providing last notifications to the channel
and once to make sure there are no stale pointers left after referencing
to the channel has been denied.
Finally, fix some races in the channel and TSG force reset IOCTL path
by pausing the channel scheduler in gk20a_fifo_recover_ch() and
gk20a_fifo_recover_tsg() while the affected engines are identified,
the appropriate MMU faults triggered, and the MMU faults handled. In this
case, make sure that the MMU fault does not attempt to query the hardware
about the failing channel or TSG ids. This should make channel recovery
safer also in the regular (i.e., not in the interrupt handler) context.
Bug 1530226
Bug 1597493
Bug 1625901
Bug 200076344
Bug 200071810
Change-Id: Ib274876908e18219c64ea41e50ca443df81d957b
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Signed-off-by: Sami Kiminki <skiminki@nvidia.com>
Reviewed-on: http://git-master/r/448463
(cherry picked from commit 3f03aeae64ef2af4829e06f5f63062e8ebd21353)
Reviewed-on: http://git-master/r/755147
Reviewed-by: Automatic_Commit_Validation_User
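The reference rules above can be sketched in a single-threaded model. Apart from the g back-pointer, which the message mentions, the field and function names are hypothetical. The real code waits for the count to reach zero and has to deal with concurrency; this sketch returns false where the driver would wait.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct channel {
    bool referenceable;  /* cleared once freeing has started */
    int ref_count;
    void *g;             /* back-pointer; NULL marks a freed channel */
};

void channel_open(struct channel *ch, void *g)
{
    ch->g = g;
    ch->ref_count = 0;
    ch->referenceable = true;
}

/* Taking a reference fails when the channel is closed or being closed. */
struct channel *channel_get(struct channel *ch)
{
    if (!ch->referenceable)
        return NULL;
    ch->ref_count++;
    return ch;
}

void channel_put(struct channel *ch)
{
    assert(ch->ref_count > 0);
    ch->ref_count--;
}

/*
 * Deny new references, then free only when no references remain.
 * Returns true once the channel is freed, false while it is still in use.
 */
bool channel_try_free(struct channel *ch)
{
    ch->referenceable = false;
    if (ch->ref_count != 0)
        return false;
    ch->g = NULL;   /* makes use of a freed channel obvious */
    return true;
}
```

Denying new references before checking the count is what lets the count only go down during the free procedure.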
|
Reduce amount of duplicate code around memory allocation by using
common helpers, and common data structure for storing results of
allocations.
Bug 1605769
Change-Id: I7c1662b669ed8c86465254f6001e536141051ee5
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/720435
|
Introduce mem_desc, which holds all information needed for a buffer.
Implement helper functions for allocation and freeing that use this
data type.
Change-Id: I82c88595d058d4fb8c5c5fbf19d13269e48e422f
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/712699
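The idea of one descriptor plus matching alloc and free helpers can be sketched in user space. The field set of the real mem_desc is not given here, so the two fields below are assumptions, and calloc()/free() stand in for the driver's allocators.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical descriptor: everything a caller needs to use and free a buffer. */
struct mem_desc_example {
    void *cpu_va;
    size_t size;
};

/* Allocate a zeroed buffer and record it in 'mem'. Returns 0 or -ENOMEM. */
int mem_alloc_example(size_t size, struct mem_desc_example *mem)
{
    assert(size > 0);
    mem->size = 0;
    mem->cpu_va = calloc(1, size);
    if (!mem->cpu_va)
        return -ENOMEM;
    mem->size = size;
    return 0;
}

/* Free the buffer and reset the descriptor so reuse is easy to detect. */
void mem_free_example(struct mem_desc_example *mem)
{
    free(mem->cpu_va);
    mem->cpu_va = NULL;
    mem->size = 0;
}
```

Keeping the results of an allocation in one struct means the free helper needs only that struct, which is what removes the duplicated bookkeeping at each call site.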
|
This reverts commit 9968badd26490a9d399f526fc57a9defd161dd6c. The commit
accidentally introduced some memory leaks.
Change-Id: I00d8d4452a152a8a2fe2d90fb949cdfee0de4c69
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/714288
Reviewed-by: Juha Tukkinen <jtukkinen@nvidia.com>
|
-CDE firmware v0 is not used anymore so we can remove support for
it.
-Bump the threshold for a large surface warning to 8k.
Bug 1566740
Change-Id: Ia0434a04cdd453a10a8de08d259e92e6b9a3e964
Signed-off-by: Jussi Rasanen <jrasanen@nvidia.com>
Reviewed-on: http://git-master/r/709452
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
Don't unmap compbits_buf explicitly from the system vm early, but store it in
the dmabuf's private data and unmap it later, when all user mappings to
that buffer have disappeared.
Bug 1546619
Change-Id: I333235a0ea74c48503608afac31f5e9f1eb4b99b
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/661949
(cherry picked from commit ed2177e25d9e5facfb38786b818330798a14b9bb)
Reviewed-on: http://git-master/r/661835
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
Add TYPE_PARAM_GOBS_PER_COMPTAGLINE_PER_SLICE.
Change-Id: I7cbf7b6db6642a61629ba06f7887bd58af3dc28f
Signed-off-by: Jussi Rasanen <jrasanen@nvidia.com>
Reviewed-on: http://git-master/r/673152
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
For compute channel on gk20a, set lockboost size to zero.
Bug 1573856
Change-Id: I369cebf72241e4017e7d380c82caff6014e42984
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/594843
GVS: Gerrit_Virtual_Submit
|
Gpu channels may get spurious updates from at least nonstalling
semaphore wait interrupts. Protect data structure sanity by ignoring
releases on already released (= not in use) cde contexts.
Bug 200062826
Change-Id: I5940a7557e902bcfcff1a7e8e4593472d9ac306c
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/666235
(cherry picked from commit 47dc2f41eb8054b099b6eb9a4a7d82c97295d415)
Reviewed-on: http://git-master/r/666657
GVS: Gerrit_Virtual_Submit
Reviewed-by: Arto Merilainen <amerilainen@nvidia.com>
|
Channel update callback for a channel that has no more cde jobs signals
that a cde context is free. Spurious channel updates may still happen
from at least nonstalling semaphore wait interrupts. Instead of scary
WARNs, use only gk20a_dbg_info() for info prints in these harmless
situations, and double check that only the first update starts a deleter
work for temporary contexts.
Change-Id: I68de8f35e2c366206c6efac3ee97025239e8bba2
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
(cherry-picked from commit f56a941b4962c5479291cae48e2abca6067e3f13)
Reviewed-on: http://git-master/r/660849
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
obj_id from gk20a_alloc_obj_ctx is not used and calling free_obj_ctx is
effectively a no-op, since the corresponding channel is also freed.
Bug 200059216
Change-Id: Icbe2cf5dc21d50cb007bf73829705451ada106ac
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/655368
Reviewed-by: Arto Merilainen <amerilainen@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
Bug 200046882
Change-Id: I515e972f84cb7e1b17eef42ade6a4eaf0f8d71f8
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/559332
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
Wait for possible temp context deletion to finish properly before
passing contexts around later, to prevent situations where the context
deleter scheduling would have been completed, but running it would not,
and a new one could have been scheduled again. When finished, schedule
the deleter before freeing the context back to use to prevent races.
Warn in impossible situations when these double deletions would happen.
Bug 200054186
Bug 200052943
Change-Id: I23ca0d1081eea77d0e453b9038adc914909b5f48
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/603439
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
CDE context needs to be initialized in the first run using a separate
initialization gpfifo before the actual conversion. To prevent a race
condition, include both of them in a single gpfifo whenever the
initialization is performed.
Bug 200052943
Change-Id: I7eb09a906c0374825df71eba969e4596b94e5ff2
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/602888
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
Trace cde context allocation and deallocation with ftrace.
Bug 200052943
Change-Id: Ieeb625166662971fb3eb3fb29c986fdb6809c10b
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/602886
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
Add WARN to conditions that should never happen, to help debugging
any context issues.
Bug 200052943
Change-Id: Ibe2a9507f3a62bb7b2e263ff3ff21a24a092a971
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/602885
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
Add an upper limit for cde contexts, and wait for a while if a new
context is queried and the limit has been exceeded. This happens only
under very high load. If the timeout is exceeded, report -EAGAIN.
Change-Id: I1fa47ad6cddf620eae00cea16ecea36cf4151cab
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/601719
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
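A sketch of the bounded wait, with a retry count standing in for the timeout and a callback standing in for the actual wait. The pool struct and function names are hypothetical. Only the -EAGAIN result on giving up is taken from the message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct ctx_pool {
    unsigned int in_use;
    unsigned int limit;     /* upper limit on contexts */
};

/* Give one context back to the pool. */
void ctx_pool_release(struct ctx_pool *pool)
{
    assert(pool->in_use > 0);
    pool->in_use--;
}

/*
 * Try to take a context. While the pool is at its limit, call wait_fn
 * (if any) and retry, up to max_retries times, then give up with -EAGAIN.
 */
int ctx_pool_acquire(struct ctx_pool *pool, unsigned int max_retries,
                     void (*wait_fn)(struct ctx_pool *))
{
    unsigned int tries;

    for (tries = 0; ; tries++) {
        if (pool->in_use < pool->limit) {
            pool->in_use++;
            return 0;
        }
        if (tries == max_retries)
            return -EAGAIN;
        if (wait_fn)
            wait_fn(pool);
    }
}
```

In a test, passing ctx_pool_release as wait_fn simulates another user freeing a context while the caller waits.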
|
Create debugfs nodes for ctx_count, ctx_usecount and ctx_cont_top.
Change-Id: I1360853b2650d37a96c8adf76368d48d9b457909
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/602860
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
Rescheduling the temp context deleter when it is not immediately
possible is not necessary, and complicates things. Don't do it.
The context would anyway be used later when its time comes in the free
list, and the deletion would then be retried.
This simplifies canceling the works when shutting down or going into
suspend, since re-canceling the possibly rescheduled work is not needed.
Releasing the app mutex is still necessary when deleting the whole cde.
Bug 200052943
Change-Id: I06afe1766097a78d7bcb93f3140855799ac903ca
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/601035
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
Cancel the temporary context deleter work when acquiring a channel for
use, to prevent re-scheduling when the same context would be quickly
re-used and finished twice or more in a row before deletion.
Change-Id: Iadd8230d9462adc451e506152a24c50a920a59e3
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/600273
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
Use the correct, negative sign for ENOMEM when gk20a_gmmu_map runs
out of memory.
Change-Id: I4fa8a2cf359a5c98cebdf64d4e3fcc96f478f779
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/594397
Reviewed-by: Jussi Rasanen <jrasanen@nvidia.com>
Tested-by: Jussi Rasanen <jrasanen@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
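Why the sign matters can be shown with a short sketch. Kernel code conventionally reports errors as negative errno values, and callers commonly test for "err < 0". The functions below are hypothetical stand-ins and not the driver's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Buggy: a positive ENOMEM slips past callers that test "err < 0". */
int map_example_buggy(bool out_of_memory)
{
    return out_of_memory ? ENOMEM : 0;
}

/* Fixed: follow the negative errno convention. */
int map_example_fixed(bool out_of_memory)
{
    return out_of_memory ? -ENOMEM : 0;
}

/* A typical caller-side check. */
bool call_failed(int err)
{
    return err < 0;
}

/* Usage: only the fixed version is seen as a failure by the caller. */
void errno_sign_demo(void)
{
    assert(!call_failed(map_example_buggy(true)));
    assert(call_failed(map_example_fixed(true)));
}
```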
|
When using CDE firmware v1, combine H and V swizzling passes into one
pushbuffer submission. This removes one GPU context switch, almost
halving the time taken for swizzling.
Map only the compbit part of the destination surface.
Bug 1546619
Change-Id: I95ed4e4c2eefd6d24a58854d31929cdb91ff556b
Signed-off-by: Jussi Rasanen <jrasanen@nvidia.com>
Reviewed-on: http://git-master/r/553234
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
If a channel timeout occurs, reload only the particular context/channel
where the timeout occurred, instead of destroying whole cde. Reloading
happens by allocating a replacement context and marking the offending
channel as soon-to-be-deleted.
Clean up the code by using two separate lists for free and used
contexts. Rename channel deallocation/allocation functions to better
describe what they do, and annotate the functions that need locking.
Also do not wait for channel idle before submitting, since the acquired
context has a ready channel already.
Bug 200046882
Change-Id: I4155a85ea0ed79e284309eb2ad0042df3938f1e2
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/591235
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
Fix the sparse warnings below:
warning: Using plain integer as NULL pointer
warning: symbol <variable/function> was not declared. Should it be static?
warning: Initializer entry defined twice
Also, remove dead functions.
Bug 1573254
Change-Id: I29d71ecc01c841233cf6b26c9088ca8874773469
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/593363
Reviewed-by: Amit Sharma (SW-TEGRA) <amisharma@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
|
During gpu suspend, cancel all pending delayed cde work
to avoid issues of scheduling this delayed work
during suspend/resume when gpu is not ready.
Bug 1574000
Change-Id: I2b6bfa489435a781dc576a077f9af01b1e1628ce
Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: http://git-master/r/593557
Reviewed-by: Shridhar Rasal <srasal@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Prashant Gaikwad <pgaikwad@nvidia.com>
Tested-by: Prashant Gaikwad <pgaikwad@nvidia.com>
Reviewed-by: Bharat Nihalani <bnihalani@nvidia.com>
|
Instead of current preallocated array plus dynamically allocated
temporary contexts, use a linked list in LRU fashion, always storing
free contexts at the beginning of the list. Initialize the preallocated
contexts to the list and store dynamically allocated temporaries there
too for quick reuse as needed, with a delayed scheduled work for
deleting temporaries when the high load has diminished.
Bug 200040211
Change-Id: Ibc75a0150109ec9c44b2eeb74607450990584b18
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/562856
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
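The list discipline described above, with free contexts kept at the beginning, can be sketched with a small doubly linked list and a sentinel node. This is an illustration under assumed names. The driver would use the kernel's own list helpers, and the delayed deletion of temporary contexts is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct lru_ctx {
    struct lru_ctx *prev, *next;
    bool in_use;
    int id;
};

struct lru_list {
    struct lru_ctx head;    /* sentinel node; never handed out */
};

static void lru_unlink(struct lru_ctx *ctx)
{
    ctx->prev->next = ctx->next;
    ctx->next->prev = ctx->prev;
}

static void lru_insert_after(struct lru_ctx *pos, struct lru_ctx *ctx)
{
    ctx->next = pos->next;
    ctx->prev = pos;
    pos->next->prev = ctx;
    pos->next = ctx;
}

void lru_init(struct lru_list *list)
{
    list->head.prev = &list->head;
    list->head.next = &list->head;
    list->head.in_use = false;
    list->head.id = -1;
}

/* Add a context (preallocated or temporary) as free, at the front. */
void lru_add_free(struct lru_list *list, struct lru_ctx *ctx)
{
    ctx->in_use = false;
    lru_insert_after(&list->head, ctx);
}

/* Take the first context if it is free and move it to the back. */
struct lru_ctx *lru_acquire(struct lru_list *list)
{
    struct lru_ctx *ctx = list->head.next;

    if (ctx == &list->head || ctx->in_use)
        return NULL;    /* empty list, or every context is busy */
    lru_unlink(ctx);
    ctx->in_use = true;
    lru_insert_after(list->head.prev, ctx);
    return ctx;
}

/* Return a context to the front so it is found first next time. */
void lru_release(struct lru_list *list, struct lru_ctx *ctx)
{
    assert(ctx->in_use);
    lru_unlink(ctx);
    ctx->in_use = false;
    lru_insert_after(&list->head, ctx);
}
```

Since free contexts are always at the front, checking only the first element is enough to know whether any context is available.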
|
This reverts commit 41b82e97164138f45fbdaef6ab6939d82ca9419e.
Change-Id: Iabd01fcb124e0d22cd9be62151a6552cbb27fc94
Signed-off-by: Sam Payne <spayne@nvidia.com>
Reviewed-on: http://git-master/r/592221
Tested-by: Hoang Pham <hopham@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Mitch Luban <mluban@nvidia.com>
|