nvgpu.git - Tegra GPU Driver. Originally from nv-tegra.nvidia.com/linux-nvgpu.git.

	Commit message (Collapse)	Author	Age
*	gpu: nvgpu: add mm ops for mmu_fault_pending	Seema Khowala	2017-03-22
\| \| \| \| \| \| \| \| \| \|	This change is needed for t19x mmu fault handling. Change-Id: I7f9190ab305f699401f6b0033b6a93dd8b4fc3cd Signed-off-by: Seema Khowala <seemaj@nvidia.com> Reviewed-on: http://git-master/r/1315201 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
*	gpu: nvgpu: use dma-attr wrappers for K4.9 compatibility	Konsta Holtta	2017-03-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On kernel 4.9, the DMA API has changed, so use the NVIDIA compatibility wrappers from dma-attrs.h to allow the code to build for both 4.4 and 4.9. Bug 1853519 Change-Id: I0196936e81c7f72b41b38a67f42af0dc0b5518df Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1321102 Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: abstract away dma alloc attrs	Konsta Holtta	2017-03-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Don't use enum dma_attr in the gk20a_gmmu_alloc_attr* functions, but define nvgpu-internal flags for no kernel mapping, force contiguous, and read only modes. Store the flags in the allocated struct mem_desc and only use gk20a_gmmu_free, remove gk20a_gmmu_free_attr. This helps in OS abstraction. Rename the notion of attr to flags. Add implicit NVGPU_DMA_NO_KERNEL_MAPPING to all vidmem buffers allocated via gk20a_gmmu_alloc_vid for consistency. Fix a bug in gk20a_gmmu_alloc_map_attr that dropped the attr parameter accidentally. Bug 1853519 Change-Id: I1ff67dff9fc425457ae445ce4976a780eb4dcc9f Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1321101 Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: Move all FB programming to FB HAL	Terje Bergstrom	2017-03-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Move all programming of FB to fb_*.c files, and remove the inclusion of FB hardware headers from other files. TLB invalidate function took previously a pointer to VM, but the new API takes only a PDB mem_desc, because FB does not need to know about higher level VM. GPC MMU is programmed from the same function as FB MMU, so added dependency to GR hardware header to FB. GP106 ACR was also triggering a VPR fetch, but that's not applicable to dGPU, so removed that call. Change-Id: I4eb69377ac3745da205907626cf60948b7c5392a Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com> Reviewed-on: http://git-master/r/1321516 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
*	gpu: nvgpu: cancel vidmem worker only if supported	Konsta Holtta	2017-03-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Cancel the vidmem.clear_mem_worker during suspend only if vidmem is enabled via kernel config. Otherwise it's not initialized. Bug 1853519 Change-Id: If88c756ae14f348eddda01218fa218480217388c Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1321118 Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: David Martinez Nieto <dmartineznie@nvidia.com> Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: Remove unused function gk20a_get_phys_from_iova	Terje Bergstrom	2017-03-14
\| \| \| \| \| \| \| \| \| \| \| \|	Remove unused function gk20a_get_phys_from_iova. At the same time remove the #include for iommu.h, which was only needed by this function. Change-Id: Ia858b0ad5fe7e423d650aa9f82e430f419f2a492 Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com> Reviewed-on: http://git-master/r/1319070 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
*	gpu: nvgpu: Do not use pm_runtime calls directly	Terje Bergstrom	2017-03-14
\| \| \| \| \| \| \| \| \| \| \| \|	mm_gk20a.c had direct calls to pm_runtime_put_noidle(). Replace them with calls to wrapper gk20a_idle_nosuspend() to prevent unnecessary dependencies to Linux. Change-Id: Iaf8b9255750be2f3e1aa39587c1a4a3cbeacc67f Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com> Reviewed-on: http://git-master/r/1319069 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
*	gpu: nvgpu: check return value of mutex_init for semaphores	Deepak Nibade	2017-03-14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- check return value of nvgpu_mutex_init for semaphores - add corresponding nvgpu_mutex_destroy calls Jira NVGPU-13 Change-Id: I5404dbd29e3fce29f1a445eb2e6ce8e1d1b616c4 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/1317138 Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Reviewed-by: Navneet Kumar <navneetk@nvidia.com>
*	gpu: nvgpu: kmem abstraction and tracking	Alex Waterman	2017-03-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Implement kmem abstraction and tracking in nvgpu. The abstraction helps move nvgpu's core code away from being Linux dependent and allows kmem allocation tracking to be done for Linux and any other OS supported by nvgpu. Bug 1799159 Bug 1823380 Change-Id: Ieaae4ca1bbd1d4db4a1546616ab8b9fc53a4079d Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1283828 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
*	gpu: nvgpu: Give nvgpu_kalloc a less generic name	Alex Waterman	2017-03-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Change nvgpu_kalloc() to nvgpu_big_[mz]alloc(). This is necessary since the natural free function name for this is nvgpu_kfree() but that conflicts with nvgpu_k[mz]alloc() (implemented in a subsequent patch). This API exists becasue not all allocation sizes can be determined at compile time and in some cases sizes may vary across the system page size. Thus always using kmalloc() could lead to OOM errors due to fragmentation. But always using vmalloc() is wastful of memory for small allocations. This API tries to alleviate those problems. Bug 1799159 Bug 1823380 Change-Id: I49ec5292ce13bcdecf112afbb4a0cfffeeb5ecfc Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1283827 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
*	gpu: nvgpu: optimize duplicate buffer lookup in case of fixed offsets	Deepak Nibade	2017-03-02
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In gk20a_vm_map_duplicate_locked(), we always do a linear search in rb-tree to find a duplicate entry of the buffer In case NVGPU_AS_MAP_BUFFER_FLAGS_FIXED_OFFSET is set, we first traverse whole rb-tree linearly and then compare offset_align with the address searched from rb-tree If size of rb-tree is very large this linear lookup takes upto 7mS and causes huge delays Hence in case of NVGPU_AS_MAP_BUFFER_FLAGS_FIXED_OFFSET, we can use offset_align to perform a binary search on rb-tree and then verify that dmabuf and kind match with the node obtained from the search This saves a lot of time per-lookup Bug 1874516 Change-Id: Ia4924b64d66e586c14341ae2e2283beac394bf6f Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/1309343 Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: remove use of DEFINE_MUTEX()	Deepak Nibade	2017-02-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	API DEFINE_MUTEX() is defined in Linux and might not be available in other OSs. Hence remove its usage from nvgpu Declare and explicitly initialize below mutexes for both nvgpu and vgpu g->mm.priv_lock g->mm.tlb_lock Jira NVGPU-13 Change-Id: If72885a6da0227a1552303206172f1f2b751471d Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/1298042 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
*	gpu: nvgpu: use common nvgpu mutex/spinlock APIs	Deepak Nibade	2017-02-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Instead of using Linux APIs for mutex and spinlocks directly, use new APIs defined in <nvgpu/lock.h> Replace Linux specific mutex/spinlock declaration, init, lock, unlock APIs with new APIs e.g struct mutex is replaced by struct nvgpu_mutex and mutex_lock() is replaced by nvgpu_mutex_acquire() And also include <nvgpu/lock.h> instead of including <linux/mutex.h> and <linux/spinlock.h> Add explicit nvgpu/lock.h includes to below files to fix complilation failures. gk20a/platform_gk20a.h include/nvgpu/allocator.h Jira NVGPU-13 Change-Id: I81a05d21ecdbd90c2076a9f0aefd0e40b215bd33 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/1293187 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
*	gpu: nvgpu: Move from gk20a_ to nvgpu_ in semaphore code	Alex Waterman	2017-02-13
\| \| \| \| \| \| \| \| \| \| \| \| \|	Change the prefix in the semaphore code to 'nvgpu_' since this code is global to all chips. Bug 1799159 Change-Id: Ic1f3e13428882019e5d1f547acfe95271cc10da5 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1284628 Reviewed-by: Varun Colbert <vcolbert@nvidia.com> Tested-by: Varun Colbert <vcolbert@nvidia.com>
*	gpu: nvgpu: Organize semaphore_gk20a.[ch]	Alex Waterman	2017-02-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Move semaphore_gk20a.c drivers/gpu/nvgpu/common/ since the semaphore code is common to all chips. Move the semaphore_gk20a.h header file to drivers/gpu/nvgpu/include/nvgpu and rename it to semaphore.h. Also update all places where the header is inluced to use the new path. This revealed an odd location for the enum gk20a_mem_rw_flag. This should be in the mm headers. As a result many places that did not need anything semaphore related had to include the semaphore header file. Fixing this oddity allowed the semaphore include to be removed from many C files that did not need it. Bug 1799159 Change-Id: Ie017219acf34c4c481747323b9f3ac33e76e064c Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1284627 GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: Simplify ref-counting on VMs	Alex Waterman	2017-02-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Simplify ref-counting on VMs: take a ref when a VM is bound to a channel and drop a ref when a channel is freed. Previously ref-counts were scattered over the driver. Also the CE and CDE code would bind channels with custom rolled code. This was because the gk20a_vm_bind_channel() function took an as_share as the VM argument (the VM was then inferred from that as_share). However, it is trivial to abtract that bit out and allow a central bind channel function that just takes a VM and a channel. Bug 1846718 Change-Id: I156aab259f6c7a2fa338408c6c4a3a464cd44a0c Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1261886 Reviewed-by: Richard Zhao <rizhao@nvidia.com> Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: Conditional address space unification	Alex Waterman	2017-01-31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Allow platforms to choose whether or not to have unified GPU VA spaces. This is useful for the dGPU where having a unified address space has no problems. On iGPUs testing issues is getting in the way of enabling this feature. Bug 1396644 Bug 1729947 Change-Id: I65985f1f9a818f4b06219715cc09619911e4824b Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1265303 GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: Remove separate fixed address VMA	Alex Waterman	2017-01-31
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove the special VMA that could be used for allocating fixed addresses. This feature was never used and is not worth maintaining. Bug 1396644 Bug 1729947 Change-Id: I06f92caa01623535516935acc03ce38dbdb0e318 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1265302 GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: Cleanup gk20a_init_vm()	Alex Waterman	2017-01-31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Cleanup and simplify the gk20a_init_vm() function to ease the implementation of a platform dependent address space unification decision. Bug 1396644 Bug 1729947 Change-Id: Id8487d0e3d3c65e3357e3528063fb17c8a85f7da Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1265301 Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: Unify the small and large page address spaces	Alex Waterman	2017-01-31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The basic structure of this patch is to make the small page allocator and the large page allocator into pointers (where they used to be just structs). Then assign each of those pointers to the same actual allocator since the buddy allocator has supported mixed page sizes since its inception. For the rest of the driver some changes had to be made in order to actually support mixed pages in a single address space. 1. Unifying the allocation page size determination Since the allocation and map operations happen at distinct times both mapping and allocation of GVA space must agree on page size. This is because the allocation has to separate allocations into separate PDEs to avoid the necessity of supporting mixed PDEs. To this end a function __get_pte_size() was introduced which is used both by the balloc code and the core GPU MM code. It determines page size based only on the length of the mapping/ allocation. 2. Fixed address allocation + page size Similar to regular mappings/GVA allocations fixed address mapping page size determination had to be modified. In the past the address of the mapping determined page size since the address space split was by address (low addresses were small pages, high addresses large pages). Since that is no longer the case the page size field in the reserve memory ioctl is now honored by the mapping code. When, for instance, CUDA makes a memory reservation it specifies small or large pages. When CUDA requests mappings to be made within that address range the page size is then looked up in the reserved memory struct. Fixed address reservations were also modified to now always allocate at a PDE granularity (64M or 128M depending on large page size. This prevents non-fixed allocations from ending up in the same PDE and causing kernel panics or GMMU faults. 3. The rest... The rest of the changes are just by products of the above. Lots of places required minor updates to use a pointer to the GVA allocator struct instead of the struct itself. Lastly, this change is not truly complete. More work remains to be done in order to fully remove the notion that there was such a thing as separate address spaces for different page sizes. Basically after this patch what remains is cleanup and proper documentation. Bug 1396644 Bug 1729947 Change-Id: If51ab396a37ba16c69e434adb47edeef083dce57 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1265300 GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: use map_offset for PTE size computation	Alex Waterman	2017-01-31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make sure that map_offset is set to the fixed map address or 0) before determining PTE size. Then use map_offset instead of offset_align for computing the PTE size since offset_align could be either an alignment ora fixed mapping offset. Also is the minimum of the buffer size and the buffer alignment for computing page size. This is necessary is the GMMU is doing page gathering (i.e the buffer does not appear as a continguous IOMMU range to the GPU). Is such cases a large page sized buffer may be made up of a bunch of discontiguous 4k pages. Bug 1396644 Bug 1729947 Change-Id: I6464ee6a4ccab2495ccb31cd1ddf1db467d2b215 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1271359 GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: use soc/tegra/chip-id.h for soc header	Shardar Shariff Md	2017-01-20
\| \| \| \| \| \| \| \| \| \| \| \|	The soc tegra headers are unified and moved all the content of linux/tegra-soc.h to the soc/tegra/chip-id.h to have the single soc header for Tegra. Change-Id: I281e19dd3eb1538b8dfbea4eb0779fb64d1fcffa Signed-off-by: Shardar Shariff Md <smohammed@nvidia.com> Reviewed-on: http://git-master/r/1288365 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: Use timer API in gk20a code	Alex Waterman	2017-01-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use the timers API in the gk20a code instead of Linux specific API calls. This also changes the behavior of several functions to wait for the full timeout for each operation that can timeout. Previously the timeout was shared across each operation. Bug 1799159 Change-Id: I2bbed54630667b2b879b56a63a853266afc1e5d8 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1273826 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
*	gpu: nvgpu: Start re-organizing the HW headers	Alex Waterman	2017-01-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Reorganize the HW headers of gk20a. The headers are moved to a new directory: include/nvgpu/hw/gk20a And from the code are included like so: #include <nvgpu/hw/gk20a/hw_pwr_gk20a.h> This is the first step in reorganizing all of the HW headers for gm20b, gm206, etc. This is part of a larger effort to re-structure and make the driver more readable and scalable. Bug 1799159 Change-Id: Ic151155cbc2e6f75009f2d9d597b364a1bed2c4c Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1244790 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
*	gpu: nvgpu: Move allocators to common/mm/	Alex Waterman	2017-01-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Move the GPU allocators to common/mm/ since the allocators are common code across all GPUs. Also rename the allocator code to move away from gk20a_ prefixed structs and functions. This caused one issue with the nvgpu_alloc() and nvgpu_free() functions. There was a function for allocating either with kmalloc() or vmalloc() depending on the size of the allocation. Those have now been renamed to nvgpu_kalloc() and nvgpu_kfree(). Bug 1799159 Change-Id: Iddda92c013612bcb209847084ec85b8953002fa5 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1274400 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
*	gpu: nvgpu: Misc fixes for crashes on shutdown	Alex Waterman	2017-01-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix miscellaneous issues seen during driver shutdown. o Make sure pointers are valid before accessing them. o Busy the GPU during channel timeout. o Cancel delayed work on channels. o Avoid access to channels that may have been freed. Bug 1816516 Bug 1807277 Change-Id: I62df40373fdfb1c4a011364e8c435176a08a7a96 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1250026 (cherry picked from commit 64a95fc96c8ef7c5af9c53c4bb3402626e0d2f60) Reviewed-on: http://git-master/r/1274474 GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: rename timeout_check to timeout_expired	Konsta Holtta	2016-12-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Change "check" to "expired" in nvgpu_timeout_check* and append _expired to nvgpu_timeout_peek to clarify what the boolean-like return value means and thus avoid bugs. Bug 200260715 Change-Id: I47e097ee922e856005a79fa9e27eddb1c8d77f8b Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1269366 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
*	gpu: nvgpu: Use end of vidmem as bootstrap region	Terje Bergstrom	2016-12-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Instead of hard coding bootstrap region, it should always be set to the last 256MB of vidmem. Bug 200244445 Change-Id: I91779d1bf861f4f23a0b646f70b1febbbc4581b5 Signed-off-by: David Nieto <dmartineznie@nvidia.com> Reviewed-on: http://git-master/r/1242409 Reviewed-by: David Martinez Nieto <dmartineznie@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-on: http://git-master/r/1267124 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
*	gpu: nvgpu: fix timeout retry usage in mm_gk20a.c	Konsta Holtta	2016-12-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Loop conditions of timeout checking introduced in commit 21094783114b9314d57f412196544a34b3a40f4a ("gpu: nvgpu: Use timeout retry API in mm_gk20a.c") were flipped by accident, so each usage in a loop actually did not wait enough but ran only one iteration. Fix the conditions to loop as long as the timeout is NOT expired. Also restore l2 flush timeout to 10 ms from 1, which was done in commit 030ef82bdd474ef4261a2f40995b8db57857899e ("gpu: nvgpu: increase l2 flush timeout") but overwritten by the above "use timeout" commit. Bug 200260715 Change-Id: I0db16be79a1a27caa3d97fac9d4361582cc232e8 Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1268482 Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: Use timeout retry API in mm_gk20a.c	Alex Waterman	2016-12-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use the retry API that is part of the nvgpu timeout API to keep track of retry attempts in the vrious flushing and invalidating operations in the MM code. Bug 1799159 Change-Id: I36e98d37183a13d7c3183262629f8569f64fe4d7 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1255866 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
*	gpu: nvgpu: fix hardcoded page size reference	Krishna Reddy	2016-12-05
\| \| \| \| \| \| \| \| \| \| \| \|	Bug 1843356 Bug 1769772 Change-Id: I6c2a3a72f7082074bbf1165a74d5070195e1e653 Signed-off-by: Krishna Reddy <vdumpa@nvidia.com> Reviewed-on: http://git-master/r/1258352 Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: increase l2 flush timeout	seshendra Gadagottu	2016-12-02
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Under heavy throttling case gpu runs 8 times slower. This is making l2 flush to timeout, to avoid this increase timeout to 10msec from 1msec. Bug 1787261 Change-Id: Ia112ce968c136135ccb9df4a7364073103684403 Signed-off-by: seshendra Gadagottu <sgadagottu@nvidia.com> Reviewed-on: http://git-master/r/1216559 Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: API to access fb memory	Deepak Nibade	2016-11-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add IOCTL API NVGPU_DBG_GPU_IOCTL_ACCESS_FB_MEMORY to read/write fb/vidmem memory Interface will accept dmabuf_fd of the buffer in vidmem, offset into the buffer to access, temporary buffer to copy data across API, size of read/write and command indicating either read or write operation API will first parse all the inputs, and then call gk20a_vidbuf_access_memory() to complete fb access gk20a_vidbuf_access_memory() will then just use gk20a_mem_rd_n() or gk20a_mem_wr_n() depending on the command issued Bug 1804714 Jira DNVGPU-192 Change-Id: Iba3c42410abe12c2884d3b603fa33d27782e4c56 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/1255556 (cherry picked from commit 2c49a8a79d93fc526adbf6f808484fa9a3fa2498) Reviewed-on: http://git-master/r/1260471 GVS: Gerrit_Virtual_Submit Reviewed-by: Bharat Nihalani <bnihalani@nvidia.com>
*	gpu: nvgpu: chip specific init_inst_block	seshendra Gadagottu	2016-11-21
\| \| \| \| \| \| \| \| \| \| \| \| \|	Add function pointer to add chip specific init_inst_block. Update this function pointer for gk20a and gm20b. JIRA GV11B-21 Change-Id: I74ca6a8b4d5d1ed36f7b25b7f62361c2789b9540 Signed-off-by: seshendra Gadagottu <sgadagottu@nvidia.com> Reviewed-on: http://git-master/r/1254875 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
*	gpu: nvgpu: Fix signed comparison bugs	Terje Bergstrom	2016-11-17
\| \| \| \| \| \| \| \| \| \| \| \|	Fix small problems related to signed versus unsigned comparisons throughout the driver. Bump up the warning level to prevent such problems from occuring in future. Change-Id: I8ff5efb419f664e8a2aedadd6515ae4d18502ae0 Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com> Reviewed-on: http://git-master/r/1252068 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
*	gpu: nvgpu: Remove global debugfs variable	Alex Waterman	2016-10-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove a global debugfs variable and instead save the allocator debugfs root node in the gk20a struct. Bug 1799159 Change-Id: If4eed34fa24775e962001e34840b334658f2321c Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1225611 (cherry picked from commit 1908fde10bb1fb60ce898ea329f5a441a3e4297a) Reviewed-on: http://git-master/r/1242390 GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: Move CE cleanup	Alex Waterman	2016-10-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Move the CE cleanup to before the FIFO cleanup. Since the CE closes a channel during its cleanup the FIFO needs to be initialized since the FIFO code maintains the vmalloc()'ed channels. Bug 1816516 Change-Id: Ia7a97059a12a0c2b52368ffe411e597f803e8e6e Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1225613 (cherry picked from commit 707bd2a6d4672c6a7b7a8b2e581ea3a606ed971d) Reviewed-on: http://git-master/r/1240106 GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: Only cleanup existing semaphore pools	Alex Waterman	2016-10-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Not all VMs have semaphore pools made for them even when semaphores are going to be used. Thus only VMs with existing semaphore pools should have their pools cleaned up. Bug 1816516 Change-Id: I07828708faef451f1711f58c0d5b3f8e4d296dd0 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1225612 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> (cherry picked from commit 6cdb7b6650765465dca68dc3c23b3d795ccdafb5) Reviewed-on: http://git-master/r/1240105 GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: SLAB allocation for page allocator	Alex Waterman	2016-10-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add the ability to do "SLAB" allocation in the page allocator. This is generally useful since the allocator manages 64K pages but often we only need 4k chunks (for example when allocating memory for page table entries). Bug 1799159 JIRA DNVGPU-100 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1225322 (cherry picked from commit 299a5639243e44be504391d9155b4ae17d914aa2) Change-Id: Ib3a8558d40ba16bd3a413f4fd38b146beaa3c66b Reviewed-on: http://git-master/r/1227924 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
*	gpu: nvgpu: skip pramin barriers for page tables	Konsta Holtta	2016-10-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Page table updates have an explicit write barrier at the end of a pte update operation in update_gmmu_ptes_locked(), so the per-wr32 wmb()s are not necessary. Jira DNVGPU-23 Change-Id: I2e2596f0900d840fadb369ee1261c5e2305f2070 Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1225150 (cherry picked from commit 6664a667ea326e9663a6b502765f858d8669f4d9) Reviewed-on: http://git-master/r/1227475 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
*	gpu: nvgpu: allow skipping pramin barriers	Konsta Holtta	2016-10-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A wmb() next to each gk20a_mem_wr32() via PRAMIN may be overly careful, so support not inserting these barriers for performance, in cases where they are not necessary, where the caller would do an explicit barrier after a bunch of reads. Also, move those optional wmb()s to be done at the end of the whole internally batched write for gk20a_mem_{wr_n,memset} from the per-batch subloops that may run multiple times. Jira DNVGPU-23 Change-Id: I61ee65418335863110bca6f036b2e883b048c5c2 Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1225149 (cherry picked from commit d2c40327d1995f76e8ab9cb4cd8c76407dabc6de) Reviewed-on: http://git-master/r/1227474 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
*	gpu: nvgpu: Fix wmb() order in pramin access	Alex Waterman	2016-10-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	wmb() should come after the writes to ensure that the writes have completed before progressing. Bug 1811382 Change-Id: I98fba317b1760240c0b5de531accf398fe69c9b3 Signed-off-by: Alex Waterman <alexw@nvidia.com> (cherry picked from commit 1b1201b9c109061590e6e25260d7230ae2c89888) Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1225251 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
*	gpu: nvgpu: add ioctl for querying memory state	Konsta Holtta	2016-10-14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add NVGPU_GPU_IOCTL_GET_MEMORY_STATE to read the amount of free device-local video memory, if applicable. Some reserved fields are added to support different types of queries in the future (e.g. context-local free amount). Bug 1787771 Bug 200233138 Change-Id: Id5ffd02ad4d6ed3a6dc196541938573c27b340ac Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1223762 (cherry picked from commit 96221d96c7972c6387944603e974f7639d6dbe70) Reviewed-on: http://git-master/r/1235980 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
*	gpu: nvgpu: track pending bytes for vidmem clears	Konsta Holtta	2016-10-14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Change clears_pending to bytes_pending and track accordingly the number of bytes to be freed instead of the number of buffers. This, atomically combined with the amount of space in the allocator, is the total amount of free memory available. Bug 200233138 Change-Id: Ibbb4e80a32728781ba19a74307d8a8ac1a4d7431 Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1231422 (cherry picked from commit 025e765f312c253b201ecf2dbbe0f4972fe1d4bc) Reviewed-on: http://git-master/r/1235957 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
*	gpu: nvgpu: compact pte buffers	Konsta Holtta	2016-10-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The lowest page table level may hold very few entries for mappings of large pages, but a new page is allocated for each list of entries at the lowest level, wasting memory and performance. Compact these so that the new "allocation" of ptes is appended at the end of the previous allocation, if there is space. 4 KB page is still the smallest size requested from the allocator; any possible overhead in the allocator (e.g., internally allocating big pages only) is not taken into account. Bug 1736604 Change-Id: I03fb795cbc06c869fcf5f1b92def89a04583ee83 Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1221841 (cherry picked from commit fa92017ed48e1d5f48c1a12c512641c6ce9924af) Reviewed-on: http://git-master/r/1234996 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
*	gpu: nvgpu: make sure vidmem is cleared only once	Konsta Holtta	2016-10-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Protect the initial vidmem zeroing performed during the first userspace alloc with a mutex, so that it blocks next concurrent users and is run only once. Otherwise, multiple clears could end up running in parallel, so that the next ones corrupt memory allocated by the thread that has finished earlier and advanced to allocate and use memory. Jira DNVGPU-84 Change-Id: If497749abf481b230835250191d011c4a9d1483b Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1232461 (cherry picked from commit 79435a68e6d2713b78acdb0ec6f77cfd78651d7f) Reviewed-on: http://git-master/r/1234990 GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: userd allocation from sysmem	seshendra Gadagottu	2016-10-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When bar1 memory is not supported then userd will be allocated from sysmem. Functions gp_get and gp_put are updated accordingly. JIRA GV11B-1 Change-Id: Ia895712a110f6cca26474228141488f5f8ace756 Signed-off-by: seshendra Gadagottu <sgadagottu@nvidia.com> Reviewed-on: http://git-master/r/1225384 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
*	gpu: nvgpu: optimize barrier in batch pramin writes	Konsta Holtta	2016-10-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Move wmb() before the loop in pramin-accessed batch writes and use writel_relaxed() directly, instead of calling gk20a_writel() that would do wmb() on each iteration separately. Jira DNVGPU-24 Change-Id: I4c1375a819266727f97e2f109d3132b5b0974ac6 Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1213600 (cherry picked from commit 79e3e38e0c5384ababfd55b8e6cd9723eb8f7b66) Reviewed-on: http://git-master/r/1184343 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
*	gpu: nvgpu: fix allocation and map size mismatch while mapping	Deepak Nibade	2016-09-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It is possible to allocate larger size than user requested e.g. If we allocate at 64k granularity, and user asks for 32k buffer, we end up allocating 64k chunk. User still asks to map the buffer with size 32k and hence we reserve mapping addresses only for 32k But due to bug in mapping in update_gmmu_ptes_locked() we end up creating mappings considering size of 64k and corrupt some mappings Fix this by considering min(chunk->length, map_size) while mapping address range for a chunk Also, map_size will be zero once we map all requested address range. So bail out from the loop if map_size is zero Bug 1805064 Change-Id: I125d3ce261684dce7e679f9cb39198664f8937c4 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/1217755 (cherry picked from commit 3ee1c6bc0718fb8dd9a28a37eff43a2872bdd5c0) Reviewed-on: http://git-master/r/1221775 GVS: Gerrit_Virtual_Submit Reviewed-by: Bharat Nihalani <bnihalani@nvidia.com>
*	gpu: nvgpu: Use actual carveouts for WPR region	Alex Waterman	2016-09-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use a carveout for the WPR region in the VIDMEM. Jira DNVGPU-84 Change-Id: I191ecc3bb317ae3af6b56f5970194e646c513964 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1208527 (cherry picked from commit 7edf74d7468dcff1f01cbd901d83aa0e32602f0e) Reviewed-on: http://git-master/r/1223455 GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>