nvgpu.git - Tegra GPU Driver. Originally from nv-tegra.nvidia.com/linux-nvgpu.git.

	Commit message	Author	Age
...
*	gpu: nvgpu: add new API to get base address for sysmem/vidmem buffers	Deepak Nibade	2016-09-01
	Add new API gk20a_mem_get_base_addr(), which returns the vidmem base address for vidmem buffers and the IOVA address for sysmem buffers. Even though vidmem allocations can be non-contiguous, this API is useful (and should only be used) for allocations with one chunk (e.g. page tables). Also, since page tables can reside in either sysmem or vidmem, use this API to get the address of page tables. Jira DNVGPU-20 Change-Id: Ie04af9ca7bfccfec1a8a8e4be2c507cef5cef8e1 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/1206403 (cherry picked from commit a8c74dc188878f2948fa1e0e47bf1837fba6c5e0) Reviewed-on: http://git-master/r/1210957 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
*	gpu: nvgpu: allocate blob space early	Deepak Nibade	2016-09-01
	Allocating blob space for the PMU might need a fixed-address allocation in vidmem during boot. But if some page tables are allocated before the blob space, the blob space allocation could fail. Fix this by allocating blob space early during boot. Jira DNVGPU-20 Change-Id: I30eca1023c8f8f8be101bb7e160ba57a7040911a Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/1206402 (cherry picked from commit fad4309ce345ed3879f497bda27f2eceb1084dbb) Reviewed-on: http://git-master/r/1210956 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
*	gpu: nvgpu: Add proper timeout handling for vidmem clear operations	Lakshmanan M	2016-09-01
	The gk20a_fence_wait() API may be interrupted by a signal before its timeout has actually elapsed. This change adds a retry mechanism for the case where gk20a_fence_wait() returns -ERESTARTSYS before its timeout has elapsed. Bug 200230544 Change-Id: I347ed2004935a8b9413f95dcb6fca2b74bf49f2a Signed-off-by: Lakshmanan M <lm@nvidia.com> Reviewed-on: http://git-master/r/1206265 (cherry picked from commit d3ef533942487785d84d109f985ae648eb3c2434) Reviewed-on: http://git-master/r/1210955 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
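The retry behaviour described in this message can be sketched in user-space C as below. This is a minimal illustration under stated assumptions, not the driver's code: `wait_fn`, `wait_with_retry()`, `fake_wait()` and `SKETCH_ERESTARTSYS` are hypothetical names, and the elapsed-time reporting stands in for whatever clock the real wait uses.

```c
#include <assert.h>
#include <errno.h>

/* -ERESTARTSYS is kernel-internal, so this sketch defines its own stand-in. */
#define SKETCH_ERESTARTSYS 512

/* Hypothetical wait primitive: waits up to timeout_ms and reports how long
 * it actually waited. Returns 0, -SKETCH_ERESTARTSYS, or another error. */
typedef int (*wait_fn)(void *ctx, long timeout_ms, long *elapsed_ms);

/* Retry an interrupted wait until it completes or the total budget is spent. */
int wait_with_retry(wait_fn wait, void *ctx, long timeout_ms)
{
	long remaining = timeout_ms;

	assert(wait != 0);
	while (remaining > 0) {
		long elapsed = 0;
		int err = wait(ctx, remaining, &elapsed);

		if (err != -SKETCH_ERESTARTSYS)
			return err;	/* success, or an error that is not a signal */
		remaining -= elapsed;	/* charge the interrupted time to the budget */
	}
	return -ETIMEDOUT;
}

/* Test double: interrupted `interrupts` times (10 ms each), then succeeds. */
struct fake_fence {
	int interrupts;
};

int fake_wait(void *ctx, long timeout_ms, long *elapsed_ms)
{
	struct fake_fence *f = ctx;

	(void)timeout_ms;
	if (f->interrupts > 0) {
		f->interrupts--;
		*elapsed_ms = 10;
		return -SKETCH_ERESTARTSYS;
	}
	*elapsed_ms = 1;
	return 0;
}
```

The point of tracking `remaining` is that repeated signals cannot extend the wait beyond the original timeout.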
*	gpu: nvgpu: support pramin access for non-contiguous vidmem	Deepak Nibade	2016-09-01
	The pramin_access_batched() API currently supports only contiguous allocations. Modify this API to support non-contiguous allocations from the page allocator as well. Update gk20a_mem_wr32() and gk20a_mem_rd32() to reuse pramin_access_batched(). Use gk20a_memset() in gk20a_gmmu_free_attr_vid() to clear vidmem pages for kernel buffers. Jira DNVGPU-30 Change-Id: I43630912f4837d8ebc6b9c58f4f427218ef9725b Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/1204303 (cherry picked from commit 2f84f141d02fd2f641cb18a48896fb3ae5f7e51f) Reviewed-on: http://git-master/r/1210954 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
*	gpu: nvgpu: track allocator and user for each mem	Deepak Nibade	2016-09-01
	Store an allocator pointer in each mem_desc. This pointer should be used when freeing the mem instead of assuming a common allocator. Add a user_mem flag to mem_desc, which is set only for user vidmem allocations. The free of a mem is deferred to the worker only if this flag is set; otherwise the mem is freed immediately. This is needed so that all kernel allocations can work with both sysmem and vidmem. Jira DNVGPU-84 Change-Id: Ib9a9209b164bc56b7880448f86bd6d42b324cc86 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/1203099 (cherry picked from commit 8f0b0122f36a0b6f1932fa9a98d7eb03b1f623d1) Reviewed-on: http://git-master/r/1210953 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
*	gpu: nvgpu: clear vidmem buffers in worker	Deepak Nibade	2016-09-01
	We clear buffers allocated in vidmem in the buffer free path. But clearing buffers requires submitting CE jobs, which could cause issues or races if free is called from a critical path. Solve this by moving the buffer clear/free to a worker: gk20a_gmmu_free_attr_vid() now just puts the mem_desc into a list and schedules a worker, and the worker thread traverses the list and clears/frees the allocations. In struct gk20a_vidmem_buf, the mem variable is statically allocated. But since we delay the free of mem, convert this variable into a pointer and allocate it dynamically. Since we delay the free of vidmem memory, it is now possible to hit OOM conditions during allocation. Hence, while allocating, block until sufficient memory is available, with an upper limit of 1 s. Jira DNVGPU-84 Change-Id: I7925590644afae50b6fc04c6e1e43bbaa1c220fd Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/1201346 (cherry picked from commit b4dec4a30de2431369d677acca00e420f8e581a5) Reviewed-on: http://git-master/r/1210950 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
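The split between a cheap free path and a worker that does the actual clearing might look roughly like the sketch below. It is a simplified, single-threaded model: `pending_mem`, `clear_list`, `defer_free()` and `worker_drain()` are hypothetical names, the `cleared`/`freed` flags stand in for the CE clear and the allocator free, and the locking and worker scheduling a real driver needs are omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer descriptor waiting to be released. */
struct pending_mem {
	int cleared;			/* set once the contents have been cleared */
	int freed;			/* set once the memory has been released */
	struct pending_mem *next;
};

/* List of buffers queued by the free path for the worker. */
struct clear_list {
	struct pending_mem *head;
};

/* Free path: only queue the buffer; no clearing happens here. */
void defer_free(struct clear_list *l, struct pending_mem *m)
{
	assert(l != NULL && m != NULL);
	m->next = l->head;
	l->head = m;
}

/* Worker: drain the list, clearing and then releasing each buffer.
 * Returns the number of buffers processed. */
int worker_drain(struct clear_list *l)
{
	int n = 0;

	while (l->head != NULL) {
		struct pending_mem *m = l->head;

		l->head = m->next;
		m->cleared = 1;		/* placeholder for the CE clear job */
		m->freed = 1;		/* placeholder for returning memory */
		n++;
	}
	return n;
}
```

Because memory only returns to the allocator when the worker runs, an allocation can temporarily fail even though frees are pending, which is why the message adds a bounded wait on the allocation side.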
*	gpu: nvgpu: clear whole vidmem on first allocation	Deepak Nibade	2016-09-01
	We currently clear vidmem pages in gk20a_gmmu_alloc_attr_vid_at(), i.e. in the allocation path of each buffer. But since the buffer allocation path can be latency critical, clear the whole vidmem once, before the first user allocation in gk20a_vidmem_buf_alloc(), and then clear buffer pages when releasing the buffer. This ensures that vidmem pages are already cleared by the time a buffer is allocated. At a later stage, clearing of pages can be removed from the free path and moved to a separate worker as well. At this point, the first allocation has the overhead of clearing the whole vidmem, which takes about 380 ms; this should improve once clocks are raised. Also, this is a one-time latency, and subsequent allocations should have no clearing overhead at all. Add API gk20a_vidmem_clear_all() to clear the whole vidmem. We have WPR buffers allocated during boot at fixed addresses in vidmem. To avoid overwriting these buffers in gk20a_vidmem_clear_all(), clear the whole vidmem except for the bootstrap allocator carveout. Add new API gk20a_gmmu_clear_vidmem_mem() to clear one mem_desc. Jira DNVGPU-84 Change-Id: I5661700585c6241a6a1ddeb5b7c068d3d2aed4b3 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/1194301 (cherry picked from commit 950ab61a04290ea405968d8b0d03e3bd044ce83d) Reviewed-on: http://git-master/r/1193158 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
*	gpu: nvgpu: Add a bootstrap vidmem allocator	Alex Waterman	2016-09-01
	Add an allocator for allocating vidmem before the CE has had a chance to be initialized (and clear the rest of vidmem). Jira DNVGPU-84 Change-Id: I5166607a712b3a6eb4c2906b8c7d002c68a6567b Signed-off-by: Alex Waterman <alexw@nvidia.com> Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/1197204 (cherry picked from commit b4e68e84eedd952637b2332d8dc73a9090d6d62e) Reviewed-on: http://git-master/r/1210949 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
*	gpu: nvgpu: support GMMU mappings for vidmem page allocator	Deepak Nibade	2016-09-01
	Switch to the page allocator for vidmem. Support GMMU mappings for the page (non-contiguous) allocator in update_gmmu_ptes_locked(). If the aperture is VIDMEM, traverse each chunk in an allocation and map it to GPU VA separately. Fix CE page clearing to support the page allocator. Fix gk20a_pramin_enter() to get the base address from the new allocator. Define API gk20a_mem_get_vidmem_addr() to get the base address of an allocation. Note that this API should not be used if the allocation has more than one chunk. Jira DNVGPU-96 Change-Id: I725422f3538aeb477ca4220ba57ef8b3c53db703 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: http://git-master/r/1199177 (cherry picked from commit 1afae6ee6529ab88cedd5bcbe458fbdc0d4b1fd8) Reviewed-on: http://git-master/r/1197647 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
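The per-chunk traversal mentioned in this message can be illustrated with a small sketch. All names here (`struct chunk`, `map_fn`, `map_chunks()`, `count_mapping()`) are hypothetical; the callback stands in for the page table update, and the real driver's data structures are not shown in this log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical chunk descriptor: one physically contiguous piece of vidmem. */
struct chunk {
	uint64_t base;
	uint64_t length;
	struct chunk *next;
};

/* Hypothetical per-chunk hook, standing in for the PTE update. */
typedef void (*map_fn)(void *ctx, uint64_t gpu_va, uint64_t phys, uint64_t len);

/* Map each chunk at the next free GPU VA, so scattered physical chunks end up
 * back to back in the GPU virtual address space.
 * Returns the GPU VA just past the last mapped chunk. */
uint64_t map_chunks(const struct chunk *c, uint64_t gpu_va, map_fn map, void *ctx)
{
	assert(map != NULL);
	for (; c != NULL; c = c->next) {
		map(ctx, gpu_va, c->base, c->length);
		gpu_va += c->length;
	}
	return gpu_va;
}

/* Example hook: only counts how many mappings were requested. */
void count_mapping(void *ctx, uint64_t gpu_va, uint64_t phys, uint64_t len)
{
	(void)gpu_va;
	(void)phys;
	(void)len;
	(*(int *)ctx)++;
}
```

This also shows why a single "base address" is only meaningful for one-chunk allocations: with two or more chunks there is no one physical address that describes the buffer.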
*	gpu: nvgpu: fix build error when CONFIG_DEBUG_FS=n	David Pu	2016-08-31
	Add an 'ifdef CONFIG_DEBUG_FS' check to fix the following compilation error when CONFIG_DEBUG_FS=n (which is used for Android 'production' builds): mm_gk20a.c: In function 'gk20a_mm_debugfs_init': mm_gk20a.c:4824:2: error: implicit declaration of function 'debugfs_create_x64' [-Werror=implicit-function-declaration] Bug 1778001 Change-Id: I785288a37b96c391b84925d5971d2691cf80206e Signed-off-by: David Pu <dpu@nvidia.com> Reviewed-on: http://git-master/r/1210393 Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: Add gpu_dbg_map_v message type	Alex Waterman	2016-08-30
	Add a new debug message type: gpu_dbg_map_v. This is used for mapping messages that are not specifically memory map operations. Also clean up the memory mapping debugging a bit, since there was one duplicate print and the memory map print was difficult to parse visually. As a result, the message has been modified to put the most important information first in an easily readable format. Bug 1732449 JIRA DNVGPU-12 Change-Id: Ib19c9371ee958009ab5a2d89b9610e699d070ee2 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1198593 (cherry picked from commit 51dba53b06ca171cdb13d1707f2d026b0ce29f07) Reviewed-on: http://git-master/r/1147670 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
*	gpu: nvgpu: Implement a vidmem allocator	Alex Waterman	2016-08-30
	Implement an allocator suitable for managing the video memory on dGPUs. It works by allocating chunks from an underlying buddy allocator and collating the chunks together (similar to what an sgt does in the wider Linux kernel). This makes it possible to get large buffers in potentially fragmented memory. The GMMU can then map the physical vidmem into contiguous GVA spaces. Jira DNVGPU-96 Change-Id: Ic1d7800b033a170b77790aa23fad6858443d0e89 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1197203 (cherry picked from commit fa44684a843956ae384fef6d7a79b9cbbd04f73e) Reviewed-on: http://git-master/r/1185231 GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: Exclude first page from vidmem size	Terje Bergstrom	2016-08-10
	We initialized the vidmem allocator with base=4K and a size of 4GB. This caused the allocator to hand out addresses between 4K and 4GB+4K, causing a physical MMU fault. Bug 1793810 Change-Id: I554f62aeee4080acd86ef2c8011089ec9b8120df Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com> Reviewed-on: http://git-master/r/1196300 (cherry picked from commit 41a860e21c6da3f8fda58ceb56e78316f6987f53) Reviewed-on: http://git-master/r/1200712 Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit
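The arithmetic behind this off-by-one-page bug can be checked with a few lines of C. The helper names `range_end()` and `usable_size()` and the `SZ_*` macros are illustrative only; the 4K base and 4GB size are the figures quoted in the message, and subtracting the base from the size is an assumption based on the commit title.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_4K 0x1000ULL
#define SZ_4G 0x100000000ULL

/* Exclusive end address of an allocator range [base, base + size). */
uint64_t range_end(uint64_t base, uint64_t size)
{
	return base + size;
}

/* Size to hand to the allocator so that the range stays inside vidmem:
 * the skipped first page must come out of the size as well. */
uint64_t usable_size(uint64_t vidmem_size, uint64_t base)
{
	assert(base <= vidmem_size);
	return vidmem_size - base;
}
```

With base=4K and size=4GB the range ends at 4GB+4K, one page past the end of memory; with the size reduced by the base it ends exactly at 4GB.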
*	gpu: nvgpu: add check for is_fmodel	Seema Khowala	2016-07-27
	The is_fmodel flag is set in gk20a_probe(). Update the code to check is_fmodel instead of checking for supported simulated platforms. Bug 1735760 Change-Id: I7cbac2196130fe5ce4c1a910504879e6948c13da Signed-off-by: Seema Khowala <seemaj@nvidia.com> Reviewed-on: http://git-master/r/1177869 Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com> Tested-by: Seshendra Gadagottu <sgadagottu@nvidia.com> Reviewed-by: Adeel Raza <araza@nvidia.com> Reviewed-by: Automatic_Commit_Validation_User
*	gpu: nvgpu: add aperture and size to map logging	Konsta Holtta	2016-07-22
	Include the buffer aperture flag (sysmem/vidmem/invalid) and the sizes of the buffer and of the mapping in logging strings in the gmmu map path. Change-Id: Ie4c46bf9cb5db79b738571029d46ce8cbfc63f99 Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1189492 GVS: Gerrit_Virtual_Submit Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: Yu-Huan Hsu <yhsu@nvidia.com>
*	gpu: nvgpu: support userspace vidmem mappings	Konsta Holtta	2016-07-21
	When mapping a userspace buffer, determine if it was vidmem allocated from the aperture of the current gpu, and pass that information into page tables. Mapping a vidmem buffer to a gpu it wasn't allocated from is disallowed. This includes mapping vidmem to igpus and possibly to other dgpus on the system. Jira DNVGPU-19 Change-Id: Ia9d2d0133e77659ab96b36ed61eeb4cd5a2b7dff Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1169309 Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
*	gpu: nvgpu: add vidmem allocation ioctl	Konsta Holtta	2016-07-21
	Add NVGPU_GPU_IOCTL_ALLOC_VIDMEM to the ctrl fd for letting userspace allocate on-board GPU memory (aka vidmem). The allocations are returned as dmabuf fds. Also, report the amount of local video memory in the gpu characteristics. Jira DNVGPU-19 Jira DNVGPU-38 Change-Id: I28e361d31bb630b96d06bb1c86d022d91c7592bc Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1181152 GVS: Gerrit_Virtual_Submit Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
*	gpu: nvgpu: add vidmem manager	Konsta Holtta	2016-07-21
	Use the nvgpu-internal buddy allocator for video memory allocations, instead of nvmap. This allows better integration for copyengine, BAR1 mapping to userspace, etc. Jira DNVGPU-38 Change-Id: I9fd67b76cd39721e4cd8e525ad0ed76f497e8b99 Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1181151 Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
*	gpu: nvgpu: Add nvgpu infra to allow kernel to create privileged CE channels	Lakshmanan M	2016-07-20
	Add an interface that allows the kernel to create privileged CE channels for page migration and clearing support between sysmem and vidmem. JIRA DNVGPU-53 Change-Id: I3e18d18403809c9e64fa45d40b6c4e3844992506 Signed-off-by: Lakshmanan M <lm@nvidia.com> Reviewed-on: http://git-master/r/1173085 GVS: Gerrit_Virtual_Submit Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
*	gpu: nvgpu: Change the allocator flag naming scheme	Alex Waterman	2016-07-19
	Move to a more generic name of GPU_ALLOC_*. Change-Id: Icbbd366847a9d74f83f578e4d9ea917a6e8ea3e2 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1176445 Reviewed-by: Yu-Huan Hsu <yhsu@nvidia.com>
*	gpu: nvgpu: Support multiple types of allocators	Alex Waterman	2016-07-19
	Support multiple types of allocation backends. Currently there is only one allocator implementation available: a buddy allocator. Buddy allocators have certain limitations, though. For one, the allocator requires metadata to be allocated from the kernel's system memory. This causes a given buddy allocation to potentially sleep on a kmalloc() call. This patch has been created so that a new backend can be added that avoids calling any dynamic system memory management routines. Bug 1781897 Change-Id: I98d6c8402c049942f13fee69c6901a166f177f65 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1172115 GVS: Gerrit_Virtual_Submit Reviewed-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-by: Yu-Huan Hsu <yhsu@nvidia.com>
*	gpu: nvgpu: fix gk20a_mm_smmu_vaddr_translate()	Richard Zhao	2016-07-18
	Remove the check of has_physical_mode; check whether get_physical_addr_bits is null. JIRA VFND-1965 Change-Id: If19b297dc853b9e0b5879c5b2e0a350b5d9b279a Signed-off-by: Richard Zhao <rizhao@nvidia.com> Reviewed-on: http://git-master/r/1175738 Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: Thomas Fleury <tfleury@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: Vladislav Buzov <vbuzov@nvidia.com>
*	gpu: nvgpu: handle map/unmap for vidmem gmmu pages	Konsta Holtta	2016-07-14
	If page tables are allocated from vidmem, cpu cache flushing doesn't make sense, so skip it. Also unify the map/unmap actions if the pages are not mapped. Jira DNVGPU-20 Change-Id: I36b22749aab99a7bae26c869075f8073eab0f860 Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1178830 Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
*	gpu: nvgpu: zero vidmem pages on allocation	Konsta Holtta	2016-07-12
	The allocator doesn't give us empty pages, so make sure that they're full of zeros, just like the sysmem alloc path does. Jira DNVGPU-16 Change-Id: I0ff8a0718829b13973535ba1111a8a11b91be04d Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1178829 Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
*	gpu: nvgpu: use vidmem by default in gmmu_alloc variants	Konsta Holtta	2016-07-08
	For devices that have vidmem available, use the vidmem allocator in gk20a_gmmu_alloc{,attr,_map,_map_attr}. For others, use sysmem. Because not all of the buffers have been tested to work in vidmem yet, rename calls to gk20a_gmmu_alloc{,attr,_map,_map_attr} to have _sys at the end to declare explicitly that sysmem is used. Enabling vidmem for each buffer is now a matter of removing "_sys" from the function call. Jira DNVGPU-18 Change-Id: Ibe42f67eff2c2b68c36582e978ace419dc815dc5 Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1176805 GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: Make gk20a_init_sema_pool() static	Alex Waterman	2016-07-06
	This function is only used in mm_gk20a.c and as a result should be static (fixes a sparse issue). Bug 200088648 Change-Id: I6787b4ebc5925a503d8ef2fed90c3d7cd5027589 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1176309 Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: Richard Zhao <rizhao@nvidia.com>
*	gpu: nvgpu: support in-kernel vidmem mappings	Konsta Holtta	2016-07-06
	Propagate the buffer aperture flag in gk20a_locked_gmmu_map up so that buffers represented as a mem_desc and present in vidmem can be mapped to gpu. JIRA DNVGPU-18 JIRA DNVGPU-76 Change-Id: I46cf87e27229123016727339b9349d5e2c835b3e Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1169308 GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: ngpu: add support for vidmem in page tables	Konsta Holtta	2016-07-05
	Modify page table updates to take an aperture flag (up to gk20a_locked_gmmu_map()) instead of hard-assuming sysmem, and propagate it to hardware. Jira DNVGPU-76 Change-Id: Ifcb22900c96db993068edd110e09368f72b06f69 Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1169307 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: initial support for vidmem apertures	Konsta Holtta	2016-07-05
	Add gk20a_aperture_mask() for memory target selection now that buffers can actually be allocated from vidmem, and use it in all cases that have a mem_desc available. Jira DNVGPU-76 Change-Id: I4353cdc6e1e79488f0875581cfaf2a5cfb8c976a Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1169306 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: Use 64-bit math for cacheline offset	Terje Bergstrom	2016-07-04
	Cast cacheline_start to u64 to use 64-bit maths for cacheline offset. Coverity ID 24250 Change-Id: Ic10c92ebb737bd39486a83e4de53cc1191193667 Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com> Reviewed-on: http://git-master/r/1172053
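The class of bug this kind of cast addresses can be shown with a short sketch. It assumes, hypothetically, that the offset is the product of a 32-bit `cacheline_start` and a 32-bit `cacheline_size`; the message only names `cacheline_start`, so the second operand and both function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* The multiplication is evaluated in 32 bits and wraps before the result
 * is widened to 64 bits for the return value. */
uint64_t offset_32bit(uint32_t cacheline_start, uint32_t cacheline_size)
{
	return cacheline_start * cacheline_size;
}

/* Casting one operand first makes the whole multiplication 64-bit. */
uint64_t offset_64bit(uint32_t cacheline_start, uint32_t cacheline_size)
{
	assert(cacheline_size != 0);
	return (uint64_t)cacheline_start * cacheline_size;
}
```

For inputs whose product fits in 32 bits the two functions agree, which is why such bugs tend to be found by static analysis rather than by testing.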
*	gpu: nvgpu: Revamp semaphore support	Alex Waterman	2016-06-28
	Revamp the support the nvgpu driver has for semaphores. The original problem with nvgpu's semaphore support is that it required a SW based wait for every semaphore release. This was because for every fence that gk20a_channel_semaphore_wait_fd() waited on, a new semaphore was created. This semaphore would then get released by SW when the fence signaled. This meant that for every release there was necessarily a sync_fence_wait_async() call which could block. The latency of this SW wait was enough to cause massive degradation in performance. To fix this, a fast path was implemented. When a fence that is backed by a GPU semaphore is passed to gk20a_channel_semaphore_wait_fd(), a semaphore acquire is used directly to block the GPU. A sync_fence_wait_async() is no longer performed, nor is an extra semaphore created. To implement this fast path, the semaphore memory had to be shared between channels. Previously, since a new semaphore was created every time through gk20a_channel_semaphore_wait_fd(), the address space a semaphore was mapped into was irrelevant. However, when using the fast path, a semaphore may be released in one address space but acquired in another. Sharing the semaphore memory was done by making a fixed GPU mapping in all channels. This mapping points to the semaphore memory (the so-called semaphore sea). This global fixed mapping is read-only to make sure no semaphores can be incremented (i.e. released) by a malicious channel. Each channel then gets a RW mapping of its own semaphore. This way a channel may only acquire other channels' semaphores but may both acquire and release its own semaphore. The gk20a fence code was updated to allow introspection of the GPU backed fences. This allows detection of when the fast path can be taken. If the fast path cannot be used (for example when a fence is sync-pt backed), the original slow path is still present. This gets used when the GPU needs to wait on an event from something which only understands how to use sync-pts. Bug 1732449 JIRA DNVGPU-12 Change-Id: Ic0fea74994da5819a771deac726bb0d47a33c2de Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1133792 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: add vidmem allocation API	Konsta Holtta	2016-06-15
	Add in-nvgpu APIs for allocating and freeing mem_descs in video memory. Changes for gmmu tables etc. will be added in upcoming changes. Video memory is allocated via nvmap by initially registering the aperture size to it and binding it to a struct device, and then going via the usual dma alloc. This API also allows fixed-address allocations, meant for reserving special memory areas at boot. The aperture registration is skipped completely if vidmem isn't found for the particular device. gk20a_gmmu_alloc_attr() still uses sysmem, and the unmap/free paths internally select the correct path by the mem_desc's aperture. Video memory allocation is off by default, and can be turned on with CONFIG_GK20A_VIDMEM. JIRA DNVGPU-16 Change-Id: I77eae5ea90cbed6f4b5db0da86c5f70ddf2a34f9 Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1157216 GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: optimize mem_desc accessor loops	Konsta Holtta	2016-06-13
	Instead of going via gk20a_mem_{wr,rd}32() on each iteration, do direct memcpy/memset with sysmem, and minimize the enter/exit overhead with vidmem. JIRA DNVGPU-23 Change-Id: I5437e35f8393a746777a40636c1e9b5d93ced1f6 Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1159524 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: detect vidmem configuration from HW	Konsta Holtta	2016-06-08
	Read video memory size from hardware during initialization for devices that support it. JIRA DNVGPU-14 Change-Id: If190f2d89f7148520ee274ca674f972987c8056d Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1157215 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: cache whole bar0_window for mem accesses	Konsta Holtta	2016-06-07
	Save the whole bar0 window register, which also encodes the target aperture (vid/sys mem), instead of only the base address, which could overlap between the two. JIRA DNVGPU-23 Change-Id: I2ccbea0e1f7c7310c1ca6b158afafe8fd974a615 Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1159523 GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: add PRAMIN support for mem accessors	Konsta Holtta	2016-05-24
	To support vidmem, implement a way to access buffers via the PRAMIN window, instead of only via kernel-mapped sysmem buffers as is done for iGPU today. Depending on the buffer aperture, choose between the two access types in the buffer memory accessor functions. vmap()/vunmap() pairs are no-ops for buffers that can't be cpu-mapped. Two uses of DMA_ATTR_READ_ONLY are removed in the ucode loading path so that those buffers can also be written via the indirection, in addition to the cpu. JIRA DNVGPU-23 Change-Id: I282dba6741c6b8224bc12e69c1fb3936bde7e6ed Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1141314 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
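Choosing the access path by aperture, as this message describes, can be modelled in a few lines. Everything here is a simplified stand-in: `struct mem_sketch`, `window_rd32()` and `mem_rd32()` are hypothetical, and the `vidmem` array plays the role of device memory reached through an indirect window rather than any real register interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum aperture { APERTURE_SYSMEM, APERTURE_VIDMEM };

/* Hypothetical, simplified buffer descriptor. */
struct mem_sketch {
	enum aperture aperture;
	uint32_t *cpu_va;	/* valid for sysmem buffers only */
	uint64_t vidmem_base;	/* byte address, valid for vidmem buffers only */
};

/* Stand-in for an indirect window read; `vidmem` models device memory. */
uint32_t window_rd32(const uint32_t *vidmem, uint64_t byte_addr)
{
	return vidmem[byte_addr / sizeof(uint32_t)];
}

/* Read the w-th 32-bit word of a buffer, choosing the path by aperture. */
uint32_t mem_rd32(const uint32_t *vidmem, const struct mem_sketch *mem, uint32_t w)
{
	assert(mem != NULL);
	if (mem->aperture == APERTURE_SYSMEM)
		return mem->cpu_va[w];	/* direct access through the CPU mapping */
	return window_rd32(vidmem, mem->vidmem_base + (uint64_t)w * sizeof(uint32_t));
}
```

Callers only pass the descriptor and a word index, so moving a buffer between apertures does not change the call sites.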
*	Revert "gpu: nvgpu: Enable FB before initializing L2"	Terje Bergstrom	2016-05-13
	This reverts commit df05d2a7c214bc8cdb887f1609853d0f424ef6f1. It causes intermittent failures on laguna_t124. Bug 1766083 Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com> Change-Id: Idabcf96d2eaddc989b029c429cec213bcabbf28c Reviewed-on: http://git-master/r/1147683 Reviewed-by: Alexander Van Brunt <avanbrunt@nvidia.com> Tested-by: Alexander Van Brunt <avanbrunt@nvidia.com>
*	gpu: nvgpu: refactor gk20a_mem_{wr,rd} for vidmem	Konsta Holtta	2016-05-13
	To support vidmem, pass g and mem_desc to the buffer memory accessor functions. This allows the functions to select the memory access method based on the buffer aperture instead of using the cpu pointer directly (as has been done until now). The selection and aperture support will be in another patch; this patch only refactors these accessors, but keeps the underlying functionality as-is. gk20a_mem_{rd,wr}32() work as previously; also add gk20a_mem_{rd,wr}() for byte-indexed accesses, gk20a_mem_{rd,wr}_n() for memcpy()-like functionality, and gk20a_memset() for filling buffers with a constant. The 8 and 16 bit accessor functions are removed. vmap()/vunmap() pairs are abstracted to gk20a_mem_{begin,end}() to support other types of mappings or conditions where mapping the buffer is unnecessary or different. Several function arguments that would access these buffers are also changed to take a mem_desc instead of a plain cpu pointer. Some relevant call sites are changed to use the accessor functions instead of bare cpu pointers (e.g., memcpying to and from), but the majority of direct accesses will be adjusted later, when the buffers are moved to support vidmem. JIRA DNVGPU-23 Change-Id: I3dd22e14290c4ab742d42e2dd327ebeb5cd3f25a Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1121143 Reviewed-by: Ken Adams <kadams@nvidia.com> Tested-by: Ken Adams <kadams@nvidia.com>
*	gpu: nvgpu: Enable FB before initializing L2	Terje Bergstrom	2016-05-09
	Deassert reset in L2 and FB before initializing L2. In gk20a, L2 can be off, and writing its registers in that state results in a priv ring failure. Change-Id: I680b8b1e77cf67a8269c6de59a15d9817301300e Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com> Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1140482 (cherry picked from commit d85edcf4170d7bc59d2c080f4343bc2f959be023) Reviewed-on: http://git-master/r/1143684 Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit
*	gpu: nvgpu: don't alloc gmmu bufs manually	Konsta Holtta	2016-05-03
	Use gk20a_gmmu_{alloc,free}() instead of duplicating their code for page tables. The linsim-specific special case is kept as-is. JIRA DNVGPU-23 JIRA DNVGPU-20 Change-Id: I66d772337bad5d081256b13877c4e713ea8b634a Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1139695 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: adapt gk20a_mm_entry for mem_desc	Konsta Holtta	2016-05-03
	For the upcoming vidmem refactor, replace the contents of struct gk20a_mm_entry that are identical to struct mem_desc with a struct mem_desc member. This makes it possible to use the page table buffers like the others too. JIRA DNVGPU-23 JIRA DNVGPU-20 Change-Id: I714ee5dcb33c27aaee932e8e3ac367e84610b102 Signed-off-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-on: http://git-master/r/1139694 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: Do not generate any ctag info unless enabled	Alex Waterman	2016-04-29
	Do not put any ctag data in the PTEs unless compression is actually enabled for the mapping. Bug 1732449 JIRA DNVGPU-12 Change-Id: I2abfbf9d1282af24541f8199bd9fbf2133c12899 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1133790 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com> Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: Add gk20a_gmmu_fixed_map() function	Alex Waterman	2016-04-29
	Add a function to allow the kernel to do fixed mappings. Necessary for the semaphore functionality since there needs to be a common address in each VM for the semaphores. Bug 1732449 JIRA DNVGPU-12 Change-Id: I2b451db2d3cb3c003d951f7b0ffc87f6c91db7dc Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1133789 Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: Program NISO sysmem flush addr	Terje Bergstrom	2016-04-25
	Program sysmem flush address to prevent random accesses of address 0. Change-Id: I886170395f036805f02e0bce7ecd3c8c46b921df Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com> Reviewed-on: http://git-master/r/1129216 GVS: Gerrit_Virtual_Submit Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com> Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
*	gpu: nvgpu: Wait for BAR1 bind	Terje Bergstrom	2016-04-15
	Wait for BAR1 bind to complete before continuing. The register to wait on exists from Maxwell onwards. Change-Id: Ie3736033fdb748c5da8d7a6085ad6d63acaf41f5 Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com> Reviewed-on: http://git-master/r/1123941
*	gpu: nvgpu: Use sysmem aperture for SoC memory	Terje Bergstrom	2016-04-15
	On Tegra GPUs, SoC memory has to be accessed as vidmem. On discrete GPUs, it has to be accessed as sysmem. Change-Id: I4efe71b54a9a32f0bf1f02ec4016ed74405a14c5 Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com> Reviewed-on: http://git-master/r/1120468
*	gpu: nvgpu: Support GPUs with no physical mode	Terje Bergstrom	2016-04-13
	Support GPUs which cannot choose between SMMU and physical addressing. Change-Id: If3256fa1bc795a84d039ad3aa63ebdccf5cc0afb Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com> Reviewed-on: http://git-master/r/1120469 GVS: Gerrit_Virtual_Submit Reviewed-by: Alex Waterman <alexw@nvidia.com>
*	gpu: nvgpu: Use device instead of platform_device	Terje Bergstrom	2016-04-08
	Use struct device instead of struct platform_device wherever possible. This allows adding other bus types later. Change-Id: I1657287a68d85a542cdbdd8a00d1902c3d6e00ed Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com> Reviewed-on: http://git-master/r/1120466
*	gpu: nvgpu: split address space for fixed allocs	Alex Waterman	2016-03-25
	Allow a special address space node to be split out from the user address space for fixed allocations. A debugfs node, /d/<gpu>/separate_fixed_allocs, controls this feature. To enable it: # echo <SPLIT_ADDR> > /d/<gpu>/separate_fixed_allocs Where <SPLIT_ADDR> is the address to do the split on in the GVA address range. This will cause the split to be made in all subsequent address space ranges that get created until it is turned off. To turn this off, just echo 0x0 into the same debugfs node. Change-Id: I21a3f051c635a90a6bfa8deae53a54db400876f9 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: http://git-master/r/1030303 Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
*	gpu: nvgpu: Clear comptags for whole buffer	Terje Bergstrom	2016-03-22
	Clear comptags for whole buffer when nvgpu sees the buffer for the first time. Change-Id: I67108ce0f0def46ddda1aa9b9bb5ea22549cce13 Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com> Reviewed-on: http://git-master/r/1013517 (cherry picked from commit 544446aacdc695dc2e27c42a0086292cd69c2eee) Reviewed-on: http://git-master/r/1031009 GVS: Gerrit_Virtual_Submit