path: root/drivers/gpu
* gpu: nvgpu: move gpfifo submit wait to userspace (Aingara Paramakuru, 2016-09-15)

  Instead of blocking for gpfifo space in the nvgpu driver, return -EAGAIN
  and allow userspace to decide the blocking policy.

  Bug 1795076

  Change-Id: Ie091caa92aad3f68bc01a3456ad948e76883bc50
  Signed-off-by: Aingara Paramakuru <aparamakuru@nvidia.com>
  Reviewed-on: http://git-master/r/1202591
  (cherry picked from commit 8056f422c6a34a4239fc4993c40c2e517c932714)
  Reviewed-on: http://git-master/r/1203800
  Reviewed-by: Automatic_Commit_Validation_User
  GVS: Gerrit_Virtual_Submit
  Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>

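  The pattern described above, as a minimal sketch; the ring structure and
  helper names are hypothetical stand-ins, not the actual nvgpu gpfifo code:

      #include <linux/errno.h>

      /* Hypothetical ring; num_entries is assumed to be a power of two. */
      struct gpfifo_ring {
          unsigned int get;
          unsigned int put;
          unsigned int num_entries;
      };

      static unsigned int ring_space(const struct gpfifo_ring *r)
      {
          return (r->get - r->put - 1) & (r->num_entries - 1);
      }

      /*
       * Old behaviour: sleep in the kernel until space frees up.
       * New behaviour: report -EAGAIN and let userspace decide when to retry.
       */
      static int submit_gpfifo(struct gpfifo_ring *r, unsigned int needed)
      {
          if (ring_space(r) < needed)
              return -EAGAIN;
          /* ... copy entries and advance r->put ... */
          return 0;
      }
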
* gpu: nvgpu: fix null access in page table allocation (Konsta Holtta, 2016-09-14)

  Check entry->mem.sgt for validity before attempting to dereference it in
  a debug print.

  Bug 1809939

  Change-Id: If7aa7444c162a076d8f23a88dfd2e3e0a9c33813
  Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
  Reviewed-on: http://git-master/r/1215522
  (cherry picked from commit 48c25cd4f1db9d5bb07847af4de29d8f369b52e3)
  Reviewed-on: http://git-master/r/1220547
  Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
  Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>

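  A minimal sketch of this kind of guard; only the entry->mem.sgt field name
  comes from the commit, the pt_entry type and the print text are hypothetical:

      #include <linux/printk.h>
      #include <linux/scatterlist.h>

      /* Hypothetical stand-in for the page-table entry type. */
      struct pt_entry {
          struct {
              struct sg_table *sgt;
          } mem;
      };

      static void debug_print_entry(const struct pt_entry *entry)
      {
          /* The fix: only dereference the sg table if it is actually there. */
          if (entry->mem.sgt)
              pr_debug("entry pa 0x%llx\n",
                       (unsigned long long)sg_phys(entry->mem.sgt->sgl));
          else
              pr_debug("entry: sg table not yet allocated\n");
      }
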
* gpu: nvgpu: fix chunk size mismatch in page allocator (Konsta Holtta, 2016-09-14)

  When allocating discontiguous memory composed of several chunks, also
  update the number of pages used by the current chunk if a large chunk was
  not available and a retry is performed with a smaller one. Failing to do
  this would result in too few chunks being reserved for a large enough
  allocation under certain conditions.

  Bug 1805067

  Change-Id: I9d14864724d228b42c47eb4669fbe0f789334397
  Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
  Reviewed-on: http://git-master/r/1214914
  (cherry picked from commit 9bece931b13e4dad808622462d4d98d421cfb383)
  Reviewed-on: http://git-master/r/1220546
  GVS: Gerrit_Virtual_Submit
  Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
  Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>

* gpu: nvgpu: test free user vidmem atomically (Konsta Holtta, 2016-09-14)

  An empty list of soon-to-be-freed userspace vidmem buffers is not enough
  to safely assume that an allocation may succeed or not if tried again,
  because removal from the list and actually marking the memory freed is
  not atomic.

  Fix this by using an atomic counter for the number of pending frees (so
  that it's still safe to first remove from the job list and then perform
  the free), and making allocation attempts combined with a test of pending
  frees atomic.

  This still does not guarantee that there is memory available (as the
  actual amount of pending memory in bytes plus the current free amount
  isn't computed), but removes the race that produces false negatives in
  case a single program expects repeated frees and allocs to succeed.

  Bug 1809939

  Change-Id: I6a92da2e21cbf3f886b727000c924d56f35ce55b
  Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
  Reviewed-on: http://git-master/r/1217078
  (cherry picked from commit 83c1f1e70dccd92fdd4481132cf5b6717760d432)
  Reviewed-on: http://git-master/r/1220545
  Reviewed-by: Automatic_Commit_Validation_User
  GVS: Gerrit_Virtual_Submit
  Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
  Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>

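  A rough sketch of the counting scheme, with hypothetical allocator helpers
  (vid_alloc/vid_free); the real change also makes the allocation attempt and
  the pending-free test one atomic step, which is omitted here for brevity:

      #include <linux/atomic.h>
      #include <linux/errno.h>
      #include <linux/types.h>

      struct vid_allocator;                           /* hypothetical type */
      u64 vid_alloc(struct vid_allocator *a, u64 size);
      void vid_free(struct vid_allocator *a, u64 addr, u64 size);

      static atomic_t clears_pending = ATOMIC_INIT(0);

      /*
       * If the allocation fails while frees are still pending, ask the caller
       * to retry (-EAGAIN); only report -ENOMEM when nothing is in flight.
       */
      static int try_vidmem_alloc(struct vid_allocator *a, u64 size, u64 *addr)
      {
          *addr = vid_alloc(a, size);
          if (*addr)
              return 0;
          return atomic_read(&clears_pending) ? -EAGAIN : -ENOMEM;
      }

      /*
       * Bump the counter before the buffer is unlinked from the job list and
       * drop it only once the memory is really back in the allocator.
       */
      static void vidmem_buf_free(struct vid_allocator *a, u64 addr, u64 size)
      {
          atomic_inc(&clears_pending);
          /* ... clear the buffer and return it to the allocator ... */
          vid_free(a, addr, size);
          atomic_dec(&clears_pending);
      }
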
* gpu: nvgpu: vgpu: NULL out unused css entries (Peter Daifuku, 2016-09-13)

  Fix the cyclestats snapshots HAL entries in the vgpu case: the entries
  that don't apply need to be NULLed out.

  Bug 1700143
  JIRA EVLR-278

  Change-Id: I1b5f4652d1bf3283d96fdb3c2f66c4f69a9f6acc
  Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com>
  Reviewed-on: http://git-master/r/1217507
  Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
  Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
  Reviewed-by: Automatic_Commit_Validation_User

* gpu: nvgpu: use spinlock for ch timeout lock (Aingara Paramakuru, 2016-09-13)

  The channel timeout lock guards a very small critical section. Use a
  spinlock instead of a mutex for performance.

  Bug 1795076

  Change-Id: I94940f3fbe84ed539bcf1bc76ca6ae7a0ef2fe13
  Signed-off-by: Aingara Paramakuru <aparamakuru@nvidia.com>
  Reviewed-on: http://git-master/r/1200803
  (cherry picked from commit 4fa9e973da141067be145d9eba2ea74e96869dcd)
  Reviewed-on: http://git-master/r/1203799
  Reviewed-by: Automatic_Commit_Validation_User
  GVS: Gerrit_Virtual_Submit
  Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>

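  A minimal sketch of the locking change, with a hypothetical timeout
  structure (the real fields live in nvgpu's channel code); it assumes
  spin_lock_init() was called when the channel was set up:

      #include <linux/spinlock.h>
      #include <linux/types.h>

      struct ch_timeout {
          spinlock_t lock;        /* was a struct mutex before this change */
          bool initialized;
          u32 gp_get;
      };

      static void ch_timeout_start(struct ch_timeout *t, u32 gp_get)
      {
          /*
           * The critical section is a handful of stores, so a spinlock is
           * cheaper than sleeping on a mutex in the submit fast path.
           */
          spin_lock(&t->lock);
          t->initialized = true;
          t->gp_get = gp_get;
          spin_unlock(&t->lock);
      }
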
* gpu: nvgpu: change the usage of tegra_fuse_readl (Shardar Shariff Md, 2016-09-13)

  The tegra_fuse_readl() prototype has changed to match the upstreamed fuse
  driver, so change the implementation accordingly.

  Bug 200233653

  Change-Id: I01f23cfafd5923d86ac48e67b36132ce690e962b
  Signed-off-by: Shardar Shariff Md <smohammed@nvidia.com>
  Reviewed-on: http://git-master/r/1217374
  Reviewed-by: Automatic_Commit_Validation_User
  GVS: Gerrit_Virtual_Submit
  Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
  Reviewed-by: Bharat Nihalani <bnihalani@nvidia.com>

* gpu: nvgpu: fix pmu_copy_to_dmem spew (David Nieto, 2016-09-12)

  The error check was not taking into account the DMEM address wrap-around.

  JIRA DNVGPU-34

  Change-Id: Ibfed5532c3ee785b3061e6837f012939118a7ece
  Signed-off-by: David Nieto <dmartineznie@nvidia.com>
  Reviewed-on: http://git-master/r/1206460
  (cherry picked from commit 080953c20f91068ccaaa564d9492a1582ffa28fe)
  Reviewed-on: http://git-master/r/1218297
  Reviewed-by: Automatic_Commit_Validation_User
  GVS: Gerrit_Virtual_Submit
  Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
  Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>

* gpu: nvgpu: Call init_cbc only when defined (Terje Bergstrom, 2016-09-12)

  Call init_cbc only when it contains a non-NULL pointer.

  Bug 1799537

  Change-Id: Ic23f264e10daff30365bf3cf86ac9c155f50e497
  Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
  Reviewed-on: http://git-master/r/1208008
  (cherry picked from commit ec69fa15c32f49d96939fd9a672faec45e078dfa)
  Reviewed-on: http://git-master/r/1217298
  Reviewed-by: Automatic_Commit_Validation_User

* gpu: nvgpu: refactor pmu include (Vijayakumar Subbu, 2016-09-08)

  Split the PMU include files to add a lot more APIs:

  pmu_api.h    - all the current APIs used in igpu
  pmu_common.h - common defines for all APIs
  pmu_gk20a.h  - SW defines specific to nvgpu, like the PMU version, PMU SW
                 structure definition, etc.

  Splitting the APIs into separate files allows us to use auto-generated PMU
  task headers from RM. We have a script which generates PMU interface
  header files in Linux format; it replaces RM with NV. Adding typedefs in
  the existing PMU code makes the auto-generated files easy to compile/add.

  JIRA DNVGPU-85

  Change-Id: I851b88769fe8d60561a44754ddb7dde45b45959e
  Signed-off-by: Vijayakumar Subbu <vsubbu@nvidia.com>
  Reviewed-on: http://git-master/r/1192702
  Reviewed-on: http://git-master/r/1203124
  (cherry picked from commit 0fe5f020c3f934cf2cc5336f1b6c3bafaf9e0c2a)
  Reviewed-on: http://git-master/r/1217301
  GVS: Gerrit_Virtual_Submit
  Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
  Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>

* gpu: nvgpu: Support mclk initialization (Terje Bergstrom, 2016-09-08)

  Add ops for calling mclk initialization.

  JIRA DNVGPU-85

  Change-Id: I2e9da80fdb014d916b40513d605c38711818d2f6
  Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
  Reviewed-on: http://git-master/r/1203975
  (cherry picked from commit 9be482c4ece7ffc550ae19f133638c808b3a768f)
  Reviewed-on: http://git-master/r/1217300
  Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
  Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>

* gpu: nvgpu: get bios perf and clk table ptr (Mahantesh Kumbar, 2016-09-08)

  Implement support for reading perf and clk tables from VBIOS.

  JIRA DNVGPU-83

  Change-Id: I095fea08479161362e4c2ffa7500ee6a57d6d447
  Signed-off-by: Mahantesh Kumbar <mkumbar@nvidia.com>
  Reviewed-on: http://git-master/r/1202602
  (cherry picked from commit fb7c7356f131a198bd655a25fc6ff17067477e1b)
  Reviewed-on: http://git-master/r/1217299
  Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
  Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>

* gpu: nvgpu: Skip calling undefined prod callbacks (Terje Bergstrom, 2016-09-08)

  Fix the rest of the code to not call prod callbacks that are set to NULL.

  Bug 1799537

  Change-Id: I756bb1f7ef58ba753ac43a2be6f125107be3cf34
  Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
  Reviewed-on: http://git-master/r/1209133
  (cherry picked from commit 5f4d7b42b6101407fde8c4a7dcdd3633eca85ae5)
  Reviewed-on: http://git-master/r/1217297
  Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
  Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>

* gpu: nvgpu: Allocate vidmem fds from 1024 (Terje Bergstrom, 2016-09-08)

  Allocate vidmem fds from 1024 onwards. This prevents us from using up the
  0-1023 range which is tracked per process, and fits within FD_SETSIZE.

  Bug 200222681

  Change-Id: I104b81f2831f1816ff66fc245fa63013d78001ec
  Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
  Reviewed-on: http://git-master/r/1199269
  (cherry picked from commit 5d5cbaf6a63dd31538fa35081b70e103d8a658f4)
  Reviewed-on: http://git-master/r/1217294
  Reviewed-by: Automatic_Commit_Validation_User
  GVS: Gerrit_Virtual_Submit

* gpu: nvgpu: Do not print error on unknown engine (Terje Bergstrom, 2016-09-08)

  An unknown engine is expected, as we do not support all dGPU engines.
  Remove the error spew.

  JIRA DNVGPU-26

  Change-Id: I6f7897c6ead168f1d8100421d16d0540a7f7b542
  Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
  Reviewed-on: http://git-master/r/1206449
  (cherry picked from commit 4cc610755df94065afd28a90c63aca8fff9685b1)
  Reviewed-on: http://git-master/r/1217292
  Reviewed-by: Automatic_Commit_Validation_User
  GVS: Gerrit_Virtual_Submit

* gpu: nvgpu: vgpu: cyclestat snapshot support (Peter Daifuku, 2016-09-08)

  Add support for cyclestats snapshots in the virtual case.

  Bug 1700143
  JIRA EVLR-278

  Change-Id: I376a8804d57324f43eb16452d857a3b7bb0ecc90
  Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com>
  Reviewed-on: http://git-master/r/1211547
  GVS: Gerrit_Virtual_Submit
  Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>

* gpu: nvgpu: unify nvgpu and pci probe (Deepak Nibade, 2016-09-08)

  We have completely different versions of probe for the nvgpu and pci
  devices.

  - Extract the common steps into an nvgpu_probe() function and separate it
    out into a new file, nvgpu_common.c.
  - Divide the work of nvgpu_probe() into further smaller functions.
  - Do platform specific things (like irq handling, mem resource management,
    power management) only in the individual probes and then call
    nvgpu_probe() to complete the common initialization.
  - Move all debugfs initialization to the common gk20a_debug_init(). This
    also helps to bring up all debug nodes on the pci device.
  - Pass the debugfs_symlink name as a parameter to gk20a_debug_init(). This
    allows us to set separate debugfs symlinks for the nvgpu and pci devices.
  - In the case of the railgating, cde and ce debugfs nodes, check whether
    the platform supports them or not.
  - Copy vidmem_is_vidmem from the platform to the mm structure and set it
    to true for the pci device.
  - Return from gk20a_scale_init() if we have neither a governor nor a
    qos_notifier.
  - Fix gk20a_alloc_debugfs_init() and gk20a_secure_page_alloc() to receive
    a device pointer instead of a platform_device.
  - Export gk20a_railgating_debugfs_init() so that we can call it from
    gk20a_debug_init().

  Jira DNVGPU-56
  Jira DNVGPU-58

  Change-Id: I3cc048082b0a1e57415a9fb8bfb9eec0f0a280cd
  Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
  Reviewed-on: http://git-master/r/1204207
  (cherry picked from commit add6bb0a3d5bd98131bbe6f62d4358d4d722b0fe)
  Reviewed-on: http://git-master/r/1204462
  GVS: Gerrit_Virtual_Submit
  Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>

* gpu: nvgpu: remove blocking wait for vidmem allocation (Deepak Nibade, 2016-09-08)

  We have a blocking 1 sec wait for vidmem allocation. Remove this blocking
  wait and just return a proper error code to the caller.

  In case we have some buffers to be cleaned up in the list
  (clear_list_head), return EAGAIN so that the caller can retry. Otherwise
  return ENOMEM, indicating that no memory is available right now.

  Jira DNVGPU-84

  Change-Id: Ife2b17c989fc80e568f03bb18ad75b93a25be962
  Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
  Reviewed-on: http://git-master/r/1204969
  (cherry picked from commit 2bacdf0bc6d5b1cdcb8be37e574ca5f4f0663cae)
  Reviewed-on: http://git-master/r/1213451
  GVS: Gerrit_Virtual_Submit
  Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>

* gpu: nvgpu: use vidmem for gr ctx if available (Konsta Holtta, 2016-09-08)

  Use the common gk20a_gmmu_alloc() that tries vidmem too.

  Jira DNVGPU-24

  Change-Id: I5dfd7eaab737a5290b4d21ac575d6b89777a567e
  Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
  Reviewed-on: http://git-master/r/1209077
  (cherry picked from commit e3085d37735c8f1cf4845621f29fe9d2689aad4b)
  Reviewed-on: http://git-master/r/1184330
  Reviewed-by: Automatic_Commit_Validation_User
  GVS: Gerrit_Virtual_Submit
  Tested-by: Deepak Nibade <dnibade@nvidia.com>
  Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>

* gpu: nvgpu: fix non-contiguous pramin access (Deepak Nibade, 2016-09-08)

  In pramin_access_batched(), in each iteration of the loop we first decide
  the size of the data that we should write in that iteration. In case this
  size is equal to the length of the chunk, we need to move on to the next
  chunk for the subsequent iteration.

  But since we change the offset variable before the check above, we end up
  using the same chunk in the next iteration.

  Fix this by correcting the sequence: first check if we should move to the
  next chunk, and only then adjust the offset variable.

  Jira DNVGPU-24

  Change-Id: I58c2e24678f4c6dfbe33bf111edd06788629eca8
  Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
  Reviewed-on: http://git-master/r/1210892
  (cherry picked from commit 83cc179199692d28a93b3b884c9bc094ff513298)
  Reviewed-on: http://git-master/r/1213450
  Reviewed-by: Automatic_Commit_Validation_User
  GVS: Gerrit_Virtual_Submit
  Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>

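  A sketch of the corrected loop ordering, using hypothetical chunk
  bookkeeping rather than the real pramin_access_batched() internals; it
  assumes size never runs past the end of the allocation:

      #include <linux/kernel.h>
      #include <linux/list.h>
      #include <linux/types.h>

      /* Hypothetical chunk descriptor for a non-contiguous vidmem buffer. */
      struct mem_chunk {
          struct list_head list;
          u64 base;
          u64 length;
      };

      static void access_batched(struct mem_chunk *chunk, u64 offset, u64 size)
      {
          while (size) {
              u64 n = min(size, chunk->length - offset);

              /* ... access n bytes at chunk->base + offset ... */

              /*
               * The fix is one of ordering: decide whether the current chunk
               * is exhausted before touching the intra-chunk offset, then
               * reset the offset when moving on to the next chunk.
               */
              if (n == chunk->length - offset) {
                  chunk = list_next_entry(chunk, list);
                  offset = 0;
              } else {
                  offset += n;
              }
              size -= n;
          }
      }
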
* gpu: nvgpu: suppress unbind operation (Sri Krishna chowdary, 2016-09-02)

  Unbind on nvgpu results in a kernel panic. Suppress the operation to
  avoid the panic; a proper fix should follow later on.

  bug 1779085

  Change-Id: Ibc966ac031f7f04406db63310e2f5ea126649ac0
  Signed-off-by: Sri Krishna chowdary <schowdary@nvidia.com>
  Reviewed-on: http://git-master/r/1212759
  Reviewed-by: Automatic_Commit_Validation_User
  GVS: Gerrit_Virtual_Submit
  Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
  Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>

* gpu: nvgpu: fix compilation errors for 32 bit arch (Deepak Nibade, 2016-09-01)

  Converting the return value of sg_dma_address() (which is u64) into a
  pointer results in a compilation failure on 32 bit machines. Hence
  convert the address first into uintptr_t and then into a pointer.

  Change-Id: I8e036af8f4c936b88883cf8af1491f03025ed356
  Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
  Reviewed-on: http://git-master/r/1211243
  Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
  Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>

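  The fix reduces to a cast pattern like the following sketch (the wrapper
  function name is hypothetical):

      #include <linux/scatterlist.h>
      #include <linux/types.h>

      static void *chunk_cpu_va(struct scatterlist *sgl)
      {
          /*
           * sg_dma_address() yields a 64-bit value here; casting it straight
           * to a pointer breaks the build on 32-bit targets, so go through
           * uintptr_t (an integer of pointer width) first.
           */
          return (void *)(uintptr_t)sg_dma_address(sgl);
      }
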
* gpu: nvgpu: enable big page support for pci (Deepak Nibade, 2016-09-01)

  While mapping the buffer, first check if the buffer is in vidmem, and if
  so, convert the allocation into its base address. Then walk through each
  chunk to decide the alignment.

  Add a new API gk20a_mm_get_align() which returns the alignment based on
  the scatterlist and aperture, and use this API to get the alignment
  during mapping.

  Enable big page support for pci by unsetting disable_bigpage.

  Jira DNVGPU-97

  Change-Id: I358dc98fac8103fdf9d2bde758e61b363fea9ae9
  Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
  Reviewed-on: http://git-master/r/1207673
  (cherry picked from commit d14d42290eed4aa7a2dd2be25e8e996917a58e82)
  Reviewed-on: http://git-master/r/1210959
  Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
  Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>

* gpu: nvgpu: make default vidmem page size of 64k (Deepak Nibade, 2016-09-01)

  Allocate 64k pages for vidmem by default. Also make sure that the base
  address of vidmem is aligned to the page size.

  Jira DNVGPU-20

  Change-Id: Ie2e5111f942467754db5b45f1518d72c925d3d19
  Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
  Reviewed-on: http://git-master/r/1206405
  (cherry picked from commit 542ebf7f571ba6dc631466e562f7d8e05df4a9a6)
  Reviewed-on: http://git-master/r/1210958
  Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
  Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>

* gpu: nvgpu: use vidmem for page tables if available (Konsta Holtta, 2016-09-01)

  Use the common gk20a_gmmu_alloc() that tries vidmem too.

  Jira DNVGPU-20

  Change-Id: I4ea02bc4962d299c6f71444048d4a2a22bd80f55
  Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
  Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
  Reviewed-on: http://git-master/r/1206404
  (cherry picked from commit 7297727cce8c5c7b26f82afe98cc5428135b4777)
  Reviewed-on: http://git-master/r/1178831
  Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
  Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>

* gpu: nvgpu: add new API to get base address for sysmem/vidmem buffers (Deepak Nibade, 2016-09-01)

  Add a new API gk20a_mem_get_base_addr() which returns the vidmem base
  address in the case of vidmem and the IOVA address in the case of sysmem.

  Even though vidmem allocations are non-contiguous, this API is useful
  (and should only be used) for allocations with one chunk (e.g. page
  tables).

  Also, since page tables could reside either in sysmem or in vidmem, use
  this API to get the address of page tables.

  Jira DNVGPU-20

  Change-Id: Ie04af9ca7bfccfec1a8a8e4be2c507cef5cef8e1
  Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
  Reviewed-on: http://git-master/r/1206403
  (cherry picked from commit a8c74dc188878f2948fa1e0e47bf1837fba6c5e0)
  Reviewed-on: http://git-master/r/1210957
  Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
  Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>

* gpu: nvgpu: allocate blob space early (Deepak Nibade, 2016-09-01)

  Allocating blob space for the pmu might need a fixed address allocation
  in vidmem during boot up. But if some page tables are allocated before
  the blob space, the blob space allocation could fail.

  Fix this by allocating blob space early during boot up.

  Jira DNVGPU-20

  Change-Id: I30eca1023c8f8f8be101bb7e160ba57a7040911a
  Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
  Reviewed-on: http://git-master/r/1206402
  (cherry picked from commit fad4309ce345ed3879f497bda27f2eceb1084dbb)
  Reviewed-on: http://git-master/r/1210956
  Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
  Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>

* gpu: nvgpu: Add proper timeout handling for vidmem clear operations (Lakshmanan M, 2016-09-01)

  The gk20a_fence_wait() api may be interrupted by a signal before its
  actual timeout has elapsed. This CL adds a retry (-ERESTARTSYS) mechanism
  for the case where gk20a_fence_wait() returns before its timeout elapsed.

  Bug 200230544

  Change-Id: I347ed2004935a8b9413f95dcb6fca2b74bf49f2a
  Signed-off-by: Lakshmanan M <lm@nvidia.com>
  Reviewed-on: http://git-master/r/1206265
  (cherry picked from commit d3ef533942487785d84d109f985ae648eb3c2434)
  Reviewed-on: http://git-master/r/1210955
  Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
  Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>

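  A sketch of the retry described above; fence_wait() here is a hypothetical
  stand-in for gk20a_fence_wait(), not its real prototype:

      #include <linux/errno.h>

      struct clear_fence;                           /* opaque in this sketch */
      int fence_wait(struct clear_fence *f, unsigned long timeout_ms);

      /*
       * A signal can interrupt the wait before the timeout has really
       * elapsed, so -ERESTARTSYS alone is not treated as a failure; the wait
       * is simply issued again until it completes or genuinely times out.
       */
      static int wait_for_clear(struct clear_fence *f, unsigned long timeout_ms)
      {
          int err;

          do {
              err = fence_wait(f, timeout_ms);
          } while (err == -ERESTARTSYS);

          return err;
      }
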
* gpu: nvgpu: use vidmem for gpfifos if available (Konsta Holtta, 2016-09-01)

  Use the common gk20a_gmmu_alloc() that tries vidmem too.

  Jira DNVGPU-21

  Change-Id: Ie22cb0f5ed70ec71567fc85d348b3526c9a32b02
  Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
  Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
  Reviewed-on: http://git-master/r/1204304
  (cherry picked from commit 07cb99baeb10194c520addd77517841a6f99df93)
  Reviewed-on: http://git-master/r/1169310
  Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
  Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>

* gpu: nvgpu: support pramin access for non-contiguous vidmem (Deepak Nibade, 2016-09-01)

  The API pramin_access_batched() currently only supports contiguous
  allocations. Modify this API to support non-contiguous allocations from
  the page allocator as well.

  Update gk20a_mem_wr32() and gk20a_mem_rd32() to reuse
  pramin_access_batched().

  Use gk20a_memset() in gk20a_gmmu_free_attr_vid() to clear vidmem pages
  for kernel buffers.

  Jira DNVGPU-30

  Change-Id: I43630912f4837d8ebc6b9c58f4f427218ef9725b
  Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
  Reviewed-on: http://git-master/r/1204303
  (cherry picked from commit 2f84f141d02fd2f641cb18a48896fb3ae5f7e51f)
  Reviewed-on: http://git-master/r/1210954
  Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
  Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>

* gpu: nvgpu: track allocator and user for each mem (Deepak Nibade, 2016-09-01)

  Store an allocator pointer in each mem_desc. This pointer should be used
  while freeing the mem, instead of assuming a common allocator.

  Add a user_mem flag to mem_desc which will be set only in the case of
  user vidmem allocations. We will delay the free of the mem in the worker
  only if this flag is set on the mem; otherwise, we will free it
  immediately.

  This is needed so that all kernel allocations can work with both sysmem
  and vidmem.

  Jira DNVGPU-84

  Change-Id: Ib9a9209b164bc56b7880448f86bd6d42b324cc86
  Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
  Reviewed-on: http://git-master/r/1203099
  (cherry picked from commit 8f0b0122f36a0b6f1932fa9a98d7eb03b1f623d1)
  Reviewed-on: http://git-master/r/1210953
  Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
  Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>

* gpu: nvgpu: fix memory leak in case of failure (Deepak Nibade, 2016-09-01)

  In __gk20a_alloc_pages(), if we fail to allocate a chunk, we free the
  previously allocated chunks in the error path. But we do not free up the
  memory reserved in those chunks, which could lead to OOM situations.

  Fix this by calling gk20a_free() for each chunk in the error path.

  Jira DNVGPU-96

  Change-Id: I68aa18d68a5282405016e688c790ccbc0c2a0d69
  Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
  Reviewed-on: http://git-master/r/1203098
  (cherry picked from commit f096bd1675600f4e2fc2d686f2911bb945fbbf0b)
  Reviewed-on: http://git-master/r/1210952
  GVS: Gerrit_Virtual_Submit
  Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>

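  A sketch of the corrected error path, with hypothetical chunk bookkeeping
  and a hypothetical chunk_free() standing in for gk20a_free():

      #include <linux/list.h>
      #include <linux/slab.h>
      #include <linux/types.h>

      struct page_alloc_chunk {            /* hypothetical bookkeeping entry */
          struct list_head list;
          u64 base;
          u64 length;
      };

      struct chunk_allocator;              /* opaque underlying allocator */
      void chunk_free(struct chunk_allocator *a, u64 base);

      /*
       * Error path: release both the bookkeeping struct and the memory it
       * had reserved, otherwise every failed multi-chunk allocation leaks.
       */
      static void undo_partial_alloc(struct chunk_allocator *a,
                                     struct list_head *chunks)
      {
          struct page_alloc_chunk *chunk, *tmp;

          list_for_each_entry_safe(chunk, tmp, chunks, list) {
              list_del(&chunk->list);
              chunk_free(a, chunk->base);  /* the fix: give the pages back */
              kfree(chunk);
          }
      }
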
* gpu: nvgpu: disable sync_fence for CE jobs (Deepak Nibade, 2016-09-01)

  We do not need sync_fence for CE jobs submitted in gk20a_ce_execute_ops()
  since all the waiters of the fence are in kernel space only.

  Jira DNVGPU-84

  Change-Id: Idad6c40abcefb86e60a5327bbbff6827b1ca33cc
  Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
  Reviewed-on: http://git-master/r/1201347
  (cherry picked from commit e294b2d37cf79182bb9a255adb188eb6afa47c27)
  Reviewed-on: http://git-master/r/1210951
  Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
  Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>

* gpu: nvgpu: clear vidmem buffers in worker (Deepak Nibade, 2016-09-01)

  We clear buffers allocated in vidmem in the buffer free path. But to
  clear buffers, we need to submit CE jobs, and this could cause
  issues/races if the free is called from a critical path.

  Hence solve this by moving the buffer clear/free to a worker.
  gk20a_gmmu_free_attr_vid() will now just put the mem_desc into a list and
  schedule a worker, and the worker thread will traverse the list and
  clear/free the allocations.

  In struct gk20a_vidmem_buf, the mem variable is statically allocated. But
  since we delay the free of mem, convert this variable into a pointer and
  allocate it dynamically.

  Since we delay the free of vidmem memory, it is now possible to face OOM
  conditions during allocations. Hence, while allocating, block until we
  have sufficient memory available, with an upper limit of 1 s.

  Jira DNVGPU-84

  Change-Id: I7925590644afae50b6fc04c6e1e43bbaa1c220fd
  Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
  Reviewed-on: http://git-master/r/1201346
  (cherry picked from commit b4dec4a30de2431369d677acca00e420f8e581a5)
  Reviewed-on: http://git-master/r/1210950
  Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
  Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>

* gpu: nvgpu: clear whole vidmem on first allocation (Deepak Nibade, 2016-09-01)

  We currently clear vidmem pages in gk20a_gmmu_alloc_attr_vid_at(), i.e.
  in the allocation path for each buffer. But since the buffer allocation
  path could be latency critical, clear the whole of vidmem first, before
  the first user allocation in gk20a_vidmem_buf_alloc(), and then clear the
  buffer pages while releasing the buffer.

  In this way, we can ensure that vidmem pages are already cleared during
  the buffer allocation path. At a later stage, clearing of pages can be
  removed from the free path and moved to a separate worker as well.

  At this point, the first allocation has the overhead of clearing the
  whole of vidmem, which takes about 380 ms, and this should improve once
  clocks are raised. Also, this is a one time latency, and subsequent
  allocations should not have any overhead for clearing at all.

  Add an API gk20a_vidmem_clear_all() to clear the whole of vidmem.

  We have WPR buffers allocated during boot up and at fixed addresses in
  vidmem. To prevent overwriting these buffers in gk20a_vidmem_clear_all(),
  clear the whole of vidmem except for the bootstrap allocator carveout.

  Add a new API gk20a_gmmu_clear_vidmem_mem() to clear one mem_desc.

  Jira DNVGPU-84

  Change-Id: I5661700585c6241a6a1ddeb5b7c068d3d2aed4b3
  Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
  Reviewed-on: http://git-master/r/1194301
  (cherry picked from commit 950ab61a04290ea405968d8b0d03e3bd044ce83d)
  Reviewed-on: http://git-master/r/1193158
  Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
  Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>

* gpu: nvgpu: Add a bootstrap vidmem allocator (Alex Waterman, 2016-09-01)

  Add an allocator for allocating vidmem before the CE has had a chance to
  be initialized (and clear the rest of vidmem).

  Jira DNVGPU-84

  Change-Id: I5166607a712b3a6eb4c2906b8c7d002c68a6567b
  Signed-off-by: Alex Waterman <alexw@nvidia.com>
  Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
  Reviewed-on: http://git-master/r/1197204
  (cherry picked from commit b4e68e84eedd952637b2332d8dc73a9090d6d62e)
  Reviewed-on: http://git-master/r/1210949
  Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
  Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>

* gpu: nvgpu: support GMMU mappings for vidmem page allocator (Deepak Nibade, 2016-09-01)

  Switch to using the page allocator for vidmem.

  Support GMMU mappings for pages (from the non-contiguous page allocator)
  in update_gmmu_ptes_locked(): if the aperture is VIDMEM, traverse each
  chunk in an allocation and map it to the GPU VA separately.

  Fix CE page clearing to support the page allocator.

  Fix gk20a_pramin_enter() to get the base address from the new allocator.

  Define an API gk20a_mem_get_vidmem_addr() to get the base address of an
  allocation. Note that this API should not be used if we have more than
  1 chunk.

  Jira DNVGPU-96

  Change-Id: I725422f3538aeb477ca4220ba57ef8b3c53db703
  Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
  Reviewed-on: http://git-master/r/1199177
  (cherry picked from commit 1afae6ee6529ab88cedd5bcbe458fbdc0d4b1fd8)
  Reviewed-on: http://git-master/r/1197647
  Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
  Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>

* gpu: nvgpu: send only one event to the debugger (Cory Perry, 2016-09-01)

  Event notifications on TSGs should only be sent to the channel that
  caused the event to happen in the first place, not every channel in the
  tsg. Any more and the debugger will not be able to tell which channel
  actually got the event. Worse yet, if all the channels in a tsg are bound
  to the same debug session (as is the case with cuda-gdb), then multiple
  nvgpu events for the same gpu event will be triggered, causing events to
  be buffered and the client to get out of sync.

  One gpu exception, one nvgpu event per tsg.

  Bug 1793988

  Signed-off-by: Cory Perry <cperry@nvidia.com>
  Change-Id: I4efb83b0593bd1af38f2342c80793d9db56e42b1
  Reviewed-on: http://git-master/r/1194203
  Reviewed-by: Automatic_Commit_Validation_User
  GVS: Gerrit_Virtual_Submit
  Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
  Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>

* gpu: nvgpu: Fix error handling of __semaphore_bitmap_alloc() (Alex Waterman, 2016-08-31)

  The return from __semaphore_bitmap_alloc() is an int, for which a
  negative value indicates a failure. That return value was being directly
  cast to an unsigned int before being checked for a negative error code.
  This obviously isn't a good idea.

  Coverity ID 38754

  Change-Id: I50c0478e5504988b059e69b929e9c2e465df7cc0
  Signed-off-by: Alex Waterman <alexw@nvidia.com>
  Reviewed-on: http://git-master/r/1210317
  GVS: Gerrit_Virtual_Submit
  Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>

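  The Coverity finding boils down to a pattern like this sketch (names
  hypothetical):

      #include <linux/errno.h>

      int bitmap_alloc_slot(void);      /* returns a slot index or -ENOSPC */

      /*
       * Before: the int return was stored straight into an unsigned
       * variable, so the "< 0" error check could never fire. Keep it signed
       * until after the check, then convert.
       */
      static int take_slot(unsigned int *slot)
      {
          int ret = bitmap_alloc_slot();

          if (ret < 0)
              return ret;

          *slot = (unsigned int)ret;
          return 0;
      }
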
* gpu: nvgpu: Fix possible overflow in buddy allocator (Alex Waterman, 2016-08-31)

  Fix a possible overflow in the buddy allocator's initialization code. In
  practice it should never happen that the pde size is greater than 32
  bits, but this makes coverity happy.

  Coverity ID 54964

  Change-Id: I886fd962bb3e9e328f7305bdcf69827979a39a21
  Signed-off-by: Alex Waterman <alexw@nvidia.com>
  Reviewed-on: http://git-master/r/1210316
  GVS: Gerrit_Virtual_Submit
  Reviewed-by: Sachit Kadle <skadle@nvidia.com>
  Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>

* gpu: nvgpu: gk20a: Use spin_lock for jobs_lock (Bharat Nihalani, 2016-08-31)

  This is done to boost performance of the GPU submit time, which is
  critical for compute use-cases.

  Bug 200215465
  Bug 1804898

  Conflicts:
  	drivers/gpu/nvgpu/gk20a/channel_gk20a.c

  Change-Id: Ic4884ee4eac910b92b84a47fdc1b2e9f26b2f1f0
  Signed-off-by: Bharat Nihalani <bnihalani@nvidia.com>
  Reviewed-on: http://git-master/r/1199860
  Reviewed-on: http://git-master/r/1209834
  Reviewed-by: Automatic_Commit_Validation_User
  GVS: Gerrit_Virtual_Submit
  Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
  Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>

* gpu: nvgpu: fix build error when CONFIG_DEBUG_FS=n (David Pu, 2016-08-31)

  Add an 'ifdef CONFIG_DEBUG_FS' check to fix the following compilation
  error when CONFIG_DEBUG_FS=n (which is used for the Android 'production'
  build):

    mm_gk20a.c: In function 'gk20a_mm_debugfs_init':
    mm_gk20a.c:4824:2: error: implicit declaration of function
    'debugfs_create_x64' [-Werror=implicit-function-declaration]

  Bug 1778001

  Change-Id: I785288a37b96c391b84925d5971d2691cf80206e
  Signed-off-by: David Pu <dpu@nvidia.com>
  Reviewed-on: http://git-master/r/1210393
  Reviewed-by: Automatic_Commit_Validation_User
  Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>

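  A minimal sketch of the guard; the debugfs node name and wrapper function
  are illustrative, not the actual nvgpu code:

      #include <linux/debugfs.h>
      #include <linux/types.h>

      /*
       * In this kernel, debugfs_create_x64() has no stub when debugfs is
       * disabled, so the call site is compiled out for CONFIG_DEBUG_FS=n.
       */
      static void mm_debugfs_init(struct dentry *parent, u64 *value)
      {
      #ifdef CONFIG_DEBUG_FS
          debugfs_create_x64("example_node", 0644, parent, value);
      #endif
      }
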
* gpu: nvgpu: Turn the debug macro back to pr_info (Alex Waterman, 2016-08-30)

  Instead of having the debug prints from the allocators be warnings, they
  should be just regular prints.

  Bug 1799159

  Change-Id: Ic6e3c38fa286c4acd6fcba51dc59158dc2d655fc
  Signed-off-by: Alex Waterman <alexw@nvidia.com>
  Reviewed-on: http://git-master/r/1201372
  (cherry picked from commit 107caf4ce68a7c76023ee1e66a98c5570f401059)
  Reviewed-on: http://git-master/r/1208478
  Reviewed-by: Automatic_Commit_Validation_User
  GVS: Gerrit_Virtual_Submit
  Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>

* gpu: nvgpu: Clarify comment in allocator code (Alex Waterman, 2016-08-30)

  One of the flags that is defined for allocators has not yet been
  implemented. This clarifies the comment and explains why the flag has
  been defined even though it is not yet implemented.

  Bug 1799159

  Change-Id: I1e84439d63ca391941cee8e5362ffd9cc959744b
  Signed-off-by: Alex Waterman <alexw@nvidia.com>
  Reviewed-on: http://git-master/r/1201371
  (cherry picked from commit 8e6566b173f17d9c169a9fa0f6104f4bbf608dc1)
  Reviewed-on: http://git-master/r/1208477
  Reviewed-by: Automatic_Commit_Validation_User
  GVS: Gerrit_Virtual_Submit
  Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>

* gpu: nvgpu: Add checking in allocator functions (Alex Waterman, 2016-08-30)

  Add checks to make sure function pointers are valid before attempting to
  call said function.

  Also, ensure that any allocator created defines the following 3 functions
  at minimum:

    alloc()
    free()
    fini()

  Bug 1799159

  Change-Id: I4cd3d5746ccb721c723a161c9487564846027572
  Signed-off-by: Alex Waterman <alexw@nvidia.com>
  Reviewed-on: http://git-master/r/1200059
  (cherry picked from commit e26557a49d7ca6629ada24f12a3be396b0ae22cd)
  Reviewed-on: http://git-master/r/1208476
  Reviewed-by: Automatic_Commit_Validation_User
  GVS: Gerrit_Virtual_Submit
  Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>

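  A sketch of the kind of checking described above, using a hypothetical
  ops table rather than the real gk20a allocator interface:

      #include <linux/errno.h>
      #include <linux/types.h>

      struct alloc_ops {
          u64  (*alloc)(void *priv, u64 len);
          void (*free)(void *priv, u64 addr);
          void (*fini)(void *priv);
          u64  (*base)(void *priv);        /* optional hook */
      };

      /* Creation refuses ops tables that are missing the mandatory hooks... */
      static int allocator_init(const struct alloc_ops *ops)
      {
          if (!ops->alloc || !ops->free || !ops->fini)
              return -EINVAL;
          return 0;
      }

      /* ...and optional hooks are checked before every call. */
      static u64 allocator_base(const struct alloc_ops *ops, void *priv)
      {
          return ops->base ? ops->base(priv) : 0;
      }
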
* gpu: nvgpu: Add debugging to the semaphore code (Alex Waterman, 2016-08-30)

  Add GPU debugging to the semaphore code.

  Bug 1732449
  JIRA DNVGPU-12

  Change-Id: I98466570cf8d234b49a7f85d88c834648ddaaaee
  Signed-off-by: Alex Waterman <alexw@nvidia.com>
  Reviewed-on: http://git-master/r/1198594
  (cherry picked from commit 420809cc31fcdddde32b8e59721676c67b45f592)
  Reviewed-on: http://git-master/r/1153671
  Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
  Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>

* gpu: nvgpu: Add gpu_dbg_map_v message type (Alex Waterman, 2016-08-30)

  Add a new debug message type: gpu_dbg_map_v. This is used for mapping
  messages that are not specifically memory map operations.

  Also clean up the memory mapping debugging a bit, since there was one
  duplicate print and the memory map print was difficult to parse visually.
  As a result, the message has been modified to put the most important
  information first in an easily readable format.

  Bug 1732449
  JIRA DNVGPU-12

  Change-Id: Ib19c9371ee958009ab5a2d89b9610e699d070ee2
  Signed-off-by: Alex Waterman <alexw@nvidia.com>
  Reviewed-on: http://git-master/r/1198593
  (cherry picked from commit 51dba53b06ca171cdb13d1707f2d026b0ce29f07)
  Reviewed-on: http://git-master/r/1147670
  Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
  Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>

* gpu: nvgpu: Add semaphore debugging info (Alex Waterman, 2016-08-30)

  Add semaphore debugging information to the gk20a channel state debug
  dump.

  Bug 1732449
  JIRA DNVGPU-12

  Change-Id: I7caafd4f6420e1c478be22e236513603c315ce5e
  Signed-off-by: Alex Waterman <alexw@nvidia.com>
  Reviewed-on: http://git-master/r/1198592
  (cherry picked from commit 3fa247adf5fdd8c9b16a24fec00903fdc3abc90a)
  Reviewed-on: http://git-master/r/1133793
  GVS: Gerrit_Virtual_Submit
  Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>

* gpu: nvgpu: Implement a vidmem allocator (Alex Waterman, 2016-08-30)

  Implement an allocator suitable for managing the video memory on dGPUs.
  It works by allocating chunks from an underlying buddy allocator and
  collating the chunks together (similar to what an sgt does in the wider
  Linux kernel). This handles the ability to get large buffers in
  potentially fragmented memory. The GMMU can then obviously map the
  physical vidmem into contiguous GVA spaces.

  Jira DNVGPU-96

  Change-Id: Ic1d7800b033a170b77790aa23fad6858443d0e89
  Signed-off-by: Alex Waterman <alexw@nvidia.com>
  Reviewed-on: http://git-master/r/1197203
  (cherry picked from commit fa44684a843956ae384fef6d7a79b9cbbd04f73e)
  Reviewed-on: http://git-master/r/1185231
  GVS: Gerrit_Virtual_Submit
  Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>

* gpu: nvgpu: do not flush FECS record on engine reset (Thomas Fleury, 2016-08-29)

  Flushing the timestamp record method can fail in case FECS is not
  processing the main method queue. In particular, this occurs in the case
  of a ctxsw timeout, where we process fifo sched interrupts from the host,
  but FECS is still waiting for idle (grWFI). In such a scenario, this adds
  a huge delay to the fifo recovery procedure (timeout on the FECS method).

  Since flushing the last (incomplete) record from FECS would only be
  useful in that case (context switch ongoing), remove the flush operation
  on engine reset. Note that an explicit ENGINE_RESET event (with pid) is
  inserted in the user-facing ctxsw buffer on engine reset.

  Bug 200228310

  Change-Id: I885525f8f197f81266b50db161bb511867fc74f4
  Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
  Reviewed-on: http://git-master/r/1207305
  (cherry picked from commit 44391b6204fd648949295f90481b0c424d9a5ddf)
  Reviewed-on: http://git-master/r/1208414
  GVS: Gerrit_Virtual_Submit
  Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>