| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Don't use enum dma_attr in the gk20a_gmmu_alloc_attr* functions, but
define nvgpu-internal flags for no kernel mapping, force contiguous, and
read only modes. Store the flags in the allocated struct mem_desc and
only use gk20a_gmmu_free, remove gk20a_gmmu_free_attr. This helps in OS
abstraction. Rename the notion of attr to flags.
Add implicit NVGPU_DMA_NO_KERNEL_MAPPING to all vidmem buffers
allocated via gk20a_gmmu_alloc_vid for consistency.
Fix a bug in gk20a_gmmu_alloc_map_attr that dropped the attr
parameter accidentally.
Bug 1853519
Change-Id: I1ff67dff9fc425457ae445ce4976a780eb4dcc9f
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1321101
Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Move all programming of FB to fb_*.c files, and remove the inclusion
of FB hardware headers from other files.
TLB invalidate function took previously a pointer to VM, but the new
API takes only a PDB mem_desc, because FB does not need to know about
higher level VM.
GPC MMU is programmed from the same function as FB MMU, so added
dependency to GR hardware header to FB.
GP106 ACR was also triggering a VPR fetch, but that's not applicable
to dGPU, so removed that call.
Change-Id: I4eb69377ac3745da205907626cf60948b7c5392a
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/1321516
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove unused function gk20a_get_phys_from_iova. At the same time
remove the #include for iommu.h, which was only needed by this
function.
Change-Id: Ia858b0ad5fe7e423d650aa9f82e430f419f2a492
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/1319070
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Change nvgpu_kalloc() to nvgpu_big_[mz]alloc(). This is necessary
since the natural free function name for this is nvgpu_kfree() but
that conflicts with nvgpu_k[mz]alloc() (implemented in a subsequent
patch).
This API exists becasue not all allocation sizes can be determined
at compile time and in some cases sizes may vary across the system
page size. Thus always using kmalloc() could lead to OOM errors due
to fragmentation. But always using vmalloc() is wastful of memory
for small allocations. This API tries to alleviate those problems.
Bug 1799159
Bug 1823380
Change-Id: I49ec5292ce13bcdecf112afbb4a0cfffeeb5ecfc
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1283827
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
API DEFINE_MUTEX() is defined in Linux and might
not be available in other OSs.
Hence remove its usage from nvgpu
Declare and explicitly initialize below mutexes
for both nvgpu and vgpu
g->mm.priv_lock
g->mm.tlb_lock
Jira NVGPU-13
Change-Id: If72885a6da0227a1552303206172f1f2b751471d
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1298042
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of using Linux APIs for mutex and spinlocks
directly, use new APIs defined in <nvgpu/lock.h>
Replace Linux specific mutex/spinlock declaration,
init, lock, unlock APIs with new APIs
e.g
struct mutex is replaced by struct nvgpu_mutex and
mutex_lock() is replaced by nvgpu_mutex_acquire()
And also include <nvgpu/lock.h> instead of including
<linux/mutex.h> and <linux/spinlock.h>
Add explicit nvgpu/lock.h includes to below
files to fix complilation failures.
gk20a/platform_gk20a.h
include/nvgpu/allocator.h
Jira NVGPU-13
Change-Id: I81a05d21ecdbd90c2076a9f0aefd0e40b215bd33
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1293187
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Change the prefix in the semaphore code to 'nvgpu_' since this code
is global to all chips.
Bug 1799159
Change-Id: Ic1f3e13428882019e5d1f547acfe95271cc10da5
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1284628
Reviewed-by: Varun Colbert <vcolbert@nvidia.com>
Tested-by: Varun Colbert <vcolbert@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Move semaphore_gk20a.c drivers/gpu/nvgpu/common/ since the semaphore
code is common to all chips.
Move the semaphore_gk20a.h header file to drivers/gpu/nvgpu/include/nvgpu
and rename it to semaphore.h. Also update all places where the header
is inluced to use the new path.
This revealed an odd location for the enum gk20a_mem_rw_flag. This should
be in the mm headers. As a result many places that did not need anything
semaphore related had to include the semaphore header file. Fixing this
oddity allowed the semaphore include to be removed from many C files that
did not need it.
Bug 1799159
Change-Id: Ie017219acf34c4c481747323b9f3ac33e76e064c
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1284627
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Simplify ref-counting on VMs: take a ref when a VM is bound to a
channel and drop a ref when a channel is freed.
Previously ref-counts were scattered over the driver. Also the CE
and CDE code would bind channels with custom rolled code. This was
because the gk20a_vm_bind_channel() function took an as_share as
the VM argument (the VM was then inferred from that as_share).
However, it is trivial to abtract that bit out and allow a central
bind channel function that just takes a VM and a channel.
Bug 1846718
Change-Id: I156aab259f6c7a2fa338408c6c4a3a464cd44a0c
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1261886
Reviewed-by: Richard Zhao <rizhao@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Allow platforms to choose whether or not to have unified GPU
VA spaces. This is useful for the dGPU where having a unified
address space has no problems. On iGPUs testing issues is
getting in the way of enabling this feature.
Bug 1396644
Bug 1729947
Change-Id: I65985f1f9a818f4b06219715cc09619911e4824b
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1265303
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove the special VMA that could be used for allocating fixed
addresses. This feature was never used and is not worth maintaining.
Bug 1396644
Bug 1729947
Change-Id: I06f92caa01623535516935acc03ce38dbdb0e318
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1265302
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The basic structure of this patch is to make the small page allocator
and the large page allocator into pointers (where they used to be just
structs). Then assign each of those pointers to the same actual
allocator since the buddy allocator has supported mixed page sizes
since its inception.
For the rest of the driver some changes had to be made in order to
actually support mixed pages in a single address space.
1. Unifying the allocation page size determination
Since the allocation and map operations happen at distinct
times both mapping and allocation of GVA space must agree
on page size. This is because the allocation has to separate
allocations into separate PDEs to avoid the necessity of
supporting mixed PDEs.
To this end a function __get_pte_size() was introduced which
is used both by the balloc code and the core GPU MM code. It
determines page size based only on the length of the mapping/
allocation.
2. Fixed address allocation + page size
Similar to regular mappings/GVA allocations fixed address
mapping page size determination had to be modified. In the
past the address of the mapping determined page size since
the address space split was by address (low addresses were
small pages, high addresses large pages). Since that is no
longer the case the page size field in the reserve memory
ioctl is now honored by the mapping code. When, for instance,
CUDA makes a memory reservation it specifies small or large
pages. When CUDA requests mappings to be made within that
address range the page size is then looked up in the reserved
memory struct.
Fixed address reservations were also modified to now always
allocate at a PDE granularity (64M or 128M depending on
large page size. This prevents non-fixed allocations from
ending up in the same PDE and causing kernel panics or GMMU
faults.
3. The rest...
The rest of the changes are just by products of the above.
Lots of places required minor updates to use a pointer to
the GVA allocator struct instead of the struct itself.
Lastly, this change is not truly complete. More work remains to be
done in order to fully remove the notion that there was such a thing
as separate address spaces for different page sizes. Basically after
this patch what remains is cleanup and proper documentation.
Bug 1396644
Bug 1729947
Change-Id: If51ab396a37ba16c69e434adb47edeef083dce57
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1265300
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove including gk20a.h from pmu_gk20a.h. This causes a fallout
as some #includes were missing.
gr_gp10b.h uses mem_desc, but did not include mm_gk20a.h. Add the
include.
Including mm_gk20a.h in gr_gp10b.h causes recursive include, as
mm_gk20a.h has some gr defines. Move the defines to gr_gk20a.h to
remove the dependency.
gr_ctx_gk20a.h used struct gk20a pointers, but did not forward
declare it. Add a forward declaration.
gr_gk20a.h uses dbg_session_gk20a, but was missing forward
declaration.
gr_gk20a.h did not include nvgpu.h but it uses preemption types from
that header. Add include.
Change-Id: I2168e2303b55e0d187b816bcb26f37c8af1649ba
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/1283717
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The soc tegra headers are unified and moved all the content of
linux/tegra-soc.h to the soc/tegra/chip-id.h to have the
single soc header for Tegra.
Change-Id: I281e19dd3eb1538b8dfbea4eb0779fb64d1fcffa
Signed-off-by: Shardar Shariff Md <smohammed@nvidia.com>
Reviewed-on: http://git-master/r/1288365
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Move the GPU allocators to common/mm/ since the allocators are common
code across all GPUs. Also rename the allocator code to move away from
gk20a_ prefixed structs and functions.
This caused one issue with the nvgpu_alloc() and nvgpu_free() functions.
There was a function for allocating either with kmalloc() or vmalloc()
depending on the size of the allocation. Those have now been renamed to
nvgpu_kalloc() and nvgpu_kfree().
Bug 1799159
Change-Id: Iddda92c013612bcb209847084ec85b8953002fa5
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1274400
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
graphics and compute preemption modes are currently
defined as int
But it is more logical to have them as unsigned int
Also, we treat preemption modes as unsigned almost
everywhere in the code
Fix prints in gk20a_fifo_sched_debugfs_seq_show() to
print U32_MAX with %d which is same as printing -1
Bug 200263471
Change-Id: Iabd0ee3923b76d81620898e90a9b1fc5dd75b530
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1272514
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If channel context has separate context header then
copy required info into context header instead of
main context header.
JIRA GV11B-21
Change-Id: I5e0bdde132fb83956fd6ac473148ad4de498e830
Signed-off-by: seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: http://git-master/r/1229243
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of hard coding bootstrap region, it should always be set to
the last 256MB of vidmem.
Bug 200244445
Change-Id: I91779d1bf861f4f23a0b646f70b1febbbc4581b5
Signed-off-by: David Nieto <dmartineznie@nvidia.com>
Reviewed-on: http://git-master/r/1242409
Reviewed-by: David Martinez Nieto <dmartineznie@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-on: http://git-master/r/1267124
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add IOCTL API NVGPU_DBG_GPU_IOCTL_ACCESS_FB_MEMORY
to read/write fb/vidmem memory
Interface will accept dmabuf_fd of the buffer in vidmem,
offset into the buffer to access, temporary buffer
to copy data across API, size of read/write and
command indicating either read or write operation
API will first parse all the inputs, and then call
gk20a_vidbuf_access_memory() to complete fb access
gk20a_vidbuf_access_memory() will then just use
gk20a_mem_rd_n() or gk20a_mem_wr_n() depending
on the command issued
Bug 1804714
Jira DNVGPU-192
Change-Id: Iba3c42410abe12c2884d3b603fa33d27782e4c56
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1255556
(cherry picked from commit 2c49a8a79d93fc526adbf6f808484fa9a3fa2498)
Reviewed-on: http://git-master/r/1260471
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bharat Nihalani <bnihalani@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 5f1c2bc27fb9dd66ed046b0590afc365be5011bf.
Added back now that matching RM server has been updated:
In hypervisor mode, all GPU VA allocations must be done by client;
fix this for the allocation of the hwpm ctxt buffer
Bug 200231611
Change-Id: Ie5ce2c2562401b1f00821231d37608e3fc30d4a4
Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com>
Reviewed-on: http://git-master/r/1252138
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 57821e215756b3df7acc9c0eb5017e39f141d381.
Change-Id: Ic4801115064ccbcd1435298a61871921d056b8ea
Signed-off-by: Sivaram Nair <sivaramn@nvidia.com>
Reviewed-on: http://git-master/r/1247825
Reviewed-by: Rakesh Babu Bodla <rbodla@nvidia.com>
Tested-by: Rakesh Babu Bodla <rbodla@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In hypervisor mode, all GPU VA allocations must be done by client;
fix this for the allocation of the hwpm ctxt buffer
Bug 200231611
Change-Id: I0270b1298308383a969a47d0a859ed53c20594ef
Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com>
Reviewed-on: http://git-master/r/1240913
(cherry picked from commit 49314d42b13e27dc2f8c1e569a8c3e750173148d)
Reviewed-on: http://git-master/r/1245867
(cherry picked from commit d0b10e84d90d0fd61eca8be0f9e879d9cec71d3e)
Reviewed-on: http://git-master/r/1246700
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Move the CE cleanup to before the FIFO cleanup. Since the CE closes
a channel during its cleanup the FIFO needs to be initialized since
the FIFO code maintains the vmalloc()'ed channels.
Bug 1816516
Change-Id: Ia7a97059a12a0c2b52368ffe411e597f803e8e6e
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1225613
(cherry picked from commit 707bd2a6d4672c6a7b7a8b2e581ea3a606ed971d)
Reviewed-on: http://git-master/r/1240106
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change is the first of a series of changes to
support the usage of pre-allocated job tracking resources
in the submit path. With this change, we still maintain a
dynamically-allocated joblist, but make the necessary changes
in the channel_sync & fence framework to use in-place
allocations. Specifically, we:
1) Update channel sync framework routines to take in
pre-allocated priv_cmd_entry(s) & gk20a_fence(s) rather
than dynamically allocating themselves
2) Move allocation of priv_cmd_entry(s) & gk20a_fence(s)
to gk20a_submit_prepare_syncs
3) Modify fence framework to have seperate allocation
and init APIs. We expose allocation as a seperate API, so
the client can allocate the object before passing it into
the channel sync framework.
4) Fix clean_up logic in channel sync framework
Bug 1795076
Change-Id: I96db457683cd207fd029c31c45f548f98055e844
Signed-off-by: Sachit Kadle <skadle@nvidia.com>
Reviewed-on: http://git-master/r/1206725
(cherry picked from commit 9d196fd10db6c2f934c2a53b1fc0500eb4626624)
Reviewed-on: http://git-master/r/1223933
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A wmb() next to each gk20a_mem_wr32() via PRAMIN may be overly careful,
so support not inserting these barriers for performance, in cases where
they are not necessary, where the caller would do an explicit barrier
after a bunch of reads.
Also, move those optional wmb()s to be done at the end of the whole
internally batched write for gk20a_mem_{wr_n,memset} from the per-batch
subloops that may run multiple times.
Jira DNVGPU-23
Change-Id: I61ee65418335863110bca6f036b2e883b048c5c2
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1225149
(cherry picked from commit d2c40327d1995f76e8ab9cb4cd8c76407dabc6de)
Reviewed-on: http://git-master/r/1227474
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add NVGPU_GPU_IOCTL_GET_MEMORY_STATE to read the amount of free
device-local video memory, if applicable.
Some reserved fields are added to support different types of queries in
the future (e.g. context-local free amount).
Bug 1787771
Bug 200233138
Change-Id: Id5ffd02ad4d6ed3a6dc196541938573c27b340ac
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1223762
(cherry picked from commit 96221d96c7972c6387944603e974f7639d6dbe70)
Reviewed-on: http://git-master/r/1235980
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Change clears_pending to bytes_pending and track accordingly the number
of bytes to be freed instead of the number of buffers. This, atomically
combined with the amount of space in the allocator, is the total amount
of free memory available.
Bug 200233138
Change-Id: Ibbb4e80a32728781ba19a74307d8a8ac1a4d7431
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1231422
(cherry picked from commit 025e765f312c253b201ecf2dbbe0f4972fe1d4bc)
Reviewed-on: http://git-master/r/1235957
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The boolean flag mm_gk20a.vidmem.cleared is shared across threads, so
mark it volatile to prevent compiler from wrongly optimizing accesses to
it.
Jira DNVGPU-84
Change-Id: I1fe66b26966685d3f74ed95ba53b198f810231b9
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1233016
(cherry picked from commit dc6c9db56ea8a5f55f28f97fdfc3c1ac60d8b195)
Reviewed-on: http://git-master/r/1235317
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The lowest page table level may hold very few entries for mappings of
large pages, but a new page is allocated for each list of entries at the
lowest level, wasting memory and performance. Compact these so that the
new "allocation" of ptes is appended at the end of the previous
allocation, if there is space.
4 KB page is still the smallest size requested from the allocator; any
possible overhead in the allocator (e.g., internally allocating big
pages only) is not taken into account.
Bug 1736604
Change-Id: I03fb795cbc06c869fcf5f1b92def89a04583ee83
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1221841
(cherry picked from commit fa92017ed48e1d5f48c1a12c512641c6ce9924af)
Reviewed-on: http://git-master/r/1234996
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Protect the initial vidmem zeroing performed during the first userspace
alloc with a mutex, so that it blocks next concurrent users and is run
only once. Otherwise, multiple clears could end up running in parallel,
so that the next ones corrupt memory allocated by the thread that has
finished earlier and advanced to allocate and use memory.
Jira DNVGPU-84
Change-Id: If497749abf481b230835250191d011c4a9d1483b
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1232461
(cherry picked from commit 79435a68e6d2713b78acdb0ec6f77cfd78651d7f)
Reviewed-on: http://git-master/r/1234990
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
An empty list of soon-to-be-freed userspace vidmem buffers is not enough
to safely assume that an allocation may succeed or not if tried again,
because removal from the list and actually marking the memory freed is
not atomic. Fix this by using an atomic counter for the number of
pending frees (so that it's still safe to first remove from the job list
and then perform the free), and making allocation attempts combined with
a test of pending frees atomic.
This still does not guarantee that there is memory available (as the
actual amount of pending memory in bytes plus the current free amount
isn't computed), but removes the race that produces false negatives in
case a single program expects repeated frees and allocs to succeed.
Bug 1809939
Change-Id: I6a92da2e21cbf3f886b727000c924d56f35ce55b
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1217078
(cherry picked from commit 83c1f1e70dccd92fdd4481132cf5b6717760d432)
Reviewed-on: http://git-master/r/1220545
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add new API gk20a_mem_get_base_addr() which will return vidmem
base address in case of vidmem and IOVA address in case of
sysmem
Even though vidmem allocations are non-contiguous, this API
is useful (and should only be used) for allocations with one
chunk (e.g. page tables)
Also, since page tables could either reside in sysmem or vidmem,
use this API to get address of page tables
Jira DNVGPU-20
Change-Id: Ie04af9ca7bfccfec1a8a8e4be2c507cef5cef8e1
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1206403
(cherry picked from commit a8c74dc188878f2948fa1e0e47bf1837fba6c5e0)
Reviewed-on: http://git-master/r/1210957
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Store allocator pointer for each mem_desc
This pointer should be used while freeing the mem
instead of assuming a common allocator
Add flag user_mem to mem_desc which will be set
only in case of User vidmem allocations
We will delay free of mem in worker only if this
flag is set on mem. Otherwise, we will free it
immediately
This is needed so that all kernel allocations
can work with both sysmem and vidmem
Jira DNVGPU-84
Change-Id: Ib9a9209b164bc56b7880448f86bd6d42b324cc86
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1203099
(cherry picked from commit 8f0b0122f36a0b6f1932fa9a98d7eb03b1f623d1)
Reviewed-on: http://git-master/r/1210953
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We clear buffers allocated in vidmem in buffer free path.
But to clear buffers, we need to submit CE jobs and this
could cause issues/races if free called from critical
path
Hence solve this by moving buffer clear/free to a worker
gk20a_gmmu_free_attr_vid() will now just put mem_desc into
a list and schedule a worker
And worker thread will traverse the list and clear/free
the allocations
In struct gk20a_vidmem_buf, mem variable is statically
allocated. But since we delay free of mem, convert this
variable into a pointer and allocate it dynamically
Since we delay free of vidmem memory, it is now possible
to face OOM conditions during allocations. Hence while
allocating block until we have sufficient memory
available with an upper limit of 1S
Jira DNVGPU-84
Change-Id: I7925590644afae50b6fc04c6e1e43bbaa1c220fd
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1201346
(cherry picked from commit b4dec4a30de2431369d677acca00e420f8e581a5)
Reviewed-on: http://git-master/r/1210950
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We currently clear vidmem pages in gk20a_gmmu_alloc_attr_vid_at()
i.e. allocation path for each buffer
But since buffer allocation path could be latency critical,
clear whole vidmem first and before first User allcation
in gk20a_vidmem_buf_alloc()
And then clear buffer pages while releasing the buffer
In this way, we can ensure that vidmem pages are already cleared
during buffer allocation path
At a later stage, clearing of pages can be removed from free path
and moved to a separate worker as well
At this point, first allocation has overhead of clearing whole
vidmem which takes about 380mS and this should improve once
clocks are raised.
Also, this is one time larency, and subsequent allocations
should not have any overhead for clearing at all
Add API gk20a_vidmem_clear_all() to clear whole vidmem
We have WPR buffers allocated during boot up and
at fixed address in vidmem.
To prevent overwriting to these buffers in gk20a_vidmem_clear_all(),
clear whole vidmem except for the bootstrap allocator carveout
Add new API gk20a_gmmu_clear_vidmem_mem() to clear one mem_desc
Jira DNVGPU-84
Change-Id: I5661700585c6241a6a1ddeb5b7c068d3d2aed4b3
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1194301
(cherry picked from commit 950ab61a04290ea405968d8b0d03e3bd044ce83d)
Reviewed-on: http://git-master/r/1193158
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add an allocator for allocating vidmem before the CE has had a
chance to be initialized (and clear the rest of vidmem).
Jira DNVGPU-84
Change-Id: I5166607a712b3a6eb4c2906b8c7d002c68a6567b
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1197204
(cherry picked from commit b4e68e84eedd952637b2332d8dc73a9090d6d62e)
Reviewed-on: http://git-master/r/1210949
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Include the buffer aperture flag (sysmem/vidmem/invalid) and the size of
the buffer and of the mapping in logging strings during gmmu map path.
Change-Id: Ie4c46bf9cb5db79b738571029d46ce8cbfc63f99
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1189492
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Yu-Huan Hsu <yhsu@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add NVGPU_GPU_IOCTL_ALLOC_VIDMEM to the ctrl fd for letting userspace
allocate on-board GPU memory (aka vidmem). The allocations are returned
as dmabuf fds.
Also, report the amount of local video memory in the gpu
characteristics.
Jira DNVGPU-19
Jira DNVGPU-38
Change-Id: I28e361d31bb630b96d06bb1c86d022d91c7592bc
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1181152
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use the nvgpu-internal buddy allocator for video memory allocations,
instead of nvmap. This allows better integration for copyengine, BAR1
mapping to userspace, etc.
Jira DNVGPU-38
Change-Id: I9fd67b76cd39721e4cd8e525ad0ed76f497e8b99
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1181151
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Added interface to allow kernel to create privileged CE channels for
page migration and clearing support between sysmem and videmem.
JIRA DNVGPU-53
Change-Id: I3e18d18403809c9e64fa45d40b6c4e3844992506
Signed-off-by: Lakshmanan M <lm@nvidia.com>
Reviewed-on: http://git-master/r/1173085
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Support multiple types of allocation backends. Currently there is
only one allocator implementation available: a buddy allocator.
Buddy allocators have certain limitations though. For one the
allocator requires metadata to be allocated from the kernel's
system memory. This causes a given buddy allocation to potentially
sleep on a kmalloc() call.
This patch has been created so that a new backend can be created
which will avoid any dynamic system memory management routines
from being called.
Bug 1781897
Change-Id: I98d6c8402c049942f13fee69c6901a166f177f65
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1172115
GVS: Gerrit_Virtual_Submit
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Yu-Huan Hsu <yhsu@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If page tables are allocated from vidmem, cpu cache flushing doesn't
make sense, so skip it. Unify also map/unmap actions if the pages are
not mapped.
Jira DNVGPU-20
Change-Id: I36b22749aab99a7bae26c869075f8073eab0f860
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1178830
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For devices that have vidmem available, use the vidmem allocator in
gk20a_gmmu_alloc{,attr,_map,_map_attr}. For others, use sysmem.
Because all of the buffers haven't been tested to work in vidmem yet,
rename calls to gk20a_gmmu_alloc{,attr,_map,_map_attr} to have _sys at
the end to declare explicitly that vidmem is used. Enabling vidmem for
each now is a matter of removing "_sys" from the function call.
Jira DNVGPU-18
Change-Id: Ibe42f67eff2c2b68c36582e978ace419dc815dc5
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1176805
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Propagate the buffer aperture flag in gk20a_locked_gmmu_map up so that
buffers represented as a mem_desc and present in vidmem can be mapped to
gpu.
JIRA DNVGPU-18
JIRA DNVGPU-76
Change-Id: I46cf87e27229123016727339b9349d5e2c835b3e
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1169308
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Modify page table updates to take an aperture flag (up until
gk20a_locked_gmmu_map()), don't hard-assume sysmem and propagate it to
hardware.
Jira DNVGPU-76
Change-Id: Ifcb22900c96db993068edd110e09368f72b06f69
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1169307
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
add gk20a_aperture_mask() for memory target selection now that buffers
can actually be allocated from vidmem, and use it in all cases that have
a mem_desc available.
Jira DNVGPU-76
Change-Id: I4353cdc6e1e79488f0875581cfaf2a5cfb8c976a
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1169306
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Revamp the support the nvgpu driver has for semaphores.
The original problem with nvgpu's semaphore support is that it
required a SW based wait for every semaphore release. This was
because for every fence that gk20a_channel_semaphore_wait_fd()
waited on a new semaphore was created. This semaphore would then
get released by SW when the fence signaled. This meant that for
every release there was necessarily a sync_fence_wait_async() call
which could block. The latency of this SW wait was enough to cause
massive degredation in performance.
To fix this a fast path was implemented. When a fence is passed to
gk20a_channel_semaphore_wait_fd() that is backed by a GPU semaphore
a semaphore acquire is directly used to block the GPU. No longer is
a sync_fence_wait_async() performed nor is there an extra semaphore
created.
To implement this fast path the semaphore memory had to be shared
between channels. Previously since a new semaphore was created
every time through gk20a_channel_semaphore_wait_fd() what address
space a semaphore was mapped into was irrelevant. However, when
using the fast path a sempahore may be released on one address
space but acquired in another.
Sharing the semaphore memory was done by making a fixed GPU mapping
in all channels. This mapping points to the semaphore memory (the
so called semaphore sea). This global fixed mapping is read-only to
make sure no semaphores can be incremented (i.e released) by a
malicious channel. Each channel then gets a RW mapping of it's own
semaphore. This way a channel may only acquire other channel's
semaphores but may both acquire and release its own semaphore.
The gk20a fence code was updated to allow introspection of the GPU
backed fences. This allows detection of when the fast path can be
taken. If the fast path cannot be used (for example when a fence is
sync-pt backed) the original slow path is still present. This gets
used when the GPU needs to wait on an event from something which
only understands how to use sync-pts.
Bug 1732449
JIRA DNVGPU-12
Change-Id: Ic0fea74994da5819a771deac726bb0d47a33c2de
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1133792
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use gk20a_mem_*() accessors for gpfifo memory in work submission instead
of direct cpu accesses in order to support other apertures than sysmem.
The gpfifo memory is still allocated from sysmem for dgpus too.
Split the copying of priv_cmds and the main gpfifo to be submitted in
gk20a_submit_channel_gpfifo() into separate functions.
JIRA DNVGPU-21
Change-Id: If271ca8e7e34235f00d31855dbccf77c0008e10b
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1145923
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add in-nvgpu APIs for allocating and freeing mem_descs in video memory.
Changes for gmmu tables etc. will be added in upcoming changes.
Video memory is allocated via nvmap by initially registering the
aperture size to it and binding it to a struct device, and then going
via the usual dma alloc. This API allows also fixed-address allocations,
meant for reserving special memory areas at boot.
The aperture registration is skipped completely if vidmem isn't found
for the particular device.
gk20a_gmmu_alloc_attr() still uses sysmem, and the unmap/free paths
select internally the correct path by the mem_desc's aperture.
Video memory allocation is off by default, and can be turned on with
CONFIG_GK20A_VIDMEM.
JIRA DNVGPU-16
Change-Id: I77eae5ea90cbed6f4b5db0da86c5f70ddf2a34f9
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1157216
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of going via gk20a_mem_{wr,rd}32() on each iteration, do direct
memcpy/memset with sysmem, and minimize the enter/exit overhead with
vidmem.
JIRA DNVGPU-23
Change-Id: I5437e35f8393a746777a40636c1e9b5d93ced1f6
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1159524
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
|