| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Update last IOCTL number to point to GET_TIMEOUT to allow its use.
Bug 1706457
Change-Id: I9c8cc4e972fc25e14ca5aff075eca72bc1807a0b
Signed-off-by: Chris Dragan <kdragan@nvidia.com>
(cherry-picked from commit f73444b0f92737919dafd1623a2dde60c467b25b)
Reviewed-on: http://git-master/r/921453
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Bug 1706457
Change-Id: Iab76bcb7cabc55d99b5acd932716d30da6f01b46
Signed-off-by: Chris Dragan <kdragan@nvidia.com>
Reviewed-on: http://git-master/r/835852
Reviewed-on: http://git-master/r/836454
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
SM locking & register reads Order has been changed.
Also, functions have been implemented based on gk20a
and gm20b.
Change-Id: Iaf720d088130f84c4b2ca318d9860194c07966e1
Signed-off-by: sujeet baranwal <sbaranwal@nvidia.com>
Signed-off-by: ashutosh jain <ashutoshj@nvidia.com>
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/837236
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
JIRA VFND-1005
Bug 1594604
Change-Id: Ic159a1aff9cee508194f1f5dff7a16eb0e47ad64
Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Reviewed-on: http://git-master/r/833498
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Aingara Paramakuru <aparamakuru@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add IOCTL NVGPU_IOCTL_CHANNEL_WDT to disable/enable
watchdog per-channel
Also, if watchdog is disabled, we currently schedule
the worker with MAX timeout.
Instead of this, do not schedule any worker if
watchdog is disabled
Bug 1683059
Bug 1700277
Change-Id: I7f6bec84adeedb74e014ed6d1471317b854df84c
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/837962
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In job submission path, we always take refcount on all
the mapped buffers to safeguard against case where user
space releases the buffer early
But in case user space itself is doing proper buffer
management, kernel need not take refcounts on all the
buffers - which is also a overhead in submit path
Hence, provide a new submit flag
NVGPU_SUBMIT_GPFIFO_FLAGS_SKIP_BUFFER_REFCOUNTING to
optionally skip taking refcounts on all the buffers
Also, if we do not take refcounts, then no need to drop
any refcounts in gk20a_channel_update() as well
Bug 1698667
Bug 200141116
Change-Id: I81bb7a03240300b691c70bcec04ea1badd5934f4
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/824718
(cherry picked from commit 8c8978fa303ec4e6db0233becdbdcbad4a248173)
Reviewed-on: http://git-master/r/835801
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implement NVGPU_GPU_IOCTL_ALLOC_AS_FLAGS_USERSPACE_MANAGED, which
enables creating userspace-managed GPU address spaces.
When an address space is marked as userspace-managed, the following
changes are in effect:
- Only fixed-address mappings are allowed.
- VA space allocation for fixed-address mappings is not required,
except to mark space as sparse.
- Maps and unmaps are always immediate. In particular, the mapping
ref increments at kickoffs and decrements at job completion are
skipped.
Bug 1614735
Bug 1623949
Bug 1660392
Change-Id: I834fe19b3f65e9b02c268952383eddee0e465759
Signed-off-by: Sami Kiminki <skiminki@nvidia.com>
Reviewed-on: http://git-master/r/738558
Reviewed-on: http://git-master/r/833253
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implement NVGPU_GPU_IOCTL_GET_BUFFER_INFO. The new IOCTL can be used
to identify buffers and retrieve their sizes. This allows the
userspace to be agnostic to the dmabuf implementation, as the generic
dmabuf fd interface does not have a reliable way for buffer
identification.
Bug 1614735
Bug 1623949
Bug 1660392
Change-Id: Ic3dd0a9385c9852778110ccb80636dd6f4f36208
Signed-off-by: Sami Kiminki <skiminki@nvidia.com>
Reviewed-on: http://git-master/r/822845
Reviewed-on: http://git-master/r/833252
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add new IOCTL NVGPU_IOCTL_TSG_SET_PRIORITY to allow
setting timeslice for entire TSG
Return error from channel specific IOCTL_CHANNEL_SET_PRIORITY
if the channel is part of TSG
Separate out API gk20a_channel_get_timescale_from_timeslice()
to get timeslice_timeout and scale from timeslice period
Use this API to get timeslice_timeout and scale for TSG and
store it in tsg_gk20a structure
Then trigger runlist update so that new timeslice values
will be re-written to runlist for TSG
Bug 200146615
Change-Id: I555467d034f81b372b31372f0835d72b1c159508
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/824206
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add IOCTL NVGPU_DBG_GPU_IOCTL_TIMEOUT to support
disabling/re-enabling scheduler timeout from user space
If user space application is closed without re-enabling
the timeouts, kernel will restore the timeouts' state
while releasing the debug session
This is needed for debugging purpose
Bug 1514061
Change-Id: I32efb47ad09d793f3e7fd8f0aaa9720c8bc91272
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/788176
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Get rid of the duplicate gpfifo struct to emphasize the fact that
nvgpu_gpfifo is the only memory layout for gpfifo entries that works.
This is the same layout that HW uses.
Also, add a local pointer to the gpfifo memory in
gk20a_submit_channel_gpfifo to get rid of repeated typecasts.
Bug 1592391
Bug 1550886
Change-Id: I5432859ef8e7c1aab5907e44098994d7bb807f50
Signed-off-by: Janne Hellsten <jhellsten@nvidia.com>
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/677341
(cherry picked from commit 724c8c6228af81dd440e825bddf545dd6b2b8bd7)
Reviewed-on: http://git-master/r/822548
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Arto Merilainen <amerilainen@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Move the gr ctx management to the GPU HAL. Also,
add support for a new interface to allocate gr ctxsw
buffers.
Bug 1677153
Change-Id: I5a7980acf4de0de7dbd94b7dd20f91a6196dc989
Signed-off-by: Aingara Paramakuru <aparamakuru@nvidia.com>
Reviewed-on: http://git-master/r/806961
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: http://git-master/r/817009
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The server now exposes a new GMMU map interface that
can accept a scatter-gather list. This is needed to support
SMMU-bypass configurations.
Bug 1677153
JIRA VFND-689
Change-Id: I7b5af145db57dcebe2c9125ec90c689798d7e69e
Signed-off-by: Aingara Paramakuru <aparamakuru@nvidia.com>
Reviewed-on: http://git-master/r/792558
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Richard Zhao <rizhao@nvidia.com>
Reviewed-by: Vladislav Buzov <vbuzov@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add vgpu framework and build for T18x.
Bug 1677153
JIRA VFND-693
Change-Id: Icf9fd8e0b5769228aee59c54f9b000b992e5fcca
Signed-off-by: Aingara Paramakuru <aparamakuru@nvidia.com>
Reviewed-on: http://git-master/r/792559
Reviewed-on: http://git-master/r/806178
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add support for CDE scatter buffers. When the bus addresses for
surfaces are not contiguous as seen by the GPU (e.g., when SMMU is
bypassed), CDE swizzling needs additional per-page information. This
information is populated in a scatter buffer when required.
Bug 1604102
Change-Id: I3384e2cfb5d5f628ed0f21375bdac8e36b77ae4f
Signed-off-by: Jussi Rasanen <jrasanen@nvidia.com>
Reviewed-on: http://git-master/r/789436
Reviewed-on: http://git-master/r/791243
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- add hal initializaiton
- create folders vgpu/gk20a and vgpu/gm20b for specific code
Bug 1653185
Change-Id: If94d45e22a1d73d2e4916673736cc29751be4e40
Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Reviewed-on: http://git-master/r/774148
GVS: Gerrit_Virtual_Submit
Reviewed-by: Aingara Paramakuru <aparamakuru@nvidia.com>
Reviewed-by: Ken Adams <kadams@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add batch support for mapping and unmapping. Batching essentially
helps transform some per-map/unmap overhead to per-batch overhead,
namely gk20a_busy()/gk20a_idle() calls, GPU L2 flushes, and GPU TLB
invalidates. Batching with size 64 has been measured to yield >20x
speed-up in low-level fixed-address mapping microbenchmarks.
Bug 1614735
Bug 1623949
Change-Id: Ie22b9caea5a7c3fc68a968d1b7f8488dfce72085
Signed-off-by: Sami Kiminki <skiminki@nvidia.com>
Reviewed-on: http://git-master/r/733231
(cherry picked from commit de4a7cfb93e8228a4a0c6a2815755a8df4531c91)
Reviewed-on: http://git-master/r/763812
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
this ioctl can be called only by a ctrlfd
created from the /dev/nvhost-ctrl-gpu node
therefore NVGPU_GPU_IOCTL_MAGIC, not
NVGPU_DBG_GPU_IOCTL_MAGIC should be used
for this
bug 200111987
Change-Id: I9fce7eae9f8203a15270ac1d25b575aebd9ccf88
Signed-off-by: Sam Payne <spayne@nvidia.com>
Reviewed-on: http://git-master/r/755164
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
(cherry picked from commit 609d45ddd98c31ecd089d2e213ee1b6c560fc21e)
Reviewed-on: http://git-master/r/760830
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Client notification support is now added for the following:
- stalling and non-stalling GR sema release
- non-stalling FIFO channel intr
- non-stalling CE2 nonblockpipe intr
Bug 200097077
Change-Id: Icd3c076d7880e1c9ef1fcc0fc58eed9f23f39277
Signed-off-by: Aingara Paramakuru <aparamakuru@nvidia.com>
Reviewed-on: http://git-master/r/736064
(cherry picked from commit 0585d1f14d5a5ae1ccde8ccb7b7daa5593b3d1bc)
Reviewed-on: http://git-master/r/759824
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
| |
Bug 1587825
Change-Id: I66f2988b7f1884b53bb8f3cd09ad1ead1652ffda
Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: http://git-master/r/751484
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add reference counting for channels, and wait for reference count to
get to 0 in gk20a_channel_free() before actually freeing the channel.
Also, change free channel tracking a bit by employing a list of free
channels, which simplifies the procedure of finding available channels
with reference counting.
Each use of a channel must have a reference taken before use or held
by the caller. Taking a reference of a wild channel pointer may fail, if
the channel is either not opened or in a process of being closed. Also,
add safeguards for protecting accidental use of closed channels,
specifically, by setting ch->g = NULL in channel free. This will make it
obvious if freed channel is attempted to be used.
The last user of a channel might be the deferred interrupt handler,
so wait for deferred interrupts to be processed twice in the channel
free procedure: once for providing last notifications to the channel
and once to make sure there are no stale pointers left after referencing
to the channel has been denied.
Finally, fix some races in channel and TSG force reset IOCTL path,
by pausing the channel scheduler in gk20a_fifo_recover_ch() and
gk20a_fifo_recover_tsg(), while the affected engines have been identified,
the appropriate MMU faults triggered, and the MMU faults handled. In this
case, make sure that the MMU fault does not attempt to query the hardware
about the failing channel or TSG ids. This should make channel recovery
more safe also in the regular (i.e., not in the interrupt handler) context.
Bug 1530226
Bug 1597493
Bug 1625901
Bug 200076344
Bug 200071810
Change-Id: Ib274876908e18219c64ea41e50ca443df81d957b
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Signed-off-by: Sami Kiminki <skiminki@nvidia.com>
Reviewed-on: http://git-master/r/448463
(cherry picked from commit 3f03aeae64ef2af4829e06f5f63062e8ebd21353)
Reviewed-on: http://git-master/r/755147
Reviewed-by: Automatic_Commit_Validation_User
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For both adding and querying zbc entry, added callbacks in gr ops.
Native gpu driver (gk20a) and vgpu will both hook there. For vgpu, it
will add or query zbc entry from RM server.
Bug 1558561
Change-Id: If8a4850ecfbff41d8592664f5f93ad8c25f6fbce
Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Reviewed-on: http://git-master/r/732775
(cherry picked from commit a3787cf971128904c2712338087685b02673065d)
Reviewed-on: http://git-master/r/737880
(cherry picked from commit fca2a0457c968656dc29455608f35acab094d816)
Reviewed-on: http://git-master/r/753278
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
That is a kernel supporting code for cyclestats mode E.
Cyclestats mode E implemented following Windows-design in user-space
and required the following operations to be implemented:
- attach a client for shared hardware buffer of device
- detach client from shared hardware buffer
- flush means copy of available data from hardware buffer to private
client buffers according to perfmon IDs assigned for clients
- perfmon IDs management for user-space clients
- a NVGPU_GPU_FLAGS_SUPPORT_CYCLE_STATS_SNAPSHOT capability added
Bug 1573150
Change-Id: I9e09f0fbb2be5a95c47e6d80a2e23fa839b46f9a
Signed-off-by: Leonid Moiseichuk <lmoiseichuk@nvidia.com>
Reviewed-on: http://git-master/r/740653
(cherry picked from commit 79fe89fd4cea39d8ab9dbef0558cd806ddfda87f)
Reviewed-on: http://git-master/r/753274
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implement NVGPU_AS_IOCTL_GET_BUFFER_COMPBITS_INFO for requesting info
on compbits-mappable buffers; and NVGPU_AS_IOCTL_MAP_BUFFER_COMPBITS,
which enables mapping compbits to the GPU address space of said
buffers. This, subsequently, enables moving comptag swizzling from GPU
to CDEH/CDEV formats to userspace.
Compbits mapping is conservative and it may map more than what is
strictly needed. This is because two reasons: 1) mapping must be done
on small page alignment (4kB), and 2) GPU comptags are swizzled all
around the aggregate cache line, which means that the whole cache line
must be visible even if only some comptag lines are required from
it. Cache line size is not necessarily a multiple of the small page
size.
Bug 200077571
Change-Id: I5ae88fe6b616e5ea37d3bff0dff46c07e9c9267e
Signed-off-by: Sami Kiminki <skiminki@nvidia.com>
Reviewed-on: http://git-master/r/719710
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implement NVGPU_AS_MAP_BUFFER_FLAGS_MAPPABLE_COMPBITS, which adds
extra alignment to compbits allocation for safe compbits mapping.
Bug 200077571
Change-Id: I3a74ebb81412e4e1e69501debeb9ef4e2056ef1a
Signed-off-by: Sami Kiminki <skiminki@nvidia.com>
Reviewed-on: http://git-master/r/730763
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/740693
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add support for reading num FBPs and FBP enable mask.
Bug 1621056
Change-Id: I92ec1123373308ed280d4ffd30fe77ae6073ac45
Signed-off-by: Aingara Paramakuru <aparamakuru@nvidia.com>
Reviewed-on: http://git-master/r/715826
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
| |
New members are added in nvgpu_gpu_characterstics to export more
information required specially from CUDA tools.
Change-Id: I907f3bcbd272405a13f47ef6236bc2cff01c6c80
Signed-off-by: Sujeet Baranwal <sbaranwal@nvidia.com>
Reviewed-on: http://git-master/r/679202
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current CUDA drivers have been using the regops to
directly accessing the GPU registers from user space through
the dbg node. This is a security hole and needs to be avoided.
The patch alternatively implements the similar functionality
in the kernel and provide an ioctl for it.
Bug 200083334
Change-Id: Ic5ff5a215cbabe7a46837bc4e15efcceb0df0367
Signed-off-by: sujeet baranwal <sbaranwal@nvidia.com>
Reviewed-on: http://git-master/r/711758
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add support for unmapped ptes during gmmu map.
Bug 1587825
Change-Id: I6e42ef58bae70ce29e5b82852f77057855ca9971
Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: http://git-master/r/696507
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use busy looping for l2 tag flush and elpg flush
operations. This is making total flash time more
accurate and reduced overall time compared with
usleep. Also added trace points to measure
performance for these operations.
Also corrected timeout error check for non-silicon
platforms.
Bug 200081799
Change-Id: I63410bb7528db9258501633996fbdee5fdec1c74
Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: http://git-master/r/710472
(cherry picked from commit 18684cf9d5d6870a1a1fd5711c4fc2d733caad20)
Reviewed-on: http://git-master/r/710986
GVS: Gerrit_Virtual_Submit
Reviewed-by: Yu-Huan Hsu <yhsu@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Allow enabling of PC sampling hardware workaround. It is only
applicable to gm20b.
Bug 1517458
Bug 1573150
Change-Id: Iad6a3ae556489fb7ab9628637d291849d2cd98ea
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/710421
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Adds a new host1x pushbuffer queue
- Includes support for virtualized VIC
Bug 1509609
Change-Id: Ie9b604dc8d8e43166dedb13953c4edac813da18b
Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com>
Reviewed-on: http://git-master/r/655506
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Arto Merilainen <amerilainen@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
CUDA devtools need to be able to flush the GPU's cache
in a sideband fashion and so cannot use methods. This
change implements an nvgpu_gpu_ioctl to flush and
optionally invalidate the GPU's L2 cache and flush fb.
Change-Id: Ib06a0bc8d8880ffbfe4b056518cc3c3df0cc4988
Signed-off-by: sujeet baranwal <sbaranwal@nvidia.com>
Signed-off-by: Mayank Kaushik <mkaushik@nvidia.com>
Reviewed-on: http://git-master/r/671809
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Map/unmap buffers for HWPM and deal with its instance block, the minimum
work required to run the HWPM via regops from userspace.
Bug 1517458
Bug 1573150
Change-Id: If14086a88b54bf434843d7c2fee8a9113023a3b0
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/673689
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add the ioctl to open a new gpu channel to also the control node for
improved process startup performance, in addition to the current open
ioctl in the channel node. The new channel fd creation is refactored to
a separate function which is called from both ctrl and channel ioctls.
Bug 1604952
Change-Id: I3357ceec694c0e6d7a85807183884324cb725d3a
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/679516
Reviewed-by: Sami Kiminki <skiminki@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use busy looping on L2 and TLB maintenance operations. This speeds
them up by an order of magnitude.
Add also trace points to measure performance for memory ops and
interrupt processing.
Change-Id: Ic4a8525d3d946b2b8f57b4b8ddcfc61605619399
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/681640
|
|
|
|
|
|
|
|
|
|
|
|
| |
Handle the gr and fifo exceptions delivered from the server
and update the channel state as needed.
Bug 1551865
Change-Id: Ie19626c6e8a72f92ffd134983fe6d84e5c6c8736
Signed-off-by: Aingara Paramakuru <aparamakuru@nvidia.com>
Reviewed-on: http://git-master/r/670329
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a couple of trace points for tracking when we need to wait for
space in the gpfifo ring buffer. This wait can introduce significant
latencies to rendering with large gpfifo entry inputs so it's good to
be able to measure how often this path is taken.
Bug 1592391
Bug 1550886
Change-Id: I7f362e9c307eeffeeecaaba268ef2e3613e54597
Signed-off-by: Janne Hellsten <jhellsten@nvidia.com>
Reviewed-on: http://git-master/r/674021
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
| |
For compute channel on gk20a, set lockboost size to zero.
Bug 1573856
Change-Id: I369cebf72241e4017e7d380c82caff6014e42984
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/594843
GVS: Gerrit_Virtual_Submit
|
|
|
|
|
|
|
|
|
|
|
|
| |
Trace cde context allocation and deallocation with ftrace.
Bug 200052943
Change-Id: Ieeb625166662971fb3eb3fb29c986fdb6809c10b
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/602886
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
| |
Bug 200052943
Change-Id: Ied6454bbfb5df9ab29497ecbf2aac495f6d89362
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/602887
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
gops->gr.detect_sm_arch was not populated for vgpu. Also,
populate some members of the PMU VM struct as they are used
to report GPU characteristics to userspace.
Bug 1576949
Change-Id: I9ddc361d1418b942da97a82b553aac81f5f51182
Signed-off-by: Aingara Paramakuru <aparamakuru@nvidia.com>
Reviewed-on: http://git-master/r/601931
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Arto Merilainen <amerilainen@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implement NVGPU_AS_IOCTL_GET_VA_REGIONS which returns a list of GPU VA
regions for different page sizes. This is required for the userspace
for safe fixed-address address space allocation.
Bug 1551752
Change-Id: I63ddde30935db2471bec498dae0caa870e89c1a5
Signed-off-by: Sami Kiminki <skiminki@nvidia.com>
Reviewed-on: http://git-master/r/590814
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add the following info into GPU characteristics: available big page
sizes, support indicators for sync fence fds and cycle stats, gpc
mask, SM version, SM SPA version and warp count, and IOCTL interface
levels. Also, add new IOCTL to fetch TPC masks.
Bug 1551769
Bug 1558186
Change-Id: I8a47d882645f29c7bf0c8f74334ebf47240e41de
Signed-off-by: Sami Kiminki <skiminki@nvidia.com>
Reviewed-on: http://git-master/r/562904
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Some kernel APIs rely on user space knowing class numbers. Allow
querying the numbers from kernel.
Bug 1567274
Change-Id: Idec2fe8ee983ee74bcbf9dfc98f71bbcc1492cfb
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/594402
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Kernel support for allowing a GPU debugger to suspend and resume
SMs. Invocation of "suspend" on a given channel will suspend all
SMs if the channel is resident, else remove the channel form the
runlist. Similarly, "resume" will either resume all SMs if the
channel was resident, or re-enable the channel in the runlist.
Change-Id: I3b4ae21dc1b91c1059c828ec6db8125f8a0ce194
Signed-off-by: sujeet baranwal <sbaranwal@nvidia.com>
Signed-off-by: Mayank Kaushik <mkaushik@nvidia.com>
Reviewed-on: http://git-master/r/552115
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
| |
nvgpu framework and build for T18x
Bug 1567274
Change-Id: I77835302a1110573008869d1106eface512bb9b1
Signed-off-by: Ken Adams <kadams@nvidia.com>
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
| |
Add ioctl to nvhost-ctrl to create a new TSG.
Bug 200042993
Change-Id: Icdd0edb1d9e374740ace6da9eb3a10c57c62617a
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implement support for 64kB large page size. Add an API to create an
address space via IOCTL so that we can accept flags, and assign one
flag for enabling 64kB large page size.
Also adds APIs to set per-context large page size. This is possible
only on Maxwell, so return error if caller tries to set large page
size on Kepler.
Default large page size is still 128kB.
Change-Id: I20b51c8f6d4a984acae8411ace3de9000c78e82f
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Correct some old comments and remove uses of the BIT macro to make it
easier to sync this file to userspace.
Change-Id: Ie897fc73e28b8194e0c5357eef7ae233395e9ba3
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/552916
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Lauri Peltonen <lpeltonen@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
|