aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/gpu/drm/i915/intel_ringbuffer.h
Commit message (Collapse)AuthorAge
* drm/i915: Snapshot seqno of most recently submitted request.Tomas Elf2015-07-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The hang checker needs to inspect whether or not the ring request list is empty as well as if the given engine has reached or passed the most recently submitted request. The problem with this is that the hang checker cannot grab the struct_mutex, which is required in order to safely inspect requests since requests might be deallocated during inspection. In the past we've had kernel panics due to this very unsynchronized access in the hang checker. One solution to this problem is to not inspect the requests directly since we're only interested in the seqno of the most recently submitted request - not the request itself. Instead the seqno of the most recently submitted request is stored separately, which the hang checker then inspects, circumventing the issue of synchronization from the hang checker entirely. This fixes a regression introduced in commit 44cdd6d219bc64f6810b8ed0023a4d4db9e0fe68 Author: John Harrison <John.C.Harrison@Intel.com> Date: Mon Nov 24 18:49:40 2014 +0000 drm/i915: Convert 'ring_idle()' to use requests not seqnos v2 (Chris Wilson): - Pass current engine seqno to ring_idle() from i915_hangcheck_elapsed() rather than compute it over again. - Remove extra whitespace. Issue: VIZ-5998 Signed-off-by: Tomas Elf <tomas.elf@intel.com> Cc: stable@vger.kernel.org Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: Add regressing commit citation provided by Chris.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915: Extend the parser to check register writes against a mask/value pair.Francisco Jerez2015-06-15
| | | | | | | | | | | | | | | | | | In some cases it might be unnecessary or dangerous to give userspace the right to write arbitrary values to some register, even though it might be desirable to give it control of some of its bits. This patch extends the register whitelist entries to contain a mask/value pair in addition to the register offset. For registers with non-zero mask, any LRM writes and LRI writes where the bits of the immediate given by the mask don't match the specified value will be rejected. This will be used in my next patch to grant userspace partial write access to some sensitive registers. Signed-off-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Jani Nikula <jani.nikula@intel.com>
* drm/i915: Split the batch pool by engineChris Wilson2015-04-10
| | | | | | | | | | | | | | | | | | | | I woke up one morning and found 50k objects sitting in the batch pool and every search seemed to iterate the entire list... Painting the screen in oils would provide a more fluid display. One issue with the current design is that we only check for retirements on the current ring when preparing to submit a new batch. This means that we can have thousands of "active" batches on another ring that we have to walk over. The simplest way to avoid that is to split the pools per ring and then our LRU execution ordering will also ensure that the inactive buffers remain at the front. v2: execlists still requires duplicate code. v3: execlists requires more duplicate code Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915: Move common request allocation code into a common functionJohn Harrison2015-04-01
| | | | | | | | | | | | | | | The request allocation code is largely duplicated between legacy mode and execlist mode. The actual difference between the two versions of the code is pretty minimal. This patch moves the common code out into a separate function. This is then called by the execution specific version prior to setting up the one different value. For: VIZ-5190 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Tomas Elf <tomas.elf@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915: add frontbuffer tracking to FBCPaulo Zanoni2015-03-17
| | | | | | | | | | | | | | | | | | Kill the blt/render tracking we currently have and use the frontbuffer tracking infrastructure. Don't enable things by default yet. v2: (Rodrigo) Fix small conflict on rebase and typo at subject. v3: (Paulo) Rebase on RENDER_CS change. v4: (Paulo) Rebase. v5: (Paulo) Simplify: flushes don't have origin (Daniel). Also rebase due to patch order changes. Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915: Rename 'flags' to 'dispatch_flags' for better code readingJohn Harrison2015-02-25
| | | | | | | | | | | | | | | | | | There is a flags word that is passed through the execbuffer code path all the way from initial decoding of the user parameters down to the very final dispatch buffer call. It is simply called 'flags'. Unfortuantely, there are many other flags words floating around in the same blocks of code. Even more once the GPU scheduler arrives. This patch makes it more obvious exactly which flags word is which by renaming 'flags' to 'dispatch_flags'. Note that the bit definitions for this flags word already have an 'I915_DISPATCH_' prefix on them and so are not quite so ambiguous. OTC-Jira: VIZ-1587 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> [danvet: Resolve conflict with Chris' rework of the bb parsing.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915: Shift driver's HWSP usage out of reserved rangeThomas Daniel2015-02-24
| | | | | | | | | | | | | | | | As of Gen6, the general purpose area of the hardware status page has shrunk and now begins at dword 0x30. i915 driver uses dword 0x20 to store the seqno which is now reserved. So shift our HWSP dwords up into the general purpose range before this bites us. Note that all available documentation just says this is reserved without going into details about what it's used for. Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> Reviewed-by: Dave Gordon <david.s.gordon@intel.com> [danvet: Add clarification from Thomas that unfortunately Bspec is silent on what "reserverd" precisely means.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915: Make intel_ring_setup_status_page() staticDamien Lespiau2015-02-13
| | | | | | | | | | This function is only used in intel_ringbuffer.c, so restrict it to that file. The function was moved around to avoid a forward declaration and group it with its user. Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> [danvet: Squash in fixup from Wu Fengguang.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915: Remove FIXME_lrc_ctx backpointerNick Hoath2015-01-27
| | | | | | | | | | | | | | | The first pass implementation of execlists required a backpointer to the context to be held in the intel_ringbuffer. However the context pointer is available higher in the call stack. Remove the backpointer from the ring buffer structure and instead pass it down through the call stack. v2: Integrate this changeset with the removal of duplicate request/execlist queue item members. v3: Rebase v4: Rebase. Remove passing of context when the request is passed. Signed-off-by: Nick Hoath <nicholas.hoath@intel.com> Reviewed-by: Thomas Daniel <thomas.daniel@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915: Removed duplicate members from submit_requestNick Hoath2015-01-27
| | | | | | | | | | | | | | | | Where there were duplicate variables for the tail, context and ring (engine) in the gem request and the execlist queue item, use the one from the request and remove the duplicate from the execlist queue item. Issue: VIZ-4274 v1: Rebase v2: Fixed build issues. Keep separate postfix & tail pointers as these are used in different ways. Reinserted missing full tail pointer update. Signed-off-by: Nick Hoath <nicholas.hoath@intel.com> Reviewed-by: Thomas Daniel <thomas.daniel@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915: s/init()/init_hw()/ in intel_engine_csDaniel Vetter2014-12-03
| | | | | | | | | | | | This is (mostly, some exceptions that need fixing) the hw setup function which starts the ring. And not the function which allocates all the resources. Make this clear by giving it a better name. Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Reviewed-by: Dave Gordon <david.s.gordon@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915: Consolidate ring freespace calculationsDave Gordon2014-12-03
| | | | | | | | | | | | | | | | | | | | | | There are numerous places in the code where the driver's idea of how much space is left in a ring is updated using the driver's latest notions of the positions of 'head' and 'tail' for the ring. Among them are some that update one or both of these values before (re)doing the calculation. In particular, there are four different places in the code where 'last_retired_head' is copied to 'head' and then set to -1; and two of these do not have a guard to check that it has actually been updated since last time it was consumed, leaving the possibility that the dummy -1 can be transferred from 'last_retired_head' to 'head', causing the space calculation to produce 'impossible' results (previously seen on Android/VLV). This code therefore consolidates all the calculation and updating of these values, such that there is only one place where the ring space is updated, and it ALWAYS uses (and consumes) 'last_retired_head' if (and ONLY if) it has been updated since the last call. Signed-off-by: Dave Gordon <david.s.gordon@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915: Convert 'trace_irq' to use requests rather than seqnosJohn Harrison2014-12-03
| | | | | | | | | | | | | Updated the trace_irq code to use requests instead of seqnos. This includes reference counting the request object to ensure it sticks around when required. Note that getting access to the reference counting functions means moving the inline i915_trace_irq_get() function from intel_ringbuffer.h to i915_drv.h. For: VIZ-4377 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com> [danvet: Resolve conflict due to shuffled merge order.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915: Remove 'outstanding_lazy_seqno'John Harrison2014-12-03
| | | | | | | | | | | | The OLS value is now obsolete. Exactly the same value is guarateed to be always available as PLR->seqno. Thus it is safe to remove the OLS completely. And also to rename the PLR to OLR to keep the 'outstanding lazy ...' naming convention valid. For: VIZ-4377 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915: Replace last_[rwf]_seqno with last_[rwf]_reqJohn Harrison2014-12-03
| | | | | | | | | | | | | | | | | | | The object structure contains the last read, write and fenced seqno values for use in syncrhonisation operations. These have now been replaced with their request structure counterparts. Note that to ensure that objects do not end up with dangling pointers, the assignments of last_*_req include reference count updates. Thus a request cannot be freed if an object is still hanging on to it for any reason. v2: Corrected 'last_rendering_' to 'last_read_' in a number of comments that did not get updated when 'last_rendering_seqno' became 'last_read|write_seqno' several millenia ago. For: VIZ-4377 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915: Add helper functions to aid seqno -> request transitionJohn Harrison2014-12-03
| | | | | | | | | | | | | | | | Added helper functions for retrieving the ring and seqno entries from a request structure. This allows the internal workings of the request structure to be hidden from code that is using these. It also allows for useful workarounds/debug code to be added as or when necessary. Note that it is intended that the majority (if not all) uses of the seqno accessor will disappear eventually as code is updated to use the request structure itself rather than working with seqno values. For: VIZ-4377 Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915: Remove DRI1 ring accessors and APIChris Wilson2014-11-19
| | | | | | | | | | | | | | With the deprecation of UMS, and by association DRI1, we have a tough choice when updating the ring access routines. We either rewrite the DRI1 routines blindly without testing (so likely to be broken) or take the liberty of declaring them no longer supported and remove them entirely. This takes the latter approach. v2: Also remove the DRI1 sarea updates Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: Fix rebase conflicts.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915/bdw: Pin the ringbuffer backing object to GGTT on-demandThomas Daniel2014-11-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Same as with the context, pinning to GGTT regardless is harmful (it badly fragments the GGTT and can even exhaust it). Unfortunately, this case is also more complex than the previous one because we need to map and access the ringbuffer in several places along the execbuffer path (and we cannot make do by leaving the default ringbuffer pinned, as before). Also, the context object itself contains a pointer to the ringbuffer address that we have to keep updated if we are going to allow the ringbuffer to move around. v2: Same as with the context pinning, we cannot really do it during an interrupt. Also, pin the default ringbuffers objects regardless (makes error capture a lot easier). v3: Rebased. Take a pin reference of the ringbuffer for each item in the execlist request queue because the hardware may still be using the ringbuffer after the MI_USER_INTERRUPT to notify the seqno update is executed. The ringbuffer must remain pinned until the context save is complete. No longer pin and unpin ringbuffer in populate_lr_context() - this transient address is meaningless and the pinning can cause a sleep while atomic. v4: Moved ringbuffer pin and unpin into the lr_context_pin functions. Downgraded pinning check BUG_ONs to WARN_ONs. v5: Reinstated WARN_ONs for unexpected execlist states. Removed unused variable. Issue: VIZ-4277 Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> Reviewed-by: Akash Goel <akash.goels@gmail.com> Reviewed-by: Deepak S<deepak.s@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915/bdw: Clean up execlist queue items in retire_workThomas Daniel2014-11-19
| | | | | | | | | | | | | | | | | | | | No longer create a work item to clean each execlist queue item. Instead, move retired execlist requests to a queue and clean up the items during retire_requests. v2: Fix legacy ring path broken during overzealous cleanup v3: Update idle detection to take execlists queue into account v4: Grab execlist lock when checking queue state v5: Fix leaking requests by freeing in execlists_retire_requests. Issue: VIZ-4274 Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> Reviewed-by: Deepak S <deepak.s@linux.intel.com> Reviewed-by: Akash Goel <akash.goels@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915: Initialize workarounds in logical ring mode tooMichel Thierry2014-11-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Following the legacy ring submission example, update the ring->init_context() hook to support the execlist submission mode. v2: update to use the new workaround macros and cleanup unused code. This takes care of both bdw and chv workarounds. v2.1: Add missing call to init_context() during deferred context creation. v3: Split init_context (emit) in legacy/lrc modes. For lrc, get the ringbuf from the context (Mika/Daniel). v4: Merge init_context interfaces back, the legacy mode only needs the ring, but the lrc mode needs the ring and context (Mika). Issue: VIZ-4092 Issue: GMIN-3475 Change-Id: Ie3d093b2542ab0e2a44b90460533e2f979788d6c Cc: Deepak S <deepak.s@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Daniel Vetter <daniel@ffwll.ch> Signed-off-by: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> [danvet: Align function paramater lists properly.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915/bdw: Apply workarounds in render ring init functionArun Siluvery2014-09-03
| | | | | | | | | | | | | | | | | | | | | | | | For BDW workarounds are currently initialized in init_clock_gating() but they are lost during reset, suspend/resume etc; this patch moves the WAs that are part of register state context to render ring init fn otherwise default context ends up with incorrect values as they don't get initialized until init_clock_gating fn. v2: Add workarounds to golden render state This method has its own issues, first of all this is different for each gen and it is generated using a tool so adding new workaround and mainitaining them across gens is not a straightforward process. v3: Use LRIs to emit these workarounds (Ville) Instead of modifying the golden render state the same LRIs are emitted from within the driver. v4: Use abstract name when exporting gen specific routines (Chris) For: VIZ-4092 Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915/bdw: Handle context switch eventsThomas Daniel2014-08-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Handle all context status events in the context status buffer on every context switch interrupt. We only remove work from the execlist queue after a context status buffer reports that it has completed and we only attempt to schedule new contexts on interrupt when a previously submitted context completes (unless no contexts are queued, which means the GPU is free). We canot call intel_runtime_pm_get() in an interrupt (or with a spinlock grabbed, FWIW), because it might sleep, which is not a nice thing to do. Instead, do the runtime_pm get/put together with the create/destroy request, and handle the forcewake get/put directly. Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> v2: Unreferencing the context when we are freeing the request might free the backing bo, which requires the struct_mutex to be grabbed, so defer unreferencing and freeing to a bottom half. v3: - Ack the interrupt inmediately, before trying to handle it (fix for missing interrupts by Bob Beckett <robert.beckett@intel.com>). - Update the Context Status Buffer Read Pointer, just in case (spotted by Damien Lespiau). v4: New namespace and multiple rebase changes. v5: Squash with "drm/i915/bdw: Do not call intel_runtime_pm_get() in an interrupt", as suggested by Daniel. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> [danvet: Checkpatch ...] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915/bdw: Two-stage execlist submit processMichel Thierry2014-08-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Context switch (and execlist submission) should happen only when other contexts are not active, otherwise pre-emption occurs. To assure this, we place context switch requests in a queue and those request are later consumed when the right context switch interrupt is received (still TODO). v2: Use a spinlock, do not remove the requests on unqueue (wait for context switch completion). Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> v3: Several rebases and code changes. Use unique ID. v4: - Move the queue/lock init to the late ring initialization. - Damien's kmalloc review comments: check return, use sizeof(*req), do not cast. v5: - Do not reuse drm_i915_gem_request. Instead, create our own. - New namespace. Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v1) Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2-v5) Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> [davnet: Checkpatch + wash-up s/BUG_ON/WARN_ON/.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915: Add temporary ring->ctx backpointerOscar Mateo2014-08-14
| | | | | | | | | | | | | | | | | | | The execlist patches have a bit a convoluted and long history and due to that have the actual submission still misplaced deeply burried in the low-level ringbuffer handling code. This design goes back to the legacy ringbuffer code with its tricky lazy request and simple work submissiion using ring tail writes. For that reason they need a ring->ctx backpointer. The goal is to unburry that code and move it up into a level where the full execlist context is available so that we can ditch this backpointer. Until that's done make it really obvious that there's work still to be done. Cc: Oscar Mateo <oscar.mateo@intel.com> Cc: Thomas Daniel <thomas.daniel@intel.com> Acked-by: Thomas Daniel <thomas.daniel@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915/bdw: GEN-specific logical ring emit batchbuffer startOscar Mateo2014-08-11
| | | | | | | | | | Dispatch_execbuffer's evil twin. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> [danvet: Ditch the check for aliasing ppgtt. It'll break soon and execlists requires full ppgtt anyway.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915/bdw: Interrupts with logical ringsOscar Mateo2014-08-11
| | | | | | | | | | | | | | | | | | | | | | We need to attend context switch interrupts from all rings. Also, fixed writing IMR/IER and added HWSTAM at ring init time. Notice that, if added to irq_enable_mask, the context switch interrupts would be incorrectly masked out when the user interrupts are due to no users waiting on a sequence number. Therefore, this commit adds a bitmask of interrupts to be kept unmasked at all times. v2: Disable HWSTAM, as suggested by Damien (nobody listens to these interrupts, anyway). v3: Add new get/put_irq functions. Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> (v1) Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2 & v3) Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> [danvet: Drop the GEN8_ prefix from the context switch interrupt define and move it to its brethren.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915/bdw: GEN-specific logical ring emit flushOscar Mateo2014-08-11
| | | | | | | | | | | | | | | | Same as the legacy-style ring->flush. v2: The BSD invalidate bit still exists in GEN8! Add it for the VCS rings (but still consolidate the blt and bsd ring flushes into one). This was noticed by Brad Volkin. v3: The command for BSD and for other rings is slightly different: get it exactly the same as in gen6_ring_flush + gen6_bsd_ring_flush Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> [danvet: Checkpatch.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915/bdw: GEN-specific logical ring emit requestOscar Mateo2014-08-11
| | | | | | | | | | | | | | Very similar to the legacy add_request, only modified to account for logical ringbuffer. v2: Use MI_GLOBAL_GTT, as suggested by Brad Volkin. v3: Unify render and non-render in the same function, as noticed by Brad Volkin. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915/bdw: New logical ring submission mechanismOscar Mateo2014-08-11
| | | | | | | | | | | | | | | | | | | | | | | | | Well, new-ish: if all this code looks familiar, that's because it's a clone of the existing submission mechanism (with some modifications here and there to adapt it to LRCs and Execlists). And why did we do this instead of reusing code, one might wonder? Well, there are some fears that the differences are big enough that they will end up breaking all platforms. Also, Execlists offer several advantages, like control over when the GPU is done with a given workload, that can help simplify the submission mechanism, no doubt. I am interested in getting Execlists to work first and foremost, but in the future this parallel submission mechanism will help us to fine tune the mechanism without affecting old gens. v2: Pass the ringbuffer only (whenever possible). Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> [danvet: Appease checkpatch. Again. And drop the legacy sarea gunk that somehow crept in.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915/bdw: GEN-specific logical ring initOscar Mateo2014-08-11
| | | | | | | | | | | | | | | Logical rings do not need most of the initialization their legacy ringbuffer counterparts do: we just need the pipe control object for the render ring, enable Execlists on the hardware and a few workarounds. v2: Squash with: "drm/i915: Extract pipe control fini & make init outside accesible". Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> [danvet: Make checkpatch happy.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915/bdw: Generic logical ring init and cleanupOscar Mateo2014-08-11
| | | | | | | | | | | | | | | | | | | Allocate and populate the default LRC for every ring, call gen-specific init/cleanup, init/fini the command parser and set the status page (now inside the LRC object). These are things all engines/rings have in common. Stopping the ring before cleanup and initializing the seqnos is left as a TODO task (we need more infrastructure in place before we can achieve this). v2: Check the ringbuffer backing obj for ring_is_initialized, instead of the context backing obj (similar, but not exactly the same). Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915/bdw: Add a context and an engine pointers to the ringbufferDaniel Vetter2014-08-11
| | | | | | | | | | | | | | Any given ringbuffer is unequivocally tied to one context and one engine. By setting the appropriate pointers to them, the ringbuffer struct holds all the infromation you might need to submit a workload for processing, Execlists style. v2: Drop ring->ctx since that looks terribly ill-defined for legacy ringbuffer submission. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v1) Acked-by: Damien Lespiau <damien.lespiau@intel.com> (v2) Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915/bdw: Allocate ringbuffers for Logical Ring ContextsOscar Mateo2014-08-11
| | | | | | | | | | | | | | | As we have said a couple of times by now, logical ring contexts have their own ringbuffers: not only the backing pages, but the whole management struct. In a previous version of the series, this was achieved with two separate patches: drm/i915/bdw: Allocate ringbuffer backing objects for default global LRC drm/i915/bdw: Allocate ringbuffer for user-created LRCs Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915: Don't accumulate hangcheck score on forward progressMika Kuoppala2014-08-07
| | | | | | | | | | | | | | | | | If the actual head has progressed forward inside a batch (request), don't accumulate hangcheck score. As the hangcheck score in increased only by acthd jumping backwards, the result is that we only declare an active batch as stuck if it is trapped inside a loop. Or that the looping will dominate the batch progression so that it overcomes the bonus that forward progress gives. v2: Improved commit message (Chris Wilson) Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> [danvet: s/active_loop/active (loop)/ as requested by Chris.] Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915: Generalize intel_ring_get_tail to take a ringbufOscar Mateo2014-07-08
| | | | | | | | | | | | Again, it's low-level enough to simply take a ringbuf and nothing else. Trivial change. Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915/bdw: implement semaphore signalBen Widawsky2014-07-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Semaphore signalling works similarly to previous GENs with the exception that the per ring mailboxes no longer exist. Instead you must define your own space, somewhere in the GTT. The comments in the code define the layout I've opted for, which should be fairly future proof. Ie. I tried to define offsets in abstract terms (NUM_RINGS, seqno size, etc). NOTE: If one wanted to move this to the HWSP they could. I've decided one 4k object would be easier to deal with, and provide potential wins with cache locality, but that's all speculative. v2: Update the macro to not need the other ring's ring->id (Chris) Update the comment to use the correct formula (Chris) v3: Move the macros the ringbuffer.h to prevent churn in next patch (Ville) v4: Fixed compilation rebase conflict commit 1ec9e26ddab06459e89a890431b2de064c5d1056 Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Fri Feb 14 14:01:11 2014 +0100 drm/i915: Consolidate binding parameters into flags v5: VCS2 rebase Replace hweight_long with hweight32 v6 (Rodrigo): * Add missed VC2 gen8 ring signal init * fixing conflicst on rebase * minor fixes on address table * remove WARN_ON Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> [danvet: s/BUG_ON/WARN_ON/] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915: Updating comments.Rodrigo Vivi2014-07-07
| | | | | | | | | | | | ring index calculation table was out of date after other rings were added, although the formula is flexible and scale when adding new rings. So this patch just update the comments and add a brief explanation why to use sync_seqno[ring index]. Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915: Reorder semaphore deadlock checkChris Wilson2014-06-11
| | | | | | | | | | | | | | | | | | If a semaphore is waiting on another ring, which in turn happens to be waiting on the first ring, but that second semaphore has been signalled, we will be able to kick the second ring and so can treat the first ring as a valid WAIT and not as HUNG. v2: Be paranoid and cap the potential recursion depth whilst visiting the semaphore signallers. (Mika) References: https://bugs.freedesktop.org/show_bug.cgi?id=54226 References: https://bugs.freedesktop.org/show_bug.cgi?id=75502 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: stable@vger.kernel.org Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com>
* drm/i915: s/i915_hw_context/intel_contextOscar Mateo2014-05-22
| | | | | | | | | | | | | | | | | Up until now, contexts had one (and only one) backing object that was used by the hardware to save/restore render ring contexts (via the MI_SET_CONTEXT command). Other rings did not have or need this, so our i915_hw_context struct had a 1:1 relationship with a a real HW context. With Logical Ring Contexts and Execlists, this is not possible anymore: all rings need a backing object, and it cannot be reused. To prepare for that, rename our contexts to the more generic term intel_context. No functional changes. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915: Split the ringbuffers from the rings (3/3)Oscar Mateo2014-05-22
| | | | | | | | | | | Manual cleanup after the previous Coccinelle script. Yes, I could write another Coccinelle script to do this but I don't want labor-replacing robots making an honest programmer's work obsolete (also, I'm lazy). Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915: Split the ringbuffers from the rings (2/3)Oscar Mateo2014-05-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This refactoring has been performed using the following Coccinelle semantic script: @@ struct intel_engine_cs r; @@ ( - (r).obj + r.buffer->obj | - (r).virtual_start + r.buffer->virtual_start | - (r).head + r.buffer->head | - (r).tail + r.buffer->tail | - (r).space + r.buffer->space | - (r).size + r.buffer->size | - (r).effective_size + r.buffer->effective_size | - (r).last_retired_head + r.buffer->last_retired_head ) @@ struct intel_engine_cs *r; @@ ( - (r)->obj + r->buffer->obj | - (r)->virtual_start + r->buffer->virtual_start | - (r)->head + r->buffer->head | - (r)->tail + r->buffer->tail | - (r)->space + r->buffer->space | - (r)->size + r->buffer->size | - (r)->effective_size + r->buffer->effective_size | - (r)->last_retired_head + r->buffer->last_retired_head ) @@ expression E; @@ ( - LP_RING(E)->obj + LP_RING(E)->buffer->obj | - LP_RING(E)->virtual_start + LP_RING(E)->buffer->virtual_start | - LP_RING(E)->head + LP_RING(E)->buffer->head | - LP_RING(E)->tail + LP_RING(E)->buffer->tail | - LP_RING(E)->space + LP_RING(E)->buffer->space | - LP_RING(E)->size + LP_RING(E)->buffer->size | - LP_RING(E)->effective_size + LP_RING(E)->buffer->effective_size | - LP_RING(E)->last_retired_head + LP_RING(E)->buffer->last_retired_head ) Note: On top of this this patch also removes the now unused ringbuffer fields in intel_engine_cs. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> [danvet: Add note about fixup patch included here.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915: Split the ringbuffers from the rings (1/3)Oscar Mateo2014-05-22
| | | | | | | | | | | | | | | | As advanced by the previous patch, the ringbuffers and the engine command streamers belong in different structs. This is so because, while they used to be tightly coupled together, the new Logical Ring Contexts (LRC for short) have a ringbuffer each. In legacy code, we will use the buffer* pointer inside each ring to get to the pertaining ringbuffer (the actual switch will be done in the next patch). In the new Execlists code, this pointer will be NULL and we will use instead the one inside the context instead. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915: s/intel_ring_buffer/intel_engine_csOscar Mateo2014-05-22
| | | | | | | | | | | In the upcoming patches we plan to break the correlation between engine command streamers (a.k.a. rings) and ringbuffers, so it makes sense to refactor the code and make the change obvious. No functional changes. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915: Use hash tables for the command parserBrad Volkin2014-05-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For clients that submit large batch buffers the command parser has a substantial impact on performance. On my HSW ULT system performance drops as much as ~20% on some tests. Most of the time is spent in the command lookup code. Converting that from the current naive search to a hash table lookup reduces the performance drop to ~10%. The choice of value for I915_CMD_HASH_ORDER allows all commands currently used in the parser tables to hash to their own bucket (except for one collision on the render ring). The tradeoff is that it wastes memory. Because the opcodes for the commands in the tables are not particularly well distributed, reducing the order still leaves many buckets empty. The increased collisions don't seem to have a huge impact on the performance gain, but for now anyhow, the parser trades memory for performance. NB: Ville noticed that the error paths through the ring init code will leak memory. I've not addressed that here. We can do a follow up pass to handle all of the leaks. v2: improved comment describing selection of hash key mask (Damien) replace a BUG_ON() with an error return (Tvrtko, Ville) commit message improvements Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915: Support 64b execbufBen Widawsky2014-05-05
| | | | | | | | | | | | | Previously, our code only had a 32b offset value for where the batchbuffer starts. With full PPGTT, and 64b canonical GPU address space, that is an insufficient value. The code to expand is pretty straight forward, and only one platform needs to do anything with the extra bits. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Rafael Barbalho <rafael.barbalho@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915: Move ring_begin to signal()Ben Widawsky2014-05-05
| | | | | | | | | | | | | | | | | | | | Add_request has always contained both the semaphore mailbox updates as well as the breadcrumb writes. Since the semaphore signal is the one which actually knows about the number of dwords it needs to emit to the ring, we move the ring_begin to that function. This allows us to remove the hideously shared #define On a related not, gen8 will use a different number of dwords for semaphores, but not for add request. v2: Make number of dwords an explicit part of signalling (via function argument). (Chris) v3: very slight comment change Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915: Virtualize the ringbuffer signal funcBen Widawsky2014-05-05
| | | | | | | | | | | | | | This abstraction again is in preparation for gen8. Gen8 will bring new semantics for doing this operation. While here, make the writes of MI_NOOPs explicit for non-existent rings. This should have been implicit before. NOTE: This is going to be removed in a few patches. Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915: Move semaphore specific ring members to structBen Widawsky2014-05-05
| | | | | | | | | | | | | This will be helpful in abstracting some of the code in preparation for gen8 semaphores. v2: Move mbox stuff to a separate struct v3: Rebased over VCS2 work Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> (v1) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915:Initialize the second BSD ring on BDW GT3 machineZhao Yakui2014-05-05
| | | | | | | | | | | | | | Based on the hardware spec, the BDW GT3 machine has two independent BSD ring that can be used to dispatch the video commands. So just initialize it. V3->V4: Follow Imre's comment to do some minor updates. For example: more comments are added to describe the semaphore between ring. Reviewed-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Zhao Yakui <yakui.zhao@intel.com> [danvet: Fix up checkpatch error.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/i915: Update the restrict check to filter out wrong Ring ID passed by ↵Zhao Yakui2014-05-05
| | | | | | | | user-space Signed-off-by: Zhao Yakui <yakui.zhao@intel.com> Reviewed-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>