aboutsummaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorMichael <mike@rsy.com>2011-02-25 23:56:34 -0500
committerMauro Carvalho Chehab <mchehab@redhat.com>2011-03-02 11:45:42 -0500
commitd213ad08362909ab50fbd6568fcc9fd568268d29 (patch)
tree43a98270b52304c7bf9f53847225a8abfb52600f
parent1e6406b8f0dc1ae7d7c39c9e1ac6ca78e016ebfb (diff)
[media] ivtv: Fix corrective action taken upon DMA ERR interrupt to avoid hang
After upgrading the kernel from stock Ubuntu 7.10 to 10.04, with no hardware changes, I started getting the dreaded DMA TIMEOUT errors, followed by inability to encode until the machine was rebooted. I came across a post from Andy in March (http://www.gossamer-threads.com/lists/ivtv/users/40943#40943) where he speculates that perhaps the corrective actions being taken after a DMA ERROR are not sufficient to recover the situation. After some testing I suspect that this is indeed the case, and that in fact the corrective action may be what hangs the card's DMA engine, rather than the original error. Specifically these DMA ERROR IRQs seem to present with two different values in the IVTV_REG_DMASTATUS register: 0x11 and 0x13. The current corrective action is to clear that status register back to 0x01 or 0x03, and then issue the next DMA request. In the case of a 0x13 this seems to result in a minor glitch in the encoded stream due to the failed transfer that was not retried, but otherwise things continue OK. In the case of a 0x11 the card's DMA write engine is never heard from again, and a DMA TIMEOUT follows shortly after. 0x11 is the killer. I suspect that the two cases need to be handled differently. The difference is in bit 1 (0x02), which is set when the error is about to be successfully recovered, and clear when things are about to go bad. Bit 1 of DMASTATUS is described differently in different places either as a positive "write finished", or an inverted "write busy". If we take the first definition, then when an error arises with state 0x11, it means that the write did not complete. It makes sense to start a new transfer, as in the current code. But if we take the second definition, then 0x11 means "an error but the write engine is still busy". Trying to feed it a new transfer in this situation might not be a good idea. As an experiment, I added code to ignore the DMA ERROR IRQ if DMASTATUS is 0x11. I.e., don't start a new transfer, don't clear our flags, etc. The hope was that the card would complete the transfer and issue a ENC DMA COMPLETE, either successfully or with an error condition there. However the card still hung. The only remaining corrective action being taken with a 0x11 status was then the write back to the status register to clear the error, i.e. DMASTATUS = DMASTATUS & ~3. This would have the effect of clearing the error bit 4, while leaving the lower bits indicating DMA write busy. Strangely enough, removing this write to the status register solved the problem! If the DMA ERROR IRQ with DMASTATUS=0x11 is completely ignored, with no corrective action at all, then the card will complete the transfer and issue a new IRQ. If the status register is written to when it has the value 0x11, then the DMA engine hangs. Perhaps it's illegal to write to DMASTATUS while the read or write busy bit is set? At any rate, it appears that the current corrective action is indeed making things worse rather than better. I put together a patch that modifies ivtv_irq_dma_err to do the following: - Don't write back to IVTV_REG_DMASTATUS. - If write-busy is asserted, leave the card alone. Just extend the timeout slightly. - If write-busy is de-asserted, retry the current transfer. This has completely fixed my DMA TIMEOUT woes. DMA ERR events still occur, but now they seem to be correctly handled. 0x11 events no longer hang the card, and 0x13 events no longer result in a glitch in the stream, as the failed transfer is retried. I'm happy. I've inlined the patch below in case it is of interest. As described above, I have a theory about why it works (based on a different interpretation of bit 1 of DMASTATUS), but I can't guarantee that my theory is correct. There may be another explanation, or it may be a fluke. Maybe ignoring that IRQ entirely would be equally effective? Maybe the status register read/writeback sequence is race condition if the card changes it in the mean time? Also as I am using a PVR-150 only, I have not been able to test it on other cards, which may be especially relevant for 350s that support concurrent decoding. Hopefully the patch does not break the DMA READ path. Mike [awalls@md.metrocast.net: Modified patch to add a verbose comment, make minor brace reformats, and clear the error flags in the IVTV_REG_DMASTATUS iff both read and write DMA were not in progress. Mike's conjecture about a race condition with the writeback is correct; it can confuse the DMA engine.] [Comment and analysis from the ML post by Michael <mike@rsy.com>] Signed-off-by: Andy Walls <awalls@md.metrocast.net> Cc: stable@kernel.org Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
-rw-r--r--drivers/media/video/ivtv/ivtv-irq.c58
1 files changed, 51 insertions, 7 deletions
diff --git a/drivers/media/video/ivtv/ivtv-irq.c b/drivers/media/video/ivtv/ivtv-irq.c
index 9b4faf009196..9c29e964d400 100644
--- a/drivers/media/video/ivtv/ivtv-irq.c
+++ b/drivers/media/video/ivtv/ivtv-irq.c
@@ -628,22 +628,66 @@ static void ivtv_irq_enc_pio_complete(struct ivtv *itv)
628static void ivtv_irq_dma_err(struct ivtv *itv) 628static void ivtv_irq_dma_err(struct ivtv *itv)
629{ 629{
630 u32 data[CX2341X_MBOX_MAX_DATA]; 630 u32 data[CX2341X_MBOX_MAX_DATA];
631 u32 status;
631 632
632 del_timer(&itv->dma_timer); 633 del_timer(&itv->dma_timer);
634
633 ivtv_api_get_data(&itv->enc_mbox, IVTV_MBOX_DMA_END, 2, data); 635 ivtv_api_get_data(&itv->enc_mbox, IVTV_MBOX_DMA_END, 2, data);
636 status = read_reg(IVTV_REG_DMASTATUS);
634 IVTV_DEBUG_WARN("DMA ERROR %08x %08x %08x %d\n", data[0], data[1], 637 IVTV_DEBUG_WARN("DMA ERROR %08x %08x %08x %d\n", data[0], data[1],
635 read_reg(IVTV_REG_DMASTATUS), itv->cur_dma_stream); 638 status, itv->cur_dma_stream);
636 write_reg(read_reg(IVTV_REG_DMASTATUS) & 3, IVTV_REG_DMASTATUS); 639 /*
640 * We do *not* write back to the IVTV_REG_DMASTATUS register to
641 * clear the error status, if either the encoder write (0x02) or
642 * decoder read (0x01) bus master DMA operation do not indicate
643 * completed. We can race with the DMA engine, which may have
644 * transitioned to completed status *after* we read the register.
645 * Setting a IVTV_REG_DMASTATUS flag back to "busy" status, after the
646 * DMA engine has completed, will cause the DMA engine to stop working.
647 */
648 status &= 0x3;
649 if (status == 0x3)
650 write_reg(status, IVTV_REG_DMASTATUS);
651
637 if (!test_bit(IVTV_F_I_UDMA, &itv->i_flags) && 652 if (!test_bit(IVTV_F_I_UDMA, &itv->i_flags) &&
638 itv->cur_dma_stream >= 0 && itv->cur_dma_stream < IVTV_MAX_STREAMS) { 653 itv->cur_dma_stream >= 0 && itv->cur_dma_stream < IVTV_MAX_STREAMS) {
639 struct ivtv_stream *s = &itv->streams[itv->cur_dma_stream]; 654 struct ivtv_stream *s = &itv->streams[itv->cur_dma_stream];
640 655
641 /* retry */ 656 if (s->type >= IVTV_DEC_STREAM_TYPE_MPG) {
642 if (s->type >= IVTV_DEC_STREAM_TYPE_MPG) 657 /* retry */
658 /*
659 * FIXME - handle cases of DMA error similar to
660 * encoder below, except conditioned on status & 0x1
661 */
643 ivtv_dma_dec_start(s); 662 ivtv_dma_dec_start(s);
644 else 663 return;
645 ivtv_dma_enc_start(s); 664 } else {
646 return; 665 if ((status & 0x2) == 0) {
666 /*
667 * CX2341x Bus Master DMA write is ongoing.
668 * Reset the timer and let it complete.
669 */
670 itv->dma_timer.expires =
671 jiffies + msecs_to_jiffies(600);
672 add_timer(&itv->dma_timer);
673 return;
674 }
675
676 if (itv->dma_retries < 3) {
677 /*
678 * CX2341x Bus Master DMA write has ended.
679 * Retry the write, starting with the first
680 * xfer segment. Just retrying the current
681 * segment is not sufficient.
682 */
683 s->sg_processed = 0;
684 itv->dma_retries++;
685 ivtv_dma_enc_start_xfer(s);
686 return;
687 }
688 /* Too many retries, give up on this one */
689 }
690
647 } 691 }
648 if (test_bit(IVTV_F_I_UDMA, &itv->i_flags)) { 692 if (test_bit(IVTV_F_I_UDMA, &itv->i_flags)) {
649 ivtv_udma_start(itv); 693 ivtv_udma_start(itv);