litmus-rt.git - The LITMUS^RT kernel.

	Commit message (Collapse)	Author	Age
*	Merge branch 'for-linus' of git://git.kernel.dk/linux-block	Linus Torvalds	2014-09-24
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull one last block fix from Jens Axboe: "We've had an issue with scsi-mq where probing takes forever. This was bisected down to the percpu changes for blk_mq_queue_enter(), and the fact we now suffer an RCU grace period when killing a queue. SCSI creates and destroys tons of queues, so this let to 10s of seconds of stalls at boot for some. Tejun has a real fix for this, but it's too involved for 3.17. So this is a temporary workaround to expedite the queue killing until we can fold in the real fix for 3.18 when that merge window opens" * 'for-linus' of git://git.kernel.dk/linux-block: blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe
\| *	blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe	Tejun Heo	2014-09-24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	blk-mq uses percpu_ref for its usage counter which tracks the number of in-flight commands and used to synchronously drain the queue on freeze. percpu_ref shutdown takes measureable wallclock time as it involves a sched RCU grace period. This means that draining a blk-mq takes measureable wallclock time. One would think that this shouldn't matter as queue shutdown should be a rare event which takes place asynchronously w.r.t. userland. Unfortunately, SCSI probing involves synchronously setting up and then tearing down a lot of request_queues back-to-back for non-existent LUNs. This means that SCSI probing may take more than ten seconds when scsi-mq is used. This will be properly fixed by implementing a mechanism to keep q->mq_usage_counter in atomic mode till genhd registration; however, that involves rather big updates to percpu_ref which is difficult to apply late in the devel cycle (v3.17-rc6 at the moment). As a stop-gap measure till the proper fix can be implemented in the next cycle, this patch introduces __percpu_ref_kill_expedited() and makes blk_mq_freeze_queue() use it. This is heavy-handed but should work for testing the experimental SCSI blk-mq implementation. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Christoph Hellwig <hch@infradead.org> Link: http://lkml.kernel.org/g/20140919113815.GA10791@lst.de Fixes: add703fda981 ("blk-mq: use percpu_ref for mq usage count") Cc: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Tested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
* \|	Merge tag 'pci-v3.17-fixes-3' of ↵	Linus Torvalds	2014-09-24
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fixes from Bjorn Helgaas: "Here are a few fixes that should be in v3.17. - Reverting "Don't scan random busses" covers up a CardBus regression having to do with allocating CardBus bus numbers. - Reverting "Make sure bus numbers stay within parents bounds" covers up an ACPI _CRS bug that makes us reconfigure a bridge, causing a broken device behind it to stop responding. - The pciehp timeout change fixes some code we added in v3.17. Without the fix, we can send a new hotplug command too early, before the timeout has expired. I hope for better fixes for the reverts, but those will have to come after v3.17" * tag 'pci-v3.17-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: pciehp: Fix pcie_wait_cmd() timeout Revert "PCI: Make sure bus number resources stay within their parents bounds" Revert "PCI: Don't scan random busses in pci_scan_bridge()"
\| * \|	PCI: pciehp: Fix pcie_wait_cmd() timeout	Yinghai Lu	2014-09-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	pcie_poll_cmd() take msecs instead of jiffies, so convert timeout to msecs. Fixes: 40b960831cfa ("PCI: pciehp: Compute timeout from hotplug command start time") Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
\| * \|	Revert "PCI: Make sure bus number resources stay within their parents bounds"	Bjorn Helgaas	2014-09-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit 1820ffdccb9b ("PCI: Make sure bus number resources stay within their parents bounds") because it breaks some systems with LSI Logic FC949ES Fibre Channel Adapters, apparently by exposing a defect in those adapters. Dirk tested a Tyan VX50 (B4985) with this device that worked like this prior to 1820ffdccb9b: bus: [bus 00-7f] on node 0 link 1 ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-07]) pci 0000:00:0e.0: PCI bridge to [bus 0a] pci_bus 0000:0a: busn_res: can not insert [bus 0a] under [bus 00-07] (conflicts with (null) [bus 00-07]) pci 0000:0a:00.0: [1000:0646] type 00 class 0x0c0400 (FC adapter) Note that the root bridge [bus 00-07] aperture is wrong; this is a BIOS defect in the PCI0 _CRS method. But prior to 1820ffdccb9b, we didn't enforce that aperture, and the FC adapter worked fine at 0a:00.0. After 1820ffdccb9b, we notice that 00:0e.0's aperture is not contained in the root bridge's aperture, so we reconfigure it so it is contained: pci 0000:00:0e.0: bridge configuration invalid ([bus 0a-0a]), reconfiguring pci 0000:00:0e.0: PCI bridge to [bus 06-07] This effectively moves the FC device from 0a:00.0 to 07:00.0, which should be legal. But when we enumerate bus 06, the FC device doesn't respond, so we don't find anything. This is probably a defect in the FC device. Possible fixes (due to Yinghai): 1) Add a quirk to fix the _CRS information based on what amd_bus.c read from the hardware 2) Reset the FC device after we change its bus number 3) Revert 1820ffdccb9b Fix 1 would be relatively easy, but it does sweep the LSI FC issue under the rug. We might want to reconfigure bus numbers in the future for some other reason, e.g., hotplug, and then we could trip over this again. For that reason, I like fix 2, but we don't know whether it actually works, and we don't have a patch for it yet. This revert is fix 3, which also sweeps the LSI FC issue under the rug. Link: https://bugzilla.kernel.org/show_bug.cgi?id=84281 Reported-by: Dirk Gouders <dirk@gouders.net> Tested-by: Dirk Gouders <dirk@gouders.net> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: stable@vger.kernel.org # v3.15+ CC: Yinghai Lu <yinghai@kernel.org>
\| * \|	Revert "PCI: Don't scan random busses in pci_scan_bridge()"	Bjorn Helgaas	2014-09-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit fc1b253141b3 ("PCI: Don't scan random busses in pci_scan_bridge()") because it breaks CardBus on some machines. David tested a Dell Latitude D505 that worked like this prior to fc1b253141b3: pci 0000:00:1e.0: PCI bridge to [bus 01] pci 0000:01:01.0: CardBus bridge to [bus 02-05] Note that the 01:01.0 CardBus bridge has a bus number aperture of [bus 02-05], but those buses are all outside the 00:1e.0 PCI bridge bus number aperture, so accesses to buses 02-05 never reach CardBus. This is later patched up by yenta_fixup_parent_bridge(), which changes the subordinate bus number of the 00:1e.0 PCI bridge: pci_bus 0000:01: Raising subordinate bus# of parent bus (#01) from #01 to #05 With fc1b253141b3, pci_scan_bridge() fails immediately when it notices that we can't allocate a valid secondary bus number for the CardBus bridge, and CardBus doesn't work at all: pci 0000:01:01.0: can't allocate child bus 01 from [bus 01] I'd prefer to fix this by integrating the yenta_fixup_parent_bridge() logic into pci_scan_bridge() so we fix the bus number apertures up front. But I don't think we can do that before v3.17, so I'm going to revert this to avoid the problem while we're working on the long-term fix. Link: https://bugzilla.kernel.org/show_bug.cgi?id=83441 Link: http://lkml.kernel.org/r/1409303414-5196-1-git-send-email-david.henningsson@canonical.com Reported-by: David Henningsson <david.henningsson@canonical.com> Tested-by: David Henningsson <david.henningsson@canonical.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: stable@vger.kernel.org # v3.15+
* \| \|	Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6	Linus Torvalds	2014-09-24
\|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull crypto fixes from Herbert Xu: "This fixes three issues: - if ccp is loaded on a machine without ccp, it will incorrectly activate causing all requests to fail. Fixed by preventing ccp from loading if hardware isn't available. - not all IRQs were enabled for the qat driver, leading to potential stalls when it is used - disabled buggy AVX CTR implementation in aesni" * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: aesni - disable "by8" AVX CTR optimization crypto: ccp - Check for CCP before registering crypto algs crypto: qat - Enable all 32 IRQs
\| * \| \|	crypto: aesni - disable "by8" AVX CTR optimization	Mathias Krause	2014-09-24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The "by8" implementation introduced in commit 22cddcc7df8f ("crypto: aes - AES CTR x86_64 "by8" AVX optimization") is failing crypto tests as it handles counter block overflows differently. It only accounts the right most 32 bit as a counter -- not the whole block as all other implementations do. This makes it fail the cryptomgr test #4 that specifically tests this corner case. As we're quite late in the release cycle, just disable the "by8" variant for now. Reported-by: Romain Francoise <romain@orebokech.com> Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Chandramouli Narayanan <mouli@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
\| * \| \|	crypto: ccp - Check for CCP before registering crypto algs	Tom Lendacky	2014-09-24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If the ccp is built as a built-in module, then ccp-crypto (whether built as a module or a built-in module) will be able to load and it will register its crypto algorithms. If the system does not have a CCP this will result in -ENODEV being returned whenever a command is attempted to be queued by the registered crypto algorithms. Add an API, ccp_present(), that checks for the presence of a CCP on the system. The ccp-crypto module can use this to determine if it should register it's crypto alogorithms. Cc: stable@vger.kernel.org Reported-by: Scot Doyle <lkml14@scotdoyle.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Tested-by: Scot Doyle <lkml14@scotdoyle.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
\| * \| \|	crypto: qat - Enable all 32 IRQs	Tadeusz Struk	2014-09-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Enable interrupts from all 32 bundles. Signed-off-by: Conor McLoughlin <conor.mcloughlin@intel.com> Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* \| \| \|	Merge tag 'media/v3.17-rc7' of ↵	Linus Torvalds	2014-09-24
\|\ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fixes from Mauro Carvalho Chehab: "For some last time fixes: - a regression detected on Kernel 3.16 related to VBI Teletext application breakage on drivers using videobuf2 (see https://bugzilla.kernel.org/show_bug.cgi?id=84401). The bug was noticed on saa7134 (migrated to VB2 on 3.16), but also affects em28xx (migrated on 3.9 to VB2); - two additional sanity checks at videobuf2; - two fixups to restore proper VBI support at the em28xx driver; - two Kernel oops fixups (at cx24123 and cx2341x drivers); - a bug at adv7604 where an if was doing just the opposite as it would be expected; - some documentation fixups to match the behavior defined at the Kernel" * tag 'media/v3.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: [media] em28xx-v4l: get rid of field "users" in struct em28xx_v4l2" [media] em28xx: fix VBI handling logic [media] DocBook media: improve the poll() documentation [media] DocBook media: fix the poll() 'no QBUF' documentation [media] vb2: fix VBI/poll regression [media] cx2341x: fix kernel oops [media] cx24123: fix kernel oops due to missing parent pointer [media] adv7604: fix inverted condition [media] media/radio: fix radio-miropcm20.c build with io.h header file [media] vb2: fix plane index sanity check in vb2_plane_cookie() [media] DocBook media: update version number and V4L2 changes [media] DocBook media: fix fieldname in struct v4l2_subdev_selection [media] vb2: fix vb2 state check when start_streaming fails [media] videobuf2-core.h: fix comment [media] videobuf2-core: add comments before the WARN_ON [media] videobuf2-dma-sg: fix for wrong GFP mask to sg_alloc_table_from_pages
\| * \| \| \|	[media] em28xx-v4l: get rid of field "users" in struct em28xx_v4l2"	Frank Schaefer	2014-09-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit 747dba7de2a51a3db58b665ed3bc8c07921546ec. It breaks concurrent vbi and video capturing: While v4l2->users is the number of users of the whole device (all device nodes), v4l2_fh_is_singular() only checks the number of users of a specific device node. As a result. if one device node is open and a second device node is opened (closed), the device is reinitialized (streaming is stopped). Reported-by: Hans Verkuil <hans.verkuil@cisco.com> Tested-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Signed-off-by: Frank Schäfer <fschaefer.oss@googlemail.com> Cc: stable@vger.kernel.org Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
\| * \| \| \|	[media] em28xx: fix VBI handling logic	Mauro Carvalho Chehab	2014-09-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When both VBI and video are streaming, and video stream is stopped, a subsequent trial to restart it will fail, because S_FMT will return -EBUSY. That prevents applications like zvbi to work properly. Please notice that, while this fix it fully for zvbi, the best is to get rid of streaming_users and res_get logic as a hole. However, this single-line patch is better to be merged at -stable. Cc: stable@vger.kernel.org Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
\| * \| \| \|	[media] DocBook media: improve the poll() documentation	Hans Verkuil	2014-09-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The poll documentation was incomplete: document how events (POLLPRI) are handled and fix the documentation of what poll does for display devices and streaming I/O. Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
\| * \| \| \|	[media] DocBook media: fix the poll() 'no QBUF' documentation	Hans Verkuil	2014-09-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Clarify what poll() returns if STREAMON was called but not QBUF. Make explicit the different behavior for this scenario for capture and output devices. Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
\| * \| \| \|	[media] vb2: fix VBI/poll regression	Hans Verkuil	2014-09-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The recent conversion of saa7134 to vb2 unconvered a poll() bug that broke the teletext applications alevt and mtt. These applications expect that calling poll() without having called VIDIOC_STREAMON will cause poll() to return POLLERR. That did not happen in vb2. This patch fixes that behavior. It also fixes what should happen when poll() is called when STREAMON is called but no buffers have been queued. In that case poll() will also return POLLERR, but only for capture queues since output queues will always return POLLOUT anyway in that situation. This brings the vb2 behavior in line with the old videobuf behavior. Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
\| * \| \| \|	[media] cx2341x: fix kernel oops	Hans Verkuil	2014-09-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The v4l2_ctrl_config struct must be zeroed before passing it to v4l2_ctrl_new_custom(). This was always wrong, but with the recent v4l2-ctrls.c changes this is now much more likely to lead to a kernel bug. This is the only place where this struct wasn't initialized properly. Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Reported-by: Pridvorov Andrey <ua0lnj@bk.ru> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
\| * \| \| \|	[media] cx24123: fix kernel oops due to missing parent pointer	Hans Verkuil	2014-09-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When I try to set the TV standard to e.g. PAL on my Hauppauge WinTV-HVR3000 I get the following oops: 9464.262345] CX24123: detected CX24123 [ 9464.262526] BUG: unable to handle kernel NULL pointer dereference at 0000000000000230 [ 9464.262555] IP: [<ffffffff816676b5>] acpi_i2c_install_space_handler+0x15/0xc0 [ 9464.262576] PGD 0 [ 9464.262584] Oops: 0000 [#1] PREEMPT SMP [ 9464.262597] Modules linked in: cx24123 cx22702 cx88_dvb(+) videobuf_dvb cx88_vp3054_i2c cx88_blackbird cx8802 ir_lirc_codec ir_xmp_decoder ir_sanyo_decoder ir_jvc_decoder ir_mce_kbd_decoder ir_sharp_decoder lirc_dev ir_sony_decoder ir_rc6_decoder ir_nec_decoder ir_rc5_decoder rc_hauppauge wm8775 tuner_simple tuner_types tda9887 cx8800 cx88xx btcx_risc videobuf_dma_sg videobuf_core mt2131 s5h1409 tda8290 tuner cx25840 cx23885 altera_ci tda18271 altera_stapl videobuf2_dvb tveeprom cx2341x videobuf2_dma_sg dvb_core rc_core videobuf2_memops videobuf2_core v4l2_common videodev media nouveau x86_pkg_temp_thermal cfbfillrect cfbimgblt cfbcopyarea ttm drm_kms_helper processor button isci [ 9464.262786] CPU: 2 PID: 2417 Comm: modprobe Not tainted 3.17.0-rc1-telek #322 [ 9464.262796] Hardware name: ASUSTeK COMPUTER INC. Z9PE-D8 WS/Z9PE-D8 WS, BIOS 5404 02/10/2014 [ 9464.262807] task: ffff881097959ad0 ti: ffff88109967c000 task.ti: ffff88109967c000 [ 9464.262817] RIP: 0010:[<ffffffff816676b5>] [<ffffffff816676b5>] acpi_i2c_install_space_handler+0x15/0xc0 [ 9464.262834] RSP: 0018:ffff88109967fbd8 EFLAGS: 00010246 [ 9464.262843] RAX: 0000000000000000 RBX: ffff880892a89540 RCX: 0000000000000000 [ 9464.262853] RDX: 0000000080000001 RSI: ffff880892e75870 RDI: ffff880892a89540 [ 9464.262862] RBP: ffff88109967fbf8 R08: ffff881099b2ccc0 R09: ffff880891efa088 [ 9464.262872] R10: 0000000000000000 R11: 0000000000000022 R12: 0000000000000000 [ 9464.262883] R13: ffff880892a895b0 R14: 00000000ffffffed R15: ffff88089b48f800 [ 9464.262893] FS: 00007fe42b6d7700(0000) GS:ffff88089fc40000(0000) knlGS:0000000000000000 [ 9464.262904] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 9464.262912] CR2: 0000000000000230 CR3: 0000001094078000 CR4: 00000000000407e0 [ 9464.262922] Stack: [ 9464.262927] ffff880892a89540 0000000000000000 ffff880892a895b0 ffff88109a155a80 [ 9464.262944] ffff88109967fc20 ffffffff81666a36 0000000000000020 ffff880892a89540 [ 9464.262960] ffffffffa01c8d40 ffff88109967fc40 ffffffff81666c67 ffff880892a89000 [ 9464.262977] Call Trace: [ 9464.262987] [<ffffffff81666a36>] i2c_register_adapter+0x166/0x340 [ 9464.262998] [<ffffffff81666c67>] i2c_add_adapter+0x57/0x60 [ 9464.263011] [<ffffffffa01e2c58>] cx24123_attach+0x108/0x1ba [cx24123] [ 9464.263025] [<ffffffffa01c5a76>] dvb_register+0x404/0x245b [cx88_dvb] [ 9464.263039] [<ffffffffa0059183>] ? videobuf_queue_core_init+0xe3/0x140 [videobuf_core] [ 9464.263052] [<ffffffffa01c54b1>] cx8802_dvb_probe+0x1e1/0x261 [cx88_dvb] [ 9464.263066] [<ffffffffa01a3b00>] cx8802_register_driver+0x190/0x20d [cx8802] [ 9464.263077] [<ffffffffa01cc000>] ? 0xffffffffa01cc000 [ 9464.263089] [<ffffffffa01cc025>] dvb_init+0x25/0x27 [cx88_dvb] [ 9464.263101] [<ffffffff810002c4>] do_one_initcall+0x84/0x1c0 [ 9464.263113] [<ffffffff811893fa>] ? __vunmap+0x9a/0x100 [ 9464.263125] [<ffffffff81122a66>] load_module+0x1216/0x1790 [ 9464.263134] [<ffffffff8111ff70>] ? __symbol_put+0x70/0x70 [ 9464.263145] [<ffffffff811aa8cc>] ? vfs_read+0x11c/0x170 [ 9464.263156] [<ffffffff811201d9>] ? copy_module_from_fd.isra.53+0x119/0x170 [ 9464.263168] [<ffffffff81123116>] SyS_finit_module+0x76/0x80 [ 9464.263181] [<ffffffff818d19e9>] system_call_fastpath+0x16/0x1b [ 9464.263190] Code: 81 31 c0 e8 2e f6 e8 ff 48 83 c4 08 5b 5d eb de 66 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 53 41 be ed ff ff ff 48 8b 47 70 <48> 8b 80 30 02 00 00 48 85 c0 74 58 4c 8b 68 08 4d 85 ed 74 4f [ 9464.263347] RIP [<ffffffff816676b5>] acpi_i2c_install_space_handler+0x15/0xc0 [ 9464.263361] RSP <ffff88109967fbd8> [ 9464.263367] CR2: 0000000000000230 [ 9464.266919] ---[ end trace 57fd490bdb72e733 ]--- I traced this to a NULL i2c_adapter parent pointer when cx24123 creates its own i2c adapter. The acpi_i2c_install_space_handler function appeared in 3.17, so that's probably why this hasn't been seen before. Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
\| * \| \| \|	[media] adv7604: fix inverted condition	Hans Verkuil	2014-09-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The log_status function should show HDMI information, but the test checking for an HDMI input was inverted. Fix this. Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Cc: stable@vger.kernel.org # for v3.12 and up Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
\| * \| \| \|	[media] media/radio: fix radio-miropcm20.c build with io.h header file	Randy Dunlap	2014-09-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix build errors in radio-miropcm20.c due to missing header file: drivers/media/radio/radio-miropcm20.c: In function 'rds_waitread': drivers/media/radio/radio-miropcm20.c:90:3: error: implicit declaration of function 'inb' [-Werror=implicit-function-declaration] drivers/media/radio/radio-miropcm20.c: In function 'rds_rawwrite': drivers/media/radio/radio-miropcm20.c:106:3: error: implicit declaration of function 'outb' [-Werror=implicit-function-declaration] Reported-by: Jim Davis <jim.epost@gmail.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
\| * \| \| \|	[media] vb2: fix plane index sanity check in vb2_plane_cookie()	Zhaowei Yuan	2014-09-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It's also invalid when plane_no is equal to vb->num_planes Signed-off-by: Zhaowei Yuan <zhaowei.yuan@samsung.com> Cc: stable@vger.kernel.org # for v3.7 and up Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
\| * \| \| \|	[media] DocBook media: update version number and V4L2 changes	Hans Verkuil	2014-09-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Note: the revision text for the v4l2_pix_format change from Laurent erroneously mentioned 3.16 when it only got merged for 3.17. Fixed that as well. Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
\| * \| \| \|	[media] DocBook media: fix fieldname in struct v4l2_subdev_selection	Hans Verkuil	2014-09-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Field 'rect' is really named 'r'. Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
\| * \| \| \|	[media] vb2: fix vb2 state check when start_streaming fails	Hans Verkuil	2014-09-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit bd994ddb2a12a3ff48cd549ec82cdceaea9614df (vb2: Fix stream start and buffer completion race) broke the buffer state check in vb2_buffer_done. So accept all three possible states there since I can no longer tell the difference between vb2_buffer_done called from start_streaming or from elsewhere. Instead add a WARN_ON at the end of start_streaming that will check whether any buffers were added to the done list, since that implies that the wrong state was used as well. Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: stable@vger.kernel.org # for v3.15 and up Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
\| * \| \| \|	[media] videobuf2-core.h: fix comment	Hans Verkuil	2014-09-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The comment for start_streaming that tells the developer with which vb2 state buffers should be returned to vb2 gave the wrong state. Very confusing. Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
\| * \| \| \|	[media] videobuf2-core: add comments before the WARN_ON	Hans Verkuil	2014-09-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Recently WARN_ON() calls have been added to warn if the driver is not properly returning buffers to vb2 in start_streaming (if it fails) or stop_streaming(). Add comments before those WARN_ON calls that refer to the videobuf2-core.h header that explains what drivers are supposed to do in these situations. That should help point developers in the right direction if they see these warnings. Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Acked-by: Pawel Osciak <pawel@osciak.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
\| * \| \| \|	[media] videobuf2-dma-sg: fix for wrong GFP mask to sg_alloc_table_from_pages	Hans Verkuil	2014-09-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	sg_alloc_table_from_pages() only allocates a sg_table, so it should just use GFP_KERNEL, not gfp_flags. If gfp_flags contains __GFP_DMA32 then mm/sl[au]b.c will call BUG_ON: [ 358.027515] ------------[ cut here ]------------ [ 358.027546] kernel BUG at mm/slub.c:1416! [ 358.027558] invalid opcode: 0000 [#1] PREEMPT SMP [ 358.027576] Modules linked in: mt2131 s5h1409 tda8290 tuner cx25840 cx23885 btcx_risc altera_ci tda18271 altera_stapl videobuf2_dvb tveeprom cx2341x videobuf2_dma_sg dvb_core rc_core videobuf2_memops videobuf2_core nouveau zr36067 videocodec v4l2_common videodev media x86_pkg_temp_thermal cfbfillrect cfbimgblt cfbcopyarea ttm drm_kms_helper processor button isci [ 358.027712] CPU: 19 PID: 3654 Comm: cat Not tainted 3.16.0-rc6-telek #167 [ 358.027723] Hardware name: ASUSTeK COMPUTER INC. Z9PE-D8 WS/Z9PE-D8 WS, BIOS 5404 02/10/2014 [ 358.027741] task: ffff880897c7d960 ti: ffff88089b4d4000 task.ti: ffff88089b4d4000 [ 358.027753] RIP: 0010:[<ffffffff81196040>] [<ffffffff81196040>] new_slab+0x280/0x320 [ 358.027776] RSP: 0018:ffff88089b4d7ae8 EFLAGS: 00010002 [ 358.027787] RAX: ffff880897c7d960 RBX: 0000000000000000 RCX: ffff88089b4d7b50 [ 358.027798] RDX: 00000000ffffffff RSI: 0000000000000004 RDI: ffff88089f803b00 [ 358.027809] RBP: ffff88089b4d7bb8 R08: 0000000000000000 R09: 0000000100400040 [ 358.027821] R10: 0000160000000000 R11: ffff88109bc02c40 R12: 0000000000000001 [ 358.027832] R13: ffff88089f8000c0 R14: ffff88089f803b00 R15: ffff8810bfcf4be0 [ 358.027845] FS: 00007f83fe5c0700(0000) GS:ffff8810bfce0000(0000) knlGS:0000000000000000 [ 358.027858] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 358.027868] CR2: 0000000001dfd568 CR3: 0000001097d5a000 CR4: 00000000000407e0 [ 358.027878] Stack: [ 358.027885] ffffffff81198860 ffff8810bfcf4be0 ffff880897c7d960 0000000000001b00 [ 358.027905] ffff880897c7d960 0000000000000000 ffff8810bfcf4bf0 0000000000000000 [ 358.027924] 0000000000000000 0000000100000100 ffffffff813ef84a 00000004ffffffff [ 358.027944] Call Trace: [ 358.027956] [<ffffffff81198860>] ? __slab_alloc+0x400/0x4e0 [ 358.027973] [<ffffffff813ef84a>] ? sg_kmalloc+0x1a/0x30 [ 358.027985] [<ffffffff81198f17>] __kmalloc+0x127/0x150 [ 358.027997] [<ffffffff813ef84a>] ? sg_kmalloc+0x1a/0x30 [ 358.028009] [<ffffffff813ef84a>] sg_kmalloc+0x1a/0x30 [ 358.028023] [<ffffffff813eff84>] __sg_alloc_table+0x74/0x180 [ 358.028035] [<ffffffff813ef830>] ? sg_kfree+0x20/0x20 [ 358.028048] [<ffffffff813f00af>] sg_alloc_table+0x1f/0x60 [ 358.028061] [<ffffffff813f0174>] sg_alloc_table_from_pages+0x84/0x1f0 [ 358.028077] [<ffffffffa007c3f9>] vb2_dma_sg_alloc+0x159/0x230 [videobuf2_dma_sg] [ 358.028095] [<ffffffffa003d55a>] __vb2_queue_alloc+0x10a/0x680 [videobuf2_core] [ 358.028113] [<ffffffffa003e110>] __reqbufs.isra.14+0x220/0x3e0 [videobuf2_core] [ 358.028130] [<ffffffffa003e79d>] __vb2_init_fileio+0xbd/0x380 [videobuf2_core] [ 358.028147] [<ffffffffa003f563>] __vb2_perform_fileio+0x5b3/0x6e0 [videobuf2_core] [ 358.028164] [<ffffffffa003f871>] vb2_fop_read+0xb1/0x100 [videobuf2_core] [ 358.028184] [<ffffffffa06dd2e5>] v4l2_read+0x65/0xb0 [videodev] [ 358.028198] [<ffffffff811a243f>] vfs_read+0x8f/0x170 [ 358.028210] [<ffffffff811a30a1>] SyS_read+0x41/0xb0 [ 358.028224] [<ffffffff818f02e9>] system_call_fastpath+0x16/0x1b [ 358.028234] Code: 66 90 e9 dc fd ff ff 0f 1f 40 00 41 8b 4d 68 e9 d5 fe ff ff 0f 1f 80 00 00 00 00 f0 41 80 4d 00 40 e9 03 ff ff ff 0f 1f 44 00 00 <0f> 0b 66 0f 1f 44 00 00 44 89 c6 4c 89 45 d0 e8 0c 82 ff ff 48 [ 358.028415] RIP [<ffffffff81196040>] new_slab+0x280/0x320 [ 358.028432] RSP <ffff88089b4d7ae8> [ 358.032208] ---[ end trace 6443240199c706e4 ]--- Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Cc: stable@vger.kernel.org # for v3.13 and up Acked-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
* \| \| \| \|	Merge tag 'md/3.17-more-fixes' of git://git.neil.brown.name/md	Linus Torvalds	2014-09-24
\|\ \ \ \ \ \| \|_\|_\|_\|/ \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull bugfixes for md/raid1 from Neil Brown: "It is amazing how much easier it is to find bugs when you know one is there. Two bug reports resulted in finding 7 bugs! All are tagged for -stable. Those that can't cause (rare) data corruption, cause lockups. Particularly, but not only, fixing new "resync" code" * tag 'md/3.17-more-fixes' of git://git.neil.brown.name/md: md/raid1: fix_read_error should act on all non-faulty devices. md/raid1: count resync requests in nr_pending. md/raid1: update next_resync under resync_lock. md/raid1: Don't use next_resync to determine how far resync has progressed md/raid1: make sure resync waits for conflicting writes to complete. md/raid1: clean up request counts properly in close_sync() md/raid1: be more cautious where we read-balance during resync. md/raid1: intialise start_next_window for READ case to avoid hang
\| * \| \| \|	md/raid1: fix_read_error should act on all non-faulty devices.	NeilBrown	2014-09-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a devices is being recovered it is not InSync and is not Faulty. If a read error is experienced on that device, fix_read_error() will be called, but it ignores non-InSync devices. So it will neither fix the error nor fail the device. It is incorrect that fix_read_error() ignores non-InSync devices. It should only ignore Faulty devices. So fix it. This became a bug when we allowed reading from a device that was being recovered. It is suitable for any subsequent -stable kernel. Fixes: da8840a747c0dbf49506ec906757a6b87b9741e9 Cc: stable@vger.kernel.org (v3.5+) Reported-by: Alexander Lyakas <alex.bolshoy@gmail.com> Tested-by: Alexander Lyakas <alex.bolshoy@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
\| * \| \| \|	md/raid1: count resync requests in nr_pending.	NeilBrown	2014-09-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Both normal IO and resync IO can be retried with reschedule_retry() and so be counted into ->nr_queued, but only normal IO gets counted in ->nr_pending. Before the recent improvement to RAID1 resync there could only possibly have been one or the other on the queue. When handling a read failure it could only be normal IO. So when handle_read_error() called freeze_array() the fact that freeze_array only compares ->nr_queued against ->nr_pending was safe. But now that these two types can interleave, we can have both normal and resync IO requests queued, so we need to count them both in nr_pending. This error can lead to freeze_array() hanging if there is a read error, so it is suitable for -stable. Fixes: 79ef3a8aa1cb1523cc231c9a90a278333c21f761 cc: stable@vger.kernel.org (v3.13+) Reported-by: Brassow Jonathan <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>
\| * \| \| \|	md/raid1: update next_resync under resync_lock.	NeilBrown	2014-09-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	raise_barrier() uses next_resync as part of its calculations, so it really should be updated first, instead of afterwards. next_resync is always used under resync_lock so update it under resync lock to, just before it is used. That is safest. This could cause normal IO and resync IO to interact badly so it suitable for -stable. Fixes: 79ef3a8aa1cb1523cc231c9a90a278333c21f761 cc: stable@vger.kernel.org (v3.13+) Signed-off-by: NeilBrown <neilb@suse.de>
\| * \| \| \|	md/raid1: Don't use next_resync to determine how far resync has progressed	NeilBrown	2014-09-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	next_resync is (approximately) the location for the next resync request. However it does not reliably determine the earliest location at which resync might be happening. This is because resync requests can complete out of order, and we only limit the number of current requests, not the distance from the earliest pending request to the latest. mddev->curr_resync_completed is a reliable indicator of the earliest position at which resync could be happening. It is updated less frequently, but is actually reliable which is more important. So use it to determine if a write request is before the region being resynced and so safe from conflict. This error can allow resync IO to interfere with normal IO which could lead to data corruption. Hence: stable. Fixes: 79ef3a8aa1cb1523cc231c9a90a278333c21f761 cc: stable@vger.kernel.org (v3.13+) Signed-off-by: NeilBrown <neilb@suse.de>
\| * \| \| \|	md/raid1: make sure resync waits for conflicting writes to complete.	NeilBrown	2014-09-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The resync/recovery process for raid1 was recently changed so that writes could happen in parallel with resync providing they were in different regions of the device. There is a problem though: While a write request will always wait for conflicting resync to complete, a resync request will not always wait for conflicting writes to complete. Two changes are needed to fix this: 1/ raise_barrier (which waits until it is safe to do resync) must wait until current_window_requests is zero 2/ wait_battier (which waits at the start of a new write request) must update current_window_requests if the request could possible conflict with a concurrent resync. As concurrent writes and resync can lead to data loss, this patch is suitable for -stable. Fixes: 79ef3a8aa1cb1523cc231c9a90a278333c21f761 Cc: stable@vger.kernel.org (v3.13+) Cc: majianpeng <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
\| * \| \| \|	md/raid1: clean up request counts properly in close_sync()	NeilBrown	2014-09-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If there are outstanding writes when close_sync is called, the change to ->start_next_window might cause them to decrement the wrong counter when they complete. Fix this by merging the two counters into the one that will be decremented. Having an incorrect value in a counter can cause raise_barrier() to hangs, so this is suitable for -stable. Fixes: 79ef3a8aa1cb1523cc231c9a90a278333c21f761 cc: stable@vger.kernel.org (v3.13+) Signed-off-by: NeilBrown <neilb@suse.de>
\| * \| \| \|	md/raid1: be more cautious where we read-balance during resync.	NeilBrown	2014-09-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	commit 79ef3a8aa1cb1523cc231c9a90a278333c21f761 made it possible for reads to happen concurrently with resync. This means that we need to be more careful where read_balancing is allowed during resync - we can no longer be sure that any resync that has already started will definitely finish. So keep read_balancing to before recovery_cp, which is conservative but safe. This bug makes it possible to read from a device that doesn't have up-to-date data, so it can cause data corruption. So it is suitable for any kernel since 3.11. Fixes: 79ef3a8aa1cb1523cc231c9a90a278333c21f761 cc: stable@vger.kernel.org (v3.13+) Signed-off-by: NeilBrown <neilb@suse.de>
\| * \| \| \|	md/raid1: intialise start_next_window for READ case to avoid hang	NeilBrown	2014-09-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	r1_bio->start_next_window is not initialised in the READ case, so allow_barrier may incorrectly decrement conf->current_window_requests which can cause raise_barrier() to block forever. Fixes: 79ef3a8aa1cb1523cc231c9a90a278333c21f761 cc: stable@vger.kernel.org (v3.13+) Reported-by: Brassow Jonathan <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>
* \| \| \| \|	Merge tag 'rdma-for-linus' of ↵	Linus Torvalds	2014-09-23
\|\ \ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull infiniband/rdma fixes from Roland Dreier: "Last late set of InfiniBand/RDMA fixes for 3.17: - fixes for the new memory region re-registration support - iSER initiator error path fixes - grab bag of small fixes for the qib and ocrdma hardware drivers - larger set of fixes for mlx4, especially in RoCE mode" * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (26 commits) IB/mlx4: Fix VF mac handling in RoCE IB/mlx4: Do not allow APM under RoCE IB/mlx4: Don't update QP1 in native mode IB/mlx4: Avoid accessing netdevice when building RoCE qp1 header mlx4: Fix mlx4 reg/unreg mac to work properly with 0-mac addresses IB/core: When marshaling uverbs path, clear unused fields IB/mlx4: Avoid executing gid task when device is being removed IB/mlx4: Fix lockdep splat for the iboe lock IB/mlx4: Get upper dev addresses as RoCE GIDs when port comes up IB/mlx4: Reorder steps in RoCE GID table initialization IB/mlx4: Don't duplicate the default RoCE GID IB/mlx4: Avoid null pointer dereference in mlx4_ib_scan_netdevs() IB/iser: Bump version to 1.4.1 IB/iser: Allow bind only when connection state is UP IB/iser: Fix RX/TX CQ resource leak on error flow RDMA/ocrdma: Use right macro in query AH RDMA/ocrdma: Resolve L2 address when creating user AH mlx4: Correct error flows in rereg_mr IB/qib: Correct reference counting in debugfs qp_stats IPoIB: Remove unnecessary port query ...
\| \| \ \ \ \
\| \| \ \ \ \
\| \| \ \ \ \
\| \| \ \ \ \
\| \| \ \ \ \
\| \| \ \ \ \
\| \| \ \ \ \
\| \| \ \ \ \
\| *-------. \ \ \ \	Merge branches 'core', 'ipoib', 'iser', 'mlx4', 'ocrdma' and 'qib' into for-next	Roland Dreier	2014-09-22
\| \|\ \ \ \ \ \ \ \ \
\| \| \| \| \| \| * \| \| \| \|	IB/qib: Correct reference counting in debugfs qp_stats	Mike Marciniszyn	2014-09-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This particular reference count is not needed with the rcu protection, and the current code leaks a reference count, causing a hang in qib_qp_destroy(). Cc: <stable@vger.kernel.org> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
\| \| \| \| \| \| * \| \| \| \|	IB/qib: Change get_user_pages() usage to always NULL vmas	Mike Marciniszyn	2014-09-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The static helper routine, __qib_get_user_pages(), accepts a vma arg, but current use always passes NULL. This has caused some confusion associated with the correct use of this argument, but since the current use case doesn't require the flexiblity, the best thing to do is to simplfy the code to always pass NULL to get_user_pages(). Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
\| \| \| \| \| \| * \| \| \| \|	IB/ipath: Change get_user_pages() usage to always NULL vmas	Mike Marciniszyn	2014-09-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The static helper routine, __ipath_get_user_pages(), accepts a vma arg, but current use always passes NULL. This has caused some confusion associated with the correct use of this argument, but since the current use case doesn't require the flexiblity, the best thing to do is to simplfy the code to always pass NULL to get_user_pages(). Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
\| \| \| \| \| * \| \| \| \| \|	RDMA/ocrdma: Use right macro in query AH	devesh.sharma@emulex.com	2014-09-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ocrdma_query_ah() does not use correct macro, and checks the wrong bit for the validity of address handle in vector table. Fix this. Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
\| \| \| \| \| * \| \| \| \| \|	RDMA/ocrdma: Resolve L2 address when creating user AH	devesh.sharma@emulex.com	2014-09-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Because of IP-based GIDs, userspace AHs must have MAC and VLAN ID resolved separately. Presently, user AHs are broken for ocrdma. This patch resolves L2 addresses while creating user AH and obtains the right DMAC and VLAN ID before creating AH. Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
\| \| \| \| \| * \| \| \| \| \|	RDMA/ocrdma: Do not skip setting deferred_arm	Devesh Sharma	2014-09-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When ib_request_notify_cq() is called for the first time, ocrdma tries to skip setting deffered_arm flag. This may lead CQ to an un-armed state thus never generating a CQ event and leaving consumer hung. This patch removes the part of code that skips setting deferred_arm. Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com> Signed-off-by: Mitesh Ahuja <mitesh.ahuja@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
\| \| \| \| \| * \| \| \| \| \|	RDMA/ocrdma: Report correct value of max_fast_reg_page_list_len	Devesh Sharma	2014-09-19
\| \| \| \| \| \|/ / / / / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix ocrdma_query_device() to report correct value of max_fast_reg_page_list_len. Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com> Signed-off-by: Mitesh Ahuja <mitesh.ahuja@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
\| \| \| \| * \| \| \| \| \|	IB/mlx4: Fix VF mac handling in RoCE	Jack Morgenstein	2014-09-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We had several problems here. First, a race condition on QP1 mac handling between mlx4_ib_update_qps and mlx4_ib_modify_qp, which is fixed by taking the qp mutex in mlx4_ib_update_qps. Also, qp->pri.smac_port was not updated in mlx4_ib_update_qps. Last, in __mlx4_ib_modify_qp we did not properly handle the case where the mac is zero, but port is non-zero. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
\| \| \| \| * \| \| \| \| \|	IB/mlx4: Do not allow APM under RoCE	Jack Morgenstein	2014-09-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Automatic Path Migration is not supported under RoCE. Therefore, return a "not-supported" error if the caller attempts to set an alternate path in a QP context. In addition, if there are no IB ports configured, do not report APM capability in the device flags returned by mlx4_ib_query_device. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
\| \| \| \| * \| \| \| \| \|	IB/mlx4: Don't update QP1 in native mode	Jack Morgenstein	2014-09-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For native functions (non-SR-IOV), there's no reason to update the smac_index, as QP1 is a GSI QP. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
\| \| \| \| * \| \| \| \| \|	IB/mlx4: Avoid accessing netdevice when building RoCE qp1 header	Jack Morgenstein	2014-09-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The source MAC is needed in RoCE when building the QP1 header. Currently, this is obtained from the source net device. However, the net device may not yet exist, or can be destroyed in parallel to this QP1 send operation (e.g through the VPI port change flow) so accessing it may cause a kernel crash. To fix this, we maintain a source MAC cache per port for the net device in struct mlx4_ib_roce. This cached MAC is initialized to be the default MAC address obtained during HCA initialization via QUERY_PORT. This cached MAC is updated via the netdev event notifier handler. Since the cached MAC is held in an atomic64 object, we do not need locking when accessing it. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
\| \| \| \| * \| \| \| \| \|	mlx4: Fix mlx4 reg/unreg mac to work properly with 0-mac addresses	Jack Morgenstein	2014-09-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is a chance that the VF mlx4 RoCE driver (mlx4_ib) may see a 0-mac as the current default MAC address when a RoCE interface first comes up. In this case, the RoCE driver registers the 0-mac to get its MAC index -- used in the INIT2RTR transition when it creates its proxy Q1 qp's. If we do not allow QP1 to be created, the RoCE driver will not come up. If we do not register the 0-mac, but simply use a random mac-index, QP1 will attempt to send packets with an someone's else source MAC which will get the system into more troubled. Since a 0-mac was previously used to indicate a free slot, this leads to errors, both when the 0-mac is registered and when it is unregistered. The required fix is to check in addition that the slot containing the 0-mac has a reference count of zero. Additionally, when comparing MAC addresses, need to mask out the 2 MSBs of the u64 mac on both sides of the comparison. Note that when the EN driver (mlx4_en) comes up, it set itself a proper mac --> the RoCE driver gets to be notified on that and further handing is done with the update qp command, as was added by commit 9433c188915c ("IB/mlx4: Invoke UPDATE_QP for proxy QP1 on MAC changes"). Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>