aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/hw/hfi1/file_ops.c
Commit message (Collapse)AuthorAge
* IB/hfi1: Fix a subcontext memory leakMichael J. Ruhl2017-05-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 224d71f910102c966cdcd782c97e096d5e26e4da upstream. The only context that frees user_exp_rcv data structures is the last context closed (from a sub-context set). This leaks the allocations from the other sub-contexts. Separate the common frees from the specific frees and call them at the appropriate time. Using KEDR to check for memory leaks we get: Before test: [leak_check] Possible leaks: 25 After test: [leak_check] Possible leaks: 31 (6 leaked data structures) After patch applied (before and after test have the same value) [leak_check] Possible leaks: 25 Each leak is 192 + 13440 + 6720 = 20352 bytes per sub-context. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* IB/hfi1: Fix an Oops on pci device force removeTadeusz Struk2016-11-15
| | | | | | | | | | | | | | | This patch fixes an Oops on device unbind, when the device is used by a PSM user process. PSM processes access device resources which are freed on device removal. Similar protection exists in uverbs in ib_core for Verbs clients, but PSM doesn't use ib_uverbs hence a separate protection is required for PSM clients. Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/hfi1: Restore EPROM read abilityDean Luick2016-10-02
| | | | | | | | | | | | | | | Partially revert commit d07903174202 ("IB/hfi1: Remove EPROM functionality from data device"), bringing back the ability to read from the EPROM. This code will be used for driver-only acccess to the EPROM, hence change EPROM read to save to a buffer instead of copy touser. Also allow any offset and remove missed includes and leftover declarations. Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/hfi1: Fix resource release in context allocationJakub Pawlak2016-10-02
| | | | | | | | | | | | | Correct resource free in allocate_ctxt() function. When context creation fails allocated resources are properly released and pointer in receive context data table is set back to NULL. Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/hfi1: Fix user-space buffers mapping with IOMMU enabledTymoteusz Kielan2016-10-02
| | | | | | | | | | | | | | | The dma_XXX API functions return bus addresses which are physical addresses when IOMMU is disabled. Buffer mapping to user-space is done via remap_pfn_range() with PFN based on bus address instead of physical. This results in wrong pages being mapped to user-space when IOMMU is enabled. Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Tymoteusz Kielan <tymoteusz.kielan@intel.com> Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/hfi1: Fix mm_struct use after freeIra Weiny2016-08-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Testing with CONFIG_SLUB_DEBUG_ON=y resulted in the kernel panic below. This is the result of the mm_struct sometimes being free'd prior to hfi1_file_close being called. This was due to the combination of 2 reasons: 1) hfi1_file_close is deferred in process exit and it therefore may not be called synchronously with process exit. 2) exit_mm is called prior to exit_files in do_exit. Normally this is ok however, our kernel bypass code requires us to have access to the mm_struct for house keeping both at "normal" close time as well as at process exit. Therefore, the fix is to simply keep a reference to the mm_struct until we are done with it. [ 3006.340150] general protection fault: 0000 [#1] SMP [ 3006.346469] Modules linked in: hfi1 rdmavt rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm dm_mirror dm_region_hash dm_log dm_mod snd_hda_code c_realtek iTCO_wdt snd_hda_codec_generic iTCO_vendor_support sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm irqbypass c rct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw snd_hda_intel gf128mul snd_hda_codec glue_helper snd_hda_core ablk_helper sn d_hwdep cryptd snd_seq snd_seq_device snd_pcm snd_timer snd soundcore pcspkr shpchp mei_me sg lpc_ich mei i2c_i801 mfd_core ioatdma ipmi_devi ntf wmi ipmi_si ipmi_msghandler acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 jbd2 mbcache mlx4_en ib_core sr_mod s d_mod cdrom crc32c_intel mgag200 drm_kms_helper syscopyarea sysfillrect igb sysimgblt fb_sys_fops ptp mlx4_core ttm isci pps_core ahci drm li bsas libahci dca firewire_ohci i2c_algo_bit scsi_transport_sas firewire_core crc_itu_t i2c_core libata [last unloaded: mlx4_ib] [ 3006.461759] CPU: 16 PID: 11624 Comm: mpi_stress Not tainted 4.7.0-rc5+ #1 [ 3006.469915] Hardware name: Intel Corporation W2600CR ........../W2600CR, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013 [ 3006.483027] task: ffff8804102f0040 ti: ffff8804102f8000 task.ti: ffff8804102f8000 [ 3006.491971] RIP: 0010:[<ffffffff810f0383>] [<ffffffff810f0383>] __lock_acquire+0xb3/0x19e0 [ 3006.501905] RSP: 0018:ffff8804102fb908 EFLAGS: 00010002 [ 3006.508447] RAX: 6b6b6b6b6b6b6b6b RBX: 0000000000000001 RCX: 0000000000000000 [ 3006.517012] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff880410b56a40 [ 3006.525569] RBP: ffff8804102fb9b0 R08: 0000000000000001 R09: 0000000000000000 [ 3006.534119] R10: ffff8804102f0040 R11: 0000000000000000 R12: 0000000000000000 [ 3006.542664] R13: ffff880410b56a40 R14: 0000000000000000 R15: 0000000000000000 [ 3006.551203] FS: 00007ff478c08700(0000) GS:ffff88042e200000(0000) knlGS:0000000000000000 [ 3006.560814] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3006.567806] CR2: 00007f667f5109e0 CR3: 0000000001c06000 CR4: 00000000000406e0 [ 3006.576352] Stack: [ 3006.579157] ffffffff8124b819 ffffffffffffffff 0000000000000000 ffff8804102fb940 [ 3006.588072] 0000000000000002 0000000000000000 ffff8804102f0040 0000000000000007 [ 3006.596971] 0000000000000006 ffff8803cad6f000 0000000000000000 ffff8804102f0040 [ 3006.605878] Call Trace: [ 3006.609220] [<ffffffff8124b819>] ? uncharge_batch+0x109/0x250 [ 3006.616382] [<ffffffff810f2313>] lock_acquire+0xd3/0x220 [ 3006.623056] [<ffffffffa0a30bfc>] ? hfi1_release_user_pages+0x7c/0xa0 [hfi1] [ 3006.631593] [<ffffffff81775579>] down_write+0x49/0x80 [ 3006.638022] [<ffffffffa0a30bfc>] ? hfi1_release_user_pages+0x7c/0xa0 [hfi1] [ 3006.646569] [<ffffffffa0a30bfc>] hfi1_release_user_pages+0x7c/0xa0 [hfi1] [ 3006.654898] [<ffffffffa0a2efb6>] cacheless_tid_rb_remove+0x106/0x330 [hfi1] [ 3006.663417] [<ffffffff810efd36>] ? mark_held_locks+0x66/0x90 [ 3006.670498] [<ffffffff817771f6>] ? _raw_spin_unlock_irqrestore+0x36/0x60 [ 3006.678741] [<ffffffffa0a2f1ee>] tid_rb_remove+0xe/0x10 [hfi1] [ 3006.686010] [<ffffffffa0a0c5d5>] hfi1_mmu_rb_unregister+0xc5/0x100 [hfi1] [ 3006.694387] [<ffffffffa0a2fcb9>] hfi1_user_exp_rcv_free+0x39/0x120 [hfi1] [ 3006.702732] [<ffffffffa09fc6ea>] hfi1_file_close+0x17a/0x330 [hfi1] [ 3006.710489] [<ffffffff81263e9a>] __fput+0xfa/0x230 [ 3006.716595] [<ffffffff8126400e>] ____fput+0xe/0x10 [ 3006.722696] [<ffffffff810b95c6>] task_work_run+0x86/0xc0 [ 3006.729379] [<ffffffff81099933>] do_exit+0x323/0xc40 [ 3006.735672] [<ffffffff8109a2dc>] do_group_exit+0x4c/0xc0 [ 3006.742371] [<ffffffff810a7f55>] get_signal+0x345/0x940 [ 3006.748958] [<ffffffff810340c7>] do_signal+0x37/0x700 [ 3006.755328] [<ffffffff8127872a>] ? poll_select_set_timeout+0x5a/0x90 [ 3006.763146] [<ffffffff811609cb>] ? __audit_syscall_exit+0x1db/0x260 [ 3006.770853] [<ffffffff8110f3e3>] ? rcu_read_lock_sched_held+0x93/0xa0 [ 3006.778765] [<ffffffff812347a4>] ? kfree+0x1e4/0x2a0 [ 3006.784986] [<ffffffff8108e75a>] ? exit_to_usermode_loop+0x33/0xac [ 3006.792551] [<ffffffff8108e785>] exit_to_usermode_loop+0x5e/0xac [ 3006.799907] [<ffffffff81003dca>] do_syscall_64+0x12a/0x190 [ 3006.806664] [<ffffffff81777a7f>] entry_SYSCALL64_slow_path+0x25/0x25 [ 3006.814396] Code: 24 08 44 89 44 24 10 89 4c 24 18 e8 a8 d8 ff ff 48 85 c0 8b 4c 24 18 44 8b 44 24 10 44 8b 4c 24 08 4c 8b 14 24 0f 84 30 08 00 00 <f0> ff 80 98 01 00 00 8b 3d 48 ad be 01 45 8b a2 90 0b 00 00 85 [ 3006.837158] RIP [<ffffffff810f0383>] __lock_acquire+0xb3/0x19e0 [ 3006.844401] RSP <ffff8804102fb908> [ 3006.851170] ---[ end trace b7b9f21cf06c27df ]--- [ 3006.927420] Kernel panic - not syncing: Fatal exception [ 3006.933954] Kernel Offset: disabled [ 3006.940961] ---[ end Kernel panic - not syncing: Fatal exception [ 3006.948249] ------------[ cut here ]------------ Fixes: 3faa3d9a308e ("IB/hfi1: Make use of mm consistent") Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/hfi1: Add missing error code assignment before testChristophe Jaillet2016-08-22
| | | | | | | | It is likely that checking the result of 'setup_ctxt' is expected here. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* Merge tag 'for-linus-2' of ↵Linus Torvalds2016-08-04
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull second round of rdma updates from Doug Ledford: "This can be split out into just two categories: - fixes to the RDMA R/W API in regards to SG list length limits (about 5 patches) - fixes/features for the Intel hfi1 driver (everything else) The hfi1 driver is still being brought to full feature support by Intel, and they have a lot of people working on it, so that amounts to almost the entirety of this pull request" * tag 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (84 commits) IB/hfi1: Add cache evict LRU list IB/hfi1: Fix memory leak during unexpected shutdown IB/hfi1: Remove unneeded mm argument in remove function IB/hfi1: Consistently call ops->remove outside spinlock IB/hfi1: Use evict mmu rb operation IB/hfi1: Add evict operation to the mmu rb handler IB/hfi1: Fix TID caching actions IB/hfi1: Make the cache handler own its rb tree root IB/hfi1: Make use of mm consistent IB/hfi1: Fix user SDMA racy user request claim IB/hfi1: Fix error condition that needs to clean up IB/hfi1: Release node on insert failure IB/hfi1: Validate SDMA user iovector count IB/hfi1: Validate SDMA user request index IB/hfi1: Use the same capability state for all shared contexts IB/hfi1: Prevent null pointer dereference IB/hfi1: Rename TID mmu_rb_* functions IB/hfi1: Remove unneeded empty check in hfi1_mmu_rb_unregister() IB/hfi1: Restructure hfi1_file_open IB/hfi1: Make iovec loop index easy to understand ...
| * IB/hfi1: Fix TID caching actionsDean Luick2016-08-02
| | | | | | | | | | | | | | | | | | | | | | | | | | Per file descriptor TID caching actions depend on a global that can change midway through the lifetime of that file descriptor. Make the use of caching consistent for the life of the file descriptor by using the presence of the cache handler to decide when to use the cache functions. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/hfi1: Make use of mm consistentIra Weiny2016-08-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The hfi1 driver registers a mmu_notifier callback when /dev/hfi1_* is opened, and unregisters it when the device is closed. The driver incorrectly assumes that the close will always happen from the same context as the open. In particular, closes due to SIGKILL or OOM killer activity may happen from a different context. In these cases, the wrong mm is passed to mmu_notifier_unregister(), which causes improper reference counting for the victim mm, and eventual memory corruption. Preserve the mm for all open file descriptors and use this mm rather than current->mm for memory operations for the lifetime of that fd. Note: this patch leaves 1 use of current->mm in place. This use is removed in a follow on patch because other functional changes were required prior to that use being removed. If registration fails, there is no reason to keep the handler object around. Free the handler object rather than add it to the list to prevent any mmu_notifier operations, including unregister, when registration fails. Suggested-by: Jim Foraker <foraker1@llnl.gov> Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/hfi1: Use the same capability state for all shared contextsDean Luick2016-08-02
| | | | | | | | | | | | | | | | | | | | | | Save the current capability state at user context creation time. Report this saved value for all shared contexts. Also get rid of unnecessary hfi1_get_base_kinfo function. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/hfi1: Restructure hfi1_file_openIra Weiny2016-08-02
| | | | | | | | | | | | | | | | Rearrange the file open call in prep for new changes. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/hfi1: Remove unused uctxt->subpid and uctxt->pidDean Luick2016-08-02
| | | | | | | | | | | | | | | | These are no longer needed. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/hfi1: Refine user process affinity algorithmSebastian Sanchez2016-08-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When performing process affinity recommendations for MPI ranks, the current algorithm doesn't take into account multiple HFI units. Also, real cores and HT cores are not distinguished from one another. Therefore, all HT cores are recommended to be assigned first within the local NUMA node before recommending the assignments of cores in other NUMA nodes. It's ideal to assign all real cores across all NUMA nodes first, then all HT 1 cores, then all HT 2 cores, and so on to balance CPU workload. CPU cores in other NUMA nodes could be running interrupt handlers, and this is not taken into account. To balance the CPU workload for user processes, the following recommendation algorithm is used: For each user process that is opening a context on HFI Y: a) If all cores are assigned to user processes, start assignments all over from the first core b) Assign real cores first, then HT cores (First set of HT cores on all physical cores, then second set of HT cores, and, so on) in the following order: 1. Same NUMA node as HFI Y and not running an IRQ handler 2. Same NUMA node as HFI Y and running an IRQ handler 3. Different NUMA node to HFI Y and not running an IRQ handler 4. Different NUMA node to HFI Y and running an IRQ handler c) Mark core as assigned in the global affinity structure. As user processes are done, remove core assignments from global affinity structure. This implementation allows an arbitrary number of HT cores and provides support for multiple HFIs. This is being included in the kernel rather than user space due to the fact that user space has no way of knowing the CPU recommendations for contexts running as part of other jobs. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/hfi1: Remove unnecessary done label in hfi1_write_iterIra Weiny2016-08-02
| | | | | | | | | | | | | | | | | | Simple code clean up of hfi1_write_iter. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* | Merge tag 'for-linus' of ↵Linus Torvalds2016-08-04
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull base rdma updates from Doug Ledford: "Round one of 4.8 code: while this is mostly normal, there is a new driver in here (the driver was hosted outside the kernel for several years and is actually a fairly mature and well coded driver). It amounts to 13,000 of the 16,000 lines of added code in here. Summary: - Updates/fixes for iw_cxgb4 driver - Updates/fixes for mlx5 driver - Add flow steering and RSS API - Add hardware stats to mlx4 and mlx5 drivers - Add firmware version API for RDMA driver use - Add the rxe driver (this is a software RoCE driver that makes any Ethernet device a RoCE device) - Fixes for i40iw driver - Support for send only multicast joins in the cma layer - Other minor fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (72 commits) Soft RoCE driver IB/core: Support for CMA multicast join flags IB/sa: Add cached attribute containing SM information to SA port IB/uverbs: Fix race between uverbs_close and remove_one IB/mthca: Clean up error unwind flow in mthca_reset() IB/mthca: NULL arg to pci_dev_put is OK IB/hfi1: NULL arg to sc_return_credits is OK IB/mlx4: Add diagnostic hardware counters net/mlx4: Query performance and diagnostics counters net/mlx4: Add diagnostic counters capability bit Use smaller 512 byte messages for portmapper messages IB/ipoib: Report SG feature regardless of HW UD CSUM capability IB/mlx4: Don't use GFP_ATOMIC for CQ resize struct IB/hfi1: Disable by default IB/rdmavt: Disable by default IB/mlx5: Fix port counter ID association to QP offset IB/mlx5: Fix iteration overrun in GSI qps i40iw: Add NULL check for puda buffer i40iw: Change dup_ack_thresh to u8 i40iw: Remove unnecessary check for moving CQ head ...
| * IB/hfi1: NULL arg to sc_return_credits is OKMarkus Elfring2016-08-03
| | | | | | | | | | | | | | | | | | | | The sc_return_credits() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Doug Ledford <dledford@redhat.com>
* | IB/hfi1: Prevent context lossIra Weiny2016-06-17
|/ | | | | | | | | If a context has already been assigned to an FD, prevent another assignment. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/hfi1: Move driver out of stagingDennis Dalessandro2016-05-26
The TODO list for the hfi1 driver was completed during 4.6. In addition other objections raised (which are far beyond what was in the TODO list) have been addressed as well. It is now time to remove the driver from staging and into the drivers/infiniband sub-tree. Reviewed-by: Jubin John <jubin.john@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>