Commit graph

2285 commits

Author SHA1 Message Date
Linus Torvalds 1ea406c0e0 Main batch of InfiniBand/RDMA changes for 3.13:
- Re-enable flow steering verbs with new improved userspace ABI
  - Fixes for slow connection due to GID lookup scalability
  - IPoIB fixes
  - Many fixes to HW drivers including mlx4, mlx5, ocrdma and qib
  - Further improvements to SRP error handling
  - Add new transport type for Cisco usNIC
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABCAAGBQJSil7BAAoJEENa44ZhAt0hbtgP/A+AmUalbOX6ZKzuOFxsrtY2
 r55CX9b1JBeFM/Zhn2o6y+81lpCjkckJSggESMe4izNgocGw0nW4vYGN4SBynatj
 y8sR9OSn+G3ihuENrzG41MJUGEa5WbcNMy4boN+Oa+qyTlV/WjLR7Fv4WbikK7Wm
 o8FNlXiiDhMoGfHHG5J0MD0EQsnxuLDk2XP+ciu4tLtTs+wBka+gFK8WnMvztle3
 gTeMNna5ilvCS2fdBxteuPA3KeDnJE9AgJSMJ2a4Rh+DR8uTgWYQ6n7amjmOc546
 yhAKkoBkxPE10+Yj82WOPhCFxSeWcuSwJvpgv5dTVZ1XqUUcC1V3TEcZDHmyyHQ7
 uPXgS1A+erBW3OYPBjZqtKvnHObscV12fL+rId3vIhcAQIbFroci08ZwPidEYRkn
 fvwlEKcrIsBIpRXEyjlFCxsiiDnfq1wC1VayMR3jrIK0P6idf1SXf/geiRp9+RGT
 wKUc0j51jvEx29qc65xuhEP9FQV9pCMxyd+FEE0d0KkjMz5hsIkjmcUcBbgF0CGg
 GEyDPlgRLv+vmWDGpT8XraaV/0CJOEQDIgB4WSN87/AZ4UoNt7spW2xqsLsp1toy
 5e0100tpWUleTPLe/Wig5GtBdagQ2jAUK1+186CP93pFPtkwc4/7X3hyp7qPIPTz
 VDvT9DEy6zjSMCLpMcdo
 =xxC+
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull infiniband/rdma updates from Roland Dreier:
 - Re-enable flow steering verbs with new improved userspace ABI
 - Fixes for slow connection due to GID lookup scalability
 - IPoIB fixes
 - Many fixes to HW drivers including mlx4, mlx5, ocrdma and qib
 - Further improvements to SRP error handling
 - Add new transport type for Cisco usNIC

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (66 commits)
  IB/core: Re-enable create_flow/destroy_flow uverbs
  IB/core: extended command: an improved infrastructure for uverbs commands
  IB/core: Remove ib_uverbs_flow_spec structure from userspace
  IB/core: Use a common header for uverbs flow_specs
  IB/core: Make uverbs flow structure use names like verbs ones
  IB/core: Rename 'flow' structs to match other uverbs structs
  IB/core: clarify overflow/underflow checks on ib_create/destroy_flow
  IB/ucma: Convert use of typedef ctl_table to struct ctl_table
  IB/cm: Convert to using idr_alloc_cyclic()
  IB/mlx5: Fix page shift in create CQ for userspace
  IB/mlx4: Fix device max capabilities check
  IB/mlx5: Fix list_del of empty list
  IB/mlx5: Remove dead code
  IB/core: Encorce MR access rights rules on kernel consumers
  IB/mlx4: Fix endless loop in resize CQ
  RDMA/cma: Remove unused argument and minor dead code
  RDMA/ucma: Discard events for IDs not yet claimed by user space
  IB/core: Add Cisco usNIC rdma node and transport types
  RDMA/nes: Remove self-assignment from nes_query_qp()
  IB/srp: Report receive errors correctly
  ...
2013-11-18 15:36:04 -08:00
Roland Dreier b4fdf52b3f Merge branches 'cma', 'cxgb4', 'flowsteer', 'ipoib', 'misc', 'mlx4', 'mlx5', 'nes', 'ocrdma', 'qib' and 'srp' into for-next 2013-11-17 08:22:19 -08:00
Matan Barak 69ad5da41b IB/core: Re-enable create_flow/destroy_flow uverbs
This commit reverts commit 7afbddfae9 ("IB/core: Temporarily disable
create_flow/destroy_flow uverbs").  Since the uverbs extensions
functionality was experimental for v3.12, this patch re-enables the
support for them and flow-steering for v3.13.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-17 08:22:09 -08:00
Yann Droneaud f21519b23c IB/core: extended command: an improved infrastructure for uverbs commands
Commit 400dbc9658 ("IB/core: Infrastructure for extensible uverbs
commands") added an infrastructure for extensible uverbs commands
while later commit 436f2ad05a ("IB/core: Export ib_create/destroy_flow
through uverbs") exported ib_create_flow()/ib_destroy_flow() functions
using this new infrastructure.

According to the commit 400dbc9658, the purpose of this
infrastructure is to support passing around provider (eg. hardware)
specific buffers when userspace issue commands to the kernel, so that
it would be possible to extend uverbs (eg. core) buffers independently
from the provider buffers.

But the new kernel command function prototypes were not modified to
take advantage of this extension. This issue was exposed by Roland
Dreier in a previous review[1].

So the following patch is an attempt to a revised extensible command
infrastructure.

This improved extensible command infrastructure distinguish between
core (eg. legacy)'s command/response buffers from provider
(eg. hardware)'s command/response buffers: each extended command
implementing function is given a struct ib_udata to hold core
(eg. uverbs) input and output buffers, and another struct ib_udata to
hold the hw (eg. provider) input and output buffers.

Having those buffers identified separately make it easier to increase
one buffer to support extension without having to add some code to
guess the exact size of each command/response parts: This should make
the extended functions more reliable.

Additionally, instead of relying on command identifier being greater
than IB_USER_VERBS_CMD_THRESHOLD, the proposed infrastructure rely on
unused bits in command field: on the 32 bits provided by command
field, only 6 bits are really needed to encode the identifier of
commands currently supported by the kernel. (Even using only 6 bits
leaves room for about 23 new commands).

So this patch makes use of some high order bits in command field to
store flags, leaving enough room for more command identifiers than one
will ever need (eg. 256).

The new flags are used to specify if the command should be processed
as an extended one or a legacy one. While designing the new command
format, care was taken to make usage of flags itself extensible.

Using high order bits of the commands field ensure that newer
libibverbs on older kernel will properly fail when trying to call
extended commands. On the other hand, older libibverbs on newer kernel
will never be able to issue calls to extended commands.

The extended command header includes the optional response pointer so
that output buffer length and output buffer pointer are located
together in the command, allowing proper parameters checking. This
should make implementing functions easier and safer.

Additionally the extended header ensure 64bits alignment, while making
all sizes multiple of 8 bytes, extending the maximum buffer size:

                             legacy      extended

   Maximum command buffer:  256KBytes   1024KBytes (512KBytes + 512KBytes)
  Maximum response buffer:  256KBytes   1024KBytes (512KBytes + 512KBytes)

For the purpose of doing proper buffer size accounting, the headers
size are no more taken in account in "in_words".

One of the odds of the current extensible infrastructure, reading
twice the "legacy" command header, is fixed by removing the "legacy"
command header from the extended command header: they are processed as
two different parts of the command: memory is read once and
information are not duplicated: it's making clear that's an extended
command scheme and not a different command scheme.

The proposed scheme will format input (command) and output (response)
buffers this way:

- command:

  legacy header +
  extended header +
  command data (core + hw):

    +----------------------------------------+
    | flags     |   00      00    |  command |
    |        in_words    |   out_words       |
    +----------------------------------------+
    |                 response               |
    |                 response               |
    | provider_in_words | provider_out_words |
    |                 padding                |
    +----------------------------------------+
    |                                        |
    .              <uverbs input>            .
    .              (in_words * 8)            .
    |                                        |
    +----------------------------------------+
    |                                        |
    .             <provider input>           .
    .          (provider_in_words * 8)       .
    |                                        |
    +----------------------------------------+

- response, if present:

    +----------------------------------------+
    |                                        |
    .          <uverbs output space>         .
    .             (out_words * 8)            .
    |                                        |
    +----------------------------------------+
    |                                        |
    .         <provider output space>        .
    .         (provider_out_words * 8)       .
    |                                        |
    +----------------------------------------+

The overall design is to ensure that the extensible infrastructure is
itself extensible while begin more reliable with more input and bound
checking.

Note:

The unused field in the extended header would be perfect candidate to
hold the command "comp_mask" (eg. bit field used to handle
compatibility).  This was suggested by Roland Dreier in a previous
review[2].  But "comp_mask" field is likely to be present in the uverb
input and/or provider input, likewise for the response, as noted by
Matan Barak[3], so it doesn't make sense to put "comp_mask" in the
header.

[1]:
http://marc.info/?i=CAL1RGDWxmM17W2o_era24A-TTDeKyoL6u3NRu_=t_dhV_ZA9MA@mail.gmail.com

[2]:
http://marc.info/?i=CAL1RGDXJtrc849M6_XNZT5xO1+ybKtLWGq6yg6LhoSsKpsmkYA@mail.gmail.com

[3]:
http://marc.info/?i=525C1149.6000701@mellanox.com

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com

[ Convert "ret ? ret : 0" to the equivalent "ret".  - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-17 08:22:09 -08:00
Eli Cohen cf1c5e1f1c IB/mlx5: Fix page shift in create CQ for userspace
When creating a CQ, we must use mlx5 adapter page shift.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-15 14:36:36 -08:00
Eli Cohen 79d3da9c51 IB/mlx4: Fix device max capabilities check
Move the check on max supported CQEs after the final number of entries is
evaluated.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-15 14:36:36 -08:00
Eli Cohen 7e2e19210a IB/mlx5: Remove dead code
The value of the local variable index is never used in reg_mr_callback().

Signed-off-by: Eli Cohen <eli@mellanox.com>

[ Remove now-unused variable delta too.  - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-15 14:36:14 -08:00
Eli Cohen 93b80ac297 IB/mlx4: Fix endless loop in resize CQ
When calling get_sw_cqe() we need pass the consumer_index and not the
masked value. Failure to do so will cause incorrect result of
get_sw_cqe() possibly leading to endless loop.

This problem was reported and analyzed by Michael Rice from HP.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-15 10:24:17 -08:00
Linus Torvalds 2f466d33f5 PCI changes for the v3.13 merge window:
Resource management
     - Fix host bridge window coalescing (Alexey Neyman)
     - Pass type, width, and prefetchability for window alignment (Wei Yang)
 
   PCI device hotplug
     - Convert acpiphp, acpiphp_ibm to dynamic debug (Lan Tianyu)
 
   Power management
     - Remove pci_pm_complete() (Liu Chuansheng)
 
   MSI
     - Fail initialization if device is not in PCI_D0 (Yijing Wang)
 
   MPS (Max Payload Size)
     - Use pcie_get_mps() and pcie_set_mps() to simplify code (Yijing Wang)
     - Use pcie_set_readrq() to simplify code (Yijing Wang)
     - Use cached pci_dev->pcie_mpss to simplify code (Yijing Wang)
 
   SR-IOV
     - Enable upstream bridges even for VFs on virtual buses (Bjorn Helgaas)
     - Use pci_is_root_bus() to avoid catching virtual buses (Wei Yang)
 
   Virtualization
     - Add x86 MSI masking ops (Konrad Rzeszutek Wilk)
 
   Freescale i.MX6
     - Support i.MX6 PCIe controller (Sean Cross)
     - Increase link startup timeout (Marek Vasut)
     - Probe PCIe in fs_initcall() (Marek Vasut)
     - Fix imprecise abort handler (Tim Harvey)
     - Remove redundant of_match_ptr (Sachin Kamat)
 
   Renesas R-Car
     - Support Gen2 internal PCIe controller (Valentine Barshak)
 
   Samsung Exynos
     - Add MSI support (Jingoo Han)
     - Turn off power when link fails (Jingoo Han)
     - Add Jingoo Han as maintainer (Jingoo Han)
     - Add clk_disable_unprepare() on error path (Wei Yongjun)
     - Remove redundant of_match_ptr (Sachin Kamat)
 
   Synopsys DesignWare
     - Add irq_create_mapping() (Pratyush Anand)
     - Add header guards (Seungwon Jeon)
 
   Miscellaneous
     - Enable native PCIe services by default on non-ACPI (Andrew Murray)
     - Cleanup _OSC usage and messages (Bjorn Helgaas)
     - Remove pcibios_last_bus boot option on non-x86 (Bjorn Helgaas)
     - Convert bus code to use bus_, drv_, and dev_groups (Greg Kroah-Hartman)
     - Remove unused pci_mem_start (Myron Stowe)
     - Make sysfs functions static (Sachin Kamat)
     - Warn on invalid return from driver probe (Stephen M. Cameron)
     - Remove Intel Haswell D3 delays (Todd E Brandt)
     - Call pci_set_master() in core if driver doesn't do it (Yinghai Lu)
     - Use pci_is_pcie() to simplify code (Yijing Wang)
     - Use PCIe capability accessors to simplify code (Yijing Wang)
     - Use cached pci_dev->pcie_cap to simplify code (Yijing Wang)
     - Removed unused "is_pcie" from struct pci_dev (Yijing Wang)
     - Simplify sysfs CPU affinity implementation (Yijing Wang))
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJSgUzsAAoJEFmIoMA60/r8wmsQAJhwmtkUYR2L4T1g9smAyjJz
 bLm5zoC6WdywFcbTpTBfsTrS1CHIQG5akRgkEXGdr99epiho5F2lwmagWsUR4ijL
 39Qn3knAUMgtNjoVXXI106h/DfTyxSmkZBfih2AQFyWobJq+0kg7hjQQA3+836b4
 8ssWr1+NSl6JJTqYQ0Paw1kSqvvYoXsu5rWFEfCHk8D0s/1bvr5ldAUpk2jTg93I
 uo9/5+O264yt1YoKZOMqAMZLUfd5DaWY1mV3yeF0Uauy1pBmol5csE8ckqJPDrES
 PRdJT1+PhBeLYWcgXANOBZsW58ddxA0pQ5jQV6VJHQWsm5cE82OBpYJf6xUZ2moV
 o6DZ0KRnCPVA3NllYYR16H+wbMfADwwO83QoA+QTIZJy/WgpDH3Cst+m8KePGqbL
 uFgDdXSws9Bs1BCFs7bfYzAM3OdkBFnn+ac7JoPXKP5ibgAp9nDlurgK2r90zRnp
 j15vHMx0mV+e8B8/iwiW5eRtg7NoCHYiNfFy7JalOlsPmYr2KFazBVKclp13Hng7
 fe/Jy6X4UhWoQPdqsy4ftvSQb0gm1MClxFJeZ3VAt6LY9j8OP6S/Vdf6lpAL85KR
 lAQoQzB+lOhTPdXxFY2xgGkITkqPDOQMjPfowYUYFwybqBuG6BHXZPJobL+niBlb
 Nh+M2WlUUA9Z3V6rWJB6
 =CTPk
 -----END PGP SIGNATURE-----

Merge tag 'pci-v3.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI changes from Bjorn Helgaas:
 "Resource management
    - Fix host bridge window coalescing (Alexey Neyman)
    - Pass type, width, and prefetchability for window alignment (Wei Yang)

  PCI device hotplug
    - Convert acpiphp, acpiphp_ibm to dynamic debug (Lan Tianyu)

  Power management
    - Remove pci_pm_complete() (Liu Chuansheng)

  MSI
    - Fail initialization if device is not in PCI_D0 (Yijing Wang)

  MPS (Max Payload Size)
    - Use pcie_get_mps() and pcie_set_mps() to simplify code (Yijing Wang)
    - Use pcie_set_readrq() to simplify code (Yijing Wang)
    - Use cached pci_dev->pcie_mpss to simplify code (Yijing Wang)

  SR-IOV
    - Enable upstream bridges even for VFs on virtual buses (Bjorn Helgaas)
    - Use pci_is_root_bus() to avoid catching virtual buses (Wei Yang)

  Virtualization
    - Add x86 MSI masking ops (Konrad Rzeszutek Wilk)

  Freescale i.MX6
    - Support i.MX6 PCIe controller (Sean Cross)
    - Increase link startup timeout (Marek Vasut)
    - Probe PCIe in fs_initcall() (Marek Vasut)
    - Fix imprecise abort handler (Tim Harvey)
    - Remove redundant of_match_ptr (Sachin Kamat)

  Renesas R-Car
    - Support Gen2 internal PCIe controller (Valentine Barshak)

  Samsung Exynos
    - Add MSI support (Jingoo Han)
    - Turn off power when link fails (Jingoo Han)
    - Add Jingoo Han as maintainer (Jingoo Han)
    - Add clk_disable_unprepare() on error path (Wei Yongjun)
    - Remove redundant of_match_ptr (Sachin Kamat)

  Synopsys DesignWare
    - Add irq_create_mapping() (Pratyush Anand)
    - Add header guards (Seungwon Jeon)

  Miscellaneous
    - Enable native PCIe services by default on non-ACPI (Andrew Murray)
    - Cleanup _OSC usage and messages (Bjorn Helgaas)
    - Remove pcibios_last_bus boot option on non-x86 (Bjorn Helgaas)
    - Convert bus code to use bus_, drv_, and dev_groups (Greg Kroah-Hartman)
    - Remove unused pci_mem_start (Myron Stowe)
    - Make sysfs functions static (Sachin Kamat)
    - Warn on invalid return from driver probe (Stephen M. Cameron)
    - Remove Intel Haswell D3 delays (Todd E Brandt)
    - Call pci_set_master() in core if driver doesn't do it (Yinghai Lu)
    - Use pci_is_pcie() to simplify code (Yijing Wang)
    - Use PCIe capability accessors to simplify code (Yijing Wang)
    - Use cached pci_dev->pcie_cap to simplify code (Yijing Wang)
    - Removed unused "is_pcie" from struct pci_dev (Yijing Wang)
    - Simplify sysfs CPU affinity implementation (Yijing Wang)"

* tag 'pci-v3.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (79 commits)
  PCI: Enable upstream bridges even for VFs on virtual buses
  PCI: Add pci_upstream_bridge()
  PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq()
  PCI: Warn on driver probe return value greater than zero
  PCI: Drop warning about drivers that don't use pci_set_master()
  PCI: Workaround missing pci_set_master in pci drivers
  powerpc/pci: Use pci_is_pcie() to simplify code [fix]
  PCI: Update pcie_ports 'auto' behavior for non-ACPI platforms
  PCI: imx6: Probe the PCIe in fs_initcall()
  PCI: Add R-Car Gen2 internal PCI support
  PCI: imx6: Remove redundant of_match_ptr
  PCI: Report pci_pme_active() kmalloc failure
  mn10300/PCI: Remove useless pcibios_last_bus
  frv/PCI: Remove pcibios_last_bus
  PCI: imx6: Increase link startup timeout
  PCI: exynos: Remove redundant of_match_ptr
  PCI: imx6: Fix imprecise abort handler
  PCI: Fail MSI/MSI-X initialization if device is not in PCI_D0
  PCI: imx6: Remove redundant dev_err() in imx6_pcie_probe()
  x86/PCI: Coalesce multiple overlapping host bridge windows
  ...
2013-11-14 14:02:00 +09:00
Al Viro 441a9d0e1e qib_fs: fix (some) dcache abuses
* lookup_one_len() really wants i_mutex held on directory.
* leaks galore - just mount ipathfs, then
cd /sys/bus/pci/drivers/qib_ib; echo *:*:*.* >unbind
on a box with that card present and try to umount ipathfs...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-13 08:08:19 -05:00
Dave Jones 4127c365c9 RDMA/nes: Remove self-assignment from nes_query_qp()
Assigning a value to itself is pointless.

Spotted with coverity, no hardware to test.

Signed-off-by: Dave Jones <davej@fedoraproject.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-09 02:34:27 -08:00
Mike Marciniszyn 2fadd83184 IB/qib: Fix txselect regression
Commit 7fac33014f54("IB/qib: checkpatch fixes") was overzealous in
removing a simple_strtoul for a parse routine, setup_txselect().  That
routine is required to handle a multi-value string.

Unwind that aspect of the fix.

Cc: <stable@vger.kernel.org>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:12 -08:00
Mike Marciniszyn 78a5886472 IB/qib: Fix checkpatch __packed warnings
Convert __attribute__ ((packed)) to __packed.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:12 -08:00
Jan Kara 603e772992 IB/qib: Convert qib_user_sdma_pin_pages() to use get_user_pages_fast()
qib_user_sdma_queue_pkts() gets called with mmap_sem held for
writing. Except for get_user_pages() deep down in
qib_user_sdma_pin_pages() we don't seem to need mmap_sem at all.  Even
more interestingly the function qib_user_sdma_queue_pkts() (and also
qib_user_sdma_coalesce() called somewhat later) call copy_from_user()
which can hit a page fault and we deadlock on trying to get mmap_sem
when handling that fault.

So just make qib_user_sdma_pin_pages() use get_user_pages_fast() and
leave mmap_sem locking for mm.

This deadlock has actually been observed in the wild when the node
is under memory pressure.

Cc: <stable@vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:11 -08:00
Jan Kara 4adcf7fb67 IB/ipath: Convert ipath_user_sdma_pin_pages() to use get_user_pages_fast()
ipath_user_sdma_queue_pkts() gets called with mmap_sem held for
writing.  Except for get_user_pages() deep down in
ipath_user_sdma_pin_pages() we don't seem to need mmap_sem at all.

Even more interestingly the function ipath_user_sdma_queue_pkts() (and
also ipath_user_sdma_coalesce() called somewhat later) call
copy_from_user() which can hit a page fault and we deadlock on trying
to get mmap_sem when handling that fault.  So just make
ipath_user_sdma_pin_pages() use get_user_pages_fast() and leave
mmap_sem locking for mm.

This deadlock has actually been observed in the wild when the node
is under memory pressure.

Cc: <stable@vger.kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>

[ Merged in fix for call to get_user_pages_fast from Tetsuo Handa
  <penguin-kernel@I-love.SAKURA.ne.jp>.  - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:11 -08:00
Naresh Gottumukkala d5e3f37833 RDMA/ocrdma: Remove redundant check in ocrdma_build_fr()
Remove the redundant check of comparing if a 32-bit value is greater
than 0xffffffffULL.

Reported by Dan Carpenter.

Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:06 -08:00
Naresh Gottumukkala 1852d1da3b RDMA/ocrdma: Fix a crash in rmmod
1) ocrdma_remove_free() is called from a call_rcu callback funtion
   context, which can be a bottom-half context. So the code in
   ocrdma_remove_free should not sleep.

   But ocrdma_cleanup_hw() can sleep, So move it ocrdma_remove()
   instead of ocrdma_remove_free.

2) Fix a couple of kbuild test robot warnings.

Signed-off-by: Naresh Gottumukkala <bgottumukkala@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:06 -08:00
Dan Carpenter 6ebacdfc07 RDMA/ocrdma: Silence an integer underflow warning
We recently added a cap on "max_wqe_allocated" in 43a6b4025c
('RDMA/ocrdma: Create IRD queue fix').

My static checker complains that the cap has a problem because it
casts large values to negative.  "attrs->cap.max_send_wr" is a u32.
It comes from the user, but it's capped in ocrdma_check_qp_params() so
it can't wrap here.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:05 -08:00
Eli Cohen 1b77d2bd75 mlx5: Use enum to indicate adapter page size
The Connect-IB adapter has an inherent page size which equals 4K.
Define an new enum that equals the page shift and use it instead of
using the value 12 throughout the code.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:01 -08:00
Eli Cohen c2a3431e61 IB/mlx5: Update opt param mask for RTS2RTS
RTS to RTS transition should allow update of alternate path.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:01 -08:00
Eli Cohen 07c9113fe8 IB/mlx5: Remove "Always false" comparison
mlx5_cur and mlx5_new cannot have negative values so remove the
redundant condition.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:01 -08:00
Eli Cohen 2d036fad94 IB/mlx5: Remove dead code in mr.c
In mlx5_mr_cache_init() the size variable is not used so remove it to
avoid compiler warnings when running with make W=1.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:00 -08:00
Eli Cohen bf0bf77f65 mlx5: Support communicating arbitrary host page size to firmware
Connect-IB firmware requires 4K pages to be communicated with the
driver. This patch breaks larger pages to 4K units to enable support
for architectures utilizing larger page size, such as PowerPC.  This
patch also fixes several places that referred to PAGE_SHIFT instead of
explicit 12 which is the inherent page shift on Connect-IB.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:00 -08:00
Moshe Lazer cfd8f1d49b IB/mlx5: Fix srq free in destroy qp
On destroy QP the driver walks over the relevant CQ and removes CQEs
reported for the destroyed QP.  It also frees the related SRQ entry
without checking that this is actually an SRQ-related CQE.  In case of
a CQ used for both send and receive QP, we could free SRQ entries for
send CQEs.  This patch resolves this issue by verifying that this is a
SRQ related CQE by checking the SRQ number in the CQE is not zero.

Signed-off-by: Moshe Lazer <moshel@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:42:59 -08:00
Eli Cohen 1faacf82df IB/mlx5: Simplify mlx5_ib_destroy_srq
Make use of destroy_srq_kernel() to clear SRQ resouces.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:42:59 -08:00
Eli Cohen 9641b74ebe IB/mlx5: Fix overflow check in IB_WR_FAST_REG_MR
Make sure not to overflow when reading the page list from struct
ib_fast_reg_page_list.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:42:59 -08:00
Eli Cohen 746b5583c1 IB/mlx5: Multithreaded create MR
Use asynchronous commands to execute up to eight concurrent create MR
commands. This is to fill memory caches faster so we keep consuming
from there.  Also, increase timeout for shrinking caches to five
minutes.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:42:59 -08:00
Eli Cohen 51ee86a4af IB/mlx5: Fix check of number of entries in create CQ
Verify that the value is non negative before rounding up to power of 2.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:42:58 -08:00
Ben Hutchings 649fb5ec0e IB/cxgb4: Fix formatting of physical address
Physical addresses may be wider than virtual addresses (e.g. on i386
with PAE) and must not be formatted with %p.

Compile-tested only.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:42:30 -08:00
Jack Morgenstein 571b8b92c7 net/mlx4_core: Initialize all mailbox buffers to zero before use
To guarantee that all unused fields in all FW commands for both inboxes
and outboxes are zeroed out, initialize the mailbox buffer to all zeroes.

This is especially important for SRIOV comm-channel virtual commands
(such as QUERY_FUNC_CAP), where if new fields are added to support new
features, the driver can depend on older kernels passing zeroes in these
fields.

In addition to zeroing out the mailbox buffer at allocation time, all
(now unnecessary) calls to memset by the callers of
mlx4_alloc_cmd_mailbox() are removed.

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-07 19:22:47 -05:00
Jack Morgenstein 5a0d0a6161 mlx4: Structures and init/teardown for VF resource quotas
This is step #1 for implementing SRIOV resource quotas for VFs.

Quotas are implemented per resource type for VFs and the PF, to prevent
any entity from simply grabbing all the resources for itself and leaving
the other entities unable to obtain such resources.

Resources which are allocated using quotas:  QPs, CQs, SRQs, MPTs, MTTs, MAC,
                                             VLAN, and Counters.

The quota system works as follows:
Each entity (VF or PF) is given a max number of a given resource (its quota),
and a guaranteed minimum number for each resource (starvation prevention).

For QPs, CQs, SRQs, MPTs and MTTs:
50% of the available quantity for the resource is divided equally among
the PF and all the active VFs (i.e., the number of VFs in the mlx4_core module
parameter "num_vfs"). This 50% represents the "guaranteed minimum" pool.
The other 50% is the "free pool", allocated on a first-come-first-serve basis.
For each VF/PF, resources are first allocated from its "guaranteed-minimum"
pool. When that pool is exhausted, the driver attempts to allocate from
the resource "free-pool".

The quota (i.e., max) for the VFs and the PF is:
  The free-pool amount (50% of the real max) + the guaranteed minimum

For MACs:
  Guarantee 2 MACs per VF/PF per port. As a result, since we have only
  128 MACs per port, reduce the allowable number of VFs from 64 to 63.
  Any remaining MACs are put into a free pool.

For VLANs:
  For the PF, the per-port quota is 128 and guarantee is 64
     (to allow the PF to register at least a VLAN per VF in VST mode).
  For the VFs, the per-port quota is 64 and the guarantee is 0.
      We assume that VGT VFs are trusted not to abuse the VLAN resource.

For Counters:
  For all functions (PF and VFs), the quota is 128 and the guarantee is 0.

In this patch, we define the needed structures, which are added to the
resource-tracker struct.  In addition, we do initialization
for the resource quota, and adjust the query_device response to use quotas
rather than resource maxima.

As part of the implementation, we introduce a new field in
mlx4_dev: quotas.  This field holds the resource quotas used
to report maxima to the upper layers (ib_core, via query_device).

The HCA maxima of these values are passed to the VFs (via
QUERY_HCA) so that they may continue to use these in handling
QPs, CQs, SRQs and MPTs.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-04 16:19:07 -05:00
Yann Droneaud 7afbddfae9 IB/core: Temporarily disable create_flow/destroy_flow uverbs
The create_flow/destroy_flow uverbs and the associated extensions to
the user-kernel verbs ABI are under review and are too experimental to
freeze at this point.

So userspace is not exposed to experimental features and an uinstable
ABI, temporarily disable this for v3.12 (with a Kconfig option behind
staging to reenable it if desired).

The feature will be enabled after proper cleanup for v3.13.

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Link: http://marc.info/?i=cover.1381351016.git.ydroneaud@opteya.com
Link: http://marc.info/?i=cover.1381177342.git.ydroneaud@opteya.com

[ Add a Kconfig option to reenable these verbs.  - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-10-21 09:44:17 -07:00
Roland Dreier 59b5b28d1a Merge branch 'misc' into for-next 2013-10-14 10:10:46 -07:00
Joe Perches 2b50176d11 IB: Remove unnecessary semicolons
These aren't necessary after switch blocks.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-10-14 10:10:00 -07:00
Eli Cohen 5431390707 IB/mlx5: Ensure proper synchronization accessing memory
Call mlx5_ib_populate_pas() before mapping the DMA buffer to ensure
the hardware reads the values written by the CPU.

Found by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-10-10 09:24:00 -07:00
Eli Cohen fe45f82704 IB/mlx5: Fix alignment of reg umr gather buffers
The hardware requires that gather buffers for UMR work requests be
aligned to 2K.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-10-10 09:23:59 -07:00
Sagi Grimberg ada9f5d007 IB/mlx5: Fix eq names to display nicely in /proc/interrupts
It's helpful for a driver to put the pci slot name in its interrupt
names, so /proc/interrupts will show the pci slot of the device.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-10-10 09:23:59 -07:00
Eli Cohen a4774e9095 IB/mlx5: Fix opt param mask according to firmware spec
Failed to configure opt mask to configure rre from init to rtr.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-10-10 09:23:58 -07:00
Eli Cohen 75959f56fe mlx5: Fix opt param mask for sq err to rts transition
Add missing entry in the table for UC transport.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-10-10 09:23:58 -07:00
Eli Cohen 81bea28ffd IB/mlx5: Disable atomic operations
Currently Atomic operations don't work properly.  Disable them for the
time being.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-10-10 09:23:58 -07:00
Eli Cohen a0c84c326f IB/mlx5: Avoid async events on invalid port number
On a single ported Connect-IB, its possible for the firmware to issue
events on the non-existing 2nd port.  Make sure to ignore events
generated for such ports.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-10-10 09:23:57 -07:00
Eli Cohen 203099fd73 IB/mlx5: Decrease memory consumption of mr caches
Change the logic so we do not allocate memory nor map the device
before actually posting to the REG_UMR QP. In addition, unmap and free
the memory after we get completion.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-10-10 09:23:56 -07:00
Moshe Lazer 56e1ab0f13 IB/mlx5: Fix memory leak in mlx5_ib_create_srq
The patch fixes the rollback in case of failure in creating SRQ.

Signed-off-by: Moshe Lazer <moshel@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-10-10 09:23:55 -07:00
Moshe Lazer 3c4619114c IB/mlx5: Flush cache workqueue before destroying it
Destroying the workqueue without flushing it first can lead to a case
in which the kernel tries to push a delayed work to the workqueue
which does not exist anymore.

Signed-off-by: Moshe Lazer <moshel@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-10-10 09:23:55 -07:00
Eli Cohen b125a54bfd IB/mlx5: Fix send work queue size calculation
1. Make sure wqe_cnt does not exceed the limit published by firmware.

2. There is no requirement that the number of outstanding work
   requests will be a power of two. Remove the ilog2 in the
   calculation of sq.max_post to fix that.

3. Add case for IB_QPT_XRC_TGT in sq_overhead and return 0 as XRC
   target QPs do not have a send queue.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-10-10 09:23:55 -07:00
Bjorn Helgaas 03078633a6 IB/qib: Drop qib_tune_pcie_caps() and qib_tune_pcie_coalesce() return values
The callers of qib_tune_pcie_caps() and qib_tune_pcie_coalesce() don't
check the return values, so this patch drops the return values altogether.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
2013-10-04 14:30:19 -06:00
Yijing Wang 0ce0e62f1f IB/qib: Use pcie_set_mps() and pcie_get_mps() to simplify code
Refactor qib_tune_pcie_caps().  Use pcie_get_mps(), pcie_set_mps(),
pcie_get_readrq(), and pcie_set_readrq() to simplify the code.  The PCI
core caches the "PCIe Max Payload Size Supported" in pci_dev->pcie_mpss,
so use that instead of pcie_capability_read_word().  Remove the unused
val2fld() and fld2val().

Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
2013-09-24 14:17:06 -06:00
Yijing Wang dcaa73dc34 IB/qib: Use pci_is_root_bus() to check whether it is a root bus
Use pci_is_root_bus() instead of "if (bus->parent)" statement
for better readability.

Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
2013-09-24 12:13:03 -06:00
Martin Schwidefsky 0244ad004a Remove GENERIC_HARDIRQ config option
After the last architecture switched to generic hard irqs the config
options HAVE_GENERIC_HARDIRQS & GENERIC_HARDIRQS and the related code
for !CONFIG_GENERIC_HARDIRQS can be removed.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-09-13 15:09:52 +02:00
Linus Torvalds 2e515bf096 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree from Jiri Kosina:
 "The usual trivial updates all over the tree -- mostly typo fixes and
  documentation updates"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (52 commits)
  doc: Documentation/cputopology.txt fix typo
  treewide: Convert retrun typos to return
  Fix comment typo for init_cma_reserved_pageblock
  Documentation/trace: Correcting and extending tracepoint documentation
  mm/hotplug: fix a typo in Documentation/memory-hotplug.txt
  power: Documentation: Update s2ram link
  doc: fix a typo in Documentation/00-INDEX
  Documentation/printk-formats.txt: No casts needed for u64/s64
  doc: Fix typo "is is" in Documentations
  treewide: Fix printks with 0x%#
  zram: doc fixes
  Documentation/kmemcheck: update kmemcheck documentation
  doc: documentation/hwspinlock.txt fix typo
  PM / Hibernate: add section for resume options
  doc: filesystems : Fix typo in Documentations/filesystems
  scsi/megaraid fixed several typos in comments
  ppc: init_32: Fix error typo "CONFIG_START_KERNEL"
  treewide: Add __GFP_NOWARN to k.alloc calls with v.alloc fallbacks
  page_isolation: Fix a comment typo in test_pages_isolated()
  doc: fix a typo about irq affinity
  ...
2013-09-06 09:36:28 -07:00