1
0
Fork 0
Commit Graph

513138 Commits (b3b3a99c5371e2e96a2c680e6ac20218bddbd422)

Author SHA1 Message Date
Jeff Kirsher 750c607e30 fm10k: renamed mbx_tx_dropped to mbx_tx_oversized
The use of dropped doesn't really mean dropped mailbox messages, but
rather specifically messages which were too large to fit in the remote
Rx FIFO. Rename the stat to more clearly indicate what it means.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14 15:56:08 -07:00
David S. Miller bae97d8410 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

A final pull request, I know it's very late but this time I think it's worth a
bit of rush.

The following patchset contains Netfilter/nf_tables updates for net-next, more
specifically concatenation support and dynamic stateful expression
instantiation.

This also comes with a couple of small patches. One to fix the ebtables.h
userspace header and another to get rid of an obsolete example file in tree
that describes a nf_tables expression.

This time, I decided to paste the original descriptions. This will result in a
rather large commit description, but I think these bytes to keep.

Patrick McHardy says:

====================
netfilter: nf_tables: concatenation support

The following patches add support for concatenations, which allow multi
dimensional exact matches in O(1).

The basic idea is to split the data registers, currently consisting of
4 registers of 16 bytes each, into smaller units, 16 registers of 4
bytes each, and making sure each register store always leaves the
full 32 bit in a well defined state, meaning smaller stores will
zero the remaining bits.

Based on that, we can load multiple adjacent registers with different
values, thereby building a concatenated bigger value, and use that
value for set lookups.

Sets are changed to use variable sized extensions for their key and
data values, removing the fixed limit of 16 bytes while saving memory
if less space is needed.

As a side effect, these patches will allow some nice optimizations in
the future, like using jhash2 in nft_hash, removing the masking in
nft_cmp_fast, optimized data comparison using 32 bit word size etc.
These are not done so far however.

The patches are split up as follows:

 * the first five patches add length validation to register loads and
   stores to make sure we stay within bounds and prepare the validation
   functions for the new addressing mode

 * the next patches prepare for changing to 32 bit addressing by
   introducing a struct nft_regs, which holds the verdict register as
   well as the data registers. The verdict members are moved to a new
   struct nft_verdict to allow to pull struct nft_data out of the stack.

 * the next patches contain preparatory conversions of expressions and
   sets to use 32 bit addressing

 * the next patch introduces so far unused register conversion helpers
   for parsing and dumping register numbers over netlink

 * following is the real conversion to 32 bit addressing, consisting of
   replacing struct nft_data in struct nft_regs by an array of u32s and
   actually translating and validating the new register numbers.

 * the final two patches add support for variable sized data items and
   variable sized keys / data in set elements

The patches have been verified to work correctly with nft binaries using
both old and new addressing.
====================

Patrick McHardy says:

====================
netfilter: nf_tables: dynamic stateful expression instantiation

The following patches are the grand finale of my nf_tables set work,
using all the building blocks put in place by the previous patches
to support something like iptables hashlimit, but a lot more powerful.

Sets are extended to allow attaching expressions to set elements.
The dynset expression dynamically instantiates these expressions
based on a template when creating new set elements and evaluates
them for all new or updated set members.

In combination with concatenations this effectively creates state
tables for arbitrary combinations of keys, using the existing
expression types to maintain that state. Regular set GC takes care
of purging expired states.

We currently support two different stateful expressions, counter
and limit. Using limit as a template we can express the functionality
of hashlimit, but completely unrestricted in the combination of keys.
Using counter we can perform accounting for arbitrary flows.

The following examples from patch 5/5 show some possibilities.
Userspace syntax is still WIP, especially the listing of state
tables will most likely be seperated from normal set listings
and use a more structured format:

1. Limit the rate of new SSH connections per host, similar to iptables
   hashlimit:

        flow ip saddr timeout 60s \
        limit 10/second \
        accept

2. Account network traffic between each set of /24 networks:

        flow ip saddr & 255.255.255.0 . ip daddr & 255.255.255.0 \
        counter

3. Account traffic to each host per user:

        flow skuid . ip daddr \
        counter

4. Account traffic for each combination of source address and TCP flags:

        flow ip saddr . tcp flags \
        counter

The resulting set content after a Xmas-scan look like this:

{
        192.168.122.1 . fin | psh | urg : counter packets 1001 bytes 40040,
        192.168.122.1 . ack : counter packets 74 bytes 3848,
        192.168.122.1 . psh | ack : counter packets 35 bytes 3144
}

In the future the "expressions attached to elements" will be extended
to also support user created non-stateful expressions to allow to
efficiently select beween a set of parameter sets, f.i. a set of log
statements with different prefixes based on the interface, which currently
require one rule each. This will most likely have to wait until the next
kernel version though.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-14 18:51:19 -04:00
Jeff Kirsher a7731cc8a1 fm10k: update xcast mode before synchronizing multicast addresses
When the PF receives a request to update a multicast address for the VF,
it checks the enabled multicast mode first. Fix a bug where the VF tried
to set a multicast address before requesting the required xcast mode.
This ensures the multicast addresses are honored as long as the xcast
mode was allowed.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14 15:49:04 -07:00
Jeff Kirsher 54b3c9cfad fm10k: start service timer on probe
Since the service task handles varying work that doesn't all require the
interface to be up, launch the service timer immediately. This ensures
that we continually check the mailbox, as well as handle other tasks
while the device is down.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14 15:48:43 -07:00
Jeff Kirsher 4185756253 fm10k: fix function header comment
The header comment included a miscopy of a C-code line, and also
mis-used Rx FIFO when it clearly meant Tx FIFO

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14 15:48:30 -07:00
Jeff Kirsher ada2411d4f fm10k: comment next_vf_mbx flow
Add a header comment explaining why we have the somewhat crazy mailbox
flow. This flow is necessary as it prevents the PF<->SM mailbox from
being flooded by the VF messages, which normally trigger a message to
the PF. This helps prevent the case where we see a PF mailbox timeout.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14 15:48:17 -07:00
Jeff Kirsher 9de15bda37 fm10k: don't handle mailbox events in iov_event path and always process mailbox
Since we already schedule the service task, we can just wait for this
task to handle the mailbox events from the VF. This reduces some complex
code flow, and makes it so we have a single path for handling the VF
messages. There is a possibility that we have a slight delay in handling
VF messages, but it should be minimal.

The result of tx_complete and !rx_ready is insufficient to determine
whether we need to process the mailbox. There is a possible race
condition whereby the VF fills up the mbmem for us, but we have already
recently processed the mailboxes in the interrupt. During this time,
the interrupt is disabled. Thus, our Rx FIFO is empty, but the mbmem now
has data in it. Since we continually check whether Rx FIFO is empty, we
then never call process. This results in the possibility to prevent PF
from handling the VF mailbox messages.

Instead, just call process every time, despite the fact that we may or
may not have anything to process for the VF. There should be minimal
overhead for doing this, and it resolves an issue where the VF never
comes up due to never getting response for its SET_LPORT_STATE message.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14 15:47:28 -07:00
Jeff Kirsher b382bb1b3e fm10k: use separate workqueue for fm10k driver
Since we run the watchdog periodically, which might take a while and
potentially monopolize the system default workqueue, create our own
separate work queue. This also helps reduce and stabilize latency
between scheduling the work in our interrupt and actually performing
the work. Still use a timer for the regular scheduled interval but
queue the work onto its own work queue.

It seemed overkill to create a single workqueue per interface, so we
just spawn a single work queue for all interfaces upon driver load. For
this reason, use a multi-threaded workqueue with one thread per
processor, rather than single threaded queue.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14 15:38:06 -07:00
Jeff Kirsher ded8b20d5f fm10k: Set PF queues to unlimited bandwidth during virtualization
When returning virtualization queues from the VF back to the PF, do not
retain the VF rate limiter.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Todd Russell <todd.a.russell@intel.com>
Acked-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14 15:37:42 -07:00
Jeff Kirsher e05546ec05 fm10k: expose tx_timeout_count as an ethtool stat
Named it tx_hang_count to differentiate it from tx_hwtstamp_timeout.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14 15:37:28 -07:00
Jeff Kirsher 10df06fa2a fm10k: only increment tx_timeout_count in Tx hang path
We were incrementing the tx_timeout_count for both the Tx hang
and then for all reset flows.  Instead, we should only increment
tx_timeout_count in the Tx hang path, so that our Tx hang counter
does not increment when it was not caused by a Tx hang.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14 15:37:15 -07:00
Jeff Kirsher 5deaf50bfb fm10k: remove extraneous "Reset interface" message
Since we already print this message when a reset is requested via the
RESET_REQUESTED flag, we do not need to print it before setting the
flag.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14 15:35:48 -07:00
Jeff Kirsher fbdb159f7d fm10k: separate PF only stats so that VF does not display them
This patch resolves an issue with ethtool stats displaying useless
values on the VF, because some stats simply have no meaning to the VF.
Resolve this by splitting these out into PF_STATS and only showing them
if we aren't the VF.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14 15:35:33 -07:00
Jeff Kirsher c0e61781a8 fm10k: use hw->mac.max_queues for stats
Even though it shouldn't strictly matter, don't count queue stats higher
than the max_queues value stored for this mac. This ensures that we
don't attempt to check queues which don't belong to use in VFs. This
shouldn't be a visible change, as the VFs should see zero for queues
which don't belong to them.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14 15:35:16 -07:00
Jeff Kirsher 29a928ee63 fm10k: only show actual queues, not the maximum in hardware
Currently, we show statistics for all 128 queues, even though we don't
necessarily have that many queues available especially in the VF case.
Instead, use the hw->mac.max_queues value, which tells us how many
queues we actually have, rather than the space for the rings we
allocated. In this way, we prevent dumping statistics that are useless
on the VF.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14 15:35:02 -07:00
Jeff Kirsher 661b206791 fm10k: allow creation of VLAN on default vid
Previously, the user was not allowed to create a VLAN interface on top
of the switch default vid.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14 15:34:33 -07:00
Linus Torvalds ca2ec32658 Merge branch 'for-linus-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs update from Al Viro:
 "Part one:

   - struct filename-related cleanups

   - saner iov_iter_init() replacements (and switching the syscalls to
     use of those)

   - ntfs switch to ->write_iter() (Anton)

   - aio cleanups and splitting iocb into common and async parts
     (Christoph)

   - assorted fixes (me, bfields, Andrew Elble)

  There's a lot more, including the completion of switchover to
  ->{read,write}_iter(), d_inode/d_backing_inode annotations, f_flags
  race fixes, etc, but that goes after #for-davem merge.  David has
  pulled it, and once it's in I'll send the next vfs pull request"

* 'for-linus-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (35 commits)
  sg_start_req(): use import_iovec()
  sg_start_req(): make sure that there's not too many elements in iovec
  blk_rq_map_user(): use import_single_range()
  sg_io(): use import_iovec()
  process_vm_access: switch to {compat_,}import_iovec()
  switch keyctl_instantiate_key_common() to iov_iter
  switch {compat_,}do_readv_writev() to {compat_,}import_iovec()
  aio_setup_vectored_rw(): switch to {compat_,}import_iovec()
  vmsplice_to_user(): switch to import_iovec()
  kill aio_setup_single_vector()
  aio: simplify arguments of aio_setup_..._rw()
  aio: lift iov_iter_init() into aio_setup_..._rw()
  lift iov_iter into {compat_,}do_readv_writev()
  NFS: fix BUG() crash in notify_change() with patch to chown_common()
  dcache: return -ESTALE not -EBUSY on distributed fs race
  NTFS: Version 2.1.32 - Update file write from aio_write to write_iter.
  VFS: Add iov_iter_fault_in_multipages_readable()
  drop bogus check in file_open_root()
  switch security_inode_getattr() to struct path *
  constify tomoyo_realpath_from_path()
  ...
2015-04-14 15:31:03 -07:00
Jeff Kirsher de44519916 fm10k: fix unused warnings
The were several functions which had parameters which were never or
sometimes used in functions.  To resolve possible compiler warnings,
use __always_unused or __maybe_unused kernel macros to resolve.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14 15:23:36 -07:00
Jeff Kirsher 8b4a98c7fa fm10k: Add netconsole support
This change adds a function called "fm10k_netpoll" that's used to define
"ndo_poll_controller" in "fm10k_netdev_ops". This is required to enable
support for "netconsole" in fm10k.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Ngai-Mint Kwan <ngai-mint.kwan@intel.com>
Acked-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14 15:08:37 -07:00
Jeff Kirsher 4c16ceae8d fm10k: Have the VF get the default VLAN during init
Currently, the VFs do not read the default VLAN during initialization,
so they will not be able to indicate untagged frames properly.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14 15:07:27 -07:00
Jeff Kirsher a4bcea83d1 fm10k: Correct spelling mistake
Corrected a spelling mistake that was found over time.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Ngai-Mint Kwan <ngai-mint.kwan@intel.com>
Acked-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-14 15:07:06 -07:00
Jeff Kirsher 97c71e3ccf fm10k: Remove redundant rx_errors in ethtool
Output of ethtool was reporting 2 rx_errors entries. This change
removes one of the redundant entries.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
2015-04-14 14:57:13 -07:00
Jeff Kirsher f4e25f6e36 fm10k: Corrected an error in Tx statistics
The function collecting Tx statistics was actually using values from the RX
ring. Thus, Tx and Rx statistics values reported by "ifconfig" will
return identical values. This change corrects this error and the Tx
statistics is now reading from the Tx ring.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
2015-04-14 14:42:27 -07:00
Linus Torvalds 6c8a53c9e6 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf changes from Ingo Molnar:
 "Core kernel changes:

   - One of the more interesting features in this cycle is the ability
     to attach eBPF programs (user-defined, sandboxed bytecode executed
     by the kernel) to kprobes.

     This allows user-defined instrumentation on a live kernel image
     that can never crash, hang or interfere with the kernel negatively.
     (Right now it's limited to root-only, but in the future we might
     allow unprivileged use as well.)

     (Alexei Starovoitov)

   - Another non-trivial feature is per event clockid support: this
     allows, amongst other things, the selection of different clock
     sources for event timestamps traced via perf.

     This feature is sought by people who'd like to merge perf generated
     events with external events that were measured with different
     clocks:

       - cluster wide profiling

       - for system wide tracing with user-space events,

       - JIT profiling events

     etc.  Matching perf tooling support is added as well, available via
     the -k, --clockid <clockid> parameter to perf record et al.

     (Peter Zijlstra)

  Hardware enablement kernel changes:

   - x86 Intel Processor Trace (PT) support: which is a hardware tracer
     on steroids, available on Broadwell CPUs.

     The hardware trace stream is directly output into the user-space
     ring-buffer, using the 'AUX' data format extension that was added
     to the perf core to support hardware constraints such as the
     necessity to have the tracing buffer physically contiguous.

     This patch-set was developed for two years and this is the result.
     A simple way to make use of this is to use BTS tracing, the PT
     driver emulates BTS output - available via the 'intel_bts' PMU.
     More explicit PT specific tooling support is in the works as well -
     will probably be ready by 4.2.

     (Alexander Shishkin, Peter Zijlstra)

   - x86 Intel Cache QoS Monitoring (CQM) support: this is a hardware
     feature of Intel Xeon CPUs that allows the measurement and
     allocation/partitioning of caches to individual workloads.

     These kernel changes expose the measurement side as a new PMU
     driver, which exposes various QoS related PMU events.  (The
     partitioning change is work in progress and is planned to be merged
     as a cgroup extension.)

     (Matt Fleming, Peter Zijlstra; CPU feature detection by Peter P
     Waskiewicz Jr)

   - x86 Intel Haswell LBR call stack support: this is a new Haswell
     feature that allows the hardware recording of call chains, plus
     tooling support.  To activate this feature you have to enable it
     via the new 'lbr' call-graph recording option:

        perf record --call-graph lbr
        perf report

     or:

        perf top --call-graph lbr

     This hardware feature is a lot faster than stack walk or dwarf
     based unwinding, but has some limitations:

       - It reuses the current LBR facility, so LBR call stack and
         branch record can not be enabled at the same time.

       - It is only available for user-space callchains.

     (Yan, Zheng)

   - x86 Intel Broadwell CPU support and various event constraints and
     event table fixes for earlier models.

     (Andi Kleen)

   - x86 Intel HT CPUs event scheduling workarounds.  This is a complex
     CPU bug affecting the SNB,IVB,HSW families that results in counter
     value corruption.  The mitigation code is automatically enabled and
     is transparent.

     (Maria Dimakopoulou, Stephane Eranian)

  The perf tooling side had a ton of changes in this cycle as well, so
  I'm only able to list the user visible changes here, in addition to
  the tooling changes outlined above:

  User visible changes affecting all tools:

      - Improve support of compressed kernel modules (Jiri Olsa)
      - Save DSO loading errno to better report errors (Arnaldo Carvalho de Melo)
      - Bash completion for subcommands (Yunlong Song)
      - Add 'I' event modifier for perf_event_attr.exclude_idle bit (Jiri Olsa)
      - Support missing -f to override perf.data file ownership. (Yunlong Song)
      - Show the first event with an invalid filter (David Ahern, Arnaldo Carvalho de Melo)

  User visible changes in individual tools:

    'perf data':

        New tool for converting perf.data to other formats, initially
        for the CTF (Common Trace Format) from LTTng (Jiri Olsa,
        Sebastian Siewior)

    'perf diff':

        Add --kallsyms option (David Ahern)

    'perf list':

        Allow listing events with 'tracepoint' prefix (Yunlong Song)

        Sort the output of the command (Yunlong Song)

    'perf kmem':

        Respect -i option (Jiri Olsa)

        Print big numbers using thousands' group (Namhyung Kim)

        Allow -v option (Namhyung Kim)

        Fix alignment of slab result table (Namhyung Kim)

    'perf probe':

        Support multiple probes on different binaries on the same command line (Masami Hiramatsu)

        Support unnamed union/structure members data collection. (Masami Hiramatsu)

        Check kprobes blacklist when adding new events. (Masami Hiramatsu)

    'perf record':

        Teach 'perf record' about perf_event_attr.clockid (Peter Zijlstra)

        Support recording running/enabled time (Andi Kleen)

    'perf sched':

        Improve the performance of 'perf sched replay' on high CPU core count machines (Yunlong Song)

    'perf report' and 'perf top':

        Allow annotating entries in callchains in the hists browser (Arnaldo Carvalho de Melo)

        Indicate which callchain entries are annotated in the
        TUI hists browser (Arnaldo Carvalho de Melo)

        Add pid/tid filtering to 'report' and 'script' commands (David Ahern)

        Consider PERF_RECORD_ events with cpumode == 0 in 'perf top', removing one
        cause of long term memory usage buildup, i.e. not processing PERF_RECORD_EXIT
        events (Arnaldo Carvalho de Melo)

    'perf stat':

        Report unsupported events properly (Suzuki K. Poulose)

        Output running time and run/enabled ratio in CSV mode (Andi Kleen)

    'perf trace':

        Handle legacy syscalls tracepoints (David Ahern, Arnaldo Carvalho de Melo)

        Only insert blank duration bracket when tracing syscalls (Arnaldo Carvalho de Melo)

        Filter out the trace pid when no threads are specified (Arnaldo Carvalho de Melo)

        Dump stack on segfaults (Arnaldo Carvalho de Melo)

        No need to explicitely enable evsels for workload started from perf, let it
        be enabled via perf_event_attr.enable_on_exec, removing some events that take
        place in the 'perf trace' before a workload is really started by it.
        (Arnaldo Carvalho de Melo)

        Allow mixing with tracepoints and suppressing plain syscalls. (Arnaldo Carvalho de Melo)

  There's also been a ton of infrastructure work done, such as the
  split-out of perf's build system into tools/build/ and other changes -
  see the shortlog and changelog for details"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (358 commits)
  perf/x86/intel/pt: Clean up the control flow in pt_pmu_hw_init()
  perf evlist: Fix type for references to data_head/tail
  perf probe: Check the orphaned -x option
  perf probe: Support multiple probes on different binaries
  perf buildid-list: Fix segfault when show DSOs with hits
  perf tools: Fix cross-endian analysis
  perf tools: Fix error path to do closedir() when synthesizing threads
  perf tools: Fix synthesizing fork_event.ppid for non-main thread
  perf tools: Add 'I' event modifier for exclude_idle bit
  perf report: Don't call map__kmap if map is NULL.
  perf tests: Fix attr tests
  perf probe: Fix ARM 32 building error
  perf tools: Merge all perf_event_attr print functions
  perf record: Add clockid parameter
  perf sched replay: Use replay_repeat to calculate the runavg of cpu usage instead of the default value 10
  perf sched replay: Support using -f to override perf.data file ownership
  perf sched replay: Fix the EMFILE error caused by the limitation of the maximum open files
  perf sched replay: Handle the dead halt of sem_wait when create_tasks() fails for any task
  perf sched replay: Fix the segmentation fault problem caused by pr_err in threads
  perf sched replay: Realloc the memory of pid_to_task stepwise to adapt to the different pid_max configurations
  ...
2015-04-14 14:37:47 -07:00
Russell King 4b2f883847 Merge branch 'devel-stable' into for-next 2015-04-14 22:28:32 +01:00
Russell King c848791f03 Merge branches 'misc', 'vdso' and 'fixes' into for-next
Conflicts:
	arch/arm/mm/proc-macros.S
2015-04-14 22:28:25 +01:00
Russell King 79403cda37 ARM: update errata 430973 documentation to cover Cortex A8 r1p*
This errata covers all r1 variants of Cortex A8, it's not limited to
just r1p0..r1p2.  Update the documentation to reflect this.  The code
already applies the workaround to all r1p* A8 CPUs.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-04-14 22:28:07 +01:00
Russell King 57ca654bef ARM: ensure delay timer has sufficient accuracy for delays
We have recently had an example of someone wanting to use a 90kHz timer
for the software delay loop.

udelay() needs to have at least microsecond resolution to allow drivers
access to a delay mechanism with a reasonable chance of delaying the
period they requested within at least a 50% marging of error, especially
for small delays.

Discussion about the udelay() accuracy can be found at:
	https://lkml.org/lkml/2011/1/9/37

Reject timers which are unable to supply this level of resolution.

Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-04-14 22:28:07 +01:00
Russell King 37463be865 ARM: switch to use the generic show_mem() implementation
Switch ARM to use the generic show_mem() implementation, which displays
the statistics from the mm zone rather than walking the page arrays.

Acked-by: Mel Gorman <mgorman <mgorman@suse.de>
Tested-by: Gregory Fong <gregory.0xf0@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-04-14 22:28:06 +01:00
Russell King a6d7467898 ARM: proc-v7: avoid errata 430973 workaround for non-Cortex A8 CPUs
Avoid the errata 430973 workaround for non-Cortex A8 CPUs.  Having this
workaround enabled introduces an additional branch target buffer flush
into the context switching path, something we wish to avoid.  To allow
this errata to be enabled in multiplatform kernels while reducing its
impact, rearrange the Cortex-A8 CPU support to avoid impacting on other
Version 7 CPUs.

Tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-04-14 22:28:06 +01:00
Russell King e5a5de4447 ARM: enable ARM errata 643719 workaround by default
The effects of not having ARM errata 643719 enabled on affected CPUs
can be very confusing and hard to debug.  Rather than leave this to
chance, enable this workaround by default.  Now that we have rearranged
the code, it should have a low impact on the majority of CPUs.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-04-14 22:26:52 +01:00
Russell King aaf4b5d92c ARM: cache-v7: optimise test for Cortex A9 r0pX devices
Eliminate one unnecessary instruction from this test by pre-shifting
the Cortex A9 ID - we can shift the actual ID in the teq instruction
thereby losing the pX bit of the ID at no cost.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-04-14 22:26:52 +01:00
Russell King d3cd451dfb ARM: cache-v7: optimise branches in v7_flush_cache_louis
Optimise the branches such that for the majority of unaffected devices,
we avoid needing to execute the errata work-around code path by
branching to start_flush_levels early.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-04-14 22:26:52 +01:00
Russell King cd8b24d9e8 ARM: cache-v7: consolidate initialisation of cache level index
Both v7_flush_cache_louis and v7_flush_dcache_all both begin the
flush_levels loop with r10 initialised to zero.  In each case, this
is done immediately prior to entering the loop.  Branch to this
instruction in v7_flush_dcache_all from v7_flush_cache_louis and
eliminate the unnecessary initialisation in v7_flush_cache_louis.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-04-14 22:26:52 +01:00
Russell King 47b8484ea6 ARM: cache-v7: shift CLIDR to extract appropriate field before masking
Rather than have code which masks and then shifts, such as:

	mrc     p15, 1, r0, c0, c0, 1
ALT_SMP(ands	r3, r0, #7 << 21)
ALT_UP( ands	r3, r0, #7 << 27)
ALT_SMP(mov	r3, r3, lsr #20)
ALT_UP(	mov	r3, r3, lsr #26)

re-arrange this as a shift and then mask.  The masking is the same for
each field which we want to extract, so this allows the mask to be
shared amongst code paths:

	mrc     p15, 1, r0, c0, c0, 1
ALT_SMP(mov	r3, r0, lsr #20)
ALT_UP(	mov	r3, r0, lsr #26)
	ands	r3, r3, #7 << 1

Use this method for the LoUIS, LoUU and LoC fields.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-04-14 22:26:51 +01:00
Russell King 5aca370826 ARM: cache-v7: use movw/movt instructions
We always build cache-v7.S for ARMv7, so we can use the ARMv7 16-bit
move instructions to load large constants, rather than using constants
in a literal pool.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-04-14 22:26:51 +01:00
Russell King 89c6bc5884 ARM: allow 16-bit instructions in ALT_UP()
Allow ALT_UP() to cope with a 16-bit Thumb instruction by automatically
inserting a following nop instruction.  This allows us to care less
about getting the assembler to emit a 32-bit thumb instruction.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-04-14 22:26:51 +01:00
Linus Torvalds e95e7f6270 Merge branch 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull NOHZ changes from Ingo Molnar:
 "This tree adds full dynticks support to KVM guests (support the
  disabling of the timer tick on the guest).  The main missing piece was
  the recognition of guest execution as RCU extended quiescent state and
  related changes"

* 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  kvm,rcu,nohz: use RCU extended quiescent state when running KVM guest
  context_tracking: Export context_tracking_user_enter/exit
  context_tracking: Run vtime_user_enter/exit only when state == CONTEXT_USER
  context_tracking: Add stub context_tracking_is_enabled
  context_tracking: Generalize context tracking APIs to support user and guest
  context_tracking: Rename context symbols to prepare for transition state
  ppc: Remove unused cpp symbols in kvm headers
2015-04-14 13:58:48 -07:00
Linus Torvalds 078838d565 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU changes from Ingo Molnar:
 "The main changes in this cycle were:

   - changes permitting use of call_rcu() and friends very early in
     boot, for example, before rcu_init() is invoked.

   - add in-kernel API to enable and disable expediting of normal RCU
     grace periods.

   - improve RCU's handling of (hotplug-) outgoing CPUs.

   - NO_HZ_FULL_SYSIDLE fixes.

   - tiny-RCU updates to make it more tiny.

   - documentation updates.

   - miscellaneous fixes"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits)
  cpu: Provide smpboot_thread_init() on !CONFIG_SMP kernels as well
  cpu: Defer smpboot kthread unparking until CPU known to scheduler
  rcu: Associate quiescent-state reports with grace period
  rcu: Yet another fix for preemption and CPU hotplug
  rcu: Add diagnostics to grace-period cleanup
  rcutorture: Default to grace-period-initialization delays
  rcu: Handle outgoing CPUs on exit from idle loop
  cpu: Make CPU-offline idle-loop transition point more precise
  rcu: Eliminate ->onoff_mutex from rcu_node structure
  rcu: Process offlining and onlining only at grace-period start
  rcu: Move rcu_report_unblock_qs_rnp() to common code
  rcu: Rework preemptible expedited bitmask handling
  rcu: Remove event tracing from rcu_cpu_notify(), used by offline CPUs
  rcutorture: Enable slow grace-period initializations
  rcu: Provide diagnostic option to slow down grace-period initialization
  rcu: Detect stalls caused by failure to propagate up rcu_node tree
  rcu: Eliminate empty HOTPLUG_CPU ifdef
  rcu: Simplify sync_rcu_preempt_exp_init()
  rcu: Put all orphan-callback-related code under same comment
  rcu: Consolidate offline-CPU callback initialization
  ...
2015-04-14 13:36:04 -07:00
David S. Miller 87ffabb1f0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
The dwmac-socfpga.c conflict was a case of a bug fix overlapping
changes in net-next to handle an error pointer differently.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-14 15:44:14 -04:00
David S. Miller 5e0e0dc10a Merge branch 'cxgb4-next'
Hariprasad Shenai says:

====================
cxgb4: Misc. fixes for sge

Increases value of MAX_IMM_TX_PKT_LEN to improve latency, fill freelist
starving threshold based on adapter type, add comments for tx flits and sge
length code and don't call t4_slow_intr_handler when we are not master PF.

This patch series has been created against net-next tree and includes patches on
cxgb4 driver

We have included all the maintainers of respective drivers. Kindly review the
change and let us know in case of any review comments.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-14 15:08:52 -04:00
Hariprasad Shenai c3c7b1217f cxgb4: Don't call t4_slow_intr_handler when we're not the Master PF
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-14 15:08:52 -04:00
Hariprasad Shenai 0aac3f56d4 cxgb4: Add comment for calculate tx flits and sge length code
Add comment for tx filt and sge length calucaltion code, also remove
a hardcoded value

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-14 15:08:52 -04:00
Hariprasad Shenai d52ce92037 cxgb4: Use device node in page allocation
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-14 15:08:52 -04:00
Hariprasad Shenai c098b026f0 cxgb4: Freelist starving threshold varies from adapter to adapter
fl_starv_thres could be different from adapter to adapter, don't use
hardcoded values

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-14 15:08:51 -04:00
Hariprasad Shenai 21dcfad640 cxgb4: Increased the value of MAX_IMM_TX_PKT_LEN from 128 to 256 bytes
This allows a significant latency drop for packets of sizes between 128 and 192
bytes

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-14 15:08:51 -04:00
Felix Fietkau 29ba877e7c bgmac: drop ring->num_slots
The ring size is always known at compile time, so make the code a bit
more efficient

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-14 14:57:11 -04:00
Felix Fietkau 4668ae1fbc bgmac: fix DMA rx corruption
The driver needs to inform the hardware about the first invalid (not yet
filled) rx slot, by writing its DMA descriptor pointer offset to the
BGMAC_DMA_RX_INDEX register.

This register was set to a value exceeding the rx ring size, effectively
allowing the hardware constant access to the full ring, regardless of
which slots are initialized.

To fix this issue, always mark the last filled rx slot as invalid.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-14 14:57:11 -04:00
Felix Fietkau 74b6f29163 bgmac: simplify dma init/cleanup
Instead of allocating buffers at device init time and initializing
descriptors at device open, do both at the same time (during open).
Free all buffers when closing the device.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Acked-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-14 14:57:11 -04:00
Felix Fietkau b9650557f3 bgmac: increase rx ring size from 511 to 512
Limiting it to 511 looks like a failed attempt at leaving one descriptor
empty to allow the hardware to stop processing a buffer that has not
been prepared yet. However, this doesn't work because this affects the
total ring size as well

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-14 14:57:11 -04:00