1
0
Fork 0
Commit Graph

61 Commits (973f5fbedb0721ab964386a5fe5120998e71580c)

Author SHA1 Message Date
Daniel Borkmann 3b497806f6 bpf, s390x: implement jiting of BPF_J{LT, LE, SLT, SLE}
This work implements jiting of BPF_J{LT,LE,SLT,SLE} instructions
with BPF_X/BPF_K variants for the s390x eBPF JIT.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09 16:53:57 -07:00
Daniel Borkmann b0a0c2566f bpf, s390: fix jit branch offset related to ldimm64
While testing some other work that required JIT modifications, I
run into test_bpf causing a hang when JIT enabled on s390. The
problematic test case was the one from ddc665a4bb (bpf, arm64:
fix jit branch offset related to ldimm64), and turns out that we
do have a similar issue on s390 as well. In bpf_jit_prog() we
update next instruction address after returning from bpf_jit_insn()
with an insn_count. bpf_jit_insn() returns either -1 in case of
error (e.g. unsupported insn), 1 or 2. The latter is only the
case for ldimm64 due to spanning 2 insns, however, next address
is only set to i + 1 not taking actual insn_count into account,
thus fix is to use insn_count instead of 1. bpf_jit_enable in
mode 2 provides also disasm on s390:

Before fix:

  000003ff800349b6: a7f40003   brc     15,3ff800349bc                 ; target
  000003ff800349ba: 0000               unknown
  000003ff800349bc: e3b0f0700024       stg     %r11,112(%r15)
  000003ff800349c2: e3e0f0880024       stg     %r14,136(%r15)
  000003ff800349c8: 0db0               basr    %r11,%r0
  000003ff800349ca: c0ef00000000       llilf   %r14,0
  000003ff800349d0: e320b0360004       lg      %r2,54(%r11)
  000003ff800349d6: e330b03e0004       lg      %r3,62(%r11)
  000003ff800349dc: ec23ffeda065       clgrj   %r2,%r3,10,3ff800349b6 ; jmp
  000003ff800349e2: e3e0b0460004       lg      %r14,70(%r11)
  000003ff800349e8: e3e0b04e0004       lg      %r14,78(%r11)
  000003ff800349ee: b904002e   lgr     %r2,%r14
  000003ff800349f2: e3b0f0700004       lg      %r11,112(%r15)
  000003ff800349f8: e3e0f0880004       lg      %r14,136(%r15)
  000003ff800349fe: 07fe               bcr     15,%r14

After fix:

  000003ff80ef3db4: a7f40003   brc     15,3ff80ef3dba
  000003ff80ef3db8: 0000               unknown
  000003ff80ef3dba: e3b0f0700024       stg     %r11,112(%r15)
  000003ff80ef3dc0: e3e0f0880024       stg     %r14,136(%r15)
  000003ff80ef3dc6: 0db0               basr    %r11,%r0
  000003ff80ef3dc8: c0ef00000000       llilf   %r14,0
  000003ff80ef3dce: e320b0360004       lg      %r2,54(%r11)
  000003ff80ef3dd4: e330b03e0004       lg      %r3,62(%r11)
  000003ff80ef3dda: ec230006a065       clgrj   %r2,%r3,10,3ff80ef3de6 ; jmp
  000003ff80ef3de0: e3e0b0460004       lg      %r14,70(%r11)
  000003ff80ef3de6: e3e0b04e0004       lg      %r14,78(%r11)          ; target
  000003ff80ef3dec: b904002e   lgr     %r2,%r14
  000003ff80ef3df0: e3b0f0700004       lg      %r11,112(%r15)
  000003ff80ef3df6: e3e0f0880004       lg      %r14,136(%r15)
  000003ff80ef3dfc: 07fe               bcr     15,%r14

test_bpf.ko suite runs fine after the fix.

Fixes: 0546231057 ("s390/bpf: Add s390x eBPF JIT compiler backend")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-04 11:18:01 -07:00
Martin KaFai Lau 783d28dd11 bpf: Add jited_len to struct bpf_prog
Add jited_len to struct bpf_prog.  It will be
useful for the struct bpf_prog_info which will
be added in the later patch.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06 15:41:24 -04:00
Alexei Starovoitov 71189fa9b0 bpf: free up BPF_JMP | BPF_CALL | BPF_X opcode
free up BPF_JMP | BPF_CALL | BPF_X opcode to be used by actual
indirect call by register and use kernel internal opcode to
mark call instruction into bpf_tail_call() helper.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-31 19:29:47 -04:00
Laura Abbott e6c7c63001 s390: use set_memory.h header
set_memory_* functions have moved to set_memory.h.  Switch to this
explicitly

Link: http://lkml.kernel.org/r/1488920133-27229-5-git-send-email-labbott@redhat.com
Signed-off-by: Laura Abbott <labbott@redhat.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:13 -07:00
Linus Torvalds ff47d8c050 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:

 - New entropy generation for the pseudo random number generator.

 - Early boot printk output via sclp to help debug crashes on boot. This
   needs to be enabled with a kernel parameter.

 - Add proper no-execute support with a bit in the page table entry.

 - Bug fixes and cleanups.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (65 commits)
  s390/syscall: fix single stepped system calls
  s390/zcrypt: make ap_bus explicitly non-modular
  s390/zcrypt: Removed unneeded debug feature directory creation.
  s390: add missing "do {} while (0)" loop constructs to multiline macros
  s390/mm: add cond_resched call to kernel page table dumper
  s390: get rid of MACHINE_HAS_PFMF and MACHINE_HAS_HPAGE
  s390/mm: make memory_block_size_bytes available for !MEMORY_HOTPLUG
  s390: replace ACCESS_ONCE with READ_ONCE
  s390: Audit and remove any remaining unnecessary uses of module.h
  s390: mm: Audit and remove any unnecessary uses of module.h
  s390: kernel: Audit and remove any unnecessary uses of module.h
  s390/kdump: Use "LINUX" ELF note name instead of "CORE"
  s390: add no-execute support
  s390: report new vector facilities
  s390: use correct input data address for setup_randomness
  s390/sclp: get rid of common response code handling
  s390/sclp: don't add new lines to each printed string
  s390/sclp: make early sclp code readable
  s390/sclp: disable early sclp code as soon as the base sclp driver is active
  s390/sclp: move early printk code to drivers
  ...
2017-02-22 10:20:04 -08:00
Daniel Borkmann 9d876e79df bpf: fix unlocking of jited image when module ronx not set
Eric and Willem reported that they recently saw random crashes when
JIT was in use and bisected this to 74451e66d5 ("bpf: make jited
programs visible in traces"). Issue was that the consolidation part
added bpf_jit_binary_unlock_ro() that would unlock previously made
read-only memory back to read-write. However, DEBUG_SET_MODULE_RONX
cannot be used for this to test for presence of set_memory_*()
functions. We need to use ARCH_HAS_SET_MEMORY instead to fix this;
also add the corresponding bpf_jit_binary_lock_ro() to filter.h.

Fixes: 74451e66d5 ("bpf: make jited programs visible in traces")
Reported-by: Eric Dumazet <edumazet@google.com>
Reported-by: Willem de Bruijn <willemb@google.com>
Bisected-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-21 13:30:14 -05:00
Daniel Borkmann 74451e66d5 bpf: make jited programs visible in traces
Long standing issue with JITed programs is that stack traces from
function tracing check whether a given address is kernel code
through {__,}kernel_text_address(), which checks for code in core
kernel, modules and dynamically allocated ftrace trampolines. But
what is still missing is BPF JITed programs (interpreted programs
are not an issue as __bpf_prog_run() will be attributed to them),
thus when a stack trace is triggered, the code walking the stack
won't see any of the JITed ones. The same for address correlation
done from user space via reading /proc/kallsyms. This is read by
tools like perf, but the latter is also useful for permanent live
tracing with eBPF itself in combination with stack maps when other
eBPF types are part of the callchain. See offwaketime example on
dumping stack from a map.

This work tries to tackle that issue by making the addresses and
symbols known to the kernel. The lookup from *kernel_text_address()
is implemented through a latched RB tree that can be read under
RCU in fast-path that is also shared for symbol/size/offset lookup
for a specific given address in kallsyms. The slow-path iteration
through all symbols in the seq file done via RCU list, which holds
a tiny fraction of all exported ksyms, usually below 0.1 percent.
Function symbols are exported as bpf_prog_<tag>, in order to aide
debugging and attribution. This facility is currently enabled for
root-only when bpf_jit_kallsyms is set to 1, and disabled if hardening
is active in any mode. The rationale behind this is that still a lot
of systems ship with world read permissions on kallsyms thus addresses
should not get suddenly exposed for them. If that situation gets
much better in future, we always have the option to change the
default on this. Likewise, unprivileged programs are not allowed
to add entries there either, but that is less of a concern as most
such programs types relevant in this context are for root-only anyway.
If enabled, call graphs and stack traces will then show a correct
attribution; one example is illustrated below, where the trace is
now visible in tooling such as perf script --kallsyms=/proc/kallsyms
and friends.

Before:

  7fff8166889d bpf_clone_redirect+0x80007f0020ed (/lib/modules/4.9.0-rc8+/build/vmlinux)
         f5d80 __sendmsg_nocancel+0xffff006451f1a007 (/usr/lib64/libc-2.18.so)

After:

  7fff816688b7 bpf_clone_redirect+0x80007f002107 (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fffa0575728 bpf_prog_33c45a467c9e061a+0x8000600020fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fffa07ef1fc cls_bpf_classify+0x8000600020dc (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff81678b68 tc_classify+0x80007f002078 (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff8164d40b __netif_receive_skb_core+0x80007f0025fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff8164d718 __netif_receive_skb+0x80007f002018 (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff8164e565 process_backlog+0x80007f002095 (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff8164dc71 net_rx_action+0x80007f002231 (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff81767461 __softirqentry_text_start+0x80007f0020d1 (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff817658ac do_softirq_own_stack+0x80007f00201c (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff810a2c20 do_softirq+0x80007f002050 (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff810a2cb5 __local_bh_enable_ip+0x80007f002085 (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff8168d452 ip_finish_output2+0x80007f002152 (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff8168ea3d ip_finish_output+0x80007f00217d (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff8168f2af ip_output+0x80007f00203f (/lib/modules/4.9.0-rc8+/build/vmlinux)
  [...]
  7fff81005854 do_syscall_64+0x80007f002054 (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff817649eb return_from_SYSCALL_64+0x80007f002000 (/lib/modules/4.9.0-rc8+/build/vmlinux)
         f5d80 __sendmsg_nocancel+0xffff01c484812007 (/usr/lib64/libc-2.18.so)

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-17 13:40:05 -05:00
Daniel Borkmann 9383191da4 bpf: remove stubs for cBPF from arch code
Remove the dummy bpf_jit_compile() stubs for eBPF JITs and make
that a single __weak function in the core that can be overridden
similarly to the eBPF one. Also remove stale pr_err() mentions
of bpf_jit_compile.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-17 13:40:04 -05:00
Daniel Borkmann 9437964885 s390/bpf: remove redundant check for non-null image
After we already allocated the jit.prg_buf image via
bpf_jit_binary_alloc() and filled it out with instructions,
jit.prg_buf cannot be NULL anymore. Thus, remove the
unnecessary check. Tested on s390x with test_bpf module.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-01-16 07:27:55 +01:00
Martin KaFai Lau 17bedab272 bpf: xdp: Allow head adjustment in XDP prog
This patch allows XDP prog to extend/remove the packet
data at the head (like adding or removing header).  It is
done by adding a new XDP helper bpf_xdp_adjust_head().

It also renames bpf_helper_changes_skb_data() to
bpf_helper_changes_pkt_data() to better reflect
that XDP prog does not work on skb.

This patch adds one "xdp_adjust_head" bit to bpf_prog for the
XDP-capable driver to check if the XDP prog requires
bpf_xdp_adjust_head() support.  The driver can then decide
to error out during XDP_SETUP_PROG.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-08 14:25:13 -05:00
Michael Holzheu 6edf0aa4f8 s390/bpf: fix recache skb->data/hlen for skb_vlan_push/pop
In case of usage of skb_vlan_push/pop, in the prologue we store
the SKB pointer on the stack and restore it after BPF_JMP_CALL
to skb_vlan_push/pop.

Unfortunately currently there are two bugs in the code:

 1) The wrong stack slot (offset 170 instead of 176) is used
 2) The wrong register (W1 instead of B1) is saved

So fix this and use correct stack slot and register.

Fixes: 9db7f2b818 ("s390/bpf: recache skb->data/hlen for skb_vlan_push/pop")
Cc: stable@vger.kernel.org # 4.3+
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-05-19 09:14:27 +02:00
Michael Holzheu 0fa963553a s390/bpf: reduce maximum program size to 64 KB
The s390 BFP compiler currently uses relative branch instructions
that only support jumps up to 64 KB. Examples are "j", "jnz", "cgrj",
etc.  Currently the maximum size of s390 BPF programs is set
to 0x7ffff.  If branches over 64 KB are generated the, kernel can
crash due to incorrect code.

So fix this an reduce the maximum size to 64 KB. Programs larger than
that will be interpreted.

Fixes: ce2b6ad9c1 ("s390/bpf: increase BPF_SIZE_MAX")

Cc: stable@vger.kernel.org # 4.3+
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-05-19 09:14:27 +02:00
Daniel Borkmann d93a47f735 bpf, s390: add support for constant blinding
This patch adds recently added constant blinding helpers into the
s390 eBPF JIT. In the bpf_int_jit_compile() path, requirements are
to utilize bpf_jit_blind_constants()/bpf_jit_prog_release_other()
pair for rewriting the program into a blinded one, and to map the
BPF_REG_AX register to a CPU register. The mapping of BPF_REG_AX
is at r12 and similarly like in x86 case performs reloading when
ld_abs/ind is used. When blinding is not used, there's no additional
overhead in the generated image.

When BPF_REG_AX is used, we don't need to emit skb->data reload when
helper function changed skb->data, as this will be reloaded later
on anyway from stack on ld_abs/ind, where skb->data is needed. s390
allows for this w/o much additional complexity unlike f.e. x86.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:49:33 -04:00
Daniel Borkmann d1c55ab5e4 bpf: prepare bpf_int_jit_compile/bpf_prog_select_runtime apis
Since the blinding is strictly only called from inside eBPF JITs,
we need to change signatures for bpf_int_jit_compile() and
bpf_prog_select_runtime() first in order to prepare that the
eBPF program we're dealing with can change underneath. Hence,
for call sites, we need to return the latest prog. No functional
change in this patch.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 13:49:32 -04:00
Daniel Borkmann 8b614aebec bpf: move clearing of A/X into classic to eBPF migration prologue
Back in the days where eBPF (or back then "internal BPF" ;->) was not
exposed to user space, and only the classic BPF programs internally
translated into eBPF programs, we missed the fact that for classic BPF
A and X needed to be cleared. It was fixed back then via 83d5b7ef99
("net: filter: initialize A and X registers"), and thus classic BPF
specifics were added to the eBPF interpreter core to work around it.

This added some confusion for JIT developers later on that take the
eBPF interpreter code as an example for deriving their JIT. F.e. in
f75298f5c3 ("s390/bpf: clear correct BPF accumulator register"), at
least X could leak stack memory. Furthermore, since this is only needed
for classic BPF translations and not for eBPF (verifier takes care
that read access to regs cannot be done uninitialized), more complexity
is added to JITs as they need to determine whether they deal with
migrations or native eBPF where they can just omit clearing A/X in
their prologue and thus reduce image size a bit, see f.e. cde66c2d88
("s390/bpf: Only clear A and X for converted BPF programs"). In other
cases (x86, arm64), A and X is being cleared in the prologue also for
eBPF case, which is unnecessary.

Lets move this into the BPF migration in bpf_convert_filter() where it
actually belongs as long as the number of eBPF JITs are still few. It
can thus be done generically; allowing us to remove the quirk from
__bpf_prog_run() and to slightly reduce JIT image size in case of eBPF,
while reducing code duplication on this matter in current(/future) eBPF
JITs.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: Zi Shen Lim <zlim.lnx@gmail.com>
Cc: Yang Shi <yang.shi@linaro.org>
Acked-by: Yang Shi <yang.shi@linaro.org>
Acked-by: Zi Shen Lim <zlim.lnx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-18 16:04:51 -05:00
Daniel Borkmann a91263d520 ebpf: migrate bpf_prog's flags to bitfield
As we need to add further flags to the bpf_prog structure, lets migrate
both bools to a bitfield representation. The size of the base structure
(excluding insns) remains unchanged at 40 bytes.

Add also tags for the kmemchecker, so that it doesn't throw false
positives. Even in case gcc would generate suboptimal code, it's not
being accessed in performance critical paths.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-03 05:02:39 -07:00
Kaixu Xia 2c9c3bbbbf bpf: s390: Fix build error caused by the struct bpf_array member name changed
There is a build error that "'struct bpf_array' has no member
named 'prog'" on s390. In commit 2a36f0b92e ("bpf: Make the
bpf_prog_array_map more generic"), the member 'prog' of struct
bpf_array is replaced by 'ptrs'. So this patch fixes it.

Fixes: 2a36f0b92e ("bpf: Make the bpf_prog_array_map more generic")
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Kaixu Xia <xiakaixu@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-11 11:49:40 -07:00
Daniel Borkmann 7b36f92934 bpf: provide helper that indicates eBPF was migrated
During recent discussions we had with Michael, we found that it would
be useful to have an indicator that tells the JIT that an eBPF program
had been migrated from classic instructions into eBPF instructions, as
only in that case A and X need to be cleared in the prologue. Such eBPF
programs do not set a particular type, but all have BPF_PROG_TYPE_UNSPEC.
Thus, introduce a small helper for cde66c2d88 ("s390/bpf: Only clear
A and X for converted BPF programs") and possibly others in future.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30 11:13:20 -07:00
Michael Holzheu 9db7f2b818 s390/bpf: recache skb->data/hlen for skb_vlan_push/pop
Allow eBPF programs attached to TC qdiscs call skb_vlan_push/pop
via helper functions. These functions may change skb->data/hlen.
This data is cached by s390 JIT to improve performance of ld_abs/ld_ind
instructions. Therefore after a change we have to reload the data.

In case of usage of skb_vlan_push/pop, in the prologue we store
the SKB pointer on the stack and restore it after BPF_JMP_CALL
to skb_vlan_push/pop.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29 14:59:58 -07:00
Michael Holzheu cde66c2d88 s390/bpf: Only clear A and X for converted BPF programs
Only classic BPF programs that have been converted to eBPF need to clear
the A and X registers. We can check for converted programs with:

  bpf_prog->type == BPF_PROG_TYPE_UNSPEC

So add the check and skip initialization for real eBPF programs.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29 14:59:58 -07:00
Michael Holzheu ce2b6ad9c1 s390/bpf: increase BPF_SIZE_MAX
Currently we have the restriction that jitted BPF programs can
have a maximum size of one page. The reason is that we use short
displacements for the literal pool.

The 20 bit displacements are available since z990 and BPF requires
z196 as minimum. Therefore we can remove this restriction and use
everywhere 20 bit signed long displacements.

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29 14:59:58 -07:00
Michael Holzheu 1df03ffdde s390/bpf: Fix multiple macro expansions
The EMIT6_DISP_LH macro passes the "disp" parameter to the _EMIT6_DISP_LH
macro. The _EMIT6_DISP_LH macro uses the "disp" parameter twice:

 unsigned int __disp_h = ((u32)disp) & 0xff000;
 unsigned int __disp_l = ((u32)disp) & 0x00fff;

The EMIT6_DISP_LH is used several times with EMIT_CONST_U64() as "disp"
parameter. Therefore always two constants are created per usage of
EMIT6_DISP_LH.

Fix this and add variable "_disp" to avoid multiple expansions.

* v2: Move "_disp" to _EMIT6_DISP_LH as suggested by Joe Perches

Fixes: 0546231057 ("s390/bpf: Add s390x eBPF JIT compiler backend")
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29 14:59:58 -07:00
Michael Holzheu f75298f5c3 s390/bpf: clear correct BPF accumulator register
Currently we assumed the following BPF to eBPF register mapping:

 - BPF_REG_A -> BPF_REG_7
 - BPF_REG_X -> BPF_REG_8

Unfortunately this mapping is wrong. The correct mapping is:

 - BPF_REG_A -> BPF_REG_0
 - BPF_REG_X -> BPF_REG_7

So clear the correct registers and use the BPF_REG_A and BPF_REG_X
macros instead of BPF_REG_0/7.

Fixes: 0546231057 ("s390/bpf: Add s390x eBPF JIT compiler backend")
Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29 14:59:58 -07:00
Alexei Starovoitov 4e10df9a60 bpf: introduce bpf_skb_vlan_push/pop() helpers
Allow eBPF programs attached to TC qdiscs call skb_vlan_push/pop via
helper functions. These functions may change skb->data/hlen which are
cached by some JITs to improve performance of ld_abs/ld_ind instructions.
Therefore JITs need to recognize bpf_skb_vlan_push/pop() calls,
re-compute header len and re-cache skb->data/hlen back into cpu registers.
Note, skb->data/hlen are not directly accessible from the programs,
so any changes to skb->data done either by these helpers or by other
TC actions are safe.

eBPF JIT supported by three architectures:
- arm64 JIT is using bpf_load_pointer() without caching, so it's ok as-is.
- x64 JIT re-caches skb->data/hlen unconditionally after vlan_push/pop calls
  (experiments showed that conditional re-caching is slower).
- s390 JIT falls back to interpreter for now when bpf_skb_vlan_push() is present
  in the program (re-caching is tbd).

These helpers allow more scalable handling of vlan from the programs.
Instead of creating thousands of vlan netdevs on top of eth0 and attaching
TC+ingress+bpf to all of them, the program can be attached to eth0 directly
and manipulate vlans as necessary.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20 20:52:31 -07:00
Michael Holzheu b035b60ded s390/bpf: Fix backward jumps
Currently all backward jumps crash for JITed s390x eBPF programs
with an illegal instruction program check and kernel panic. Because
for negative values the opcode of the jump instruction is overriden
by the negative branch offset an illegal instruction is generated
by the JIT:

 000003ff802da378: c01100000002   lgfi    %r1,2
 000003ff802da37e: fffffff52065   unknown <-- illegal instruction
 000003ff802da384: b904002e       lgr     %r2,%r14

So fix this and mask the offset in order not to damage the opcode.

Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-06-25 09:39:18 +02:00
Michael Holzheu 6651ee070b s390/bpf: implement bpf_tail_call() helper
bpf_tail_call() arguments:

 - ctx......: Context pointer
 - jmp_table: One of BPF_MAP_TYPE_PROG_ARRAY maps used as the jump table
 - index....: Index in the jump table

In this implementation s390x JIT does stack unwinding and jumps into the
callee program prologue. Caller and callee use the same stack.

With this patch a tail call generates the following code on s390x:

 if (index >= array->map.max_entries)
         goto out
 000003ff8001c7e4: e31030100016   llgf    %r1,16(%r3)
 000003ff8001c7ea: ec41001fa065   clgrj   %r4,%r1,10,3ff8001c828

 if (tail_call_cnt++ > MAX_TAIL_CALL_CNT)
         goto out;
 000003ff8001c7f0: a7080001       lhi     %r0,1
 000003ff8001c7f4: eb10f25000fa   laal    %r1,%r0,592(%r15)
 000003ff8001c7fa: ec120017207f   clij    %r1,32,2,3ff8001c828

 prog = array->prog[index];
 if (prog == NULL)
         goto out;
 000003ff8001c800: eb140003000d   sllg    %r1,%r4,3
 000003ff8001c806: e31310800004   lg      %r1,128(%r3,%r1)
 000003ff8001c80c: ec18000e007d   clgij   %r1,0,8,3ff8001c828

 Restore registers before calling function
 000003ff8001c812: eb68f2980004   lmg     %r6,%r8,664(%r15)
 000003ff8001c818: ebbff2c00004   lmg     %r11,%r15,704(%r15)

 goto *(prog->bpf_func + tail_call_start);
 000003ff8001c81e: e31100200004   lg      %r1,32(%r1,%r0)
 000003ff8001c824: 47f01006       bc      15,6(%r1)

Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-09 11:47:10 -07:00
Michael Holzheu 88aeca15d6 s390/bpf: fix bpf frame pointer setup
Currently the bpf frame pointer is set to the old r15. This is
wrong because of packed stack. Fix this and adjust the frame pointer
to respect packed stack. This now generates a prolog like the following:

 3ff8001c3fa: eb67f0480024   stmg    %r6,%r7,72(%r15)
 3ff8001c400: ebcff0780024   stmg    %r12,%r15,120(%r15)
 3ff8001c406: b904001f       lgr     %r1,%r15      <- load backchain
 3ff8001c40a: 41d0f048       la      %r13,72(%r15) <- load adjusted bfp
 3ff8001c40e: a7fbfd98       aghi    %r15,-616
 3ff8001c412: e310f0980024   stg     %r1,152(%r15) <- save backchain

Fixes: 0546231057 ("s390/bpf: Add s390x eBPF JIT compiler backend")
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-03 19:31:39 -07:00
Michael Holzheu b9b4b1cef1 s390/bpf: Fix gcov stack space problem
When compiling the kernel for GCOV (CONFIG_GCOV_KERNEL,-fprofile-arcs),
gcc allocates a lot of stack space because of the large switch statement
in bpf_jit_insn().

This leads to the following compile warning:

 arch/s390/net/bpf_jit_comp.c: In function 'bpf_jit_prog':
 arch/s390/net/bpf_jit_comp.c:1144:1: warning: frame size of
  function 'bpf_jit_prog' is 12592 bytes which is more than
  half the stack size. The dynamic check would not be reliable.
  No check emitted for this function.

 arch/s390/net/bpf_jit_comp.c:1144:1: warning: the frame size of 12504
  bytes is larger than 1024 bytes [-Wframe-larger-than=]

And indead gcc allocates 12592 bytes of stack space:

 # objdump -d arch/s390/net/bpf_jit_comp.o
 ...
 0000000000000c60 <bpf_jit_prog>:
     c60:       eb 6f f0 48 00 24       stmg    %r6,%r15,72(%r15)
     c66:       b9 04 00 ef             lgr     %r14,%r15
     c6a:       e3 f0 fe d0 fc 71       lay     %r15,-12592(%r15)

As a workaround of that problem we now define bpf_jit_insn() as
noinline which then reduces the stack space.

 # objdump -d arch/s390/net/bpf_jit_comp.o
 ...
 0000000000000070 <bpf_jit_insn>:
      70:       eb 6f f0 48 00 24       stmg    %r6,%r15,72(%r15)
      76:       c0 d0 00 00 00 00       larl    %r13,76 <bpf_jit_insn+0x6>
      7c:       a7 f1 3f 80             tmll    %r15,16256
      80:       b9 04 00 ef             lgr     %r14,%r15
      84:       e3 f0 ff a0 ff 71       lay     %r15,-96(%r15)

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-04-30 13:50:36 +02:00
Michael Holzheu 771aada9ac s390/bpf: Adjust ALU64_DIV/MOD to match interpreter change
The s390x ALU64_DIV/MOD has been implemented according to the eBPF
interpreter specification that used do_div(). This function does a 64-bit
by 32-bit divide. It turned out that this was wrong and now the interpreter
uses div64_u64_rem() for full 64-bit division.

So fix this and use full 64-bit division in the s390x eBPF backend code.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-04-30 13:50:34 +02:00
Michael Holzheu 0546231057 s390/bpf: Add s390x eBPF JIT compiler backend
Replace 32 bit BPF JIT backend with new 64 bit eBPF backend.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-04-15 12:23:49 +02:00
Michael Holzheu 5a80244246 s390/bpf: Fix JMP_JGE_K (A >= K) and JMP_JGT_K (A > K)
Currently the signed COMPARE HALFWORD IMMEDIATE (chi) and COMPARE (c)
instructions are used to compare "A" with "K". This is not correct
because "A" and "K" are both unsigned. To fix this remove the
chi instruction (no unsigned analogon available) and use the
unsigned COMPARE LOGICAL (cl) instruction instead of COMPARE (c).

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-01-15 08:17:42 +01:00
Michael Holzheu ae75097459 s390/bpf: Fix JMP_JGE_X (A > X) and JMP_JGT_X (A >= X)
Currently the signed COMPARE (cr) instruction is used to compare "A"
with "X". This is not correct because "A" and "X" are both unsigned.
To fix this use the unsigned COMPARE LOGICAL (clr) instruction instead.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-01-09 10:10:32 +01:00
Michael Holzheu df3eed3d28 s390/bpf: Fix ALU_NEG (A = -A)
Currently the LOAD NEGATIVE (lnr) instruction is used for ALU_NEG. This
instruction always loads the negative value. Therefore, if A is already
negative, it remains unchanged. To fix this use LOAD COMPLEMENT (lcr)
instead.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-01-09 10:10:30 +01:00
Hannes Frederic Sowa 233577a220 net: filter: constify detection of pkt_type_offset
Currently we have 2 pkt_type_offset functions doing the same thing and
spread across the architecture files. Remove those and replace them
with a PKT_TYPE_OFFSET macro helper which gets the constant value from a
zero sized sk_buff member right in front of the bitfield with offsetof.
This new offset marker does not change size of struct sk_buff.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Markos Chandras <markos.chandras@imgtec.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Daniel Borkmann <dborkman@redhat.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-13 17:07:21 -04:00
Daniel Borkmann 286aad3c40 net: bpf: be friendly to kmemcheck
Reported by Mikulas Patocka, kmemcheck currently barks out a
false positive since we don't have special kmemcheck annotation
for bitfields used in bpf_prog structure.

We currently have jited:1, len:31 and thus when accessing len
while CONFIG_KMEMCHECK enabled, kmemcheck throws a warning that
we're reading uninitialized memory.

As we don't need the whole bit universe for pages member, we
can just split it to u16 and use a bool flag for jited instead
of a bitfield.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09 16:58:56 -07:00
Daniel Borkmann 738cbe72ad net: bpf: consolidate JIT binary allocator
Introduced in commit 314beb9bca ("x86: bpf_jit_comp: secure bpf jit
against spraying attacks") and later on replicated in aa2d2c73c2
("s390/bpf,jit: address randomize and write protect jit code") for
s390 architecture, write protection for BPF JIT images got added and
a random start address of the JIT code, so that it's not on a page
boundary anymore.

Since both use a very similar allocator for the BPF binary header,
we can consolidate this code into the BPF core as it's mostly JIT
independant anyway.

This will also allow for future archs that support DEBUG_SET_MODULE_RONX
to just reuse instead of reimplementing it.

JIT tested on x86_64 and s390x with BPF test suite.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09 16:58:56 -07:00
Daniel Borkmann 60a3b2253c net: bpf: make eBPF interpreter images read-only
With eBPF getting more extended and exposure to user space is on it's way,
hardening the memory range the interpreter uses to steer its command flow
seems appropriate.  This patch moves the to be interpreted bytecode to
read-only pages.

In case we execute a corrupted BPF interpreter image for some reason e.g.
caused by an attacker which got past a verifier stage, it would not only
provide arbitrary read/write memory access but arbitrary function calls
as well. After setting up the BPF interpreter image, its contents do not
change until destruction time, thus we can setup the image on immutable
made pages in order to mitigate modifications to that code. The idea
is derived from commit 314beb9bca ("x86: bpf_jit_comp: secure bpf jit
against spraying attacks").

This is possible because bpf_prog is not part of sk_filter anymore.
After setup bpf_prog cannot be altered during its life-time. This prevents
any modifications to the entire bpf_prog structure (incl. function/JIT
image pointer).

Every eBPF program (including classic BPF that are migrated) have to call
bpf_prog_select_runtime() to select either interpreter or a JIT image
as a last setup step, and they all are being freed via bpf_prog_free(),
including non-JIT. Therefore, we can easily integrate this into the
eBPF life-time, plus since we directly allocate a bpf_prog, we have no
performance penalty.

Tested with seccomp and test_bpf testsuite in JIT/non-JIT mode and manual
inspection of kernel_page_tables.  Brad Spengler proposed the same idea
via Twitter during development of this patch.

Joint work with Hannes Frederic Sowa.

Suggested-by: Brad Spengler <spender@grsecurity.net>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Kees Cook <keescook@chromium.org>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-05 12:02:48 -07:00
Alexei Starovoitov 7ae457c1e5 net: filter: split 'struct sk_filter' into socket and bpf parts
clean up names related to socket filtering and bpf in the following way:
- everything that deals with sockets keeps 'sk_*' prefix
- everything that is pure BPF is changed to 'bpf_*' prefix

split 'struct sk_filter' into
struct sk_filter {
	atomic_t        refcnt;
	struct rcu_head rcu;
	struct bpf_prog *prog;
};
and
struct bpf_prog {
        u32                     jited:1,
                                len:31;
        struct sock_fprog_kern  *orig_prog;
        unsigned int            (*bpf_func)(const struct sk_buff *skb,
                                            const struct bpf_insn *filter);
        union {
                struct sock_filter      insns[0];
                struct bpf_insn         insnsi[0];
                struct work_struct      work;
        };
};
so that 'struct bpf_prog' can be used independent of sockets and cleans up
'unattached' bpf use cases

split SK_RUN_FILTER macro into:
    SK_RUN_FILTER to be used with 'struct sk_filter *' and
    BPF_PROG_RUN to be used with 'struct bpf_prog *'

__sk_filter_release(struct sk_filter *) gains
__bpf_prog_release(struct bpf_prog *) helper function

also perform related renames for the functions that work
with 'struct bpf_prog *', since they're on the same lines:

sk_filter_size -> bpf_prog_size
sk_filter_select_runtime -> bpf_prog_select_runtime
sk_filter_free -> bpf_prog_free
sk_unattached_filter_create -> bpf_prog_create
sk_unattached_filter_destroy -> bpf_prog_destroy
sk_store_orig_filter -> bpf_prog_store_orig_filter
sk_release_orig_filter -> bpf_release_orig_filter
__sk_migrate_filter -> bpf_migrate_filter
__sk_prepare_filter -> bpf_prepare_filter

API for attaching classic BPF to a socket stays the same:
sk_attach_filter(prog, struct sock *)/sk_detach_filter(struct sock *)
and SK_RUN_FILTER(struct sk_filter *, ctx) to execute a program
which is used by sockets, tun, af_packet

API for 'unattached' BPF programs becomes:
bpf_prog_create(struct bpf_prog **)/bpf_prog_destroy(struct bpf_prog *)
and BPF_PROG_RUN(struct bpf_prog *, ctx) to execute a program
which is used by isdn, ppp, team, seccomp, ptp, xt_bpf, cls_bpf, test_bpf

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-02 15:03:58 -07:00
Daniel Borkmann 3480593131 net: filter: get rid of BPF_S_* enum
This patch finally allows us to get rid of the BPF_S_* enum.
Currently, the code performs unnecessary encode and decode
workarounds in seccomp and filter migration itself when a filter
is being attached in order to overcome BPF_S_* encoding which
is not used anymore by the new interpreter resp. JIT compilers.

Keeping it around would mean that also in future we would need
to extend and maintain this enum and related encoders/decoders.
We can get rid of all that and save us these operations during
filter attaching. Naturally, also JIT compilers need to be updated
by this.

Before JIT conversion is being done, each compiler checks if A
is being loaded at startup to obtain information if it needs to
emit instructions to clear A first. Since BPF extensions are a
subset of BPF_LD | BPF_{W,H,B} | BPF_ABS variants, case statements
for extensions can be removed at that point. To ease and minimalize
code changes in the classic JITs, we have introduced bpf_anc_helper().

Tested with test_bpf on x86_64 (JIT, int), s390x (JIT, int),
arm (JIT, int), i368 (int), ppc64 (JIT, int); for sparc we
unfortunately didn't have access, but changes are analogous to
the rest.

Joint work with Alexei Starovoitov.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mircea Gherzan <mgherzan@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Acked-by: Chema Gonzalez <chemag@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-01 22:16:58 -07:00
Heiko Carstens e84d2f8d2a net: filter: s390: fix JIT address randomization
This is the s390 variant of Alexei's JIT bug fix.
(patch description below stolen from Alexei's patch)

bpf_alloc_binary() adds 128 bytes of room to JITed program image
and rounds it up to the nearest page size. If image size is close
to page size (like 4000), it is rounded to two pages:
round_up(4000 + 4 + 128) == 8192
then 'hole' is computed as 8192 - (4000 + 4) = 4188
If prandom_u32() % hole selects a number >= PAGE_SIZE - sizeof(*header)
then kernel will crash during bpf_jit_free():

kernel BUG at arch/x86/mm/pageattr.c:887!
Call Trace:
 [<ffffffff81037285>] change_page_attr_set_clr+0x135/0x460
 [<ffffffff81694cc0>] ? _raw_spin_unlock_irq+0x30/0x50
 [<ffffffff810378ff>] set_memory_rw+0x2f/0x40
 [<ffffffffa01a0d8d>] bpf_jit_free_deferred+0x2d/0x60
 [<ffffffff8106bf98>] process_one_work+0x1d8/0x6a0
 [<ffffffff8106bf38>] ? process_one_work+0x178/0x6a0
 [<ffffffff8106c90c>] worker_thread+0x11c/0x370

since bpf_jit_free() does:
  unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK;
  struct bpf_binary_header *header = (void *)addr;
to compute start address of 'bpf_binary_header'
and header->pages will pass junk to:
  set_memory_rw(addr, header->pages);

Fix it by making sure that &header->image[prandom_u32() % hole] and &header
are in the same page.

Fixes: aa2d2c73c2 ("s390/bpf,jit: address randomize and write protect jit code")

Reported-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: <stable@vger.kernel.org> # v3.11+
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-14 16:10:16 -04:00
Martin Schwidefsky 6e0de81759 s390/bpf,jit: initialize A register if 1st insn is BPF_S_LDX_B_MSH
The A register needs to be initialized to zero in the prolog if the
first instruction of the BPF program is BPF_S_LDX_B_MSH to prevent
leaking the content of %r5 to user space.

Cc: <stable@vger.kernel.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-04-25 14:03:25 +02:00
Daniel Borkmann f8bbbfc3b9 net: filter: add jited flag to indicate jit compiled filters
This patch adds a jited flag into sk_filter struct in order to indicate
whether a filter is currently jited or not. The size of sk_filter is
not being expanded as the 32 bit 'len' member allows upper bits to be
reused since a filter can currently only grow as large as BPF_MAXINSNS.

Therefore, there's enough room also for other in future needed flags to
reuse 'len' field if necessary. The jited flag also allows for having
alternative interpreter functions running as currently, we can only
detect jit compiled filters by testing fp->bpf_func to not equal the
address of sk_run_filter().

Joint work with Alexei Starovoitov.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-31 00:45:08 -04:00
Tom Herbert 61b905da33 net: Rename skb->rxhash to skb->hash
The packet hash can be considered a property of the packet, not just
on RX path.

This patch changes name of rxhash and l4_rxhash skbuff fields to be
hash and l4_hash respectively. This includes changing uses of the
field in the code which don't call the access functions.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-26 15:58:20 -04:00
Heiko Carstens 3af57f78c3 s390/bpf,jit: fix 32 bit divisions, use unsigned divide instructions
The s390 bpf jit compiler emits the signed divide instructions "dr" and "d"
for unsigned divisions.
This can cause problems: the dividend will be zero extended to a 64 bit value
and the divisor is the 32 bit signed value as specified A or X accumulator,
even though A and X are supposed to be treated as unsigned values.

The divide instrunctions will generate an exception if the result cannot be
expressed with a 32 bit signed value.
This is the case if e.g. the dividend is 0xffffffff and the divisor either 1
or also 0xffffffff (signed: -1).

To avoid all these issues simply use unsigned divide instructions.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-17 18:54:49 -08:00
Eric Dumazet aee636c480 bpf: do not use reciprocal divide
At first Jakub Zawadzki noticed that some divisions by reciprocal_divide
were not correct. (off by one in some cases)
http://www.wireshark.org/~darkjames/reciprocal-buggy.c

He could also show this with BPF:
http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c

The reciprocal divide in linux kernel is not generic enough,
lets remove its use in BPF, as it is not worth the pain with
current cpus.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jakub Zawadzki <darkjames-ws@darkjames.pl>
Cc: Mircea Gherzan <mgherzan@gmail.com>
Cc: Daniel Borkmann <dxchgb@gmail.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Matt Evans <matt@ozlabs.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15 17:02:08 -08:00
Martin Schwidefsky 6cef30034c s390/bpf,jit: fix prolog oddity
The prolog of functions generated by the bpf jit compiler uses an
instruction sequence with an "ahi" instruction to create stack space
instead of using an "aghi" instruction. Using the 32-bit "ahi" is not
wrong as the stack we are operating on is an order-4 allocation which
is always aligned to 16KB. But it is more consistent to use an "aghi"
as the stack pointer is a 64-bit value.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-10-24 17:16:59 +02:00
Heiko Carstens 0f20822a69 s390/dis: move disassembler function prototypes to proper header file
Now that the in-kernel disassembler has an own header file move the
disassembler related function prototypes to that header file.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-10-24 17:16:48 +02:00
Alexei Starovoitov d45ed4a4e3 net: fix unsafe set_memory_rw from softirq
on x86 system with net.core.bpf_jit_enable = 1

sudo tcpdump -i eth1 'tcp port 22'

causes the warning:
[   56.766097]  Possible unsafe locking scenario:
[   56.766097]
[   56.780146]        CPU0
[   56.786807]        ----
[   56.793188]   lock(&(&vb->lock)->rlock);
[   56.799593]   <Interrupt>
[   56.805889]     lock(&(&vb->lock)->rlock);
[   56.812266]
[   56.812266]  *** DEADLOCK ***
[   56.812266]
[   56.830670] 1 lock held by ksoftirqd/1/13:
[   56.836838]  #0:  (rcu_read_lock){.+.+..}, at: [<ffffffff8118f44c>] vm_unmap_aliases+0x8c/0x380
[   56.849757]
[   56.849757] stack backtrace:
[   56.862194] CPU: 1 PID: 13 Comm: ksoftirqd/1 Not tainted 3.12.0-rc3+ #45
[   56.868721] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
[   56.882004]  ffffffff821944c0 ffff88080bbdb8c8 ffffffff8175a145 0000000000000007
[   56.895630]  ffff88080bbd5f40 ffff88080bbdb928 ffffffff81755b14 0000000000000001
[   56.909313]  ffff880800000001 ffff880800000000 ffffffff8101178f 0000000000000001
[   56.923006] Call Trace:
[   56.929532]  [<ffffffff8175a145>] dump_stack+0x55/0x76
[   56.936067]  [<ffffffff81755b14>] print_usage_bug+0x1f7/0x208
[   56.942445]  [<ffffffff8101178f>] ? save_stack_trace+0x2f/0x50
[   56.948932]  [<ffffffff810cc0a0>] ? check_usage_backwards+0x150/0x150
[   56.955470]  [<ffffffff810ccb52>] mark_lock+0x282/0x2c0
[   56.961945]  [<ffffffff810ccfed>] __lock_acquire+0x45d/0x1d50
[   56.968474]  [<ffffffff810cce6e>] ? __lock_acquire+0x2de/0x1d50
[   56.975140]  [<ffffffff81393bf5>] ? cpumask_next_and+0x55/0x90
[   56.981942]  [<ffffffff810cef72>] lock_acquire+0x92/0x1d0
[   56.988745]  [<ffffffff8118f52a>] ? vm_unmap_aliases+0x16a/0x380
[   56.995619]  [<ffffffff817628f1>] _raw_spin_lock+0x41/0x50
[   57.002493]  [<ffffffff8118f52a>] ? vm_unmap_aliases+0x16a/0x380
[   57.009447]  [<ffffffff8118f52a>] vm_unmap_aliases+0x16a/0x380
[   57.016477]  [<ffffffff8118f44c>] ? vm_unmap_aliases+0x8c/0x380
[   57.023607]  [<ffffffff810436b0>] change_page_attr_set_clr+0xc0/0x460
[   57.030818]  [<ffffffff810cfb8d>] ? trace_hardirqs_on+0xd/0x10
[   57.037896]  [<ffffffff811a8330>] ? kmem_cache_free+0xb0/0x2b0
[   57.044789]  [<ffffffff811b59c3>] ? free_object_rcu+0x93/0xa0
[   57.051720]  [<ffffffff81043d9f>] set_memory_rw+0x2f/0x40
[   57.058727]  [<ffffffff8104e17c>] bpf_jit_free+0x2c/0x40
[   57.065577]  [<ffffffff81642cba>] sk_filter_release_rcu+0x1a/0x30
[   57.072338]  [<ffffffff811108e2>] rcu_process_callbacks+0x202/0x7c0
[   57.078962]  [<ffffffff81057f17>] __do_softirq+0xf7/0x3f0
[   57.085373]  [<ffffffff81058245>] run_ksoftirqd+0x35/0x70

cannot reuse jited filter memory, since it's readonly,
so use original bpf insns memory to hold work_struct

defer kfree of sk_filter until jit completed freeing

tested on x86_64 and i386

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-07 15:16:45 -04:00
Heiko Carstens 4784955a52 s390/bpf,jit: fix address randomization
Add misssing braces to hole calculation. This resulted in an addition
instead of an substraction. Which in turn means that the jit compiler
could try to write out of bounds of the allocated piece of memory.

This bug was introduced with aa2d2c73 "s390/bpf,jit: address randomize
and write protect jit code".

Fixes this one:

[   37.320956] Unable to handle kernel pointer dereference at virtual kernel address 000003ff80231000
[   37.320984] Oops: 0011 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[   37.320993] Modules linked in: dm_multipath scsi_dh eadm_sch dm_mod ctcm fsm autofs4
[   37.321007] CPU: 28 PID: 6443 Comm: multipathd Not tainted 3.10.9-61.x.20130829-s390xdefault #1
[   37.321011] task: 0000004ada778000 ti: 0000004ae3304000 task.ti: 0000004ae3304000
[   37.321014] Krnl PSW : 0704c00180000000 000000000012d1de (bpf_jit_compile+0x198e/0x23d0)
[   37.321022]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
               Krnl GPRS: 000000004350207d 0000004a00000001 0000000000000007 000003ff80231002
[   37.321029]            0000000000000007 000003ff80230ffe 00000000a7740000 000003ff80230f76
[   37.321032]            000003ffffffffff 000003ff00000000 000003ff0000007d 000000000071e820
[   37.321035]            0000004adbe99950 000000000071ea18 0000004af3d9e7c0 0000004ae3307b80
[   37.321046] Krnl Code: 000000000012d1d0: 41305004            la      %r3,4(%r5)
                          000000000012d1d4: e330f0f80021        clg     %r3,248(%r15)
                         #000000000012d1da: a7240009            brc     2,12d1ec
                         >000000000012d1de: 50805000            st      %r8,0(%r5)
                          000000000012d1e2: e330f0f00004        lg      %r3,240(%r15)
                          000000000012d1e8: 41303004            la      %r3,4(%r3)
                          000000000012d1ec: e380f0e00004        lg      %r8,224(%r15)
                          000000000012d1f2: e330f0f00024        stg     %r3,240(%r15)
[   37.321074] Call Trace:
[   37.321077] ([<000000000012da78>] bpf_jit_compile+0x2228/0x23d0)
[   37.321083]  [<00000000006007c2>] sk_attach_filter+0xfe/0x214
[   37.321090]  [<00000000005d2d92>] sock_setsockopt+0x926/0xbdc
[   37.321097]  [<00000000005cbfb6>] SyS_setsockopt+0x8a/0xe8
[   37.321101]  [<00000000005ccaa8>] SyS_socketcall+0x264/0x364
[   37.321106]  [<0000000000713f1c>] sysc_nr_ok+0x22/0x28
[   37.321113]  [<000003fffce10ea8>] 0x3fffce10ea8
[   37.321118] INFO: lockdep is turned off.
[   37.321121] Last Breaking-Event-Address:
[   37.321124]  [<000000000012d192>] bpf_jit_compile+0x1942/0x23d0
[   37.321132]
[   37.321135] Kernel panic - not syncing: Fatal exception: panic_on_oops

Cc: stable@vger.kernel.org # v3.11
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2013-09-04 17:18:55 +02:00