Commit graph

556410 commits

Author SHA1 Message Date
Rasmus Villemoes f773f32d71 lib/kobject.c: use kvasprintf_const for formatting ->name
Sometimes kobject_set_name_vargs is called with a format string conaining
no %, or a format string of precisely "%s", where the single vararg
happens to point to .rodata.  kvasprintf_const detects these cases for us
and returns a copy of that pointer instead of duplicating the string, thus
saving some run-time memory.  Otherwise, it falls back to kvasprintf.  We
just need to always deallocate ->name using kfree_const.

Unfortunately, the dance we need to do to perform the '/' -> '!'
sanitization makes the resulting code rather ugly.

I instrumented kstrdup_const to provide some statistics on the memory
saved, and for me this gave an additional ~14KB after boot (306KB was
already saved; this patch bumped that to 320KB).  I have
KMALLOC_SHIFT_LOW==3, and since 80% of the kvasprintf_const hits were
satisfied by an 8-byte allocation, the 14K would roughly be quadrupled
when KMALLOC_SHIFT_LOW==5.  Whether these numbers are sufficient to
justify the ugliness I'll leave to others to decide.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Rasmus Villemoes 0a9df786a6 lib/kasprintf.c: introduce kvasprintf_const
This adds kvasprintf_const which tries to use kstrdup_const if possible:
If the format string contains no % characters, or if the format string is
exactly "%s", we delegate to kstrdup_const.  Otherwise, we fall back to
kvasprintf.

Just as for kstrdup_const, the main motivation is to save memory by
reusing .rodata when possible.

The return value should be freed by kfree_const, just like for
kstrdup_const.

There is deliberately no kasprintf_const: In the vast majority of cases,
the format string argument is a literal, so one can determine statically
whether one could instead use kstrdup_const directly (which would also
require one to change all corresponding kfree calls to kfree_const).

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Dmitry Vyukov 2cf12f821c lib/llist.c: fix data race in llist_del_first
llist_del_first reads entry->next, but it did not acquire visibility over
the entry node.  As the result it can get a stale value of entry->next
(e.g.  NULL or whatever garbage was there before the appending thread
wrote correct value).  And then commit that value as llist head with
cmpxchg.  That will corrupt llist.

Note there is a control-dependency between read of head->first and read of
entry->next, but it does not make the code correct.  Kernel memory model
unambiguously says: "A load-load control dependency requires a full read
memory barrier".

Use smp_load_acquire to acquire visibility over the entry node.

The data race was found with KernelThreadSanitizer (KTSAN).

Here is an example of KTSAN report:

ThreadSanitizer: data-race in llist_del_first

Read of size 1 by thread T389 (K2630, CPU0):
 [<ffffffff8156b8a9>] llist_del_first+0x39/0x70 lib/llist.c:74
 [<     inlined    >] tty_buffer_alloc drivers/tty/tty_buffer.c:181
 [<ffffffff81664af4>] __tty_buffer_request_room+0xb4/0x250 drivers/tty/tty_buffer.c:292
 [<ffffffff81664e6c>] tty_insert_flip_string_fixed_flag+0x6c/0x150 drivers/tty/tty_buffer.c:337
 [<     inlined    >] tty_insert_flip_string include/linux/tty_flip.h:35
 [<ffffffff81667422>] pty_write+0x72/0xc0 drivers/tty/pty.c:110
 [<     inlined    >] process_output_block drivers/tty/n_tty.c:611
 [<ffffffff8165c016>] n_tty_write+0x346/0x7f0 drivers/tty/n_tty.c:2401
 [<     inlined    >] do_tty_write drivers/tty/tty_io.c:1159
 [<ffffffff816568df>] tty_write+0x21f/0x3f0 drivers/tty/tty_io.c:1245
 [<ffffffff8125f00f>] __vfs_write+0x5f/0x1f0 fs/read_write.c:489
 [<ffffffff8125ff8f>] vfs_write+0xef/0x280 fs/read_write.c:538
 [<     inlined    >] SYSC_write fs/read_write.c:585
 [<ffffffff81261390>] SyS_write+0x70/0xe0 fs/read_write.c:577
 [<ffffffff81ee862e>] entry_SYSCALL_64_fastpath+0x12/0x71 arch/x86/entry/entry_64.S:186

Previous write of size 8 by thread T226 (K761, CPU0):
 [<ffffffff8156b832>] llist_add_batch+0x32/0x70 lib/llist.c:44 (discriminator 16)
 [<     inlined    >] llist_add include/linux/llist.h:180
 [<ffffffff816649fc>] tty_buffer_free+0x6c/0xb0 drivers/tty/tty_buffer.c:221
 [<ffffffff816651e7>] flush_to_ldisc+0x107/0x300 drivers/tty/tty_buffer.c:514
 [<ffffffff810b20ee>] process_one_work+0x47e/0x930 kernel/workqueue.c:2036
 [<ffffffff810b2650>] worker_thread+0xb0/0x900 kernel/workqueue.c:2170
 [<ffffffff810bbe20>] kthread+0x150/0x170 kernel/kthread.c:209
 [<ffffffff81ee8a1f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:526

Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Alexander Potapenko <glider@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Vitaly Kuznetsov 943ba65038 lib/test-string_helpers.c: add string_get_size() tests
Add a couple of simple tests for string_get_size().  The last one will
hang the kernel without the 'lib/string_helpers.c: fix infinite loop in
string_get_size()' fix.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: James Bottomley <JBottomley@Odin.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Alexander Kuleshov 1c78bc170f lib/halfmd4.c: use rol32 inline function in the ROUND macro
<linux/bitops.h> provides rol32() inline function, let's use already
predefined function instead of direct expression.

Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Martin Kepplinger 78e3c79510 arch/x86/kernel/cpu/perf_event_msr.c: use sign_extend64() for sign extension
Signed-off-by: Martin Kepplinger <martin.kepplinger@theobroma-systems.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: George Spelvin <linux@horizon.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Maxime Coquelin <maxime.coquelin@st.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Martin Kepplinger 06d8f8178c arch/sh/kernel/traps_64.c: use sign_extend64() for sign extension
Signed-off-by: Martin Kepplinger <martin.kepplinger@theobroma-systems.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: George Spelvin <linux@horizon.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Maxime Coquelin <maxime.coquelin@st.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Martin Kepplinger 48e203e21b bitops.h: add sign_extend64()
Months back, this was discussed, see https://lkml.org/lkml/2015/1/18/289
The result was the 64-bit version being "likely fine", "valuable" and
"correct".  The discussion fell asleep but since there are possible users,
let's add it.

Signed-off-by: Martin Kepplinger <martin.kepplinger@theobroma-systems.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: George Spelvin <linux@horizon.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Maxime Coquelin <maxime.coquelin@st.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Martin Kepplinger e2eb53aa96 bitops.h: improve sign_extend32()'s documentation
It is often overlooked that sign_extend32(), despite its name, is safe to
use for 16 and 8 bit types as well.  This should help prevent sign
extension being done manually some other way.

Signed-off-by: Martin Kepplinger <martin.kepplinger@theobroma-systems.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: George Spelvin <linux@horizon.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Maxime Coquelin <maxime.coquelin@st.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Chanwoo Choi cd2c3e7f94 MAINTAINERS: add missing extcon directory
Add the missing extcon directory to maintain them.  When using
get_maintainer.pl, the result should include the correct maintainer
information.

Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Joe Perches 2a7cb1dc82 get_maintainer: add subsystem to reviewer output
Reviewer output currently does not include the subsystem
that matched.  Add it.

Miscellanea:

o Add a get_subsystem_name routine to centralize this

Signed-off-by: Joe Perches <joe@perches.com>
Tested-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Cc: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Brian Norris 4f07510df2 get_maintainer: --r (list reviewer) is on by default
We don't consistenly document the default value next to the option
listing, but we do have a list of defaults here, so let's keep it up to
date.

Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Brian Norris b1312bfe61 get_maintainer: add --no-foo options to --help
Many flag options are boolean and support both a positive and a negative
invocation from the command line.  Some of these are even mentioned by
example (e.g., --nogit is mentioned as a default option), but they aren't
explicitly mentioned in the list of options.  It happens that some of
these are pretty important, as they are default-on, and to turn them off,
you have to know about the --no-foo version.

Rather than clutter the whole help text with bracketed '--[no]foo', let's
just mention the general rule, a la 'man gcc'.

Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Brian Norris cc7ff0ef6e get_maintainer: it's '--pattern-depth', not '-pattern-depth'
Though it appears that Perl's GetOptions will take either, the latter is
not documented in the options listing.

Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Brian Norris 3cbcca8a80 get_maintainer: add missing documentation for --git-blame-signatures
I really haven't used this option much myself, so feel free to improve on
the documentation for it.  I just noticed it while inspecting this script
for undocumented features.

Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Mathias Krause 3824657c52 printk: prevent userland from spoofing kernel messages
The following statement of ABI/testing/dev-kmsg is not quite right:

   It is not possible to inject messages from userspace with the
   facility number LOG_KERN (0), to make sure that the origin of the
   messages can always be reliably determined.

Userland actually can inject messages with a facility of 0 by abusing the
fact that the facility is stored in a u8 data type.  By using a facility
which is a multiple of 256 the assignment of msg->facility in log_store()
implicitly truncates it to 0, i.e.  LOG_KERN, allowing users of /dev/kmsg
to spoof kernel messages as shown below:

The following call...
   # printf '<%d>Kernel panic - not syncing: beer empty\n' 0 >/dev/kmsg
...leads to the following log entry (dmesg -x | tail -n 1):
   user  :emerg : [   66.137758] Kernel panic - not syncing: beer empty

However, this call...
   # printf '<%d>Kernel panic - not syncing: beer empty\n' 0x800 >/dev/kmsg
...leads to the slightly different log entry (note the kernel facility):
   kern  :emerg : [   74.177343] Kernel panic - not syncing: beer empty

Fix that by limiting the user provided facility to 8 bit right from the
beginning and catch the truncation early.

Fixes: 7ff9554bb5 ("printk: convert byte-buffer to variable-length...")
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Petr Mladek <pmladek@suse.cz>
Cc: Alex Elder <elder@linaro.org>
Cc: Joe Perches <joe@perches.com>
Cc: Kay Sievers <kay@vrfy.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Rasmus Villemoes d7ec9a05d6 lib/vsprintf.c: update documentation
%n is no longer just ignored; it results in early return from vsnprintf.
Also add a request to add test cases for future %p extensions.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Kees Cook 317dc34ab7 selftests: run lib/test_printf module
This runs the lib/test_printf module to make sure printf is operating
sanely.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Rasmus Villemoes 707cc7280f test_printf: test printf family at runtime
This adds a simple module for testing the kernel's printf facilities.
Previously, some %p extensions have caused a wrong return value in case
the entire output didn't fit and/or been unusable in kasprintf().  This
should help catch such issues.  Also, it should help ensure that changes
to the formatting algorithms don't break anything.

I'm not sure if we have a struct dentry or struct file lying around at
boot time or if we can fake one, but most %p extensions should be
testable, as should the ordinary number and string formatting.

The nature of vararg functions means we can't use a more conventional
table-driven approach.

For now, this is mostly a skeleton; contributions are very
welcome. Some tests are/will be slightly annoying to write, since the
expected output depends on stuff like CONFIG_*, sizeof(long), runtime
values etc.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Martin Kletzander <mkletzan@redhat.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Rasmus Villemoes 80c9eb46fa lib/vsprintf.c: remove SPECIAL handling in pointer()
As a quick

   git grep -E '%[ +0#-]*#[ +0#-]*(\*|[0-9]+)?(\.(\*|[0-9]+)?)?p'

shows, nobody uses the # flag with %p. Should one try to do so, one
will be met with

  warning: `#' flag used with `%p' gnu_printf format [-Wformat]

(POSIX and C99 both say "... For other conversion specifiers, the
behavior is undefined.". Obviously, the kernel can choose to define
the behaviour however it wants, but as long as gcc issues that
warning, users are unlikely to show up.)

Since default_width is effectively always 2*sizeof(void*), we can
simplify the prologue of pointer() and save a few instructions.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Martin Kletzander <mkletzan@redhat.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Rasmus Villemoes 762abb5154 lib/vsprintf.c: also improve sanity check in bstr_printf()
Quoting from 2aa2f9e21e ("lib/vsprintf.c: improve sanity check in
vsnprintf()"):

    On 64 bit, size may very well be huge even if bit 31 happens to be 0.
    Somehow it doesn't feel right that one can pass a 5 GiB buffer but not a
    3 GiB one.  So cap at INT_MAX as was probably the intention all along.
    This is also the made-up value passed by sprintf and vsprintf.

I should have seen this copy-pasted instance back then, but let's just
do it now.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Martin Kletzander <mkletzan@redhat.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Rasmus Villemoes b006f19b05 lib/vsprintf.c: handle invalid format specifiers more robustly
If we meet any invalid or unsupported format specifier, 'handling' it by
just printing it as a literal string is not safe: Presumably the format
string and the arguments passed gcc's type checking, but that means
something like sprintf(buf, "%n %pd", &intvar, dentry) would end up
interpreting &intvar as a struct dentry*.

When the offending specifier was %n it used to be at the end of the format
string, but we can't rely on that always being the case.  Also, gcc
doesn't complain about some more or less exotic qualifiers (or 'length
modifiers' in posix-speak) such as 'j' or 'q', but being unrecognized by
the kernel's printf implementation, they'd be interpreted as unknown
specifiers, and the rest of arguments would be interpreted wrongly.

So let's complain about anything we don't understand, not just %n, and
stop pretending that we'd be able to make sense of the rest of the
format/arguments.  If the offending specifier is in a printk() call we
unfortunately only get a "BUG: recent printk recursion!", but at least
direct users of the sprintf family will be caught.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Martin Kletzander <mkletzan@redhat.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Martin Kletzander 5e4ee7b13b printk: synchronize %p formatting documentation
Move all pointer-formatting documentation to one place in the code and one
place in the documentation instead of keeping it in three places with
different level of completeness.  Documentation/printk-formats.txt has
detailed information about each modifier, docstring above pointer() has
short descriptions of them (as that is the function dealing with %p) and
docstring above vsprintf() is removed as redundant.  Both docstrings in
the code that were modified are updated with a reminder of updating the
documentation upon any further change.

[akpm@linux-foundation.org: fix comment]
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Rasmus Villemoes 3e406b1d7c lib/dynamic_debug.c: use kstrdup_const
Using kstrdup_const, thus reusing .rodata when possible, saves around 2 kB
of runtime memory on my laptop/.config combination.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Jason Baron <jbaron@akamai.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Andrew Morton eac44a5e07 fs/jffs2/wbuf.c: remove stray semicolon
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Andrew Morton 9add850c21 include/linux/compiler-gcc.h: improve __visible documentation
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Oleg Nesterov 54708d2858 proc: actually make proc_fd_permission() thread-friendly
The commit 96d0df79f2 ("proc: make proc_fd_permission() thread-friendly")
fixed the access to /proc/self/fd from sub-threads, but introduced another
problem: a sub-thread can't access /proc/<tid>/fd/ or /proc/thread-self/fd
if generic_permission() fails.

Change proc_fd_permission() to check same_thread_group(pid_task(), current).

Fixes: 96d0df79f2 ("proc: make proc_fd_permission() thread-friendly")
Reported-by: "Jin, Yihua" <yihua.jin@intel.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Andy Shevchenko 3a49f3d2a1 fs/proc/array.c: set overflow flag in case of error
For now in task_name() we ignore the return code of string_escape_str()
call.  This is not good if buffer suddenly becomes not big enough.  Do the
proper error handling there.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Kirill A. Shutemov 1965c8b7ac mm: use 'unsigned int' for compound_dtor/compound_order on 64BIT
On 64 bit system we have enough space in struct page to encode
compound_dtor and compound_order with unsigned int.

On x86-64 it leads to slightly smaller code size due usesage of plain
MOV instead of MOVZX (zero-extended move) or similar effect.

allyesconfig:

   text	   data	    bss	    dec	    hex	filename
159520446	48146736	72196096	279863278	10ae5fee	vmlinux.pre
159520382	48146736	72196096	279863214	10ae5fae	vmlinux.post

On other architectures without native support of 16-bit data types the

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Kirill A. Shutemov d00181b96e mm: use 'unsigned int' for page order
Let's try to be consistent about data type of page order.

[sfr@canb.auug.org.au: fix build (type of pageblock_order)]
[hughd@google.com: some configs end up with MAX_ORDER and pageblock_order having different types]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Kirill A. Shutemov 1d798ca3f1 mm: make compound_head() robust
Hugh has pointed that compound_head() call can be unsafe in some
context. There's one example:

	CPU0					CPU1

isolate_migratepages_block()
  page_count()
    compound_head()
      !!PageTail() == true
					put_page()
					  tail->first_page = NULL
      head = tail->first_page
					alloc_pages(__GFP_COMP)
					   prep_compound_page()
					     tail->first_page = head
					     __SetPageTail(p);
      !!PageTail() == true
    <head == NULL dereferencing>

The race is pure theoretical. I don't it's possible to trigger it in
practice. But who knows.

We can fix the race by changing how encode PageTail() and compound_head()
within struct page to be able to update them in one shot.

The patch introduces page->compound_head into third double word block in
front of compound_dtor and compound_order. Bit 0 encodes PageTail() and
the rest bits are pointer to head page if bit zero is set.

The patch moves page->pmd_huge_pte out of word, just in case if an
architecture defines pgtable_t into something what can have the bit 0
set.

hugetlb_cgroup uses page->lru.next in the second tail page to store
pointer struct hugetlb_cgroup. The patch switch it to use page->private
in the second tail page instead. The space is free since ->first_page is
removed from the union.

The patch also opens possibility to remove HUGETLB_CGROUP_MIN_ORDER
limitation, since there's now space in first tail page to store struct
hugetlb_cgroup pointer. But that's out of scope of the patch.

That means page->compound_head shares storage space with:

 - page->lru.next;
 - page->next;
 - page->rcu_head.next;

That's too long list to be absolutely sure, but looks like nobody uses
bit 0 of the word.

page->rcu_head.next guaranteed[1] to have bit 0 clean as long as we use
call_rcu(), call_rcu_bh(), call_rcu_sched(), or call_srcu(). But future
call_rcu_lazy() is not allowed as it makes use of the bit and we can
get false positive PageTail().

[1] http://lkml.kernel.org/g/20150827163634.GD4029@linux.vnet.ibm.com

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Kirill A. Shutemov f1e61557f0 mm: pack compound_dtor and compound_order into one word in struct page
The patch halves space occupied by compound_dtor and compound_order in
struct page.

For compound_order, it's trivial long -> short conversion.

For get_compound_page_dtor(), we now use hardcoded table for destructor
lookup and store its index in the struct page instead of direct pointer
to destructor. It shouldn't be a big trouble to maintain the table: we
have only two destructor and NULL currently.

This patch free up one word in tail pages for reuse. This is preparation
for the next patch.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Kirill A. Shutemov 32e7ba1ea1 zsmalloc: use page->private instead of page->first_page
We are going to rework how compound_head() work. It will not use
page->first_page as we have it now.

The only other user of page->first_page beyond compound pages is
zsmalloc.

Let's use page->private instead of page->first_page here. It occupies
the same storage space.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Kirill A. Shutemov bc4f610d5a slab, slub: use page->rcu_head instead of page->lru plus cast
We have properly typed page->rcu_head, no need to cast page->lru.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Kirill A. Shutemov 474e4eeaf2 mm: drop page->slab_page
Since 8456a648cf ("slab: use struct page for slab management") nobody
uses slab_page field in struct page.

Let's drop it.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Sergey Senozhatsky 6fe5186f0c zsmalloc: reduce size_class memory usage
Each `struct size_class' contains `struct zs_size_stat': an array of
NR_ZS_STAT_TYPE `unsigned long'.  For zsmalloc built with no
CONFIG_ZSMALLOC_STAT this results in a waste of `2 * sizeof(unsigned
long)' per-class.

The patch removes unneeded `struct zs_size_stat' members by redefining
NR_ZS_STAT_TYPE (max stat idx in array).

Since both NR_ZS_STAT_TYPE and zs_stat_type are compile time constants,
GCC can eliminate zs_stat_inc()/zs_stat_dec() calls that use zs_stat_type
larger than NR_ZS_STAT_TYPE: CLASS_ALMOST_EMPTY and CLASS_ALMOST_FULL at
the moment.

./scripts/bloat-o-meter mm/zsmalloc.o.old mm/zsmalloc.o.new
add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-39 (-39)
function                                     old     new   delta
fix_fullness_group                            97      94      -3
insert_zspage                                100      86     -14
remove_zspage                                141     119     -22

To summarize:
a) each class now uses less memory
b) we avoid a number of dec/inc stats (a minor optimization,
   but still).

The gain will increase once we introduce additional stats.

A simple IO test.

iozone -t 4 -R -r 32K -s 60M -I +Z
                        patched                 base
"  Initial write "       4145599.06              4127509.75
"        Rewrite "       4146225.94              4223618.50
"           Read "      17157606.00             17211329.50
"        Re-read "      17380428.00             17267650.50
"   Reverse Read "      16742768.00             16162732.75
"    Stride read "      16586245.75             16073934.25
"    Random read "      16349587.50             15799401.75
" Mixed workload "      10344230.62              9775551.50
"   Random write "       4277700.62              4260019.69
"         Pwrite "       4302049.12              4313703.88
"          Pread "       6164463.16              6126536.72
"         Fwrite "       7131195.00              6952586.00
"          Fread "      12682602.25             12619207.50

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Hui Zhu 6f0b22760b mm/zsmalloc.c: remove useless line in obj_free()
Signed-off-by: Hui Zhu <zhuhui@xiaomi.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Sergey Senozhatsky 2c35169572 zsmalloc: don't test shrinker_enabled in zs_shrinker_count()
We don't let user to disable shrinker in zsmalloc (once it's been
enabled), so no need to check ->shrinker_enabled in zs_shrinker_count(),
at the moment at least.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Sergey Senozhatsky 759b26b298 zsmalloc: use preempt.h for in_interrupt()
A cosmetic change.

Commit c60369f011 ("staging: zsmalloc: prevent mappping in interrupt
context") added in_interrupt() check to zs_map_object() and 'hardirq.h'
include; but in_interrupt() macro is defined in 'preempt.h' not in
'hardirq.h', so include it instead.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Hui Zhu 12a7bfad58 zsmalloc: fix obj_to_head use page_private(page) as value but not pointer
In obj_malloc():

	if (!class->huge)
		/* record handle in the header of allocated chunk */
		link->handle = handle;
	else
		/* record handle in first_page->private */
		set_page_private(first_page, handle);

In the hugepage we save handle to private directly.

But in obj_to_head():

	if (class->huge) {
		VM_BUG_ON(!is_first_page(page));
		return *(unsigned long *)page_private(page);
	} else
		return *(unsigned long *)obj;

It is used as a pointer.

The reason why there is no problem until now is huge-class page is born
with ZS_FULL so it can't be migrated.  However, we need this patch for
future work: "VM-aware zsmalloced page migration" to reduce external
fragmentation.

Signed-off-by: Hui Zhu <zhuhui@xiaomi.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Hui Zhu 8f958c98f2 zsmalloc: add comments for ->inuse to zspage
[akpm@linux-foundation.org: fix grammar]
Signed-off-by: Hui Zhu <zhuhui@xiaomi.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Sergey SENOZHATSKY 6f3526d6db mm: zsmalloc: constify struct zs_pool name
Constify `struct zs_pool' ->name.

[akpm@inux-foundation.org: constify zpool_create_pool()'s `type' arg also]
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Dan Streetman <ddstreet@ieee.org>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Dan Streetman 69e18f4dbe zpool: remove redundant zpool->type string, const-ify zpool_get_type
Make the return type of zpool_get_type const; the string belongs to the
zpool driver and should not be modified.  Remove the redundant type field
in the struct zpool; it is private to zpool.c and isn't needed since
->driver->type can be used directly.  Add comments indicating strings must
be null-terminated.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Seth Jennings <sjennings@variantweb.net>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Dan Streetman c99b42c352 zswap: use charp for zswap param strings
Instead of using a fixed-length string for the zswap params, use charp.
This simplifies the code and uses less memory, as most zswap param strings
will be less than the current maximum length.

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Seth Jennings <sjennings@variantweb.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Dan Streetman 3d9c637f4a module: export param_free_charp()
Change the param_free_charp() function from static to exported.

It is used by zswap in the next patch ("zswap: use charp for zswap param
strings").

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Seth Jennings <sjennings@variantweb.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Alexey Klimov b0c9865fd2 mm/zswap.c: remove unneeded initialization to NULL in zswap_entry_find_get()
On the next line entry variable will be re-initialized so no need to init
it with NULL.

Signed-off-by: Alexey Klimov <alexey.klimov@linaro.org>
Cc: Seth Jennings <sjennings@variantweb.net>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Geliang Tang 1c53e0d273 zram: make is_partial_io/valid_io_request/page_zero_filled return boolean
Make is_partial_io()/valid_io_request()/page_zero_filled() return boolean,
since each function only uses either one or zero as its return value.

Signed-off-by: Geliang Tang <geliangtang@163.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Sergey SENOZHATSKY 1237275580 zram: keep the exact overcommited value in mem_used_max
`mem_used_max' is designed to store the max amount of memory zram consumed
to store the data.  However, it does not represent the actual
'overcommited' (max) value.  The existing code goes to -ENOMEM
overcommited case before it updates `->stats.max_used_pages', which hides
the reason we went to -ENOMEM in the first place -- we actually used more
memory than `->limit_pages':

        alloced_pages = zs_get_total_pages(meta->mem_pool);
        if (zram->limit_pages && alloced_pages > zram->limit_pages) {
                zs_free(meta->mem_pool, handle);
                ret = -ENOMEM;
                goto out;
        }

        update_used_max(zram, alloced_pages);

Which is misleading.  User will see -ENOMEM, check `->limit_pages', check
`->stats.max_used_pages', which will keep the value BEFORE zram passed
`->limit_pages', and see:
	`->stats.max_used_pages' < `->limit_pages'

Move update_used_max() before we do `->limit_pages' check, so that
user will see:
	`->stats.max_used_pages' > `->limit_pages'
should the overcommit and -ENOMEM happen.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Luis Henriques 1d5b43bfb6 zram: introduce comp algorithm fallback functionality
When the user supplies an unsupported compression algorithm, keep the
previously selected one (knowingly supported) or the default one (if the
compression algorithm hasn't been changed yet).

Note that previously this operation (i.e. setting an invalid algorithm)
would result in no algorithm being selected, which means that this
represents a small change in the default behaviour.

Minchan said:

For initializing zram, we need to set up 3 optional parameters in advance.

1. the number of compression streams
2. memory limitation
3. compression algorithm

Although user pass completely wrong value to set up for 1 and 2
parameters, it's okay because they have default value so zram will be
initialized with the default value (of course, when user passes a wrong
value via *echo*, sysfs returns -EINVAL so the user can notice it).

But 3 is not consistent with other optional parameters.  IOW, if the
user passes a wrong value to set up 3 parameter, zram's initialization
would fail unlike other optional parameters.

So this patch makes them consistent.

Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Andrew Morton 6f6461562e mm/memcontrol.c: uninline mem_cgroup_usage
gcc version 5.2.1 20151010 (Debian 5.2.1-22)
$ size mm/memcontrol.o mm/memcontrol.o.before
   text    data     bss     dec     hex filename
  35535    7908      64   43507    a9f3 mm/memcontrol.o
  35762    7908      64   43734    aad6 mm/memcontrol.o.before

Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00