1
0
Fork 0

These are the locking updates for v5.10:

- Add deadlock detection for recursive read-locks. The rationale is outlined
    in:
 
      224ec489d3cd: ("lockdep/Documention: Recursive read lock detection reasoning")
 
    The main deadlock pattern we want to detect is:
 
            TASK A:                 TASK B:
 
            read_lock(X);
                                    write_lock(X);
            read_lock_2(X);
 
  - Add "latch sequence counters" (seqcount_latch_t):
 
       A sequence counter variant where the counter even/odd value is used to
       switch between two copies of protected data. This allows the read path,
       typically NMIs, to safely interrupt the write side critical section.
 
    We utilize this new variant for sched-clock, and to make x86 TSC handling safer.
 
  - Other seqlock cleanups, fixes and enhancements
 
  - KCSAN updates
 
  - LKMM updates
 
  - Misc updates, cleanups and fixes.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl+EX6QRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1g3gxAAkg+Jy/tcdRxlxlEDOQPFy1mBqvFmulNA
 pGFPkB6dzqmAWF/NfOZSl4g/h/mqGYsq2V+PfK5E8Sq8DQ/yCmnLhjgVOHNUUliv
 x0WWfOysNgJdtdf69NLYJufIQhxhyI0dwFHHoHIsCdGdGqjh2DVevQFPFTBjdpOc
 BUZYo+u3gCaCdB6A2nmlcWYbEw8eVEHgv3qLG6dq46J0KJOV0HfliqJoU3EZqH+s
 977LvEIo+THfuYWMo/Jepwngbi0y36KeeukOAdwm9fK196htBHIUR+YPPrAe+FWD
 z+UXP5IS5XIw9V1sGLmUaC2m+6gpdW19jKBtlzPkxHXmJmsgiZdLLeytEh3WYey7
 nzfH+9Jd4NyyZKucLssYkOjf6P5BxGKCyJ9LXb7vlSthIhiDdFNx47oKtW4hxjOY
 jubsI3BP5c3G1sIBIjTS53XmOhJg+Z52FxTpQ33JswXn1wGidcHZiuNHZuU5q28p
 +tn8rGb2NGJFb4Sw/Vp0yTcqIpEXf+vweiQoaxm6tc9BWzcVzZntGnh0i3gFotx/
 VgKafN4+pgXgo6bwHbN2WBK2FGyvcXFaptfaOMZL48En82hJ1DI6EnBEYN+vuERQ
 JcCXg+iHeeVbxoou7q8NJxITkBmEL5xNBIugXRRqNSP3fXLxKjFuPYqT84/e7yZi
 elGTReYcq6g=
 =Iq51
 -----END PGP SIGNATURE-----

Merge tag 'locking-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Ingo Molnar:
 "These are the locking updates for v5.10:

   - Add deadlock detection for recursive read-locks.

     The rationale is outlined in commit 224ec489d3 ("lockdep/
     Documention: Recursive read lock detection reasoning")

     The main deadlock pattern we want to detect is:

           TASK A:                 TASK B:

           read_lock(X);
                                   write_lock(X);
           read_lock_2(X);

   - Add "latch sequence counters" (seqcount_latch_t):

     A sequence counter variant where the counter even/odd value is used
     to switch between two copies of protected data. This allows the
     read path, typically NMIs, to safely interrupt the write side
     critical section.

     We utilize this new variant for sched-clock, and to make x86 TSC
     handling safer.

   - Other seqlock cleanups, fixes and enhancements

   - KCSAN updates

   - LKMM updates

   - Misc updates, cleanups and fixes"

* tag 'locking-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (67 commits)
  lockdep: Revert "lockdep: Use raw_cpu_*() for per-cpu variables"
  lockdep: Fix lockdep recursion
  lockdep: Fix usage_traceoverflow
  locking/atomics: Check atomic-arch-fallback.h too
  locking/seqlock: Tweak DEFINE_SEQLOCK() kernel doc
  lockdep: Optimize the memory usage of circular queue
  seqlock: Unbreak lockdep
  seqlock: PREEMPT_RT: Do not starve seqlock_t writers
  seqlock: seqcount_LOCKNAME_t: Introduce PREEMPT_RT support
  seqlock: seqcount_t: Implement all read APIs as statement expressions
  seqlock: Use unique prefix for seqcount_t property accessors
  seqlock: seqcount_LOCKNAME_t: Standardize naming convention
  seqlock: seqcount latch APIs: Only allow seqcount_latch_t
  rbtree_latch: Use seqcount_latch_t
  x86/tsc: Use seqcount_latch_t
  timekeeping: Use seqcount_latch_t
  time/sched_clock: Use seqcount_latch_t
  seqlock: Introduce seqcount_latch_t
  mm/swap: Do not abuse the seqcount_t latching API
  time/sched_clock: Use raw_read_seqcount_latch() during suspend
  ...
pull/193/head
Linus Torvalds 2020-10-12 13:06:20 -07:00
commit ed016af52e
38 changed files with 3914 additions and 1006 deletions

View File

@ -392,3 +392,261 @@ Run the command and save the output, then compare against the output from
a later run of this command to identify the leakers. This same output
can also help you find situations where runtime lock initialization has
been omitted.
Recursive read locks:
---------------------
The whole of the rest document tries to prove a certain type of cycle is equivalent
to deadlock possibility.
There are three types of lockers: writers (i.e. exclusive lockers, like
spin_lock() or write_lock()), non-recursive readers (i.e. shared lockers, like
down_read()) and recursive readers (recursive shared lockers, like rcu_read_lock()).
And we use the following notations of those lockers in the rest of the document:
W or E: stands for writers (exclusive lockers).
r: stands for non-recursive readers.
R: stands for recursive readers.
S: stands for all readers (non-recursive + recursive), as both are shared lockers.
N: stands for writers and non-recursive readers, as both are not recursive.
Obviously, N is "r or W" and S is "r or R".
Recursive readers, as their name indicates, are the lockers allowed to acquire
even inside the critical section of another reader of the same lock instance,
in other words, allowing nested read-side critical sections of one lock instance.
While non-recursive readers will cause a self deadlock if trying to acquire inside
the critical section of another reader of the same lock instance.
The difference between recursive readers and non-recursive readers is because:
recursive readers get blocked only by a write lock *holder*, while non-recursive
readers could get blocked by a write lock *waiter*. Considering the follow example:
TASK A: TASK B:
read_lock(X);
write_lock(X);
read_lock_2(X);
Task A gets the reader (no matter whether recursive or non-recursive) on X via
read_lock() first. And when task B tries to acquire writer on X, it will block
and become a waiter for writer on X. Now if read_lock_2() is recursive readers,
task A will make progress, because writer waiters don't block recursive readers,
and there is no deadlock. However, if read_lock_2() is non-recursive readers,
it will get blocked by writer waiter B, and cause a self deadlock.
Block conditions on readers/writers of the same lock instance:
--------------------------------------------------------------
There are simply four block conditions:
1. Writers block other writers.
2. Readers block writers.
3. Writers block both recursive readers and non-recursive readers.
4. And readers (recursive or not) don't block other recursive readers but
may block non-recursive readers (because of the potential co-existing
writer waiters)
Block condition matrix, Y means the row blocks the column, and N means otherwise.
| E | r | R |
+---+---+---+---+
E | Y | Y | Y |
+---+---+---+---+
r | Y | Y | N |
+---+---+---+---+
R | Y | Y | N |
(W: writers, r: non-recursive readers, R: recursive readers)
acquired recursively. Unlike non-recursive read locks, recursive read locks
only get blocked by current write lock *holders* other than write lock
*waiters*, for example:
TASK A: TASK B:
read_lock(X);
write_lock(X);
read_lock(X);
is not a deadlock for recursive read locks, as while the task B is waiting for
the lock X, the second read_lock() doesn't need to wait because it's a recursive
read lock. However if the read_lock() is non-recursive read lock, then the above
case is a deadlock, because even if the write_lock() in TASK B cannot get the
lock, but it can block the second read_lock() in TASK A.
Note that a lock can be a write lock (exclusive lock), a non-recursive read
lock (non-recursive shared lock) or a recursive read lock (recursive shared
lock), depending on the lock operations used to acquire it (more specifically,
the value of the 'read' parameter for lock_acquire()). In other words, a single
lock instance has three types of acquisition depending on the acquisition
functions: exclusive, non-recursive read, and recursive read.
To be concise, we call that write locks and non-recursive read locks as
"non-recursive" locks and recursive read locks as "recursive" locks.
Recursive locks don't block each other, while non-recursive locks do (this is
even true for two non-recursive read locks). A non-recursive lock can block the
corresponding recursive lock, and vice versa.
A deadlock case with recursive locks involved is as follow:
TASK A: TASK B:
read_lock(X);
read_lock(Y);
write_lock(Y);
write_lock(X);
Task A is waiting for task B to read_unlock() Y and task B is waiting for task
A to read_unlock() X.
Dependency types and strong dependency paths:
---------------------------------------------
Lock dependencies record the orders of the acquisitions of a pair of locks, and
because there are 3 types for lockers, there are, in theory, 9 types of lock
dependencies, but we can show that 4 types of lock dependencies are enough for
deadlock detection.
For each lock dependency:
L1 -> L2
, which means lockdep has seen L1 held before L2 held in the same context at runtime.
And in deadlock detection, we care whether we could get blocked on L2 with L1 held,
IOW, whether there is a locker L3 that L1 blocks L3 and L2 gets blocked by L3. So
we only care about 1) what L1 blocks and 2) what blocks L2. As a result, we can combine
recursive readers and non-recursive readers for L1 (as they block the same types) and
we can combine writers and non-recursive readers for L2 (as they get blocked by the
same types).
With the above combination for simplification, there are 4 types of dependency edges
in the lockdep graph:
1) -(ER)->: exclusive writer to recursive reader dependency, "X -(ER)-> Y" means
X -> Y and X is a writer and Y is a recursive reader.
2) -(EN)->: exclusive writer to non-recursive locker dependency, "X -(EN)-> Y" means
X -> Y and X is a writer and Y is either a writer or non-recursive reader.
3) -(SR)->: shared reader to recursive reader dependency, "X -(SR)-> Y" means
X -> Y and X is a reader (recursive or not) and Y is a recursive reader.
4) -(SN)->: shared reader to non-recursive locker dependency, "X -(SN)-> Y" means
X -> Y and X is a reader (recursive or not) and Y is either a writer or
non-recursive reader.
Note that given two locks, they may have multiple dependencies between them, for example:
TASK A:
read_lock(X);
write_lock(Y);
...
TASK B:
write_lock(X);
write_lock(Y);
, we have both X -(SN)-> Y and X -(EN)-> Y in the dependency graph.
We use -(xN)-> to represent edges that are either -(EN)-> or -(SN)->, the
similar for -(Ex)->, -(xR)-> and -(Sx)->
A "path" is a series of conjunct dependency edges in the graph. And we define a
"strong" path, which indicates the strong dependency throughout each dependency
in the path, as the path that doesn't have two conjunct edges (dependencies) as
-(xR)-> and -(Sx)->. In other words, a "strong" path is a path from a lock
walking to another through the lock dependencies, and if X -> Y -> Z is in the
path (where X, Y, Z are locks), and the walk from X to Y is through a -(SR)-> or
-(ER)-> dependency, the walk from Y to Z must not be through a -(SN)-> or
-(SR)-> dependency.
We will see why the path is called "strong" in next section.
Recursive Read Deadlock Detection:
----------------------------------
We now prove two things:
Lemma 1:
If there is a closed strong path (i.e. a strong circle), then there is a
combination of locking sequences that causes deadlock. I.e. a strong circle is
sufficient for deadlock detection.
Lemma 2:
If there is no closed strong path (i.e. strong circle), then there is no
combination of locking sequences that could cause deadlock. I.e. strong
circles are necessary for deadlock detection.
With these two Lemmas, we can easily say a closed strong path is both sufficient
and necessary for deadlocks, therefore a closed strong path is equivalent to
deadlock possibility. As a closed strong path stands for a dependency chain that
could cause deadlocks, so we call it "strong", considering there are dependency
circles that won't cause deadlocks.
Proof for sufficiency (Lemma 1):
Let's say we have a strong circle:
L1 -> L2 ... -> Ln -> L1
, which means we have dependencies:
L1 -> L2
L2 -> L3
...
Ln-1 -> Ln
Ln -> L1
We now can construct a combination of locking sequences that cause deadlock:
Firstly let's make one CPU/task get the L1 in L1 -> L2, and then another get
the L2 in L2 -> L3, and so on. After this, all of the Lx in Lx -> Lx+1 are
held by different CPU/tasks.
And then because we have L1 -> L2, so the holder of L1 is going to acquire L2
in L1 -> L2, however since L2 is already held by another CPU/task, plus L1 ->
L2 and L2 -> L3 are not -(xR)-> and -(Sx)-> (the definition of strong), which
means either L2 in L1 -> L2 is a non-recursive locker (blocked by anyone) or
the L2 in L2 -> L3, is writer (blocking anyone), therefore the holder of L1
cannot get L2, it has to wait L2's holder to release.
Moreover, we can have a similar conclusion for L2's holder: it has to wait L3's
holder to release, and so on. We now can prove that Lx's holder has to wait for
Lx+1's holder to release, and note that Ln+1 is L1, so we have a circular
waiting scenario and nobody can get progress, therefore a deadlock.
Proof for necessary (Lemma 2):
Lemma 2 is equivalent to: If there is a deadlock scenario, then there must be a
strong circle in the dependency graph.
According to Wikipedia[1], if there is a deadlock, then there must be a circular
waiting scenario, means there are N CPU/tasks, where CPU/task P1 is waiting for
a lock held by P2, and P2 is waiting for a lock held by P3, ... and Pn is waiting
for a lock held by P1. Let's name the lock Px is waiting as Lx, so since P1 is waiting
for L1 and holding Ln, so we will have Ln -> L1 in the dependency graph. Similarly,
we have L1 -> L2, L2 -> L3, ..., Ln-1 -> Ln in the dependency graph, which means we
have a circle:
Ln -> L1 -> L2 -> ... -> Ln
, and now let's prove the circle is strong:
For a lock Lx, Px contributes the dependency Lx-1 -> Lx and Px+1 contributes
the dependency Lx -> Lx+1, and since Px is waiting for Px+1 to release Lx,
so it's impossible that Lx on Px+1 is a reader and Lx on Px is a recursive
reader, because readers (no matter recursive or not) don't block recursive
readers, therefore Lx-1 -> Lx and Lx -> Lx+1 cannot be a -(xR)-> -(Sx)-> pair,
and this is true for any lock in the circle, therefore, the circle is strong.
References:
-----------
[1]: https://en.wikipedia.org/wiki/Deadlock
[2]: Shibu, K. (2009). Intro To Embedded Systems (1st ed.). Tata McGraw-Hill

View File

@ -139,6 +139,24 @@ with the associated LOCKTYPE lock acquired.
Read path: same as in :ref:`seqcount_t`.
.. _seqcount_latch_t:
Latch sequence counters (``seqcount_latch_t``)
----------------------------------------------
Latch sequence counters are a multiversion concurrency control mechanism
where the embedded seqcount_t counter even/odd value is used to switch
between two copies of protected data. This allows the sequence counter
read path to safely interrupt its own write side critical section.
Use seqcount_latch_t when the write side sections cannot be protected
from interruption by readers. This is typically the case when the read
side can be invoked from NMI handlers.
Check `raw_write_seqcount_latch()` for more information.
.. _seqlock_t:
Sequential locks (``seqlock_t``)

View File

@ -54,7 +54,7 @@ struct clocksource *art_related_clocksource;
struct cyc2ns {
struct cyc2ns_data data[2]; /* 0 + 2*16 = 32 */
seqcount_t seq; /* 32 + 4 = 36 */
seqcount_latch_t seq; /* 32 + 4 = 36 */
}; /* fits one cacheline */
@ -73,14 +73,14 @@ __always_inline void cyc2ns_read_begin(struct cyc2ns_data *data)
preempt_disable_notrace();
do {
seq = this_cpu_read(cyc2ns.seq.sequence);
seq = this_cpu_read(cyc2ns.seq.seqcount.sequence);
idx = seq & 1;
data->cyc2ns_offset = this_cpu_read(cyc2ns.data[idx].cyc2ns_offset);
data->cyc2ns_mul = this_cpu_read(cyc2ns.data[idx].cyc2ns_mul);
data->cyc2ns_shift = this_cpu_read(cyc2ns.data[idx].cyc2ns_shift);
} while (unlikely(seq != this_cpu_read(cyc2ns.seq.sequence)));
} while (unlikely(seq != this_cpu_read(cyc2ns.seq.seqcount.sequence)));
}
__always_inline void cyc2ns_read_end(void)
@ -186,7 +186,7 @@ static void __init cyc2ns_init_boot_cpu(void)
{
struct cyc2ns *c2n = this_cpu_ptr(&cyc2ns);
seqcount_init(&c2n->seq);
seqcount_latch_init(&c2n->seq);
__set_cyc2ns_scale(tsc_khz, smp_processor_id(), rdtsc());
}
@ -203,7 +203,7 @@ static void __init cyc2ns_init_secondary_cpus(void)
for_each_possible_cpu(cpu) {
if (cpu != this_cpu) {
seqcount_init(&c2n->seq);
seqcount_latch_init(&c2n->seq);
c2n = per_cpu_ptr(&cyc2ns, cpu);
c2n->data[0] = data[0];
c2n->data[1] = data[1];

File diff suppressed because it is too large Load Diff

View File

@ -67,7 +67,7 @@ static inline void change_bit(long nr, volatile unsigned long *addr)
*/
static inline bool test_and_set_bit(long nr, volatile unsigned long *addr)
{
instrument_atomic_write(addr + BIT_WORD(nr), sizeof(long));
instrument_atomic_read_write(addr + BIT_WORD(nr), sizeof(long));
return arch_test_and_set_bit(nr, addr);
}
@ -80,7 +80,7 @@ static inline bool test_and_set_bit(long nr, volatile unsigned long *addr)
*/
static inline bool test_and_clear_bit(long nr, volatile unsigned long *addr)
{
instrument_atomic_write(addr + BIT_WORD(nr), sizeof(long));
instrument_atomic_read_write(addr + BIT_WORD(nr), sizeof(long));
return arch_test_and_clear_bit(nr, addr);
}
@ -93,7 +93,7 @@ static inline bool test_and_clear_bit(long nr, volatile unsigned long *addr)
*/
static inline bool test_and_change_bit(long nr, volatile unsigned long *addr)
{
instrument_atomic_write(addr + BIT_WORD(nr), sizeof(long));
instrument_atomic_read_write(addr + BIT_WORD(nr), sizeof(long));
return arch_test_and_change_bit(nr, addr);
}

View File

@ -52,7 +52,7 @@ static inline void __clear_bit_unlock(long nr, volatile unsigned long *addr)
*/
static inline bool test_and_set_bit_lock(long nr, volatile unsigned long *addr)
{
instrument_atomic_write(addr + BIT_WORD(nr), sizeof(long));
instrument_atomic_read_write(addr + BIT_WORD(nr), sizeof(long));
return arch_test_and_set_bit_lock(nr, addr);
}

View File

@ -58,6 +58,30 @@ static inline void __change_bit(long nr, volatile unsigned long *addr)
arch___change_bit(nr, addr);
}
static inline void __instrument_read_write_bitop(long nr, volatile unsigned long *addr)
{
if (IS_ENABLED(CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC)) {
/*
* We treat non-atomic read-write bitops a little more special.
* Given the operations here only modify a single bit, assuming
* non-atomicity of the writer is sufficient may be reasonable
* for certain usage (and follows the permissible nature of the
* assume-plain-writes-atomic rule):
* 1. report read-modify-write races -> check read;
* 2. do not report races with marked readers, but do report
* races with unmarked readers -> check "atomic" write.
*/
kcsan_check_read(addr + BIT_WORD(nr), sizeof(long));
/*
* Use generic write instrumentation, in case other sanitizers
* or tools are enabled alongside KCSAN.
*/
instrument_write(addr + BIT_WORD(nr), sizeof(long));
} else {
instrument_read_write(addr + BIT_WORD(nr), sizeof(long));
}
}
/**
* __test_and_set_bit - Set a bit and return its old value
* @nr: Bit to set
@ -68,7 +92,7 @@ static inline void __change_bit(long nr, volatile unsigned long *addr)
*/
static inline bool __test_and_set_bit(long nr, volatile unsigned long *addr)
{
instrument_write(addr + BIT_WORD(nr), sizeof(long));
__instrument_read_write_bitop(nr, addr);
return arch___test_and_set_bit(nr, addr);
}
@ -82,7 +106,7 @@ static inline bool __test_and_set_bit(long nr, volatile unsigned long *addr)
*/
static inline bool __test_and_clear_bit(long nr, volatile unsigned long *addr)
{
instrument_write(addr + BIT_WORD(nr), sizeof(long));
__instrument_read_write_bitop(nr, addr);
return arch___test_and_clear_bit(nr, addr);
}
@ -96,7 +120,7 @@ static inline bool __test_and_clear_bit(long nr, volatile unsigned long *addr)
*/
static inline bool __test_and_change_bit(long nr, volatile unsigned long *addr)
{
instrument_write(addr + BIT_WORD(nr), sizeof(long));
__instrument_read_write_bitop(nr, addr);
return arch___test_and_change_bit(nr, addr);
}

View File

@ -42,6 +42,21 @@ static __always_inline void instrument_write(const volatile void *v, size_t size
kcsan_check_write(v, size);
}
/**
* instrument_read_write - instrument regular read-write access
*
* Instrument a regular write access. The instrumentation should be inserted
* before the actual write happens.
*
* @ptr address of access
* @size size of access
*/
static __always_inline void instrument_read_write(const volatile void *v, size_t size)
{
kasan_check_write(v, size);
kcsan_check_read_write(v, size);
}
/**
* instrument_atomic_read - instrument atomic read access
*
@ -72,6 +87,21 @@ static __always_inline void instrument_atomic_write(const volatile void *v, size
kcsan_check_atomic_write(v, size);
}
/**
* instrument_atomic_read_write - instrument atomic read-write access
*
* Instrument an atomic read-write access. The instrumentation should be
* inserted before the actual write happens.
*
* @ptr address of access
* @size size of access
*/
static __always_inline void instrument_atomic_read_write(const volatile void *v, size_t size)
{
kasan_check_write(v, size);
kcsan_check_atomic_read_write(v, size);
}
/**
* instrument_copy_to_user - instrument reads of copy_to_user
*

View File

@ -7,19 +7,13 @@
#include <linux/compiler_attributes.h>
#include <linux/types.h>
/*
* ACCESS TYPE MODIFIERS
*
* <none>: normal read access;
* WRITE : write access;
* ATOMIC: access is atomic;
* ASSERT: access is not a regular access, but an assertion;
* SCOPED: access is a scoped access;
*/
#define KCSAN_ACCESS_WRITE 0x1
#define KCSAN_ACCESS_ATOMIC 0x2
#define KCSAN_ACCESS_ASSERT 0x4
#define KCSAN_ACCESS_SCOPED 0x8
/* Access types -- if KCSAN_ACCESS_WRITE is not set, the access is a read. */
#define KCSAN_ACCESS_WRITE (1 << 0) /* Access is a write. */
#define KCSAN_ACCESS_COMPOUND (1 << 1) /* Compounded read-write instrumentation. */
#define KCSAN_ACCESS_ATOMIC (1 << 2) /* Access is atomic. */
/* The following are special, and never due to compiler instrumentation. */
#define KCSAN_ACCESS_ASSERT (1 << 3) /* Access is an assertion. */
#define KCSAN_ACCESS_SCOPED (1 << 4) /* Access is a scoped access. */
/*
* __kcsan_*: Always calls into the runtime when KCSAN is enabled. This may be used
@ -204,6 +198,15 @@ static inline void __kcsan_disable_current(void) { }
#define __kcsan_check_write(ptr, size) \
__kcsan_check_access(ptr, size, KCSAN_ACCESS_WRITE)
/**
* __kcsan_check_read_write - check regular read-write access for races
*
* @ptr: address of access
* @size: size of access
*/
#define __kcsan_check_read_write(ptr, size) \
__kcsan_check_access(ptr, size, KCSAN_ACCESS_COMPOUND | KCSAN_ACCESS_WRITE)
/**
* kcsan_check_read - check regular read access for races
*
@ -221,18 +224,30 @@ static inline void __kcsan_disable_current(void) { }
#define kcsan_check_write(ptr, size) \
kcsan_check_access(ptr, size, KCSAN_ACCESS_WRITE)
/**
* kcsan_check_read_write - check regular read-write access for races
*
* @ptr: address of access
* @size: size of access
*/
#define kcsan_check_read_write(ptr, size) \
kcsan_check_access(ptr, size, KCSAN_ACCESS_COMPOUND | KCSAN_ACCESS_WRITE)
/*
* Check for atomic accesses: if atomic accesses are not ignored, this simply
* aliases to kcsan_check_access(), otherwise becomes a no-op.
*/
#ifdef CONFIG_KCSAN_IGNORE_ATOMICS
#define kcsan_check_atomic_read(...) do { } while (0)
#define kcsan_check_atomic_write(...) do { } while (0)
#define kcsan_check_atomic_read(...) do { } while (0)
#define kcsan_check_atomic_write(...) do { } while (0)
#define kcsan_check_atomic_read_write(...) do { } while (0)
#else
#define kcsan_check_atomic_read(ptr, size) \
kcsan_check_access(ptr, size, KCSAN_ACCESS_ATOMIC)
#define kcsan_check_atomic_write(ptr, size) \
kcsan_check_access(ptr, size, KCSAN_ACCESS_ATOMIC | KCSAN_ACCESS_WRITE)
#define kcsan_check_atomic_read_write(ptr, size) \
kcsan_check_access(ptr, size, KCSAN_ACCESS_ATOMIC | KCSAN_ACCESS_WRITE | KCSAN_ACCESS_COMPOUND)
#endif
/**

View File

@ -54,7 +54,11 @@ struct lock_list {
struct lock_class *class;
struct lock_class *links_to;
const struct lock_trace *trace;
int distance;
u16 distance;
/* bitmap of different dependencies from head to this */
u8 dep;
/* used by BFS to record whether "prev -> this" only has -(*R)-> */
u8 only_xr;
/*
* The parent field is used to implement breadth-first search, and the
@ -469,6 +473,20 @@ static inline void print_irqtrace_events(struct task_struct *curr)
}
#endif
/* Variable used to make lockdep treat read_lock() as recursive in selftests */
#ifdef CONFIG_DEBUG_LOCKING_API_SELFTESTS
extern unsigned int force_read_lock_recursive;
#else /* CONFIG_DEBUG_LOCKING_API_SELFTESTS */
#define force_read_lock_recursive 0
#endif /* CONFIG_DEBUG_LOCKING_API_SELFTESTS */
#ifdef CONFIG_LOCKDEP
extern bool read_lock_is_recursive(void);
#else /* CONFIG_LOCKDEP */
/* If !LOCKDEP, the value is meaningless */
#define read_lock_is_recursive() 0
#endif
/*
* For trivial one-depth nesting of a lock-class, the following
* global define can be used. (Subsystems with multiple levels
@ -490,7 +508,14 @@ static inline void print_irqtrace_events(struct task_struct *curr)
#define spin_release(l, i) lock_release(l, i)
#define rwlock_acquire(l, s, t, i) lock_acquire_exclusive(l, s, t, NULL, i)
#define rwlock_acquire_read(l, s, t, i) lock_acquire_shared_recursive(l, s, t, NULL, i)
#define rwlock_acquire_read(l, s, t, i) \
do { \
if (read_lock_is_recursive()) \
lock_acquire_shared_recursive(l, s, t, NULL, i); \
else \
lock_acquire_shared(l, s, t, NULL, i); \
} while (0)
#define rwlock_release(l, i) lock_release(l, i)
#define seqcount_acquire(l, s, t, i) lock_acquire_exclusive(l, s, t, NULL, i)
@ -512,19 +537,19 @@ static inline void print_irqtrace_events(struct task_struct *curr)
#define lock_map_release(l) lock_release(l, _THIS_IP_)
#ifdef CONFIG_PROVE_LOCKING
# define might_lock(lock) \
# define might_lock(lock) \
do { \
typecheck(struct lockdep_map *, &(lock)->dep_map); \
lock_acquire(&(lock)->dep_map, 0, 0, 0, 1, NULL, _THIS_IP_); \
lock_release(&(lock)->dep_map, _THIS_IP_); \
} while (0)
# define might_lock_read(lock) \
# define might_lock_read(lock) \
do { \
typecheck(struct lockdep_map *, &(lock)->dep_map); \
lock_acquire(&(lock)->dep_map, 0, 0, 1, 1, NULL, _THIS_IP_); \
lock_release(&(lock)->dep_map, _THIS_IP_); \
} while (0)
# define might_lock_nested(lock, subclass) \
# define might_lock_nested(lock, subclass) \
do { \
typecheck(struct lockdep_map *, &(lock)->dep_map); \
lock_acquire(&(lock)->dep_map, subclass, 0, 1, 1, NULL, \
@ -534,44 +559,39 @@ do { \
DECLARE_PER_CPU(int, hardirqs_enabled);
DECLARE_PER_CPU(int, hardirq_context);
DECLARE_PER_CPU(unsigned int, lockdep_recursion);
/*
* The below lockdep_assert_*() macros use raw_cpu_read() to access the above
* per-cpu variables. This is required because this_cpu_read() will potentially
* call into preempt/irq-disable and that obviously isn't right. This is also
* correct because when IRQs are enabled, it doesn't matter if we accidentally
* read the value from our previous CPU.
*/
#define __lockdep_enabled (debug_locks && !this_cpu_read(lockdep_recursion))
#define lockdep_assert_irqs_enabled() \
do { \
WARN_ON_ONCE(debug_locks && !raw_cpu_read(hardirqs_enabled)); \
WARN_ON_ONCE(__lockdep_enabled && !this_cpu_read(hardirqs_enabled)); \
} while (0)
#define lockdep_assert_irqs_disabled() \
do { \
WARN_ON_ONCE(debug_locks && raw_cpu_read(hardirqs_enabled)); \
WARN_ON_ONCE(__lockdep_enabled && this_cpu_read(hardirqs_enabled)); \
} while (0)
#define lockdep_assert_in_irq() \
do { \
WARN_ON_ONCE(debug_locks && !raw_cpu_read(hardirq_context)); \
WARN_ON_ONCE(__lockdep_enabled && !this_cpu_read(hardirq_context)); \
} while (0)
#define lockdep_assert_preemption_enabled() \
do { \
WARN_ON_ONCE(IS_ENABLED(CONFIG_PREEMPT_COUNT) && \
debug_locks && \
__lockdep_enabled && \
(preempt_count() != 0 || \
!raw_cpu_read(hardirqs_enabled))); \
!this_cpu_read(hardirqs_enabled))); \
} while (0)
#define lockdep_assert_preemption_disabled() \
do { \
WARN_ON_ONCE(IS_ENABLED(CONFIG_PREEMPT_COUNT) && \
debug_locks && \
__lockdep_enabled && \
(preempt_count() == 0 && \
raw_cpu_read(hardirqs_enabled))); \
this_cpu_read(hardirqs_enabled))); \
} while (0)
#else

View File

@ -35,8 +35,12 @@ enum lockdep_wait_type {
/*
* We'd rather not expose kernel/lockdep_states.h this wide, but we do need
* the total number of states... :-(
*
* XXX_LOCK_USAGE_STATES is the number of lines in lockdep_states.h, for each
* of those we generates 4 states, Additionally we report on USED and USED_READ.
*/
#define XXX_LOCK_USAGE_STATES (1+2*4)
#define XXX_LOCK_USAGE_STATES 2
#define LOCK_TRACE_STATES (XXX_LOCK_USAGE_STATES*4 + 2)
/*
* NR_LOCKDEP_CACHING_CLASSES ... Number of classes
@ -106,7 +110,7 @@ struct lock_class {
* IRQ/softirq usage tracking bits:
*/
unsigned long usage_mask;
const struct lock_trace *usage_traces[XXX_LOCK_USAGE_STATES];
const struct lock_trace *usage_traces[LOCK_TRACE_STATES];
/*
* Generation counter, when doing certain classes of graph walking,

View File

@ -42,8 +42,8 @@ struct latch_tree_node {
};
struct latch_tree_root {
seqcount_t seq;
struct rb_root tree[2];
seqcount_latch_t seq;
struct rb_root tree[2];
};
/**
@ -206,7 +206,7 @@ latch_tree_find(void *key, struct latch_tree_root *root,
do {
seq = raw_read_seqcount_latch(&root->seq);
node = __lt_find(key, root, seq & 1, ops->comp);
} while (read_seqcount_retry(&root->seq, seq));
} while (read_seqcount_latch_retry(&root->seq, seq));
return node;
}

View File

@ -165,7 +165,7 @@ static inline unsigned int refcount_read(const refcount_t *r)
*
* Return: false if the passed refcount is 0, true otherwise
*/
static inline __must_check bool refcount_add_not_zero(int i, refcount_t *r)
static inline __must_check bool __refcount_add_not_zero(int i, refcount_t *r, int *oldp)
{
int old = refcount_read(r);
@ -174,12 +174,20 @@ static inline __must_check bool refcount_add_not_zero(int i, refcount_t *r)
break;
} while (!atomic_try_cmpxchg_relaxed(&r->refs, &old, old + i));
if (oldp)
*oldp = old;
if (unlikely(old < 0 || old + i < 0))
refcount_warn_saturate(r, REFCOUNT_ADD_NOT_ZERO_OVF);
return old;
}
static inline __must_check bool refcount_add_not_zero(int i, refcount_t *r)
{
return __refcount_add_not_zero(i, r, NULL);
}
/**
* refcount_add - add a value to a refcount
* @i: the value to add to the refcount
@ -196,16 +204,24 @@ static inline __must_check bool refcount_add_not_zero(int i, refcount_t *r)
* cases, refcount_inc(), or one of its variants, should instead be used to
* increment a reference count.
*/
static inline void refcount_add(int i, refcount_t *r)
static inline void __refcount_add(int i, refcount_t *r, int *oldp)
{
int old = atomic_fetch_add_relaxed(i, &r->refs);
if (oldp)
*oldp = old;
if (unlikely(!old))
refcount_warn_saturate(r, REFCOUNT_ADD_UAF);
else if (unlikely(old < 0 || old + i < 0))
refcount_warn_saturate(r, REFCOUNT_ADD_OVF);
}
static inline void refcount_add(int i, refcount_t *r)
{
__refcount_add(i, r, NULL);
}
/**
* refcount_inc_not_zero - increment a refcount unless it is 0
* @r: the refcount to increment
@ -219,9 +235,14 @@ static inline void refcount_add(int i, refcount_t *r)
*
* Return: true if the increment was successful, false otherwise
*/
static inline __must_check bool __refcount_inc_not_zero(refcount_t *r, int *oldp)
{
return __refcount_add_not_zero(1, r, oldp);
}
static inline __must_check bool refcount_inc_not_zero(refcount_t *r)
{
return refcount_add_not_zero(1, r);
return __refcount_inc_not_zero(r, NULL);
}
/**
@ -236,9 +257,14 @@ static inline __must_check bool refcount_inc_not_zero(refcount_t *r)
* Will WARN if the refcount is 0, as this represents a possible use-after-free
* condition.
*/
static inline void __refcount_inc(refcount_t *r, int *oldp)
{
__refcount_add(1, r, oldp);
}
static inline void refcount_inc(refcount_t *r)
{
refcount_add(1, r);
__refcount_inc(r, NULL);
}
/**
@ -261,10 +287,13 @@ static inline void refcount_inc(refcount_t *r)
*
* Return: true if the resulting refcount is 0, false otherwise
*/
static inline __must_check bool refcount_sub_and_test(int i, refcount_t *r)
static inline __must_check bool __refcount_sub_and_test(int i, refcount_t *r, int *oldp)
{
int old = atomic_fetch_sub_release(i, &r->refs);
if (oldp)
*oldp = old;
if (old == i) {
smp_acquire__after_ctrl_dep();
return true;
@ -276,6 +305,11 @@ static inline __must_check bool refcount_sub_and_test(int i, refcount_t *r)
return false;
}
static inline __must_check bool refcount_sub_and_test(int i, refcount_t *r)
{
return __refcount_sub_and_test(i, r, NULL);
}
/**
* refcount_dec_and_test - decrement a refcount and test if it is 0
* @r: the refcount
@ -289,9 +323,14 @@ static inline __must_check bool refcount_sub_and_test(int i, refcount_t *r)
*
* Return: true if the resulting refcount is 0, false otherwise
*/
static inline __must_check bool __refcount_dec_and_test(refcount_t *r, int *oldp)
{
return __refcount_sub_and_test(1, r, oldp);
}
static inline __must_check bool refcount_dec_and_test(refcount_t *r)
{
return refcount_sub_and_test(1, r);
return __refcount_dec_and_test(r, NULL);
}
/**
@ -304,10 +343,20 @@ static inline __must_check bool refcount_dec_and_test(refcount_t *r)
* Provides release memory ordering, such that prior loads and stores are done
* before.
*/
static inline void __refcount_dec(refcount_t *r, int *oldp)
{
int old = atomic_fetch_sub_release(1, &r->refs);
if (oldp)
*oldp = old;
if (unlikely(old <= 1))
refcount_warn_saturate(r, REFCOUNT_DEC_LEAK);
}
static inline void refcount_dec(refcount_t *r)
{
if (unlikely(atomic_fetch_sub_release(1, &r->refs) <= 1))
refcount_warn_saturate(r, REFCOUNT_DEC_LEAK);
__refcount_dec(r, NULL);
}
extern __must_check bool refcount_dec_if_one(refcount_t *r);

View File

@ -17,6 +17,7 @@
#include <linux/kcsan-checks.h>
#include <linux/lockdep.h>
#include <linux/mutex.h>
#include <linux/ww_mutex.h>
#include <linux/preempt.h>
#include <linux/spinlock.h>
@ -53,7 +54,7 @@
*
* If the write serialization mechanism is one of the common kernel
* locking primitives, use a sequence counter with associated lock
* (seqcount_LOCKTYPE_t) instead.
* (seqcount_LOCKNAME_t) instead.
*
* If it's desired to automatically handle the sequence counter writer
* serialization and non-preemptibility requirements, use a sequential
@ -117,7 +118,7 @@ static inline void seqcount_lockdep_reader_access(const seqcount_t *s)
#define SEQCNT_ZERO(name) { .sequence = 0, SEQCOUNT_DEP_MAP_INIT(name) }
/*
* Sequence counters with associated locks (seqcount_LOCKTYPE_t)
* Sequence counters with associated locks (seqcount_LOCKNAME_t)
*
* A sequence counter which associates the lock used for writer
* serialization at initialization time. This enables lockdep to validate
@ -131,63 +132,117 @@ static inline void seqcount_lockdep_reader_access(const seqcount_t *s)
* See Documentation/locking/seqlock.rst
*/
#ifdef CONFIG_LOCKDEP
/*
* For PREEMPT_RT, seqcount_LOCKNAME_t write side critical sections cannot
* disable preemption. It can lead to higher latencies, and the write side
* sections will not be able to acquire locks which become sleeping locks
* (e.g. spinlock_t).
*
* To remain preemptible while avoiding a possible livelock caused by the
* reader preempting the writer, use a different technique: let the reader
* detect if a seqcount_LOCKNAME_t writer is in progress. If that is the
* case, acquire then release the associated LOCKNAME writer serialization
* lock. This will allow any possibly-preempted writer to make progress
* until the end of its writer serialization lock critical section.
*
* This lock-unlock technique must be implemented for all of PREEMPT_RT
* sleeping locks. See Documentation/locking/locktypes.rst
*/
#if defined(CONFIG_LOCKDEP) || defined(CONFIG_PREEMPT_RT)
#define __SEQ_LOCK(expr) expr
#else
#define __SEQ_LOCK(expr)
#endif
/**
* typedef seqcount_LOCKNAME_t - sequence counter with LOCKTYPR associated
* typedef seqcount_LOCKNAME_t - sequence counter with LOCKNAME associated
* @seqcount: The real sequence counter
* @lock: Pointer to the associated spinlock
* @lock: Pointer to the associated lock
*
* A plain sequence counter with external writer synchronization by a
* spinlock. The spinlock is associated to the sequence count in the
* A plain sequence counter with external writer synchronization by
* LOCKNAME @lock. The lock is associated to the sequence counter in the
* static initializer or init function. This enables lockdep to validate
* that the write side critical section is properly serialized.
*/
/**
* seqcount_LOCKNAME_init() - runtime initializer for seqcount_LOCKNAME_t
* @s: Pointer to the seqcount_LOCKNAME_t instance
* @lock: Pointer to the associated LOCKTYPE
*
* LOCKNAME: raw_spinlock, spinlock, rwlock, mutex, or ww_mutex.
*/
/*
* SEQCOUNT_LOCKTYPE() - Instantiate seqcount_LOCKNAME_t and helpers
* @locktype: actual typename
* @lockname: name
* seqcount_LOCKNAME_init() - runtime initializer for seqcount_LOCKNAME_t
* @s: Pointer to the seqcount_LOCKNAME_t instance
* @lock: Pointer to the associated lock
*/
#define seqcount_LOCKNAME_init(s, _lock, lockname) \
do { \
seqcount_##lockname##_t *____s = (s); \
seqcount_init(&____s->seqcount); \
__SEQ_LOCK(____s->lock = (_lock)); \
} while (0)
#define seqcount_raw_spinlock_init(s, lock) seqcount_LOCKNAME_init(s, lock, raw_spinlock)
#define seqcount_spinlock_init(s, lock) seqcount_LOCKNAME_init(s, lock, spinlock)
#define seqcount_rwlock_init(s, lock) seqcount_LOCKNAME_init(s, lock, rwlock);
#define seqcount_mutex_init(s, lock) seqcount_LOCKNAME_init(s, lock, mutex);
#define seqcount_ww_mutex_init(s, lock) seqcount_LOCKNAME_init(s, lock, ww_mutex);
/*
* SEQCOUNT_LOCKNAME() - Instantiate seqcount_LOCKNAME_t and helpers
* seqprop_LOCKNAME_*() - Property accessors for seqcount_LOCKNAME_t
*
* @lockname: "LOCKNAME" part of seqcount_LOCKNAME_t
* @locktype: LOCKNAME canonical C data type
* @preemptible: preemptibility of above locktype
* @lockmember: argument for lockdep_assert_held()
* @lockbase: associated lock release function (prefix only)
* @lock_acquire: associated lock acquisition function (full call)
*/
#define SEQCOUNT_LOCKTYPE(locktype, lockname, preemptible, lockmember) \
#define SEQCOUNT_LOCKNAME(lockname, locktype, preemptible, lockmember, lockbase, lock_acquire) \
typedef struct seqcount_##lockname { \
seqcount_t seqcount; \
__SEQ_LOCK(locktype *lock); \
} seqcount_##lockname##_t; \
\
static __always_inline void \
seqcount_##lockname##_init(seqcount_##lockname##_t *s, locktype *lock) \
{ \
seqcount_init(&s->seqcount); \
__SEQ_LOCK(s->lock = lock); \
} \
\
static __always_inline seqcount_t * \
__seqcount_##lockname##_ptr(seqcount_##lockname##_t *s) \
__seqprop_##lockname##_ptr(seqcount_##lockname##_t *s) \
{ \
return &s->seqcount; \
} \
\
static __always_inline bool \
__seqcount_##lockname##_preemptible(seqcount_##lockname##_t *s) \
static __always_inline unsigned \
__seqprop_##lockname##_sequence(const seqcount_##lockname##_t *s) \
{ \
return preemptible; \
unsigned seq = READ_ONCE(s->seqcount.sequence); \
\
if (!IS_ENABLED(CONFIG_PREEMPT_RT)) \
return seq; \
\
if (preemptible && unlikely(seq & 1)) { \
__SEQ_LOCK(lock_acquire); \
__SEQ_LOCK(lockbase##_unlock(s->lock)); \
\
/* \
* Re-read the sequence counter since the (possibly \
* preempted) writer made progress. \
*/ \
seq = READ_ONCE(s->seqcount.sequence); \
} \
\
return seq; \
} \
\
static __always_inline bool \
__seqprop_##lockname##_preemptible(const seqcount_##lockname##_t *s) \
{ \
if (!IS_ENABLED(CONFIG_PREEMPT_RT)) \
return preemptible; \
\
/* PREEMPT_RT relies on the above LOCK+UNLOCK */ \
return false; \
} \
\
static __always_inline void \
__seqcount_##lockname##_assert(seqcount_##lockname##_t *s) \
__seqprop_##lockname##_assert(const seqcount_##lockname##_t *s) \
{ \
__SEQ_LOCK(lockdep_assert_held(lockmember)); \
}
@ -196,50 +251,56 @@ __seqcount_##lockname##_assert(seqcount_##lockname##_t *s) \
* __seqprop() for seqcount_t
*/
static inline seqcount_t *__seqcount_ptr(seqcount_t *s)
static inline seqcount_t *__seqprop_ptr(seqcount_t *s)
{
return s;
}
static inline bool __seqcount_preemptible(seqcount_t *s)
static inline unsigned __seqprop_sequence(const seqcount_t *s)
{
return READ_ONCE(s->sequence);
}
static inline bool __seqprop_preemptible(const seqcount_t *s)
{
return false;
}
static inline void __seqcount_assert(seqcount_t *s)
static inline void __seqprop_assert(const seqcount_t *s)
{
lockdep_assert_preemption_disabled();
}
SEQCOUNT_LOCKTYPE(raw_spinlock_t, raw_spinlock, false, s->lock)
SEQCOUNT_LOCKTYPE(spinlock_t, spinlock, false, s->lock)
SEQCOUNT_LOCKTYPE(rwlock_t, rwlock, false, s->lock)
SEQCOUNT_LOCKTYPE(struct mutex, mutex, true, s->lock)
SEQCOUNT_LOCKTYPE(struct ww_mutex, ww_mutex, true, &s->lock->base)
#define __SEQ_RT IS_ENABLED(CONFIG_PREEMPT_RT)
/**
SEQCOUNT_LOCKNAME(raw_spinlock, raw_spinlock_t, false, s->lock, raw_spin, raw_spin_lock(s->lock))
SEQCOUNT_LOCKNAME(spinlock, spinlock_t, __SEQ_RT, s->lock, spin, spin_lock(s->lock))
SEQCOUNT_LOCKNAME(rwlock, rwlock_t, __SEQ_RT, s->lock, read, read_lock(s->lock))
SEQCOUNT_LOCKNAME(mutex, struct mutex, true, s->lock, mutex, mutex_lock(s->lock))
SEQCOUNT_LOCKNAME(ww_mutex, struct ww_mutex, true, &s->lock->base, ww_mutex, ww_mutex_lock(s->lock, NULL))
/*
* SEQCNT_LOCKNAME_ZERO - static initializer for seqcount_LOCKNAME_t
* @name: Name of the seqcount_LOCKNAME_t instance
* @lock: Pointer to the associated LOCKTYPE
* @lock: Pointer to the associated LOCKNAME
*/
#define SEQCOUNT_LOCKTYPE_ZERO(seq_name, assoc_lock) { \
#define SEQCOUNT_LOCKNAME_ZERO(seq_name, assoc_lock) { \
.seqcount = SEQCNT_ZERO(seq_name.seqcount), \
__SEQ_LOCK(.lock = (assoc_lock)) \
}
#define SEQCNT_SPINLOCK_ZERO(name, lock) SEQCOUNT_LOCKTYPE_ZERO(name, lock)
#define SEQCNT_RAW_SPINLOCK_ZERO(name, lock) SEQCOUNT_LOCKTYPE_ZERO(name, lock)
#define SEQCNT_RWLOCK_ZERO(name, lock) SEQCOUNT_LOCKTYPE_ZERO(name, lock)
#define SEQCNT_MUTEX_ZERO(name, lock) SEQCOUNT_LOCKTYPE_ZERO(name, lock)
#define SEQCNT_WW_MUTEX_ZERO(name, lock) SEQCOUNT_LOCKTYPE_ZERO(name, lock)
#define SEQCNT_RAW_SPINLOCK_ZERO(name, lock) SEQCOUNT_LOCKNAME_ZERO(name, lock)
#define SEQCNT_SPINLOCK_ZERO(name, lock) SEQCOUNT_LOCKNAME_ZERO(name, lock)
#define SEQCNT_RWLOCK_ZERO(name, lock) SEQCOUNT_LOCKNAME_ZERO(name, lock)
#define SEQCNT_MUTEX_ZERO(name, lock) SEQCOUNT_LOCKNAME_ZERO(name, lock)
#define SEQCNT_WW_MUTEX_ZERO(name, lock) SEQCOUNT_LOCKNAME_ZERO(name, lock)
#define __seqprop_case(s, lockname, prop) \
seqcount_##lockname##_t: __seqcount_##lockname##_##prop((void *)(s))
seqcount_##lockname##_t: __seqprop_##lockname##_##prop((void *)(s))
#define __seqprop(s, prop) _Generic(*(s), \
seqcount_t: __seqcount_##prop((void *)(s)), \
seqcount_t: __seqprop_##prop((void *)(s)), \
__seqprop_case((s), raw_spinlock, prop), \
__seqprop_case((s), spinlock, prop), \
__seqprop_case((s), rwlock, prop), \
@ -247,12 +308,13 @@ SEQCOUNT_LOCKTYPE(struct ww_mutex, ww_mutex, true, &s->lock->base)
__seqprop_case((s), ww_mutex, prop))
#define __seqcount_ptr(s) __seqprop(s, ptr)
#define __seqcount_sequence(s) __seqprop(s, sequence)
#define __seqcount_lock_preemptible(s) __seqprop(s, preemptible)
#define __seqcount_assert_lock_held(s) __seqprop(s, assert)
/**
* __read_seqcount_begin() - begin a seqcount_t read section w/o barrier
* @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
* @s: Pointer to seqcount_t or any of the seqcount_LOCKNAME_t variants
*
* __read_seqcount_begin is like read_seqcount_begin, but has no smp_rmb()
* barrier. Callers should ensure that smp_rmb() or equivalent ordering is
@ -265,56 +327,45 @@ SEQCOUNT_LOCKTYPE(struct ww_mutex, ww_mutex, true, &s->lock->base)
* Return: count to be passed to read_seqcount_retry()
*/
#define __read_seqcount_begin(s) \
__read_seqcount_t_begin(__seqcount_ptr(s))
static inline unsigned __read_seqcount_t_begin(const seqcount_t *s)
{
unsigned ret;
repeat:
ret = READ_ONCE(s->sequence);
if (unlikely(ret & 1)) {
cpu_relax();
goto repeat;
}
kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX);
return ret;
}
({ \
unsigned seq; \
\
while ((seq = __seqcount_sequence(s)) & 1) \
cpu_relax(); \
\
kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX); \
seq; \
})
/**
* raw_read_seqcount_begin() - begin a seqcount_t read section w/o lockdep
* @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
* @s: Pointer to seqcount_t or any of the seqcount_LOCKNAME_t variants
*
* Return: count to be passed to read_seqcount_retry()
*/
#define raw_read_seqcount_begin(s) \
raw_read_seqcount_t_begin(__seqcount_ptr(s))
static inline unsigned raw_read_seqcount_t_begin(const seqcount_t *s)
{
unsigned ret = __read_seqcount_t_begin(s);
smp_rmb();
return ret;
}
({ \
unsigned seq = __read_seqcount_begin(s); \
\
smp_rmb(); \
seq; \
})
/**
* read_seqcount_begin() - begin a seqcount_t read critical section
* @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
* @s: Pointer to seqcount_t or any of the seqcount_LOCKNAME_t variants
*
* Return: count to be passed to read_seqcount_retry()
*/
#define read_seqcount_begin(s) \
read_seqcount_t_begin(__seqcount_ptr(s))
static inline unsigned read_seqcount_t_begin(const seqcount_t *s)
{
seqcount_lockdep_reader_access(s);
return raw_read_seqcount_t_begin(s);
}
({ \
seqcount_lockdep_reader_access(__seqcount_ptr(s)); \
raw_read_seqcount_begin(s); \
})
/**
* raw_read_seqcount() - read the raw seqcount_t counter value
* @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
* @s: Pointer to seqcount_t or any of the seqcount_LOCKNAME_t variants
*
* raw_read_seqcount opens a read critical section of the given
* seqcount_t, without any lockdep checking, and without checking or
@ -324,20 +375,18 @@ static inline unsigned read_seqcount_t_begin(const seqcount_t *s)
* Return: count to be passed to read_seqcount_retry()
*/
#define raw_read_seqcount(s) \
raw_read_seqcount_t(__seqcount_ptr(s))
static inline unsigned raw_read_seqcount_t(const seqcount_t *s)
{
unsigned ret = READ_ONCE(s->sequence);
smp_rmb();
kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX);
return ret;
}
({ \
unsigned seq = __seqcount_sequence(s); \
\
smp_rmb(); \
kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX); \
seq; \
})
/**
* raw_seqcount_begin() - begin a seqcount_t read critical section w/o
* lockdep and w/o counter stabilization
* @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
* @s: Pointer to seqcount_t or any of the seqcount_LOCKNAME_t variants
*
* raw_seqcount_begin opens a read critical section of the given
* seqcount_t. Unlike read_seqcount_begin(), this function will not wait
@ -352,20 +401,17 @@ static inline unsigned raw_read_seqcount_t(const seqcount_t *s)
* Return: count to be passed to read_seqcount_retry()
*/
#define raw_seqcount_begin(s) \
raw_seqcount_t_begin(__seqcount_ptr(s))
static inline unsigned raw_seqcount_t_begin(const seqcount_t *s)
{
/*
* If the counter is odd, let read_seqcount_retry() fail
* by decrementing the counter.
*/
return raw_read_seqcount_t(s) & ~1;
}
({ \
/* \
* If the counter is odd, let read_seqcount_retry() fail \
* by decrementing the counter. \
*/ \
raw_read_seqcount(s) & ~1; \
})
/**
* __read_seqcount_retry() - end a seqcount_t read section w/o barrier
* @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
* @s: Pointer to seqcount_t or any of the seqcount_LOCKNAME_t variants
* @start: count, from read_seqcount_begin()
*
* __read_seqcount_retry is like read_seqcount_retry, but has no smp_rmb()
@ -389,7 +435,7 @@ static inline int __read_seqcount_t_retry(const seqcount_t *s, unsigned start)
/**
* read_seqcount_retry() - end a seqcount_t read critical section
* @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
* @s: Pointer to seqcount_t or any of the seqcount_LOCKNAME_t variants
* @start: count, from read_seqcount_begin()
*
* read_seqcount_retry closes the read critical section of given
@ -409,7 +455,7 @@ static inline int read_seqcount_t_retry(const seqcount_t *s, unsigned start)
/**
* raw_write_seqcount_begin() - start a seqcount_t write section w/o lockdep
* @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
* @s: Pointer to seqcount_t or any of the seqcount_LOCKNAME_t variants
*/
#define raw_write_seqcount_begin(s) \
do { \
@ -428,7 +474,7 @@ static inline void raw_write_seqcount_t_begin(seqcount_t *s)
/**
* raw_write_seqcount_end() - end a seqcount_t write section w/o lockdep
* @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
* @s: Pointer to seqcount_t or any of the seqcount_LOCKNAME_t variants
*/
#define raw_write_seqcount_end(s) \
do { \
@ -448,7 +494,7 @@ static inline void raw_write_seqcount_t_end(seqcount_t *s)
/**
* write_seqcount_begin_nested() - start a seqcount_t write section with
* custom lockdep nesting level
* @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
* @s: Pointer to seqcount_t or any of the seqcount_LOCKNAME_t variants
* @subclass: lockdep nesting level
*
* See Documentation/locking/lockdep-design.rst
@ -471,7 +517,7 @@ static inline void write_seqcount_t_begin_nested(seqcount_t *s, int subclass)
/**
* write_seqcount_begin() - start a seqcount_t write side critical section
* @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
* @s: Pointer to seqcount_t or any of the seqcount_LOCKNAME_t variants
*
* write_seqcount_begin opens a write side critical section of the given
* seqcount_t.
@ -497,7 +543,7 @@ static inline void write_seqcount_t_begin(seqcount_t *s)
/**
* write_seqcount_end() - end a seqcount_t write side critical section
* @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
* @s: Pointer to seqcount_t or any of the seqcount_LOCKNAME_t variants
*
* The write section must've been opened with write_seqcount_begin().
*/
@ -517,7 +563,7 @@ static inline void write_seqcount_t_end(seqcount_t *s)
/**
* raw_write_seqcount_barrier() - do a seqcount_t write barrier
* @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
* @s: Pointer to seqcount_t or any of the seqcount_LOCKNAME_t variants
*
* This can be used to provide an ordering guarantee instead of the usual
* consistency guarantee. It is one wmb cheaper, because it can collapse
@ -571,7 +617,7 @@ static inline void raw_write_seqcount_t_barrier(seqcount_t *s)
/**
* write_seqcount_invalidate() - invalidate in-progress seqcount_t read
* side operations
* @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
* @s: Pointer to seqcount_t or any of the seqcount_LOCKNAME_t variants
*
* After write_seqcount_invalidate, no seqcount_t read side operations
* will complete successfully and see data older than this.
@ -587,34 +633,73 @@ static inline void write_seqcount_t_invalidate(seqcount_t *s)
kcsan_nestable_atomic_end();
}
/**
* raw_read_seqcount_latch() - pick even/odd seqcount_t latch data copy
* @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
/*
* Latch sequence counters (seqcount_latch_t)
*
* Use seqcount_t latching to switch between two storage places protected
* by a sequence counter. Doing so allows having interruptible, preemptible,
* seqcount_t write side critical sections.
* A sequence counter variant where the counter even/odd value is used to
* switch between two copies of protected data. This allows the read path,
* typically NMIs, to safely interrupt the write side critical section.
*
* Check raw_write_seqcount_latch() for more details and a full reader and
* writer usage example.
*
* Return: sequence counter raw value. Use the lowest bit as an index for
* picking which data copy to read. The full counter value must then be
* checked with read_seqcount_retry().
* As the write sections are fully preemptible, no special handling for
* PREEMPT_RT is needed.
*/
#define raw_read_seqcount_latch(s) \
raw_read_seqcount_t_latch(__seqcount_ptr(s))
typedef struct {
seqcount_t seqcount;
} seqcount_latch_t;
static inline int raw_read_seqcount_t_latch(seqcount_t *s)
{
/* Pairs with the first smp_wmb() in raw_write_seqcount_latch() */
int seq = READ_ONCE(s->sequence); /* ^^^ */
return seq;
/**
* SEQCNT_LATCH_ZERO() - static initializer for seqcount_latch_t
* @seq_name: Name of the seqcount_latch_t instance
*/
#define SEQCNT_LATCH_ZERO(seq_name) { \
.seqcount = SEQCNT_ZERO(seq_name.seqcount), \
}
/**
* raw_write_seqcount_latch() - redirect readers to even/odd copy
* @s: Pointer to seqcount_t or any of the seqcount_locktype_t variants
* seqcount_latch_init() - runtime initializer for seqcount_latch_t
* @s: Pointer to the seqcount_latch_t instance
*/
static inline void seqcount_latch_init(seqcount_latch_t *s)
{
seqcount_init(&s->seqcount);
}
/**
* raw_read_seqcount_latch() - pick even/odd latch data copy
* @s: Pointer to seqcount_latch_t
*
* See raw_write_seqcount_latch() for details and a full reader/writer
* usage example.
*
* Return: sequence counter raw value. Use the lowest bit as an index for
* picking which data copy to read. The full counter must then be checked
* with read_seqcount_latch_retry().
*/
static inline unsigned raw_read_seqcount_latch(const seqcount_latch_t *s)
{
/*
* Pairs with the first smp_wmb() in raw_write_seqcount_latch().
* Due to the dependent load, a full smp_rmb() is not needed.
*/
return READ_ONCE(s->seqcount.sequence);
}
/**
* read_seqcount_latch_retry() - end a seqcount_latch_t read section
* @s: Pointer to seqcount_latch_t
* @start: count, from raw_read_seqcount_latch()
*
* Return: true if a read section retry is required, else false
*/
static inline int
read_seqcount_latch_retry(const seqcount_latch_t *s, unsigned start)
{
return read_seqcount_retry(&s->seqcount, start);
}
/**
* raw_write_seqcount_latch() - redirect latch readers to even/odd copy
* @s: Pointer to seqcount_latch_t
*
* The latch technique is a multiversion concurrency control method that allows
* queries during non-atomic modifications. If you can guarantee queries never
@ -633,7 +718,7 @@ static inline int raw_read_seqcount_t_latch(seqcount_t *s)
* The basic form is a data structure like::
*
* struct latch_struct {
* seqcount_t seq;
* seqcount_latch_t seq;
* struct data_struct data[2];
* };
*
@ -643,13 +728,13 @@ static inline int raw_read_seqcount_t_latch(seqcount_t *s)
* void latch_modify(struct latch_struct *latch, ...)
* {
* smp_wmb(); // Ensure that the last data[1] update is visible
* latch->seq++;
* latch->seq.sequence++;
* smp_wmb(); // Ensure that the seqcount update is visible
*
* modify(latch->data[0], ...);
*
* smp_wmb(); // Ensure that the data[0] update is visible
* latch->seq++;
* latch->seq.sequence++;
* smp_wmb(); // Ensure that the seqcount update is visible
*
* modify(latch->data[1], ...);
@ -668,8 +753,8 @@ static inline int raw_read_seqcount_t_latch(seqcount_t *s)
* idx = seq & 0x01;
* entry = data_query(latch->data[idx], ...);
*
* // read_seqcount_retry() includes needed smp_rmb()
* } while (read_seqcount_retry(&latch->seq, seq));
* // This includes needed smp_rmb()
* } while (read_seqcount_latch_retry(&latch->seq, seq));
*
* return entry;
* }
@ -688,19 +773,16 @@ static inline int raw_read_seqcount_t_latch(seqcount_t *s)
* to miss an entire modification sequence, once it resumes it might
* observe the new entry.
*
* NOTE:
* NOTE2:
*
* When data is a dynamic data structure; one should use regular RCU
* patterns to manage the lifetimes of the objects within.
*/
#define raw_write_seqcount_latch(s) \
raw_write_seqcount_t_latch(__seqcount_ptr(s))
static inline void raw_write_seqcount_t_latch(seqcount_t *s)
static inline void raw_write_seqcount_latch(seqcount_latch_t *s)
{
smp_wmb(); /* prior stores before incrementing "sequence" */
s->sequence++;
smp_wmb(); /* increment "sequence" before following stores */
smp_wmb(); /* prior stores before incrementing "sequence" */
s->seqcount.sequence++;
smp_wmb(); /* increment "sequence" before following stores */
}
/*
@ -714,13 +796,17 @@ static inline void raw_write_seqcount_t_latch(seqcount_t *s)
* - Documentation/locking/seqlock.rst
*/
typedef struct {
struct seqcount seqcount;
/*
* Make sure that readers don't starve writers on PREEMPT_RT: use
* seqcount_spinlock_t instead of seqcount_t. Check __SEQ_LOCK().
*/
seqcount_spinlock_t seqcount;
spinlock_t lock;
} seqlock_t;
#define __SEQLOCK_UNLOCKED(lockname) \
{ \
.seqcount = SEQCNT_ZERO(lockname), \
.seqcount = SEQCNT_SPINLOCK_ZERO(lockname, &(lockname).lock), \
.lock = __SPIN_LOCK_UNLOCKED(lockname) \
}
@ -730,12 +816,12 @@ typedef struct {
*/
#define seqlock_init(sl) \
do { \
seqcount_init(&(sl)->seqcount); \
spin_lock_init(&(sl)->lock); \
seqcount_spinlock_init(&(sl)->seqcount, &(sl)->lock); \
} while (0)
/**
* DEFINE_SEQLOCK() - Define a statically allocated seqlock_t
* DEFINE_SEQLOCK(sl) - Define a statically allocated seqlock_t
* @sl: Name of the seqlock_t instance
*/
#define DEFINE_SEQLOCK(sl) \
@ -778,6 +864,12 @@ static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
return read_seqcount_retry(&sl->seqcount, start);
}
/*
* For all seqlock_t write side functions, use write_seqcount_*t*_begin()
* instead of the generic write_seqcount_begin(). This way, no redundant
* lockdep_assert_held() checks are added.
*/
/**
* write_seqlock() - start a seqlock_t write side critical section
* @sl: Pointer to seqlock_t
@ -794,7 +886,7 @@ static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
static inline void write_seqlock(seqlock_t *sl)
{
spin_lock(&sl->lock);
write_seqcount_t_begin(&sl->seqcount);
write_seqcount_t_begin(&sl->seqcount.seqcount);
}
/**
@ -806,7 +898,7 @@ static inline void write_seqlock(seqlock_t *sl)
*/
static inline void write_sequnlock(seqlock_t *sl)
{
write_seqcount_t_end(&sl->seqcount);
write_seqcount_t_end(&sl->seqcount.seqcount);
spin_unlock(&sl->lock);
}
@ -820,7 +912,7 @@ static inline void write_sequnlock(seqlock_t *sl)
static inline void write_seqlock_bh(seqlock_t *sl)
{
spin_lock_bh(&sl->lock);
write_seqcount_t_begin(&sl->seqcount);
write_seqcount_t_begin(&sl->seqcount.seqcount);
}
/**
@ -833,7 +925,7 @@ static inline void write_seqlock_bh(seqlock_t *sl)
*/
static inline void write_sequnlock_bh(seqlock_t *sl)
{
write_seqcount_t_end(&sl->seqcount);
write_seqcount_t_end(&sl->seqcount.seqcount);
spin_unlock_bh(&sl->lock);
}
@ -847,7 +939,7 @@ static inline void write_sequnlock_bh(seqlock_t *sl)
static inline void write_seqlock_irq(seqlock_t *sl)
{
spin_lock_irq(&sl->lock);
write_seqcount_t_begin(&sl->seqcount);
write_seqcount_t_begin(&sl->seqcount.seqcount);
}
/**
@ -859,7 +951,7 @@ static inline void write_seqlock_irq(seqlock_t *sl)
*/
static inline void write_sequnlock_irq(seqlock_t *sl)
{
write_seqcount_t_end(&sl->seqcount);
write_seqcount_t_end(&sl->seqcount.seqcount);
spin_unlock_irq(&sl->lock);
}
@ -868,7 +960,7 @@ static inline unsigned long __write_seqlock_irqsave(seqlock_t *sl)
unsigned long flags;
spin_lock_irqsave(&sl->lock, flags);
write_seqcount_t_begin(&sl->seqcount);
write_seqcount_t_begin(&sl->seqcount.seqcount);
return flags;
}
@ -897,7 +989,7 @@ static inline unsigned long __write_seqlock_irqsave(seqlock_t *sl)
static inline void
write_sequnlock_irqrestore(seqlock_t *sl, unsigned long flags)
{
write_seqcount_t_end(&sl->seqcount);
write_seqcount_t_end(&sl->seqcount.seqcount);
spin_unlock_irqrestore(&sl->lock, flags);
}

View File

@ -1,5 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
#define pr_fmt(fmt) "kcsan: " fmt
#include <linux/atomic.h>
#include <linux/bug.h>
#include <linux/delay.h>
@ -98,6 +100,9 @@ static atomic_long_t watchpoints[CONFIG_KCSAN_NUM_WATCHPOINTS + NUM_SLOTS-1];
*/
static DEFINE_PER_CPU(long, kcsan_skip);
/* For kcsan_prandom_u32_max(). */
static DEFINE_PER_CPU(struct rnd_state, kcsan_rand_state);
static __always_inline atomic_long_t *find_watchpoint(unsigned long addr,
size_t size,
bool expect_write,
@ -223,7 +228,7 @@ is_atomic(const volatile void *ptr, size_t size, int type, struct kcsan_ctx *ctx
if (IS_ENABLED(CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC) &&
(type & KCSAN_ACCESS_WRITE) && size <= sizeof(long) &&
IS_ALIGNED((unsigned long)ptr, size))
!(type & KCSAN_ACCESS_COMPOUND) && IS_ALIGNED((unsigned long)ptr, size))
return true; /* Assume aligned writes up to word size are atomic. */
if (ctx->atomic_next > 0) {
@ -269,11 +274,28 @@ should_watch(const volatile void *ptr, size_t size, int type, struct kcsan_ctx *
return true;
}
/*
* Returns a pseudo-random number in interval [0, ep_ro). See prandom_u32_max()
* for more details.
*
* The open-coded version here is using only safe primitives for all contexts
* where we can have KCSAN instrumentation. In particular, we cannot use
* prandom_u32() directly, as its tracepoint could cause recursion.
*/
static u32 kcsan_prandom_u32_max(u32 ep_ro)
{
struct rnd_state *state = &get_cpu_var(kcsan_rand_state);
const u32 res = prandom_u32_state(state);
put_cpu_var(kcsan_rand_state);
return (u32)(((u64) res * ep_ro) >> 32);
}
static inline void reset_kcsan_skip(void)
{
long skip_count = kcsan_skip_watch -
(IS_ENABLED(CONFIG_KCSAN_SKIP_WATCH_RANDOMIZE) ?
prandom_u32_max(kcsan_skip_watch) :
kcsan_prandom_u32_max(kcsan_skip_watch) :
0);
this_cpu_write(kcsan_skip, skip_count);
}
@ -283,12 +305,18 @@ static __always_inline bool kcsan_is_enabled(void)
return READ_ONCE(kcsan_enabled) && get_ctx()->disable_count == 0;
}
static inline unsigned int get_delay(void)
/* Introduce delay depending on context and configuration. */
static void delay_access(int type)
{
unsigned int delay = in_task() ? kcsan_udelay_task : kcsan_udelay_interrupt;
return delay - (IS_ENABLED(CONFIG_KCSAN_DELAY_RANDOMIZE) ?
prandom_u32_max(delay) :
0);
/* For certain access types, skew the random delay to be longer. */
unsigned int skew_delay_order =
(type & (KCSAN_ACCESS_COMPOUND | KCSAN_ACCESS_ASSERT)) ? 1 : 0;
delay -= IS_ENABLED(CONFIG_KCSAN_DELAY_RANDOMIZE) ?
kcsan_prandom_u32_max(delay >> skew_delay_order) :
0;
udelay(delay);
}
void kcsan_save_irqtrace(struct task_struct *task)
@ -361,13 +389,13 @@ static noinline void kcsan_found_watchpoint(const volatile void *ptr,
* already removed the watchpoint, or another thread consumed
* the watchpoint before this thread.
*/
kcsan_counter_inc(KCSAN_COUNTER_REPORT_RACES);
atomic_long_inc(&kcsan_counters[KCSAN_COUNTER_REPORT_RACES]);
}
if ((type & KCSAN_ACCESS_ASSERT) != 0)
kcsan_counter_inc(KCSAN_COUNTER_ASSERT_FAILURES);
atomic_long_inc(&kcsan_counters[KCSAN_COUNTER_ASSERT_FAILURES]);
else
kcsan_counter_inc(KCSAN_COUNTER_DATA_RACES);
atomic_long_inc(&kcsan_counters[KCSAN_COUNTER_DATA_RACES]);
user_access_restore(flags);
}
@ -408,7 +436,7 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type)
goto out;
if (!check_encodable((unsigned long)ptr, size)) {
kcsan_counter_inc(KCSAN_COUNTER_UNENCODABLE_ACCESSES);
atomic_long_inc(&kcsan_counters[KCSAN_COUNTER_UNENCODABLE_ACCESSES]);
goto out;
}
@ -428,12 +456,12 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type)
* with which should_watch() returns true should be tweaked so
* that this case happens very rarely.
*/
kcsan_counter_inc(KCSAN_COUNTER_NO_CAPACITY);
atomic_long_inc(&kcsan_counters[KCSAN_COUNTER_NO_CAPACITY]);
goto out_unlock;
}
kcsan_counter_inc(KCSAN_COUNTER_SETUP_WATCHPOINTS);
kcsan_counter_inc(KCSAN_COUNTER_USED_WATCHPOINTS);
atomic_long_inc(&kcsan_counters[KCSAN_COUNTER_SETUP_WATCHPOINTS]);
atomic_long_inc(&kcsan_counters[KCSAN_COUNTER_USED_WATCHPOINTS]);
/*
* Read the current value, to later check and infer a race if the data
@ -459,7 +487,7 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type)
if (IS_ENABLED(CONFIG_KCSAN_DEBUG)) {
kcsan_disable_current();
pr_err("KCSAN: watching %s, size: %zu, addr: %px [slot: %d, encoded: %lx]\n",
pr_err("watching %s, size: %zu, addr: %px [slot: %d, encoded: %lx]\n",
is_write ? "write" : "read", size, ptr,
watchpoint_slot((unsigned long)ptr),
encode_watchpoint((unsigned long)ptr, size, is_write));
@ -470,7 +498,7 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type)
* Delay this thread, to increase probability of observing a racy
* conflicting access.
*/
udelay(get_delay());
delay_access(type);
/*
* Re-read value, and check if it is as expected; if not, we infer a
@ -535,16 +563,16 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type)
* increment this counter.
*/
if (is_assert && value_change == KCSAN_VALUE_CHANGE_TRUE)
kcsan_counter_inc(KCSAN_COUNTER_ASSERT_FAILURES);
atomic_long_inc(&kcsan_counters[KCSAN_COUNTER_ASSERT_FAILURES]);
kcsan_report(ptr, size, type, value_change, KCSAN_REPORT_RACE_SIGNAL,
watchpoint - watchpoints);
} else if (value_change == KCSAN_VALUE_CHANGE_TRUE) {
/* Inferring a race, since the value should not have changed. */
kcsan_counter_inc(KCSAN_COUNTER_RACES_UNKNOWN_ORIGIN);
atomic_long_inc(&kcsan_counters[KCSAN_COUNTER_RACES_UNKNOWN_ORIGIN]);
if (is_assert)
kcsan_counter_inc(KCSAN_COUNTER_ASSERT_FAILURES);
atomic_long_inc(&kcsan_counters[KCSAN_COUNTER_ASSERT_FAILURES]);
if (IS_ENABLED(CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN) || is_assert)
kcsan_report(ptr, size, type, KCSAN_VALUE_CHANGE_TRUE,
@ -557,7 +585,7 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type)
* reused after this point.
*/
remove_watchpoint(watchpoint);
kcsan_counter_dec(KCSAN_COUNTER_USED_WATCHPOINTS);
atomic_long_dec(&kcsan_counters[KCSAN_COUNTER_USED_WATCHPOINTS]);
out_unlock:
if (!kcsan_interrupt_watcher)
local_irq_restore(irq_flags);
@ -614,13 +642,16 @@ void __init kcsan_init(void)
BUG_ON(!in_task());
kcsan_debugfs_init();
prandom_seed_full_state(&kcsan_rand_state);
/*
* We are in the init task, and no other tasks should be running;
* WRITE_ONCE without memory barrier is sufficient.
*/
if (kcsan_early_enable)
if (kcsan_early_enable) {
pr_info("enabled early\n");
WRITE_ONCE(kcsan_enabled, true);
}
}
/* === Exported interface =================================================== */
@ -793,7 +824,17 @@ EXPORT_SYMBOL(__kcsan_check_access);
EXPORT_SYMBOL(__tsan_write##size); \
void __tsan_unaligned_write##size(void *ptr) \
__alias(__tsan_write##size); \
EXPORT_SYMBOL(__tsan_unaligned_write##size)
EXPORT_SYMBOL(__tsan_unaligned_write##size); \
void __tsan_read_write##size(void *ptr); \
void __tsan_read_write##size(void *ptr) \
{ \
check_access(ptr, size, \
KCSAN_ACCESS_COMPOUND | KCSAN_ACCESS_WRITE); \
} \
EXPORT_SYMBOL(__tsan_read_write##size); \
void __tsan_unaligned_read_write##size(void *ptr) \
__alias(__tsan_read_write##size); \
EXPORT_SYMBOL(__tsan_unaligned_read_write##size)
DEFINE_TSAN_READ_WRITE(1);
DEFINE_TSAN_READ_WRITE(2);
@ -879,3 +920,130 @@ void __tsan_init(void)
{
}
EXPORT_SYMBOL(__tsan_init);
/*
* Instrumentation for atomic builtins (__atomic_*, __sync_*).
*
* Normal kernel code _should not_ be using them directly, but some
* architectures may implement some or all atomics using the compilers'
* builtins.
*
* Note: If an architecture decides to fully implement atomics using the
* builtins, because they are implicitly instrumented by KCSAN (and KASAN,
* etc.), implementing the ARCH_ATOMIC interface (to get instrumentation via
* atomic-instrumented) is no longer necessary.
*
* TSAN instrumentation replaces atomic accesses with calls to any of the below
* functions, whose job is to also execute the operation itself.
*/
#define DEFINE_TSAN_ATOMIC_LOAD_STORE(bits) \
u##bits __tsan_atomic##bits##_load(const u##bits *ptr, int memorder); \
u##bits __tsan_atomic##bits##_load(const u##bits *ptr, int memorder) \
{ \
if (!IS_ENABLED(CONFIG_KCSAN_IGNORE_ATOMICS)) { \
check_access(ptr, bits / BITS_PER_BYTE, KCSAN_ACCESS_ATOMIC); \
} \
return __atomic_load_n(ptr, memorder); \
} \
EXPORT_SYMBOL(__tsan_atomic##bits##_load); \
void __tsan_atomic##bits##_store(u##bits *ptr, u##bits v, int memorder); \
void __tsan_atomic##bits##_store(u##bits *ptr, u##bits v, int memorder) \
{ \
if (!IS_ENABLED(CONFIG_KCSAN_IGNORE_ATOMICS)) { \
check_access(ptr, bits / BITS_PER_BYTE, \
KCSAN_ACCESS_WRITE | KCSAN_ACCESS_ATOMIC); \
} \
__atomic_store_n(ptr, v, memorder); \
} \
EXPORT_SYMBOL(__tsan_atomic##bits##_store)
#define DEFINE_TSAN_ATOMIC_RMW(op, bits, suffix) \
u##bits __tsan_atomic##bits##_##op(u##bits *ptr, u##bits v, int memorder); \
u##bits __tsan_atomic##bits##_##op(u##bits *ptr, u##bits v, int memorder) \
{ \
if (!IS_ENABLED(CONFIG_KCSAN_IGNORE_ATOMICS)) { \
check_access(ptr, bits / BITS_PER_BYTE, \
KCSAN_ACCESS_COMPOUND | KCSAN_ACCESS_WRITE | \
KCSAN_ACCESS_ATOMIC); \
} \
return __atomic_##op##suffix(ptr, v, memorder); \
} \
EXPORT_SYMBOL(__tsan_atomic##bits##_##op)
/*
* Note: CAS operations are always classified as write, even in case they
* fail. We cannot perform check_access() after a write, as it might lead to
* false positives, in cases such as:
*
* T0: __atomic_compare_exchange_n(&p->flag, &old, 1, ...)
*
* T1: if (__atomic_load_n(&p->flag, ...)) {
* modify *p;
* p->flag = 0;
* }
*
* The only downside is that, if there are 3 threads, with one CAS that
* succeeds, another CAS that fails, and an unmarked racing operation, we may
* point at the wrong CAS as the source of the race. However, if we assume that
* all CAS can succeed in some other execution, the data race is still valid.
*/
#define DEFINE_TSAN_ATOMIC_CMPXCHG(bits, strength, weak) \
int __tsan_atomic##bits##_compare_exchange_##strength(u##bits *ptr, u##bits *exp, \
u##bits val, int mo, int fail_mo); \
int __tsan_atomic##bits##_compare_exchange_##strength(u##bits *ptr, u##bits *exp, \
u##bits val, int mo, int fail_mo) \
{ \
if (!IS_ENABLED(CONFIG_KCSAN_IGNORE_ATOMICS)) { \
check_access(ptr, bits / BITS_PER_BYTE, \
KCSAN_ACCESS_COMPOUND | KCSAN_ACCESS_WRITE | \
KCSAN_ACCESS_ATOMIC); \
} \
return __atomic_compare_exchange_n(ptr, exp, val, weak, mo, fail_mo); \
} \
EXPORT_SYMBOL(__tsan_atomic##bits##_compare_exchange_##strength)
#define DEFINE_TSAN_ATOMIC_CMPXCHG_VAL(bits) \
u##bits __tsan_atomic##bits##_compare_exchange_val(u##bits *ptr, u##bits exp, u##bits val, \
int mo, int fail_mo); \
u##bits __tsan_atomic##bits##_compare_exchange_val(u##bits *ptr, u##bits exp, u##bits val, \
int mo, int fail_mo) \
{ \
if (!IS_ENABLED(CONFIG_KCSAN_IGNORE_ATOMICS)) { \
check_access(ptr, bits / BITS_PER_BYTE, \
KCSAN_ACCESS_COMPOUND | KCSAN_ACCESS_WRITE | \
KCSAN_ACCESS_ATOMIC); \
} \
__atomic_compare_exchange_n(ptr, &exp, val, 0, mo, fail_mo); \
return exp; \
} \
EXPORT_SYMBOL(__tsan_atomic##bits##_compare_exchange_val)
#define DEFINE_TSAN_ATOMIC_OPS(bits) \
DEFINE_TSAN_ATOMIC_LOAD_STORE(bits); \
DEFINE_TSAN_ATOMIC_RMW(exchange, bits, _n); \
DEFINE_TSAN_ATOMIC_RMW(fetch_add, bits, ); \
DEFINE_TSAN_ATOMIC_RMW(fetch_sub, bits, ); \
DEFINE_TSAN_ATOMIC_RMW(fetch_and, bits, ); \
DEFINE_TSAN_ATOMIC_RMW(fetch_or, bits, ); \
DEFINE_TSAN_ATOMIC_RMW(fetch_xor, bits, ); \
DEFINE_TSAN_ATOMIC_RMW(fetch_nand, bits, ); \
DEFINE_TSAN_ATOMIC_CMPXCHG(bits, strong, 0); \
DEFINE_TSAN_ATOMIC_CMPXCHG(bits, weak, 1); \
DEFINE_TSAN_ATOMIC_CMPXCHG_VAL(bits)
DEFINE_TSAN_ATOMIC_OPS(8);
DEFINE_TSAN_ATOMIC_OPS(16);
DEFINE_TSAN_ATOMIC_OPS(32);
DEFINE_TSAN_ATOMIC_OPS(64);
void __tsan_atomic_thread_fence(int memorder);
void __tsan_atomic_thread_fence(int memorder)
{
__atomic_thread_fence(memorder);
}
EXPORT_SYMBOL(__tsan_atomic_thread_fence);
void __tsan_atomic_signal_fence(int memorder);
void __tsan_atomic_signal_fence(int memorder) { }
EXPORT_SYMBOL(__tsan_atomic_signal_fence);

View File

@ -1,5 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
#define pr_fmt(fmt) "kcsan: " fmt
#include <linux/atomic.h>
#include <linux/bsearch.h>
#include <linux/bug.h>
@ -15,10 +17,19 @@
#include "kcsan.h"
/*
* Statistics counters.
*/
static atomic_long_t counters[KCSAN_COUNTER_COUNT];
atomic_long_t kcsan_counters[KCSAN_COUNTER_COUNT];
static const char *const counter_names[] = {
[KCSAN_COUNTER_USED_WATCHPOINTS] = "used_watchpoints",
[KCSAN_COUNTER_SETUP_WATCHPOINTS] = "setup_watchpoints",
[KCSAN_COUNTER_DATA_RACES] = "data_races",
[KCSAN_COUNTER_ASSERT_FAILURES] = "assert_failures",
[KCSAN_COUNTER_NO_CAPACITY] = "no_capacity",
[KCSAN_COUNTER_REPORT_RACES] = "report_races",
[KCSAN_COUNTER_RACES_UNKNOWN_ORIGIN] = "races_unknown_origin",
[KCSAN_COUNTER_UNENCODABLE_ACCESSES] = "unencodable_accesses",
[KCSAN_COUNTER_ENCODING_FALSE_POSITIVES] = "encoding_false_positives",
};
static_assert(ARRAY_SIZE(counter_names) == KCSAN_COUNTER_COUNT);
/*
* Addresses for filtering functions from reporting. This list can be used as a
@ -39,34 +50,6 @@ static struct {
};
static DEFINE_SPINLOCK(report_filterlist_lock);
static const char *counter_to_name(enum kcsan_counter_id id)
{
switch (id) {
case KCSAN_COUNTER_USED_WATCHPOINTS: return "used_watchpoints";
case KCSAN_COUNTER_SETUP_WATCHPOINTS: return "setup_watchpoints";
case KCSAN_COUNTER_DATA_RACES: return "data_races";
case KCSAN_COUNTER_ASSERT_FAILURES: return "assert_failures";
case KCSAN_COUNTER_NO_CAPACITY: return "no_capacity";
case KCSAN_COUNTER_REPORT_RACES: return "report_races";
case KCSAN_COUNTER_RACES_UNKNOWN_ORIGIN: return "races_unknown_origin";
case KCSAN_COUNTER_UNENCODABLE_ACCESSES: return "unencodable_accesses";
case KCSAN_COUNTER_ENCODING_FALSE_POSITIVES: return "encoding_false_positives";
case KCSAN_COUNTER_COUNT:
BUG();
}
return NULL;
}
void kcsan_counter_inc(enum kcsan_counter_id id)
{
atomic_long_inc(&counters[id]);
}
void kcsan_counter_dec(enum kcsan_counter_id id)
{
atomic_long_dec(&counters[id]);
}
/*
* The microbenchmark allows benchmarking KCSAN core runtime only. To run
* multiple threads, pipe 'microbench=<iters>' from multiple tasks into the
@ -86,7 +69,7 @@ static noinline void microbenchmark(unsigned long iters)
*/
WRITE_ONCE(kcsan_enabled, false);
pr_info("KCSAN: %s begin | iters: %lu\n", __func__, iters);
pr_info("%s begin | iters: %lu\n", __func__, iters);
cycles = get_cycles();
while (iters--) {
@ -97,73 +80,13 @@ static noinline void microbenchmark(unsigned long iters)
}
cycles = get_cycles() - cycles;
pr_info("KCSAN: %s end | cycles: %llu\n", __func__, cycles);
pr_info("%s end | cycles: %llu\n", __func__, cycles);
WRITE_ONCE(kcsan_enabled, was_enabled);
/* restore context */
current->kcsan_ctx = ctx_save;
}
/*
* Simple test to create conflicting accesses. Write 'test=<iters>' to KCSAN's
* debugfs file from multiple tasks to generate real conflicts and show reports.
*/
static long test_dummy;
static long test_flags;
static long test_scoped;
static noinline void test_thread(unsigned long iters)
{
const long CHANGE_BITS = 0xff00ff00ff00ff00L;
const struct kcsan_ctx ctx_save = current->kcsan_ctx;
cycles_t cycles;
/* We may have been called from an atomic region; reset context. */
memset(&current->kcsan_ctx, 0, sizeof(current->kcsan_ctx));
pr_info("KCSAN: %s begin | iters: %lu\n", __func__, iters);
pr_info("test_dummy@%px, test_flags@%px, test_scoped@%px,\n",
&test_dummy, &test_flags, &test_scoped);
cycles = get_cycles();
while (iters--) {
/* These all should generate reports. */
__kcsan_check_read(&test_dummy, sizeof(test_dummy));
ASSERT_EXCLUSIVE_WRITER(test_dummy);
ASSERT_EXCLUSIVE_ACCESS(test_dummy);
ASSERT_EXCLUSIVE_BITS(test_flags, ~CHANGE_BITS); /* no report */
__kcsan_check_read(&test_flags, sizeof(test_flags)); /* no report */
ASSERT_EXCLUSIVE_BITS(test_flags, CHANGE_BITS); /* report */
__kcsan_check_read(&test_flags, sizeof(test_flags)); /* no report */
/* not actually instrumented */
WRITE_ONCE(test_dummy, iters); /* to observe value-change */
__kcsan_check_write(&test_dummy, sizeof(test_dummy));
test_flags ^= CHANGE_BITS; /* generate value-change */
__kcsan_check_write(&test_flags, sizeof(test_flags));
BUG_ON(current->kcsan_ctx.scoped_accesses.prev);
{
/* Should generate reports anywhere in this block. */
ASSERT_EXCLUSIVE_WRITER_SCOPED(test_scoped);
ASSERT_EXCLUSIVE_ACCESS_SCOPED(test_scoped);
BUG_ON(!current->kcsan_ctx.scoped_accesses.prev);
/* Unrelated accesses. */
__kcsan_check_access(&cycles, sizeof(cycles), 0);
__kcsan_check_access(&cycles, sizeof(cycles), KCSAN_ACCESS_ATOMIC);
}
BUG_ON(current->kcsan_ctx.scoped_accesses.prev);
}
cycles = get_cycles() - cycles;
pr_info("KCSAN: %s end | cycles: %llu\n", __func__, cycles);
/* restore context */
current->kcsan_ctx = ctx_save;
}
static int cmp_filterlist_addrs(const void *rhs, const void *lhs)
{
const unsigned long a = *(const unsigned long *)rhs;
@ -220,7 +143,7 @@ static ssize_t insert_report_filterlist(const char *func)
ssize_t ret = 0;
if (!addr) {
pr_err("KCSAN: could not find function: '%s'\n", func);
pr_err("could not find function: '%s'\n", func);
return -ENOENT;
}
@ -270,9 +193,10 @@ static int show_info(struct seq_file *file, void *v)
/* show stats */
seq_printf(file, "enabled: %i\n", READ_ONCE(kcsan_enabled));
for (i = 0; i < KCSAN_COUNTER_COUNT; ++i)
seq_printf(file, "%s: %ld\n", counter_to_name(i),
atomic_long_read(&counters[i]));
for (i = 0; i < KCSAN_COUNTER_COUNT; ++i) {
seq_printf(file, "%s: %ld\n", counter_names[i],
atomic_long_read(&kcsan_counters[i]));
}
/* show filter functions, and filter type */
spin_lock_irqsave(&report_filterlist_lock, flags);
@ -307,18 +231,12 @@ debugfs_write(struct file *file, const char __user *buf, size_t count, loff_t *o
WRITE_ONCE(kcsan_enabled, true);
} else if (!strcmp(arg, "off")) {
WRITE_ONCE(kcsan_enabled, false);
} else if (!strncmp(arg, "microbench=", sizeof("microbench=") - 1)) {
} else if (str_has_prefix(arg, "microbench=")) {
unsigned long iters;
if (kstrtoul(&arg[sizeof("microbench=") - 1], 0, &iters))
if (kstrtoul(&arg[strlen("microbench=")], 0, &iters))
return -EINVAL;
microbenchmark(iters);
} else if (!strncmp(arg, "test=", sizeof("test=") - 1)) {
unsigned long iters;
if (kstrtoul(&arg[sizeof("test=") - 1], 0, &iters))
return -EINVAL;
test_thread(iters);
} else if (!strcmp(arg, "whitelist")) {
set_report_filterlist_whitelist(true);
} else if (!strcmp(arg, "blacklist")) {

View File

@ -27,6 +27,12 @@
#include <linux/types.h>
#include <trace/events/printk.h>
#ifdef CONFIG_CC_HAS_TSAN_COMPOUND_READ_BEFORE_WRITE
#define __KCSAN_ACCESS_RW(alt) (KCSAN_ACCESS_COMPOUND | KCSAN_ACCESS_WRITE)
#else
#define __KCSAN_ACCESS_RW(alt) (alt)
#endif
/* Points to current test-case memory access "kernels". */
static void (*access_kernels[2])(void);
@ -186,20 +192,21 @@ static bool report_matches(const struct expect_report *r)
/* Access 1 & 2 */
for (i = 0; i < 2; ++i) {
const int ty = r->access[i].type;
const char *const access_type =
(r->access[i].type & KCSAN_ACCESS_ASSERT) ?
((r->access[i].type & KCSAN_ACCESS_WRITE) ?
"assert no accesses" :
"assert no writes") :
((r->access[i].type & KCSAN_ACCESS_WRITE) ?
"write" :
"read");
(ty & KCSAN_ACCESS_ASSERT) ?
((ty & KCSAN_ACCESS_WRITE) ?
"assert no accesses" :
"assert no writes") :
((ty & KCSAN_ACCESS_WRITE) ?
((ty & KCSAN_ACCESS_COMPOUND) ?
"read-write" :
"write") :
"read");
const char *const access_type_aux =
(r->access[i].type & KCSAN_ACCESS_ATOMIC) ?
" (marked)" :
((r->access[i].type & KCSAN_ACCESS_SCOPED) ?
" (scoped)" :
"");
(ty & KCSAN_ACCESS_ATOMIC) ?
" (marked)" :
((ty & KCSAN_ACCESS_SCOPED) ? " (scoped)" : "");
if (i == 1) {
/* Access 2 */
@ -277,6 +284,12 @@ static noinline void test_kernel_write_atomic(void)
WRITE_ONCE(test_var, READ_ONCE_NOCHECK(test_sink) + 1);
}
static noinline void test_kernel_atomic_rmw(void)
{
/* Use builtin, so we can set up the "bad" atomic/non-atomic scenario. */
__atomic_fetch_add(&test_var, 1, __ATOMIC_RELAXED);
}
__no_kcsan
static noinline void test_kernel_write_uninstrumented(void) { test_var++; }
@ -390,6 +403,15 @@ static noinline void test_kernel_seqlock_writer(void)
write_sequnlock_irqrestore(&test_seqlock, flags);
}
static noinline void test_kernel_atomic_builtins(void)
{
/*
* Generate concurrent accesses, expecting no reports, ensuring KCSAN
* treats builtin atomics as actually atomic.
*/
__atomic_load_n(&test_var, __ATOMIC_RELAXED);
}
/* ===== Test cases ===== */
/* Simple test with normal data race. */
@ -430,8 +452,8 @@ static void test_concurrent_races(struct kunit *test)
const struct expect_report expect = {
.access = {
/* NULL will match any address. */
{ test_kernel_rmw_array, NULL, 0, KCSAN_ACCESS_WRITE },
{ test_kernel_rmw_array, NULL, 0, 0 },
{ test_kernel_rmw_array, NULL, 0, __KCSAN_ACCESS_RW(KCSAN_ACCESS_WRITE) },
{ test_kernel_rmw_array, NULL, 0, __KCSAN_ACCESS_RW(0) },
},
};
static const struct expect_report never = {
@ -620,6 +642,29 @@ static void test_read_plain_atomic_write(struct kunit *test)
KUNIT_EXPECT_TRUE(test, match_expect);
}
/* Test that atomic RMWs generate correct report. */
__no_kcsan
static void test_read_plain_atomic_rmw(struct kunit *test)
{
const struct expect_report expect = {
.access = {
{ test_kernel_read, &test_var, sizeof(test_var), 0 },
{ test_kernel_atomic_rmw, &test_var, sizeof(test_var),
KCSAN_ACCESS_COMPOUND | KCSAN_ACCESS_WRITE | KCSAN_ACCESS_ATOMIC },
},
};
bool match_expect = false;
if (IS_ENABLED(CONFIG_KCSAN_IGNORE_ATOMICS))
return;
begin_test_checks(test_kernel_read, test_kernel_atomic_rmw);
do {
match_expect = report_matches(&expect);
} while (!end_test_checks(match_expect));
KUNIT_EXPECT_TRUE(test, match_expect);
}
/* Zero-sized accesses should never cause data race reports. */
__no_kcsan
static void test_zero_size_access(struct kunit *test)
@ -852,6 +897,59 @@ static void test_seqlock_noreport(struct kunit *test)
KUNIT_EXPECT_FALSE(test, match_never);
}
/*
* Test atomic builtins work and required instrumentation functions exist. We
* also test that KCSAN understands they're atomic by racing with them via
* test_kernel_atomic_builtins(), and expect no reports.
*
* The atomic builtins _SHOULD NOT_ be used in normal kernel code!
*/
static void test_atomic_builtins(struct kunit *test)
{
bool match_never = false;
begin_test_checks(test_kernel_atomic_builtins, test_kernel_atomic_builtins);
do {
long tmp;
kcsan_enable_current();
__atomic_store_n(&test_var, 42L, __ATOMIC_RELAXED);
KUNIT_EXPECT_EQ(test, 42L, __atomic_load_n(&test_var, __ATOMIC_RELAXED));
KUNIT_EXPECT_EQ(test, 42L, __atomic_exchange_n(&test_var, 20, __ATOMIC_RELAXED));
KUNIT_EXPECT_EQ(test, 20L, test_var);
tmp = 20L;
KUNIT_EXPECT_TRUE(test, __atomic_compare_exchange_n(&test_var, &tmp, 30L,
0, __ATOMIC_RELAXED,
__ATOMIC_RELAXED));
KUNIT_EXPECT_EQ(test, tmp, 20L);
KUNIT_EXPECT_EQ(test, test_var, 30L);
KUNIT_EXPECT_FALSE(test, __atomic_compare_exchange_n(&test_var, &tmp, 40L,
1, __ATOMIC_RELAXED,
__ATOMIC_RELAXED));
KUNIT_EXPECT_EQ(test, tmp, 30L);
KUNIT_EXPECT_EQ(test, test_var, 30L);
KUNIT_EXPECT_EQ(test, 30L, __atomic_fetch_add(&test_var, 1, __ATOMIC_RELAXED));
KUNIT_EXPECT_EQ(test, 31L, __atomic_fetch_sub(&test_var, 1, __ATOMIC_RELAXED));
KUNIT_EXPECT_EQ(test, 30L, __atomic_fetch_and(&test_var, 0xf, __ATOMIC_RELAXED));
KUNIT_EXPECT_EQ(test, 14L, __atomic_fetch_xor(&test_var, 0xf, __ATOMIC_RELAXED));
KUNIT_EXPECT_EQ(test, 1L, __atomic_fetch_or(&test_var, 0xf0, __ATOMIC_RELAXED));
KUNIT_EXPECT_EQ(test, 241L, __atomic_fetch_nand(&test_var, 0xf, __ATOMIC_RELAXED));
KUNIT_EXPECT_EQ(test, -2L, test_var);
__atomic_thread_fence(__ATOMIC_SEQ_CST);
__atomic_signal_fence(__ATOMIC_SEQ_CST);
kcsan_disable_current();
match_never = report_available();
} while (!end_test_checks(match_never));
KUNIT_EXPECT_FALSE(test, match_never);
}
/*
* Each test case is run with different numbers of threads. Until KUnit supports
* passing arguments for each test case, we encode #threads in the test case
@ -880,6 +978,7 @@ static struct kunit_case kcsan_test_cases[] = {
KCSAN_KUNIT_CASE(test_write_write_struct_part),
KCSAN_KUNIT_CASE(test_read_atomic_write_atomic),
KCSAN_KUNIT_CASE(test_read_plain_atomic_write),
KCSAN_KUNIT_CASE(test_read_plain_atomic_rmw),
KCSAN_KUNIT_CASE(test_zero_size_access),
KCSAN_KUNIT_CASE(test_data_race),
KCSAN_KUNIT_CASE(test_assert_exclusive_writer),
@ -891,6 +990,7 @@ static struct kunit_case kcsan_test_cases[] = {
KCSAN_KUNIT_CASE(test_assert_exclusive_access_scoped),
KCSAN_KUNIT_CASE(test_jiffies_noreport),
KCSAN_KUNIT_CASE(test_seqlock_noreport),
KCSAN_KUNIT_CASE(test_atomic_builtins),
{},
};

View File

@ -8,6 +8,7 @@
#ifndef _KERNEL_KCSAN_KCSAN_H
#define _KERNEL_KCSAN_KCSAN_H
#include <linux/atomic.h>
#include <linux/kcsan.h>
#include <linux/sched.h>
@ -34,6 +35,10 @@ void kcsan_restore_irqtrace(struct task_struct *task);
*/
void kcsan_debugfs_init(void);
/*
* Statistics counters displayed via debugfs; should only be modified in
* slow-paths.
*/
enum kcsan_counter_id {
/*
* Number of watchpoints currently in use.
@ -86,12 +91,7 @@ enum kcsan_counter_id {
KCSAN_COUNTER_COUNT, /* number of counters */
};
/*
* Increment/decrement counter with given id; avoid calling these in fast-path.
*/
extern void kcsan_counter_inc(enum kcsan_counter_id id);
extern void kcsan_counter_dec(enum kcsan_counter_id id);
extern atomic_long_t kcsan_counters[KCSAN_COUNTER_COUNT];
/*
* Returns true if data races in the function symbol that maps to func_addr

View File

@ -228,6 +228,10 @@ static const char *get_access_type(int type)
return "write";
case KCSAN_ACCESS_WRITE | KCSAN_ACCESS_ATOMIC:
return "write (marked)";
case KCSAN_ACCESS_COMPOUND | KCSAN_ACCESS_WRITE:
return "read-write";
case KCSAN_ACCESS_COMPOUND | KCSAN_ACCESS_WRITE | KCSAN_ACCESS_ATOMIC:
return "read-write (marked)";
case KCSAN_ACCESS_SCOPED:
return "read (scoped)";
case KCSAN_ACCESS_SCOPED | KCSAN_ACCESS_ATOMIC:
@ -275,8 +279,8 @@ static int get_stack_skipnr(const unsigned long stack_entries[], int num_entries
cur = strnstr(buf, "kcsan_", len);
if (cur) {
cur += sizeof("kcsan_") - 1;
if (strncmp(cur, "test", sizeof("test") - 1))
cur += strlen("kcsan_");
if (!str_has_prefix(cur, "test"))
continue; /* KCSAN runtime function. */
/* KCSAN related test. */
}
@ -555,7 +559,7 @@ static bool prepare_report_consumer(unsigned long *flags,
* If the actual accesses to not match, this was a false
* positive due to watchpoint encoding.
*/
kcsan_counter_inc(KCSAN_COUNTER_ENCODING_FALSE_POSITIVES);
atomic_long_inc(&kcsan_counters[KCSAN_COUNTER_ENCODING_FALSE_POSITIVES]);
goto discard;
}

View File

@ -1,5 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
#define pr_fmt(fmt) "kcsan: " fmt
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/printk.h>
@ -116,16 +118,16 @@ static int __init kcsan_selftest(void)
if (do_test()) \
++passed; \
else \
pr_err("KCSAN selftest: " #do_test " failed"); \
pr_err("selftest: " #do_test " failed"); \
} while (0)
RUN_TEST(test_requires);
RUN_TEST(test_encode_decode);
RUN_TEST(test_matching_access);
pr_info("KCSAN selftest: %d/%d tests passed\n", passed, total);
pr_info("selftest: %d/%d tests passed\n", passed, total);
if (passed != total)
panic("KCSAN selftests failed");
panic("selftests failed");
return 0;
}
postcore_initcall(kcsan_selftest);

File diff suppressed because it is too large Load Diff

View File

@ -20,9 +20,12 @@ enum lock_usage_bit {
#undef LOCKDEP_STATE
LOCK_USED,
LOCK_USED_READ,
LOCK_USAGE_STATES
LOCK_USAGE_STATES,
};
/* states after LOCK_USED_READ are not traced and printed */
static_assert(LOCK_TRACE_STATES == LOCK_USAGE_STATES);
#define LOCK_USAGE_READ_MASK 1
#define LOCK_USAGE_DIR_MASK 2
#define LOCK_USAGE_STATE_MASK (~(LOCK_USAGE_READ_MASK | LOCK_USAGE_DIR_MASK))
@ -121,7 +124,7 @@ static const unsigned long LOCKF_USED_IN_IRQ_READ =
extern struct list_head all_lock_classes;
extern struct lock_chain lock_chains[];
#define LOCK_USAGE_CHARS (1+LOCK_USAGE_STATES/2)
#define LOCK_USAGE_CHARS (2*XXX_LOCK_USAGE_STATES + 1)
extern void get_usage_chars(struct lock_class *class,
char usage[LOCK_USAGE_CHARS]);

View File

@ -35,7 +35,7 @@
* into a single 64-byte cache line.
*/
struct clock_data {
seqcount_t seq;
seqcount_latch_t seq;
struct clock_read_data read_data[2];
ktime_t wrap_kt;
unsigned long rate;
@ -76,7 +76,7 @@ struct clock_read_data *sched_clock_read_begin(unsigned int *seq)
int sched_clock_read_retry(unsigned int seq)
{
return read_seqcount_retry(&cd.seq, seq);
return read_seqcount_latch_retry(&cd.seq, seq);
}
unsigned long long notrace sched_clock(void)
@ -258,7 +258,7 @@ void __init generic_sched_clock_init(void)
*/
static u64 notrace suspended_sched_clock_read(void)
{
unsigned int seq = raw_read_seqcount(&cd.seq);
unsigned int seq = raw_read_seqcount_latch(&cd.seq);
return cd.read_data[seq & 1].epoch_cyc;
}

View File

@ -67,7 +67,7 @@ int __read_mostly timekeeping_suspended;
* See @update_fast_timekeeper() below.
*/
struct tk_fast {
seqcount_raw_spinlock_t seq;
seqcount_latch_t seq;
struct tk_read_base base[2];
};
@ -101,13 +101,13 @@ static struct clocksource dummy_clock = {
}
static struct tk_fast tk_fast_mono ____cacheline_aligned = {
.seq = SEQCNT_RAW_SPINLOCK_ZERO(tk_fast_mono.seq, &timekeeper_lock),
.seq = SEQCNT_LATCH_ZERO(tk_fast_mono.seq),
.base[0] = FAST_TK_INIT,
.base[1] = FAST_TK_INIT,
};
static struct tk_fast tk_fast_raw ____cacheline_aligned = {
.seq = SEQCNT_RAW_SPINLOCK_ZERO(tk_fast_raw.seq, &timekeeper_lock),
.seq = SEQCNT_LATCH_ZERO(tk_fast_raw.seq),
.base[0] = FAST_TK_INIT,
.base[1] = FAST_TK_INIT,
};
@ -484,7 +484,7 @@ static __always_inline u64 __ktime_get_fast_ns(struct tk_fast *tkf)
tk_clock_read(tkr),
tkr->cycle_last,
tkr->mask));
} while (read_seqcount_retry(&tkf->seq, seq));
} while (read_seqcount_latch_retry(&tkf->seq, seq));
return now;
}
@ -548,7 +548,7 @@ static __always_inline u64 __ktime_get_real_fast(struct tk_fast *tkf, u64 *mono)
delta = timekeeping_delta_to_ns(tkr,
clocksource_delta(tk_clock_read(tkr),
tkr->cycle_last, tkr->mask));
} while (read_seqcount_retry(&tkf->seq, seq));
} while (read_seqcount_latch_retry(&tkf->seq, seq));
if (mono)
*mono = basem + delta;

View File

@ -40,6 +40,11 @@ menuconfig KCSAN
if KCSAN
# Compiler capabilities that should not fail the test if they are unavailable.
config CC_HAS_TSAN_COMPOUND_READ_BEFORE_WRITE
def_bool (CC_IS_CLANG && $(cc-option,-fsanitize=thread -mllvm -tsan-compound-read-before-write=1)) || \
(CC_IS_GCC && $(cc-option,-fsanitize=thread --param tsan-compound-read-before-write=1))
config KCSAN_VERBOSE
bool "Show verbose reports with more information about system state"
depends on PROVE_LOCKING

View File

@ -28,6 +28,7 @@
* Change this to 1 if you want to see the failure printouts:
*/
static unsigned int debug_locks_verbose;
unsigned int force_read_lock_recursive;
static DEFINE_WD_CLASS(ww_lockdep);
@ -395,6 +396,49 @@ static void rwsem_ABBA1(void)
MU(Y1); // should fail
}
/*
* read_lock(A)
* spin_lock(B)
* spin_lock(B)
* write_lock(A)
*
* This test case is aimed at poking whether the chain cache prevents us from
* detecting a read-lock/lock-write deadlock: if the chain cache doesn't differ
* read/write locks, the following case may happen
*
* { read_lock(A)->lock(B) dependency exists }
*
* P0:
* lock(B);
* read_lock(A);
*
* { Not a deadlock, B -> A is added in the chain cache }
*
* P1:
* lock(B);
* write_lock(A);
*
* { B->A found in chain cache, not reported as a deadlock }
*
*/
static void rlock_chaincache_ABBA1(void)
{
RL(X1);
L(Y1);
U(Y1);
RU(X1);
L(Y1);
RL(X1);
RU(X1);
U(Y1);
L(Y1);
WL(X1);
WU(X1);
U(Y1); // should fail
}
/*
* read_lock(A)
* spin_lock(B)
@ -990,6 +1034,133 @@ GENERATE_PERMUTATIONS_3_EVENTS(irq_inversion_soft_wlock)
#undef E2
#undef E3
/*
* write-read / write-read / write-read deadlock even if read is recursive
*/
#define E1() \
\
WL(X1); \
RL(Y1); \
RU(Y1); \
WU(X1);
#define E2() \
\
WL(Y1); \
RL(Z1); \
RU(Z1); \
WU(Y1);
#define E3() \
\
WL(Z1); \
RL(X1); \
RU(X1); \
WU(Z1);
#include "locking-selftest-rlock.h"
GENERATE_PERMUTATIONS_3_EVENTS(W1R2_W2R3_W3R1)
#undef E1
#undef E2
#undef E3
/*
* write-write / read-read / write-read deadlock even if read is recursive
*/
#define E1() \
\
WL(X1); \
WL(Y1); \
WU(Y1); \
WU(X1);
#define E2() \
\
RL(Y1); \
RL(Z1); \
RU(Z1); \
RU(Y1);
#define E3() \
\
WL(Z1); \
RL(X1); \
RU(X1); \
WU(Z1);
#include "locking-selftest-rlock.h"
GENERATE_PERMUTATIONS_3_EVENTS(W1W2_R2R3_W3R1)
#undef E1
#undef E2
#undef E3
/*
* write-write / read-read / read-write is not deadlock when read is recursive
*/
#define E1() \
\
WL(X1); \
WL(Y1); \
WU(Y1); \
WU(X1);
#define E2() \
\
RL(Y1); \
RL(Z1); \
RU(Z1); \
RU(Y1);
#define E3() \
\
RL(Z1); \
WL(X1); \
WU(X1); \
RU(Z1);
#include "locking-selftest-rlock.h"
GENERATE_PERMUTATIONS_3_EVENTS(W1R2_R2R3_W3W1)
#undef E1
#undef E2
#undef E3
/*
* write-read / read-read / write-write is not deadlock when read is recursive
*/
#define E1() \
\
WL(X1); \
RL(Y1); \
RU(Y1); \
WU(X1);
#define E2() \
\
RL(Y1); \
RL(Z1); \
RU(Z1); \
RU(Y1);
#define E3() \
\
WL(Z1); \
WL(X1); \
WU(X1); \
WU(Z1);
#include "locking-selftest-rlock.h"
GENERATE_PERMUTATIONS_3_EVENTS(W1W2_R2R3_R3W1)
#undef E1
#undef E2
#undef E3
/*
* read-lock / write-lock recursion that is actually safe.
*/
@ -1009,20 +1180,28 @@ GENERATE_PERMUTATIONS_3_EVENTS(irq_inversion_soft_wlock)
#define E3() \
\
IRQ_ENTER(); \
RL(A); \
LOCK(A); \
L(B); \
U(B); \
RU(A); \
UNLOCK(A); \
IRQ_EXIT();
/*
* Generate 12 testcases:
* Generate 24 testcases:
*/
#include "locking-selftest-hardirq.h"
GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion_hard)
#include "locking-selftest-rlock.h"
GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion_hard_rlock)
#include "locking-selftest-wlock.h"
GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion_hard_wlock)
#include "locking-selftest-softirq.h"
GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion_soft)
#include "locking-selftest-rlock.h"
GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion_soft_rlock)
#include "locking-selftest-wlock.h"
GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion_soft_wlock)
#undef E1
#undef E2
@ -1036,8 +1215,8 @@ GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion_soft)
\
IRQ_DISABLE(); \
L(B); \
WL(A); \
WU(A); \
LOCK(A); \
UNLOCK(A); \
U(B); \
IRQ_ENABLE();
@ -1054,13 +1233,75 @@ GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion_soft)
IRQ_EXIT();
/*
* Generate 12 testcases:
* Generate 24 testcases:
*/
#include "locking-selftest-hardirq.h"
// GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion2_hard)
#include "locking-selftest-rlock.h"
GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion2_hard_rlock)
#include "locking-selftest-wlock.h"
GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion2_hard_wlock)
#include "locking-selftest-softirq.h"
// GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion2_soft)
#include "locking-selftest-rlock.h"
GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion2_soft_rlock)
#include "locking-selftest-wlock.h"
GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion2_soft_wlock)
#undef E1
#undef E2
#undef E3
/*
* read-lock / write-lock recursion that is unsafe.
*
* A is a ENABLED_*_READ lock
* B is a USED_IN_*_READ lock
*
* read_lock(A);
* write_lock(B);
* <interrupt>
* read_lock(B);
* write_lock(A); // if this one is read_lock(), no deadlock
*/
#define E1() \
\
IRQ_DISABLE(); \
WL(B); \
LOCK(A); \
UNLOCK(A); \
WU(B); \
IRQ_ENABLE();
#define E2() \
\
RL(A); \
RU(A); \
#define E3() \
\
IRQ_ENTER(); \
RL(B); \
RU(B); \
IRQ_EXIT();
/*
* Generate 24 testcases:
*/
#include "locking-selftest-hardirq.h"
#include "locking-selftest-rlock.h"
GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion3_hard_rlock)
#include "locking-selftest-wlock.h"
GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion3_hard_wlock)
#include "locking-selftest-softirq.h"
#include "locking-selftest-rlock.h"
GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion3_soft_rlock)
#include "locking-selftest-wlock.h"
GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion3_soft_wlock)
#ifdef CONFIG_DEBUG_LOCK_ALLOC
# define I_SPINLOCK(x) lockdep_reset_lock(&lock_##x.dep_map)
@ -1199,6 +1440,19 @@ static inline void print_testname(const char *testname)
dotest(name##_##nr, FAILURE, LOCKTYPE_RWLOCK); \
pr_cont("\n");
#define DO_TESTCASE_1RR(desc, name, nr) \
print_testname(desc"/"#nr); \
pr_cont(" |"); \
dotest(name##_##nr, SUCCESS, LOCKTYPE_RWLOCK); \
pr_cont("\n");
#define DO_TESTCASE_1RRB(desc, name, nr) \
print_testname(desc"/"#nr); \
pr_cont(" |"); \
dotest(name##_##nr, FAILURE, LOCKTYPE_RWLOCK); \
pr_cont("\n");
#define DO_TESTCASE_3(desc, name, nr) \
print_testname(desc"/"#nr); \
dotest(name##_spin_##nr, FAILURE, LOCKTYPE_SPIN); \
@ -1213,6 +1467,25 @@ static inline void print_testname(const char *testname)
dotest(name##_rlock_##nr, SUCCESS, LOCKTYPE_RWLOCK); \
pr_cont("\n");
#define DO_TESTCASE_2RW(desc, name, nr) \
print_testname(desc"/"#nr); \
pr_cont(" |"); \
dotest(name##_wlock_##nr, FAILURE, LOCKTYPE_RWLOCK); \
dotest(name##_rlock_##nr, SUCCESS, LOCKTYPE_RWLOCK); \
pr_cont("\n");
#define DO_TESTCASE_2x2RW(desc, name, nr) \
DO_TESTCASE_2RW("hard-"desc, name##_hard, nr) \
DO_TESTCASE_2RW("soft-"desc, name##_soft, nr) \
#define DO_TESTCASE_6x2x2RW(desc, name) \
DO_TESTCASE_2x2RW(desc, name, 123); \
DO_TESTCASE_2x2RW(desc, name, 132); \
DO_TESTCASE_2x2RW(desc, name, 213); \
DO_TESTCASE_2x2RW(desc, name, 231); \
DO_TESTCASE_2x2RW(desc, name, 312); \
DO_TESTCASE_2x2RW(desc, name, 321);
#define DO_TESTCASE_6(desc, name) \
print_testname(desc); \
dotest(name##_spin, FAILURE, LOCKTYPE_SPIN); \
@ -1289,6 +1562,22 @@ static inline void print_testname(const char *testname)
DO_TESTCASE_2IB(desc, name, 312); \
DO_TESTCASE_2IB(desc, name, 321);
#define DO_TESTCASE_6x1RR(desc, name) \
DO_TESTCASE_1RR(desc, name, 123); \
DO_TESTCASE_1RR(desc, name, 132); \
DO_TESTCASE_1RR(desc, name, 213); \
DO_TESTCASE_1RR(desc, name, 231); \
DO_TESTCASE_1RR(desc, name, 312); \
DO_TESTCASE_1RR(desc, name, 321);
#define DO_TESTCASE_6x1RRB(desc, name) \
DO_TESTCASE_1RRB(desc, name, 123); \
DO_TESTCASE_1RRB(desc, name, 132); \
DO_TESTCASE_1RRB(desc, name, 213); \
DO_TESTCASE_1RRB(desc, name, 231); \
DO_TESTCASE_1RRB(desc, name, 312); \
DO_TESTCASE_1RRB(desc, name, 321);
#define DO_TESTCASE_6x6(desc, name) \
DO_TESTCASE_6I(desc, name, 123); \
DO_TESTCASE_6I(desc, name, 132); \
@ -1966,6 +2255,108 @@ static void ww_tests(void)
pr_cont("\n");
}
/*
* <in hardirq handler>
* read_lock(&A);
* <hardirq disable>
* spin_lock(&B);
* spin_lock(&B);
* read_lock(&A);
*
* is a deadlock.
*/
static void queued_read_lock_hardirq_RE_Er(void)
{
HARDIRQ_ENTER();
read_lock(&rwlock_A);
LOCK(B);
UNLOCK(B);
read_unlock(&rwlock_A);
HARDIRQ_EXIT();
HARDIRQ_DISABLE();
LOCK(B);
read_lock(&rwlock_A);
read_unlock(&rwlock_A);
UNLOCK(B);
HARDIRQ_ENABLE();
}
/*
* <in hardirq handler>
* spin_lock(&B);
* <hardirq disable>
* read_lock(&A);
* read_lock(&A);
* spin_lock(&B);
*
* is not a deadlock.
*/
static void queued_read_lock_hardirq_ER_rE(void)
{
HARDIRQ_ENTER();
LOCK(B);
read_lock(&rwlock_A);
read_unlock(&rwlock_A);
UNLOCK(B);
HARDIRQ_EXIT();
HARDIRQ_DISABLE();
read_lock(&rwlock_A);
LOCK(B);
UNLOCK(B);
read_unlock(&rwlock_A);
HARDIRQ_ENABLE();
}
/*
* <hardirq disable>
* spin_lock(&B);
* read_lock(&A);
* <in hardirq handler>
* spin_lock(&B);
* read_lock(&A);
*
* is a deadlock. Because the two read_lock()s are both non-recursive readers.
*/
static void queued_read_lock_hardirq_inversion(void)
{
HARDIRQ_ENTER();
LOCK(B);
UNLOCK(B);
HARDIRQ_EXIT();
HARDIRQ_DISABLE();
LOCK(B);
read_lock(&rwlock_A);
read_unlock(&rwlock_A);
UNLOCK(B);
HARDIRQ_ENABLE();
read_lock(&rwlock_A);
read_unlock(&rwlock_A);
}
static void queued_read_lock_tests(void)
{
printk(" --------------------------------------------------------------------------\n");
printk(" | queued read lock tests |\n");
printk(" ---------------------------\n");
print_testname("hardirq read-lock/lock-read");
dotest(queued_read_lock_hardirq_RE_Er, FAILURE, LOCKTYPE_RWLOCK);
pr_cont("\n");
print_testname("hardirq lock-read/read-lock");
dotest(queued_read_lock_hardirq_ER_rE, SUCCESS, LOCKTYPE_RWLOCK);
pr_cont("\n");
print_testname("hardirq inversion");
dotest(queued_read_lock_hardirq_inversion, FAILURE, LOCKTYPE_RWLOCK);
pr_cont("\n");
}
void locking_selftest(void)
{
/*
@ -1978,6 +2369,11 @@ void locking_selftest(void)
return;
}
/*
* treats read_lock() as recursive read locks for testing purpose
*/
force_read_lock_recursive = 1;
/*
* Run the testsuite:
*/
@ -2033,14 +2429,6 @@ void locking_selftest(void)
print_testname("mixed read-lock/lock-write ABBA");
pr_cont(" |");
dotest(rlock_ABBA1, FAILURE, LOCKTYPE_RWLOCK);
#ifdef CONFIG_PROVE_LOCKING
/*
* Lockdep does indeed fail here, but there's nothing we can do about
* that now. Don't kill lockdep for it.
*/
unexpected_testcase_failures--;
#endif
pr_cont(" |");
dotest(rwsem_ABBA1, FAILURE, LOCKTYPE_RWSEM);
@ -2056,6 +2444,15 @@ void locking_selftest(void)
pr_cont(" |");
dotest(rwsem_ABBA3, FAILURE, LOCKTYPE_RWSEM);
print_testname("chain cached mixed R-L/L-W ABBA");
pr_cont(" |");
dotest(rlock_chaincache_ABBA1, FAILURE, LOCKTYPE_RWLOCK);
DO_TESTCASE_6x1RRB("rlock W1R2/W2R3/W3R1", W1R2_W2R3_W3R1);
DO_TESTCASE_6x1RRB("rlock W1W2/R2R3/W3R1", W1W2_R2R3_W3R1);
DO_TESTCASE_6x1RR("rlock W1W2/R2R3/R3W1", W1W2_R2R3_R3W1);
DO_TESTCASE_6x1RR("rlock W1R2/R2R3/W3W1", W1R2_R2R3_W3W1);
printk(" --------------------------------------------------------------------------\n");
/*
@ -2068,11 +2465,19 @@ void locking_selftest(void)
DO_TESTCASE_6x6("safe-A + unsafe-B #2", irqsafe4);
DO_TESTCASE_6x6RW("irq lock-inversion", irq_inversion);
DO_TESTCASE_6x2("irq read-recursion", irq_read_recursion);
// DO_TESTCASE_6x2B("irq read-recursion #2", irq_read_recursion2);
DO_TESTCASE_6x2x2RW("irq read-recursion", irq_read_recursion);
DO_TESTCASE_6x2x2RW("irq read-recursion #2", irq_read_recursion2);
DO_TESTCASE_6x2x2RW("irq read-recursion #3", irq_read_recursion3);
ww_tests();
force_read_lock_recursive = 0;
/*
* queued_read_lock() specific test cases can be put here
*/
if (IS_ENABLED(CONFIG_QUEUED_RWLOCKS))
queued_read_lock_tests();
if (unexpected_testcase_failures) {
printk("-----------------------------------------------------------------\n");
debug_locks = 0;

View File

@ -763,10 +763,20 @@ static void lru_add_drain_per_cpu(struct work_struct *dummy)
*/
void lru_add_drain_all(void)
{
static seqcount_t seqcount = SEQCNT_ZERO(seqcount);
static DEFINE_MUTEX(lock);
/*
* lru_drain_gen - Global pages generation number
*
* (A) Definition: global lru_drain_gen = x implies that all generations
* 0 < n <= x are already *scheduled* for draining.
*
* This is an optimization for the highly-contended use case where a
* user space workload keeps constantly generating a flow of pages for
* each CPU.
*/
static unsigned int lru_drain_gen;
static struct cpumask has_work;
int cpu, seq;
static DEFINE_MUTEX(lock);
unsigned cpu, this_gen;
/*
* Make sure nobody triggers this path before mm_percpu_wq is fully
@ -775,21 +785,54 @@ void lru_add_drain_all(void)
if (WARN_ON(!mm_percpu_wq))
return;
seq = raw_read_seqcount_latch(&seqcount);
/*
* Guarantee pagevec counter stores visible by this CPU are visible to
* other CPUs before loading the current drain generation.
*/
smp_mb();
/*
* (B) Locally cache global LRU draining generation number
*
* The read barrier ensures that the counter is loaded before the mutex
* is taken. It pairs with smp_mb() inside the mutex critical section
* at (D).
*/
this_gen = smp_load_acquire(&lru_drain_gen);
mutex_lock(&lock);
/*
* Piggyback on drain started and finished while we waited for lock:
* all pages pended at the time of our enter were drained from vectors.
* (C) Exit the draining operation if a newer generation, from another
* lru_add_drain_all(), was already scheduled for draining. Check (A).
*/
if (__read_seqcount_retry(&seqcount, seq))
if (unlikely(this_gen != lru_drain_gen))
goto done;
raw_write_seqcount_latch(&seqcount);
/*
* (D) Increment global generation number
*
* Pairs with smp_load_acquire() at (B), outside of the critical
* section. Use a full memory barrier to guarantee that the new global
* drain generation number is stored before loading pagevec counters.
*
* This pairing must be done here, before the for_each_online_cpu loop
* below which drains the page vectors.
*
* Let x, y, and z represent some system CPU numbers, where x < y < z.
* Assume CPU #z is is in the middle of the for_each_online_cpu loop
* below and has already reached CPU #y's per-cpu data. CPU #x comes
* along, adds some pages to its per-cpu vectors, then calls
* lru_add_drain_all().
*
* If the paired barrier is done at any later step, e.g. after the
* loop, CPU #x will just exit at (C) and miss flushing out all of its
* added pages.
*/
WRITE_ONCE(lru_drain_gen, lru_drain_gen + 1);
smp_mb();
cpumask_clear(&has_work);
for_each_online_cpu(cpu) {
struct work_struct *work = &per_cpu(lru_add_drain_work, cpu);
@ -801,7 +844,7 @@ void lru_add_drain_all(void)
need_activate_page_drain(cpu)) {
INIT_WORK(work, lru_add_drain_per_cpu);
queue_work_on(cpu, mm_percpu_wq, work);
cpumask_set_cpu(cpu, &has_work);
__cpumask_set_cpu(cpu, &has_work);
}
}
@ -816,7 +859,7 @@ void lru_add_drain_all(void)
{
lru_add_drain();
}
#endif
#endif /* CONFIG_SMP */
/**
* release_pages - batched put_page()

View File

@ -11,5 +11,5 @@ endif
# of some options does not break KCSAN nor causes false positive reports.
CFLAGS_KCSAN := -fsanitize=thread \
$(call cc-option,$(call cc-param,tsan-instrument-func-entry-exit=0) -fno-optimize-sibling-calls) \
$(call cc-option,$(call cc-param,tsan-instrument-read-before-write=1)) \
$(call cc-option,$(call cc-param,tsan-compound-read-before-write=1),$(call cc-option,$(call cc-param,tsan-instrument-read-before-write=1))) \
$(call cc-param,tsan-distinguish-volatile=1)

View File

@ -16,6 +16,7 @@ fi
cat <<EOF |
asm-generic/atomic-instrumented.h
asm-generic/atomic-long.h
linux/atomic-arch-fallback.h
linux/atomic-fallback.h
EOF
while read header; do

View File

@ -5,9 +5,10 @@ ATOMICDIR=$(dirname $0)
. ${ATOMICDIR}/atomic-tbl.sh
#gen_param_check(arg)
#gen_param_check(meta, arg)
gen_param_check()
{
local meta="$1"; shift
local arg="$1"; shift
local type="${arg%%:*}"
local name="$(gen_param_name "${arg}")"
@ -17,17 +18,25 @@ gen_param_check()
i) return;;
esac
# We don't write to constant parameters
[ ${type#c} != ${type} ] && rw="read"
if [ ${type#c} != ${type} ]; then
# We don't write to constant parameters.
rw="read"
elif [ "${meta}" != "s" ]; then
# An atomic RMW: if this parameter is not a constant, and this atomic is
# not just a 's'tore, this parameter is both read from and written to.
rw="read_write"
fi
printf "\tinstrument_atomic_${rw}(${name}, sizeof(*${name}));\n"
}
#gen_param_check(arg...)
#gen_params_checks(meta, arg...)
gen_params_checks()
{
local meta="$1"; shift
while [ "$#" -gt 0 ]; do
gen_param_check "$1"
gen_param_check "$meta" "$1"
shift;
done
}
@ -77,7 +86,7 @@ gen_proto_order_variant()
local ret="$(gen_ret_type "${meta}" "${int}")"
local params="$(gen_params "${int}" "${atomic}" "$@")"
local checks="$(gen_params_checks "$@")"
local checks="$(gen_params_checks "${meta}" "$@")"
local args="$(gen_args "$@")"
local retstmt="$(gen_ret_stmt "${meta}")"

View File

@ -205,6 +205,8 @@ regex_c=(
'/\<DEVICE_ATTR_\(RW\|RO\|WO\)(\([[:alnum:]_]\+\)/dev_attr_\2/'
'/\<DRIVER_ATTR_\(RW\|RO\|WO\)(\([[:alnum:]_]\+\)/driver_attr_\2/'
'/\<\(DEFINE\|DECLARE\)_STATIC_KEY_\(TRUE\|FALSE\)\(\|_RO\)(\([[:alnum:]_]\+\)/\4/'
'/^SEQCOUNT_LOCKTYPE(\([^,]*\),[[:space:]]*\([^,]*\),[^)]*)/seqcount_\2_t/'
'/^SEQCOUNT_LOCKTYPE(\([^,]*\),[[:space:]]*\([^,]*\),[^)]*)/seqcount_\2_init/'
)
regex_kconfig=(
'/^[[:blank:]]*\(menu\|\)config[[:blank:]]\+\([[:alnum:]_]\+\)/\2/'

View File

@ -3,9 +3,9 @@
C Self R W RMW Self R W DR DW RMW SV
-- ---- - - --- ---- - - -- -- --- --
Store, e.g., WRITE_ONCE() Y Y
Load, e.g., READ_ONCE() Y Y Y Y
Unsuccessful RMW operation Y Y Y Y
Relaxed store Y Y
Relaxed load Y Y Y Y
Relaxed RMW operation Y Y Y Y
rcu_dereference() Y Y Y Y
Successful *_acquire() R Y Y Y Y Y Y
Successful *_release() C Y Y Y W Y
@ -17,14 +17,19 @@ smp_mb__before_atomic() CP Y Y Y a a a a Y
smp_mb__after_atomic() CP a a Y Y Y Y Y Y
Key: C: Ordering is cumulative
P: Ordering propagates
R: Read, for example, READ_ONCE(), or read portion of RMW
W: Write, for example, WRITE_ONCE(), or write portion of RMW
Y: Provides ordering
a: Provides ordering given intervening RMW atomic operation
DR: Dependent read (address dependency)
DW: Dependent write (address, data, or control dependency)
RMW: Atomic read-modify-write operation
SELF: Orders self, as opposed to accesses before and/or after
SV: Orders later accesses to the same variable
Key: Relaxed: A relaxed operation is either READ_ONCE(), WRITE_ONCE(),
a *_relaxed() RMW operation, an unsuccessful RMW
operation, a non-value-returning RMW operation such
as atomic_inc(), or one of the atomic*_read() and
atomic*_set() family of operations.
C: Ordering is cumulative
P: Ordering propagates
R: Read, for example, READ_ONCE(), or read portion of RMW
W: Write, for example, WRITE_ONCE(), or write portion of RMW
Y: Provides ordering
a: Provides ordering given intervening RMW atomic operation
DR: Dependent read (address dependency)
DW: Dependent write (address, data, or control dependency)
RMW: Atomic read-modify-write operation
SELF: Orders self, as opposed to accesses before and/or after
SV: Orders later accesses to the same variable

File diff suppressed because it is too large Load Diff

View File

@ -1,7 +1,7 @@
This document provides "recipes", that is, litmus tests for commonly
occurring situations, as well as a few that illustrate subtly broken but
attractive nuisances. Many of these recipes include example code from
v4.13 of the Linux kernel.
v5.7 of the Linux kernel.
The first section covers simple special cases, the second section
takes off the training wheels to cover more involved examples,
@ -278,7 +278,7 @@ is present if the value loaded determines the address of a later access
first place (control dependency). Note that the term "data dependency"
is sometimes casually used to cover both address and data dependencies.
In lib/prime_numbers.c, the expand_to_next_prime() function invokes
In lib/math/prime_numbers.c, the expand_to_next_prime() function invokes
rcu_assign_pointer(), and the next_prime_number() function invokes
rcu_dereference(). This combination mediates access to a bit vector
that is expanded as additional primes are needed.

View File

@ -120,7 +120,7 @@ o Jade Alglave, Luc Maranget, and Michael Tautschnig. 2014. "Herding
o Jade Alglave, Patrick Cousot, and Luc Maranget. 2016. "Syntax and
semantics of the weak consistency model specification language
cat". CoRR abs/1608.07531 (2016). http://arxiv.org/abs/1608.07531
cat". CoRR abs/1608.07531 (2016). https://arxiv.org/abs/1608.07531
Memory-model comparisons

View File

@ -0,0 +1,271 @@
This document provides options for those wishing to keep their
memory-ordering lives simple, as is necessary for those whose domain
is complex. After all, there are bugs other than memory-ordering bugs,
and the time spent gaining memory-ordering knowledge is not available
for gaining domain knowledge. Furthermore Linux-kernel memory model
(LKMM) is quite complex, with subtle differences in code often having
dramatic effects on correctness.
The options near the beginning of this list are quite simple. The idea
is not that kernel hackers don't already know about them, but rather
that they might need the occasional reminder.
Please note that this is a generic guide, and that specific subsystems
will often have special requirements or idioms. For example, developers
of MMIO-based device drivers will often need to use mb(), rmb(), and
wmb(), and therefore might find smp_mb(), smp_rmb(), and smp_wmb()
to be more natural than smp_load_acquire() and smp_store_release().
On the other hand, those coming in from other environments will likely
be more familiar with these last two.
Single-threaded code
====================
In single-threaded code, there is no reordering, at least assuming
that your toolchain and hardware are working correctly. In addition,
it is generally a mistake to assume your code will only run in a single
threaded context as the kernel can enter the same code path on multiple
CPUs at the same time. One important exception is a function that makes
no external data references.
In the general case, you will need to take explicit steps to ensure that
your code really is executed within a single thread that does not access
shared variables. A simple way to achieve this is to define a global lock
that you acquire at the beginning of your code and release at the end,
taking care to ensure that all references to your code's shared data are
also carried out under that same lock. Because only one thread can hold
this lock at a given time, your code will be executed single-threaded.
This approach is called "code locking".
Code locking can severely limit both performance and scalability, so it
should be used with caution, and only on code paths that execute rarely.
After all, a huge amount of effort was required to remove the Linux
kernel's old "Big Kernel Lock", so let's please be very careful about
adding new "little kernel locks".
One of the advantages of locking is that, in happy contrast with the
year 1981, almost all kernel developers are very familiar with locking.
The Linux kernel's lockdep (CONFIG_PROVE_LOCKING=y) is very helpful with
the formerly feared deadlock scenarios.
Please use the standard locking primitives provided by the kernel rather
than rolling your own. For one thing, the standard primitives interact
properly with lockdep. For another thing, these primitives have been
tuned to deal better with high contention. And for one final thing, it is
surprisingly hard to correctly code production-quality lock acquisition
and release functions. After all, even simple non-production-quality
locking functions must carefully prevent both the CPU and the compiler
from moving code in either direction across the locking function.
Despite the scalability limitations of single-threaded code, RCU
takes this approach for much of its grace-period processing and also
for early-boot operation. The reason RCU is able to scale despite
single-threaded grace-period processing is use of batching, where all
updates that accumulated during one grace period are handled by the
next one. In other words, slowing down grace-period processing makes
it more efficient. Nor is RCU unique: Similar batching optimizations
are used in many I/O operations.
Packaged code
=============
Even if performance and scalability concerns prevent your code from
being completely single-threaded, it is often possible to use library
functions that handle the concurrency nearly or entirely on their own.
This approach delegates any LKMM worries to the library maintainer.
In the kernel, what is the "library"? Quite a bit. It includes the
contents of the lib/ directory, much of the include/linux/ directory along
with a lot of other heavily used APIs. But heavily used examples include
the list macros (for example, include/linux/{,rcu}list.h), workqueues,
smp_call_function(), and the various hash tables and search trees.
Data locking
============
With code locking, we use single-threaded code execution to guarantee
serialized access to the data that the code is accessing. However,
we can also achieve this by instead associating the lock with specific
instances of the data structures. This creates a "critical section"
in the code execution that will execute as though it is single threaded.
By placing all the accesses and modifications to a shared data structure
inside a critical section, we ensure that the execution context that
holds the lock has exclusive access to the shared data.
The poster boy for this approach is the hash table, where placing a lock
in each hash bucket allows operations on different buckets to proceed
concurrently. This works because the buckets do not overlap with each
other, so that an operation on one bucket does not interfere with any
other bucket.
As the number of buckets increases, data locking scales naturally.
In particular, if the amount of data increases with the number of CPUs,
increasing the number of buckets as the number of CPUs increase results
in a naturally scalable data structure.
Per-CPU processing
==================
Partitioning processing and data over CPUs allows each CPU to take
a single-threaded approach while providing excellent performance and
scalability. Of course, there is no free lunch: The dark side of this
excellence is substantially increased memory footprint.
In addition, it is sometimes necessary to occasionally update some global
view of this processing and data, in which case something like locking
must be used to protect this global view. This is the approach taken
by the percpu_counter infrastructure. In many cases, there are already
generic/library variants of commonly used per-cpu constructs available.
Please use them rather than rolling your own.
RCU uses DEFINE_PER_CPU*() declaration to create a number of per-CPU
data sets. For example, each CPU does private quiescent-state processing
within its instance of the per-CPU rcu_data structure, and then uses data
locking to report quiescent states up the grace-period combining tree.
Packaged primitives: Sequence locking
=====================================
Lockless programming is considered by many to be more difficult than
lock-based programming, but there are a few lockless design patterns that
have been built out into an API. One of these APIs is sequence locking.
Although this APIs can be used in extremely complex ways, there are simple
and effective ways of using it that avoid the need to pay attention to
memory ordering.
The basic keep-things-simple rule for sequence locking is "do not write
in read-side code". Yes, you can do writes from within sequence-locking
readers, but it won't be so simple. For example, such writes will be
lockless and should be idempotent.
For more sophisticated use cases, LKMM can guide you, including use
cases involving combining sequence locking with other synchronization
primitives. (LKMM does not yet know about sequence locking, so it is
currently necessary to open-code it in your litmus tests.)
Additional information may be found in include/linux/seqlock.h.
Packaged primitives: RCU
========================
Another lockless design pattern that has been baked into an API
is RCU. The Linux kernel makes sophisticated use of RCU, but the
keep-things-simple rules for RCU are "do not write in read-side code"
and "do not update anything that is visible to and accessed by readers",
and "protect updates with locking".
These rules are illustrated by the functions foo_update_a() and
foo_get_a() shown in Documentation/RCU/whatisRCU.rst. Additional
RCU usage patterns maybe found in Documentation/RCU and in the
source code.
Packaged primitives: Atomic operations
======================================
Back in the day, the Linux kernel had three types of atomic operations:
1. Initialization and read-out, such as atomic_set() and atomic_read().
2. Operations that did not return a value and provided no ordering,
such as atomic_inc() and atomic_dec().
3. Operations that returned a value and provided full ordering, such as
atomic_add_return() and atomic_dec_and_test(). Note that some
value-returning operations provide full ordering only conditionally.
For example, cmpxchg() provides ordering only upon success.
More recent kernels have operations that return a value but do not
provide full ordering. These are flagged with either a _relaxed()
suffix (providing no ordering), or an _acquire() or _release() suffix
(providing limited ordering).
Additional information may be found in these files:
Documentation/atomic_t.txt
Documentation/atomic_bitops.txt
Documentation/core-api/atomic_ops.rst
Documentation/core-api/refcount-vs-atomic.rst
Reading code using these primitives is often also quite helpful.
Lockless, fully ordered
=======================
When using locking, there often comes a time when it is necessary
to access some variable or another without holding the data lock
that serializes access to that variable.
If you want to keep things simple, use the initialization and read-out
operations from the previous section only when there are no racing
accesses. Otherwise, use only fully ordered operations when accessing
or modifying the variable. This approach guarantees that code prior
to a given access to that variable will be seen by all CPUs has having
happened before any code following any later access to that same variable.
Please note that per-CPU functions are not atomic operations and
hence they do not provide any ordering guarantees at all.
If the lockless accesses are frequently executed reads that are used
only for heuristics, or if they are frequently executed writes that
are used only for statistics, please see the next section.
Lockless statistics and heuristics
==================================
Unordered primitives such as atomic_read(), atomic_set(), READ_ONCE(), and
WRITE_ONCE() can safely be used in some cases. These primitives provide
no ordering, but they do prevent the compiler from carrying out a number
of destructive optimizations (for which please see the next section).
One example use for these primitives is statistics, such as per-CPU
counters exemplified by the rt_cache_stat structure's routing-cache
statistics counters. Another example use case is heuristics, such as
the jiffies_till_first_fqs and jiffies_till_next_fqs kernel parameters
controlling how often RCU scans for idle CPUs.
But be careful. "Unordered" really does mean "unordered". It is all
too easy to assume ordering, and this assumption must be avoided when
using these primitives.
Don't let the compiler trip you up
==================================
It can be quite tempting to use plain C-language accesses for lockless
loads from and stores to shared variables. Although this is both
possible and quite common in the Linux kernel, it does require a
surprising amount of analysis, care, and knowledge about the compiler.
Yes, some decades ago it was not unfair to consider a C compiler to be
an assembler with added syntax and better portability, but the advent of
sophisticated optimizing compilers mean that those days are long gone.
Today's optimizing compilers can profoundly rewrite your code during the
translation process, and have long been ready, willing, and able to do so.
Therefore, if you really need to use C-language assignments instead of
READ_ONCE(), WRITE_ONCE(), and so on, you will need to have a very good
understanding of both the C standard and your compiler. Here are some
introductory references and some tooling to start you on this noble quest:
Who's afraid of a big bad optimizing compiler?
https://lwn.net/Articles/793253/
Calibrating your fear of big bad optimizing compilers
https://lwn.net/Articles/799218/
Concurrency bugs should fear the big bad data-race detector (part 1)
https://lwn.net/Articles/816850/
Concurrency bugs should fear the big bad data-race detector (part 2)
https://lwn.net/Articles/816854/
More complex use cases
======================
If the alternatives above do not do what you need, please look at the
recipes-pairs.txt file to peel off the next layer of the memory-ordering
onion.

View File

@ -63,10 +63,32 @@ BASIC USAGE: HERD7
==================
The memory model is used, in conjunction with "herd7", to exhaustively
explore the state space of small litmus tests.
explore the state space of small litmus tests. Documentation describing
the format, features, capabilities and limitations of these litmus
tests is available in tools/memory-model/Documentation/litmus-tests.txt.
For example, to run SB+fencembonceonces.litmus against the memory model:
Example litmus tests may be found in the Linux-kernel source tree:
tools/memory-model/litmus-tests/
Documentation/litmus-tests/
Several thousand more example litmus tests are available here:
https://github.com/paulmckrcu/litmus
https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/perfbook.git/tree/CodeSamples/formal/herd
https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/perfbook.git/tree/CodeSamples/formal/litmus
Documentation describing litmus tests and now to use them may be found
here:
tools/memory-model/Documentation/litmus-tests.txt
The remainder of this section uses the SB+fencembonceonces.litmus test
located in the tools/memory-model directory.
To run SB+fencembonceonces.litmus against the memory model:
$ cd $LINUX_SOURCE_TREE/tools/memory-model
$ herd7 -conf linux-kernel.cfg litmus-tests/SB+fencembonceonces.litmus
Here is the corresponding output:
@ -87,7 +109,11 @@ Here is the corresponding output:
The "Positive: 0 Negative: 3" and the "Never 0 3" each indicate that
this litmus test's "exists" clause can not be satisfied.
See "herd7 -help" or "herdtools7/doc/" for more information.
See "herd7 -help" or "herdtools7/doc/" for more information on running the
tool itself, but please be aware that this documentation is intended for
people who work on the memory model itself, that is, people making changes
to the tools/memory-model/linux-kernel.* files. It is not intended for
people focusing on writing, understanding, and running LKMM litmus tests.
=====================
@ -124,7 +150,11 @@ that during two million trials, the state specified in this litmus
test's "exists" clause was not reached.
And, as with "herd7", please see "klitmus7 -help" or "herdtools7/doc/"
for more information.
for more information. And again, please be aware that this documentation
is intended for people who work on the memory model itself, that is,
people making changes to the tools/memory-model/linux-kernel.* files.
It is not intended for people focusing on writing, understanding, and
running LKMM litmus tests.
====================
@ -137,12 +167,21 @@ Documentation/cheatsheet.txt
Documentation/explanation.txt
Describes the memory model in detail.
Documentation/litmus-tests.txt
Describes the format, features, capabilities, and limitations
of the litmus tests that LKMM can evaluate.
Documentation/recipes.txt
Lists common memory-ordering patterns.
Documentation/references.txt
Provides background reading.
Documentation/simple.txt
Starting point for someone new to Linux-kernel concurrency.
And also for those needing a reminder of the simpler approaches
to concurrency!
linux-kernel.bell
Categorizes the relevant instructions, including memory
references, memory barriers, atomic read-modify-write operations,
@ -187,116 +226,3 @@ README
This file.
scripts Various scripts, see scripts/README.
===========
LIMITATIONS
===========
The Linux-kernel memory model (LKMM) has the following limitations:
1. Compiler optimizations are not accurately modeled. Of course,
the use of READ_ONCE() and WRITE_ONCE() limits the compiler's
ability to optimize, but under some circumstances it is possible
for the compiler to undermine the memory model. For more
information, see Documentation/explanation.txt (in particular,
the "THE PROGRAM ORDER RELATION: po AND po-loc" and "A WARNING"
sections).
Note that this limitation in turn limits LKMM's ability to
accurately model address, control, and data dependencies.
For example, if the compiler can deduce the value of some variable
carrying a dependency, then the compiler can break that dependency
by substituting a constant of that value.
2. Multiple access sizes for a single variable are not supported,
and neither are misaligned or partially overlapping accesses.
3. Exceptions and interrupts are not modeled. In some cases,
this limitation can be overcome by modeling the interrupt or
exception with an additional process.
4. I/O such as MMIO or DMA is not supported.
5. Self-modifying code (such as that found in the kernel's
alternatives mechanism, function tracer, Berkeley Packet Filter
JIT compiler, and module loader) is not supported.
6. Complete modeling of all variants of atomic read-modify-write
operations, locking primitives, and RCU is not provided.
For example, call_rcu() and rcu_barrier() are not supported.
However, a substantial amount of support is provided for these
operations, as shown in the linux-kernel.def file.
a. When rcu_assign_pointer() is passed NULL, the Linux
kernel provides no ordering, but LKMM models this
case as a store release.
b. The "unless" RMW operations are not currently modeled:
atomic_long_add_unless(), atomic_inc_unless_negative(),
and atomic_dec_unless_positive(). These can be emulated
in litmus tests, for example, by using atomic_cmpxchg().
One exception of this limitation is atomic_add_unless(),
which is provided directly by herd7 (so no corresponding
definition in linux-kernel.def). atomic_add_unless() is
modeled by herd7 therefore it can be used in litmus tests.
c. The call_rcu() function is not modeled. It can be
emulated in litmus tests by adding another process that
invokes synchronize_rcu() and the body of the callback
function, with (for example) a release-acquire from
the site of the emulated call_rcu() to the beginning
of the additional process.
d. The rcu_barrier() function is not modeled. It can be
emulated in litmus tests emulating call_rcu() via
(for example) a release-acquire from the end of each
additional call_rcu() process to the site of the
emulated rcu-barrier().
e. Although sleepable RCU (SRCU) is now modeled, there
are some subtle differences between its semantics and
those in the Linux kernel. For example, the kernel
might interpret the following sequence as two partially
overlapping SRCU read-side critical sections:
1 r1 = srcu_read_lock(&my_srcu);
2 do_something_1();
3 r2 = srcu_read_lock(&my_srcu);
4 do_something_2();
5 srcu_read_unlock(&my_srcu, r1);
6 do_something_3();
7 srcu_read_unlock(&my_srcu, r2);
In contrast, LKMM will interpret this as a nested pair of
SRCU read-side critical sections, with the outer critical
section spanning lines 1-7 and the inner critical section
spanning lines 3-5.
This difference would be more of a concern had anyone
identified a reasonable use case for partially overlapping
SRCU read-side critical sections. For more information,
please see: https://paulmck.livejournal.com/40593.html
f. Reader-writer locking is not modeled. It can be
emulated in litmus tests using atomic read-modify-write
operations.
The "herd7" tool has some additional limitations of its own, apart from
the memory model:
1. Non-trivial data structures such as arrays or structures are
not supported. However, pointers are supported, allowing trivial
linked lists to be constructed.
2. Dynamic memory allocation is not supported, although this can
be worked around in some cases by supplying multiple statically
allocated variables.
Some of these limitations may be overcome in the future, but others are
more likely to be addressed by incorporating the Linux-kernel memory model
into other tools.
Finally, please note that LKMM is subject to change as hardware, use cases,
and compilers evolve.

View File

@ -528,6 +528,61 @@ static const char *uaccess_safe_builtin[] = {
"__tsan_write4",
"__tsan_write8",
"__tsan_write16",
"__tsan_read_write1",
"__tsan_read_write2",
"__tsan_read_write4",
"__tsan_read_write8",
"__tsan_read_write16",
"__tsan_atomic8_load",
"__tsan_atomic16_load",
"__tsan_atomic32_load",
"__tsan_atomic64_load",
"__tsan_atomic8_store",
"__tsan_atomic16_store",
"__tsan_atomic32_store",
"__tsan_atomic64_store",
"__tsan_atomic8_exchange",
"__tsan_atomic16_exchange",
"__tsan_atomic32_exchange",
"__tsan_atomic64_exchange",
"__tsan_atomic8_fetch_add",
"__tsan_atomic16_fetch_add",
"__tsan_atomic32_fetch_add",
"__tsan_atomic64_fetch_add",
"__tsan_atomic8_fetch_sub",
"__tsan_atomic16_fetch_sub",
"__tsan_atomic32_fetch_sub",
"__tsan_atomic64_fetch_sub",
"__tsan_atomic8_fetch_and",
"__tsan_atomic16_fetch_and",
"__tsan_atomic32_fetch_and",
"__tsan_atomic64_fetch_and",
"__tsan_atomic8_fetch_or",
"__tsan_atomic16_fetch_or",
"__tsan_atomic32_fetch_or",
"__tsan_atomic64_fetch_or",
"__tsan_atomic8_fetch_xor",
"__tsan_atomic16_fetch_xor",
"__tsan_atomic32_fetch_xor",
"__tsan_atomic64_fetch_xor",
"__tsan_atomic8_fetch_nand",
"__tsan_atomic16_fetch_nand",
"__tsan_atomic32_fetch_nand",
"__tsan_atomic64_fetch_nand",
"__tsan_atomic8_compare_exchange_strong",
"__tsan_atomic16_compare_exchange_strong",
"__tsan_atomic32_compare_exchange_strong",
"__tsan_atomic64_compare_exchange_strong",
"__tsan_atomic8_compare_exchange_weak",
"__tsan_atomic16_compare_exchange_weak",
"__tsan_atomic32_compare_exchange_weak",
"__tsan_atomic64_compare_exchange_weak",
"__tsan_atomic8_compare_exchange_val",
"__tsan_atomic16_compare_exchange_val",
"__tsan_atomic32_compare_exchange_val",
"__tsan_atomic64_compare_exchange_val",
"__tsan_atomic_thread_fence",
"__tsan_atomic_signal_fence",
/* KCOV */
"write_comp_data",
"check_kcov_mode",