/* SPDX-License-Identifier: GPL-2.0-only */
/* Copyright (C) 2013 Jozsef Kadlecsik <kadlec@netfilter.org> */

#ifndef _IP_SET_HASH_GEN_H
#define _IP_SET_HASH_GEN_H

#include <linux/rcupdate.h>
#include <linux/jhash.h>
#include <linux/types.h>
/* netfilter: ipset: Fix "INFO: rcu detected stall in hash_xxx" reports
 * commit f66ee0410b1c3481ee75e5db9b34547b4d582465 upstream.
 *
 * In the case of huge hash:* types of sets, due to the single spinlock of
 * a set, processing the whole set under spinlock protection could take
 * too long. There were four places where the whole hash table of the set
 * was processed from bucket to bucket while holding the spinlock:
 * - During resizing a set, the original set was locked to exclude kernel
 *   side add/del element operations (userspace add/del is excluded by the
 *   nfnetlink mutex). The original set is actually just read during the
 *   resize, so the spinlocking is replaced with RCU locking of regions.
 *   However, this allows parallel kernel side add/del of entries. In
 *   order not to lose those operations, a backlog is added and replayed
 *   after the successful resize.
 * - Garbage collection of timed out entries was also protected by the
 *   spinlock. In order not to lock too long, region locking is introduced
 *   and a single region is processed in one gc run. Also, the simple
 *   timer based gc is replaced with a workqueue based solution. The
 *   internal book-keeping (number of elements, size of extensions) is
 *   moved to region level due to the region locking.
 * - Adding elements: when the max number of elements was reached, the gc
 *   was called to evict the timed out entries. The new approach is that
 *   the gc is called just for the matching region, assuming that if the
 *   region (proportionally) seems to be full, then the whole set does too.
 *   We could scan the other regions to check every entry under RCU
 *   locking, but for huge sets it'd mean a slowdown at adding elements.
 * - Listing the set header data: when the set was defined with timeout
 *   support, the garbage collector was called to clean up timed out
 *   entries to get the correct element numbers and set size values. Now
 *   the set is scanned to check non-timed out entries, without actually
 *   calling the gc for the whole set.
 * Thanks to Florian Westphal for helping to solve the SOFTIRQ-safe ->
 * SOFTIRQ-unsafe lock order issues while working on the patch.
 *
 * Reported-by: syzbot+4b0e9d4ff3cf117837e5@syzkaller.appspotmail.com
 * Reported-by: syzbot+c27b8d5010f45c666ed1@syzkaller.appspotmail.com
 * Reported-by: syzbot+68a806795ac89df3aa1c@syzkaller.appspotmail.com
 * Fixes: 23c42a403a9c ("netfilter: ipset: Introduction of new commands and protocol version 7")
 * Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
 * Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 */
#include <linux/netfilter/nfnetlink.h>
#include <linux/netfilter/ipset/ip_set.h>
#define __ipset_dereference(p)		\
	rcu_dereference_protected(p, 1)
#define ipset_dereference_nfnl(p)	\
	rcu_dereference_protected(p,	\
		lockdep_nfnl_is_held(NFNL_SUBSYS_IPSET))
#define ipset_dereference_set(p, set)	\
	rcu_dereference_protected(p,	\
		lockdep_nfnl_is_held(NFNL_SUBSYS_IPSET) || \
		lockdep_is_held(&(set)->lock))
#define ipset_dereference_bh_nfnl(p)	\
	rcu_dereference_bh_check(p,	\
		lockdep_nfnl_is_held(NFNL_SUBSYS_IPSET))
/* Hashing which uses arrays to resolve clashing. The hash table is resized
 * (doubled) when searching becomes too long.
 * Internally jhash is used with the assumption that the size of the
 * stored data is a multiple of sizeof(u32).
 *
 * Readers and resizing
 *
 * Resizing can be triggered by userspace command only, and those
 * are serialized by the nfnl mutex. During resizing the set is
 * read-locked, so the only possible concurrent operations are
 * the kernel side readers. Those must be protected by proper RCU locking.
 */
/* Number of elements to store in an initial array block */
#define AHASH_INIT_SIZE			4
/* Max number of elements to store in an array block */
#define AHASH_MAX_SIZE			(3 * AHASH_INIT_SIZE)
/* Max number of elements in the array block when tuned */
#define AHASH_MAX_TUNED			64
/* Max number of elements can be tuned */
#ifdef IP_SET_HASH_WITH_MULTI
#define AHASH_MAX(h)			((h)->ahash_max)

static inline u8
tune_ahash_max(u8 curr, u32 multi)
{
	u32 n;

	if (multi < curr)
		return curr;

	n = curr + AHASH_INIT_SIZE;
	/* Currently, at listing one hash bucket must fit into a message.
	 * Therefore we have a hard limit here.
	 */
	return n > curr && n <= AHASH_MAX_TUNED ? n : curr;
}

#define TUNE_AHASH_MAX(h, multi)	\
	((h)->ahash_max = tune_ahash_max((h)->ahash_max, multi))
#else
#define AHASH_MAX(h)			AHASH_MAX_SIZE
#define TUNE_AHASH_MAX(h, multi)
#endif
/* A hash bucket */
struct hbucket {
	struct rcu_head rcu;	/* for call_rcu */
	/* Which positions are used in the array */
	DECLARE_BITMAP(used, AHASH_MAX_TUNED);
	u8 size;		/* size of the array */
	u8 pos;			/* position of the first free entry */
	unsigned char value[0]	/* the array of the values */
		__aligned(__alignof__(u64));
};
/* Region size for locking == 2^HTABLE_REGION_BITS */
#define HTABLE_REGION_BITS	10
#define ahash_numof_locks(htable_bits)		\
	((htable_bits) < HTABLE_REGION_BITS ? 1	\
		: jhash_size((htable_bits) - HTABLE_REGION_BITS))
#define ahash_sizeof_regions(htable_bits)	\
	(ahash_numof_locks(htable_bits) * sizeof(struct ip_set_region))
#define ahash_region(n, htable_bits)		\
	((n) % ahash_numof_locks(htable_bits))
#define ahash_bucket_start(h, htable_bits)	\
	((htable_bits) < HTABLE_REGION_BITS ? 0	\
		: (h) * jhash_size(HTABLE_REGION_BITS))
#define ahash_bucket_end(h, htable_bits)	\
	((htable_bits) < HTABLE_REGION_BITS ? jhash_size(htable_bits)	\
		: ((h) + 1) * jhash_size(HTABLE_REGION_BITS))
struct htable_gc {
	struct delayed_work dwork;
	struct ip_set *set;	/* Set the gc belongs to */
	u32 region;		/* Last gc run position */
};

/* The hash table: the table size stored here in order to make resizing easy */
struct htable {
	atomic_t ref;		/* References for resizing */
	atomic_t uref;		/* References for dumping and gc */
	u8 htable_bits;		/* size of hash table == 2^htable_bits */
	u32 maxelem;		/* Maxelem per region */
	struct ip_set_region *hregion;	/* Region locks and ext sizes */
	struct hbucket __rcu *bucket[0]; /* hashtable buckets */
};
#define hbucket(h, i)		((h)->bucket[i])
#define ext_size(n, dsize)	\
	(sizeof(struct hbucket) + (n) * (dsize))

#ifndef IPSET_NET_COUNT
#define IPSET_NET_COUNT		1
#endif
/* Book-keeping of the prefixes added to the set */
struct net_prefixes {
	u32 nets[IPSET_NET_COUNT]; /* number of elements for this cidr */
	u8 cidr[IPSET_NET_COUNT];  /* the cidr value */
};
/* Compute the hash table size */
static size_t
htable_size(u8 hbits)
{
	size_t hsize;

	/* We must fit both into u32 in jhash and size_t */
	if (hbits > 31)
		return 0;
	hsize = jhash_size(hbits);
	if ((((size_t)-1) - sizeof(struct htable)) / sizeof(struct hbucket *)
	    < hsize)
		return 0;

	return hsize * sizeof(struct hbucket *) + sizeof(struct htable);
}
#ifdef IP_SET_HASH_WITH_NETS
#if IPSET_NET_COUNT > 1
#define __CIDR(cidr, i)		(cidr[i])
#else
#define __CIDR(cidr, i)		(cidr)
#endif

/* cidr + 1 is stored in net_prefixes to support /0 */
#define NCIDR_PUT(cidr)		((cidr) + 1)
#define NCIDR_GET(cidr)		((cidr) - 1)
#ifdef IP_SET_HASH_WITH_NETS_PACKED
/* When cidr is packed with nomatch, cidr - 1 is stored in the data entry */
#define DCIDR_PUT(cidr)		((cidr) - 1)
#define DCIDR_GET(cidr, i)	(__CIDR(cidr, i) + 1)
#else
#define DCIDR_PUT(cidr)		(cidr)
#define DCIDR_GET(cidr, i)	__CIDR(cidr, i)
#endif

#define INIT_CIDR(cidr, host_mask)	\
	DCIDR_PUT(((cidr) ? NCIDR_GET(cidr) : host_mask))
#ifdef IP_SET_HASH_WITH_NET0
/* cidr from 0 to HOST_MASK value and c = cidr + 1 */
#define NLEN			(HOST_MASK + 1)
#define CIDR_POS(c)		((c) - 1)
#else
/* cidr from 1 to HOST_MASK value and c = cidr + 1 */
#define NLEN			HOST_MASK
#define CIDR_POS(c)		((c) - 2)
#endif

#else
#define NLEN			0
#endif /* IP_SET_HASH_WITH_NETS */
#define SET_ELEM_EXPIRED(set, d)	\
	(SET_WITH_TIMEOUT(set) &&	\
	 ip_set_timeout_expired(ext_timeout(d, set)))

#endif /* _IP_SET_HASH_GEN_H */
#ifndef MTYPE
#error "MTYPE is not defined!"
#endif

#ifndef HTYPE
#error "HTYPE is not defined!"
#endif

#ifndef HOST_MASK
#error "HOST_MASK is not defined!"
#endif
/* Family dependent templates */

#undef ahash_data
#undef mtype_data_equal
#undef mtype_do_data_match
#undef mtype_data_set_flags
#undef mtype_data_reset_elem
#undef mtype_data_reset_flags
#undef mtype_data_netmask
#undef mtype_data_list
#undef mtype_data_next
#undef mtype_elem

#undef mtype_ahash_destroy
#undef mtype_ext_cleanup
#undef mtype_add_cidr
#undef mtype_del_cidr
#undef mtype_ahash_memsize
#undef mtype_flush
#undef mtype_destroy
#undef mtype_same_set
#undef mtype_kadt
#undef mtype_uadt

#undef mtype_add
#undef mtype_del
#undef mtype_test_cidrs
#undef mtype_test
#undef mtype_uref
#undef mtype_resize
#undef mtype_ext_size
#undef mtype_resize_ad
#undef mtype_head
#undef mtype_list
#undef mtype_gc_do
#undef mtype_gc
#undef mtype_gc_init
#undef mtype_variant
#undef mtype_data_match

#undef htype
#undef HKEY
#define mtype_data_equal IPSET_TOKEN(MTYPE, _data_equal)
|
2013-04-08 13:05:44 -06:00
|
|
|
#ifdef IP_SET_HASH_WITH_NETS
|
2013-04-30 15:02:43 -06:00
|
|
|
#define mtype_do_data_match IPSET_TOKEN(MTYPE, _do_data_match)
|
2013-04-08 13:05:44 -06:00
|
|
|
#else
|
|
|
|
#define mtype_do_data_match(d) 1
|
|
|
|
#endif
|
2013-04-30 15:02:43 -06:00
|
|
|
#define mtype_data_set_flags IPSET_TOKEN(MTYPE, _data_set_flags)
|
2013-09-20 02:13:53 -06:00
|
|
|
#define mtype_data_reset_elem IPSET_TOKEN(MTYPE, _data_reset_elem)
|
2013-04-30 15:02:43 -06:00
|
|
|
#define mtype_data_reset_flags IPSET_TOKEN(MTYPE, _data_reset_flags)
|
|
|
|
#define mtype_data_netmask IPSET_TOKEN(MTYPE, _data_netmask)
|
|
|
|
#define mtype_data_list IPSET_TOKEN(MTYPE, _data_list)
|
|
|
|
#define mtype_data_next IPSET_TOKEN(MTYPE, _data_next)
|
|
|
|
#define mtype_elem IPSET_TOKEN(MTYPE, _elem)

#define mtype_ahash_destroy IPSET_TOKEN(MTYPE, _ahash_destroy)
#define mtype_ext_cleanup IPSET_TOKEN(MTYPE, _ext_cleanup)
#define mtype_add_cidr IPSET_TOKEN(MTYPE, _add_cidr)
#define mtype_del_cidr IPSET_TOKEN(MTYPE, _del_cidr)
#define mtype_ahash_memsize IPSET_TOKEN(MTYPE, _ahash_memsize)
#define mtype_flush IPSET_TOKEN(MTYPE, _flush)
#define mtype_destroy IPSET_TOKEN(MTYPE, _destroy)
#define mtype_same_set IPSET_TOKEN(MTYPE, _same_set)
#define mtype_kadt IPSET_TOKEN(MTYPE, _kadt)
#define mtype_uadt IPSET_TOKEN(MTYPE, _uadt)
#define mtype_add IPSET_TOKEN(MTYPE, _add)
#define mtype_del IPSET_TOKEN(MTYPE, _del)
#define mtype_test_cidrs IPSET_TOKEN(MTYPE, _test_cidrs)
#define mtype_test IPSET_TOKEN(MTYPE, _test)
#define mtype_uref IPSET_TOKEN(MTYPE, _uref)
#define mtype_resize IPSET_TOKEN(MTYPE, _resize)
#define mtype_ext_size IPSET_TOKEN(MTYPE, _ext_size)
#define mtype_resize_ad IPSET_TOKEN(MTYPE, _resize_ad)
#define mtype_head IPSET_TOKEN(MTYPE, _head)
#define mtype_list IPSET_TOKEN(MTYPE, _list)
#define mtype_gc_do IPSET_TOKEN(MTYPE, _gc_do)
#define mtype_gc IPSET_TOKEN(MTYPE, _gc)
#define mtype_gc_init IPSET_TOKEN(MTYPE, _gc_init)
#define mtype_variant IPSET_TOKEN(MTYPE, _variant)
#define mtype_data_match IPSET_TOKEN(MTYPE, _data_match)

#ifndef HKEY_DATALEN
#define HKEY_DATALEN sizeof(struct mtype_elem)
#endif
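Every mtype_* name above is stamped out by IPSET_TOKEN, which pastes MTYPE onto a suffix so each hash type gets its own set of functions from one generic implementation. A minimal userspace sketch of the same two-level token-paste idiom (the CONCAT/TOKEN macros and the hash_ip4 name here are illustrative stand-ins, not the kernel's definitions):

```c
#include <assert.h>

/* Two-level paste: the outer macro forces its arguments to be
 * macro-expanded before ## glues them together. */
#define CONCAT(a, b) a##b
#define TOKEN(a, b) CONCAT(a, b)

#define MTYPE hash_ip4

/* TOKEN(MTYPE, _answer) expands to hash_ip4_answer, so this
 * defines a function named hash_ip4_answer(). */
static int TOKEN(MTYPE, _answer)(void)
{
	return 42;
}
```

Without the extra CONCAT level, `TOKEN(MTYPE, _answer)` would paste the literal token `MTYPE` instead of its expansion.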

#define htype MTYPE

#define HKEY(data, initval, htable_bits) \
({ \
	const u32 *__k = (const u32 *)data; \
	u32 __l = HKEY_DATALEN / sizeof(u32); \
	\
	BUILD_BUG_ON(HKEY_DATALEN % sizeof(u32) != 0); \
	\
	jhash2(__k, __l, initval) & jhash_mask(htable_bits); \
})
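HKEY treats the element as an array of u32 words, hashes it with jhash2() seeded by the set's random initval, and masks the result down to htable_bits low bits to get a bucket index. A hedged userspace sketch of the same shape; mix_words() below is an arbitrary stand-in mixer, not jhash2, and only the word-count and masking steps mirror HKEY:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the kernel's jhash2(): any 32-bit word mixer serves the
 * illustration. It consumes the key one u32 at a time, like jhash2. */
static uint32_t mix_words(const uint32_t *k, uint32_t len, uint32_t initval)
{
	uint32_t h = initval;

	while (len--) {
		h ^= *k++;
		h *= 0x9e3779b1u; /* multiplicative mixing constant */
		h ^= h >> 16;
	}
	return h;
}

/* jhash_mask(bits) is (1 << bits) - 1: keeping the low 'bits' bits
 * yields a bucket index in [0, 2^bits). */
static uint32_t hkey(const void *data, uint32_t datalen,
		     uint32_t initval, uint8_t htable_bits)
{
	const uint32_t *k = data;

	return mix_words(k, datalen / sizeof(uint32_t), initval) &
	       ((1u << htable_bits) - 1);
}
```

The BUILD_BUG_ON in the real macro enforces at compile time that the element size is a whole number of u32 words, which the division above silently assumes.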

/* The generic hash structure */
struct htype {
	struct htable __rcu *table; /* the hash table */
	struct htable_gc gc; /* gc workqueue */
	u32 maxelem; /* max elements in the hash */
	u32 initval; /* random jhash init value */
#ifdef IP_SET_HASH_WITH_MARKMASK
	u32 markmask; /* markmask value for mark mask to store */
#endif
#ifdef IP_SET_HASH_WITH_MULTI
	u8 ahash_max; /* max elements in an array block */
#endif
#ifdef IP_SET_HASH_WITH_NETMASK
	u8 netmask; /* netmask value for subnets to store */
#endif
	struct list_head ad; /* Resize add|del backlist */
	struct mtype_elem next; /* temporary storage for uadd */
#ifdef IP_SET_HASH_WITH_NETS
	struct net_prefixes nets[NLEN]; /* book-keeping of prefixes */
#endif
};

/* ADD|DEL entries saved during resize */
struct mtype_resize_ad {
	struct list_head list;
	enum ipset_adt ad; /* ADD|DEL element */
	struct mtype_elem d; /* Element value */
	struct ip_set_ext ext; /* Extensions for ADD */
	struct ip_set_ext mext; /* Target extensions for ADD */
	u32 flags; /* Flags for ADD */
};
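struct mtype_resize_ad records a kernel-side ADD or DEL that arrives while a resize is in flight, so the operation can be replayed against the new table once the resize succeeds. A toy userspace sketch of that queue-and-replay idea; all names here are hypothetical, a fixed array stands in for the kernel's linked list, and a byte map stands in for the hash:

```c
#include <assert.h>
#include <stddef.h>

enum op_kind { OP_ADD, OP_DEL };

struct op {
	enum op_kind kind;
	int value;
};

struct backlog {
	struct op ops[16];
	size_t n;
};

/* Record a mutation that raced with the resize. */
static void backlog_push(struct backlog *b, enum op_kind k, int v)
{
	if (b->n < sizeof(b->ops) / sizeof(b->ops[0]))
		b->ops[b->n++] = (struct op){ k, v };
}

/* Replay in arrival order, so the final membership matches what a
 * sequential execution of the same operations would have produced. */
static void backlog_replay(const struct backlog *b, unsigned char *present)
{
	size_t i;

	for (i = 0; i < b->n; i++)
		present[b->ops[i].value] = (b->ops[i].kind == OP_ADD);
}
```

Replaying in order matters: an ADD followed by a DEL of the same element must leave the element absent.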

#ifdef IP_SET_HASH_WITH_NETS
/* Network cidr size book keeping when the hash stores different
 * sized networks. cidr == real cidr + 1 to support /0.
 */
static void
mtype_add_cidr(struct ip_set *set, struct htype *h, u8 cidr, u8 n)
{
	int i, j;

	spin_lock_bh(&set->lock);
	/* Add in increasing prefix order, so larger cidr first */
	for (i = 0, j = -1; i < NLEN && h->nets[i].cidr[n]; i++) {
		if (j != -1) {
			continue;
		} else if (h->nets[i].cidr[n] < cidr) {
			j = i;
		} else if (h->nets[i].cidr[n] == cidr) {
			h->nets[CIDR_POS(cidr)].nets[n]++;
			goto unlock;
		}
	}
	if (j != -1) {
		for (; i > j; i--)
			h->nets[i].cidr[n] = h->nets[i - 1].cidr[n];
	}
	h->nets[i].cidr[n] = cidr;
	h->nets[CIDR_POS(cidr)].nets[n] = 1;
unlock:
	spin_unlock_bh(&set->lock);
}
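Stripped of locking and the kernel types, mtype_add_cidr() maintains a descending-sorted list of the distinct prefix lengths in use, plus a per-length reference count. A minimal sketch of that book-keeping; struct cidr_book and the flat counts[] indexing are illustrative (the kernel nests the counters inside h->nets[] and takes the set spinlock):

```c
#include <assert.h>
#include <string.h>

#define NLEN 33 /* illustrative: room for one slot per prefix length */

struct cidr_book {
	unsigned char cidrs[NLEN]; /* distinct cidrs, descending; 0 ends */
	unsigned int counts[NLEN]; /* networks stored per prefix length */
};

static void add_cidr(struct cidr_book *b, unsigned char cidr)
{
	int i, j;

	/* Walk the sorted list: either bump an existing entry's count,
	 * or remember at j where the new cidr must be inserted. */
	for (i = 0, j = -1; i < NLEN && b->cidrs[i]; i++) {
		if (j != -1) {
			continue;
		} else if (b->cidrs[i] < cidr) {
			j = i;
		} else if (b->cidrs[i] == cidr) {
			b->counts[cidr]++;
			return;
		}
	}
	if (j != -1) {
		/* Shift smaller prefixes up to open the slot at j. */
		for (; i > j; i--)
			b->cidrs[i] = b->cidrs[i - 1];
	}
	b->cidrs[i] = cidr;
	b->counts[cidr] = 1;
}
```

The descending order is what lets lookups probe the longest (most specific) prefixes first.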

static void
mtype_del_cidr(struct ip_set *set, struct htype *h, u8 cidr, u8 n)
{
	u8 i, j, net_end = NLEN - 1;

	spin_lock_bh(&set->lock);
|
2016-11-10 04:24:10 -07:00
|
|
|
for (i = 0; i < NLEN; i++) {
|
2015-06-13 11:45:33 -06:00
|
|
|
if (h->nets[i].cidr[n] != cidr)
|
|
|
|
continue;
|
2015-08-25 03:17:51 -06:00
|
|
|
h->nets[CIDR_POS(cidr)].nets[n]--;
|
|
|
|
if (h->nets[CIDR_POS(cidr)].nets[n] > 0)
|
netfilter: ipset: Fix "INFO: rcu detected stall in hash_xxx" reports
commit f66ee0410b1c3481ee75e5db9b34547b4d582465 upstream.
In the case of huge hash:* types of sets, due to the single spinlock of
a set the processing of the whole set under spinlock protection could take
too long.
There were four places where the whole hash table of the set was processed
from bucket to bucket under holding the spinlock:
- During resizing a set, the original set was locked to exclude kernel side
add/del element operations (userspace add/del is excluded by the
nfnetlink mutex). The original set is actually just read during the
resize, so the spinlocking is replaced with rcu locking of regions.
However, thus there can be parallel kernel side add/del of entries.
In order not to loose those operations a backlog is added and replayed
after the successful resize.
- Garbage collection of timed out entries was also protected by the spinlock.
In order not to lock too long, region locking is introduced and a single
region is processed in one gc go. Also, the simple timer based gc running
is replaced with a workqueue based solution. The internal book-keeping
(number of elements, size of extensions) is moved to region level due to
the region locking.
- Adding elements: when the max number of the elements is reached, the gc
was called to evict the timed out entries. The new approach is that the gc
is called just for the matching region, assuming that if the region
(proportionally) seems to be full, then the whole set does. We could scan
the other regions to check every entry under rcu locking, but for huge
sets it'd mean a slowdown at adding elements.
- Listing the set header data: when the set was defined with timeout
support, the garbage collector was called to clean up timed out entries
to get the correct element numbers and set size values. Now the set is
scanned to check non-timed out entries, without actually calling the gc
for the whole set.
Thanks to Florian Westphal for helping me to solve the SOFTIRQ-safe ->
SOFTIRQ-unsafe lock order issues during working on the patch.
Reported-by: syzbot+4b0e9d4ff3cf117837e5@syzkaller.appspotmail.com
Reported-by: syzbot+c27b8d5010f45c666ed1@syzkaller.appspotmail.com
Reported-by: syzbot+68a806795ac89df3aa1c@syzkaller.appspotmail.com
Fixes: 23c42a403a9c ("netfilter: ipset: Introduction of new commands and protocol version 7")
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11 15:20:43 -07:00
|
|
|
goto unlock;
|
2014-11-30 11:56:55 -07:00
|
|
|
for (j = i; j < net_end && h->nets[j].cidr[n]; j++)
|
2015-06-13 11:45:33 -06:00
|
|
|
h->nets[j].cidr[n] = h->nets[j + 1].cidr[n];
|
2014-11-30 11:56:55 -07:00
|
|
|
h->nets[j].cidr[n] = 0;
|
netfilter: ipset: Fix "INFO: rcu detected stall in hash_xxx" reports
commit f66ee0410b1c3481ee75e5db9b34547b4d582465 upstream.
In the case of huge hash:* types of sets, due to the single spinlock of
a set the processing of the whole set under spinlock protection could take
too long.
There were four places where the whole hash table of the set was processed
from bucket to bucket under holding the spinlock:
- During resizing a set, the original set was locked to exclude kernel side
add/del element operations (userspace add/del is excluded by the
nfnetlink mutex). The original set is actually just read during the
resize, so the spinlocking is replaced with rcu locking of regions.
However, thus there can be parallel kernel side add/del of entries.
In order not to loose those operations a backlog is added and replayed
after the successful resize.
- Garbage collection of timed out entries was also protected by the spinlock.
In order not to lock too long, region locking is introduced and a single
region is processed in one gc go. Also, the simple timer based gc running
is replaced with a workqueue based solution. The internal book-keeping
(number of elements, size of extensions) is moved to region level due to
the region locking.
- Adding elements: when the max number of the elements is reached, the gc
was called to evict the timed out entries. The new approach is that the gc
is called just for the matching region, assuming that if the region
(proportionally) seems to be full, then the whole set does. We could scan
the other regions to check every entry under rcu locking, but for huge
sets it'd mean a slowdown at adding elements.
- Listing the set header data: when the set was defined with timeout
support, the garbage collector was called to clean up timed out entries
to get the correct element numbers and set size values. Now the set is
scanned to check non-timed out entries, without actually calling the gc
for the whole set.
Thanks to Florian Westphal for helping me to solve the SOFTIRQ-safe ->
SOFTIRQ-unsafe lock order issues during working on the patch.
Reported-by: syzbot+4b0e9d4ff3cf117837e5@syzkaller.appspotmail.com
Reported-by: syzbot+c27b8d5010f45c666ed1@syzkaller.appspotmail.com
Reported-by: syzbot+68a806795ac89df3aa1c@syzkaller.appspotmail.com
Fixes: 23c42a403a9c ("netfilter: ipset: Introduction of new commands and protocol version 7")
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11 15:20:43 -07:00
|
|
|
goto unlock;
|
2013-04-08 13:05:44 -06:00
|
|
|
}
|
netfilter: ipset: Fix "INFO: rcu detected stall in hash_xxx" reports
commit f66ee0410b1c3481ee75e5db9b34547b4d582465 upstream.
In the case of huge hash:* types of sets, due to the single spinlock of
a set the processing of the whole set under spinlock protection could take
too long.
There were four places where the whole hash table of the set was processed
from bucket to bucket under holding the spinlock:
- During resizing a set, the original set was locked to exclude kernel side
add/del element operations (userspace add/del is excluded by the
nfnetlink mutex). The original set is actually just read during the
resize, so the spinlocking is replaced with rcu locking of regions.
However, thus there can be parallel kernel side add/del of entries.
In order not to loose those operations a backlog is added and replayed
after the successful resize.
- Garbage collection of timed out entries was also protected by the spinlock.
In order not to lock too long, region locking is introduced and a single
region is processed in one gc go. Also, the simple timer based gc running
is replaced with a workqueue based solution. The internal book-keeping
(number of elements, size of extensions) is moved to region level due to
the region locking.
- Adding elements: when the max number of the elements is reached, the gc
was called to evict the timed out entries. The new approach is that the gc
is called just for the matching region, assuming that if the region
(proportionally) seems to be full, then the whole set does. We could scan
the other regions to check every entry under rcu locking, but for huge
sets it'd mean a slowdown at adding elements.
- Listing the set header data: when the set was defined with timeout
support, the garbage collector was called to clean up timed out entries
to get the correct element numbers and set size values. Now the set is
scanned to check non-timed out entries, without actually calling the gc
for the whole set.
Thanks to Florian Westphal for helping me to solve the SOFTIRQ-safe ->
SOFTIRQ-unsafe lock order issues during working on the patch.
Reported-by: syzbot+4b0e9d4ff3cf117837e5@syzkaller.appspotmail.com
Reported-by: syzbot+c27b8d5010f45c666ed1@syzkaller.appspotmail.com
Reported-by: syzbot+68a806795ac89df3aa1c@syzkaller.appspotmail.com
Fixes: 23c42a403a9c ("netfilter: ipset: Introduction of new commands and protocol version 7")
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11 15:20:43 -07:00
|
|
|
unlock:
|
|
|
|
spin_unlock_bh(&set->lock);
|
2013-04-08 13:05:44 -06:00
|
|
|
}
#endif

/* Calculate the actual memory size of the set data */
static size_t
mtype_ahash_memsize(const struct htype *h, const struct htable *t)
{
	return sizeof(*h) + sizeof(*t) + ahash_sizeof_regions(t->htable_bits);
}

/* Get the ith element from the array block n */
#define ahash_data(n, i, dsize) \
	((struct mtype_elem *)((n)->value + ((i) * (dsize))))

static void
mtype_ext_cleanup(struct ip_set *set, struct hbucket *n)
{
	int i;

	for (i = 0; i < n->pos; i++)
		if (test_bit(i, n->used))
			ip_set_ext_destroy(set, ahash_data(n, i, set->dsize));
}

/* Flush a hash type of set: destroy all elements */
static void
mtype_flush(struct ip_set *set)
{
	struct htype *h = set->data;
	struct htable *t;
	struct hbucket *n;
	u32 r, i;

	t = ipset_dereference_nfnl(h->table);
	for (r = 0; r < ahash_numof_locks(t->htable_bits); r++) {
		spin_lock_bh(&t->hregion[r].lock);
		for (i = ahash_bucket_start(r, t->htable_bits);
		     i < ahash_bucket_end(r, t->htable_bits); i++) {
			n = __ipset_dereference(hbucket(t, i));
			if (!n)
				continue;
			if (set->extensions & IPSET_EXT_DESTROY)
				mtype_ext_cleanup(set, n);
			/* FIXME: use slab cache */
			rcu_assign_pointer(hbucket(t, i), NULL);
			kfree_rcu(n, rcu);
		}
		t->hregion[r].ext_size = 0;
		t->hregion[r].elements = 0;
		spin_unlock_bh(&t->hregion[r].lock);
	}
#ifdef IP_SET_HASH_WITH_NETS
	memset(h->nets, 0, sizeof(h->nets));
#endif
}

/* Destroy the hashtable part of the set */
static void
mtype_ahash_destroy(struct ip_set *set, struct htable *t, bool ext_destroy)
{
	struct hbucket *n;
	u32 i;

	for (i = 0; i < jhash_size(t->htable_bits); i++) {
		n = __ipset_dereference(hbucket(t, i));
		if (!n)
			continue;
		if (set->extensions & IPSET_EXT_DESTROY && ext_destroy)
			mtype_ext_cleanup(set, n);
		/* FIXME: use slab cache */
		kfree(n);
	}

	ip_set_free(t->hregion);
	ip_set_free(t);
}

/* Destroy a hash type of set */
static void
mtype_destroy(struct ip_set *set)
{
	struct htype *h = set->data;
	struct list_head *l, *lt;

	if (SET_WITH_TIMEOUT(set))
		cancel_delayed_work_sync(&h->gc.dwork);

	mtype_ahash_destroy(set, ipset_dereference_nfnl(h->table), true);
	list_for_each_safe(l, lt, &h->ad) {
		list_del(l);
		kfree(l);
	}
	kfree(h);

	set->data = NULL;
}

static bool
mtype_same_set(const struct ip_set *a, const struct ip_set *b)
{
	const struct htype *x = a->data;
	const struct htype *y = b->data;

	/* Resizing changes htable_bits, so we ignore it */
	return x->maxelem == y->maxelem &&
	       a->timeout == b->timeout &&
#ifdef IP_SET_HASH_WITH_NETMASK
	       x->netmask == y->netmask &&
#endif
#ifdef IP_SET_HASH_WITH_MARKMASK
	       x->markmask == y->markmask &&
#endif
	       a->extensions == b->extensions;
}

/* Delete expired elements of region r in the hash table of the set */
static void
mtype_gc_do(struct ip_set *set, struct htype *h, struct htable *t, u32 r)
{
	struct hbucket *n, *tmp;
	struct mtype_elem *data;
	u32 i, j, d;
	size_t dsize = set->dsize;
#ifdef IP_SET_HASH_WITH_NETS
	u8 k;
#endif
	u8 htable_bits = t->htable_bits;

	spin_lock_bh(&t->hregion[r].lock);
	for (i = ahash_bucket_start(r, htable_bits);
	     i < ahash_bucket_end(r, htable_bits); i++) {
		n = __ipset_dereference(hbucket(t, i));
		if (!n)
			continue;
		for (j = 0, d = 0; j < n->pos; j++) {
			if (!test_bit(j, n->used)) {
				d++;
				continue;
			}
			data = ahash_data(n, j, dsize);
			if (!ip_set_timeout_expired(ext_timeout(data, set)))
				continue;
			pr_debug("expired %u/%u\n", i, j);
			clear_bit(j, n->used);
			smp_mb__after_atomic();
#ifdef IP_SET_HASH_WITH_NETS
			for (k = 0; k < IPSET_NET_COUNT; k++)
				mtype_del_cidr(set, h,
					       NCIDR_PUT(DCIDR_GET(data->cidr, k)),
					       k);
#endif
			t->hregion[r].elements--;
			ip_set_ext_destroy(set, data);
			d++;
		}
		if (d >= AHASH_INIT_SIZE) {
			if (d >= n->size) {
				/* No used entry left: drop the whole bucket */
				t->hregion[r].ext_size -=
					ext_size(n->size, dsize);
				rcu_assign_pointer(hbucket(t, i), NULL);
				kfree_rcu(n, rcu);
				continue;
			}
			/* Shrink the bucket by AHASH_INIT_SIZE entries */
			tmp = kzalloc(sizeof(*tmp) +
				      (n->size - AHASH_INIT_SIZE) * dsize,
				      GFP_ATOMIC);
			if (!tmp)
				/* Still try to delete expired elements. */
				continue;
			tmp->size = n->size - AHASH_INIT_SIZE;
			for (j = 0, d = 0; j < n->pos; j++) {
				if (!test_bit(j, n->used))
					continue;
				data = ahash_data(n, j, dsize);
				memcpy(tmp->value + d * dsize,
				       data, dsize);
				set_bit(d, tmp->used);
				d++;
			}
			tmp->pos = d;
			t->hregion[r].ext_size -=
				ext_size(AHASH_INIT_SIZE, dsize);
			rcu_assign_pointer(hbucket(t, i), tmp);
			kfree_rcu(n, rcu);
		}
	}
	spin_unlock_bh(&t->hregion[r].lock);
}
|
|
|
|
|
|
|
|
static void
|
netfilter: ipset: Fix "INFO: rcu detected stall in hash_xxx" reports
commit f66ee0410b1c3481ee75e5db9b34547b4d582465 upstream.
In the case of huge hash:* types of sets, due to the single spinlock of
a set the processing of the whole set under spinlock protection could take
too long.
There were four places where the whole hash table of the set was processed
mtype_gc(struct work_struct *work)
{
	struct htable_gc *gc;
	struct ip_set *set;
	struct htype *h;
	struct htable *t;
	u32 r, numof_locks;
	unsigned int next_run;

	gc = container_of(work, struct htable_gc, dwork.work);
	set = gc->set;
	h = set->data;

	spin_lock_bh(&set->lock);
	t = ipset_dereference_set(h->table, set);
	atomic_inc(&t->uref);
	numof_locks = ahash_numof_locks(t->htable_bits);
	r = gc->region++;
	if (r >= numof_locks) {
		r = gc->region = 0;
	}
	next_run = (IPSET_GC_PERIOD(set->timeout) * HZ) / numof_locks;
	if (next_run < HZ/10)
		next_run = HZ/10;
	spin_unlock_bh(&set->lock);
	mtype_gc_do(set, h, t, r);

	if (atomic_dec_and_test(&t->uref) && atomic_read(&t->ref)) {
		pr_debug("Table destroy after resize by expire: %p\n", t);
		mtype_ahash_destroy(set, t, false);
	}

	queue_delayed_work(system_power_efficient_wq, &gc->dwork, next_run);
}

static void
mtype_gc_init(struct htable_gc *gc)
{
	INIT_DEFERRABLE_WORK(&gc->dwork, mtype_gc);
	queue_delayed_work(system_power_efficient_wq, &gc->dwork, HZ);
}
static int
mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
	  struct ip_set_ext *mext, u32 flags);
static int
mtype_del(struct ip_set *set, void *value, const struct ip_set_ext *ext,
	  struct ip_set_ext *mext, u32 flags);
/* Resize a hash: create a new hash table by doubling the hashsize
 * and inserting the elements into it. Repeat until we succeed or
 * fail due to memory pressure.
 */
static int
mtype_resize(struct ip_set *set, bool retried)
{
	struct htype *h = set->data;
	struct htable *t, *orig;
	u8 htable_bits;
	size_t hsize, dsize = set->dsize;
#ifdef IP_SET_HASH_WITH_NETS
	u8 flags;
	struct mtype_elem *tmp;
#endif
	struct mtype_elem *data;
	struct mtype_elem *d;
	struct hbucket *n, *m;
	struct list_head *l, *lt;
	struct mtype_resize_ad *x;
	u32 i, j, r, nr, key;
	int ret;

#ifdef IP_SET_HASH_WITH_NETS
	tmp = kmalloc(dsize, GFP_KERNEL);
	if (!tmp)
		return -ENOMEM;
#endif
	orig = ipset_dereference_bh_nfnl(h->table);
	htable_bits = orig->htable_bits;

retry:
	ret = 0;
	htable_bits++;
	if (!htable_bits)
		goto hbwarn;
	hsize = htable_size(htable_bits);
	if (!hsize)
		goto hbwarn;
	t = ip_set_alloc(hsize);
	if (!t) {
		ret = -ENOMEM;
		goto out;
	}
	t->hregion = ip_set_alloc(ahash_sizeof_regions(htable_bits));
	if (!t->hregion) {
		ip_set_free(t);
		ret = -ENOMEM;
		goto out;
	}
	t->htable_bits = htable_bits;
	t->maxelem = h->maxelem / ahash_numof_locks(htable_bits);
	for (i = 0; i < ahash_numof_locks(htable_bits); i++)
		spin_lock_init(&t->hregion[i].lock);

	/* There can't be another parallel resizing,
	 * but dumping, gc, kernel side add/del are possible
	 */
	orig = ipset_dereference_bh_nfnl(h->table);
	atomic_set(&orig->ref, 1);
	atomic_inc(&orig->uref);
	pr_debug("attempt to resize set %s from %u to %u, t %p\n",
		 set->name, orig->htable_bits, htable_bits, orig);
netfilter: ipset: Fix "INFO: rcu detected stall in hash_xxx" reports
commit f66ee0410b1c3481ee75e5db9b34547b4d582465 upstream.
In the case of huge hash:* types of sets, due to the single spinlock of
a set the processing of the whole set under spinlock protection could take
too long.
There were four places where the whole hash table of the set was processed
from bucket to bucket under holding the spinlock:
- During resizing a set, the original set was locked to exclude kernel side
add/del element operations (userspace add/del is excluded by the
nfnetlink mutex). The original set is actually just read during the
resize, so the spinlocking is replaced with rcu locking of regions.
However, thus there can be parallel kernel side add/del of entries.
In order not to loose those operations a backlog is added and replayed
after the successful resize.
- Garbage collection of timed out entries was also protected by the spinlock.
In order not to lock too long, region locking is introduced and a single
region is processed in one gc go. Also, the simple timer based gc running
is replaced with a workqueue based solution. The internal book-keeping
(number of elements, size of extensions) is moved to region level due to
the region locking.
- Adding elements: when the max number of the elements is reached, the gc
was called to evict the timed out entries. The new approach is that the gc
is called just for the matching region, assuming that if the region
(proportionally) seems to be full, then the whole set does. We could scan
the other regions to check every entry under rcu locking, but for huge
sets it'd mean a slowdown at adding elements.
- Listing the set header data: when the set was defined with timeout
support, the garbage collector was called to clean up timed out entries
to get the correct element numbers and set size values. Now the set is
scanned to check non-timed out entries, without actually calling the gc
for the whole set.
Thanks to Florian Westphal for helping me to solve the SOFTIRQ-safe ->
SOFTIRQ-unsafe lock order issues during working on the patch.
Reported-by: syzbot+4b0e9d4ff3cf117837e5@syzkaller.appspotmail.com
Reported-by: syzbot+c27b8d5010f45c666ed1@syzkaller.appspotmail.com
Reported-by: syzbot+68a806795ac89df3aa1c@syzkaller.appspotmail.com
Fixes: 23c42a403a9c ("netfilter: ipset: Introduction of new commands and protocol version 7")
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
	for (r = 0; r < ahash_numof_locks(orig->htable_bits); r++) {
		/* Expire may replace a hbucket with another one */
		rcu_read_lock_bh();
		for (i = ahash_bucket_start(r, orig->htable_bits);
		     i < ahash_bucket_end(r, orig->htable_bits); i++) {
			n = __ipset_dereference(hbucket(orig, i));
			if (!n)
				continue;
			for (j = 0; j < n->pos; j++) {
				if (!test_bit(j, n->used))
					continue;
				data = ahash_data(n, j, dsize);
				if (SET_ELEM_EXPIRED(set, data))
					continue;
#ifdef IP_SET_HASH_WITH_NETS
				/* We have readers running parallel with us,
				 * so the live data cannot be modified.
				 */
				flags = 0;
				memcpy(tmp, data, dsize);
				data = tmp;
				mtype_data_reset_flags(data, &flags);
#endif
				key = HKEY(data, h->initval, htable_bits);
				m = __ipset_dereference(hbucket(t, key));
				nr = ahash_region(key, htable_bits);
				if (!m) {
					m = kzalloc(sizeof(*m) +
					    AHASH_INIT_SIZE * dsize,
					    GFP_ATOMIC);
					if (!m) {
						ret = -ENOMEM;
						goto cleanup;
					}
					m->size = AHASH_INIT_SIZE;
					t->hregion[nr].ext_size +=
						ext_size(AHASH_INIT_SIZE,
							 dsize);
					RCU_INIT_POINTER(hbucket(t, key), m);
				} else if (m->pos >= m->size) {
					struct hbucket *ht;

					if (m->size >= AHASH_MAX(h)) {
						ret = -EAGAIN;
					} else {
						ht = kzalloc(sizeof(*ht) +
						    (m->size + AHASH_INIT_SIZE)
						    * dsize,
						    GFP_ATOMIC);
						if (!ht)
							ret = -ENOMEM;
					}
					if (ret < 0)
						goto cleanup;
					memcpy(ht, m, sizeof(struct hbucket) +
					       m->size * dsize);
					ht->size = m->size + AHASH_INIT_SIZE;
					t->hregion[nr].ext_size +=
						ext_size(AHASH_INIT_SIZE,
							 dsize);
					kfree(m);
					m = ht;
					RCU_INIT_POINTER(hbucket(t, key), ht);
				}
				d = ahash_data(m, m->pos, dsize);
				memcpy(d, data, dsize);
				set_bit(m->pos++, m->used);
				t->hregion[nr].elements++;
#ifdef IP_SET_HASH_WITH_NETS
				mtype_data_reset_flags(d, &flags);
#endif
			}
		}
		rcu_read_unlock_bh();
	}

	/* There can't be any other writer. */
	rcu_assign_pointer(h->table, t);

	/* Give time to other readers of the set */
	synchronize_rcu();

	pr_debug("set %s resized from %u (%p) to %u (%p)\n", set->name,
		 orig->htable_bits, orig, t->htable_bits, t);
	/* Add/delete elements processed by the SET target during resize.
	 * Kernel-side add cannot trigger a resize and userspace actions
	 * are serialized by the mutex.
	 */
	list_for_each_safe(l, lt, &h->ad) {
		x = list_entry(l, struct mtype_resize_ad, list);
		if (x->ad == IPSET_ADD) {
			mtype_add(set, &x->d, &x->ext, &x->mext, x->flags);
		} else {
			mtype_del(set, &x->d, NULL, NULL, 0);
		}
		list_del(l);
		kfree(l);
	}

	/* If there's nobody else using the table, destroy it */
	if (atomic_dec_and_test(&orig->uref)) {
		pr_debug("Table destroy by resize %p\n", orig);
		mtype_ahash_destroy(set, orig, false);
	}

out:
#ifdef IP_SET_HASH_WITH_NETS
	kfree(tmp);
#endif
	return ret;

cleanup:
	rcu_read_unlock_bh();
	atomic_set(&orig->ref, 0);
	atomic_dec(&orig->uref);
	mtype_ahash_destroy(set, t, false);
	if (ret == -EAGAIN)
		goto retry;
	goto out;

hbwarn:
	/* In case we have plenty of memory :-) */
	pr_warn("Cannot increase the hashsize of set %s further\n", set->name);
	ret = -IPSET_ERR_HASH_FULL;
	goto out;
}

/* Get the current number of elements and ext_size in the set */
static void
mtype_ext_size(struct ip_set *set, u32 *elements, size_t *ext_size)
{
	struct htype *h = set->data;
	const struct htable *t;
	u32 i, j, r;
	struct hbucket *n;
	struct mtype_elem *data;

	t = rcu_dereference_bh(h->table);
	for (r = 0; r < ahash_numof_locks(t->htable_bits); r++) {
		for (i = ahash_bucket_start(r, t->htable_bits);
		     i < ahash_bucket_end(r, t->htable_bits); i++) {
			n = rcu_dereference_bh(hbucket(t, i));
			if (!n)
				continue;
			for (j = 0; j < n->pos; j++) {
				if (!test_bit(j, n->used))
					continue;
				data = ahash_data(n, j, set->dsize);
				if (!SET_ELEM_EXPIRED(set, data))
					(*elements)++;
			}
		}
		*ext_size += t->hregion[r].ext_size;
	}
}

/* Add an element to a hash and update the internal counters when succeeded,
 * otherwise report the proper error code.
 */
static int
mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
	  struct ip_set_ext *mext, u32 flags)
{
	struct htype *h = set->data;
	struct htable *t;
	const struct mtype_elem *d = value;
	struct mtype_elem *data;
	struct hbucket *n, *old = ERR_PTR(-ENOENT);
	int i, j = -1, ret;
	bool flag_exist = flags & IPSET_FLAG_EXIST;
	bool deleted = false, forceadd = false, reuse = false;
	u32 r, key, multi = 0, elements, maxelem;
	rcu_read_lock_bh();
	t = rcu_dereference_bh(h->table);
	key = HKEY(value, h->initval, t->htable_bits);
	r = ahash_region(key, t->htable_bits);
	atomic_inc(&t->uref);
	elements = t->hregion[r].elements;
	maxelem = t->maxelem;
	if (elements >= maxelem) {
		u32 e;
		if (SET_WITH_TIMEOUT(set)) {
			rcu_read_unlock_bh();
			mtype_gc_do(set, h, t, r);
			rcu_read_lock_bh();
		}
		maxelem = h->maxelem;
		elements = 0;
		for (e = 0; e < ahash_numof_locks(t->htable_bits); e++)
			elements += t->hregion[e].elements;
		if (elements >= maxelem && SET_WITH_FORCEADD(set))
			forceadd = true;
	}
	rcu_read_unlock_bh();

	spin_lock_bh(&t->hregion[r].lock);
	n = rcu_dereference_bh(hbucket(t, key));
	if (!n) {
		if (forceadd || elements >= maxelem)
			goto set_full;
		old = NULL;
		n = kzalloc(sizeof(*n) + AHASH_INIT_SIZE * set->dsize,
			    GFP_ATOMIC);
		if (!n) {
			ret = -ENOMEM;
			goto unlock;
		}
		n->size = AHASH_INIT_SIZE;
		t->hregion[r].ext_size +=
			ext_size(AHASH_INIT_SIZE, set->dsize);
		goto copy_elem;
	}
	for (i = 0; i < n->pos; i++) {
		if (!test_bit(i, n->used)) {
			/* Reuse first deleted entry */
			if (j == -1) {
				deleted = reuse = true;
				j = i;
			}
			continue;
		}
		data = ahash_data(n, i, set->dsize);
		if (mtype_data_equal(data, d, &multi)) {
|
|
|
			if (flag_exist || SET_ELEM_EXPIRED(set, data)) {
				/* Just the extensions could be overwritten */
				j = i;
				goto overwrite_extensions;
			}

			ret = -IPSET_ERR_EXIST;
			goto unlock;
		}

		/* Reuse first timed out entry */
		if (SET_ELEM_EXPIRED(set, data) && j == -1) {
			j = i;
			reuse = true;
		}
	}
	if (reuse || forceadd) {
		if (j == -1)
			j = 0;
		data = ahash_data(n, j, set->dsize);
		if (!deleted) {
#ifdef IP_SET_HASH_WITH_NETS
			for (i = 0; i < IPSET_NET_COUNT; i++)
				mtype_del_cidr(set, h,
					NCIDR_PUT(DCIDR_GET(data->cidr, i)),
					i);
#endif
			ip_set_ext_destroy(set, data);
			t->hregion[r].elements--;
		}
		goto copy_data;
	}
	if (elements >= maxelem)
		goto set_full;
	/* Create a new slot */
	if (n->pos >= n->size) {
		TUNE_AHASH_MAX(h, multi);
		if (n->size >= AHASH_MAX(h)) {
			/* Trigger rehashing */
			mtype_data_next(&h->next, d);
			ret = -EAGAIN;
			goto resize;
		}
		old = n;
		n = kzalloc(sizeof(*n) +
			    (old->size + AHASH_INIT_SIZE) * set->dsize,
			    GFP_ATOMIC);
		if (!n) {
			ret = -ENOMEM;
			goto unlock;
		}
		memcpy(n, old, sizeof(struct hbucket) +
		       old->size * set->dsize);
		n->size = old->size + AHASH_INIT_SIZE;
		t->hregion[r].ext_size +=
			ext_size(AHASH_INIT_SIZE, set->dsize);
	}

copy_elem:
	j = n->pos++;
	data = ahash_data(n, j, set->dsize);
copy_data:
	t->hregion[r].elements++;
#ifdef IP_SET_HASH_WITH_NETS
	for (i = 0; i < IPSET_NET_COUNT; i++)
		mtype_add_cidr(set, h, NCIDR_PUT(DCIDR_GET(d->cidr, i)), i);
#endif
	memcpy(data, d, sizeof(struct mtype_elem));
overwrite_extensions:
#ifdef IP_SET_HASH_WITH_NETS
	mtype_data_set_flags(data, flags);
#endif
	if (SET_WITH_COUNTER(set))
		ip_set_init_counter(ext_counter(data, set), ext);
	if (SET_WITH_COMMENT(set))
		ip_set_init_comment(set, ext_comment(data, set), ext);
	if (SET_WITH_SKBINFO(set))
		ip_set_init_skbinfo(ext_skbinfo(data, set), ext);
	/* Must come last for the case when timed out entry is reused */
	if (SET_WITH_TIMEOUT(set))
		ip_set_timeout_set(ext_timeout(data, set), ext->timeout);
	smp_mb__before_atomic();
	set_bit(j, n->used);
	if (old != ERR_PTR(-ENOENT)) {
		rcu_assign_pointer(hbucket(t, key), n);
		if (old)
			kfree_rcu(old, rcu);
	}
ret = 0;
|
|
|
|
resize:
|
|
|
|
spin_unlock_bh(&t->hregion[r].lock);
|
|
|
|
if (atomic_read(&t->ref) && ext->target) {
|
|
|
|
/* Resize is in process and kernel side add, save values */
|
|
|
|
struct mtype_resize_ad *x;
|
|
|
|
|
|
|
|
x = kzalloc(sizeof(struct mtype_resize_ad), GFP_ATOMIC);
|
|
|
|
if (!x)
|
|
|
|
/* Don't bother */
|
|
|
|
goto out;
|
|
|
|
x->ad = IPSET_ADD;
|
|
|
|
memcpy(&x->d, value, sizeof(struct mtype_elem));
|
|
|
|
memcpy(&x->ext, ext, sizeof(struct ip_set_ext));
|
|
|
|
memcpy(&x->mext, mext, sizeof(struct ip_set_ext));
|
|
|
|
x->flags = flags;
|
|
|
|
spin_lock_bh(&set->lock);
|
|
|
|
list_add_tail(&x->list, &h->ad);
|
|
|
|
spin_unlock_bh(&set->lock);
|
|
|
|
}
|
|
|
|
goto out;
set_full:
	if (net_ratelimit())
		pr_warn("Set %s is full, maxelem %u reached\n",
			set->name, maxelem);
	ret = -IPSET_ERR_HASH_FULL;
unlock:
	spin_unlock_bh(&t->hregion[r].lock);
out:
	if (atomic_dec_and_test(&t->uref) && atomic_read(&t->ref)) {
		pr_debug("Table destroy after resize by add: %p\n", t);
		mtype_ahash_destroy(set, t, false);
	}
	return ret;
}

/* Delete an element from the hash and free up space if possible.
 */
static int
mtype_del(struct ip_set *set, void *value, const struct ip_set_ext *ext,
	  struct ip_set_ext *mext, u32 flags)
{
	struct htype *h = set->data;
	struct htable *t;
	const struct mtype_elem *d = value;
	struct mtype_elem *data;
	struct hbucket *n;
	struct mtype_resize_ad *x = NULL;
	int i, j, k, r, ret = -IPSET_ERR_EXIST;
	u32 key, multi = 0;
	size_t dsize = set->dsize;

	/* Userspace add and resize is excluded by the mutex.
	 * Kernelspace add does not trigger resize.
	 */
	rcu_read_lock_bh();
	t = rcu_dereference_bh(h->table);
	key = HKEY(value, h->initval, t->htable_bits);
	r = ahash_region(key, t->htable_bits);
	atomic_inc(&t->uref);
	rcu_read_unlock_bh();

	spin_lock_bh(&t->hregion[r].lock);
	n = rcu_dereference_bh(hbucket(t, key));
	if (!n)
		goto out;
	for (i = 0, k = 0; i < n->pos; i++) {
		if (!test_bit(i, n->used)) {
			k++;
			continue;
		}
		data = ahash_data(n, i, dsize);
		if (!mtype_data_equal(data, d, &multi))
			continue;
		if (SET_ELEM_EXPIRED(set, data))
			goto out;

		ret = 0;
		clear_bit(i, n->used);
		smp_mb__after_atomic();
		if (i + 1 == n->pos)
			n->pos--;
		t->hregion[r].elements--;
#ifdef IP_SET_HASH_WITH_NETS
		for (j = 0; j < IPSET_NET_COUNT; j++)
			mtype_del_cidr(set, h,
				       NCIDR_PUT(DCIDR_GET(d->cidr, j)), j);
#endif
		ip_set_ext_destroy(set, data);

		if (atomic_read(&t->ref) && ext->target) {
			/* Resize is in process and kernel side del,
			 * save values
			 */
			x = kzalloc(sizeof(struct mtype_resize_ad),
				    GFP_ATOMIC);
			if (x) {
				x->ad = IPSET_DEL;
				memcpy(&x->d, value,
				       sizeof(struct mtype_elem));
				x->flags = flags;
			}
		}
		for (; i < n->pos; i++) {
			if (!test_bit(i, n->used))
				k++;
		}
		if (n->pos == 0 && k == 0) {
			t->hregion[r].ext_size -= ext_size(n->size, dsize);
			rcu_assign_pointer(hbucket(t, key), NULL);
			kfree_rcu(n, rcu);
		} else if (k >= AHASH_INIT_SIZE) {
			struct hbucket *tmp = kzalloc(sizeof(*tmp) +
					(n->size - AHASH_INIT_SIZE) * dsize,
					GFP_ATOMIC);
			if (!tmp)
				goto out;
			tmp->size = n->size - AHASH_INIT_SIZE;
			for (j = 0, k = 0; j < n->pos; j++) {
				if (!test_bit(j, n->used))
					continue;
				data = ahash_data(n, j, dsize);
				memcpy(tmp->value + k * dsize, data, dsize);
				set_bit(k, tmp->used);
				k++;
			}
			tmp->pos = k;
			t->hregion[r].ext_size -=
				ext_size(AHASH_INIT_SIZE, dsize);
			rcu_assign_pointer(hbucket(t, key), tmp);
			kfree_rcu(n, rcu);
		}
		goto out;
	}

out:
	spin_unlock_bh(&t->hregion[r].lock);
	if (x) {
		spin_lock_bh(&set->lock);
		list_add(&x->list, &h->ad);
		spin_unlock_bh(&set->lock);
	}
	if (atomic_dec_and_test(&t->uref) && atomic_read(&t->ref)) {
		pr_debug("Table destroy after resize by del: %p\n", t);
		mtype_ahash_destroy(set, t, false);
	}
	return ret;
}
|
|
|
|
|
|
|
|
static inline int
|
|
|
|
mtype_data_match(struct mtype_elem *data, const struct ip_set_ext *ext,
|
|
|
|
struct ip_set_ext *mext, struct ip_set *set, u32 flags)
|
|
|
|
{
|
2018-01-06 07:22:01 -07:00
|
|
|
if (!ip_set_match_extensions(set, ext, mext, flags, data))
|
|
|
|
return 0;
|
|
|
|
/* nomatch entries return -ENOTEMPTY */
|
2013-04-08 13:05:44 -06:00
|
|
|
return mtype_do_data_match(data);
|
|
|
|
}
|
|
|
|
|
|
|
|
#ifdef IP_SET_HASH_WITH_NETS
|
|
|
|
/* Special test function which takes into account the different network
|
2015-06-13 11:45:33 -06:00
|
|
|
* sizes added to the set
|
|
|
|
*/
|
2013-04-08 13:05:44 -06:00
|
|
|
static int
|
|
|
|
mtype_test_cidrs(struct ip_set *set, struct mtype_elem *d,
|
|
|
|
const struct ip_set_ext *ext,
|
|
|
|
struct ip_set_ext *mext, u32 flags)
|
|
|
|
{
|
|
|
|
struct htype *h = set->data;
|
2013-04-30 13:23:18 -06:00
|
|
|
struct htable *t = rcu_dereference_bh(h->table);
|
2013-04-08 13:05:44 -06:00
|
|
|
struct hbucket *n;
|
|
|
|
struct mtype_elem *data;
|
2013-09-20 02:13:53 -06:00
|
|
|
#if IPSET_NET_COUNT == 2
|
|
|
|
struct mtype_elem orig = *d;
|
2018-01-06 07:22:01 -07:00
|
|
|
int ret, i, j = 0, k;
|
2013-09-20 02:13:53 -06:00
|
|
|
#else
|
2018-01-06 07:22:01 -07:00
|
|
|
int ret, i, j = 0;
|
2013-09-20 02:13:53 -06:00
|
|
|
#endif
|
2013-04-08 13:05:44 -06:00
|
|
|
u32 key, multi = 0;
|
|
|
|
|
|
|
|
pr_debug("test by nets\n");
|
2016-11-10 04:24:10 -07:00
|
|
|
for (; j < NLEN && h->nets[j].cidr[0] && !multi; j++) {
|
2013-09-20 02:13:53 -06:00
|
|
|
#if IPSET_NET_COUNT == 2
|
|
|
|
mtype_data_reset_elem(d, &orig);
|
2015-06-12 14:11:00 -06:00
|
|
|
mtype_data_netmask(d, NCIDR_GET(h->nets[j].cidr[0]), false);
|
2016-11-10 04:24:10 -07:00
|
|
|
for (k = 0; k < NLEN && h->nets[k].cidr[1] && !multi;
|
2013-09-20 02:13:53 -06:00
|
|
|
k++) {
|
2015-06-12 14:11:00 -06:00
|
|
|
mtype_data_netmask(d, NCIDR_GET(h->nets[k].cidr[1]),
|
|
|
|
true);
|
2013-09-20 02:13:53 -06:00
|
|
|
#else
|
2015-06-12 14:11:00 -06:00
|
|
|
mtype_data_netmask(d, NCIDR_GET(h->nets[j].cidr[0]));
|
2013-09-20 02:13:53 -06:00
|
|
|
#endif
|
2013-04-08 13:05:44 -06:00
|
|
|
key = HKEY(d, h->initval, t->htable_bits);
|
2019-07-15 20:13:01 -06:00
|
|
|
n = rcu_dereference_bh(hbucket(t, key));
|
2015-06-13 09:29:56 -06:00
|
|
|
if (!n)
|
|
|
|
continue;
|
2013-04-08 13:05:44 -06:00
|
|
|
for (i = 0; i < n->pos; i++) {
|
2015-06-13 09:29:56 -06:00
|
|
|
if (!test_bit(i, n->used))
|
|
|
|
continue;
|
2013-09-06 16:10:07 -06:00
|
|
|
data = ahash_data(n, i, set->dsize);
|
2013-04-08 13:05:44 -06:00
|
|
|
if (!mtype_data_equal(data, d, &multi))
|
|
|
|
continue;
|
2018-01-06 07:22:01 -07:00
|
|
|
ret = mtype_data_match(data, ext, mext, set, flags);
|
|
|
|
if (ret != 0)
|
|
|
|
return ret;
|
2013-04-08 13:05:44 -06:00
|
|
|
#ifdef IP_SET_HASH_WITH_MULTI
|
2018-01-06 07:22:01 -07:00
|
|
|
/* No match, reset multiple match flag */
|
|
|
|
multi = 0;
|
2013-04-08 13:05:44 -06:00
|
|
|
#endif
|
|
|
|
}
|
2013-09-20 02:13:53 -06:00
|
|
|
#if IPSET_NET_COUNT == 2
|
|
|
|
}
|
|
|
|
#endif
|
2013-04-08 13:05:44 -06:00
|
|
|
}
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
|
|
|
/* Test whether the element is added to the set */
|
|
|
|
static int
|
|
|
|
mtype_test(struct ip_set *set, void *value, const struct ip_set_ext *ext,
|
|
|
|
struct ip_set_ext *mext, u32 flags)
|
|
|
|
{
|
|
|
|
struct htype *h = set->data;
|
2013-04-30 13:23:18 -06:00
|
|
|
struct htable *t;
|
2013-04-08 13:05:44 -06:00
|
|
|
struct mtype_elem *d = value;
|
|
|
|
struct hbucket *n;
|
|
|
|
struct mtype_elem *data;
|
2013-04-30 13:23:18 -06:00
|
|
|
int i, ret = 0;
|
2013-04-08 13:05:44 -06:00
|
|
|
u32 key, multi = 0;
|
|
|
|
|
2020-02-11 15:20:43 -07:00
|
|
|
rcu_read_lock_bh();
|
2013-04-30 13:23:18 -06:00
|
|
|
t = rcu_dereference_bh(h->table);
|
2013-04-08 13:05:44 -06:00
|
|
|
#ifdef IP_SET_HASH_WITH_NETS
|
|
|
|
/* If we test an IP address and not a network address,
|
2015-06-13 11:45:33 -06:00
|
|
|
* try all possible network sizes
|
|
|
|
*/
|
2013-09-20 02:13:53 -06:00
|
|
|
for (i = 0; i < IPSET_NET_COUNT; i++)
|
2016-11-10 04:24:10 -07:00
|
|
|
if (DCIDR_GET(d->cidr, i) != HOST_MASK)
|
2013-09-20 02:13:53 -06:00
|
|
|
break;
|
|
|
|
if (i == IPSET_NET_COUNT) {
|
2013-04-30 13:23:18 -06:00
|
|
|
ret = mtype_test_cidrs(set, d, ext, mext, flags);
|
|
|
|
goto out;
|
|
|
|
}
|
2013-04-08 13:05:44 -06:00
|
|
|
#endif
|
|
|
|
|
|
|
|
key = HKEY(d, h->initval, t->htable_bits);
|
2015-06-13 09:29:56 -06:00
|
|
|
n = rcu_dereference_bh(hbucket(t, key));
|
|
|
|
if (!n) {
|
|
|
|
ret = 0;
|
|
|
|
goto out;
|
|
|
|
}
|
2013-04-08 13:05:44 -06:00
|
|
|
for (i = 0; i < n->pos; i++) {
|
2015-06-13 09:29:56 -06:00
|
|
|
if (!test_bit(i, n->used))
|
|
|
|
continue;
|
2013-09-06 16:10:07 -06:00
|
|
|
data = ahash_data(n, i, set->dsize);
|
2018-01-06 07:22:01 -07:00
|
|
|
if (!mtype_data_equal(data, d, &multi))
|
|
|
|
continue;
|
|
|
|
ret = mtype_data_match(data, ext, mext, set, flags);
|
|
|
|
if (ret != 0)
|
2013-04-30 13:23:18 -06:00
|
|
|
goto out;
|
2013-04-08 13:05:44 -06:00
|
|
|
}
|
2013-04-30 13:23:18 -06:00
|
|
|
out:
|
2020-02-11 15:20:43 -07:00
|
|
|
rcu_read_unlock_bh();
|
2013-04-30 13:23:18 -06:00
|
|
|
return ret;
|
2013-04-08 13:05:44 -06:00
|
|
|
}
|
|
|
|
|
|
|
|
/* Reply a HEADER request: fill out the header part of the set */
|
|
|
|
static int
|
|
|
|
mtype_head(struct ip_set *set, struct sk_buff *skb)
|
|
|
|
{
|
2017-09-11 13:52:40 -06:00
|
|
|
struct htype *h = set->data;
|
2013-04-30 13:23:18 -06:00
|
|
|
const struct htable *t;
|
2013-04-08 13:05:44 -06:00
|
|
|
struct nlattr *nested;
|
|
|
|
size_t memsize;
|
2020-02-11 15:20:43 -07:00
|
|
|
u32 elements = 0;
|
|
|
|
size_t ext_size = 0;
|
2015-06-13 09:29:56 -06:00
|
|
|
u8 htable_bits;
|
2013-04-08 13:05:44 -06:00
|
|
|
|
2015-06-13 09:29:56 -06:00
|
|
|
rcu_read_lock_bh();
|
2020-02-11 15:20:43 -07:00
|
|
|
t = rcu_dereference_bh(h->table);
|
|
|
|
mtype_ext_size(set, &elements, &ext_size);
|
|
|
|
memsize = mtype_ahash_memsize(h, t) + ext_size + set->ext_size;
|
2015-06-13 09:29:56 -06:00
|
|
|
htable_bits = t->htable_bits;
|
|
|
|
rcu_read_unlock_bh();
|
2013-04-08 13:05:44 -06:00
|
|
|
|
2019-04-26 03:13:09 -06:00
|
|
|
nested = nla_nest_start(skb, IPSET_ATTR_DATA);
|
2013-04-08 13:05:44 -06:00
|
|
|
if (!nested)
|
|
|
|
goto nla_put_failure;
|
|
|
|
if (nla_put_net32(skb, IPSET_ATTR_HASHSIZE,
|
2015-06-13 09:29:56 -06:00
|
|
|
htonl(jhash_size(htable_bits))) ||
|
2013-04-08 13:05:44 -06:00
|
|
|
nla_put_net32(skb, IPSET_ATTR_MAXELEM, htonl(h->maxelem)))
|
|
|
|
goto nla_put_failure;
|
|
|
|
#ifdef IP_SET_HASH_WITH_NETMASK
|
|
|
|
if (h->netmask != HOST_MASK &&
|
|
|
|
nla_put_u8(skb, IPSET_ATTR_NETMASK, h->netmask))
|
|
|
|
goto nla_put_failure;
|
2013-12-17 07:01:44 -07:00
|
|
|
#endif
|
|
|
|
#ifdef IP_SET_HASH_WITH_MARKMASK
|
|
|
|
if (nla_put_u32(skb, IPSET_ATTR_MARKMASK, h->markmask))
|
|
|
|
goto nla_put_failure;
|
2013-04-08 13:05:44 -06:00
|
|
|
#endif
|
2016-03-16 14:49:00 -06:00
|
|
|
if (nla_put_net32(skb, IPSET_ATTR_REFERENCES, htonl(set->ref)) ||
|
2016-10-10 13:59:21 -06:00
|
|
|
nla_put_net32(skb, IPSET_ATTR_MEMSIZE, htonl(memsize)) ||
|
2020-02-11 15:20:43 -07:00
|
|
|
nla_put_net32(skb, IPSET_ATTR_ELEMENTS, htonl(elements)))
|
2013-09-22 12:56:31 -06:00
|
|
|
goto nla_put_failure;
|
|
|
|
if (unlikely(ip_set_put_flags(skb, set)))
|
2013-04-08 13:05:44 -06:00
|
|
|
goto nla_put_failure;
|
2019-04-26 03:13:09 -06:00
|
|
|
nla_nest_end(skb, nested);
|
2013-04-08 13:05:44 -06:00
|
|
|
|
|
|
|
return 0;
|
|
|
|
nla_put_failure:
|
|
|
|
return -EMSGSIZE;
|
|
|
|
}
|
|
|
|
|
2015-06-13 03:59:45 -06:00
|
|
|
/* Make possible to run dumping parallel with resizing */
|
|
|
|
static void
|
|
|
|
mtype_uref(struct ip_set *set, struct netlink_callback *cb, bool start)
|
|
|
|
{
|
|
|
|
struct htype *h = set->data;
|
|
|
|
struct htable *t;
|
|
|
|
|
|
|
|
if (start) {
|
|
|
|
rcu_read_lock_bh();
|
2020-02-11 15:20:43 -07:00
|
|
|
t = ipset_dereference_bh_nfnl(h->table);
|
2015-06-13 03:59:45 -06:00
|
|
|
atomic_inc(&t->uref);
|
|
|
|
cb->args[IPSET_CB_PRIVATE] = (unsigned long)t;
|
|
|
|
rcu_read_unlock_bh();
|
|
|
|
} else if (cb->args[IPSET_CB_PRIVATE]) {
|
|
|
|
t = (struct htable *)cb->args[IPSET_CB_PRIVATE];
|
|
|
|
if (atomic_dec_and_test(&t->uref) && atomic_read(&t->ref)) {
|
2020-02-11 15:20:43 -07:00
|
|
|
pr_debug("Table destroy after resize by dump: %p\n", t);
|
2015-06-13 03:59:45 -06:00
|
|
|
mtype_ahash_destroy(set, t, false);
|
|
|
|
}
|
|
|
|
cb->args[IPSET_CB_PRIVATE] = 0;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2013-04-08 13:05:44 -06:00
|
|
|
/* Reply a LIST/SAVE request: dump the elements of the specified set */
|
|
|
|
static int
|
|
|
|
mtype_list(const struct ip_set *set,
|
|
|
|
struct sk_buff *skb, struct netlink_callback *cb)
|
|
|
|
{
|
2015-06-13 03:59:45 -06:00
|
|
|
const struct htable *t;
|
2013-04-08 13:05:44 -06:00
|
|
|
struct nlattr *atd, *nested;
|
|
|
|
const struct hbucket *n;
|
|
|
|
const struct mtype_elem *e;
|
2013-10-18 03:41:55 -06:00
|
|
|
u32 first = cb->args[IPSET_CB_ARG0];
|
2013-04-08 13:05:44 -06:00
|
|
|
/* We assume that one hash bucket fits into one page */
|
|
|
|
void *incomplete;
|
2015-06-13 09:29:56 -06:00
|
|
|
int i, ret = 0;
|
2013-04-08 13:05:44 -06:00
|
|
|
|
2019-04-26 03:13:09 -06:00
|
|
|
atd = nla_nest_start(skb, IPSET_ATTR_ADT);
|
2013-04-08 13:05:44 -06:00
|
|
|
if (!atd)
|
|
|
|
return -EMSGSIZE;
|
2015-06-13 09:29:56 -06:00
|
|
|
|
2013-04-08 13:05:44 -06:00
|
|
|
pr_debug("list hash set %s\n", set->name);
|
2015-06-13 03:59:45 -06:00
|
|
|
t = (const struct htable *)cb->args[IPSET_CB_PRIVATE];
|
2015-06-13 09:29:56 -06:00
|
|
|
/* Expire may replace a hbucket with another one */
|
|
|
|
rcu_read_lock();
|
2013-10-18 03:41:55 -06:00
|
|
|
for (; cb->args[IPSET_CB_ARG0] < jhash_size(t->htable_bits);
|
|
|
|
cb->args[IPSET_CB_ARG0]++) {
|
2017-11-30 13:08:05 -07:00
|
|
|
cond_resched_rcu();
|
2013-04-08 13:05:44 -06:00
|
|
|
incomplete = skb_tail_pointer(skb);
|
2015-06-13 09:29:56 -06:00
|
|
|
n = rcu_dereference(hbucket(t, cb->args[IPSET_CB_ARG0]));
|
2013-10-18 03:41:55 -06:00
|
|
|
pr_debug("cb->arg bucket: %lu, t %p n %p\n",
|
|
|
|
cb->args[IPSET_CB_ARG0], t, n);
|
2015-06-13 09:29:56 -06:00
|
|
|
if (!n)
|
|
|
|
continue;
|
2013-04-08 13:05:44 -06:00
|
|
|
for (i = 0; i < n->pos; i++) {
|
2015-06-13 09:29:56 -06:00
|
|
|
if (!test_bit(i, n->used))
|
|
|
|
continue;
|
2013-09-06 16:10:07 -06:00
|
|
|
e = ahash_data(n, i, set->dsize);
|
			if (SET_ELEM_EXPIRED(set, e))
				continue;
			pr_debug("list hash %lu hbucket %p i %u, data %p\n",
				 cb->args[IPSET_CB_ARG0], n, i, e);
			nested = nla_nest_start(skb, IPSET_ATTR_DATA);
			if (!nested) {
				if (cb->args[IPSET_CB_ARG0] == first) {
					nla_nest_cancel(skb, atd);
					ret = -EMSGSIZE;
					goto out;
				}
				goto nla_put_failure;
			}
			if (mtype_data_list(skb, e))
				goto nla_put_failure;
			if (ip_set_put_extensions(skb, set, e, true))
				goto nla_put_failure;
			nla_nest_end(skb, nested);
		}
	}
	nla_nest_end(skb, atd);
	/* Set listing finished */
	cb->args[IPSET_CB_ARG0] = 0;

	goto out;

nla_put_failure:
	nlmsg_trim(skb, incomplete);
	if (unlikely(first == cb->args[IPSET_CB_ARG0])) {
		pr_warn("Can't list set %s: one bucket does not fit into a message. Please report it!\n",
			set->name);
		cb->args[IPSET_CB_ARG0] = 0;
		ret = -EMSGSIZE;
	} else {
		nla_nest_end(skb, atd);
	}
out:
	rcu_read_unlock();
	return ret;
}

static int
IPSET_TOKEN(MTYPE, _kadt)(struct ip_set *set, const struct sk_buff *skb,
			  const struct xt_action_param *par,
			  enum ipset_adt adt, struct ip_set_adt_opt *opt);

static int
IPSET_TOKEN(MTYPE, _uadt)(struct ip_set *set, struct nlattr *tb[],
			  enum ipset_adt adt, u32 *lineno, u32 flags,
			  bool retried);

static const struct ip_set_type_variant mtype_variant = {
	.kadt	= mtype_kadt,
	.uadt	= mtype_uadt,
	.adt	= {
		[IPSET_ADD] = mtype_add,
		[IPSET_DEL] = mtype_del,
		[IPSET_TEST] = mtype_test,
	},
	.destroy = mtype_destroy,
	.flush	= mtype_flush,
	.head	= mtype_head,
	.list	= mtype_list,
	.uref	= mtype_uref,
	.resize	= mtype_resize,
	.same_set = mtype_same_set,
	.region_lock = true,
};

#ifdef IP_SET_EMIT_CREATE
static int
IPSET_TOKEN(HTYPE, _create)(struct net *net, struct ip_set *set,
			    struct nlattr *tb[], u32 flags)
{
	u32 hashsize = IPSET_DEFAULT_HASHSIZE, maxelem = IPSET_DEFAULT_MAXELEM;
#ifdef IP_SET_HASH_WITH_MARKMASK
	u32 markmask;
#endif
	u8 hbits;
#ifdef IP_SET_HASH_WITH_NETMASK
	u8 netmask;
#endif
	size_t hsize;
	struct htype *h;
	struct htable *t;
	u32 i;

	pr_debug("Create set %s with family %s\n",
		 set->name, set->family == NFPROTO_IPV4 ? "inet" : "inet6");

#ifdef IP_SET_PROTO_UNDEF
	if (set->family != NFPROTO_UNSPEC)
		return -IPSET_ERR_INVALID_FAMILY;
#else
	if (!(set->family == NFPROTO_IPV4 || set->family == NFPROTO_IPV6))
		return -IPSET_ERR_INVALID_FAMILY;
#endif

	if (unlikely(!ip_set_optattr_netorder(tb, IPSET_ATTR_HASHSIZE) ||
		     !ip_set_optattr_netorder(tb, IPSET_ATTR_MAXELEM) ||
		     !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) ||
		     !ip_set_optattr_netorder(tb, IPSET_ATTR_CADT_FLAGS)))
		return -IPSET_ERR_PROTOCOL;

#ifdef IP_SET_HASH_WITH_MARKMASK
	/* Separated condition in order to avoid directive in argument list */
	if (unlikely(!ip_set_optattr_netorder(tb, IPSET_ATTR_MARKMASK)))
		return -IPSET_ERR_PROTOCOL;

	markmask = 0xffffffff;
	if (tb[IPSET_ATTR_MARKMASK]) {
		markmask = ntohl(nla_get_be32(tb[IPSET_ATTR_MARKMASK]));
		if (markmask == 0)
			return -IPSET_ERR_INVALID_MARKMASK;
	}
#endif

#ifdef IP_SET_HASH_WITH_NETMASK
	netmask = set->family == NFPROTO_IPV4 ? 32 : 128;
	if (tb[IPSET_ATTR_NETMASK]) {
		netmask = nla_get_u8(tb[IPSET_ATTR_NETMASK]);

		if ((set->family == NFPROTO_IPV4 && netmask > 32) ||
		    (set->family == NFPROTO_IPV6 && netmask > 128) ||
		    netmask == 0)
			return -IPSET_ERR_INVALID_NETMASK;
	}
#endif

	if (tb[IPSET_ATTR_HASHSIZE]) {
		hashsize = ip_set_get_h32(tb[IPSET_ATTR_HASHSIZE]);
		if (hashsize < IPSET_MIMINAL_HASHSIZE)
			hashsize = IPSET_MIMINAL_HASHSIZE;
	}

	if (tb[IPSET_ATTR_MAXELEM])
		maxelem = ip_set_get_h32(tb[IPSET_ATTR_MAXELEM]);

	hsize = sizeof(*h);
	h = kzalloc(hsize, GFP_KERNEL);
	if (!h)
		return -ENOMEM;

	/* Compute htable_bits from the user input parameter hashsize.
	 * Assume that hashsize == 2^htable_bits,
	 * otherwise round up to the first 2^n value.
	 */
	hbits = fls(hashsize - 1);
	hsize = htable_size(hbits);
	if (hsize == 0) {
		kfree(h);
		return -ENOMEM;
	}
	t = ip_set_alloc(hsize);
	if (!t) {
		kfree(h);
		return -ENOMEM;
	}
	t->hregion = ip_set_alloc(ahash_sizeof_regions(hbits));
	if (!t->hregion) {
		ip_set_free(t);
		kfree(h);
		return -ENOMEM;
	}
	h->gc.set = set;
	for (i = 0; i < ahash_numof_locks(hbits); i++)
		spin_lock_init(&t->hregion[i].lock);
	h->maxelem = maxelem;
#ifdef IP_SET_HASH_WITH_NETMASK
	h->netmask = netmask;
#endif
#ifdef IP_SET_HASH_WITH_MARKMASK
	h->markmask = markmask;
#endif
	get_random_bytes(&h->initval, sizeof(h->initval));

	t->htable_bits = hbits;
	t->maxelem = h->maxelem / ahash_numof_locks(hbits);
	RCU_INIT_POINTER(h->table, t);

	INIT_LIST_HEAD(&h->ad);
	set->data = h;
#ifndef IP_SET_PROTO_UNDEF
	if (set->family == NFPROTO_IPV4) {
#endif
		set->variant = &IPSET_TOKEN(HTYPE, 4_variant);
		set->dsize = ip_set_elem_len(set, tb,
				sizeof(struct IPSET_TOKEN(HTYPE, 4_elem)),
				__alignof__(struct IPSET_TOKEN(HTYPE, 4_elem)));
#ifndef IP_SET_PROTO_UNDEF
	} else {
		set->variant = &IPSET_TOKEN(HTYPE, 6_variant);
		set->dsize = ip_set_elem_len(set, tb,
				sizeof(struct IPSET_TOKEN(HTYPE, 6_elem)),
				__alignof__(struct IPSET_TOKEN(HTYPE, 6_elem)));
	}
#endif
	set->timeout = IPSET_NO_TIMEOUT;
	if (tb[IPSET_ATTR_TIMEOUT]) {
		set->timeout = ip_set_timeout_uget(tb[IPSET_ATTR_TIMEOUT]);
#ifndef IP_SET_PROTO_UNDEF
		if (set->family == NFPROTO_IPV4)
#endif
			IPSET_TOKEN(HTYPE, 4_gc_init)(&h->gc);
#ifndef IP_SET_PROTO_UNDEF
		else
			IPSET_TOKEN(HTYPE, 6_gc_init)(&h->gc);
#endif
	}
	pr_debug("create %s hashsize %u (%u) maxelem %u: %p(%p)\n",
		 set->name, jhash_size(t->htable_bits),
		 t->htable_bits, h->maxelem, set->data, t);

	return 0;
}
#endif /* IP_SET_EMIT_CREATE */

#undef HKEY_DATALEN