redonkable/alistair23-linux

Author	SHA1	Message	Date
Jeff Layton	441d367644	iversion: add a routine to update a raw value with a larger one Under ceph, clients can be independently updating iversion themselves, while working under comprehensive sets of caps on an inode. In that situation we always want to prefer the largest value of a change attribute. Add a new function that will update a raw value with a larger one, but otherwise leave it alone. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2019-07-08 14:01:43 +02:00
Goffredo Baroncelli	c472c07bfe	iversion: Rename make inode_cmp_iversion{+raw} to inode_eq_iversion{+raw} The function inode_cmp_iversion{+raw} is counter-intuitive, because it returns true when the counters are different and false when these are equal. Rename it to inode_eq_iversion{+raw}, which will returns true when the counters are equal and false otherwise. Signed-off-by: Goffredo Baroncelli <kreijack@inwind.it> Signed-off-by: Jeff Layton <jlayton@redhat.com>	2018-02-01 08:15:25 -05:00
Jeff Layton	c0cef30e4f	iversion: make inode_cmp_iversion{+raw} return bool instead of s64 As Linus points out: The inode_cmp_iversion{+raw}() functions are pure and utter crap. Why? You say that they return 0/negative/positive, but they do so in a completely broken manner. They return that ternary value as the sequence number difference in a 's64', which means that if you actually care about that ternary value, and do the sane thing that the kernel-doc of the function implies is the right thing, you would do int cmp = inode_cmp_iversion(inode, old); if (cmp < 0 ... and as a result you get code that looks sane, but that doesn't actually WORK right. Since none of the callers actually care about the ternary value here, convert the inode_cmp_iversion{+raw} functions to just return a boolean value (false for matching, true for non-matching). This matches the existing use of these functions just fine, and makes it simple to convert them to return a ternary value in the future if we grow callers that need it. With this change we can also reimplement inode_cmp_iversion in a simpler way using inode_peek_iversion. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-01-31 08:43:35 -08:00
Jeff Layton	f02a9ad1f1	fs: handle inode->i_version more efficiently Since i_version is mostly treated as an opaque value, we can exploit that fact to avoid incrementing it when no one is watching. With that change, we can avoid incrementing the counter on writes, unless someone has queried for it since it was last incremented. If the a/c/mtime don't change, and the i_version hasn't changed, then there's no need to dirty the inode metadata on a write. Convert the i_version counter to an atomic64_t, and use the lowest order bit to hold a flag that will tell whether anyone has queried the value since it was last incremented. When we go to maybe increment it, we fetch the value and check the flag bit. If it's clear then we don't need to do anything if the update isn't being forced. If we do need to update, then we increment the counter by 2, and clear the flag bit, and then use a CAS op to swap it into place. If that works, we return true. If it doesn't then do it again with the value that we fetch from the CAS operation. On the query side, if the flag is already set, then we just shift the value down by 1 bit and return it. Otherwise, we set the flag in our on-stack value and again use cmpxchg to swap it into place if it hasn't changed. If it has, then we use the value from the cmpxchg as the new "old" value and try again. This method allows us to avoid incrementing the counter on writes (and dirtying the metadata) under typical workloads. We only need to increment if it has been queried since it was last changed. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Dave Chinner <dchinner@redhat.com> Tested-by: Krzysztof Kozlowski <krzk@kernel.org>	2018-01-29 06:42:21 -05:00
Jeff Layton	7594c46116	fs: don't take the i_lock in inode_inc_iversion The rationale for taking the i_lock when incrementing this value is lost in antiquity. The readers of the field don't take it (at least not universally), so my assumption is that it was only done here to serialize incrementors. If that is indeed the case, then we can drop the i_lock from this codepath and treat it as a atomic64_t for the purposes of incrementing it. This allows us to use inode_inc_iversion without any danger of lock inversion. Note that the read side is not fetched atomically with this change. The assumption here is that that is not a critical issue since the i_version is not fully synchronized with anything else anyway. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz>	2018-01-29 06:42:20 -05:00
Jeff Layton	ae5e165d85	fs: new API for handling inode->i_version Add a documentation blob that explains what the i_version field is, how it is expected to work, and how it is currently implemented by various filesystems. We already have inode_inc_iversion. Add several other functions for manipulating and accessing the i_version counter. For now, the implementation is trivial and basically works the way that all of the open-coded i_version accesses work today. Future patches will convert existing users of i_version to use the new API, and then convert the backend implementation to do things more efficiently. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz>	2018-01-29 06:41:30 -05:00

6 commits