Commit graph

3507 commits

Author SHA1 Message Date
Linus Torvalds ac4de9543a Merge branch 'akpm' (patches from Andrew Morton)
Merge more patches from Andrew Morton:
 "The rest of MM.  Plus one misc cleanup"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (35 commits)
  mm/Kconfig: add MMU dependency for MIGRATION.
  kernel: replace strict_strto*() with kstrto*()
  mm, thp: count thp_fault_fallback anytime thp fault fails
  thp: consolidate code between handle_mm_fault() and do_huge_pmd_anonymous_page()
  thp: do_huge_pmd_anonymous_page() cleanup
  thp: move maybe_pmd_mkwrite() out of mk_huge_pmd()
  mm: cleanup add_to_page_cache_locked()
  thp: account anon transparent huge pages into NR_ANON_PAGES
  truncate: drop 'oldsize' truncate_pagecache() parameter
  mm: make lru_add_drain_all() selective
  memcg: document cgroup dirty/writeback memory statistics
  memcg: add per cgroup writeback pages accounting
  memcg: check for proper lock held in mem_cgroup_update_page_stat
  memcg: remove MEMCG_NR_FILE_MAPPED
  memcg: reduce function dereference
  memcg: avoid overflow caused by PAGE_ALIGN
  memcg: rename RESOURCE_MAX to RES_COUNTER_MAX
  memcg: correct RESOURCE_MAX to ULLONG_MAX
  mm: memcg: do not trap chargers with full callstack on OOM
  mm: memcg: rework and document OOM waiting and wakeup
  ...
2013-09-12 15:44:27 -07:00
Kirill A. Shutemov 7caef26767 truncate: drop 'oldsize' truncate_pagecache() parameter
truncate_pagecache() doesn't care about old size since commit
cedabed49b ("vfs: Fix vmtruncate() regression").  Let's drop it.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-12 15:38:02 -07:00
Linus Torvalds b7c09ad401 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs updates from Chris Mason:
 "This is against 3.11-rc7, but was pulled and tested against your tree
  as of yesterday.  We do have two small incrementals queued up, but I
  wanted to get this bunch out the door before I hop on an airplane.

  This is a fairly large batch of fixes, performance improvements, and
  cleanups from the usual Btrfs suspects.

  We've included Stefan Behren's work to index subvolume UUIDs, which is
  targeted at speeding up send/receive with many subvolumes or snapshots
  in place.  It closes a long standing performance issue that was built
  in to the disk format.

  Mark Fasheh's offline dedup work is also here.  In this case offline
  means the FS is mounted and active, but the dedup work is not done
  inline during file IO.  This is a building block where utilities are
  able to ask the FS to dedup a series of extents.  The kernel takes
  care of verifying the data involved really is the same.  Today this
  involves reading both extents, but we'll continue to evolve the
  patches"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (118 commits)
  Btrfs: optimize key searches in btrfs_search_slot
  Btrfs: don't use an async starter for most of our workers
  Btrfs: only update disk_i_size as we remove extents
  Btrfs: fix deadlock in uuid scan kthread
  Btrfs: stop refusing the relocation of chunk 0
  Btrfs: fix memory leak of uuid_root in free_fs_info
  btrfs: reuse kbasename helper
  btrfs: return btrfs error code for dev excl ops err
  Btrfs: allow partial ordered extent completion
  Btrfs: convert all bug_ons in free-space-cache.c
  Btrfs: add support for asserts
  Btrfs: adjust the fs_devices->missing count on unmount
  Btrf: cleanup: don't check for root_refs == 0 twice
  Btrfs: fix for patch "cleanup: don't check the same thing twice"
  Btrfs: get rid of one BUG() in write_all_supers()
  Btrfs: allocate prelim_ref with a slab allocater
  Btrfs: pass gfp_t to __add_prelim_ref() to avoid always using GFP_ATOMIC
  Btrfs: fix race conditions in BTRFS_IOC_FS_INFO ioctl
  Btrfs: fix race between removing a dev and writing sbs
  Btrfs: remove ourselves from the cluster list under lock
  ...
2013-09-12 09:58:51 -07:00
Linus Torvalds 2e515bf096 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree from Jiri Kosina:
 "The usual trivial updates all over the tree -- mostly typo fixes and
  documentation updates"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (52 commits)
  doc: Documentation/cputopology.txt fix typo
  treewide: Convert retrun typos to return
  Fix comment typo for init_cma_reserved_pageblock
  Documentation/trace: Correcting and extending tracepoint documentation
  mm/hotplug: fix a typo in Documentation/memory-hotplug.txt
  power: Documentation: Update s2ram link
  doc: fix a typo in Documentation/00-INDEX
  Documentation/printk-formats.txt: No casts needed for u64/s64
  doc: Fix typo "is is" in Documentations
  treewide: Fix printks with 0x%#
  zram: doc fixes
  Documentation/kmemcheck: update kmemcheck documentation
  doc: documentation/hwspinlock.txt fix typo
  PM / Hibernate: add section for resume options
  doc: filesystems : Fix typo in Documentations/filesystems
  scsi/megaraid fixed several typos in comments
  ppc: init_32: Fix error typo "CONFIG_START_KERNEL"
  treewide: Add __GFP_NOWARN to k.alloc calls with v.alloc fallbacks
  page_isolation: Fix a comment typo in test_pages_isolated()
  doc: fix a typo about irq affinity
  ...
2013-09-06 09:36:28 -07:00
Linus Torvalds 45d9a2220f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile 1 from Al Viro:
 "Unfortunately, this merge window it'll have a be a lot of small piles -
  my fault, actually, for not keeping #for-next in anything that would
  resemble a sane shape ;-/

  This pile: assorted fixes (the first 3 are -stable fodder, IMO) and
  cleanups + %pd/%pD formats (dentry/file pathname, up to 4 last
  components) + several long-standing patches from various folks.

  There definitely will be a lot more (starting with Miklos'
  check_submount_and_drop() series)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits)
  direct-io: Handle O_(D)SYNC AIO
  direct-io: Implement generic deferred AIO completions
  add formats for dentry/file pathnames
  kvm eventfd: switch to fdget
  powerpc kvm: use fdget
  switch fchmod() to fdget
  switch epoll_ctl() to fdget
  switch copy_module_from_fd() to fdget
  git simplify nilfs check for busy subtree
  ibmasmfs: don't bother passing superblock when not needed
  don't pass superblock to hypfs_{mkdir,create*}
  don't pass superblock to hypfs_diag_create_files
  don't pass superblock to hypfs_vm_create_files()
  oprofile: get rid of pointless forward declarations of struct super_block
  oprofilefs_create_...() do not need superblock argument
  oprofilefs_mkdir() doesn't need superblock argument
  don't bother with passing superblock to oprofile_create_stats_files()
  oprofile: don't bother with passing superblock to ->create_files()
  don't bother passing sb to oprofile_create_files()
  coh901318: don't open-code simple_read_from_buffer()
  ...
2013-09-05 08:50:26 -07:00
Linus Torvalds 27703bb4a6 PTR_RET() is a weird name, and led to some confusing usage. We ended
up with PTR_ERR_OR_ZERO(), and replacing or fixing all the usages.
 
 This has been sitting in linux-next for a whole cycle.
 
 Thanks,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJSJo+1AAoJENkgDmzRrbjxIC4QALJK95o8AUXuwUkl+2fmFkUt
 hh2/PJ1vDYgk4Xt0J6hyoK7XMa0H1RkbBrROuDdsBnorMFpEsGcgdkUZte9ufoAS
 97Bg+7N0KPbTB/S8vOwtW1vbERTJIVPN2uf6h1Wqm9Xc2puCh3HbMMr1AWMGu0WQ
 NqY5+Zz8zecy1UOrMhEP6H1CjeQcL1w1DO6YM5ydeqlKNzAz+JMfDXriLPDwiE7+
 XFPDF/O3Vtd2ckA7L70Lio7hfHwxV5U4WwFVfiwls98XB4jcZqDKIoh1r8z4SRgR
 +0Rae2DN3BaOabGMr//5XdrzQVpwJTh5m2w8BAOHJvCJ9HR7Sq29UIN4u+TowZBy
 L2xYo4dvFxkympwu5zEd3c7vHYWKIaqmSq5PIjr4gF/uIo2OeOTrpPIK782ZEYb7
 e+qUgOEM05V9AmQZCrSZeP9u474Sj8ow3sCtWxfdRtwNfoEIcUXsNNJd/zDHlVtW
 cEtXqc2xXIpcuUJQWlSaGp8fmRQjVZPzrLKYLM2m39ZcOOJbf5rzQAYS7hHPosIa
 SK+YVux/+Zzi+Xo/vXq1OlM/SruCr5S7JOgCxLowoQ88vupgXME6uPyC8EO+QQ50
 GsrHes5ZNLbk0uVsfcexIyojkUnyvDmmnDpv+1zdC6RgZLJQn8OXp5yNhHhnhrFT
 BiHX6YFWtDDqRlVv8Q0F
 =LeaW
 -----END PGP SIGNATURE-----

Merge tag 'PTR_RET-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull PTR_RET() removal patches from Rusty Russell:
 "PTR_RET() is a weird name, and led to some confusing usage.  We ended
  up with PTR_ERR_OR_ZERO(), and replacing or fixing all the usages.

  This has been sitting in linux-next for a whole cycle"

[ There are still some PTR_RET users scattered about, with some of them
  possibly being new, but most of them existing in Rusty's tree too.  We
  have that

      #define PTR_RET(p) PTR_ERR_OR_ZERO(p)

  thing in <linux/err.h>, so they continue to work for now  - Linus ]

* tag 'PTR_RET-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  GFS2: Replace PTR_RET with PTR_ERR_OR_ZERO
  Btrfs: volume: Replace PTR_RET with PTR_ERR_OR_ZERO
  drm/cma: Replace PTR_RET with PTR_ERR_OR_ZERO
  sh_veu: Replace PTR_RET with PTR_ERR_OR_ZERO
  dma-buf: Replace PTR_RET with PTR_ERR_OR_ZERO
  drivers/rtc: Replace PTR_RET with PTR_ERR_OR_ZERO
  mm/oom_kill: remove weird use of ERR_PTR()/PTR_ERR().
  staging/zcache: don't use PTR_RET().
  remoteproc: don't use PTR_RET().
  pinctrl: don't use PTR_RET().
  acpi: Replace weird use of PTR_RET.
  s390: Replace weird use of PTR_RET.
  PTR_RET is now PTR_ERR_OR_ZERO(): Replace most.
  PTR_RET is now PTR_ERR_OR_ZERO
2013-09-04 17:31:11 -07:00
Christoph Hellwig 02afc27fae direct-io: Handle O_(D)SYNC AIO
Call generic_write_sync() from the deferred I/O completion handler if
O_DSYNC is set for a write request.  Also make sure various callers
don't call generic_write_sync if the direct I/O code returns
-EIOCBQUEUED.

Based on an earlier patch from Jan Kara <jack@suse.cz> with updates from
Jeff Moyer <jmoyer@redhat.com> and Darrick J. Wong <darrick.wong@oracle.com>.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-04 09:23:46 -04:00
Filipe David Borba Manana d7396f0735 Btrfs: optimize key searches in btrfs_search_slot
When the binary search returns 0 (exact match), the target key
will necessarily be at slot 0 of all nodes below the current one,
so in this case the binary search is not needed because it will
always return 0, and we waste time doing it, holding node locks
for longer than necessary, etc.

Below follow histograms with the times spent on the current approach of
doing a binary search when the previous binary search returned 0, and
times for the new approach, which directly picks the first item/child
node in the leaf/node.

Current approach:

Count: 6682
Range: 35.000 - 8370.000; Mean: 85.837; Median: 75.000; Stddev: 106.429
Percentiles:  90th: 124.000; 95th: 145.000; 99th: 206.000
  35.000 -   61.080:  1235 ################
  61.080 -  106.053:  4207 #####################################################
 106.053 -  183.606:  1122 ##############
 183.606 -  317.341:   111 #
 317.341 -  547.959:     6 |
 547.959 - 8370.000:     1 |

Approach proposed by this patch:

Count: 6682
Range:  6.000 - 135.000; Mean: 16.690; Median: 16.000; Stddev:  7.160
Percentiles:  90th: 23.000; 95th: 27.000; 99th: 40.000
   6.000 -    8.418:    58 #
   8.418 -   11.670:  1149 #########################
  11.670 -   16.046:  2418 #####################################################
  16.046 -   21.934:  2098 ##############################################
  21.934 -   29.854:   744 ################
  29.854 -   40.511:   154 ###
  40.511 -   54.848:    41 #
  54.848 -   74.136:     5 |
  74.136 -  100.087:     9 |
 100.087 -  135.000:     6 |

These samples were captured during a run of the btrfs tests 001, 002 and
004 in the xfstests, with a leaf/node size of 4Kb.

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:16:42 -04:00
Josef Bacik 45d5fd14d2 Btrfs: don't use an async starter for most of our workers
We only need an async starter if we can't make a GFP_NOFS allocation in our
current path.  This is the case for the endio stuff since it happens in IRQ
context, but things like the caching thread workers and the delalloc flushers we
can easily make this allocation and start threads right away.  Also change the
worker count for the caching thread pool.  Traditionally we limited this to 2
since we took read locks while caching, but nowadays we do this lockless so
there's no reason to limit the number of caching threads.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:16:41 -04:00
Josef Bacik 7f4f6e0a3f Btrfs: only update disk_i_size as we remove extents
This fixes a problem where if we fail a truncate we will leave the i_size set
where we wanted to truncate to instead of where we were able to truncate to.
Fix this by making btrfs_truncate_inode_items do the disk_i_size update as it
removes extents, that way it will always be consistent with where its extents
are.  Then if the truncate fails at all we can update the in-ram i_size with
what we have on disk and delete the orphan item.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:16:40 -04:00
Filipe David Borba Manana f45388f387 Btrfs: fix deadlock in uuid scan kthread
If there's an ongoing transaction when the uuid scan kthread attempts
to create one, the kthread will block, waiting for that transaction to
finish while it's keeping locks on the tree root, and in turn the existing
transaction is waiting for those locks to be free.

The stack trace reported by the kernel follows.

[36700.671601] INFO: task btrfs-uuid:15480 blocked for more than 120 seconds.
[36700.671602] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[36700.671602] btrfs-uuid      D 0000000000000000     0 15480      2 0x00000000
[36700.671604]  ffff880710bd5b88 0000000000000046 ffff8803d36ba850 0000000000030000
[36700.671605]  ffff8806d76dc530 ffff880710bd5fd8 ffff880710bd5fd8 ffff880710bd5fd8
[36700.671607]  ffff8808098ac530 ffff8806d76dc530 ffff880710bd5b98 ffff8805e4508e40
[36700.671608] Call Trace:
[36700.671610]  [<ffffffff816f36b9>] schedule+0x29/0x70
[36700.671620]  [<ffffffffa05a3bdf>] wait_current_trans.isra.33+0xbf/0x120 [btrfs]
[36700.671623]  [<ffffffff81066760>] ? add_wait_queue+0x60/0x60
[36700.671629]  [<ffffffffa05a5b06>] start_transaction+0x3d6/0x530 [btrfs]
[36700.671636]  [<ffffffffa05bb1f4>] ? btrfs_get_token_32+0x64/0xf0 [btrfs]
[36700.671642]  [<ffffffffa05a5fbb>] btrfs_start_transaction+0x1b/0x20 [btrfs]
[36700.671649]  [<ffffffffa05c8a81>] btrfs_uuid_scan_kthread+0x211/0x3d0 [btrfs]
[36700.671655]  [<ffffffffa05c8870>] ? __btrfs_open_devices+0x2a0/0x2a0 [btrfs]
[36700.671657]  [<ffffffff81065fa0>] kthread+0xc0/0xd0
[36700.671659]  [<ffffffff81065ee0>] ? flush_kthread_worker+0xb0/0xb0
[36700.671661]  [<ffffffff816fcd1c>] ret_from_fork+0x7c/0xb0
[36700.671662]  [<ffffffff81065ee0>] ? flush_kthread_worker+0xb0/0xb0
[36700.671663] INFO: task btrfs:15481 blocked for more than 120 seconds.
[36700.671664] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[36700.671665] btrfs           D 0000000000000000     0 15481  15212 0x00000004
[36700.671666]  ffff880248cbf4c8 0000000000000086 ffff8803d36ba700 ffff8801dbd5c280
[36700.671668]  ffff880807815c40 ffff880248cbffd8 ffff880248cbffd8 ffff880248cbffd8
[36700.671669]  ffff8805e86a0000 ffff880807815c40 ffff880248cbf4d8 ffff8801dbd5c280
[36700.671670] Call Trace:
[36700.671672]  [<ffffffff816f36b9>] schedule+0x29/0x70
[36700.671679]  [<ffffffffa05d9b0d>] btrfs_tree_lock+0x6d/0x230 [btrfs]
[36700.671680]  [<ffffffff81066760>] ? add_wait_queue+0x60/0x60
[36700.671685]  [<ffffffffa0582829>] btrfs_search_slot+0x999/0xb00 [btrfs]
[36700.671691]  [<ffffffffa05bd9de>] ? btrfs_lookup_first_ordered_extent+0x5e/0xb0 [btrfs]
[36700.671698]  [<ffffffffa05e3e54>] __btrfs_write_out_cache+0x8c4/0xa80 [btrfs]
[36700.671704]  [<ffffffffa05e4362>] btrfs_write_out_cache+0xb2/0xf0 [btrfs]
[36700.671710]  [<ffffffffa05c4441>] ? free_extent_buffer+0x61/0xc0 [btrfs]
[36700.671716]  [<ffffffffa0594c82>] btrfs_write_dirty_block_groups+0x562/0x650 [btrfs]
[36700.671723]  [<ffffffffa0610092>] commit_cowonly_roots+0x171/0x24b [btrfs]
[36700.671729]  [<ffffffffa05a4dde>] btrfs_commit_transaction+0x4fe/0xa10 [btrfs]
[36700.671735]  [<ffffffffa0610af3>] create_subvol+0x5c0/0x636 [btrfs]
[36700.671742]  [<ffffffffa05d49ff>] btrfs_mksubvol.isra.60+0x33f/0x3f0 [btrfs]
[36700.671747]  [<ffffffffa05d4bf2>] btrfs_ioctl_snap_create_transid+0x142/0x190 [btrfs]
[36700.671752]  [<ffffffffa05d4c6c>] ? btrfs_ioctl_snap_create+0x2c/0x80 [btrfs]
[36700.671757]  [<ffffffffa05d4c9e>] btrfs_ioctl_snap_create+0x5e/0x80 [btrfs]
[36700.671759]  [<ffffffff8113a764>] ? handle_pte_fault+0x84/0x920
[36700.671764]  [<ffffffffa05d87eb>] btrfs_ioctl+0xf0b/0x1d00 [btrfs]
[36700.671766]  [<ffffffff8113c120>] ? handle_mm_fault+0x210/0x310
[36700.671768]  [<ffffffff816f83a4>] ? __do_page_fault+0x284/0x4e0
[36700.671770]  [<ffffffff81180aa6>] do_vfs_ioctl+0x96/0x550
[36700.671772]  [<ffffffff81170fe3>] ? __sb_end_write+0x33/0x70
[36700.671774]  [<ffffffff81180ff1>] SyS_ioctl+0x91/0xb0
[36700.671775]  [<ffffffff816fcdc2>] system_call_fastpath+0x16/0x1b

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:16:39 -04:00
Ilya Dryomov 795a332139 Btrfs: stop refusing the relocation of chunk 0
AFAICT chunk 0 is no longer special, and so it should be restriped just
like every other chunk.  One reason for this change is us refusing the
relocation can lead to filesystems that can only be mounted ro, and
never rw -- see the bugzilla [1] for details.  The other reason is that
device removal code is already doing this: it will happily relocate
chunk 0 is part of shrinking the device.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=60594

Reported-by: Xavier Bassery <xavier@bartica.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:16:38 -04:00
Filipe David Borba Manana d8f980391f Btrfs: fix memory leak of uuid_root in free_fs_info
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:16:37 -04:00
Andy Shevchenko ed84885d1e btrfs: reuse kbasename helper
To get name of the file from a pathname let's use kbasename() helper. It allows
to simplify code a bit.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:16:36 -04:00
Anand Jain e57138b3e9 btrfs: return btrfs error code for dev excl ops err
now threads can return BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS
as defined in btrfs.h for the dev excl operation error in
the FS, which means with this kernel would stop logging
(almost an user error) into the /var/log/messages

v2: accepts Josef' comment

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:16:35 -04:00
Josef Bacik 77cef2ec54 Btrfs: allow partial ordered extent completion
We currently have this problem where you can truncate pages that have not yet
been written for an ordered extent.  We do this because the truncate will be
coming behind to clean us up anyway so what's the harm right?  Well if truncate
fails for whatever reason we leave an orphan item around for the file to be
cleaned up later.  But if the user goes and truncates up the file and tries to
read from the area that had been discarded previously they will get a csum error
because we never actually wrote that data out.

This patch fixes this by allowing us to either discard the ordered extent
completely, by which I mean we just free up the space we had allocated and not
add the file extent, or adjust the length of the file extent we write.  We do
this by setting the length we truncated down to in the ordered extent, and then
we set the file extent length and ram bytes to this length.  The total disk
space stays unchanged since we may be compressed and we can't just chop off the
disk space, but at least this way the file extent only points to the valid data.
Then when the file extent is free'd the extent and csums will be freed normally.

This patch is needed for the next series which will give us more graceful
recovery of failed truncates.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:16:34 -04:00
Josef Bacik b12d6869f6 Btrfs: convert all bug_ons in free-space-cache.c
All of these are logic checks to make sure we're not breaking anything, so
convert them over to ASSERT().  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:16:33 -04:00
Josef Bacik 2e17c7c65e Btrfs: add support for asserts
One of the complaints we get a lot is how many BUG_ON()'s we have.  So to help
with this I'm introducing a kconfig option to enable/disable a new ASSERT()
mechanism much like what XFS does.  This will allow us developers to still get
our nice panics but allow users/distros to compile them out.  With this we can
go through and convert any BUG_ON()'s that we have to catch actual programming
mistakes to the new ASSERT() and then fix everybody else to return errors.  This
will also allow developers to leave sanity checks in their new code to make sure
we don't trip over problems while testing stuff and vetting new features.
Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:16:32 -04:00
Josef Bacik 726551ebc7 Btrfs: adjust the fs_devices->missing count on unmount
I noticed that if I tried to mount a file system with -o degraded after having
done it once already we would fail to mount.  This is because the
fs_devices->missing count was getting bumped everytime we mounted, but not
getting reset whenever we unmounted.  To fix this we just drop the missing count
as we're closing devices to make sure this doesn't happen.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:16:31 -04:00
Stefan Behrens 23fa76b0ba Btrf: cleanup: don't check for root_refs == 0 twice
btrfs_read_fs_root_no_name() already checks if btrfs_root_refs()
is zero and returns ENOENT in this case. There is no need to do
it again in three more places.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:16:29 -04:00
Stefan Behrens 4847547172 Btrfs: fix for patch "cleanup: don't check the same thing twice"
Mitch Harder noticed that the patch 3c64a1a mentioned in the subject
line was causing a kernel BUG() on snapshot deletion.

The patch was wrong. It did not handle cached roots correctly. The
check for root_refs == 0 was removed everywhere where
btrfs_read_fs_root_no_name() had been used to retrieve the root,
because this check was already dealt with in
btrfs_read_fs_root_no_name(). But in the case when the root was
found in the cache, there was no such check.

This patch adds the missing check in the case where the root is
found in the cache.

Reported-by: Mitch Harder <mitch.harder@sabayonlinux.org>
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:16:29 -04:00
Stefan Behrens 9d565ba433 Btrfs: get rid of one BUG() in write_all_supers()
The second round uses btrfs_error() and return -EIO, the first round
can handle write errors the same way.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:16:28 -04:00
Wang Shilong b9e9a6cbc6 Btrfs: allocate prelim_ref with a slab allocater
struct __prelim_ref is allocated and freed frequently when
walking backref tree, using slab allocater can not only
speed up allocating but also detect memory leaks.

Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:16:27 -04:00
Wang Shilong 742916b885 Btrfs: pass gfp_t to __add_prelim_ref() to avoid always using GFP_ATOMIC
Currently, only add_delayed_refs have to allocate with GFP_ATOMIC,
So just pass arg 'gfp_t' to decide which allocation mode.

Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:16:26 -04:00
Filipe David Borba Manana f717175024 Btrfs: fix race conditions in BTRFS_IOC_FS_INFO ioctl
The handler for the ioctl BTRFS_IOC_FS_INFO was reading the
number of devices before acquiring the device list mutex.

This could lead to inconsistent results because the update of
the device list and the number of devices counter (amongst other
counters related to the device list) are updated in volumes.c
while holding the device list mutex - except for 2 places, one
was volumes.c:btrfs_prepare_sprout() and the other was
volumes.c:device_list_add().

For example, if we have 2 devices, with IDs 1 and 2 and then add
a new device, with ID 3, and while adding the device is in progress
an BTRFS_IOC_FS_INFO ioctl arrives, it could return a number of
devices of 2 and a max dev id of 3. This would be incorrect.

Also, this ioctl handler was reading the fsid while it can be
updated concurrently. This can happen when while a new device is
being added and the current filesystem is in seeding mode.
Example:

$ mkfs.btrfs -f /dev/sdb1
$ mkfs.btrfs -f /dev/sdb2
$ btrfstune -S 1 /dev/sdb1
$ mount /dev/sdb1 /mnt/test
$ btrfs device add /dev/sdb2 /mnt/test

If during the last step a BTRFS_IOC_FS_INFO ioctl was requested, it
could read an fsid that was never valid (some bits part of the old
fsid and others part of the new fsid). Also, it could read a number
of devices that doesn't match the number of devices in the list and
the max device id, as explained before.

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:16:25 -04:00
Filipe David Borba Manana d730680184 Btrfs: fix race between removing a dev and writing sbs
This change fixes an issue when removing a device and writing
all super blocks run simultaneously. Here's the steps necessary
for the issue to happen:

1) disk-io.c:write_all_supers() gets a number of N devices from the
   super_copy, so it will not panic if it fails to write super blocks
   for N - 1 devices;

2) Then it tries to acquire the device_list_mutex, but blocks because
   volumes.c:btrfs_rm_device() got it first;

3) btrfs_rm_device() removes the device from the list, then unlocks the
   mutex and after the unlock it updates the number of devices in
   super_copy to N - 1.

4) write_all_supers() finally acquires the mutex, iterates over all the
   devices in the list and gets N - 1 errors, that is, it failed to write
   super blocks to all the devices;

5) Because write_all_supers() thinks there are a total of N devices, it
   considers N - 1 errors to be ok, and therefore won't panic.

So this change just makes sure that write_all_supers() reads the number
of devices from super_copy after it acquires the device_list_mutex.
Conversely, it changes btrfs_rm_device() to update the number of devices
in super_copy before it releases the device list mutex.

The code path to add a new device (volumes.c:btrfs_init_new_device),
already has the right behaviour: it updates the number of devices in
super_copy while holding the device_list_mutex.

The only code path that doesn't lock the device list mutex
before updating the number of devices in the super copy is
disk-io.c:next_root_backup(), called by open_ctree() during
mount time where concurrency issues can't happen.

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:16:24 -04:00
Josef Bacik b8d0c69b94 Btrfs: remove ourselves from the cluster list under lock
A user was reporting weird warnings from btrfs_put_delayed_ref() and I noticed
that we were doing this list_del_init() on our head ref outside of
delayed_refs->lock.  This is a problem if we have people still on the list, we
could end up modifying old pointers and such.  Fix this by removing us from the
list before we do our run_delayed_ref on our head ref.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:16:23 -04:00
Josef Bacik e8e7cff667 Btrfs: do not clear our orphan item runtime flag on eexist
We were unconditionally clearing our runtime flag on the inode on error when
trying to insert an orphan item.  This is wrong in the case of -EEXIST since we
obviously have an orphan item.  This was causing us to not do the correct
cleanup of our orphan items which caused issues on cleanup.  This happens
because currently when truncate fails we just leave the orphan item on there so
it can be cleaned up, so if we go to remove the file later we will hit this
issue.  What we do for truncate isn't right either, but we shouldn't screw this
sort of thing up on error either, so fix this and then I'll fix truncate in a
different patch.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:16:22 -04:00
Josef Bacik 57cfd46270 Btrfs: fix send to deal with sparse files properly
Send was just sending everything it found, even if the extent was a hole.  This
is unpleasant for users, so just skip holes when we are sending.  This will also
skip sending prealloc extents since the send spec doesn't have a prealloc
command.  Eventually we will add a prealloc command and rev the send version so
we can send down the prealloc info.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:16:21 -04:00
Filipe David Borba Manana bdab49d760 Btrfs: fix printing of non NULL terminated string
The name buffer is not terminated by a '\0' character,
therefore it needs to be printed with %.*s and use the
length of the buffer.

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:16:20 -04:00
Geert Uytterhoeven 8d78eb1663 Btrfs: Use %z to format size_t
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:16:19 -04:00
Geert Uytterhoeven fce29364e5 Btrfs: Do not truncate sector_t on 32-bit with CONFIG_LBDAF=y
sector_t may be either "u64" (always 64 bit) or "unsigned long" (32 or 64
bit).  Casting it to "unsigned long" will truncate it on 32-bit platforms
where CONFIG_LBDAF=y.

Cast to "unsigned long long" and format using "ll" instead.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:16:18 -04:00
Geert Uytterhoeven 778746b53b Btrfs: PAGE_CACHE_SIZE is already unsigned long
PAGE_CACHE_SIZE == PAGE_SIZE is "unsigned long" everywhere, so there's no
need to cast it to "unsigned long".

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:16:17 -04:00
Geert Uytterhoeven b308bc2f05 Btrfs: Make btrfs_header_chunk_tree_uuid() return unsigned long
Internally, btrfs_header_chunk_tree_uuid() calculates an unsigned long, but
casts it to a pointer, while all callers cast it to unsigned long again.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:16:16 -04:00
Geert Uytterhoeven fba6aa7565 Btrfs: Make btrfs_header_fsid() return unsigned long
Internally, btrfs_header_fsid() calculates an unsigned long, but casts
it to a pointer, while all callers cast it to unsigned long again.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:16:15 -04:00
Geert Uytterhoeven 231e88f410 Btrfs: Make btrfs_dev_extent_chunk_tree_uuid() return unsigned long
Internally, btrfs_dev_extent_chunk_tree_uuid() calculates an unsigned long,
but casts it to a pointer, while all callers cast it to unsigned long
again.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:16:14 -04:00
Geert Uytterhoeven 1473b24ee0 Btrfs: Make btrfs_device_fsid() return unsigned long
All callers of btrfs_device_fsid() cast its return type to unsigned long.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:16:13 -04:00
Geert Uytterhoeven 410ba3a291 Btrfs: Make btrfs_device_uuid() return unsigned long
All callers of btrfs_device_uuid() cast its return type to unsigned long.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:16:12 -04:00
Geert Uytterhoeven 118a0a2514 Btrfs: Format mirror_num as int
mirror_num is always "int", hence don't cast it to "unsigned long long" and
format it as a 64-bit number.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:16:11 -04:00
Geert Uytterhoeven 27f9f02357 Btrfs: Format PAGE_SIZE as unsigned long
PAGE_SIZE is "unsigned long" everywhere, so there's no need to cast it to
"unsigned long long" and format it as a 64-bit number.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:16:10 -04:00
Geert Uytterhoeven 6e71c47afe Btrfs: Make BTRFS_DEV_REPLACE_DEVID an unsigned long long constant
The internal btrfs device id is a u64, hence make the constant
BTRFS_DEV_REPLACE_DEVID "unsigned long long" as well, so we no longer need
a cast to print it.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:16:09 -04:00
Geert Uytterhoeven c1c9ff7c94 Btrfs: Remove superfluous casts from u64 to unsigned long long
u64 is "unsigned long long" on all architectures now, so there's no need to
cast it when formatting it using the "ll" length modifier.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:16:08 -04:00
Filipe David Borba Manana 1cb048f596 Btrfs: fix memory leak of orphan block rsv
This issue is simple to reproduce and observe if kmemleak is enabled.
Two simple ways to reproduce it:

** 1

$ mkfs.btrfs -f /dev/loop0
$ mount /dev/loop0 /mnt/btrfs
$ btrfs balance start /mnt/btrfs
$ umount /mnt/btrfs

** 2

$ mkfs.btrfs -f /dev/loop0
$ mount /dev/loop0 /mnt/btrfs
$ touch /mnt/btrfs/foobar
$ rm -f /mnt/btrfs/foobar
$ umount /mnt/btrfs

After a while, kmemleak reports the leak:

$ cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff880402b13e00 (size 128):
  comm "btrfs", pid 19621, jiffies 4341648183 (age 70057.844s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 fc c6 b1 04 88 ff ff 04 00 04 00 ad 4e ad de  .............N..
  backtrace:
    [<ffffffff817275a6>] kmemleak_alloc+0x26/0x50
    [<ffffffff8117832b>] kmem_cache_alloc_trace+0xeb/0x1d0
    [<ffffffffa04db499>] btrfs_alloc_block_rsv+0x39/0x70 [btrfs]
    [<ffffffffa04f8bad>] btrfs_orphan_add+0x13d/0x1b0 [btrfs]
    [<ffffffffa04e2b13>] btrfs_remove_block_group+0x143/0x500 [btrfs]
    [<ffffffffa0518158>] btrfs_relocate_chunk.isra.63+0x618/0x790 [btrfs]
    [<ffffffffa051bc27>] btrfs_balance+0x8f7/0xe90 [btrfs]
    [<ffffffffa05240a0>] btrfs_ioctl_balance+0x250/0x550 [btrfs]
    [<ffffffffa05269ca>] btrfs_ioctl+0xdfa/0x25f0 [btrfs]
    [<ffffffff8119c936>] do_vfs_ioctl+0x96/0x570
    [<ffffffff8119cea1>] SyS_ioctl+0x91/0xb0
    [<ffffffff81750242>] system_call_fastpath+0x16/0x1b
    [<ffffffffffffffff>] 0xffffffffffffffff

This affects btrfs-next, revision be8e3cd00d7293dd177e3f8a4a1645ce09ca3acb
(Btrfs: separate out tests into their own directory).

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:16:07 -04:00
Ilya Dryomov a1e8780a89 Btrfs: rollback btrfs_device fields on umount
It turns out we don't properly rollback in-core btrfs_device state on
umount.  We zero out ->bdev, ->in_fs_metadata and that's about it.  In
particular, we don't zero out ->generation, and this can lead to us
refusing a mount -- a non-NULL fs_devices->latest_bdev is essential, but
btrfs_close_extra_devices will happily assign NULL to ->latest_bdev if
the first device on the dev_list happens to be missing and consequently
has no bdev attached.  This happens because since commit a6b0d5c8
btrfs_close_extra_devices adjusts ->latest_bdev, and in doing that,
relies on the ->generation.  Fix this, and possibly other problems, by
zeroing out everything except for what device_list_add sets, so that a
mount right after insmod and 'btrfs dev scan' is no different from any
later mount in this respect.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:16:06 -04:00
Ilya Dryomov 2208a378f3 Btrfs: add alloc_fs_devices and switch to it
In the spirit of btrfs_alloc_device, add a helper for allocating and
doing some common initialization of btrfs_fs_devices struct.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:16:05 -04:00
Ilya Dryomov 12bd2fc0d2 Btrfs: add btrfs_alloc_device and switch to it
Currently btrfs_device is allocated ad-hoc in a few different places,
and as a result not all fields are initialized properly.  In particular,
readahead state is only initialized in device_list_add (at scan time),
and not in btrfs_init_new_device (when the new device is added with
'btrfs dev add').  Fix this by adding an allocation helper and switch
everybody but __btrfs_close_devices to it.  (__btrfs_close_devices is
dealt with in a later commit.)

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:16:04 -04:00
Ilya Dryomov 53f10659f9 Btrfs: find_next_devid: root -> fs_info
find_next_devid() knows which root to search, so it should take an
fs_info instead of an arbitrary root.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:16:03 -04:00
Stefan Behrens bbb651e469 Btrfs: don't allow the replace procedure on read only filesystems
If you start the replace procedure on a read only filesystem, at
the end the procedure fails to write the updated dev_items to the
chunk tree. The problem is that this error is not indicated except
for a WARN_ON(). If the user now thinks that everything was done
as expected and destroys the source device (with mkfs or with a
hammer). The next mount fails with "failed to read chunk root" and
the filesystem is gone.

This commit adds code to fail the attempt to start the replace
procedure if the filesystem is mounted read-only.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Cc: <stable@vger.kernel.org> # 3.10+
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:16:02 -04:00
Filipe David Borba Manana 633085c79c Btrfs: reset force_compress on btrfs_file_defrag failure
After we set force_compress with a new value (which was not being done
while holding the inode mutex), if an error happens and we jump to
the label out_ra, the force_compress property of the inode is not set
to BTRFS_COMPRESS_NONE (unlike in the case where no errors happen).

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:16:00 -04:00
Stefan Behrens f420ee1e92 Btrfs: add mount option to force UUID tree checking
This should never be needed, but since all functions are there
to check and rebuild the UUID tree, a mount option is added that
allows to force this check and rebuild procedure.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:15:59 -04:00