Commit graph

342 commits

Author SHA1 Message Date
Dan Carpenter 75af271ed5 dlm: NULL dereference on failure in kmem_cache_create()
We aren't allowed to pass NULL pointers to kmem_cache_destroy() so if
both allocations fail, it leads to a NULL dereference.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2012-05-15 10:39:28 -05:00
David Teigland 4875647a08 dlm: fixes for nodir mode
The "nodir" mode (statically assign master nodes instead
of using the resource directory) has always been highly
experimental, and never seriously used.  This commit
fixes a number of problems, making nodir much more usable.

- Major change to recovery: recover all locks and restart
  all in-progress operations after recovery.  In some
  cases it's not possible to know which in-progess locks
  to recover, so recover all.  (Most require recovery
  in nodir mode anyway since rehashing changes most
  master nodes.)

- Change the way nodir mode is enabled, from a command
  line mount arg passed through gfs2, into a sysfs
  file managed by dlm_controld, consistent with the
  other config settings.

- Allow recovering MSTCPY locks on an rsb that has not
  yet been turned into a master copy.

- Ignore RCOM_LOCK and RCOM_LOCK_REPLY recovery messages
  from a previous, aborted recovery cycle.  Base this
  on the local recovery status not being in the state
  where any nodes should be sending LOCK messages for the
  current recovery cycle.

- Hold rsb lock around dlm_purge_mstcpy_locks() because it
  may run concurrently with dlm_recover_master_copy().

- Maintain highbast on process-copy lkb's (in addition to
  the master as is usual), because the lkb can switch
  back and forth between being a master and being a
  process copy as the master node changes in recovery.

- When recovering MSTCPY locks, flag rsb's that have
  non-empty convert or waiting queues for granting
  at the end of recovery.  (Rename flag from LOCKS_PURGED
  to RECOVER_GRANT and similar for the recovery function,
  because it's not only resources with purged locks
  that need grant a grant attempt.)

- Replace a couple of unnecessary assertion panics with
  error messages.

Signed-off-by: David Teigland <teigland@redhat.com>
2012-05-02 14:15:27 -05:00
David Teigland 6d40c4a708 dlm: improve error and debug messages
Change some existing error/debug messages to
collect more useful information, and add
some new error/debug messages to address
recently found problems.

Signed-off-by: David Teigland <teigland@redhat.com>
2012-04-26 15:41:46 -05:00
David Teigland 57638bf3aa dlm: avoid unnecessary search in search_rsb
If the rsb is found in the "keep" tree, but is
not the right type (i.e. not MASTER), we can
return immediately with the result.  There's
no point in going on to search the "toss" list
as if we hadn't found it.

Signed-off-by: David Teigland <teigland@redhat.com>
2012-04-26 15:37:56 -05:00
David Teigland d6e24788d2 dlm: limit rcom debug messages
Unify the checking for both types of ignored
rcom messages, and replace the two log_debug
statements with a single, rate limited debug
message.

Signed-off-by: David Teigland <teigland@redhat.com>
2012-04-26 15:37:37 -05:00
David Teigland 13ef11110f dlm: fix waiter recovery
An outstanding remote operation (an lkb on the "waiter"
list) could sometimes miss being resent during recovery.
The decision was based on the lkb_nodeid field, which
could have changed during an earlier aborted recovery,
so it no longer represents the actual remote destination.
The lkb_wait_nodeid is always the actual remote node,
so it is the best value to use.

Signed-off-by: David Teigland <teigland@redhat.com>
2012-04-26 15:36:04 -05:00
David Teigland 513ef596d4 dlm: prevent connections during shutdown
During lowcomms shutdown, a new connection could possibly
be created, and attempt to use a workqueue that's been
destroyed.  Similarly, during startup, a new connection
could attempt to use a workqueue that's not been set up
yet.  Add a global variable to indicate when new connections
are allowed.

Based on patch by: Christine Caulfield <ccaulfie@redhat.com>

Reported-by: dann frazier <dann.frazier@canonical.com>
Reviewed-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2012-04-26 15:35:38 -05:00
Linus Torvalds 721b024bd4 dlm fixes for 3.4
This includes one short patch fixing the behavior of
 the QUECVT flag, which the gfs2 folks are waiting on.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPlYXZAAoJEDgbc8f8gGmqpzYP/RFkCn8mC5y5cM8lWBk2JQAJ
 u7khyqowm3TWxjIpX85n7Uxq1vEX4RxFiRzCeiZj3ZoWE3PEQim8Tqrw8SFs8lcT
 y7oYL6TBkgCbM1ROuKDqXRiw8oRAfRud3cqtRvQzxuds3AoaoyYvE6N+to2y9XlR
 5DuUBJEtrpKOEdW1ZeXeUmCnvDwrUyEFuIlACoyochzbk6ug1EF926dgSaViE4ZG
 OFcGMy8ELNqVYibVcJof2ZfztTvrMcXPIpsJrkK5tIW6w6q+2+eN4Xc2/xMZ4OYc
 5AHHXxrqbK1ZABLrqsK/lUQi0Z241kAnqIi33i2nl3mhWSDF3K5CNXmrF9rvGsN7
 wEqsfdGOnwFQucF1VU95neo+jYMnom9VGodpvSop7Xy5r+i59MPcfMDfz/I1KqX7
 vBDuM5rwisYNfOb6wsfFNcBhkf1ktgo2h2iH5UdIaWfHApF1Lnls7D2j/o7r2uxF
 tRd4sPhRt2eIn68XRggbWOVxMfdUKtaW50ZhKzW9osMItYX748O8XfQdk0sQUbD9
 ZXbFEfbfsfRgMKhMSyNFcGDh6ePsT/cmZL/zR5VKVEHuprL3hEDPhCui5GT0Sm1G
 9sXpLu9p51r0d4OIJpScOFMv8aD64w/mwLJ3r5nrGZz2APK9SwWJOqX82fyqivQc
 uvO42yNGkwSGnBjXKiM6
 =KDNZ
 -----END PGP SIGNATURE-----

Merge tag 'dlm-fixes-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm

Pull dlm fixes from David Teigland:
 "This includes one short patch fixing the behavior of the QUECVT flag,
  which the gfs2 folks are waiting on."

* tag 'dlm-fixes-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
  dlm: fix QUECVT when convert queue is empty
2012-04-23 18:22:42 -07:00
David Teigland 53ad1c980d dlm: fix QUECVT when convert queue is empty
The QUECVT flag should not prevent conversions from
being granted immediately when the convert queue is
empty.

Signed-off-by: David Teigland <teigland@redhat.com>
2012-04-23 11:30:59 -05:00
Stephen Boyd 234e340582 simple_open: automatically convert to simple_open()
Many users of debugfs copy the implementation of default_open() when
they want to support a custom read/write function op.  This leads to a
proliferation of the default_open() implementation across the entire
tree.

Now that the common implementation has been consolidated into libfs we
can replace all the users of this function with simple_open().

This replacement was done with the following semantic patch:

<smpl>
@ open @
identifier open_f != simple_open;
identifier i, f;
@@
-int open_f(struct inode *i, struct file *f)
-{
(
-if (i->i_private)
-f->private_data = i->i_private;
|
-f->private_data = i->i_private;
)
-return 0;
-}

@ has_open depends on open @
identifier fops;
identifier open.open_f;
@@
struct file_operations fops = {
...
-.open = open_f,
+.open = simple_open,
...
};
</smpl>

[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-05 15:25:50 -07:00
Linus Torvalds 30d73f3752 dlm for 3.4
This set includes one trivial fix, and one simple recovery
 speed up.  Directory recovery can use the standard hash table
 to find resources rather than always searching the linear
 recovery list.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPahBcAAoJEDgbc8f8gGmqeHEP/i288yZV8NVbIJG7XpX9JjTY
 4n4R1CI/qTMDn74GXDkk/OolHc8XTSQwbp02oFlJbPzj71lsWBUWijTAnwxiLIRz
 OHQg7eZ2aYL0YmaxAlvM2/6xLNOINmLW/DVwwH4QnpnSB4ymoCHBzyXxrNxgvgRv
 KWKUUXj7SDaUmbcK0TFZ39VprTmpw3L+mXIm+Y6kCCS2m4GfISp3Zij4OnxztA/c
 brex0R97EoZwrQOvPSRbVA5IaK6BjwfNScXAKsYCOSLsd+tvelD+UgYBdVHBTOmG
 godQ5pg8C7SpUB9NQqnLc8r78xpIUcOHQbWRqtwNQ2/6uPI/mWFj+lhpcHRmmzPk
 TczdDZVg+pIl9U+SMqiG689KgvnUTciPte0sYqksEbk3NqUMJOWOB7Cv79ZYquaV
 Pdmg788Essq7/5BmgeSRlOvS08RvdVfHXqYGOA6/tJ3f0b15M1YuSLjJdwYVWJkS
 gVmo4raN44Yh99R/+eNqeI8dvoVfd1pNDAD9VYXk4KdIv3AtKfRZi8XvWZ0o5uQI
 EdXTIhiA78ogjxG92cnnzj3+CAIpK4Iv1s53Y0KZgJ7gyExvVHyGp7zl1J7hlFLP
 jLuORsL+xMKTGbSWom796QuVn3jL/CGj/OKbnd1D98S0uRuSS6wiy/6ucBFaKmt0
 HvT7AVcX2Gh6t/qdTJ9h
 =ByvY
 -----END PGP SIGNATURE-----

Merge tag 'dlm-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm

Pull dlm updates for 3.4 from David Teigland:
 "This set includes one trivial fix, and one simple recovery speed up.
  Directory recovery can use the standard hash table to find resources
  rather than always searching the linear recovery list."

* tag 'dlm-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
  dlm: last element of dlm_local_addr[] never used
  dlm: fix slow rsb search in dir recovery
2012-03-21 13:54:22 -07:00
David Teigland 1b189b8889 dlm: last element of dlm_local_addr[] never used
The last element of dlm_local_addr[DLM_MAX_ADDR_COUNT]
was not used because the loop ended at COUNT - 1.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2012-03-21 09:18:34 -05:00
Benjamin Poirier 2f2d76cc3e dlm: Do not allocate a fd for peeloff
avoids allocating a fd that a) propagates to every kernel thread and
usermodehelper b) is not properly released.

References: http://article.gmane.org/gmane.linux.network.drbd/22529
Signed-off-by: Benjamin Poirier <bpoirier@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-08 13:52:09 -08:00
David Teigland 7210cb7a72 dlm: fix slow rsb search in dir recovery
The function used to find an rsb during directory
recovery was searching the single linear list of
rsb's.  This wasted a lot of time compared to
using the standard hash table to find the rsb.

Signed-off-by: David Teigland <teigland@redhat.com>
2012-03-08 14:46:30 -06:00
Linus Torvalds 49d41bae46 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
  dlm: add recovery callbacks
  dlm: add node slots and generation
  dlm: move recovery barrier calls
  dlm: convert rsb list to rb_tree
2012-01-10 14:55:55 -08:00
David Teigland 60f98d1839 dlm: add recovery callbacks
These new callbacks notify the dlm user about lock recovery.
GFS2, and possibly others, need to be aware of when the dlm
will be doing lock recovery for a failed lockspace member.

In the past, this coordination has been done between dlm and
file system daemons in userspace, which then direct their
kernel counterparts.  These callbacks allow the same
coordination directly, and more simply.

Signed-off-by: David Teigland <teigland@redhat.com>
2012-01-04 08:56:31 -06:00
David Teigland 757a427196 dlm: add node slots and generation
Slot numbers are assigned to nodes when they join the lockspace.
The slot number chosen is the minimum unused value starting at 1.
Once a node is assigned a slot, that slot number will not change
while the node remains a lockspace member.  If the node leaves
and rejoins it can be assigned a new slot number.

A new generation number is also added to a lockspace.  It is
set and incremented during each recovery along with the slot
collection/assignment.

The slot numbers will be passed to gfs2 which will use them as
journal id's.

Signed-off-by: David Teigland <teigland@redhat.com>
2012-01-04 08:55:57 -06:00
David Teigland f95a34c665 dlm: move recovery barrier calls
Put all the calls to recovery barriers in the same function
to clarify where they each happen.  Should not change any behavior.
Also modify some recovery debug lines to make them consistent.

Signed-off-by: David Teigland <teigland@redhat.com>
2012-01-04 08:53:27 -06:00
Alexey Dobriyan 4e3fd7a06d net: remove ipv6_addr_copy()
C assignment can handle struct in6_addr copying.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-22 16:43:32 -05:00
Bob Peterson 9beb3bf5a9 dlm: convert rsb list to rb_tree
Change the linked lists to rb_tree's in the rsb
hash table to speed up searches.  Slow rsb searches
were having a large impact on gfs2 performance due
to the large number of dlm locks gfs2 uses.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2011-11-18 10:20:15 -06:00
Linus Torvalds 2dad3206db Merge branch 'for-3.1' of git://linux-nfs.org/~bfields/linux
* 'for-3.1' of git://linux-nfs.org/~bfields/linux:
  nfsd: don't break lease on CLAIM_DELEGATE_CUR
  locks: rename lock-manager ops
  nfsd4: update nfsv4.1 implementation notes
  nfsd: turn on reply cache for NFSv4
  nfsd4: call nfsd4_release_compoundargs from pc_release
  nfsd41: Deny new lock before RECLAIM_COMPLETE done
  fs: locks: remove init_once
  nfsd41: check the size of request
  nfsd41: error out when client sets maxreq_sz or maxresp_sz too small
  nfsd4: fix file leak on open_downgrade
  nfsd4: remember to put RW access on stateid destruction
  NFSD: Added TEST_STATEID operation
  NFSD: added FREE_STATEID operation
  svcrpc: fix list-corrupting race on nfsd shutdown
  rpc: allow autoloading of gss mechanisms
  svcauth_unix.c: quiet sparse noise
  svcsock.c: include sunrpc.h to quiet sparse noise
  nfsd: Remove deprecated nfsctl system call and related code.
  NFSD: allow OP_DESTROY_CLIENTID to be only op in COMPOUND

Fix up trivial conflicts in Documentation/feature-removal-schedule.txt
2011-07-25 22:49:19 -07:00
J. Bruce Fields 8fb47a4fbf locks: rename lock-manager ops
Both the filesystem and the lock manager can associate operations with a
lock.  Confusingly, one of them (fl_release_private) actually has the
same name in both operation structures.

It would save some confusion to give the lock-manager ops different
names.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-07-20 20:23:19 -04:00
David Teigland 10d1459faf dlm: don't limit active work items
Allow multiple workqueue items (locks with callbacks) to be
processed concurrently.  There should be no reason not to
take advantage of this workqueue feature.

Signed-off-by: David Teigland <teigland@redhat.com>
2011-07-19 14:22:32 -05:00
David Teigland 23e8e1aaac dlm: use workqueue for callbacks
Instead of creating our own kthread (dlm_astd) to deliver
callbacks for all lockspaces, use a per-lockspace workqueue
to deliver the callbacks.  This eliminates complications and
slowdowns from many lockspaces sharing the same thread.

Signed-off-by: David Teigland <teigland@redhat.com>
2011-07-15 12:30:43 -05:00
David Teigland 883ba74f43 dlm: remove deadlock debug print
gfs2 recently began using this feature heavily,
creating more debug output than we want to see.

Signed-off-by: David Teigland <teigland@redhat.com>
2011-07-14 12:31:49 -05:00
David Teigland 3881ac04eb dlm: improve rsb searches
By pre-allocating rsb structs before searching the hash
table, they can be inserted immediately.  This avoids
always having to repeat the search when adding the struct
to hash list.

This also adds space to the rsb struct for a max resource
name, so an rsb allocation can be used by any request.
The constant size also allows us to finally use a slab
for the rsb structs.

Signed-off-by: David Teigland <teigland@redhat.com>
2011-07-12 16:02:09 -05:00
David Teigland 3d6aa675ff dlm: keep lkbs in idr
This is simpler and quicker than the hash table, and
avoids needing to search the hash list for every new
lkid to check if it's used.

Signed-off-by: David Teigland <teigland@redhat.com>
2011-07-11 08:43:45 -05:00
David Teigland a22ca48068 dlm: fix kmalloc args
The gfp and size args were switched.

Signed-off-by: David Teigland <teigland@redhat.com>
2011-07-11 08:40:53 -05:00
Jesper Juhl 5d70828a77 dlm: don't do pointless NULL check, use kzalloc and fix order of arguments
In fs/dlm/lock.c in the dlm_scan_waiters() function there are 3 small
issues:

1) There's no need to test the return value of the allocation and do a
memset if is succeedes. Just use kzalloc() to obtain zeroed memory.

2) Since kfree() handles NULL pointers gracefully, the test of
'warned' against NULL before the kfree() after the loop is completely
pointless. Remove it.

3) The arguments to kmalloc() (now kzalloc()) were swapped. Thanks to
Dr. David Alan Gilbert for pointing this out.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: David Teigland <teigland@redhat.com>
2011-07-11 08:39:42 -05:00
Masatake YAMATO bcaadf5c1a dlm: dump address of unknown node
When the dlm fails to make a network connection to another
node, include the address of the node in the error message.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2011-07-06 16:37:23 -05:00
Bryn M. Reeves c282af4990 dlm: use vmalloc for hash tables
Allocate dlm hash tables in the vmalloc area to allow a greater
maximum size without restructuring of the hash table code.

Signed-off-by: Bryn M. Reeves <bmr@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2011-07-01 15:49:23 -05:00
Masatake YAMATO 55b3286d3d dlm: show addresses in configfs
Display all addresses the dlm is using for the local node
from the configfs file config/dlm/<cluster>/comms/<comm>/addr_list
Also make the addr file write only.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2011-06-30 14:45:28 -05:00
Linus Torvalds b7c2f03628 Merge branch 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6
* 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6:
  gfs2: Drop __TIME__ usage
  isdn/diva: Drop __TIME__ usage
  atm: Drop __TIME__ usage
  dlm: Drop __TIME__ usage
  wan/pc300: Drop __TIME__ usage
  parport: Drop __TIME__ usage
  hdlcdrv: Drop __TIME__ usage
  baycom: Drop __TIME__ usage
  pmcraid: Drop __DATE__ usage
  edac: Drop __DATE__ usage
  rio: Drop __DATE__ usage
  scsi/wd33c93: Drop __TIME__ usage
  scsi/in2000: Drop __TIME__ usage
  aacraid: Drop __TIME__ usage
  media/cx231xx: Drop __TIME__ usage
  media/radio-maxiradio: Drop __TIME__ usage
  nozomi: Drop __TIME__ usage
  cyclades: Drop __TIME__ usage
2011-05-26 13:19:00 -07:00
Michal Marek 75ce481e15 dlm: Drop __TIME__ usage
The kernel already prints its build timestamp during boot, no need to
repeat it in random drivers and produce different object files each
time.

Cc: Christine Caulfield <ccaulfie@redhat.com>
Cc: David Teigland <teigland@redhat.com>
Cc: cluster-devel@redhat.com
Signed-off-by: Michal Marek <mmarek@suse.cz>
2011-05-26 09:46:17 +02:00
Linus Torvalds df3256f9ab Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm:
  dlm: make plock operation killable
  dlm: remove shared message stub for recovery
  dlm: delayed reply message warning
  dlm: Remove superfluous call to recalc_sigpending()
2011-05-24 15:04:00 -07:00
David Teigland 901025d2f3 dlm: make plock operation killable
Allow processes blocked on plock requests to be interrupted
when they are killed.  This leaves the problem of cleaning
up the lock state in userspace.  This has three parts:

1. Add a flag to unlock operations sent to userspace
indicating the file is being closed.  Userspace will
then look for and clear any waiting plock operations that
were abandoned by an interrupted process.

2. Queue an unlock-close operation (like in 1) to clean up
userspace from an interrupted plock request.  This is needed
because the vfs will not send a cleanup-unlock if it sees no
locks on the file, which it won't if the interrupted operation
was the only one.

3. Do not use replies from userspace for unlock-close operations
because they are unnecessary (they are just cleaning up for the
process which did not make an unlock call).  This also simplifies
the new unlock-close generated from point 2.

Signed-off-by: David Teigland <teigland@redhat.com>
2011-05-23 10:47:06 -05:00
David Teigland 2a7ce0edd6 dlm: remove shared message stub for recovery
kmalloc a stub message struct during recovery instead of sharing the
struct in the lockspace.  This leaves the lockspace stub_ms only for
faking downconvert replies, where it is never modified and sharing
is not a problem.

Also improve the debug messages in the same recovery function.

Signed-off-by: David Teigland <teigland@redhat.com>
2011-04-05 10:54:47 -05:00
David Teigland c6ff669bac dlm: delayed reply message warning
Add an option (disabled by default) to print a warning message
when a lock has been waiting a configurable amount of time for
a reply message from another node.  This is mainly for debugging.

Signed-off-by: David Teigland <teigland@redhat.com>
2011-04-01 14:19:06 -05:00
Lucas De Marchi 25985edced Fix common misspellings
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-31 11:26:23 -03:00
Matt Fleming 4bcad6c1ef dlm: Remove superfluous call to recalc_sigpending()
recalc_sigpending() is called within sigprocmask(), so there is no
need call it again after sigprocmask() has returned.

Signed-off-by: Matt Fleming <matt.fleming@linux.intel.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2011-03-28 10:20:17 -05:00
David Teigland e43f055a95 dlm: use alloc_workqueue function
Replaces deprecated create_singlethread_workqueue().

Signed-off-by: David Teigland <teigland@redhat.com>
2011-03-10 13:22:34 -06:00
David Teigland e3853a90e2 dlm: increase default hash table sizes
Make all three hash tables a consistent size of 1024
rather than 1024, 512, 256.  All three tables, for
resources, locks, and lock dir entries, will generally
be filled to the same order of magnitude.

Signed-off-by: David Teigland <teigland@redhat.com>
2011-03-10 13:08:22 -06:00
David Teigland 8304d6f24c dlm: record full callback state
Change how callbacks are recorded for locks.  Previously, information
about multiple callbacks was combined into a couple of variables that
indicated what the end result should be.  In some situations, we
could not tell from this combined state what the exact sequence of
callbacks were, and would end up either delivering the callbacks in
the wrong order, or suppress redundant callbacks incorrectly.  This
new approach records all the data for each callback, leaving no
uncertainty about what needs to be delivered.

Signed-off-by: David Teigland <teigland@redhat.com>
2011-03-10 10:40:00 -06:00
David Teigland 6b155c8fd4 dlm: use single thread workqueues
The recent commit to use cmwq for send and recv threads
dcce240ead introduced problems,
apparently due to multiple workqueue threads.  Single threads
make the problems go away, so return to that until we fully
understand the concurrency issues with multiple threads.

Signed-off-by: David Teigland <teigland@redhat.com>
2011-02-11 16:50:47 -06:00
Nicholas Bellinger 86c747d2a4 dlm: Make DLM depend on CONFIGFS_FS
This patch fixes the following kconfig error after changing
CONFIGFS_FS -> select SYSFS:

fs/sysfs/Kconfig:1:error: recursive dependency detected!
fs/sysfs/Kconfig:1:	symbol SYSFS is selected by CONFIGFS_FS
fs/configfs/Kconfig:1:	symbol CONFIGFS_FS is selected by DLM
fs/dlm/Kconfig:1:	symbol DLM depends on SYSFS

Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: James Bottomley <James.Bottomley@suse.de>
2011-01-16 21:22:37 +00:00
Namhyung Kim b9d4105279 dlm: sanitize work_start() in lowcomms.c
The create_workqueue() returns NULL if failed rather than ERR_PTR().
Fix error checking and remove unnecessary variable 'error'.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: David Teigland <teigland@redhat.com>
2010-12-13 13:42:24 -06:00
Bob Peterson f92c8dd7a0 dlm: reduce cond_resched during send
Calling cond_resched() after every send can unnecessarily
degrade performance.  Go back to an old method of scheduling
after 25 messages.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2010-11-12 11:15:20 -06:00
David Teigland cb2d45da81 dlm: use TCP_NODELAY
Nagling doesn't help and can sometimes hurt dlm comms.

Signed-off-by: David Teigland <teigland@redhat.com>
2010-11-12 11:12:55 -06:00
Steven Whitehouse dcce240ead dlm: Use cmwq for send and receive workqueues
So far as I can tell, there is no reason to use a single-threaded
send workqueue for dlm, since it may need to send to several sockets
concurrently. Both workqueues are set to WQ_MEM_RECLAIM to avoid
any possible deadlocks, WQ_HIGHPRI since locking traffic is highly
latency sensitive (and to avoid a priority inversion wrt GFS2's
glock_workqueue) and WQ_FREEZABLE just in case someone needs to do
that (even though with current cluster infrastructure, it doesn't
make sense as the node will most likely land up ejected from the
cluster) in the future.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: David Teigland <teigland@redhat.com>
2010-11-12 11:08:03 -06:00
David Miller b36930dd50 dlm: Handle application limited situations properly.
In the normal regime where an application uses non-blocking I/O
writes on a socket, they will handle -EAGAIN and use poll() to
wait for send space.

They don't actually sleep on the socket I/O write.

But kernel level RPC layers that do socket I/O operations directly
and key off of -EAGAIN on the write() to "try again later" don't
use poll(), they instead have their own sleeping mechanism and
rely upon ->sk_write_space() to trigger the wakeup.

So they do effectively sleep on the write(), but this mechanism
alone does not let the socket layers know what's going on.

Therefore they must emulate what would have happened, otherwise
TCP cannot possibly see that the connection is application window
size limited.

Handle this, therefore, like SUNRPC by setting SOCK_NOSPACE and
bumping the ->sk_write_count as needed when we hit the send buffer
limits.

This should make TCP send buffer size auto-tuning and the
->sk_write_space() callback invocations actually happen.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: David Teigland <teigland@redhat.com>
2010-11-11 13:05:12 -06:00