Commit graph

209769 commits

Author SHA1 Message Date
Paul Mundt 25ab998e2e mfd: Fix up section mismatches in SH SDHI.
The current probe/remove definitions are split between __init and
__devexit, make them consistent by switching to __devinit.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2010-09-16 17:14:40 +09:00
Arnd Bergmann d71415e884 sh: kill big kernel lock
The only BKL user in arch/sh protects a single bit,
so we can trivially replace it with test_and_set_bit.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: linux-sh@vger.kernel.org
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2010-09-16 16:37:56 +09:00
Paul Mundt 5d75b3a247 Revert "sh: ecovec24: modify tsc2007 platform settings"
According to Morimoto-san, this is no longer needed. Revert it.

This reverts commit e0009b0a44.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2010-09-14 18:26:38 +09:00
Paul Mundt 51d149be0a sh: Provide a non-multiplexed sys_recvmmsg path.
Now that the rest of the socket calls are provided through their own
paths, do the same for sys_recvmmsg. It's unlikely we'll ever be able to
kill off the socketcall path, but this at least permits userspace to
gradually begin migrating.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2010-09-14 17:43:11 +09:00
Carmelo AMOROSO 459ebb34bd sh: Add syscall entries for non multiplexed socket calls
Linux kernel already has socket syscalls that can be invoked
without the multiplexing sys_socketcall wrapper.
C library wrappers are ready to use them directly. It needs just
to define the missing syscall numbers and provide the related entries
into the syscalls table, like sh64 aleady does.

Signed-off-by: Francesco Rundo <francesco.rundo@st.com>
Signed-off-by: Carmelo Amoroso <carmelo.amoroso@st.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2010-09-14 17:37:53 +09:00
Kuninori Morimoto e0009b0a44 sh: ecovec24: modify tsc2007 platform settings
This patch modify x_plate_ohms to correct value for tsc2007 panel,
and removed un-necessary ts_get_pendown_state()

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2010-09-14 17:35:57 +09:00
Matt Fleming 248bc0d93b sh: Set CONFIG_SYSFS_DEPRECATED_V2=n
As the help for the config option suggests, this option really shouldn't
be set by default for any recent distribution as it changes the layout
of sysfs. I spotted this while running debian when udev got very
confused by the sysfs layout and failed to create some device nodes.

Signed-off-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2010-09-14 17:35:52 +09:00
Paul Mundt 52204705b2 Merge branch 'sh/pci-express-integration' 2010-09-07 17:56:27 +09:00
Paul Mundt 1c3bb3871a sh: Hook up 3rd memory window for all SH7786 PCIe channels.
Now that the resource assignment issues are resolved, we can finally wire
up the small third memory window -- in the future we may reclaim this for
MSI.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2010-09-07 17:07:05 +09:00
Paul Mundt f048519309 sh: Properly wire up channel 2's I/O window on SH7786 PCIe.
An IORESOURCE_IO was missing here, which meant that we weren't properly
establishing the I/O window for this particular slot. With this
corrected, cards with I/O BARs have them actually assigned and
accessible.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2010-09-07 17:05:08 +09:00
Paul Mundt da03a63ac8 sh: Ignore 32-bit windows in 29-bit mode for SH7786 PCIe.
Certain memory windows are only available for 32-bit space, so skip over
these in 29-bit mode. This will severely restrict the amount of memory
that can be mapped, but since a boot loader bug makes booting in 29-bit
mode close to impossible anyways, everything is ok.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2010-09-07 17:03:10 +09:00
Paul Mundt 2c5f674339 sh: Establish a SuperHyway<->PCIe window mapping on SH7786 PCIe.
This bumps up the low address to match the physical memory windows for
SHway<->PCIe transfers. The previous implementation was banking on a 1:1
virt<->phys SHway mapping, which doesn't apply here.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2010-09-07 16:12:26 +09:00
Paul Mundt 2dbfa1e37d sh: Make SH7786 PCIe port reset logic more aggressive.
This attempts a more complete port reset, building on top of the existing
approach.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2010-09-07 16:11:04 +09:00
Matt Fleming 9ec1651668 sh: Additional register definitions for SH7786 PCIe.
Signed-off-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2010-09-07 16:09:14 +09:00
Paul Mundt b9afa3e015 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:
	arch/sh/kernel/process_32.c

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2010-08-20 20:52:23 +09:00
Paul Mundt 144c749423 Merge branch 'sh/pci-express-integration' 2010-08-20 20:39:22 +09:00
Paul Mundt d2d5bc58d7 sh: sh2007/sh7757lcr defconfig reduction.
These were newly added while the defconfig reduction work was ongoing,
strip them down now.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2010-08-20 20:38:24 +09:00
Paul Mundt 65c23f54c0 sh: Relax devfn constraints for SH7786 PCIe.
SH7786 PCIe has 1 slot per port, but no specific restriction on function.
Relax the devfn restriction and look to the slot number instead when
configured as a root complex.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2010-08-20 20:26:41 +09:00
Paul Mundt 960bc368e7 sh: reinstate clock framework rate rounding.
This was killed off by a simplification patch previously that failed to
take the cpufreq use case in to account, so reinstate the old bounding
logic. The lowest rate bounding on the other hand was broken in that it
never actually got assigned a rate and the best fit rate was instead just
getting lucky based on the ordering of the rate table, fix this up so the
code actually does what it was intended to do originally.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2010-08-20 19:10:38 +09:00
Paul Mundt 53178d71b9 sh: Fix up SH7786 PCIe PHY initialization.
This brings the clocking and register setting in line with the somewhat
factually ambiguous specification.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2010-08-20 16:04:59 +09:00
Paul Mundt 7656e2486c sh: Support type 1 accesses for SH7786 PCI.
This enables support for type 1 config space accesses on the SH7786
PCI controller. At the same time, add in some extra sanity checks for
controller asserted errors.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2010-08-20 15:59:40 +09:00
Linus Torvalds 763008c435 Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFS: Fix an Oops in the NFSv4 atomic open code
  NFS: Fix the selection of security flavours in Kconfig
  NFS: fix the return value of nfs_file_fsync()
  rpcrdma: Fix SQ size calculation when memreg is FRMR
  xprtrdma: Do not truncate iova_start values in frmr registrations.
  nfs: Remove redundant NULL check upon kfree()
  nfs: Add "lookupcache" to displayed mount options
  NFS: allow close-to-open cache semantics to apply to root of NFS filesystem
  SUNRPC: fix NFS client over TCP hangs due to packet loss (Bug 16494)
2010-08-18 15:45:23 -07:00
Linus Torvalds d1126ad907 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
  USB HID: Add ID for eGalax Multitouch used in JooJoo tablet
  HID: hiddev: fix memory corruption due to invalid intfdata
  HID: hiddev: protect against disconnect/NULL-dereference race
  HID: picolcd: correct ordering of framebuffer freeing
  HID: picolcd: testing the wrong variable
2010-08-18 15:29:38 -07:00
Linus Torvalds 2a554736f0 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
  [IA64] Fix build error: conflicting types for ‘sys_execve’
2010-08-18 13:27:41 -07:00
David Howells d15ca32037 Fix the declaration of sys_execve() in asm-generic/syscalls.h
Fix the declaration of sys_execve() in asm-generic/syscalls.h to have
various consts applied to its pointers.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-18 12:12:38 -07:00
Tony Luck 145e5aa269 [IA64] Fix build error: conflicting types for ‘sys_execve’
arch/ia64/kernel/process.c:636: error: conflicting types for ‘sys_execve’

commit d7627467b7
Make do_execve() take a const filename pointer

Missed the declaration of sys_execve in the ia64 asm/unistd.h (perhaps
because there is no reason for it to be there ... it might be a left over
from the COMPAT code?). Just delete the conflicting version.

Signed-off-by: Tony Luck <tony.luck@intel.com>
2010-08-18 10:17:44 -07:00
Linus Torvalds 145c3ae46b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  fs: brlock vfsmount_lock
  fs: scale files_lock
  lglock: introduce special lglock and brlock spin locks
  tty: fix fu_list abuse
  fs: cleanup files_lock locking
  fs: remove extra lookup in __lookup_hash
  fs: fs_struct rwlock to spinlock
  apparmor: use task path helpers
  fs: dentry allocation consolidation
  fs: fix do_lookup false negative
  mbcache: Limit the maximum number of cache entries
  hostfs ->follow_link() braino
  hostfs: dumb (and usually harmless) tpyo - strncpy instead of strlcpy
  remove SWRITE* I/O types
  kill BH_Ordered flag
  vfs: update ctime when changing the file's permission by setfacl
  cramfs: only unlock new inodes
  fix reiserfs_evict_inode end_writeback second call
2010-08-18 09:35:08 -07:00
Uwe Kleine-König 81ca03a0e2 mmc: build fix: mmc_pm_notify is only available with CONFIG_PM=y
This fixes a build breakage introduced by commit 4c2ef25fe0 ("mmc: fix
all hangs related to mmc/sd card insert/removal during suspend/resume")

Cc: David Brownell <david-b@pacbell.net>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: linux-mmc@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Kukjin Kim <kgene.kim@samsung.com>
Acked-by: Maxim Levitsky <maximlevitsky@gmail.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-18 09:34:05 -07:00
Linus Torvalds 1ca72feb93 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf tools: Fix build on POSIX shells
  latencytop: Fix kconfig dependency warnings
  perf annotate tui: Fix exit and RIGHT keys handling
  tracing: Sanitize value returned from write(trace_marker, "...", len)
  tracing/events: Convert format output to seq_file
  tracing: Extend recordmcount to better support Blackfin mcount
  tracing: Fix ring_buffer_read_page reading out of page boundary
  tracing: Fix an unallocated memory access in function_graph
2010-08-18 09:32:13 -07:00
Linus Torvalds 7dfb2d4069 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ALSA: emu10k1 - delay the PCM interrupts (add pcm_irq_delay parameter)
  ALSA: hda - Fix ALC680 base model capture
  ASoC: Remove DSP mode support for WM8776
  ALSA: hda - Add quirk for Dell Vostro 1220
  ALSA: riptide - Fix detection / load of firmware files
2010-08-18 09:30:08 -07:00
Linus Torvalds 6c8bfb7f7d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
  m68knommu: include sched.h in ColdFire/SPI driver
  m68knommu: formatting of pointers in printk()
  m68knommu: arch/m68k/include/asm/ide.h fix for nommu
2010-08-18 09:27:10 -07:00
Linus Torvalds d9f5d41569 Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md:
  md raid-1/10 Fix bio_rw bit manipulations again
  md: provide appropriate return value for spare_active functions.
  md: Notify sysfs when RAID1/5/10 disk is In_sync.
  Update recovery_offset even when external metadata is used.
2010-08-18 09:26:42 -07:00
Linus Torvalds 86ea51d4a2 Merge branch 'merge-devicetree' of git://git.secretlab.ca/git/linux-2.6
* 'merge-devicetree' of git://git.secretlab.ca/git/linux-2.6:
  spi.h: missing kernel-doc notation, please fix
  of: fix missing headers for of_address_to_resource() in MTD and SysACE drivers
  of: Fix missing includes
  ata: update for of_device to platform_device replacement
  microblaze: Fix of: eliminate of_device->node and dev_archdata->{of,prom}_node
  microblaze: Fix of/address: Merge all of the bus translation code
  booting-without-of: Remove nonexistent chapters from TOC, fix numbering
2010-08-18 09:26:17 -07:00
Trond Myklebust 0a377cff94 NFS: Fix an Oops in the NFSv4 atomic open code
Adam Lackorzynski reports:

with 2.6.35.2 I'm getting this reproducible Oops:

[  110.825396] BUG: unable to handle kernel NULL pointer dereference at
(null)
[  110.828638] IP: [<ffffffff811247b7>] encode_attrs+0x1a/0x2a4
[  110.828638] PGD be89f067 PUD bf18f067 PMD 0
[  110.828638] Oops: 0000 [#1] SMP
[  110.828638] last sysfs file: /sys/class/net/lo/operstate
[  110.828638] CPU 2
[  110.828638] Modules linked in: rtc_cmos rtc_core rtc_lib amd64_edac_mod
i2c_amd756 edac_core i2c_core dm_mirror dm_region_hash dm_log dm_snapshot
sg sr_mod usb_storage ohci_hcd mptspi tg3 mptscsih mptbase usbcore nls_base
[last unloaded: scsi_wait_scan]
[  110.828638]
[  110.828638] Pid: 11264, comm: setchecksum Not tainted 2.6.35.2 #1
[  110.828638] RIP: 0010:[<ffffffff811247b7>]  [<ffffffff811247b7>]
encode_attrs+0x1a/0x2a4
[  110.828638] RSP: 0000:ffff88003bf5b878  EFLAGS: 00010296
[  110.828638] RAX: ffff8800bddb48a8 RBX: ffff88003bf5bb18 RCX:
0000000000000000
[  110.828638] RDX: ffff8800be258800 RSI: 0000000000000000 RDI:
ffff88003bf5b9f8
[  110.828638] RBP: 0000000000000000 R08: ffff8800bddb48a8 R09:
0000000000000004
[  110.828638] R10: 0000000000000003 R11: ffff8800be779000 R12:
ffff8800be258800
[  110.828638] R13: ffff88003bf5b9f8 R14: ffff88003bf5bb20 R15:
ffff8800be258800
[  110.828638] FS:  0000000000000000(0000) GS:ffff880041e00000(0063)
knlGS:00000000556bd6b0
[  110.828638] CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
[  110.828638] CR2: 0000000000000000 CR3: 00000000be8ef000 CR4:
00000000000006e0
[  110.828638] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[  110.828638] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[  110.828638] Process setchecksum (pid: 11264, threadinfo
ffff88003bf5a000, task ffff88003f232210)
[  110.828638] Stack:
[  110.828638]  0000000000000000 ffff8800bfbcf920 0000000000000000
0000000000000ffe
[  110.828638] <0> 0000000000000000 0000000000000000 0000000000000000
0000000000000000
[  110.828638] <0> 0000000000000000 0000000000000000 0000000000000000
0000000000000000
[  110.828638] Call Trace:
[  110.828638]  [<ffffffff81124c1f>] ? nfs4_xdr_enc_setattr+0x90/0xb4
[  110.828638]  [<ffffffff81371161>] ? call_transmit+0x1c3/0x24a
[  110.828638]  [<ffffffff813774d9>] ? __rpc_execute+0x78/0x22a
[  110.828638]  [<ffffffff81371a91>] ? rpc_run_task+0x21/0x2b
[  110.828638]  [<ffffffff81371b7e>] ? rpc_call_sync+0x3d/0x5d
[  110.828638]  [<ffffffff8111e284>] ? _nfs4_do_setattr+0x11b/0x147
[  110.828638]  [<ffffffff81109466>] ? nfs_init_locked+0x0/0x32
[  110.828638]  [<ffffffff810ac521>] ? ifind+0x4e/0x90
[  110.828638]  [<ffffffff8111e2fb>] ? nfs4_do_setattr+0x4b/0x6e
[  110.828638]  [<ffffffff8111e634>] ? nfs4_do_open+0x291/0x3a6
[  110.828638]  [<ffffffff8111ed81>] ? nfs4_open_revalidate+0x63/0x14a
[  110.828638]  [<ffffffff811056c4>] ? nfs_open_revalidate+0xd7/0x161
[  110.828638]  [<ffffffff810a2de4>] ? do_lookup+0x1a4/0x201
[  110.828638]  [<ffffffff810a4733>] ? link_path_walk+0x6a/0x9d5
[  110.828638]  [<ffffffff810a42b6>] ? do_last+0x17b/0x58e
[  110.828638]  [<ffffffff810a5fbe>] ? do_filp_open+0x1bd/0x56e
[  110.828638]  [<ffffffff811cd5e0>] ? _atomic_dec_and_lock+0x30/0x48
[  110.828638]  [<ffffffff810a9b1b>] ? dput+0x37/0x152
[  110.828638]  [<ffffffff810ae063>] ? alloc_fd+0x69/0x10a
[  110.828638]  [<ffffffff81099f39>] ? do_sys_open+0x56/0x100
[  110.828638]  [<ffffffff81027a22>] ? ia32_sysret+0x0/0x5
[  110.828638] Code: 83 f1 01 e8 f5 ca ff ff 48 83 c4 50 5b 5d 41 5c c3 41
57 41 56 41 55 49 89 fd 41 54 49 89 d4 55 48 89 f5 53 48 81 ec 18 01 00 00
<8b> 06 89 c2 83 e2 08 83 fa 01 19 db 83 e3 f8 83 c3 18 a8 01 8d
[  110.828638] RIP  [<ffffffff811247b7>] encode_attrs+0x1a/0x2a4
[  110.828638]  RSP <ffff88003bf5b878>
[  110.828638] CR2: 0000000000000000
[  112.840396] ---[ end trace 95282e83fd77358f ]---

We need to ensure that the O_EXCL flag is turned off if the user doesn't
set O_CREAT.

Cc: stable@kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-08-18 09:25:42 -04:00
Takashi Iwai 2ea1ef5789 Merge branch 'fix/asoc' into for-linus 2010-08-18 15:22:18 +02:00
Takashi Iwai 76165a3063 Merge branch 'fix/hda' into for-linus 2010-08-18 15:22:15 +02:00
Jaroslav Kysela 56385a12d9 ALSA: emu10k1 - delay the PCM interrupts (add pcm_irq_delay parameter)
With some hardware combinations, the PCM interrupts are acknowledged
before the period boundary from the emu10k1 chip. The midlevel PCM code
gets confused and the playback stream is interrupted.

It seems that the interrupt processing shift by 2 samples is enough
to fix this issue. This default value does not harm other,
non-affected hardware.

More information: Kernel bugzilla bug#16300

[A copmile warning fixed by tiwai]

Signed-off-by: Jaroslav Kysela <perex@perex.cz>
Cc: <stable@kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2010-08-18 15:10:59 +02:00
Nick Piggin 99b7db7b8f fs: brlock vfsmount_lock
fs: brlock vfsmount_lock

Use a brlock for the vfsmount lock. It must be taken for write whenever
modifying the mount hash or associated fields, and may be taken for read when
performing mount hash lookups.

A new lock is added for the mnt-id allocator, so it doesn't need to take
the heavy vfsmount write-lock.

The number of atomics should remain the same for fastpath rlock cases, though
code would be slightly slower due to per-cpu access. Scalability is not not be
much improved in common cases yet, due to other locks (ie. dcache_lock) getting
in the way. However path lookups crossing mountpoints should be one case where
scalability is improved (currently requiring the global lock).

The slowpath is slower due to use of brlock. On a 64 core, 64 socket, 32 node
Altix system (high latency to remote nodes), a simple umount microbenchmark
(mount --bind mnt mnt2 ; umount mnt2 loop 1000 times), before this patch it
took 6.8s, afterwards took 7.1s, about 5% slower.

Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-18 08:35:48 -04:00
Nick Piggin 6416ccb789 fs: scale files_lock
fs: scale files_lock

Improve scalability of files_lock by adding per-cpu, per-sb files lists,
protected with an lglock. The lglock provides fast access to the per-cpu lists
to add and remove files. It also provides a snapshot of all the per-cpu lists
(although this is very slow).

One difficulty with this approach is that a file can be removed from the list
by another CPU. We must track which per-cpu list the file is on with a new
variale in the file struct (packed into a hole on 64-bit archs). Scalability
could suffer if files are frequently removed from different cpu's list.

However loads with frequent removal of files imply short interval between
adding and removing the files, and the scheduler attempts to avoid moving
processes too far away. Also, even in the case of cross-CPU removal, the
hardware has much more opportunity to parallelise cacheline transfers with N
cachelines than with 1.

A worst-case test of 1 CPU allocating files subsequently being freed by N CPUs
degenerates to contending on a single lock, which is no worse than before. When
more than one CPU are allocating files, even if they are always freed by
different CPUs, there will be more parallelism than the single-lock case.

Testing results:

On a 2 socket, 8 core opteron, I measure the number of times the lock is taken
to remove the file, the number of times it is removed by the same CPU that
added it, and the number of times it is removed by the same node that added it.

Booting:    locks=  25049 cpu-hits=  23174 (92.5%) node-hits=  23945 (95.6%)
kbuild -j16 locks=2281913 cpu-hits=2208126 (96.8%) node-hits=2252674 (98.7%)
dbench 64   locks=4306582 cpu-hits=4287247 (99.6%) node-hits=4299527 (99.8%)

So a file is removed from the same CPU it was added by over 90% of the time.
It remains within the same node 95% of the time.

Tim Chen ran some numbers for a 64 thread Nehalem system performing a compile.

                throughput
2.6.34-rc2      24.5
+patch          24.9

                us      sys     idle    IO wait (in %)
2.6.34-rc2      51.25   28.25   17.25   3.25
+patch          53.75   18.5    19      8.75

So significantly less CPU time spent in kernel code, higher idle time and
slightly higher throughput.

Single threaded performance difference was within the noise of microbenchmarks.
That is not to say penalty does not exist, the code is larger and more memory
accesses required so it will be slightly slower.

Cc: linux-kernel@vger.kernel.org
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-18 08:35:48 -04:00
Nick Piggin 2dc91abe03 lglock: introduce special lglock and brlock spin locks
lglock: introduce special lglock and brlock spin locks

This patch introduces "local-global" locks (lglocks). These can be used to:

- Provide fast exclusive access to per-CPU data, with exclusive access to
  another CPU's data allowed but possibly subject to contention, and to provide
  very slow exclusive access to all per-CPU data.
- Or to provide very fast and scalable read serialisation, and to provide
  very slow exclusive serialisation of data (not necessarily per-CPU data).

Brlocks are also implemented as a short-hand notation for the latter use
case.

Thanks to Paul for local/global naming convention.

Cc: linux-kernel@vger.kernel.org
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-18 08:35:48 -04:00
Nick Piggin d996b62a8d tty: fix fu_list abuse
tty: fix fu_list abuse

tty code abuses fu_list, which causes a bug in remount,ro handling.

If a tty device node is opened on a filesystem, then the last link to the inode
removed, the filesystem will be allowed to be remounted readonly. This is
because fs_may_remount_ro does not find the 0 link tty inode on the file sb
list (because the tty code incorrectly removed it to use for its own purpose).
This can result in a filesystem with errors after it is marked "clean".

Taking idea from Christoph's initial patch, allocate a tty private struct
at file->private_data and put our required list fields in there, linking
file and tty. This makes tty nodes behave the same way as other device nodes
and avoid meddling with the vfs, and avoids this bug.

The error handling is not trivial in the tty code, so for this bugfix, I take
the simple approach of using __GFP_NOFAIL and don't worry about memory errors.
This is not a problem because our allocator doesn't fail small allocs as a rule
anyway. So proper error handling is left as an exercise for tty hackers.

[ Arguably filesystem's device inode would ideally be divorced from the
driver's pseudo inode when it is opened, but in practice it's not clear whether
that will ever be worth implementing. ]

Cc: linux-kernel@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-18 08:35:47 -04:00
Nick Piggin ee2ffa0dfd fs: cleanup files_lock locking
fs: cleanup files_lock locking

Lock tty_files with a new spinlock, tty_files_lock; provide helpers to
manipulate the per-sb files list; unexport the files_lock spinlock.

Cc: linux-kernel@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-18 08:35:47 -04:00
Nick Piggin b04f784e5d fs: remove extra lookup in __lookup_hash
fs: remove extra lookup in __lookup_hash

Optimize lookup for create operations, where no dentry should often be
common-case. In cases where it is not, such as unlink, the added overhead
is much smaller than the removed.

Also, move comments about __d_lookup racyness to the __d_lookup call site.
d_lookup is intuitive; __d_lookup is what needs commenting. So in that same
vein, add kerneldoc comments to __d_lookup and clean up some of the comments:

- We are interested in how the RCU lookup works here, particularly with
  renames. Make that explicit, and point to the document where it is explained
  in more detail.
- RCU is pretty standard now, and macros make implementations pretty mindless.
  If we want to know about RCU barrier details, we look in RCU code.
- Delete some boring legacy comments because we don't care much about how the
  code used to work, more about the interesting parts of how it works now. So
  comments about lazy LRU may be interesting, but would better be done in the
  LRU or refcount management code.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-18 08:35:47 -04:00
Nick Piggin 2a4419b5b2 fs: fs_struct rwlock to spinlock
fs: fs_struct rwlock to spinlock

struct fs_struct.lock is an rwlock with the read-side used to protect root and
pwd members while taking references to them. Taking a reference to a path
typically requires just 2 atomic ops, so the critical section is very small.
Parallel read-side operations would have cacheline contention on the lock, the
dentry, and the vfsmount cachelines, so the rwlock is unlikely to ever give a
real parallelism increase.

Replace it with a spinlock to avoid one or two atomic operations in typical
path lookup fastpath.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-18 08:35:46 -04:00
Nick Piggin 44672e4fbd apparmor: use task path helpers
apparmor: use task path helpers

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-18 08:35:46 -04:00
Nick Piggin baa0389073 fs: dentry allocation consolidation
fs: dentry allocation consolidation

There are 2 duplicate copies of code in dentry allocation in path lookup.
Consolidate them into a single function.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-18 08:35:45 -04:00
Nick Piggin 2e2e88ea8c fs: fix do_lookup false negative
fs: fix do_lookup false negative

In do_lookup, if we initially find no dentry, we take the directory i_mutex and
re-check the lookup. If we find a dentry there, then we revalidate it if
needed. However if that revalidate asks for the dentry to be invalidated, we
return -ENOENT from do_lookup. What should happen instead is an attempt to
allocate and lookup a new dentry.

This is probably not noticed because it is rare. It is only reached if a
concurrent create races in first (in which case, the dentry probably won't be
invalidated anyway), or if the racy __d_lookup has failed due to a
false-negative (which is very rare).

Fix this by removing code and have it use the normal reval path.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-18 08:35:45 -04:00
Andreas Gruenbacher 3a48ee8a4a mbcache: Limit the maximum number of cache entries
Limit the maximum number of mb_cache entries depending on the number of
hash buckets: if the only limit to the number of cache entries is the
available memory the hash chains can grow very long, taking a long time
to search.

At least partially solves https://bugzilla.lustre.org/show_bug.cgi?id=22771.

Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-18 06:24:41 -04:00
Al Viro 3b6036d148 hostfs ->follow_link() braino
we want the assignment to err done inside the if () to be
visible after it, so (re)declaring err inside if () body
is wrong.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-18 06:21:10 -04:00
Al Viro 850a496f96 hostfs: dumb (and usually harmless) tpyo - strncpy instead of strlcpy
... not harmless in this case - we have a string in the end of buffer
already.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-18 06:18:57 -04:00