Commit graph

132357 commits

Author SHA1 Message Date
Ingo Molnar 5d0859cef2 Merge branch 'sched/clock' into tracing/ftrace
Conflicts:
	kernel/sched_clock.c
2009-02-26 21:21:59 +01:00
Ingo Molnar 83ce400928 x86: set X86_FEATURE_TSC_RELIABLE
If the TSC is constant and non-stop, also set it reliable.

(We will turn this off in DMI quirks for multi-chassis systems)

The performance number on a 16-way Nehalem system running
32 tasks that context-switch between each other is significant:

   sched_clock_stable=0		sched_clock_stable=1
   ....................         ....................
   22.456925 million/sec        24.306972 million/sec   [+8.2%]

lmbench's "lat_ctx -s 0 2" goes from 0.63 microseconds to
0.59 microseconds - a 6.7% increase in context-switching
performance.

Perfstat of 1 million pipe context switches between two tasks:

 Performance counter stats for './pipe-test-1m':

       [before]           [after]
   ............      ............
   37621.421089      36436.848378    task clock ticks     (msecs)

              0                 0    CPU migrations       (events)
        2000274           2000189    context switches     (events)
            194               193    pagefaults           (events)
     8433799643        8171016416    CPU cycles           (events) -3.21%
     8370133368        8180999694    instructions         (events) -2.31%
        4158565           3895941    cache references     (events) -6.74%
          44312             46264    cache misses         (events)

    2349.287976       2279.362465    wall-time            (msecs)  -3.06%

The speedup comes straight from the reduction in the instruction
count. sched_clock_cpu() got simpler and the whole workload thus
executes faster.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-26 21:20:25 +01:00
Ingo Molnar b342501cd3 sched: allow architectures to specify sched_clock_stable
Allow CONFIG_HAVE_UNSTABLE_SCHED_CLOCK architectures to still specify
that their sched_clock() implementation is reliable.

This will be used by x86 to switch on a faster sched_clock_cpu()
implementation on certain CPU types.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-26 21:20:22 +01:00
Kiran Divekar ab65f649d3 libertas: fix misuse of netdev_priv() and dev->ml_priv
The mesh and radiotap interfaces need to use the same private data as
the main wifi interface.  If the main wifi interface uses netdev_priv(),
but the other interfaces ->ml_priv, there's no way to figure out where
the private data actually is in the WEXT handlers and netdevice
callbacks.  So make everything use ->ml_priv.

Fixes botched netdev_priv() conversion introduced by "netdevice
libertas: Fix directly reference of netdev->priv", though admittedly
libertas' use of ->priv was somewhat "special".

Signed-off-by: Kiran Divekar <dkiran@marvell.com>
Acked-by: Dan Williams <dcbw@redhat.com>
Tested-by: Chris Ball <cjb@laptop.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-26 15:15:44 -05:00
Kyle McMartin f6be37fdc6 x86: enable DMAR by default
Now that the obvious bugs have been worked out, specifically
the iwlagn issue, and the write buffer errata, DMAR should be safe
to turn back on by default. (We've had it on since those patches were
first written a few weeks ago, without any noticeable bug reports
(most have been due to the dma-api debug patchset.))

Signed-off-by: Kyle McMartin <kyle@redhat.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-26 20:59:47 +01:00
David Woodhouse b50be33e42 [MTD] [MAPS] Remove MODULE_DEVICE_TABLE() from ck804rom driver.
We really don't want the BIOS flash mapping hacks to get automatically
loaded.

No idea why it isn't using pci_register_driver() though -- that should
be fine... and is even _present_ but disabled by #if 0.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-02-27 04:52:45 +09:00
wengang wang 28d57d4377 ocfs2: add IO error check in ocfs2_get_sector()
Check for IO error in ocfs2_get_sector().

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-02-26 11:51:12 -08:00
Tiger Yang 4442f51826 ocfs2: set gap to seperate entry and value when xattr in bucket
This patch set a gap (4 bytes) between xattr entry and
name/value when xattr in bucket. This gap use to seperate
entry and name/value when a bucket is full. It had already
been set when xattr in inode/block.

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-02-26 11:51:11 -08:00
Tao Ma c8b9cf9a7c ocfs2: lock the metaecc process for xattr bucket
For other metadata in ocfs2, metaecc is checked in ocfs2_read_blocks
with io_mutex held. While for xattr bucket, it is calculated by
the whole buckets. So we have to add a spin_lock to prevent multiple
processes calculating metaecc.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Tested-by: Tristan Ye <tristan.ye@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-02-26 11:51:11 -08:00
Tao Ma 89a907afe0 ocfs2: Use the right access_* method in ctime update of xattr.
In ctime updating of xattr, it use the wrong type of access for
inode, so use ocfs2_journal_access_di instead.

Reported-and-Tested-by: Tristan Ye <tristan.ye@oracle.com>
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-02-26 11:51:11 -08:00
Sunil Mushran 53ecd25e14 ocfs2/dlm: Make dlm_assert_master_handler() kill itself instead of the asserter
In dlm_assert_master_handler(), if we get an incorrect assert master from a node
that, we reply with EINVAL asking the asserter to die. The problem is that an
assert is sent after so many hoops, it is invariably the node that thinks the
asserter is wrong, is actually wrong. So instead of killing the asserter, this
patch kills the assertee.

This patch papers over a race that is still being addressed.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-02-26 11:51:11 -08:00
Sunil Mushran dabc47de7a ocfs2/dlm: Use ast_lock to protect ast_list
The code was using dlm->spinlock instead of dlm->ast_lock to protect the
ast_list. This patch fixes the issue.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-02-26 11:51:09 -08:00
Sunil Mushran c74ff8bb22 ocfs2: Cleanup the lockname print in dlmglue.c
The dentry lock has a different format than other locks. This patch fixes
ocfs2_log_dlm_error() macro to make it print the dentry lock correctly.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-02-26 11:51:09 -08:00
Sunil Mushran 7dc102b737 ocfs2/dlm: Retract fix for race between purge and migrate
Mainline commit d4f7e650e5 attempts to delay
the dlm_thread from sending the drop ref message if the lockres is being
migrated. The problem is that we make the dlm_thread wait for the migration
to complete. This causes a deadlock as dlm_thread also participates in the
lockres migration process.

A better fix for the original oss bugzilla#1012 is in testing.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-02-26 11:51:09 -08:00
Tao Ma 47be12e4ee ocfs2: Access and dirty the buffer_head in mark_written.
In __ocfs2_mark_extent_written, when we meet with the situation
of c_split_covers_rec, the old solution just replace the extent
record and forget to access and dirty the buffer_head. This will
cause a problem when the unwritten extent is in an extent block.
So access and dirty it.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-02-26 11:51:09 -08:00
Linus Torvalds 64e71303e4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
  Btrfs: try committing transaction before returning ENOSPC
  Btrfs: add better -ENOSPC handling
2009-02-26 10:37:00 -08:00
Linus Torvalds babb29b0a3 Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  xen/blkfront: use blk_rq_map_sg to generate ring entries
  block: reduce stack footprint of blk_recount_segments()
  cciss: shorten 30s timeout on controller reset
  block: add documentation for register_blkdev()
  block: fix bogus gcc warning for uninitialized var usage
2009-02-26 10:36:35 -08:00
Linus Torvalds 6fc79d40d3 Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc: Fix 64bit __copy_tofrom_user() regression
  powerpc: Fix 64bit memcpy() regression
  powerpc: Fix load/store float double alignment handler
2009-02-26 10:36:19 -08:00
Linus Torvalds 86883c2736 Make ieee1394_init a fs-initcall
It needs to happen before any firewire driver actually registers itself,
and that was previously handled by having the Makefile list the core
ieee1394 files before the drivers.

But now there are firewire drivers in drivers/media, and the Makefile
games aren't enough.  So just make ieee1394_init happen earlier in the
init sequence, the way all other bus layers already do.

Reported-and-tested-by: Ingo Molnar <mingo@elte.hu>
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: Henrik Kurelid <henrik@kurelid.se>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: Ben Backx <ben@bbackx.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-26 10:32:31 -08:00
Ingo Molnar 14131f2f98 tracing: implement trace_clock_*() APIs
Impact: implement new tracing timestamp APIs

Add three trace clock variants, with differing scalability/precision
tradeoffs:

 -   local: CPU-local trace clock
 -  medium: scalable global clock with some jitter
 -  global: globally monotonic, serialized clock

Make the ring-buffer use the local trace clock internally.

Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-26 18:44:06 +01:00
Ingo Molnar 6409c4da28 sched: sched_clock() improvement: use in_nmi()
make sure we dont execute more complex sched_clock() code in NMI context.

Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-26 18:44:05 +01:00
Jason Baron af39241b90 tracing, genirq: add irq enter and exit trace events
Impact: add new tracepoints

Add them to the generic IRQ code, that way every architecture
gets these new tracepoints, not just x86.

Using Steve's new 'TRACE_FORMAT', I can get function graph
trace as follows using the original two IRQ tracepoints:

 3)               |    handle_IRQ_event() {
 3)               |    /* (irq_handler_entry) irq=28 handler=eth0 */
 3)               |    e1000_intr_msi() {
 3)   2.460 us    |      __napi_schedule();
 3)   9.416 us    |    }
 3)               |    /* (irq_handler_exit) irq=28 handler=eth0 return=handled */
 3) + 22.935 us   |  }

Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-26 18:43:50 +01:00
Hiroshi Shimamoto cac64d00c2 sched_rt: don't start timer when rt bandwidth disabled
Impact: fix incorrect condition check

No need to start rt bandwidth timer when rt bandwidth is disabled.
If this timer starts, it may stop at sched_rt_period_timer() on the first time.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-26 14:18:55 +01:00
Frederic Weisbecker 8656e7a2fa tracing/core: make the per cpu trace files in per cpu directories
Impact: restructure the VFS layout of per CPU trace buffers

The per cpu trace files are all in a single directory:
/debug/tracing/per_cpu. In case of a large number of cpu, the
content of this directory becomes messy so we create now one
directory per cpu inside /debug/tracing/per_cpu which contain
each their own trace_pipe and trace files.

Ie:

 /debug/tracing$ ls -R per_cpu
 per_cpu:
 cpu0  cpu1

 per_cpu/cpu0:
 trace  trace_pipe

 per_cpu/cpu1:
 trace  trace_pipe

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-26 14:04:08 +01:00
Pavel Emelyanov 3f53a38131 ipv6: don't use tw net when accounting for recycled tw
We already have a valid net in that place, but this is not just a
cleanup - the tw pointer can be NULL there sometimes, thus causing
an oops in NET_NS=y case.

The same place in ipv4 code already works correctly using existing 
net, rather than tw's one.

The bug exists since 2.6.27.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26 03:35:13 -08:00
Jens Axboe 9e973e64ac xen/blkfront: use blk_rq_map_sg to generate ring entries
On occasion, the request will apparently have more segments than we
fit into the ring. Jens says:

> The second problem is that the block layer then appears to create one
> too many segments, but from the dump it has rq->nr_phys_segments ==
> BLKIF_MAX_SEGMENTS_PER_REQUEST. I suspect the latter is due to
> xen-blkfront not handling the merging on its own. It should check that
> the new page doesn't form part of the previous page. The
> rq_for_each_segment() iterates all single bits in the request, not dma
> segments. The "easiest" way to do this is to call blk_rq_map_sg() and
> then iterate the mapped sg list. That will give you what you are
> looking for.

> Here's a test patch, compiles but otherwise untested. I spent more
> time figuring out how to enable XEN than to code it up, so YMMV!
> Probably the sg list wants to be put inside the ring and only
> initialized on allocation, then you can get rid of the sg on stack and
> sg_init_table() loop call in the function. I'll leave that, and the
> testing, to you.

[Moved sg array into info structure, and initialize once. -J]

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2009-02-26 10:45:48 +01:00
Jens Axboe 1e42807918 block: reduce stack footprint of blk_recount_segments()
blk_recalc_rq_segments() requires a request structure passed in, which
we don't have from blk_recount_segments(). So the latter allocates one on
the stack, using > 400 bytes of stack for that. This can cause us to spill
over one page of stack from ext4 at least:

 0)     4560     400   blk_recount_segments+0x43/0x62
 1)     4160      32   bio_phys_segments+0x1c/0x24
 2)     4128      32   blk_rq_bio_prep+0x2a/0xf9
 3)     4096      32   init_request_from_bio+0xf9/0xfe
 4)     4064     112   __make_request+0x33c/0x3f6
 5)     3952     144   generic_make_request+0x2d1/0x321
 6)     3808      64   submit_bio+0xb9/0xc3
 7)     3744      48   submit_bh+0xea/0x10e
 8)     3696     368   ext4_mb_init_cache+0x257/0xa6a [ext4]
 9)     3328     288   ext4_mb_regular_allocator+0x421/0xcd9 [ext4]
10)     3040     160   ext4_mb_new_blocks+0x211/0x4b4 [ext4]
11)     2880     336   ext4_ext_get_blocks+0xb61/0xd45 [ext4]
12)     2544      96   ext4_get_blocks_wrap+0xf2/0x200 [ext4]
13)     2448      80   ext4_da_get_block_write+0x6e/0x16b [ext4]
14)     2368     352   mpage_da_map_blocks+0x7e/0x4b3 [ext4]
15)     2016     352   ext4_da_writepages+0x2ce/0x43c [ext4]
16)     1664      32   do_writepages+0x2d/0x3c
17)     1632     144   __writeback_single_inode+0x162/0x2cd
18)     1488      96   generic_sync_sb_inodes+0x1e3/0x32b
19)     1392      16   sync_sb_inodes+0xe/0x10
20)     1376      48   writeback_inodes+0x69/0xb3
21)     1328     208   balance_dirty_pages_ratelimited_nr+0x187/0x2f9
22)     1120     224   generic_file_buffered_write+0x1d4/0x2c4
23)      896     176   __generic_file_aio_write_nolock+0x35f/0x393
24)      720      80   generic_file_aio_write+0x6c/0xc8
25)      640      80   ext4_file_write+0xa9/0x137 [ext4]
26)      560     320   do_sync_write+0xf0/0x137
27)      240      48   vfs_write+0xb3/0x13c
28)      192      64   sys_write+0x4c/0x74
29)      128     128   system_call_fastpath+0x16/0x1b

Split the segment counting out into a __blk_recalc_rq_segments() helper
to avoid allocating an onstack request just for checking the physical
segment count.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-02-26 10:45:48 +01:00
Jens Axboe 5e4c91c84b cciss: shorten 30s timeout on controller reset
If reset_devices is set for kexec, then cciss will delay 30 seconds
since the old 5i controller _may_ need that long to recover. Replace
the long sleep with incremental sleep and tests to reduce the 30 seconds
to worst case for 5i, so that other controllers will proceed quickly.

Reviewed-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-02-26 10:45:48 +01:00
Márton Németh 9e8c0bccdc block: add documentation for register_blkdev()
Add documentation for register_blkdev() function and for the parameters.

Signed-off-by: Márton Németh <nm127@freemail.hu>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-02-26 10:45:48 +01:00
Jens Axboe b2bf96833c block: fix bogus gcc warning for uninitialized var usage
Newer gcc throw this warning:

        fs/bio.c: In function ?bio_alloc_bioset?:
        fs/bio.c:305: warning: ?p? may be used uninitialized in this function

since it cannot figure out that 'p' is only ever used if 'bs' is non-NULL.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-02-26 10:45:48 +01:00
Herbert Xu a760a6656e crypto: api - Fix module load deadlock with fallback algorithms
With the mandatory algorithm testing at registration, we have
now created a deadlock with algorithms requiring fallbacks.
This can happen if the module containing the algorithm requiring
fallback is loaded first, without the fallback module being loaded
first.  The system will then try to test the new algorithm, find
that it needs to load a fallback, and then try to load that.

As both algorithms share the same module alias, it can attempt
to load the original algorithm again and block indefinitely.

As algorithms requiring fallbacks are a special case, we can fix
this by giving them a different module alias than the rest.  Then
it's just a matter of using the right aliases according to what
algorithms we're trying to find.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2009-02-26 14:06:31 +08:00
Eric Sandeen 8f64b32eb7 ext4: don't call jbd2_journal_force_commit_nested without journal
Running without a journal, I oopsed when I ran out of space,
because we called jbd2_journal_force_commit_nested() from
ext4_should_retry_alloc() without a journal.

This should take care of it, I think.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-02-26 00:57:35 -05:00
Theodore Ts'o d8ae4601a4 ext4: Reorder fs/Makefile so that ext2 root fs's are mounted using ext2
In fs/Makefile, ext3 was placed before ext2 so that a root filesystem
that possessed a journal, it would be mounted as ext3 instead of ext2.
This was necessary because a cleanly unmounted ext3 filesystem was
fully backwards compatible with ext2, and could be mounted by ext2 ---
but it was desirable that it be mounted with ext3 so that the
journaling would be enabled.

The ext4 filesystem supports new incompatible features, so there is no
danger of an ext4 filesystem being mistaken for an ext2 filesystem.
At that point, the relative ordering of ext4 with respect to ext2
didn't matter until ext4 gained the ability to mount filesystems
without a journal starting in 2.6.29-rc1.  Now that this is the case,
given that ext4 is before ext2, it means that root filesystems that
were using the plain-jane ext2 format are getting mounted using the
ext4 filesystem driver, which is a change in behavior which could be
surprising to users.

It's doubtful that there are that many ext2-only root filesystem users
that would also have ext4 compiled into the kernel, but to adhere to
the principle of least surprise, the correct ordering in fs/Makefile
is ext3, followed by ext2, and finally ext4.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-02-28 09:50:01 -05:00
Theodore Ts'o 8b1a8ff8b3 ext4: Remove duplicate call to ext4_commit_super() in ext4_freeze()
Commit c4be0c1d added error checking to ext4_freeze() when calling
ext4_commit_super().  Unfortunately the patch failed to remove the
original call to ext4_commit_super(), with the net result that when
freezing the filesystem, the superblock gets written twice, the first
time without error checking.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-02-28 00:08:53 -05:00
Paul E. McKenney a682604838 rcu: Teach RCU that idle task is not quiscent state at boot
This patch fixes a bug located by Vegard Nossum with the aid of
kmemcheck, updated based on review comments from Nick Piggin,
Ingo Molnar, and Andrew Morton.  And cleans up the variable-name
and function-name language.  ;-)

The boot CPU runs in the context of its idle thread during boot-up.
During this time, idle_cpu(0) will always return nonzero, which will
fool Classic and Hierarchical RCU into deciding that a large chunk of
the boot-up sequence is a big long quiescent state.  This in turn causes
RCU to prematurely end grace periods during this time.

This patch changes the rcutree.c and rcuclassic.c rcu_check_callbacks()
function to ignore the idle task as a quiescent state until the
system has started up the scheduler in rest_init(), introducing a
new non-API function rcu_idle_now_means_idle() to inform RCU of this
transition.  RCU maintains an internal rcu_idle_cpu_truthful variable
to track this state, which is then used by rcu_check_callback() to
determine if it should believe idle_cpu().

Because this patch has the effect of disallowing RCU grace periods
during long stretches of the boot-up sequence, this patch also introduces
Josh Triplett's UP-only optimization that makes synchronize_rcu() be a
no-op if num_online_cpus() returns 1.  This allows boot-time code that
calls synchronize_rcu() to proceed normally.  Note, however, that RCU
callbacks registered by call_rcu() will likely queue up until later in
the boot sequence.  Although rcuclassic and rcutree can also use this
same optimization after boot completes, rcupreempt must restrict its
use of this optimization to the portion of the boot sequence before the
scheduler starts up, given that an rcupreempt RCU read-side critical
section may be preeempted.

In addition, this patch takes Nick Piggin's suggestion to make the
system_state global variable be __read_mostly.

Changes since v4:

o	Changes the name of the introduced function and variable to
	be less emotional.  ;-)

Changes since v3:

o	WARN_ON(nr_context_switches() > 0) to verify that RCU
	switches out of boot-time mode before the first context
	switch, as suggested by Nick Piggin.

Changes since v2:

o	Created rcu_blocking_is_gp() internal-to-RCU API that
	determines whether a call to synchronize_rcu() is itself
	a grace period.

o	The definition of rcu_blocking_is_gp() for rcuclassic and
	rcutree checks to see if but a single CPU is online.

o	The definition of rcu_blocking_is_gp() for rcupreempt
	checks to see both if but a single CPU is online and if
	the system is still in early boot.

	This allows rcupreempt to again work correctly if running
	on a single CPU after booting is complete.

o	Added check to rcupreempt's synchronize_sched() for there
	being but one online CPU.

Tested all three variants both SMP and !SMP, booted fine, passed a short
rcutorture test on both x86 and Power.

Located-by: Vegard Nossum <vegard.nossum@gmail.com>
Tested-by: Vegard Nossum <vegard.nossum@gmail.com>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-26 04:08:14 +01:00
Mark Nelson f72b728bf1 powerpc: Fix 64bit __copy_tofrom_user() regression
This fixes a regression introduced by commit
a4e22f02f5 ("powerpc: Update 64bit
__copy_tofrom_user() using CPU_FTR_UNALIGNED_LD_STD").

The same bug that existed in the 64bit memcpy() also exists here so fix
it here too. The fix is the same as that applied to memcpy() with the
addition of fixes for the exception handling code required for
__copy_tofrom_user().

This stops us reading beyond the end of the source region we were told
to copy.

Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-26 14:02:54 +11:00
Mark Nelson e423b9ecd6 powerpc: Fix 64bit memcpy() regression
This fixes a regression introduced by commit
25d6e2d7c5 ("powerpc: Update 64bit memcpy()
using CPU_FTR_UNALIGNED_LD_STD").

This commit allowed CPUs that have the CPU_FTR_UNALIGNED_LD_STD CPU
feature bit present to do the memcpy() with unaligned load doubles. But,
along with this came a bug where our final load double would read bytes
beyond a page boundary and into the next (unmapped) page. This was caught
by enabling CONFIG_DEBUG_PAGEALLOC,

The fix was to read only the number of bytes that we need to store rather
than reading a full 8-byte doubleword and storing only a portion of that.

In order to minimise the amount of existing code touched we use the
original do_tail for the src_unaligned case.

Below is an example of the regression, as reported by Sachin Sant:

Unable to handle kernel paging request for data at address 0xc00000003f380000
Faulting instruction address: 0xc000000000039574
cpu 0x1: Vector: 300 (Data Access) at [c00000003baf3020]
    pc: c000000000039574: .memcpy+0x74/0x244
    lr: d00000000244916c: .ext3_xattr_get+0x288/0x2f4 [ext3]
    sp: c00000003baf32a0
   msr: 8000000000009032
   dar: c00000003f380000
 dsisr: 40000000
  current = 0xc00000003e54b010
  paca    = 0xc000000000a53680
    pid   = 1840, comm = readahead
enter ? for help
[link register   ] d00000000244916c .ext3_xattr_get+0x288/0x2f4 [ext3]
[c00000003baf32a0] d000000002449104 .ext3_xattr_get+0x220/0x2f4 [ext3]
(unreliab
le)
[c00000003baf3390] d00000000244a6e8 .ext3_xattr_security_get+0x40/0x5c [ext3]
[c00000003baf3400] c000000000148154 .generic_getxattr+0x74/0x9c
[c00000003baf34a0] c000000000333400 .inode_doinit_with_dentry+0x1c4/0x678
[c00000003baf3560] c00000000032c6b0 .security_d_instantiate+0x50/0x68
[c00000003baf35e0] c00000000013c818 .d_instantiate+0x78/0x9c
[c00000003baf3680] c00000000013ced0 .d_splice_alias+0xf0/0x120
[c00000003baf3720] d00000000243e05c .ext3_lookup+0xec/0x134 [ext3]
[c00000003baf37c0] c000000000131e74 .do_lookup+0x110/0x260
[c00000003baf3880] c000000000134ed0 .__link_path_walk+0xa98/0x1010
[c00000003baf3970] c0000000001354a0 .path_walk+0x58/0xc4
[c00000003baf3a20] c000000000135720 .do_path_lookup+0x138/0x1e4
[c00000003baf3ad0] c00000000013645c .path_lookup_open+0x6c/0xc8
[c00000003baf3b70] c000000000136780 .do_filp_open+0xcc/0x874
[c00000003baf3d10] c0000000001251e0 .do_sys_open+0x80/0x140
[c00000003baf3dc0] c00000000016aaec .compat_sys_open+0x24/0x38
[c00000003baf3e30] c00000000000855c syscall_exit+0x0/0x40

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-26 14:02:53 +11:00
Michael Neuling 49f297f8df powerpc: Fix load/store float double alignment handler
When we introduced VSX, we changed the way FPRs are stored in the
thread_struct.  Unfortunately we missed the load/store float double
alignment handler code when updating how we access FPRs in the
thread_struct.

Below fixes this and merges the little/big endian case.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-26 14:02:53 +11:00
Ingo Molnar f4abfb8d0d Merge branch 'tip/tracing/ftrace' of ssh://master.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace 2009-02-26 03:48:44 +01:00
Ingo Molnar e36b1e136a Merge branches 'tracing/ftrace', 'tracing/hw-branch-tracing' and 'linus' into tracing/core 2009-02-26 03:47:27 +01:00
Steven Rostedt 3cdfdf91fc tracing: wrap arguments with PARAMS
Peter Zijlstra warned that TPPROTO and TPARGS might become something
other than a simple copy of itself. To prevent this from having
side effects in the TRACE_FORMAT macro in tracepoint.h, we add a
PARAMS() macro to be defined as just a wrapper.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-02-25 21:44:26 -05:00
Steven Rostedt eef62a6826 tracing: rename DEFINE_TRACE_FMT to just TRACE_FORMAT
There's been a bit confusion to whether DEFINE/DECLARE_TRACE_FMT should
be a DEFINE or a DECLARE. Ingo Molnar suggested simply calling it
TRACE_FORMAT.

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-02-25 21:44:22 -05:00
Linus Torvalds 169d418b12 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ALSA: emu10k1 - Fix digital/analog switch on audigy2 ZS
  ALSA: hda - Quirk for Acer Aspire 6530G
  ALSA: hda - add another MacBook Pro 3,1 SSID
  ALSA: fix excessive background noise introduced by OSS emulation rate shrink
  ALSA: aw2: do not grab every saa7146 based device
  ALSA: hda - Fix parse of init_verbs sysfs entry
  ALSA: pcxhr.h replace signed one-bit bitfields
2009-02-25 15:16:18 -08:00
Linus Torvalds 70c01f01a2 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
  [IA64] Don't go beyond iosapic_intr_info's arraysize
  [IA64] Do not go beyond ARRAY_SIZE of unw.hash
  [IA64] enable setting DMAR on by default
2009-02-25 15:14:37 -08:00
Linus Torvalds c4eb1bf63f Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
  [libata] pata_legacy: for VLB 32bit PIO don't try tricks with slop
  [libata] pata_amd: program FIFO
  sata_mv: fix SoC interrupt breakage
  pata_it821x: resume from hibernation fails with RAID volume
2009-02-25 15:12:48 -08:00
Alan Cox c55af1f5ab [libata] pata_legacy: for VLB 32bit PIO don't try tricks with slop
These devices are generally used with ATA anyway and it seems that some
ATAPI will need us to issue the right number of words.  Therefore as we
can't switch mid burst on VLB devices we should only use 32bit I/O for
suitable block sizes.

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-02-25 15:30:23 -05:00
Alan Cox c48052cc36 [libata] pata_amd: program FIFO
With 32bit PIO we can use the posted write buffers, but only for 32bit I/O
cycles.  This means we must disable the FIFO for ATAPI where a final 16bit
cycle may occur.

Rework the FIFO logic so that we disable the FIFO then selectively
re-enable it when we set the timings on AMD devices.  Also fix a case
where we scribbled on PCI config 0x41 of Nvidia chips when we shouldn't.

Signed-off-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-02-25 15:30:16 -05:00
Mark Lord 6be96ac15e sata_mv: fix SoC interrupt breakage
For some reason, sata_mv doesn't clear interrupt status during init
when it's running on an SoC host adapter.  If the bootloader has
touched the SATA controller before starting Linux, Linux can end up
enabling the SATA interrupt with events pending, which will cause the
interrupt to be marked as spurious and then be disabled, which then
breaks all further accesses to the controller.

This patch makes the SoC path clear interrupt status on init like in
the non-SoC case.

Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-02-25 15:25:35 -05:00
Ondrej Zary 7ba07d16bd pata_it821x: resume from hibernation fails with RAID volume
Hibernation didn't work for me since I started to use IT8212 controller.
I did some debugging (booting with no_console_suspend init=/bin/sh).

Found that resume fails (2.6.28) with "serial number mismatch 'some
garbage' != 'some other garbage'" and "revalidation failed" messages.
That's because the controller firmware fills different serial number in
the IDENTIFY every boot.

The patch below fixes the resume simply clearing the serial number.  The
proper fix would be probably to fill in the serial number of the RAID
volume instead.  I assume that there must be something like that stored on
the drives but I don't know where.

Fix resume on pata_it821x RAID volume by clearing the serial number in
IDENTIFY data, which is otherwise different on each boot.

Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-02-25 15:22:44 -05:00
Linus Torvalds a36e4f0cab Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6:
  ide: fix refcounting in device drivers
  ide-cd: document capacity hack
  it821x: remove dead URL
  atiixp: fix missing parentheses
  amd74xx: device/vendor confusion
  ide: ide.c 'clear' fix, update "ide=nodma" documentation
2009-02-25 12:22:06 -08:00