redonkable/alistair23-linux

Author	SHA1	Message	Date
Mikael Pettersson	df17b1d990	x86, 32-bit: fix boot failure on TSC-less processors Booting 2.6.26-rc6 on my 486 DX/4 fails with a "BUG: Int 6" (invalid opcode) and a kernel halt immediately after the kernel has been uncompressed. The BUG shows EIP pointing to an rdtsc instruction in native_read_tsc(), invoked from native_sched_clock(). (This error occurs so early that not even the serial console can capture it.) A bisection showed that this bug first occurs in 2.6.26-rc3-git7, via commit `9ccc906c97`: >x86: distangle user disabled TSC from unstable > >tsc_enabled is set to 0 from the command line switch "notsc" and from >the mark_tsc_unstable code. Seperate those functionalities and replace >tsc_enable with tsc_disable. This makes also the native_sched_clock() >decision when to use TSC understandable. > >Preparatory patch to solve the sched_clock() issue on 32 bit. > >Signed-off-by: Thomas Gleixner <tglx@linutronix.de> The core reason for this bug is that native_sched_clock() gets called before tsc_init(). Before the commit above, tsc_32.c used a "tsc_enabled" variable which defaulted to 0 == disabled, and which only got enabled late in tsc_init(). Thus early calls to native_sched_clock() would skip the TSC and use jiffies instead. After the commit above, tsc_32.c uses a "tsc_disabled" variable which defaults to 0, meaning that the TSC is Ok to use. Early calls to native_sched_clock() now erroneously try to use the TSC on !cpu_has_tsc processors, leading to invalid opcode exceptions. My proposed fix is to initialise tsc_disabled to a "soft disabled" state distinct from the hard disabled state set up by the "notsc" kernel option. This fixes the native_sched_clock() problem. It also allows tsc_init() to be simplified: instead of setting tsc_disabled = 1 on every error return, we just set tsc_disabled = 0 once when all checks have succeeded. I've verified that this lets my 486 boot again. I've also verified that a Core2 machine still uses the TSC as clocksource after the patch. Signed-off-by: Mikael Pettersson <mikpe@it.uu.se> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-06-19 10:08:47 +02:00
Thomas Gleixner	74dc51a3de	x86: disable TSC for sched_clock() when calibration failed When the TSC calibration fails then TSC is still used in sched_clock(). Disable it completely in that case. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@kernel.org	2008-05-23 14:08:06 +02:00
Thomas Gleixner	9ccc906c97	x86: distangle user disabled TSC from unstable tsc_enabled is set to 0 from the command line switch "notsc" and from the mark_tsc_unstable code. Seperate those functionalities and replace tsc_enable with tsc_disable. This makes also the native_sched_clock() decision when to use TSC understandable. Preparatory patch to solve the sched_clock() issue on 32 bit. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-05-23 14:08:06 +02:00
Thomas Gleixner	d8bb6f4c16	x86: tsc prevent time going backwards We already catch most of the TSC problems by sanity checks, but there is a subtle bug which has been in the code forever. This can cause time jumps in the range of hours. This was reported in: http://lkml.org/lkml/2007/8/23/96 and http://lkml.org/lkml/2008/3/31/23 I was able to reproduce the problem with a gettimeofday loop test on a dual core and a quad core machine which both have sychronized TSCs. The TSCs seems not to be perfectly in sync though, but the kernel is not able to detect the slight delta in the sync check. Still there exists an extremly small window where this delta can be observed with a real big time jump. So far I was only able to reproduce this with the vsyscall gettimeofday implementation, but in theory this might be observable with the syscall based version as well. CPU 0 updates the clock source variables under xtime/vyscall lock and CPU1, where the TSC is slighty behind CPU0, is reading the time right after the seqlock was unlocked. The clocksource reference data was updated with the TSC from CPU0 and the value which is read from TSC on CPU1 is less than the reference data. This results in a huge delta value due to the unsigned subtraction of the TSC value and the reference value. This algorithm can not be changed due to the support of wrapping clock sources like pm timer. The huge delta is converted to nanoseconds and added to xtime, which is then observable by the caller. The next gettimeofday call on CPU1 will show the correct time again as now the TSC has advanced above the reference value. To prevent this TSC specific wreckage we need to compare the TSC value against the reference value and return the latter when it is larger than the actual TSC value. I pondered to mark the TSC unstable when the readout is smaller than the reference value, but this would render an otherwise good and fast clocksource unusable without a real good reason. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-19 19:19:55 +02:00
Pavel Machek	4bd01600b2	x86: clean up =0 initializations in arch/x86/kernel/tsc_32.c Signed-off-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-04-19 19:19:55 +02:00
Ingo Molnar	1725037f72	x86: set_cyc2ns_scale() remove prev scale Peter Zijlstra pointed out that it's unused. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-17 17:41:34 +02:00
Rusty Russell	3c2047cd32	x86: if we cannot calibrate the TSC, we panic. The current tsc_init() clears the TSC feature bit if the TSC khz cannot be calculated, causing us to panic in arch/x86/kernel/cpu/bugs.c check_config(). We should simply mark it unstable. Frankly, someone should take an axe to this code. mark_tsc_unstable() not only marks it unstable, but sets tsc_enabled to 0, which seems redundant but is actually important here because means it won't be used by sched_clock() either. Perhaps a tristate enum "UNUSABLE, UNSTABLE, OK" would be clearer, and separate mark_tsc_unstable() and mark_tsc_broken() functions? Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-17 17:40:52 +02:00
Karsten Wiese	4f41c94d5c	x86: fix call to set_cyc2ns_scale() from time_cpufreq_notifier() In time_cpufreq_notifier() the cpu id to act upon is held in freq->cpu. Use it instead of smp_processor_id() in the call to set_cyc2ns_scale(). This makes the preempt_*able() unnecessary and lets set_cyc2ns_scale() update the intended cpu's cyc2ns. Related mail/thread: http://lkml.org/lkml/2007/12/7/130 Signed-off-by: Karsten Wiese <fzu@wemgehoertderstaat.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-07 21:09:14 +02:00
Ingo Molnar	5b13d86357	revert "x86: tsc prevent time going backwards" revert: \| commit `47001d6033` \| Author: Thomas Gleixner <tglx@linutronix.de> \| Date: Tue Apr 1 19:45:18 2008 +0200 \| \| x86: tsc prevent time going backwards it has been identified to cause suspend regression - and the commit fixes a longstanding bug that existed before 2.6.25 was opened - so it can wait some more until the effects are better understood. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-07 21:09:14 +02:00
Thomas Gleixner	47001d6033	x86: tsc prevent time going backwards We already catch most of the TSC problems by sanity checks, but there is a subtle bug which has been in the code for ever. This can cause time jumps in the range of hours. This was reported in: http://lkml.org/lkml/2007/8/23/96 and http://lkml.org/lkml/2008/3/31/23 I was able to reproduce the problem with a gettimeofday loop test on a dual core and a quad core machine which both have sychronized TSCs. The TSCs seems not to be perfectly in sync though, but the kernel is not able to detect the slight delta in the sync check. Still there exists an extremly small window where this delta can be observed with a real big time jump. So far I was only able to reproduce this with the vsyscall gettimeofday implementation, but in theory this might be observable with the syscall based version as well. CPU 0 updates the clock source variables under xtime/vyscall lock and CPU1, where the TSC is slighty behind CPU0, is reading the time right after the seqlock was unlocked. The clocksource reference data was updated with the TSC from CPU0 and the value which is read from TSC on CPU1 is less than the reference data. This results in a huge delta value due to the unsigned subtraction of the TSC value and the reference value. This algorithm can not be changed due to the support of wrapping clock sources like pm timer. The huge delta is converted to nanoseconds and added to xtime, which is then observable by the caller. The next gettimeofday call on CPU1 will show the correct time again as now the TSC has advanced above the reference value. To prevent this TSC specific wreckage we need to compare the TSC value against the reference value and return the latter when it is larger than the actual TSC value. I pondered to mark the TSC unstable when the readout is smaller than the reference value, but this would render an otherwise good and fast clocksource unusable without a real good reason. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-04 18:36:49 +02:00
Pavel Machek	7265b6f10d	x86: notsc is ignored on common configurations notsc is ignored in 32-bit kernels if CONFIG_X86_TSC is on.. which is bad, fix it. Signed-off-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-02-26 12:55:54 +01:00
Andi Kleen	404ee5b14b	x86: convert TSC disabling to generic cpuid disable bitmap Fix from: Ian Campbell <ijc@hellion.org.uk> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-01-30 13:33:20 +01:00
Andi Kleen	51fc97b935	x86: allow TSC clock source on AMD Fam10h and some cleanup After a lot of discussions with AMD it turns out that TSC on Fam10h CPUs is synchronized when the CONSTANT_TSC cpuid bit is set. Or rather that if there are ever systems where that is not true it would be their BIOS' task to disable the bit. So finally use TSC gettimeofday on Fam10h by default. Or rather it is always used now on CPUs where the AMD specific CONSTANT_TSC bit is set. This gives a nice speed bost for gettimeofday() on these systems which tends to be by far the most common v/syscall. On a Fam10h system here TSC gtod uses about 20% of the CPU time of acpi_pm based gtod(). This was measured on 32bit, on 64bit it is even better because TSC gtod() can use a vsyscall and stay in ring 3, which acpi_pm doesn't. The Intel check simply checks for CONSTANT_TSC too without hardcoding Intel vendor. This is equivalent on 64bit because all 64bit capable Intel CPUs will have CONSTANT_TSC set. On Intel there is no CPU supplied CONSTANT_TSC bit currently, but we synthesize one based on hardcoded knowledge which steppings have p-state invariant TSC. So the new logic is now: On CPUs which have the AMD specific CONSTANT_TSC bit set or on Intel CPUs which are new enough to be known to have p-state invariant TSC always use TSC based gettimeofday() Cc: lenb@kernel.org Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-01-30 13:32:40 +01:00
Guillaume Chazarain	53d517cdba	x86: scale cyc_2_nsec according to CPU frequency scale the sched_clock() cyc_2_nsec scaling factor according to CPU frequency changes. [ mingo@elte.hu: simplified it and fixed it for SMP. ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-01-30 13:30:06 +01:00
Dave Johnson	8c66006538	x86: fix more TSC clock source calibration errors The previous patch wasn't correctly handling the 'count' variable. If a CPU gave bad results on the 1st or 2nd run but good results on the 3rd, it wouldn't do the correct thing. No idea if any such CPU exists, but the patch below handles that case by discarding the bad runs. If a bad result (too quick, or too slow) occurs on any of the 3 runs it will be discarded. Also updated some comments to explain what's going on. Signed-off-by: Dave Johnson <djohnson@sw.starentnetworks.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-23 22:37:22 +02:00
Dave Johnson	edaf420fdc	x86: fix TSC clock source calibration error I ran into this problem on a system that was unable to obtain NTP sync because the clock was running very slow (over 10000ppm slow). ntpd had declared all of its peers 'reject' with 'peer_dist' reason. On investigation, the tsc_khz variable was significantly incorrect causing xtime to run slow. After a reboot tsc_khz was correct so I did a reboot test to see how often the problem occurred: Test was done on a 2000 Mhz Xeon system. Of 689 reboots, 8 of them had unacceptable tsc_khz values (>500ppm): range of tsc_khz # of boots % of boots ---------------- ---------- ---------- < 1999750 0 0.000% 1999750 - 1999800 21 3.048% 1999800 - 1999850 166 24.128% 1999850 - 1999900 241 35.029% 1999900 - 1999950 211 30.669% 1999950 - 2000000 42 6.105% 2000000 - 2000000 0 0.000% `2000050` - 2000100 0 0.000% [...] 2000100 - 2015000 1 0.145% << BAD 2015000 - 2030000 6 0.872% << BAD 2030000 - 2045000 1 0.145% << BAD 2045000 < 0 0.000% The worst boot was 2032.577 Mhz, over 1.5% off! It appears that on rare occasions, mach_countup() is taking longer to complete than necessary. I suspect that this is caused by the CPU taking a periodic SMI interrupt right at the end of the 30ms calibration loop. This would cause the loop to delay while the SMI BIOS hander runs. The resulting TSC value is beyond what it actually should be resulting in a higher tsc_khz. The below patch makes native_calculate_cpu_khz() take the best (shortest duration, lowest khz) run of it's 3 calibration loops. If a SMI goes off causing a bad result (long duration, higher khz) it will be discarded. With the patch applied, 300 boots of the same system produce good results: range of tsc_khz # of boots % of boots ---------------- ---------- ---------- < 1999750 0 0.000% 1999750 - 1999800 30 10.000% 1999800 - 1999850 166 55.333% 1999850 - 1999900 89 29.667% 1999900 - 1999950 15 5.000% 1999950 < 0 0.000% Problem was found and tested against 2.6.18. Patch is against 2.6.22. Signed-off-by: Dave Johnson <djohnson@sw.starentnetworks.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-23 22:37:22 +02:00
Linus Torvalds	c00046c279	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial * git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (74 commits) fix do_sys_open() prototype sysfs: trivial: fix sysfs_create_file kerneldoc spelling mistake Documentation: Fix typo in SubmitChecklist. Typo: depricated -> deprecated Add missing profile=kvm option to Documentation/kernel-parameters.txt fix typo about TBI in e1000 comment proc.txt: Add /proc/stat field small documentation fixes Fix compiler warning in smount example program from sharedsubtree.txt docs/sysfs: add missing word to sysfs attribute explanation documentation/ext3: grammar fixes Documentation/java.txt: typo and grammar fixes Documentation/filesystems/vfs.txt: typo fix include/asm-*/system.h: remove unused set_rmb(), set_wmb() macros trivial copy_data_pages() tidy up Fix typo in arch/x86/kernel/tsc_32.c file link fix for Pegasus USB net driver help remove unused return within void return function Typo fixes retrun -> return x86 hpet.h: remove broken links ...	2007-10-19 20:36:17 -07:00
Josh Triplett	9631512973	Fix typo in arch/x86/kernel/tsc_32.c Signed-off-by: Josh Triplett <josh@kernel.org> Signed-off-by: Adrian Bunk <bunk@kernel.org>	2007-10-20 02:23:49 +02:00
Simon Arlott	27b46d7661	spelling fixes: arch/i386/ Spelling fixes in arch/i386/. Signed-off-by: Simon Arlott <simon@fire.lp0.eu> Signed-off-by: Adrian Bunk <bunk@kernel.org>	2007-10-20 01:13:56 +02:00
Mike Travis	92cb7612ae	x86: convert cpuinfo_x86 array to a per_cpu array cpu_data is currently an array defined using NR_CPUS. This means that we overallocate since we will rarely really use maximum configured cpus. When NR_CPU count is raised to 4096 the size of cpu_data becomes 3,145,728 bytes. These changes were adopted from the sparc64 (and ia64) code. An additional field was added to cpuinfo_x86 to be a non-ambiguous cpu index. This corresponds to the index into a cpumask_t as well as the per_cpu index. It's used in various places like show_cpuinfo(). cpu_data is defined to be the boot_cpu_data structure for the NON-SMP case. Signed-off-by: Mike Travis <travis@sgi.com> Acked-by: Christoph Lameter <clameter@sgi.com> Cc: Andi Kleen <ak@suse.de> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Dmitry Torokhov <dtor@mail.ru> Cc: "Antonino A. Daplas" <adaplas@pol.net> Cc: Mark M. Hoffman <mhoffman@lightlink.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-19 20:35:04 +02:00
Ingo Molnar	f97586b610	x86: do not crash on non-Geode PCs in TSC probe with this fix Geode kernels can be booted (and QA-ed) on generic PCs. otherwise it crashes and burns during early bootup: Detected 2160.212 MHz processor. general protection fault: 0000 [#1] PREEMPT SMP Modules linked in: CPU: 0 EIP: 0060:[<c09071f6>] Not tainted VLI EFLAGS: 00010002 (2.6.23-rc9 #90) EIP is at tsc_init+0xa6/0x150 eax: 00000001 ebx: c1dce000 ecx: 00001900 edx: 00000001 esi: 00051000 edi: 00051000 ebp: c08fdfc4 esp: c08fdfa4 ds: 007b es: 007b fs: 00d8 gs: 0000 ss: 0068 Process swapper (pid: 0, ti=c08fc000 task=c082a180 task.ti=c08fc000) Stack: c076b870 00000870 000000d4 0000001d c0831e80 c1dce000 00051000 00051000 c08fdfcc c09053f8 c08fdff8 c09045ff 000001e2 c09040a0 00051000 00000020 0004e500 c0932140 00020800 00099800 c08ed000 01409007 00000000 Call Trace: [<c010517a>] show_trace_log_lvl+0x1a/0x30 [<c0105246>] show_stack_log_lvl+0xb6/0x100 [<c0105732>] show_registers+0x212/0x3a0 [<c0105aa4>] die+0x104/0x220 [<c0105f5f>] do_general_protection+0x1ef/0x2b0 [<c06699f2>] error_code+0x72/0x78 [<c09053f8>] time_init+0x8/0x20 [<c09045ff>] start_kernel+0x1af/0x320 [<00000000>] 0x0 ======================= Code: 31 d2 b8 00 00 09 3d f7 35 2c 70 9b c0 a3 04 95 8f c0 e8 ce 4e 99 ff b8 e0 45 93 c0 e8 94 b1 c5 ff e8 7f 3d 80 ff b9 00 19 00 00 <0f> 32 f6 c4 01 74 07 83 25 24 ce 82 c0 fd 8b 0d 20 ce 82 c0 b8 EIP: [<c09071f6>] tsc_init+0xa6/0x150 SS:ESP 0068:c08fdfa4 Kernel panic - not syncing: Attempted to kill the idle task! Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-17 20:15:37 +02:00
Dave Jones	835c34a168	Delete filenames in comments. Since the x86 merge, lots of files that referenced their own filenames are no longer correct. Rather than keep them up to date, just delete them, as they add no real value. Additionally: - fix up comment formatting in scx200_32.c - Remove a credit from myself in setup_64.c from a time when we had no SCM - remove longwinded history from tsc_32.c which can be figured out from git. Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-13 10:01:23 -07:00
Linus Torvalds	19ad7ae47e	Merge branch 'dmi-const' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/misc-2.6 * 'dmi-const' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/misc-2.6: drivers/firmware: const-ify DMI API and internals	2007-10-11 19:18:45 -07:00
Thomas Gleixner	9a163ed8e0	i386: move kernel Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-10-11 11:17:01 +02:00

24 commits