215 commits

Author SHA1 Message Date
Thomas Graf be577ddc2b [PKT_SCHED] qdisc: Use rtnl registration interface
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:27:09 -07:00
Arnaldo Carvalho de Melo dc5fc579b9 [NETLINK]: Use nlmsg_trim() where appropriate
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:26:37 -07:00
Arnaldo Carvalho de Melo 27a884dc3c [SK_BUFF]: Convert skb->tail to sk_buff_data_t
So that it is also an offset from skb->head, reduces its size from 8 to 4 bytes
on 64bit architectures, allowing us to combine the 4 bytes hole left by the
layer headers conversion, reducing struct sk_buff size to 256 bytes, i.e. 4
64byte cachelines, and since the sk_buff slab cache is SLAB_HWCACHE_ALIGN...
:-)

Many calculations that previously required that skb->{transport,network,
mac}_header be first converted to a pointer now can be done directly, being
meaningful as offsets or pointers.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:26:28 -07:00
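As a rough illustration of the offset representation described in 27a884dc3c, here is a minimal sketch in C. The `toy_buf` type, `toy_data_t`, and the helper names are hypothetical stand-ins, not the kernel's struct sk_buff or its accessors; the sketch only shows how a 4-byte offset from `head` can replace an 8-byte pointer and be converted back on demand.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy buffer, NOT the real struct sk_buff. */
typedef uint32_t toy_data_t;   /* assumed stand-in for a 4-byte offset type */

struct toy_buf {
    unsigned char *head;   /* start of the allocated buffer */
    unsigned char *data;   /* start of the payload */
    toy_data_t tail;       /* end of the payload, stored as an offset from head */
};

/* Convert the stored offset back into a pointer when one is needed. */
static inline unsigned char *toy_tail_pointer(const struct toy_buf *b)
{
    return b->head + b->tail;
}

/* Record a new tail position, given a pointer inside the buffer. */
static inline void toy_set_tail_pointer(struct toy_buf *b, unsigned char *p)
{
    b->tail = (toy_data_t)(p - b->head);
}

/* Payload length: distance between data and the tail position. */
static inline size_t toy_len(const struct toy_buf *b)
{
    return (size_t)(toy_tail_pointer(b) - b->data);
}
```

Storing the offset keeps the field at 4 bytes regardless of pointer width, at the cost of one addition whenever a real pointer is required.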
Patrick McHardy 514bca322c [NET_SCHED]: Fix warning
net/sched/sch_api.c: In function 'psched_show':
net/sched/sch_api.c:1219: warning: format '%08x' expects type 'unsigned int', but argument 6 has type 's64'

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:26:17 -07:00
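The class of warning quoted in 514bca322c can be reproduced and avoided in plain C. The sketch below is an assumption about how such a mismatch is commonly resolved, not a copy of the actual patch: either truncate explicitly with a cast so the argument matches `%08x`, or switch to a 64-bit conversion. The helper names `format_res32` and `format_res64` are hypothetical, and the cast assumes a 32-bit `unsigned int`.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Keep the "%08x" format and make the truncation to unsigned int explicit. */
static int format_res32(char *buf, size_t len, int64_t res)
{
    return snprintf(buf, len, "%08x", (unsigned int)res);
}

/* Alternative: keep all 64 bits and use the matching conversion macro. */
static int format_res64(char *buf, size_t len, int64_t res)
{
    return snprintf(buf, len, "%016" PRIx64, (uint64_t)res);
}
```

The first variant preserves an existing 8-digit output format; the second changes the field width, which matters if userspace parses the output.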
Patrick McHardy bb239acf56 [NET_SCHED]: sch_cbq: fix watchdog scheduled too late
q->now is increased during dequeue and doesn't contain the current time
afterwards, resulting in a too large timeout value for the qdisc watchdog.
Use "now" instead, which still contains the current time.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:26:16 -07:00
Patrick McHardy 4361cb17f0 [NET_SCHED]: Export real timer resolution in /proc/net/psched
The timer resolution exported in /proc/net/psched is used by userspace to
calculate HTB's burst values. Currently it is set to HZ. Since we're now
using hrtimers, use KTIME_MONOTONIC_RES instead, which makes HTB use smaller
burst values.

This patch also affects libnl, which incorrectly uses this value for
the SFQ perturbation parameter, which is always in seconds, and some
routing cache values, which are in USER_HZ, so both cases are broken
anyway.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:26:15 -07:00
Patrick McHardy 00c04af9df [NET_SCHED]: kill jiffie conversion macros
Now that all packet schedulers have been converted to hrtimers most users
of PSCHED_JIFFIE2US and PSCHED_US2JIFFIE are gone. The remaining users use
it to convert external time units to packet scheduler clock ticks, so use
PSCHED_TICKS_PER_SEC instead.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:26:14 -07:00
Patrick McHardy fb983d4578 [NET_SCHED]: sch_htb: use hrtimer based watchdog
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:26:13 -07:00
Patrick McHardy 1a13cb63d6 [NET_SCHED]: sch_cbq: use hrtimer for delay_timer
Switch delay_timer to hrtimer.

The class penalty parameter is changed to use psched ticks as units.
Since iproute never supported using this and the only existing user
(libnl) incorrectly assumes psched ticks as units anyway, this
shouldn't break anything.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:26:12 -07:00
Patrick McHardy e9054a339e [NET_SCHED]: sch_cbq: fix cbq_undelay_prio for non-active priorities
cbq_undelay_prio is supposed to return a time delta, but returns the
current time for non-active priorities, causing cbq_undelay to mark
the priority as active and schedule a timer for twice the current
time.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:26:11 -07:00
Patrick McHardy 88a993540a [NET_SCHED]: sch_cbq: use hrtimer based watchdog
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:26:09 -07:00
Patrick McHardy 59cb5c6734 [NET_SCHED]: sch_netem: use hrtimer based watchdog
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:26:08 -07:00
Patrick McHardy f7f593e383 [NET_SCHED]: sch_tbf: use hrtimer based watchdog
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:26:07 -07:00
Patrick McHardy ed2b229a97 [NET_SCHED]: sch_hfsc: use hrtimer based watchdog
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:26:06 -07:00
Patrick McHardy 4179477f63 [NET_SCHED]: Add hrtimer based qdisc watchdog
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:26:05 -07:00
Patrick McHardy 641b9e0e8b [NET_SCHED]: Use ktime as clocksource
Get rid of the manual clock source selection mess and use ktime. Also
use a scalar representation, which allows cleaning up pkt_sched.h a bit
more and results in fewer ktime_to_ns() calls in most cases.

The PSCHED_US2JIFFIE/PSCHED_JIFFIE2US macros are implemented quite
inefficiently by this patch; following patches will convert all qdiscs
to hrtimers and get rid of them entirely.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:26:04 -07:00
Arnaldo Carvalho de Melo 0660e03f6b [SK_BUFF]: Introduce ipv6_hdr(), remove skb->nh.ipv6h
Now the skb->nh union has just one member, .raw, i.e. it is just like the
skb->mac union, strange, no? I'm just leaving it like that till the transport
layer is done with, when we'll rename skb->mac.raw to skb->mac_header (or
->mac_header_offset?), ditto for ->{h,nh}.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:25:14 -07:00
Arnaldo Carvalho de Melo eddc9ec53b [SK_BUFF]: Introduce ip_hdr(), remove skb->nh.iph
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:25:10 -07:00
Arnaldo Carvalho de Melo d56f90a7c9 [SK_BUFF]: Introduce skb_network_header()
For the places where we need a pointer to the network header, it is still legal
to touch skb->nh.raw directly if just adding to, subtracting from or setting it
to another layer header.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:24:59 -07:00
Arnaldo Carvalho de Melo bbe735e424 [SK_BUFF]: Introduce skb_network_offset()
For the quite common 'skb->nh.raw - skb->data' sequence.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:24:58 -07:00
YOSHIFUJI Hideaki b6d9bcb069 [NET] SCHED: Use htons() where appropriate.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:24:00 -07:00
Patrick McHardy bb8a954f27 [NET_SCHED]: cls_tcindex: fix compatibility breakage
Userspace uses an integer for TCA_TCINDEX_SHIFT, the kernel was changed
to expect and use a u16 value in 2.6.11, which broke compatibility on
big endian machines. Change back to use int.

Reported by Ole Reinartz <ole.reinartz@gmx.de>

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-09 13:31:13 -07:00
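The breakage described in bb8a954f27 follows from byte order alone, which the sketch below tries to show without depending on the host machine. It simulates both layouts explicitly: a sender stores a 32-bit integer into 4 bytes, and a receiver reads only the first 2 bytes as a 16-bit value. The function names are hypothetical and the code is not taken from cls_tcindex.

```c
#include <stdint.h>

/* Store a 32-bit value into 4 bytes using the given byte order. */
static void put_u32(unsigned char out[4], uint32_t v, int big_endian)
{
    for (int i = 0; i < 4; i++) {
        int shift = big_endian ? 8 * (3 - i) : 8 * i;
        out[i] = (unsigned char)(v >> shift);
    }
}

/* Interpret the FIRST two bytes as a 16-bit value in the same byte order,
 * which is what a receiver expecting a u16 effectively does. */
static uint16_t get_u16(const unsigned char *in, int big_endian)
{
    if (big_endian)
        return (uint16_t)((in[0] << 8) | in[1]);
    return (uint16_t)(in[0] | (in[1] << 8));
}
```

With a little-endian layout, a small value such as 3 survives because its low-order bytes come first. With a big-endian layout the first two bytes are the high-order half, so the receiver sees 0.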
Patrick McHardy 31ba548f96 [NET_SCHED]: cls_basic: fix memory leak in basic_destroy
tp->root is not freed on destruction.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-02 13:30:52 -07:00
Patrick McHardy c01003c205 [IFB]: Fix crash on input device removal
The input_device pointer is not refcounted, which means the device may
disappear while packets are queued, causing a crash when ifb passes packets
with a stale skb->dev pointer to netif_rx().

Fix by storing the interface index instead and doing a lookup where necessary.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-29 11:46:52 -07:00
Patrick McHardy c38c83cb70 [NET_SCHED]: sch_htb/sch_hfsc: fix oops in qlen_notify
During both HTB and HFSC class deletion the class is removed from the
class hash before calling qdisc_tree_decrease_qlen. This makes the
->get operation in qdisc_tree_decrease_qlen fail, so it passes a NULL
pointer to ->qlen_notify, causing an oops.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-27 14:04:24 -07:00
Robert P. J. Day 9b2f7bcf0e [NET]: Remove dead net/sched/Makefile entry for sch_hpfq.o.
Remove the worthless net/sched/Makefile entry for the non-existent
source file sch_hpfq.c.

Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-26 16:20:34 -07:00
Patrick McHardy d3fa76ee6b [NET_SCHED]: cls_basic: fix NULL pointer dereference
cls_basic doesn't allocate tp->root before it is linked into the
active classifier list, resulting in a NULL pointer dereference
when packets hit the classifier before its ->change function is
called.

Reported by Chris Madden <chris@reflexsecurity.com>

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-25 18:48:11 -07:00
Dave Jones b6f99a2119 [NET]: fix up misplaced inlines.
Turning up the warnings on gcc makes it emit warnings
about the placement of 'inline' in function declarations.
Here's everything that was under net/

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-22 12:27:49 -07:00
Tim Schmielau cd354f1ae7 [PATCH] remove many unneeded #includes of sched.h
After Al Viro (finally) succeeded in removing the sched.h #include in module.h
recently, it makes sense again to remove other superfluous sched.h includes.
There are quite a lot of files which include it but don't actually need
anything defined in there.  Presumably these includes were once needed for
macros that used to live in sched.h, but moved to other header files in the
course of cleaning it up.

To ease the pain, this time I did not fiddle with any header files and only
removed #includes from .c-files, which tend to cause less trouble.

Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha,
arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig,
allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all
configs in arch/arm/configs on arm.  I also checked that no new warnings were
introduced by the patch (actually, some warnings are removed that were emitted
by unnecessarily included header files).

Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-14 08:09:54 -08:00
Patrick McHardy 3d50f23108 [NET_SCHED]: sch_hfsc: replace ASSERT macro by WARN_ON
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-13 12:36:57 -08:00
Arjan van de Ven da7071d7e3 [PATCH] mark struct file_operations const 8
Many struct file_operations in the kernel can be "const".  Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data.  In addition it'll catch accidental writes at compile time to
these shared resources.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:46 -08:00
YOSHIFUJI Hideaki 10297b9931 [NET] SCHED: Fix whitespace errors.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-10 23:20:08 -08:00
Jan Engelhardt e60a13e030 [NETFILTER]: {ip,ip6}_tables: use struct xt_table instead of redefined structure names
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:39:20 -08:00
Patrick McHardy a8d0f9526f [NET]: Add UDPLITE support in a few missing spots
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:39:14 -08:00
Arjan van de Ven f5a6e01c09 [NET]: user of the jiffies rounding code: Networking
This patch introduces users of the round_jiffies() function in the
networking code.

These timers all were of the "about once a second" or "about once
every X seconds" variety and several showed up in the "what wakes the
cpu up" profiles that the tickless patches provide.  Some timers are
highly dynamic based on network load; but even on low activity systems
they still show up so the rounding is done only in cases of low
activity, allowing higher frequency timers in the high activity case.

The various hardware watchdogs are an obvious case; they run every 2
seconds but aren't otherwise particular about exactly when they need to
run.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:38:52 -08:00
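The idea behind the rounding in f5a6e01c09 is that timers which only need "about once a second" accuracy can be aligned to whole-second boundaries so that they expire together. The sketch below is a deliberately simplified illustration of that idea, not the kernel's round_jiffies() implementation; `TOY_HZ` and `toy_round_up_to_second` are hypothetical names, and the tick rate of 250 is an assumption.

```c
/* Assumed tick rate for this sketch only. */
#define TOY_HZ 250UL

/* Round a tick count up to the next whole-second boundary.
 * Values already on a boundary are returned unchanged. */
static unsigned long toy_round_up_to_second(unsigned long j)
{
    unsigned long rem = j % TOY_HZ;

    return rem ? j + (TOY_HZ - rem) : j;
}
```

Two timers requested for ticks 1001 and 1249 would both land on tick 1250 under this scheme, so a single wakeup can service both.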
Jarek Poplawski 2cf6c36cb4 [NET_SCHED] sch_prio: class statistics printing enabled
This patch adds a dump_stats callback to enable
printing of basic statistics of prio classes.
(With help of Patrick McHardy).

Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:38:40 -08:00
Patrick McHardy 239a87c876 [NET_SCHED]: act_ipt: fix regression in ipt action
The x_tables patch broke target module autoloading in the ipt action
by replacing the ipt_find_target call (which does autoloading) by
xt_find_target (which doesn't do autoloading). Additionally xt_find_target
may return ERR_PTR values in case of an error, which are not handled.

Use xt_request_find_target, which does both autoloading and ERR_PTR
handling properly. Also don't forget to drop the target module reference
again when xt_check_target fails.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-02 00:40:36 -08:00
Jarek Poplawski 160d5e10f8 [NET_SCHED] sch_htb: turn intermediate classes into leaves
- turn intermediate classes into leaves again when their
  last child is deleted (struct htb_class changed)

Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-08 17:19:32 -08:00
Jarek Poplawski a37ef2e325 [NET_SCHED] sch_cbq: deactivating when grafting, purging etc.
- deactivating of active classes when q.qlen drops to zero
  (cbq_drop)

- a redundant instruction removed from cbq_deactivate_class

PS: probably htb_deactivate in htb_delete and
cbq_deactivate_class in cbq_delete are also
redundant now.

Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-08 17:19:31 -08:00
Patrick McHardy 5c804bfdcc [NET_SCHED]: cls_fw: fix NULL pointer dereference
When the first fw classifier is initialized, there is a small window
between the ->init() and ->change() calls, during which the classifier
is active but not entirely set up and tp->root is still NULL (->init()
does nothing).

When a packet is queued during this window a NULL pointer dereference
occurs in fw_classify() when trying to dereference head->mask.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-06 18:39:07 -08:00
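The window described in 5c804bfdcc, where a classifier is reachable before its root is set up, can be illustrated with a small guard. This is a sketch under assumptions: `toy_proto`, `toy_head`, and `toy_classify` are hypothetical, and the fallback chosen here (report "no match") is not necessarily what the actual patch does for a NULL root.

```c
#include <stddef.h>

/* Hypothetical classifier state; root stays NULL until configuration runs. */
struct toy_head {
    unsigned int mask;
};

struct toy_proto {
    struct toy_head *root;
};

/* Classify a packet mark. Returns 0 and writes the masked id on success,
 * or -1 if the classifier has not been configured yet. */
static int toy_classify(const struct toy_proto *tp, unsigned int mark,
                        unsigned int *id)
{
    const struct toy_head *head = tp->root;

    if (head == NULL)      /* packet arrived before configuration */
        return -1;

    *id = mark & head->mask;
    return 0;
}
```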
Kim Nordlund a163148c1b [PKT_SCHED] act_gact: division by zero
Not returning -EINVAL, because someone might want to use the value
zero in some future gact_prob algorithm?

Signed-off-by: Kim Nordlund <kim.nordlund@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:32:11 -08:00
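The choice made in a163148c1b, tolerating a zero value rather than rejecting it with -EINVAL, amounts to guarding the modulo operation. The sketch below assumes a probability parameter used as a divisor; `toy_take_alternate` is a hypothetical helper and the exact semantics of the real gact code are not reproduced here.

```c
/* Decide whether to take the alternate action for one packet.
 * pval == 0 is accepted as configuration and simply means "never",
 * which avoids computing rnd % 0. */
static int toy_take_alternate(unsigned int rnd, unsigned int pval)
{
    if (pval == 0)
        return 0;

    return (rnd % pval) == 0;
}
```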
Patrick McHardy 1e9b3d5339 [NET_SCHED]: policer: restore compatibility with old iproute binaries
The tc actions increased the size of struct tc_police, which broke
compatibility with old iproute binaries since both the act_police
and the old NET_CLS_POLICE code check for an exact size match.

Since the new members are not even used, the simple fix is to also
accept the size of the old structure. Dumping is not affected since
old userspace will receive a bigger structure, which is handled fine.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:32:07 -08:00
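The compatibility fix in 1e9b3d5339 replaces an exact size match with a check that accepts either layout. A minimal sketch, assuming two hypothetical structures where the new one only appends members; the field names are invented for illustration and do not mirror struct tc_police.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout sent by old userspace binaries. */
struct toy_police_old {
    uint32_t index;
    uint32_t action;
    uint32_t limit;
};

/* Hypothetical current layout: same prefix plus appended members. */
struct toy_police_new {
    uint32_t index;
    uint32_t action;
    uint32_t limit;
    uint32_t extra_a;
    uint32_t extra_b;
};

/* Accept a payload if its length matches either known layout. */
static int toy_police_size_ok(size_t len)
{
    return len == sizeof(struct toy_police_new) ||
           len == sizeof(struct toy_police_old);
}
```

This only works because the appended members are unused on input, so a shorter payload carries everything the receiver actually reads.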
Adrian Bunk 5f68e4c07c [PKT_SCHED]: Remove unused exports.
This patch removes the following unused EXPORT_SYMBOL's:
- sch_api.c: qdisc_lookup
- sch_generic.c: __netdev_watchdog_up
- sch_generic.c: noop_qdisc_ops
- sch_generic.c: qdisc_alloc

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:32:06 -08:00
Patrick McHardy e488eafcc5 [NET_SCHED]: Fix endless loops (part 5): netem/tbf/hfsc ->requeue failures
When peeking at the next packet in a child qdisc by calling dequeue/requeue,
the upper qdisc qlen counter may get out of sync in case the requeue fails.
The qdisc and the child qdisc both have their counter decremented, but since
no packet is given to the upper qdisc it won't decrement its counter itself.

requeue should not fail, so this is mostly for "correctness".

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:31:46 -08:00
Patrick McHardy 256d61b87b [NET_SCHED]: Fix endless loops (part 4): HTB
Convert HTB to use qdisc_tree_decrease_qlen() and add a callback
for deactivating a class when its child queue becomes empty.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:31:45 -08:00
Patrick McHardy f973b913e1 [NET_SCHED]: Fix endless loops (part 3): HFSC
Convert HFSC to use qdisc_tree_decrease_qlen() and add a callback
for deactivating a class when its child queue becomes empty.

All queue purging goes through hfsc_purge_queue(), which is used in
three cases: grafting, class creation (when a leaf class is turned
into an intermediate class by attaching a new class) and class
deletion. In all cases qdisc_tree_decrease_qlen() is needed.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:31:44 -08:00
Patrick McHardy 5e50da01d0 [NET_SCHED]: Fix endless loops (part 2): "simple" qdiscs
Convert the "simple" qdiscs to use qdisc_tree_decrease_qlen() where
necessary:

- all graft operations
- destruction of old child qdiscs in prio, red and tbf change operation
- purging of queue in sfq change operation

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:31:43 -08:00
Patrick McHardy 43effa1e57 [NET_SCHED]: Fix endless loops caused by inaccurate qlen counters (part 1)
There are multiple problems related to qlen adjustment that can lead
to an upper qdisc getting out of sync with the real number of packets
queued, leading to endless dequeueing attempts by the upper layer code.

All qdiscs must maintain an accurate q.qlen counter. There are basically
two groups of operations affecting the qlen: operations that propagate
down the tree (enqueue, dequeue, requeue, drop, reset) beginning at the
root qdisc and operations only affecting a subtree or single qdisc
(change, graft, delete class). Since qlen changes during operations from
the second group don't propagate to ancestor qdiscs, their qlen values
become desynchronized.

This patch adds a function to propagate qlen changes up the qdisc tree,
optionally calling a callback function to perform qdisc-internal
maintenance when the child qdisc becomes empty. The follow-up patches
will convert all qdiscs to use this function where necessary.

Noticed by Timo Steinbach <tsteinbach@astaro.com>.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:31:42 -08:00
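The upward propagation introduced in 43effa1e57 can be sketched on a toy tree with parent pointers. Everything below is hypothetical (`toy_qdisc`, `toy_tree_decrease_qlen`, the `child_empty` callback) and greatly simplified compared with the kernel, which looks parents up by classid. The sketch assumes the caller has already reduced the starting node's own counter.

```c
#include <stddef.h>

/* Hypothetical queueing node with a packet counter and a parent link. */
struct toy_qdisc {
    struct toy_qdisc *parent;
    unsigned int qlen;
    /* Optional hook, called on a parent when one of its children is empty. */
    void (*child_empty)(struct toy_qdisc *self, struct toy_qdisc *child);
    int notified;
};

/* Example hook: just count how often it fired. */
static void toy_count_empty(struct toy_qdisc *self, struct toy_qdisc *child)
{
    (void)child;
    self->notified++;
}

/* Propagate a drop of n packets from q to all of its ancestors. */
static void toy_tree_decrease_qlen(struct toy_qdisc *q, unsigned int n)
{
    struct toy_qdisc *child = q;
    struct toy_qdisc *p;

    if (n == 0)
        return;

    for (p = q->parent; p != NULL; p = p->parent) {
        if (child->qlen == 0 && p->child_empty != NULL)
            p->child_empty(p, child);   /* let the parent deactivate it */
        p->qlen -= n;
        child = p;
    }
}
```

After a subtree operation such as a purge empties a leaf, one call from the leaf brings every ancestor's counter back in line with the real number of queued packets.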
Patrick McHardy 9f9afec482 [NET_SCHED]: Set parent classid in default qdiscs
Set parent classids in default qdiscs to allow walking up the tree
from outside the qdiscs. This is needed by the next patch.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:31:41 -08:00
Patrick McHardy 814a175e7b [NET_SCHED]: sch_htb: perform qlen adjustment immediately in ->delete
qlen adjustment should happen immediately in ->delete and not in the
class destroy function because the reference count will not hit zero in
->delete (sch_api holds a reference) but in ->put. Since the qdisc
lock is released between deletion of the class and final destruction
this creates an externally visible error in the qlen counter.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:31:40 -08:00