1
0
Fork 0

Merge branch 'net-fix-issues-around-register_netdevice-failures'

Jakub Kicinski says:

====================
net: fix issues around register_netdevice() failures

This series attempts to clean up the life cycle of struct
net_device. Dave has added dev->needs_free_netdev in the
past to fix double frees, we can lean on that mechanism
a little more to fix remaining issues with register_netdevice().

This is the next chapter of the saga which already includes:
commit 0e0eee2465 ("net: correct error path in rtnl_newlink()")
commit e51fb15231 ("rtnetlink: fix a memory leak when ->newlink fails")
commit cf124db566 ("net: Fix inconsistent teardown and release of private netdev state.")
commit 93ee31f14f ("[NET]: Fix free_netdev on register_netdev failure.")
commit 814152a89e ("net: fix memleak in register_netdevice()")
commit 10cc514f45 ("net: Fix null de-reference of device refcount")

The immediate problem which gets fixed here is that calling
free_netdev() right after unregister_netdevice() is illegal
because we need to release rtnl_lock first, to let the
unregistration finish. Note that unregister_netdevice() is
just a wrapper of unregister_netdevice_queue(), it only
does half of the job.

Where this limitation becomes most problematic is in failure
modes of register_netdevice(). There is a notifier call right
at the end of it, which lets other subsystems veto the entire
thing. At which point we should really go through a full
unregister_netdevice(), but we can't because callers may
go straight to free_netdev() after the failure, and that's
no bueno (see the previous paragraph).

This set makes free_netdev() more lenient, when device
is still being unregistered free_netdev() will simply set
dev->needs_free_netdev and let the unregister process do
the freeing.

With the free_netdev() problem out of the way failures in
register_netdevice() can make use of net_todo, again.
Users are still expected to call free_netdev() right after
failure but that will only set dev->needs_free_netdev.

To prevent the pathological case of:

 dev->needs_free_netdev = true;
 if (register_netdevice(dev)) {
   rtnl_unlock();
   free_netdev(dev);
 }

make register_netdevice()'s failure clear dev->needs_free_netdev.

Problems described above are only present with register_netdevice() /
unregister_netdevice(). We have two parallel APIs for registration
of devices:
 - those called outside rtnl_lock (register_netdev(), and
   unregister_netdev());
 - and those to be used under rtnl_lock - register_netdevice()
   and unregister_netdevice().
The former is trivial and has no problems. The alternative
approach to fix the latter would be to also separate the
freeing functions - i.e. add free_netdevice(). This has been
implemented (incl. converting all relevant calls in the tree)
but it feels a little unnecessary to put the burden of choosing
the right free_netdev{,ice}() call on the programmer when we
can "just do the right thing" by default.
====================

Link: https://lore.kernel.org/r/20210106184007.1821480-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
master
Jakub Kicinski 2021-01-08 19:27:44 -08:00
commit c49243e889
4 changed files with 187 additions and 36 deletions

View File

@ -10,18 +10,177 @@ Introduction
The following is a random collection of documentation regarding The following is a random collection of documentation regarding
network devices. network devices.
struct net_device allocation rules struct net_device lifetime rules
================================== ================================
Network device structures need to persist even after module is unloaded and Network device structures need to persist even after module is unloaded and
must be allocated with alloc_netdev_mqs() and friends. must be allocated with alloc_netdev_mqs() and friends.
If device has registered successfully, it will be freed on last use If device has registered successfully, it will be freed on last use
by free_netdev(). This is required to handle the pathologic case cleanly by free_netdev(). This is required to handle the pathological case cleanly
(example: rmmod mydriver </sys/class/net/myeth/mtu ) (example: ``rmmod mydriver </sys/class/net/myeth/mtu``)
alloc_netdev_mqs()/alloc_netdev() reserve extra space for driver alloc_netdev_mqs() / alloc_netdev() reserve extra space for driver
private data which gets freed when the network device is freed. If private data which gets freed when the network device is freed. If
separately allocated data is attached to the network device separately allocated data is attached to the network device
(netdev_priv(dev)) then it is up to the module exit handler to free that. (netdev_priv()) then it is up to the module exit handler to free that.
There are two groups of APIs for registering struct net_device.
First group can be used in normal contexts where ``rtnl_lock`` is not already
held: register_netdev(), unregister_netdev().
Second group can be used when ``rtnl_lock`` is already held:
register_netdevice(), unregister_netdevice(), free_netdevice().
Simple drivers
--------------
Most drivers (especially device drivers) handle lifetime of struct net_device
in context where ``rtnl_lock`` is not held (e.g. driver probe and remove paths).
In that case the struct net_device registration is done using
the register_netdev(), and unregister_netdev() functions:
.. code-block:: c
int probe()
{
struct my_device_priv *priv;
int err;
dev = alloc_netdev_mqs(...);
if (!dev)
return -ENOMEM;
priv = netdev_priv(dev);
/* ... do all device setup before calling register_netdev() ...
*/
err = register_netdev(dev);
if (err)
goto err_undo;
/* net_device is visible to the user! */
err_undo:
/* ... undo the device setup ... */
free_netdev(dev);
return err;
}
void remove()
{
unregister_netdev(dev);
free_netdev(dev);
}
Note that after calling register_netdev() the device is visible in the system.
Users can open it and start sending / receiving traffic immediately,
or run any other callback, so all initialization must be done prior to
registration.
unregister_netdev() closes the device and waits for all users to be done
with it. The memory of struct net_device itself may still be referenced
by sysfs but all operations on that device will fail.
free_netdev() can be called after unregister_netdev() returns on when
register_netdev() failed.
Device management under RTNL
----------------------------
Registering struct net_device while in context which already holds
the ``rtnl_lock`` requires extra care. In those scenarios most drivers
will want to make use of struct net_device's ``needs_free_netdev``
and ``priv_destructor`` members for freeing of state.
Example flow of netdev handling under ``rtnl_lock``:
.. code-block:: c
static void my_setup(struct net_device *dev)
{
dev->needs_free_netdev = true;
}
static void my_destructor(struct net_device *dev)
{
some_obj_destroy(priv->obj);
some_uninit(priv);
}
int create_link()
{
struct my_device_priv *priv;
int err;
ASSERT_RTNL();
dev = alloc_netdev(sizeof(*priv), "net%d", NET_NAME_UNKNOWN, my_setup);
if (!dev)
return -ENOMEM;
priv = netdev_priv(dev);
/* Implicit constructor */
err = some_init(priv);
if (err)
goto err_free_dev;
priv->obj = some_obj_create();
if (!priv->obj) {
err = -ENOMEM;
goto err_some_uninit;
}
/* End of constructor, set the destructor: */
dev->priv_destructor = my_destructor;
err = register_netdevice(dev);
if (err)
/* register_netdevice() calls destructor on failure */
goto err_free_dev;
/* If anything fails now unregister_netdevice() (or unregister_netdev())
* will take care of calling my_destructor and free_netdev().
*/
return 0;
err_some_uninit:
some_uninit(priv);
err_free_dev:
free_netdev(dev);
return err;
}
If struct net_device.priv_destructor is set it will be called by the core
some time after unregister_netdevice(), it will also be called if
register_netdevice() fails. The callback may be invoked with or without
``rtnl_lock`` held.
There is no explicit constructor callback, driver "constructs" the private
netdev state after allocating it and before registration.
Setting struct net_device.needs_free_netdev makes core call free_netdevice()
automatically after unregister_netdevice() when all references to the device
are gone. It only takes effect after a successful call to register_netdevice()
so if register_netdevice() fails driver is responsible for calling
free_netdev().
free_netdev() is safe to call on error paths right after unregister_netdevice()
or when register_netdevice() fails. Parts of netdev (de)registration process
happen after ``rtnl_lock`` is released, therefore in those cases free_netdev()
will defer some of the processing until ``rtnl_lock`` is released.
Devices spawned from struct rtnl_link_ops should never free the
struct net_device directly.
.ndo_init and .ndo_uninit
~~~~~~~~~~~~~~~~~~~~~~~~~
``.ndo_init`` and ``.ndo_uninit`` callbacks are called during net_device
registration and de-registration, under ``rtnl_lock``. Drivers can use
those e.g. when parts of their init process need to run under ``rtnl_lock``.
``.ndo_init`` runs before device is visible in the system, ``.ndo_uninit``
runs during de-registering after device is closed but other subsystems
may still have outstanding references to the netdevice.
MTU MTU
=== ===

View File

@ -284,9 +284,7 @@ static int register_vlan_device(struct net_device *real_dev, u16 vlan_id)
return 0; return 0;
out_free_newdev: out_free_newdev:
if (new_dev->reg_state == NETREG_UNINITIALIZED || free_netdev(new_dev);
new_dev->reg_state == NETREG_UNREGISTERED)
free_netdev(new_dev);
return err; return err;
} }

View File

@ -10077,17 +10077,11 @@ int register_netdevice(struct net_device *dev)
ret = call_netdevice_notifiers(NETDEV_REGISTER, dev); ret = call_netdevice_notifiers(NETDEV_REGISTER, dev);
ret = notifier_to_errno(ret); ret = notifier_to_errno(ret);
if (ret) { if (ret) {
/* Expect explicit free_netdev() on failure */
dev->needs_free_netdev = false;
rollback_registered(dev); rollback_registered(dev);
rcu_barrier(); net_set_todo(dev);
goto out;
dev->reg_state = NETREG_UNREGISTERED;
/* We should put the kobject that hold in
* netdev_unregister_kobject(), otherwise
* the net device cannot be freed when
* driver calls free_netdev(), because the
* kobject is being hold.
*/
kobject_put(&dev->dev.kobj);
} }
/* /*
* Prevent userspace races by waiting until the network * Prevent userspace races by waiting until the network
@ -10631,6 +10625,17 @@ void free_netdev(struct net_device *dev)
struct napi_struct *p, *n; struct napi_struct *p, *n;
might_sleep(); might_sleep();
/* When called immediately after register_netdevice() failed the unwind
* handling may still be dismantling the device. Handle that case by
* deferring the free.
*/
if (dev->reg_state == NETREG_UNREGISTERING) {
ASSERT_RTNL();
dev->needs_free_netdev = true;
return;
}
netif_free_tx_queues(dev); netif_free_tx_queues(dev);
netif_free_rx_queues(dev); netif_free_rx_queues(dev);

View File

@ -3439,26 +3439,15 @@ replay:
dev->ifindex = ifm->ifi_index; dev->ifindex = ifm->ifi_index;
if (ops->newlink) { if (ops->newlink)
err = ops->newlink(link_net ? : net, dev, tb, data, extack); err = ops->newlink(link_net ? : net, dev, tb, data, extack);
/* Drivers should call free_netdev() in ->destructor else
* and unregister it on failure after registration
* so that device could be finally freed in rtnl_unlock.
*/
if (err < 0) {
/* If device is not registered at all, free it now */
if (dev->reg_state == NETREG_UNINITIALIZED ||
dev->reg_state == NETREG_UNREGISTERED)
free_netdev(dev);
goto out;
}
} else {
err = register_netdevice(dev); err = register_netdevice(dev);
if (err < 0) { if (err < 0) {
free_netdev(dev); free_netdev(dev);
goto out; goto out;
}
} }
err = rtnl_configure_link(dev, ifm); err = rtnl_configure_link(dev, ifm);
if (err < 0) if (err < 0)
goto out_unregister; goto out_unregister;