staging: lustre: delete the filesystem from the tree.

The Lustre filesystem has been in the kernel tree for over 5 years now.
While it has been an endless source of enjoyment for new kernel
developers learning how to do basic codingstyle cleanups, as well as a
semi-entertaining source of bewilderment from the vfs developers any
time they have looked into the codebase to try to figure out how to port
their latest api changes to this filesystem, it has not really moved
forward into the "this is in shape to get out of staging" state despite
many half-completed attempts.

And getting code out of staging is the main goal of that portion of the
kernel tree.  Code should not stagnate and it feels like having this
code in staging is only causing the development cycle of the filesystem
to take longer than it should.  There is a whole separate out-of-tree
copy of this codebase where the developers work on it, and then random
changes are thrown over the wall at staging at some later point in time.
This dual-tree development model has never worked, and the state of this
codebase is proof of that.

So, let's just delete the whole mess.  Now the lustre developers can go
off and work in their out-of-tree codebase and not have to worry about
providing valid changelog entries and breaking their patches up into
logical pieces.  They can take the time they have spent doing those
types of housekeeping chores and get the codebase into a much better
shape, and it can be submitted for inclusion into the real part of the
kernel tree when ready.

Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: Andreas Dilger <andreas.dilger@intel.com>
Cc: James Simmons <jsimmons@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
hifive-unleashed-5.1
Greg Kroah-Hartman 2018-06-01 10:59:48 +02:00
parent 3b93c0f4b6
commit be65f9ed26
308 changed files with 0 additions and 195272 deletions

@ -13329,15 +13329,6 @@ S: Odd Fixes
F: Documentation/devicetree/bindings/staging/iio/
F: drivers/staging/iio/
STAGING - LUSTRE PARALLEL FILESYSTEM
M: Oleg Drokin <oleg.drokin@intel.com>
M: Andreas Dilger <andreas.dilger@intel.com>
M: James Simmons <jsimmons@infradead.org>
L: lustre-devel@lists.lustre.org (moderated for non-subscribers)
W: http://wiki.lustre.org/
S: Maintained
F: drivers/staging/lustre
STAGING - NVIDIA COMPLIANT EMBEDDED CONTROLLER INTERFACE (nvec)
M: Marc Dietrich <marvin24@gmx.de>
L: ac100@lists.launchpad.net (moderated for non-subscribers)

@ -84,8 +84,6 @@ source "drivers/staging/netlogic/Kconfig"
source "drivers/staging/mt29f_spinand/Kconfig"
source "drivers/staging/lustre/Kconfig"
source "drivers/staging/dgnc/Kconfig"
source "drivers/staging/gs_fpgaboot/Kconfig"

@ -32,7 +32,6 @@ obj-$(CONFIG_STAGING_BOARD) += board/
obj-$(CONFIG_LTE_GDM724X) += gdm724x/
obj-$(CONFIG_FIREWIRE_SERIAL) += fwserial/
obj-$(CONFIG_GOLDFISH) += goldfish/
obj-$(CONFIG_LNET) += lustre/
obj-$(CONFIG_DGNC) += dgnc/
obj-$(CONFIG_MTD_SPINAND_MT29F) += mt29f_spinand/
obj-$(CONFIG_GS_FPGABOOT) += gs_fpgaboot/

@ -1,3 +0,0 @@
source "drivers/staging/lustre/lnet/Kconfig"
source "drivers/staging/lustre/lustre/Kconfig"

@ -1,2 +0,0 @@
obj-$(CONFIG_LNET) += lnet/
obj-$(CONFIG_LUSTRE_FS) += lustre/

@ -1,83 +0,0 @@
Lustre Parallel Filesystem Client
=================================
The Lustre file system is an open-source, parallel file system
that supports many requirements of leadership class HPC simulation
environments.
Born from a research project at Carnegie Mellon University,
the Lustre file system is a widely-used option in HPC.
The Lustre file system provides a POSIX compliant file system interface,
can scale to thousands of clients, petabytes of storage and
hundreds of gigabytes per second of I/O bandwidth.
Unlike shared disk storage cluster filesystems (e.g. OCFS2, GFS, GPFS),
Lustre has independent Metadata and Data servers that clients can access
in parallel to maximize performance.
In order to use Lustre client you will need to download the "lustre-client"
package that contains the userspace tools from http://lustre.org/download/
You will need to install and configure your Lustre servers separately.
Mount Syntax
============
After you have installed the lustre-client tools, including the mount.lustre
binary, you can mount your Lustre filesystem with:
mount -t lustre mgs:/fsname mnt
where mgs is the host name or IP address of your Lustre MGS (management service)
and fsname is the name of the filesystem you would like to mount.
Mount Options
=============
noflock
Disable posix file locking (Applications trying to use
the functionality will get ENOSYS)
localflock
Enable local flock support, using only client-local flock
(faster, for applications that require flock but do not run
on multiple nodes).
flock
Enable cluster-global posix file locking coherent across all
client nodes.
user_xattr, nouser_xattr
Support "user." extended attributes (or not)
user_fid2path, nouser_fid2path
Enable FID to path translation by regular users (or not)
checksum, nochecksum
Verify data consistency on the wire and in memory as it passes
between the layers (or not).
lruresize, nolruresize
Allow lock LRU to be controlled by memory pressure on the server
(or only 100 (default, controlled by lru_size proc parameter) locks
per CPU per server on this client).
lazystatfs, nolazystatfs
Do not block in statfs() if some of the servers are down.
32bitapi
Shrink inode numbers to fit into 32 bits. This is necessary
if you plan to reexport Lustre filesystem from this client via
NFSv4.
verbose, noverbose
Enable mount/umount console messages (or not)
More Information
================
You can get more information at the Lustre website: http://wiki.lustre.org/
Source for the userspace tools and out-of-tree client and server code
is available at: http://git.hpdd.intel.com/fs/lustre-release.git
Latest binary packages:
http://lustre.org/download/

@ -1,302 +0,0 @@
Currently all the work directed toward the lustre upstream client is tracked
at the following link:
https://jira.hpdd.intel.com/browse/LU-9679
Under this ticket you will see the following work items that need to be
addressed:
******************************************************************************
* libcfs cleanup
*
* https://jira.hpdd.intel.com/browse/LU-9859
*
* Track all the cleanups and simplification of the libcfs module. Remove
* functions the kernel provides. Possibly integrate some of the functionality
* into the kernel proper.
*
******************************************************************************
https://jira.hpdd.intel.com/browse/LU-100086
LNET_MINOR conflicts with USERIO_MINOR
------------------------------------------------------------------------------
https://jira.hpdd.intel.com/browse/LU-8130
Fix and simplify libcfs hash handling
------------------------------------------------------------------------------
https://jira.hpdd.intel.com/browse/LU-8703
The current way we handle SMP is wrong. Platforms like ARM and KNL can have
core and NUMA setups with things like NUMA nodes with no cores. We need to
handle such cases. This work also greatly simplified the lustre SMP code.
------------------------------------------------------------------------------
https://jira.hpdd.intel.com/browse/LU-9019
Replace libcfs time API with standard kernel APIs. Also migrate away from
jiffies. We found jiffies can vary on nodes which can lead to corner cases
that can break the file system due to nodes having inconsistent behavior.
So move to time64_t and ktime_t as much as possible.
******************************************************************************
* Proper IB support for ko2iblnd
******************************************************************************
https://jira.hpdd.intel.com/browse/LU-9179
Poor performance for the ko2iblnd driver. This is related to many of the
patches below that are missing from the linux client.
------------------------------------------------------------------------------
https://jira.hpdd.intel.com/browse/LU-9886
Crash in upstream kiblnd_handle_early_rxs()
------------------------------------------------------------------------------
https://jira.hpdd.intel.com/browse/LU-10394 / LU-10526 / LU-10089
Default to using MEM_REG
------------------------------------------------------------------------------
https://jira.hpdd.intel.com/browse/LU-10459
throttle tx based on queue depth
------------------------------------------------------------------------------
https://jira.hpdd.intel.com/browse/LU-9943
correct WR fast reg accounting
------------------------------------------------------------------------------
https://jira.hpdd.intel.com/browse/LU-10291
remove concurrent_sends tunable
------------------------------------------------------------------------------
https://jira.hpdd.intel.com/browse/LU-10213
calculate qp max_send_wrs properly
------------------------------------------------------------------------------
https://jira.hpdd.intel.com/browse/LU-9810
use less CQ entries for each connection
------------------------------------------------------------------------------
https://jira.hpdd.intel.com/browse/LU-10129 / LU-9180
rework map_on_demand behavior
------------------------------------------------------------------------------
https://jira.hpdd.intel.com/browse/LU-10129
query device capabilities
------------------------------------------------------------------------------
https://jira.hpdd.intel.com/browse/LU-10015
fix race at kiblnd_connect_peer
------------------------------------------------------------------------------
https://jira.hpdd.intel.com/browse/LU-9983
allow for discontiguous fragments
------------------------------------------------------------------------------
https://jira.hpdd.intel.com/browse/LU-9500
Don't Page Align remote_addr with FastReg
------------------------------------------------------------------------------
https://jira.hpdd.intel.com/browse/LU-9448
handle empty CPTs
------------------------------------------------------------------------------
https://jira.hpdd.intel.com/browse/LU-9507
Don't Assert On Reconnect with MultiQP
------------------------------------------------------------------------------
https://jira.hpdd.intel.com/browse/LU-9472
Fix FastReg map/unmap for MLX5
------------------------------------------------------------------------------
https://jira.hpdd.intel.com/browse/LU-9425
Turn on 2 sges by default
------------------------------------------------------------------------------
https://jira.hpdd.intel.com/browse/LU-8943
Enable Multiple OPA Endpoints between Nodes
------------------------------------------------------------------------------
https://jira.hpdd.intel.com/browse/LU-5718
multiple sges for work request
------------------------------------------------------------------------------
https://jira.hpdd.intel.com/browse/LU-9094
kill timedout txs from ibp_tx_queue
------------------------------------------------------------------------------
https://jira.hpdd.intel.com/browse/LU-9094
reconnect peer for REJ_INVALID_SERVICE_ID
------------------------------------------------------------------------------
https://jira.hpdd.intel.com/browse/LU-8752
Stop MLX5 triggering a dump_cqe
------------------------------------------------------------------------------
https://jira.hpdd.intel.com/browse/LU-8874
Move ko2iblnd to latest RDMA changes
------------------------------------------------------------------------------
https://jira.hpdd.intel.com/browse/LU-8875 / LU-8874
Change to new RDMA done callback mechanism
------------------------------------------------------------------------------
https://jira.hpdd.intel.com/browse/LU-9164 / LU-8874
Incorporate RDMA map/unmap APIs into ko2iblnd
******************************************************************************
* sysfs/debugfs fixes
*
* https://jira.hpdd.intel.com/browse/LU-8066
*
* The original migration to sysfs was done in haste without properly working
* utilities to test the changes. This covers the work to restore the proper
* behavior. Huge project to make this right.
*
******************************************************************************
https://jira.hpdd.intel.com/browse/LU-9431
The function class_process_proc_param was used for our mass updates of proc
tunables. It didn't work with sysfs and it was just ugly so it was removed.
In the process the ability to mass update thousands of clients was lost. This
work restores this in a sane way.
------------------------------------------------------------------------------
https://jira.hpdd.intel.com/browse/LU-9091
One of the major requests from users is the ability to pass parameters into a
sysfs file in various units. For example, we can set max_pages_per_rpc,
but this can vary on platforms due to different platform sizes. So you can
set it like max_pages_per_rpc=16MiB. The original code to handle this was written
before the string helpers were created, so the code doesn't follow that format,
but it would be easy to move to. Currently the string helpers do the reverse
of what we need, changing bytes to a string. We need to change a string to bytes.
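
The LU-9091 item above asks for the reverse of a bytes-to-string helper: turning a value such as 16MiB into a byte count. A rough userspace sketch of that conversion follows. It assumes only the binary suffixes KiB, MiB and GiB, and `demo_parse_size` is a hypothetical name used for illustration; it is not the Lustre or kernel implementation.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse "<number>[KiB|MiB|GiB]" into a byte count.
 * Hypothetical helper: returns 0 on success, -EINVAL on malformed
 * input, -ERANGE if the result does not fit in 64 bits.
 */
static int demo_parse_size(const char *s, uint64_t *bytes)
{
	char *end;
	unsigned long long val;
	unsigned int shift;

	/* Require a leading digit so signs and whitespace are rejected. */
	if (!s || !bytes || *s < '0' || *s > '9')
		return -EINVAL;

	errno = 0;
	val = strtoull(s, &end, 10);
	if (errno == ERANGE)
		return -ERANGE;

	if (*end == '\0')
		shift = 0;		/* plain byte count */
	else if (strcmp(end, "KiB") == 0)
		shift = 10;
	else if (strcmp(end, "MiB") == 0)
		shift = 20;
	else if (strcmp(end, "GiB") == 0)
		shift = 30;
	else
		return -EINVAL;

	/* Reject values that would overflow once scaled. */
	if (val > (UINT64_MAX >> shift))
		return -ERANGE;

	*bytes = (uint64_t)val << shift;
	return 0;
}
```

With this sketch, "16MiB" yields 16777216 bytes, a bare "4096" is taken as bytes, and an unrecognised suffix such as "16MB" is rejected rather than guessed at.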
******************************************************************************
* Proper user land to kernel space interface for Lustre
*
* https://jira.hpdd.intel.com/browse/LU-9680
*
******************************************************************************
https://jira.hpdd.intel.com/browse/LU-8915
Don't use linux list structure as user land arguments for lnet selftest.
This code is pretty poor quality and really needs to be reworked.
------------------------------------------------------------------------------
https://jira.hpdd.intel.com/browse/LU-8834
The lustre ioctl LL_IOC_FUTIMES_3 is very generic. Need to either work with
other file systems with similar functionality and make a common syscall
interface or rework our server code to automagically do it for us.
------------------------------------------------------------------------------
https://jira.hpdd.intel.com/browse/LU-6202
Clean up ioctl handling. We have many obsolete ioctls. Also the way we do
ioctls can be changed over to netlink. This also has the benefit of working
better with HPC systems that do IO forwarding. Such systems don't like ioctls
very well.
------------------------------------------------------------------------------
https://jira.hpdd.intel.com/browse/LU-9667
More cleanups by making our utilities use sysfs instead of ioctls for LNet.
Also it has been requested to move the remaining ioctls to the netlink API.
******************************************************************************
* Misc
******************************************************************************
------------------------------------------------------------------------------
https://jira.hpdd.intel.com/browse/LU-9855
Clean up obdclass preprocessor code. One of the major eye sores is the various
pointer redirections and macros used by the obdclass. This makes the code very
difficult to understand. Al Viro requested that this be cleaned up before
we leave staging.
------------------------------------------------------------------------------
https://jira.hpdd.intel.com/browse/LU-9633
Migrate to sphinx kernel-doc style comments. Add documents in Documentation.
------------------------------------------------------------------------------
https://jira.hpdd.intel.com/browse/LU-6142
Possible remaining coding style fixes. Remove dead code. Enforce kernel code
style. Other minor misc cleanups...
------------------------------------------------------------------------------
https://jira.hpdd.intel.com/browse/LU-8837
Separate client/server functionality. Functions only used by server can be
removed from client. Most of this has been done but we need an inspection of
the code to make sure.
------------------------------------------------------------------------------
https://jira.hpdd.intel.com/browse/LU-8964
Lustre client readahead/writeback control needs to better suit what the kernel
provides. This is currently being explored. We could end up replacing the CLIO
readahead abstraction with the kernel-proper version.
------------------------------------------------------------------------------
https://jira.hpdd.intel.com/browse/LU-9862
Patch that landed for LU-7890 leads to static checker errors
------------------------------------------------------------------------------
https://jira.hpdd.intel.com/browse/LU-9868
dcache/namei fixes for lustre
------------------------------------------------------------------------------
https://jira.hpdd.intel.com/browse/LU-10467
Use standard Linux wait_event macros; work by Neil Brown
------------------------------------------------------------------------------
Please send any patches to Greg Kroah-Hartman <greg@kroah.com>, Andreas Dilger
<andreas.dilger@intel.com>, James Simmons <jsimmons@infradead.org> and
Oleg Drokin <oleg.drokin@intel.com>.

@ -1,76 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2008, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2011, 2015, Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Sun Microsystems, Inc.
*/
#ifndef __LIBCFS_LIBCFS_H__
#define __LIBCFS_LIBCFS_H__
#include <linux/notifier.h>
#include <linux/workqueue.h>
#include <linux/sysctl.h>
#include <linux/libcfs/libcfs_debug.h>
#include <linux/libcfs/libcfs_private.h>
#include <linux/libcfs/libcfs_fail.h>
#define LIBCFS_VERSION "0.7.0"
extern struct blocking_notifier_head libcfs_ioctl_list;
static inline int notifier_from_ioctl_errno(int err)
{
if (err == -EINVAL)
return NOTIFY_OK;
return notifier_from_errno(err) | NOTIFY_STOP_MASK;
}
int libcfs_setup(void);
extern struct workqueue_struct *cfs_rehash_wq;
void lustre_insert_debugfs(struct ctl_table *table);
int lprocfs_call_handler(void *data, int write, loff_t *ppos,
void __user *buffer, size_t *lenp,
int (*handler)(void *data, int write, loff_t pos,
void __user *buffer, int len));
/*
* Memory
*/
#if BITS_PER_LONG == 32
/* limit to lowmem on 32-bit systems */
#define NUM_CACHEPAGES \
min(totalram_pages, 1UL << (30 - PAGE_SHIFT) * 3 / 4)
#else
#define NUM_CACHEPAGES totalram_pages
#endif
#endif /* __LIBCFS_LIBCFS_H__ */
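
The 32-bit branch of NUM_CACHEPAGES above reads as though it caps the value at three quarters of 1 GiB, expressed in pages. In C, however, `*` and `/` bind tighter than `<<`, so the expression as written scales the shift count instead of the shifted value. A minimal userspace sketch of the arithmetic follows. It assumes PAGE_SHIFT is 12 (4 KiB pages), and the `DEMO_`/`cap_` names are hypothetical, not part of libcfs.

```c
/* Assumption: 4 KiB pages, i.e. PAGE_SHIFT == 12. */
#define DEMO_PAGE_SHIFT 12

/*
 * The cap exactly as written in the header. The compiler parses it as
 * 1UL << (((30 - PAGE_SHIFT) * 3) / 4), i.e. 1UL << 13 here.
 */
static unsigned long cap_as_written(void)
{
	return 1UL << (30 - DEMO_PAGE_SHIFT) * 3 / 4;
}

/*
 * The same cap with explicit parentheses around the shift, which is
 * what "3/4 of 1 GiB, in pages" would require.
 */
static unsigned long cap_parenthesized(void)
{
	return (1UL << (30 - DEMO_PAGE_SHIFT)) * 3 / 4;
}
```

Under the 4 KiB page assumption, the first form evaluates to 8192 pages (32 MiB) and the second to 196608 pages (768 MiB). Whether the larger figure was the original intent is an inference from the "limit to lowmem" comment, not something the header states.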

@ -1,434 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* GPL HEADER END
*/
/*
* Copyright (c) 2010, Oracle and/or its affiliates. All rights reserved.
*
* Copyright (c) 2012, 2015 Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Sun Microsystems, Inc.
*
* libcfs/include/libcfs/libcfs_cpu.h
*
* CPU partition
* . CPU partition is virtual processing unit
*
* . CPU partition can present 1-N cores, or 1-N NUMA nodes,
* in other words, CPU partition is a processors pool.
*
* CPU Partition Table (CPT)
* . a set of CPU partitions
*
* . There are two modes for CPT: CFS_CPU_MODE_NUMA and CFS_CPU_MODE_SMP
*
* . User can specify total number of CPU partitions while creating a
* CPT; the ID of a CPU partition always starts from 0.
*
* Example: if there are 8 cores on the system, while creating a CPT
* with cpu_npartitions=4:
* core[0, 1] = partition[0], core[2, 3] = partition[1]
* core[4, 5] = partition[2], core[6, 7] = partition[3]
*
* cpu_npartitions=1:
* core[0, 1, ... 7] = partition[0]
*
* . User can also specify CPU partitions by string pattern
*
* Examples: cpu_partitions="0[0,1], 1[2,3]"
* cpu_partitions="N 0[0-3], 1[4-8]"
*
* The first character "N" means following numbers are numa ID
*
* . NUMA allocators, CPU affinity threads are built over CPU partitions,
* instead of HW CPUs or HW nodes.
*
* . By default, Lustre modules should refer to the global cfs_cpt_tab,
* instead of accessing HW CPUs directly, so concurrency of Lustre can be
* configured by cpu_npartitions of the global cfs_cpt_tab
*
* . If cpu_npartitions=1(all CPUs in one pool), lustre should work the
* same way as 2.2 or earlier versions
*
* Author: liang@whamcloud.com
*/
#ifndef __LIBCFS_CPU_H__
#define __LIBCFS_CPU_H__
#include <linux/cpu.h>
#include <linux/cpuset.h>
#include <linux/topology.h>
/* any CPU partition */
#define CFS_CPT_ANY (-1)
#ifdef CONFIG_SMP
/** virtual processing unit */
struct cfs_cpu_partition {
/* CPUs mask for this partition */
cpumask_var_t cpt_cpumask;
/* nodes mask for this partition */
nodemask_t *cpt_nodemask;
/* spread rotor for NUMA allocator */
unsigned int cpt_spread_rotor;
};
/** descriptor for CPU partitions */
struct cfs_cpt_table {
/* version, reserved for hotplug */
unsigned int ctb_version;
/* spread rotor for NUMA allocator */
unsigned int ctb_spread_rotor;
/* # of CPU partitions */
unsigned int ctb_nparts;
/* partitions tables */
struct cfs_cpu_partition *ctb_parts;
/* shadow HW CPU to CPU partition ID */
int *ctb_cpu2cpt;
/* all cpus in this partition table */
cpumask_var_t ctb_cpumask;
/* all nodes in this partition table */
nodemask_t *ctb_nodemask;
};
extern struct cfs_cpt_table *cfs_cpt_tab;
/**
* return cpumask of CPU partition \a cpt
*/
cpumask_var_t *cfs_cpt_cpumask(struct cfs_cpt_table *cptab, int cpt);
/**
* print string information of cpt-table
*/
int cfs_cpt_table_print(struct cfs_cpt_table *cptab, char *buf, int len);
/**
* return total number of CPU partitions in \a cptab
*/
int
cfs_cpt_number(struct cfs_cpt_table *cptab);
/**
* return number of HW cores or hyper-threadings in a CPU partition \a cpt
*/
int cfs_cpt_weight(struct cfs_cpt_table *cptab, int cpt);
/**
* is there any online CPU in CPU partition \a cpt
*/
int cfs_cpt_online(struct cfs_cpt_table *cptab, int cpt);
/**
* return nodemask of CPU partition \a cpt
*/
nodemask_t *cfs_cpt_nodemask(struct cfs_cpt_table *cptab, int cpt);
/**
* shadow current HW processor ID to CPU-partition ID of \a cptab
*/
int cfs_cpt_current(struct cfs_cpt_table *cptab, int remap);
/**
* shadow HW processor ID \a CPU to CPU-partition ID by \a cptab
*/
int cfs_cpt_of_cpu(struct cfs_cpt_table *cptab, int cpu);
/**
* bind current thread on a CPU-partition \a cpt of \a cptab
*/
int cfs_cpt_bind(struct cfs_cpt_table *cptab, int cpt);
/**
* add \a cpu to CPU partition \a cpt of \a cptab, return 1 for success,
* otherwise 0 is returned
*/
int cfs_cpt_set_cpu(struct cfs_cpt_table *cptab, int cpt, int cpu);
/**
* remove \a cpu from CPU partition \a cpt of \a cptab
*/
void cfs_cpt_unset_cpu(struct cfs_cpt_table *cptab, int cpt, int cpu);
/**
* add all cpus in \a mask to CPU partition \a cpt
* return 1 if successfully set all CPUs, otherwise return 0
*/
int cfs_cpt_set_cpumask(struct cfs_cpt_table *cptab,
int cpt, cpumask_t *mask);
/**
* remove all cpus in \a mask from CPU partition \a cpt
*/
void cfs_cpt_unset_cpumask(struct cfs_cpt_table *cptab,
int cpt, cpumask_t *mask);
/**
* add all cpus in NUMA node \a node to CPU partition \a cpt
* return 1 if successfully set all CPUs, otherwise return 0
*/
int cfs_cpt_set_node(struct cfs_cpt_table *cptab, int cpt, int node);
/**
* remove all cpus in NUMA node \a node from CPU partition \a cpt
*/
void cfs_cpt_unset_node(struct cfs_cpt_table *cptab, int cpt, int node);
/**
* add all cpus in node mask \a mask to CPU partition \a cpt
* return 1 if successfully set all CPUs, otherwise return 0
*/
int cfs_cpt_set_nodemask(struct cfs_cpt_table *cptab,
int cpt, nodemask_t *mask);
/**
* remove all cpus in node mask \a mask from CPU partition \a cpt
*/
void cfs_cpt_unset_nodemask(struct cfs_cpt_table *cptab,
int cpt, nodemask_t *mask);
/**
* unset all cpus for CPU partition \a cpt
*/
void cfs_cpt_clear(struct cfs_cpt_table *cptab, int cpt);
/**
* convert partition id \a cpt to numa node id, if there are more than one
* nodes in this partition, it might return a different node id each time.
*/
int cfs_cpt_spread_node(struct cfs_cpt_table *cptab, int cpt);
/**
* return number of HTs in the same core of \a cpu
*/
int cfs_cpu_ht_nsiblings(int cpu);
int cfs_cpu_init(void);
void cfs_cpu_fini(void);
#else /* !CONFIG_SMP */
struct cfs_cpt_table;
#define cfs_cpt_tab ((struct cfs_cpt_table *)NULL)
static inline cpumask_var_t *
cfs_cpt_cpumask(struct cfs_cpt_table *cptab, int cpt)
{
return NULL;
}
static inline int
cfs_cpt_table_print(struct cfs_cpt_table *cptab, char *buf, int len)
{
return 0;
}
static inline int
cfs_cpt_number(struct cfs_cpt_table *cptab)
{
return 1;
}
static inline int
cfs_cpt_weight(struct cfs_cpt_table *cptab, int cpt)
{
return 1;
}
static inline int
cfs_cpt_online(struct cfs_cpt_table *cptab, int cpt)
{
return 1;
}
static inline nodemask_t *
cfs_cpt_nodemask(struct cfs_cpt_table *cptab, int cpt)
{
return NULL;
}
static inline int
cfs_cpt_set_cpu(struct cfs_cpt_table *cptab, int cpt, int cpu)
{
return 1;
}
static inline void
cfs_cpt_unset_cpu(struct cfs_cpt_table *cptab, int cpt, int cpu)
{
}
static inline int
cfs_cpt_set_cpumask(struct cfs_cpt_table *cptab, int cpt, cpumask_t *mask)
{
return 1;
}
static inline void
cfs_cpt_unset_cpumask(struct cfs_cpt_table *cptab, int cpt, cpumask_t *mask)
{
}
static inline int
cfs_cpt_set_node(struct cfs_cpt_table *cptab, int cpt, int node)
{
return 1;
}
static inline void
cfs_cpt_unset_node(struct cfs_cpt_table *cptab, int cpt, int node)
{
}
static inline int
cfs_cpt_set_nodemask(struct cfs_cpt_table *cptab, int cpt, nodemask_t *mask)
{
return 1;
}
static inline void
cfs_cpt_unset_nodemask(struct cfs_cpt_table *cptab, int cpt, nodemask_t *mask)
{
}
static inline void
cfs_cpt_clear(struct cfs_cpt_table *cptab, int cpt)
{
}
static inline int
cfs_cpt_spread_node(struct cfs_cpt_table *cptab, int cpt)
{
return 0;
}
static inline int
cfs_cpu_ht_nsiblings(int cpu)
{
return 1;
}
static inline int
cfs_cpt_current(struct cfs_cpt_table *cptab, int remap)
{
return 0;
}
static inline int
cfs_cpt_of_cpu(struct cfs_cpt_table *cptab, int cpu)
{
return 0;
}
static inline int
cfs_cpt_bind(struct cfs_cpt_table *cptab, int cpt)
{
return 0;
}
static inline int
cfs_cpu_init(void)
{
return 0;
}
static inline void cfs_cpu_fini(void)
{
}
#endif /* CONFIG_SMP */
/**
* destroy a CPU partition table
*/
void cfs_cpt_table_free(struct cfs_cpt_table *cptab);
/**
* create a cfs_cpt_table with \a ncpt number of partitions
*/
struct cfs_cpt_table *cfs_cpt_table_alloc(unsigned int ncpt);
/*
* allocate per-cpu-partition data, returned value is an array of pointers,
* variable can be indexed by CPU ID.
* cptab != NULL: size of array is number of CPU partitions
* cptab == NULL: size of array is number of HW cores
*/
void *cfs_percpt_alloc(struct cfs_cpt_table *cptab, unsigned int size);
/*
* destroy per-cpu-partition variable
*/
void cfs_percpt_free(void *vars);
int cfs_percpt_number(void *vars);
#define cfs_percpt_for_each(var, i, vars) \
for (i = 0; i < cfs_percpt_number(vars) && \
((var) = (vars)[i]) != NULL; i++)
/*
* percpu partition lock
*
* There are some use-cases like this in Lustre:
* . each CPU partition has its own private data which is frequently changed,
* and mostly by the local CPU partition.
* . all CPU partitions share some global data, these data are rarely changed.
*
* LNet is typical example.
* CPU partition lock is designed for this kind of use-cases:
* . each CPU partition has its own private lock
* . change on private data just needs to take the private lock
* . read on shared data just needs to take _any_ of private locks
* . change on shared data needs to take _all_ private locks,
* which is slow and should be really rare.
*/
enum {
CFS_PERCPT_LOCK_EX = -1, /* negative */
};
struct cfs_percpt_lock {
/* cpu-partition-table for this lock */
struct cfs_cpt_table *pcl_cptab;
/* exclusively locked */
unsigned int pcl_locked;
/* private lock table */
spinlock_t **pcl_locks;
};
/* return number of private locks */
#define cfs_percpt_lock_num(pcl) cfs_cpt_number(pcl->pcl_cptab)
/*
* create a cpu-partition lock based on CPU partition table \a cptab,
* each private lock has extra \a psize bytes padding data
*/
struct cfs_percpt_lock *cfs_percpt_lock_create(struct cfs_cpt_table *cptab,
struct lock_class_key *keys);
/* destroy a cpu-partition lock */
void cfs_percpt_lock_free(struct cfs_percpt_lock *pcl);
/* lock private lock \a index of \a pcl */
void cfs_percpt_lock(struct cfs_percpt_lock *pcl, int index);
/* unlock private lock \a index of \a pcl */
void cfs_percpt_unlock(struct cfs_percpt_lock *pcl, int index);
#define CFS_PERCPT_LOCK_KEYS 256
/* NB: don't allocate keys dynamically, lockdep needs them to be in ".data" */
#define cfs_percpt_lock_alloc(cptab) \
({ \
static struct lock_class_key ___keys[CFS_PERCPT_LOCK_KEYS]; \
struct cfs_percpt_lock *___lk; \
\
if (cfs_cpt_number(cptab) > CFS_PERCPT_LOCK_KEYS) \
___lk = cfs_percpt_lock_create(cptab, NULL); \
else \
___lk = cfs_percpt_lock_create(cptab, ___keys); \
___lk; \
})
/**
* iterate over all CPU partitions in \a cptab
*/
#define cfs_cpt_for_each(i, cptab) \
for (i = 0; i < cfs_cpt_number(cptab); i++)
#endif /* __LIBCFS_CPU_H__ */
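
The example in the libcfs_cpu.h header comment above (8 cores with cpu_npartitions=4, giving two cores per partition) amounts to a plain integer division. The sketch below only illustrates that even split, under the assumption that the core count is an exact multiple of the partition count. It is not how libcfs builds its partition table, and `demo_cpu_to_cpt` is a hypothetical name.

```c
/*
 * Map a core index to a partition ID for an even split.
 * Assumption: ncores is an exact multiple of nparts.
 * Returns -1 for input outside that assumption.
 */
static int demo_cpu_to_cpt(int cpu, int ncores, int nparts)
{
	if (nparts <= 0 || ncores <= 0 || ncores % nparts != 0)
		return -1;
	if (cpu < 0 || cpu >= ncores)
		return -1;

	/* Each partition covers (ncores / nparts) consecutive cores. */
	return cpu / (ncores / nparts);
}
```

For 8 cores and 4 partitions this places cores 0 and 1 in partition 0 and cores 6 and 7 in partition 3. With a single partition, every core maps to partition 0, matching the cpu_npartitions=1 case in the comment.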

@ -1,208 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see http://www.gnu.org/licenses
*
* Please visit http://www.xyratex.com/contact if you need additional
* information or have any questions.
*
* GPL HEADER END
*/
/*
* Copyright 2012 Xyratex Technology Limited
*/
#ifndef _LIBCFS_CRYPTO_H
#define _LIBCFS_CRYPTO_H
#include <linux/string.h>
struct page;
struct cfs_crypto_hash_type {
char *cht_name; /*< hash algorithm name, equal to
* format name for crypto api
*/
unsigned int cht_key; /*< init key by default (valid for
* 4 bytes context like crc32, adler)
*/
unsigned int cht_size; /**< hash digest size */
};
enum cfs_crypto_hash_alg {
CFS_HASH_ALG_NULL = 0,
CFS_HASH_ALG_ADLER32,
CFS_HASH_ALG_CRC32,
CFS_HASH_ALG_MD5,
CFS_HASH_ALG_SHA1,
CFS_HASH_ALG_SHA256,
CFS_HASH_ALG_SHA384,
CFS_HASH_ALG_SHA512,
CFS_HASH_ALG_CRC32C,
CFS_HASH_ALG_MAX,
CFS_HASH_ALG_UNKNOWN = 0xff
};
static struct cfs_crypto_hash_type hash_types[] = {
[CFS_HASH_ALG_NULL] = {
.cht_name = "null",
.cht_key = 0,
.cht_size = 0
},
[CFS_HASH_ALG_ADLER32] = {
.cht_name = "adler32",
.cht_key = 1,
.cht_size = 4
},
[CFS_HASH_ALG_CRC32] = {
.cht_name = "crc32",
.cht_key = ~0,
.cht_size = 4
},
[CFS_HASH_ALG_CRC32C] = {
.cht_name = "crc32c",
.cht_key = ~0,
.cht_size = 4
},
[CFS_HASH_ALG_MD5] = {
.cht_name = "md5",
.cht_key = 0,
.cht_size = 16
},
[CFS_HASH_ALG_SHA1] = {
.cht_name = "sha1",
.cht_key = 0,
.cht_size = 20
},
[CFS_HASH_ALG_SHA256] = {
.cht_name = "sha256",
.cht_key = 0,
.cht_size = 32
},
[CFS_HASH_ALG_SHA384] = {
.cht_name = "sha384",
.cht_key = 0,
.cht_size = 48
},
[CFS_HASH_ALG_SHA512] = {
.cht_name = "sha512",
.cht_key = 0,
.cht_size = 64
},
[CFS_HASH_ALG_MAX] = {
.cht_name = NULL,
.cht_key = 0,
.cht_size = 64
},
};
/* Maximum size of hash_types[].cht_size */
#define CFS_CRYPTO_HASH_DIGESTSIZE_MAX 64
/**
* Return hash algorithm information for the specified algorithm identifier
*
* Hash information includes algorithm name, initial seed, hash size.
*
* \retval cfs_crypto_hash_type for valid ID (CFS_HASH_ALG_*)
* \retval NULL for unknown algorithm identifier
*/
static inline const struct cfs_crypto_hash_type *
cfs_crypto_hash_type(enum cfs_crypto_hash_alg hash_alg)
{
struct cfs_crypto_hash_type *ht;
if (hash_alg < CFS_HASH_ALG_MAX) {
ht = &hash_types[hash_alg];
if (ht->cht_name)
return ht;
}
return NULL;
}
/**
* Return hash name for hash algorithm identifier
*
* \param[in] hash_alg hash algorithm id (CFS_HASH_ALG_*)
*
* \retval string name of known hash algorithm
* \retval "unknown" if hash algorithm is unknown
*/
static inline const char *
cfs_crypto_hash_name(enum cfs_crypto_hash_alg hash_alg)
{
const struct cfs_crypto_hash_type *ht;
ht = cfs_crypto_hash_type(hash_alg);
if (ht)
return ht->cht_name;
return "unknown";
}
/**
* Return digest size for hash algorithm type
*
* \param[in] hash_alg hash algorithm id (CFS_HASH_ALG_*)
*
* \retval hash algorithm digest size in bytes
* \retval 0 if hash algorithm type is unknown
*/
static inline int cfs_crypto_hash_digestsize(enum cfs_crypto_hash_alg hash_alg)
{
const struct cfs_crypto_hash_type *ht;
ht = cfs_crypto_hash_type(hash_alg);
if (ht)
return ht->cht_size;
return 0;
}
/**
* Find hash algorithm ID for the specified algorithm name
*
* \retval hash algorithm ID (CFS_HASH_ALG_*) for a known algorithm name
* \retval CFS_HASH_ALG_UNKNOWN for unknown algorithm name
*/
static inline unsigned char cfs_crypto_hash_alg(const char *algname)
{
enum cfs_crypto_hash_alg hash_alg;
for (hash_alg = 0; hash_alg < CFS_HASH_ALG_MAX; hash_alg++)
if (!strcmp(hash_types[hash_alg].cht_name, algname))
return hash_alg;
return CFS_HASH_ALG_UNKNOWN;
}
int cfs_crypto_hash_digest(enum cfs_crypto_hash_alg hash_alg,
const void *buf, unsigned int buf_len,
unsigned char *key, unsigned int key_len,
unsigned char *hash, unsigned int *hash_len);
struct ahash_request *
cfs_crypto_hash_init(enum cfs_crypto_hash_alg hash_alg,
unsigned char *key, unsigned int key_len);
int cfs_crypto_hash_update_page(struct ahash_request *desc,
struct page *page, unsigned int offset,
unsigned int len);
int cfs_crypto_hash_update(struct ahash_request *desc, const void *buf,
unsigned int buf_len);
int cfs_crypto_hash_final(struct ahash_request *desc,
unsigned char *hash, unsigned int *hash_len);
int cfs_crypto_register(void);
void cfs_crypto_unregister(void);
int cfs_crypto_hash_speed(enum cfs_crypto_hash_alg hash_alg);
#endif
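The name-to-ID and ID-to-digest-size helpers in the header above are plain table lookups. The following is a minimal userspace sketch of that pattern, assuming a trimmed-down table; every `demo_*` / `DEMO_*` name is hypothetical and not part of libcfs. It also skips NULL names, a precaution the original loop does not strictly need because every entry below CFS_HASH_ALG_MAX has a name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace mirror of a few CFS_HASH_ALG_* identifiers. */
enum demo_hash_alg {
	DEMO_ALG_NULL = 0,
	DEMO_ALG_ADLER32,
	DEMO_ALG_CRC32,
	DEMO_ALG_MD5,
	DEMO_ALG_MAX,
	DEMO_ALG_UNKNOWN = 0xff
};

struct demo_hash_type {
	const char *name;	/* algorithm name */
	unsigned int size;	/* digest size in bytes */
};

/* Designated initializers keep each entry tied to its enum value. */
static const struct demo_hash_type demo_hash_types[] = {
	[DEMO_ALG_NULL]    = { "null", 0 },
	[DEMO_ALG_ADLER32] = { "adler32", 4 },
	[DEMO_ALG_CRC32]   = { "crc32", 4 },
	[DEMO_ALG_MD5]     = { "md5", 16 },
};

/* Linear scan by name; returns DEMO_ALG_UNKNOWN when nothing matches. */
static int demo_hash_alg(const char *algname)
{
	int alg;

	for (alg = 0; alg < DEMO_ALG_MAX; alg++) {
		const char *name = demo_hash_types[alg].name;

		if (name && !strcmp(name, algname))
			return alg;
	}
	return DEMO_ALG_UNKNOWN;
}

/* Digest size for a known ID, 0 for anything out of range. */
static unsigned int demo_hash_digestsize(int alg)
{
	if (alg >= 0 && alg < DEMO_ALG_MAX && demo_hash_types[alg].name)
		return demo_hash_types[alg].size;
	return 0;
}
```

Calling `demo_hash_alg("md5")` and passing the result to `demo_hash_digestsize()` yields the MD5 digest size from the table; an unknown name falls through to `DEMO_ALG_UNKNOWN`, whose size is reported as 0.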


@ -1,207 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2008, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2012, Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Sun Microsystems, Inc.
*
* libcfs/include/libcfs/libcfs_debug.h
*
* Debug messages and assertions
*
*/
#ifndef __LIBCFS_DEBUG_H__
#define __LIBCFS_DEBUG_H__
#include <linux/limits.h>
#include <uapi/linux/lnet/libcfs_debug.h>
/*
* Debugging
*/
extern unsigned int libcfs_subsystem_debug;
extern unsigned int libcfs_stack;
extern unsigned int libcfs_debug;
extern unsigned int libcfs_printk;
extern unsigned int libcfs_console_ratelimit;
extern unsigned int libcfs_console_max_delay;
extern unsigned int libcfs_console_min_delay;
extern unsigned int libcfs_console_backoff;
extern unsigned int libcfs_debug_binary;
extern char libcfs_debug_file_path_arr[PATH_MAX];
int libcfs_debug_mask2str(char *str, int size, int mask, int is_subsys);
int libcfs_debug_str2mask(int *mask, const char *str, int is_subsys);
/* Has there been an LBUG? */
extern unsigned int libcfs_catastrophe;
extern unsigned int libcfs_panic_on_lbug;
/* Enable debug-checks on stack size - except on x86_64 */
#if !defined(__x86_64__)
# ifdef __ia64__
# define CDEBUG_STACK() (THREAD_SIZE - \
((unsigned long)__builtin_dwarf_cfa() & \
(THREAD_SIZE - 1)))
# else
# define CDEBUG_STACK() (THREAD_SIZE - \
((unsigned long)__builtin_frame_address(0) & \
(THREAD_SIZE - 1)))
# endif /* __ia64__ */
#define __CHECK_STACK(msgdata, mask, cdls) \
do { \
if (unlikely(CDEBUG_STACK() > libcfs_stack)) { \
LIBCFS_DEBUG_MSG_DATA_INIT(msgdata, D_WARNING, NULL); \
libcfs_stack = CDEBUG_STACK(); \
libcfs_debug_msg(msgdata, \
"maximum lustre stack %lu\n", \
CDEBUG_STACK()); \
(msgdata)->msg_mask = mask; \
(msgdata)->msg_cdls = cdls; \
dump_stack(); \
/*panic("LBUG");*/ \
} \
} while (0)
#define CFS_CHECK_STACK(msgdata, mask, cdls) __CHECK_STACK(msgdata, mask, cdls)
#else /* __x86_64__ */
#define CFS_CHECK_STACK(msgdata, mask, cdls) do {} while (0)
#define CDEBUG_STACK() (0L)
#endif /* __x86_64__ */
#ifndef DEBUG_SUBSYSTEM
# define DEBUG_SUBSYSTEM S_UNDEFINED
#endif
#define CDEBUG_DEFAULT_MAX_DELAY (600 * HZ) /* jiffies */
#define CDEBUG_DEFAULT_MIN_DELAY ((HZ + 1) / 2) /* jiffies */
#define CDEBUG_DEFAULT_BACKOFF 2
struct cfs_debug_limit_state {
unsigned long cdls_next;
unsigned int cdls_delay;
int cdls_count;
};
struct libcfs_debug_msg_data {
const char *msg_file;
const char *msg_fn;
int msg_subsys;
int msg_line;
int msg_mask;
struct cfs_debug_limit_state *msg_cdls;
};
#define LIBCFS_DEBUG_MSG_DATA_INIT(data, mask, cdls) \
do { \
(data)->msg_subsys = DEBUG_SUBSYSTEM; \
(data)->msg_file = __FILE__; \
(data)->msg_fn = __func__; \
(data)->msg_line = __LINE__; \
(data)->msg_cdls = (cdls); \
(data)->msg_mask = (mask); \
} while (0)
#define LIBCFS_DEBUG_MSG_DATA_DECL(dataname, mask, cdls) \
static struct libcfs_debug_msg_data dataname = { \
.msg_subsys = DEBUG_SUBSYSTEM, \
.msg_file = __FILE__, \
.msg_fn = __func__, \
.msg_line = __LINE__, \
.msg_cdls = (cdls) }; \
dataname.msg_mask = (mask)
/**
* Filters out logging messages based on mask and subsystem.
*/
static inline int cfs_cdebug_show(unsigned int mask, unsigned int subsystem)
{
return mask & D_CANTMASK ||
((libcfs_debug & mask) && (libcfs_subsystem_debug & subsystem));
}
#define __CDEBUG(cdls, mask, format, ...) \
do { \
static struct libcfs_debug_msg_data msgdata; \
\
CFS_CHECK_STACK(&msgdata, mask, cdls); \
\
if (cfs_cdebug_show(mask, DEBUG_SUBSYSTEM)) { \
LIBCFS_DEBUG_MSG_DATA_INIT(&msgdata, mask, cdls); \
libcfs_debug_msg(&msgdata, format, ## __VA_ARGS__); \
} \
} while (0)
#define CDEBUG(mask, format, ...) __CDEBUG(NULL, mask, format, ## __VA_ARGS__)
#define CDEBUG_LIMIT(mask, format, ...) \
do { \
static struct cfs_debug_limit_state cdls; \
\
__CDEBUG(&cdls, mask, format, ## __VA_ARGS__); \
} while (0)
/*
* Lustre Error Checksum: calculates a checksum of a hex number
* by XORing its three low-order nybbles.
*/
#define LERRCHKSUM(hexnum) (((hexnum) & 0xf) ^ ((hexnum) >> 4 & 0xf) ^ \
((hexnum) >> 8 & 0xf))
#define CWARN(format, ...) CDEBUG_LIMIT(D_WARNING, format, ## __VA_ARGS__)
#define CERROR(format, ...) CDEBUG_LIMIT(D_ERROR, format, ## __VA_ARGS__)
#define CNETERR(format, a...) CDEBUG_LIMIT(D_NETERROR, format, ## a)
#define CEMERG(format, ...) CDEBUG_LIMIT(D_EMERG, format, ## __VA_ARGS__)
#define LCONSOLE(mask, format, ...) CDEBUG(D_CONSOLE | (mask), format, ## __VA_ARGS__)
#define LCONSOLE_INFO(format, ...) CDEBUG_LIMIT(D_CONSOLE, format, ## __VA_ARGS__)
#define LCONSOLE_WARN(format, ...) CDEBUG_LIMIT(D_CONSOLE | D_WARNING, format, ## __VA_ARGS__)
#define LCONSOLE_ERROR_MSG(errnum, format, ...) CDEBUG_LIMIT(D_CONSOLE | D_ERROR, \
"%x-%x: " format, errnum, LERRCHKSUM(errnum), ## __VA_ARGS__)
#define LCONSOLE_ERROR(format, ...) LCONSOLE_ERROR_MSG(0x00, format, ## __VA_ARGS__)
#define LCONSOLE_EMERG(format, ...) CDEBUG(D_CONSOLE | D_EMERG, format, ## __VA_ARGS__)
int libcfs_debug_msg(struct libcfs_debug_msg_data *msgdata,
const char *format1, ...)
__printf(2, 3);
int libcfs_debug_vmsg2(struct libcfs_debug_msg_data *msgdata,
const char *format1,
va_list args, const char *format2, ...)
__printf(4, 5);
/* other external symbols that tracefile provides: */
int cfs_trace_copyin_string(char *knl_buffer, int knl_buffer_nob,
const char __user *usr_buffer, int usr_buffer_nob);
int cfs_trace_copyout_string(char __user *usr_buffer, int usr_buffer_nob,
const char *knl_buffer, char *append);
#define LIBCFS_DEBUG_FILE_PATH_DEFAULT "/tmp/lustre-log"
#endif /* __LIBCFS_DEBUG_H__ */
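Two small pieces of the header above lend themselves to a worked example: the message filter in cfs_cdebug_show() and the LERRCHKSUM() nybble checksum. The sketch below restates both in userspace C. The `DEMO_D_*` bit values and the `demo_*` globals are placeholders chosen for illustration; the real D_* masks come from the uapi header and are not shown in this diff.

```c
#include <assert.h>

/* Same expression as LERRCHKSUM(): XOR of the three low-order nybbles. */
#define DEMO_LERRCHKSUM(hexnum) (((hexnum) & 0xf) ^ ((hexnum) >> 4 & 0xf) ^ \
				 ((hexnum) >> 8 & 0xf))

/* Placeholder mask bits, NOT the real D_* values. */
#define DEMO_D_INFO	0x1u
#define DEMO_D_EMERG	0x2u
#define DEMO_D_CANTMASK	DEMO_D_EMERG

/* Stand-ins for libcfs_debug and libcfs_subsystem_debug. */
static unsigned int demo_debug = DEMO_D_INFO;
static unsigned int demo_subsystem_debug = 0x4u;

/*
 * A message is shown if its mask can never be filtered, or if both the
 * debug mask and the subsystem mask have the relevant bit enabled.
 */
static int demo_cdebug_show(unsigned int mask, unsigned int subsystem)
{
	return (mask & DEMO_D_CANTMASK) ||
	       ((demo_debug & mask) && (demo_subsystem_debug & subsystem));
}
```

For example, `DEMO_LERRCHKSUM(0x15f)` XORs 0xf, 0x5 and 0x1, and an emergency message passes the filter even when its subsystem bit is disabled.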


@ -1,194 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see http://www.gnu.org/licenses
*
* GPL HEADER END
*/
/*
* Copyright (c) 2007, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2011, 2012, Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Oracle Corporation, Inc.
*/
#ifndef _LIBCFS_FAIL_H
#define _LIBCFS_FAIL_H
#include <linux/sched.h>
#include <linux/wait.h>
extern unsigned long cfs_fail_loc;
extern unsigned int cfs_fail_val;
extern int cfs_fail_err;
extern wait_queue_head_t cfs_race_waitq;
extern int cfs_race_state;
int __cfs_fail_check_set(u32 id, u32 value, int set);
int __cfs_fail_timeout_set(u32 id, u32 value, int ms, int set);
enum {
CFS_FAIL_LOC_NOSET = 0,
CFS_FAIL_LOC_ORSET = 1,
CFS_FAIL_LOC_RESET = 2,
CFS_FAIL_LOC_VALUE = 3
};
/* Failure injection control */
#define CFS_FAIL_MASK_SYS 0x0000FF00
#define CFS_FAIL_MASK_LOC (0x000000FF | CFS_FAIL_MASK_SYS)
#define CFS_FAILED_BIT 30
/* CFS_FAILED is 0x40000000 */
#define CFS_FAILED BIT(CFS_FAILED_BIT)
#define CFS_FAIL_ONCE_BIT 31
/* CFS_FAIL_ONCE is 0x80000000 */
#define CFS_FAIL_ONCE BIT(CFS_FAIL_ONCE_BIT)
/* The following flags aren't made to be combined */
#define CFS_FAIL_SKIP 0x20000000 /* skip N times then fail */
#define CFS_FAIL_SOME 0x10000000 /* only fail N times */
#define CFS_FAIL_RAND 0x08000000 /* fail 1/N of the times */
#define CFS_FAIL_USR1 0x04000000 /* user flag */
#define CFS_FAULT 0x02000000 /* match any CFS_FAULT_CHECK */
static inline bool CFS_FAIL_PRECHECK(u32 id)
{
return cfs_fail_loc &&
((cfs_fail_loc & CFS_FAIL_MASK_LOC) == (id & CFS_FAIL_MASK_LOC) ||
(cfs_fail_loc & id & CFS_FAULT));
}
static inline int cfs_fail_check_set(u32 id, u32 value,
int set, int quiet)
{
int ret = 0;
if (unlikely(CFS_FAIL_PRECHECK(id))) {
ret = __cfs_fail_check_set(id, value, set);
if (ret) {
if (quiet) {
CDEBUG(D_INFO, "*** cfs_fail_loc=%x, val=%u***\n",
id, value);
} else {
LCONSOLE_INFO("*** cfs_fail_loc=%x, val=%u***\n",
id, value);
}
}
}
return ret;
}
/* If id hits cfs_fail_loc, return 1, otherwise return 0 */
#define CFS_FAIL_CHECK(id) \
cfs_fail_check_set(id, 0, CFS_FAIL_LOC_NOSET, 0)
#define CFS_FAIL_CHECK_QUIET(id) \
cfs_fail_check_set(id, 0, CFS_FAIL_LOC_NOSET, 1)
/*
* If id hits cfs_fail_loc and cfs_fail_val == (-1 or value), return 1,
* otherwise return 0
*/
#define CFS_FAIL_CHECK_VALUE(id, value) \
cfs_fail_check_set(id, value, CFS_FAIL_LOC_VALUE, 0)
#define CFS_FAIL_CHECK_VALUE_QUIET(id, value) \
cfs_fail_check_set(id, value, CFS_FAIL_LOC_VALUE, 1)
/*
* If id hits cfs_fail_loc, cfs_fail_loc |= value and return 1,
* otherwise return 0
*/
#define CFS_FAIL_CHECK_ORSET(id, value) \
cfs_fail_check_set(id, value, CFS_FAIL_LOC_ORSET, 0)
#define CFS_FAIL_CHECK_ORSET_QUIET(id, value) \
cfs_fail_check_set(id, value, CFS_FAIL_LOC_ORSET, 1)
/*
* If id hits cfs_fail_loc, cfs_fail_loc = value and return 1,
* otherwise return 0
*/
#define CFS_FAIL_CHECK_RESET(id, value) \
cfs_fail_check_set(id, value, CFS_FAIL_LOC_RESET, 0)
#define CFS_FAIL_CHECK_RESET_QUIET(id, value) \
cfs_fail_check_set(id, value, CFS_FAIL_LOC_RESET, 1)
static inline int cfs_fail_timeout_set(u32 id, u32 value, int ms, int set)
{
if (unlikely(CFS_FAIL_PRECHECK(id)))
return __cfs_fail_timeout_set(id, value, ms, set);
return 0;
}
/* If id hits cfs_fail_loc, sleep for the given seconds or milliseconds */
#define CFS_FAIL_TIMEOUT(id, secs) \
cfs_fail_timeout_set(id, 0, (secs) * 1000, CFS_FAIL_LOC_NOSET)
#define CFS_FAIL_TIMEOUT_MS(id, ms) \
cfs_fail_timeout_set(id, 0, ms, CFS_FAIL_LOC_NOSET)
/*
* If id hits cfs_fail_loc, cfs_fail_loc |= value and
* sleep for the given seconds or milliseconds
*/
#define CFS_FAIL_TIMEOUT_ORSET(id, value, secs) \
cfs_fail_timeout_set(id, value, (secs) * 1000, CFS_FAIL_LOC_ORSET)
#define CFS_FAIL_TIMEOUT_RESET(id, value, secs) \
cfs_fail_timeout_set(id, value, (secs) * 1000, CFS_FAIL_LOC_RESET)
#define CFS_FAIL_TIMEOUT_MS_ORSET(id, value, ms) \
cfs_fail_timeout_set(id, value, ms, CFS_FAIL_LOC_ORSET)
#define CFS_FAULT_CHECK(id) \
CFS_FAIL_CHECK(CFS_FAULT | (id))
/*
* The idea here is to synchronise two threads to force a race. The
* first thread that calls this with a matching fail_loc is put to
* sleep. The next thread that calls with the same fail_loc wakes up
* the first and continues.
*/
static inline void cfs_race(u32 id)
{
if (CFS_FAIL_PRECHECK(id)) {
if (unlikely(__cfs_fail_check_set(id, 0, CFS_FAIL_LOC_NOSET))) {
int rc;
cfs_race_state = 0;
CERROR("cfs_race id %x sleeping\n", id);
rc = wait_event_interruptible(cfs_race_waitq,
!!cfs_race_state);
CERROR("cfs_fail_race id %x awake, rc=%d\n", id, rc);
} else {
CERROR("cfs_fail_race id %x waking\n", id);
cfs_race_state = 1;
wake_up(&cfs_race_waitq);
}
}
}
#define CFS_RACE(id) cfs_race(id)
#endif /* _LIBCFS_FAIL_H */
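The cheap pre-check that guards every fail-injection site above can be exercised outside the kernel. The following sketch copies the mask values from the header and reimplements CFS_FAIL_PRECHECK() against a local variable; the `demo_*` / `DEMO_*` names are hypothetical, and the slow path (__cfs_fail_check_set) is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the CFS_FAIL_* definitions in the header. */
#define DEMO_FAIL_MASK_SYS	0x0000FF00u
#define DEMO_FAIL_MASK_LOC	(0x000000FFu | DEMO_FAIL_MASK_SYS)
#define DEMO_FAULT		0x02000000u
#define DEMO_FAIL_ONCE		0x80000000u

/* Stand-in for the global cfs_fail_loc. */
static unsigned long demo_fail_loc;

/*
 * True when injection is armed and either the low 16 bits (subsystem +
 * location) match, or both sides carry the "match any fault" flag.
 */
static bool demo_fail_precheck(uint32_t id)
{
	return demo_fail_loc &&
	       ((demo_fail_loc & DEMO_FAIL_MASK_LOC) ==
		(id & DEMO_FAIL_MASK_LOC) ||
		(demo_fail_loc & id & DEMO_FAULT));
}
```

Flag bits such as `DEMO_FAIL_ONCE` live above the 16-bit location mask, so setting them on `demo_fail_loc` does not disturb the location comparison.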


@ -1,869 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2008, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2012, 2015 Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Sun Microsystems, Inc.
*
* libcfs/include/libcfs/libcfs_hash.h
*
* Hashing routines
*
*/
#ifndef __LIBCFS_HASH_H__
#define __LIBCFS_HASH_H__
#include <linux/hash.h>
#include <linux/spinlock.h>
#include <linux/workqueue.h>
#include <linux/libcfs/libcfs.h>
/*
* Knuth recommends primes in approximately golden ratio to the maximum
* integer representable by a machine word for multiplicative hashing.
* Chuck Lever verified the effectiveness of this technique:
* http://www.citi.umich.edu/techreports/reports/citi-tr-00-1.pdf
*
* These primes are chosen to be bit-sparse, that is operations on
* them can use shifts and additions instead of multiplications for
* machines where multiplications are slow.
*/
/* 2^31 + 2^29 - 2^25 + 2^22 - 2^19 - 2^16 + 1 */
#define CFS_GOLDEN_RATIO_PRIME_32 0x9e370001UL
/* 2^63 + 2^61 - 2^57 + 2^54 - 2^51 - 2^18 + 1 */
#define CFS_GOLDEN_RATIO_PRIME_64 0x9e37fffffffc0001ULL
/** disable debug */
#define CFS_HASH_DEBUG_NONE 0
/*
* record hash depth and output to console when it's too deep,
* computing overhead is low but it consumes more memory
*/
#define CFS_HASH_DEBUG_1 1
/** expensive, check key validation */
#define CFS_HASH_DEBUG_2 2
#define CFS_HASH_DEBUG_LEVEL CFS_HASH_DEBUG_NONE
struct cfs_hash_ops;
struct cfs_hash_lock_ops;
struct cfs_hash_hlist_ops;
union cfs_hash_lock {
rwlock_t rw; /**< rwlock */
spinlock_t spin; /**< spinlock */
};
/**
* cfs_hash_bucket is a container of:
* - lock, counter ...
* - array of hash-head starting from hsb_head[0], hash-head can be one of
* . struct cfs_hash_head
* . struct cfs_hash_head_dep
* . struct cfs_hash_dhead
* . struct cfs_hash_dhead_dep
* which depends on requirement of user
* - some extra bytes (caller can require it while creating hash)
*/
struct cfs_hash_bucket {
union cfs_hash_lock hsb_lock; /**< bucket lock */
u32 hsb_count; /**< current entries */
u32 hsb_version; /**< change version */
unsigned int hsb_index; /**< index of bucket */
int hsb_depmax; /**< max depth on bucket */
long hsb_head[0]; /**< hash-head array */
};
/**
* cfs_hash bucket descriptor; it normally lives on the caller's stack
*/
struct cfs_hash_bd {
/* address of bucket */
struct cfs_hash_bucket *bd_bucket;
/* offset in bucket */
unsigned int bd_offset;
};
#define CFS_HASH_NAME_LEN 16 /**< default name length */
#define CFS_HASH_BIGNAME_LEN 64 /**< bigname for param tree */
#define CFS_HASH_BKT_BITS 3 /**< default bits of bucket */
#define CFS_HASH_BITS_MAX 30 /**< max bits of bucket */
#define CFS_HASH_BITS_MIN CFS_HASH_BKT_BITS
/**
* common hash attributes.
*/
enum cfs_hash_tag {
/**
* don't need any lock, caller will protect operations with its
* own lock. With this flag:
* . CFS_HASH_NO_BKTLOCK, CFS_HASH_RW_BKTLOCK, CFS_HASH_SPIN_BKTLOCK
* will be ignored.
* . Some functions will be disabled with this flag, e.g.:
* cfs_hash_for_each_empty, cfs_hash_rehash
*/
CFS_HASH_NO_LOCK = BIT(0),
/** no bucket lock, use one spinlock to protect the whole hash */
CFS_HASH_NO_BKTLOCK = BIT(1),
/** rwlock to protect bucket */
CFS_HASH_RW_BKTLOCK = BIT(2),
/** spinlock to protect bucket */
CFS_HASH_SPIN_BKTLOCK = BIT(3),
/** always add new item to tail */
CFS_HASH_ADD_TAIL = BIT(4),
/** hash-table doesn't have refcount on item */
CFS_HASH_NO_ITEMREF = BIT(5),
/** big name for param-tree */
CFS_HASH_BIGNAME = BIT(6),
/** track global count */
CFS_HASH_COUNTER = BIT(7),
/** rehash item by new key */
CFS_HASH_REHASH_KEY = BIT(8),
/** Enable dynamic hash resizing */
CFS_HASH_REHASH = BIT(9),
/** can shrink hash-size */
CFS_HASH_SHRINK = BIT(10),
/** assert hash is empty on exit */
CFS_HASH_ASSERT_EMPTY = BIT(11),
/** record hlist depth */
CFS_HASH_DEPTH = BIT(12),
/**
* rehash is always scheduled in a different thread, so current
* change on hash table is non-blocking
*/
CFS_HASH_NBLK_CHANGE = BIT(13),
/**
* NB, we typed hs_flags as u16, please change it
* if you need to extend >=16 flags
*/
};
/** most used attributes */
#define CFS_HASH_DEFAULT (CFS_HASH_RW_BKTLOCK | \
CFS_HASH_COUNTER | CFS_HASH_REHASH)
/**
* cfs_hash is a hash-table implementation for general purpose, it can support:
* . two refcount modes
* hash-table with & without refcount
* . four lock modes
* nolock, one-spinlock, rw-bucket-lock, spin-bucket-lock
* . general operations
* lookup, add(add_tail or add_head), delete
* . rehash
* grow or shrink
* . iteration
* locked iteration and unlocked iteration
* . bigname
* support long name hash
* . debug
* trace max searching depth
*
* Rehash:
* When the htable grows or shrinks, a separate task (cfs_hash_rehash_worker)
* is spawned to handle the rehash in the background, it's possible that other
* processes can concurrently perform additions, deletions, and lookups
* without being blocked on rehash completion, because rehash will release
* the global wrlock for each bucket.
*
* rehash and iteration can't run at the same time because it's too tricky
* to keep both of them safe and correct.
* Both are relatively rare operations, so:
* . if iteration is in progress while we try to launch rehash, then
* rehash just gives up; the iterator will launch rehash at the end.
* . if rehash is in progress while we try to iterate the hash table,
* then we just wait (it shouldn't take very long); anyway, nobody
* should expect iteration of the whole hash-table to be non-blocking.
*
* During rehashing, a (key,object) pair may be in one of two buckets,
* depending on whether the worker task has yet to transfer the object
* to its new location in the table. Lookups and deletions need to search both
* locations; additions must take care to only insert into the new bucket.
*/
struct cfs_hash {
/**
* serialize with rehash, or serialize all operations if
* the hash-table has CFS_HASH_NO_BKTLOCK
*/
union cfs_hash_lock hs_lock;
/** hash operations */
struct cfs_hash_ops *hs_ops;
/** hash lock operations */
struct cfs_hash_lock_ops *hs_lops;
/** hash list operations */
struct cfs_hash_hlist_ops *hs_hops;
/** hash buckets-table */
struct cfs_hash_bucket **hs_buckets;
/** total number of items on this hash-table */
atomic_t hs_count;
/** hash flags, see cfs_hash_tag for detail */
u16 hs_flags;
/** # of extra-bytes for bucket, for user saving extended attributes */
u16 hs_extra_bytes;
/** wants to iterate */
u8 hs_iterating;
/** hash-table is dying */
u8 hs_exiting;
/** current hash bits */
u8 hs_cur_bits;
/** min hash bits */
u8 hs_min_bits;
/** max hash bits */
u8 hs_max_bits;
/** bits for rehash */
u8 hs_rehash_bits;
/** bits for each bucket */
u8 hs_bkt_bits;
/** resize min threshold */
u16 hs_min_theta;
/** resize max threshold */
u16 hs_max_theta;
/** resize count */
u32 hs_rehash_count;
/** # of iterators (caller of cfs_hash_for_each_*) */
u32 hs_iterators;
/** rehash workitem */
struct work_struct hs_rehash_work;
/** refcount on this hash table */
atomic_t hs_refcount;
/** rehash buckets-table */
struct cfs_hash_bucket **hs_rehash_buckets;
#if CFS_HASH_DEBUG_LEVEL >= CFS_HASH_DEBUG_1
/** serialize debug members */
spinlock_t hs_dep_lock;
/** max depth */
unsigned int hs_dep_max;
/** id of the deepest bucket */
unsigned int hs_dep_bkt;
/** offset in the deepest bucket */
unsigned int hs_dep_off;
/** bits when we found the max depth */
unsigned int hs_dep_bits;
/** workitem to output max depth */
struct work_struct hs_dep_work;
#endif
/** name of htable */
char hs_name[0];
};
struct cfs_hash_lock_ops {
/** lock the hash table */
void (*hs_lock)(union cfs_hash_lock *lock, int exclusive);
/** unlock the hash table */
void (*hs_unlock)(union cfs_hash_lock *lock, int exclusive);
/** lock the hash bucket */
void (*hs_bkt_lock)(union cfs_hash_lock *lock, int exclusive);
/** unlock the hash bucket */
void (*hs_bkt_unlock)(union cfs_hash_lock *lock, int exclusive);
};
struct cfs_hash_hlist_ops {
/** return hlist_head of hash-head of @bd */
struct hlist_head *(*hop_hhead)(struct cfs_hash *hs,
struct cfs_hash_bd *bd);
/** return hash-head size */
int (*hop_hhead_size)(struct cfs_hash *hs);
/** add @hnode to hash-head of @bd */
int (*hop_hnode_add)(struct cfs_hash *hs, struct cfs_hash_bd *bd,
struct hlist_node *hnode);
/** remove @hnode from hash-head of @bd */
int (*hop_hnode_del)(struct cfs_hash *hs, struct cfs_hash_bd *bd,
struct hlist_node *hnode);
};
struct cfs_hash_ops {
/** return hashed value from @key */
unsigned int (*hs_hash)(struct cfs_hash *hs, const void *key,
unsigned int mask);
/** return key address of @hnode */
void * (*hs_key)(struct hlist_node *hnode);
/** copy key from @hnode to @key */
void (*hs_keycpy)(struct hlist_node *hnode, void *key);
/**
* compare @key with key of @hnode
* returns 1 on a match
*/
int (*hs_keycmp)(const void *key, struct hlist_node *hnode);
/** return object address of @hnode, i.e: container_of(...hnode) */
void * (*hs_object)(struct hlist_node *hnode);
/** get refcount of item, always called while holding the bucket lock */
void (*hs_get)(struct cfs_hash *hs, struct hlist_node *hnode);
/** release refcount of item */
void (*hs_put)(struct cfs_hash *hs, struct hlist_node *hnode);
/** release refcount of item, always called while holding the bucket lock */
void (*hs_put_locked)(struct cfs_hash *hs,
struct hlist_node *hnode);
/** called before @hnode is removed */
void (*hs_exit)(struct cfs_hash *hs, struct hlist_node *hnode);
};
/** total number of buckets in @hs */
#define CFS_HASH_NBKT(hs) \
BIT((hs)->hs_cur_bits - (hs)->hs_bkt_bits)
/** total number of buckets in @hs while rehashing */
#define CFS_HASH_RH_NBKT(hs) \
BIT((hs)->hs_rehash_bits - (hs)->hs_bkt_bits)
/** number of hlists in a bucket */
#define CFS_HASH_BKT_NHLIST(hs) BIT((hs)->hs_bkt_bits)
/** total number of hlist in @hs */
#define CFS_HASH_NHLIST(hs) BIT((hs)->hs_cur_bits)
/** total number of hlist in @hs while rehashing */
#define CFS_HASH_RH_NHLIST(hs) BIT((hs)->hs_rehash_bits)
static inline int
cfs_hash_with_no_lock(struct cfs_hash *hs)
{
/* caller will serialize all operations for this hash-table */
return hs->hs_flags & CFS_HASH_NO_LOCK;
}
static inline int
cfs_hash_with_no_bktlock(struct cfs_hash *hs)
{
/* no bucket lock, one single lock to protect the hash-table */
return hs->hs_flags & CFS_HASH_NO_BKTLOCK;
}
static inline int
cfs_hash_with_rw_bktlock(struct cfs_hash *hs)
{
/* rwlock to protect hash bucket */
return hs->hs_flags & CFS_HASH_RW_BKTLOCK;
}
static inline int
cfs_hash_with_spin_bktlock(struct cfs_hash *hs)
{
/* spinlock to protect hash bucket */
return hs->hs_flags & CFS_HASH_SPIN_BKTLOCK;
}
static inline int
cfs_hash_with_add_tail(struct cfs_hash *hs)
{
return hs->hs_flags & CFS_HASH_ADD_TAIL;
}
static inline int
cfs_hash_with_no_itemref(struct cfs_hash *hs)
{
/*
* hash-table doesn't keep refcount on item,
* item can't be removed from hash unless it's
* ZERO refcount
*/
return hs->hs_flags & CFS_HASH_NO_ITEMREF;
}
static inline int
cfs_hash_with_bigname(struct cfs_hash *hs)
{
return hs->hs_flags & CFS_HASH_BIGNAME;
}
static inline int
cfs_hash_with_counter(struct cfs_hash *hs)
{
return hs->hs_flags & CFS_HASH_COUNTER;
}
static inline int
cfs_hash_with_rehash(struct cfs_hash *hs)
{
return hs->hs_flags & CFS_HASH_REHASH;
}
static inline int
cfs_hash_with_rehash_key(struct cfs_hash *hs)
{
return hs->hs_flags & CFS_HASH_REHASH_KEY;
}
static inline int
cfs_hash_with_shrink(struct cfs_hash *hs)
{
return hs->hs_flags & CFS_HASH_SHRINK;
}
static inline int
cfs_hash_with_assert_empty(struct cfs_hash *hs)
{
return hs->hs_flags & CFS_HASH_ASSERT_EMPTY;
}
static inline int
cfs_hash_with_depth(struct cfs_hash *hs)
{
return hs->hs_flags & CFS_HASH_DEPTH;
}
static inline int
cfs_hash_with_nblk_change(struct cfs_hash *hs)
{
return hs->hs_flags & CFS_HASH_NBLK_CHANGE;
}
static inline int
cfs_hash_is_exiting(struct cfs_hash *hs)
{
/* cfs_hash_destroy is called */
return hs->hs_exiting;
}
static inline int
cfs_hash_is_rehashing(struct cfs_hash *hs)
{
/* rehash is launched */
return !!hs->hs_rehash_bits;
}
static inline int
cfs_hash_is_iterating(struct cfs_hash *hs)
{
/* someone is calling cfs_hash_for_each_* */
return hs->hs_iterating || hs->hs_iterators;
}
static inline int
cfs_hash_bkt_size(struct cfs_hash *hs)
{
return offsetof(struct cfs_hash_bucket, hsb_head[0]) +
hs->hs_hops->hop_hhead_size(hs) * CFS_HASH_BKT_NHLIST(hs) +
hs->hs_extra_bytes;
}
static inline unsigned
cfs_hash_id(struct cfs_hash *hs, const void *key, unsigned int mask)
{
return hs->hs_ops->hs_hash(hs, key, mask);
}
static inline void *
cfs_hash_key(struct cfs_hash *hs, struct hlist_node *hnode)
{
return hs->hs_ops->hs_key(hnode);
}
static inline void
cfs_hash_keycpy(struct cfs_hash *hs, struct hlist_node *hnode, void *key)
{
if (hs->hs_ops->hs_keycpy)
hs->hs_ops->hs_keycpy(hnode, key);
}
/**
* Returns 1 on a match
*/
static inline int
cfs_hash_keycmp(struct cfs_hash *hs, const void *key, struct hlist_node *hnode)
{
return hs->hs_ops->hs_keycmp(key, hnode);
}
static inline void *
cfs_hash_object(struct cfs_hash *hs, struct hlist_node *hnode)
{
return hs->hs_ops->hs_object(hnode);
}
static inline void
cfs_hash_get(struct cfs_hash *hs, struct hlist_node *hnode)
{
hs->hs_ops->hs_get(hs, hnode);
}
static inline void
cfs_hash_put_locked(struct cfs_hash *hs, struct hlist_node *hnode)
{
hs->hs_ops->hs_put_locked(hs, hnode);
}
static inline void
cfs_hash_put(struct cfs_hash *hs, struct hlist_node *hnode)
{
hs->hs_ops->hs_put(hs, hnode);
}
static inline void
cfs_hash_exit(struct cfs_hash *hs, struct hlist_node *hnode)
{
if (hs->hs_ops->hs_exit)
hs->hs_ops->hs_exit(hs, hnode);
}
static inline void cfs_hash_lock(struct cfs_hash *hs, int excl)
{
hs->hs_lops->hs_lock(&hs->hs_lock, excl);
}
static inline void cfs_hash_unlock(struct cfs_hash *hs, int excl)
{
hs->hs_lops->hs_unlock(&hs->hs_lock, excl);
}
static inline int cfs_hash_dec_and_lock(struct cfs_hash *hs,
atomic_t *condition)
{
LASSERT(cfs_hash_with_no_bktlock(hs));
return atomic_dec_and_lock(condition, &hs->hs_lock.spin);
}
static inline void cfs_hash_bd_lock(struct cfs_hash *hs,
struct cfs_hash_bd *bd, int excl)
{
hs->hs_lops->hs_bkt_lock(&bd->bd_bucket->hsb_lock, excl);
}
static inline void cfs_hash_bd_unlock(struct cfs_hash *hs,
struct cfs_hash_bd *bd, int excl)
{
hs->hs_lops->hs_bkt_unlock(&bd->bd_bucket->hsb_lock, excl);
}
/**
* operations on cfs_hash bucket (bd: bucket descriptor),
* they are normally for hash-table without rehash
*/
void cfs_hash_bd_get(struct cfs_hash *hs, const void *key,
struct cfs_hash_bd *bd);
static inline void
cfs_hash_bd_get_and_lock(struct cfs_hash *hs, const void *key,
struct cfs_hash_bd *bd, int excl)
{
cfs_hash_bd_get(hs, key, bd);
cfs_hash_bd_lock(hs, bd, excl);
}
static inline unsigned
cfs_hash_bd_index_get(struct cfs_hash *hs, struct cfs_hash_bd *bd)
{
return bd->bd_offset | (bd->bd_bucket->hsb_index << hs->hs_bkt_bits);
}
static inline void
cfs_hash_bd_index_set(struct cfs_hash *hs, unsigned int index,
struct cfs_hash_bd *bd)
{
bd->bd_bucket = hs->hs_buckets[index >> hs->hs_bkt_bits];
bd->bd_offset = index & (CFS_HASH_BKT_NHLIST(hs) - 1U);
}
static inline void *
cfs_hash_bd_extra_get(struct cfs_hash *hs, struct cfs_hash_bd *bd)
{
return (void *)bd->bd_bucket +
cfs_hash_bkt_size(hs) - hs->hs_extra_bytes;
}
static inline u32
cfs_hash_bd_version_get(struct cfs_hash_bd *bd)
{
/* caller must hold cfs_hash_bd_lock */
return bd->bd_bucket->hsb_version;
}
static inline u32
cfs_hash_bd_count_get(struct cfs_hash_bd *bd)
{
/* caller must hold cfs_hash_bd_lock */
return bd->bd_bucket->hsb_count;
}
static inline int
cfs_hash_bd_depmax_get(struct cfs_hash_bd *bd)
{
return bd->bd_bucket->hsb_depmax;
}
static inline int
cfs_hash_bd_compare(struct cfs_hash_bd *bd1, struct cfs_hash_bd *bd2)
{
if (bd1->bd_bucket->hsb_index != bd2->bd_bucket->hsb_index)
return bd1->bd_bucket->hsb_index - bd2->bd_bucket->hsb_index;
if (bd1->bd_offset != bd2->bd_offset)
return bd1->bd_offset - bd2->bd_offset;
return 0;
}
void cfs_hash_bd_add_locked(struct cfs_hash *hs, struct cfs_hash_bd *bd,
struct hlist_node *hnode);
void cfs_hash_bd_del_locked(struct cfs_hash *hs, struct cfs_hash_bd *bd,
struct hlist_node *hnode);
void cfs_hash_bd_move_locked(struct cfs_hash *hs, struct cfs_hash_bd *bd_old,
struct cfs_hash_bd *bd_new,
struct hlist_node *hnode);
static inline int
cfs_hash_bd_dec_and_lock(struct cfs_hash *hs, struct cfs_hash_bd *bd,
atomic_t *condition)
{
LASSERT(cfs_hash_with_spin_bktlock(hs));
return atomic_dec_and_lock(condition, &bd->bd_bucket->hsb_lock.spin);
}
static inline struct hlist_head *
cfs_hash_bd_hhead(struct cfs_hash *hs, struct cfs_hash_bd *bd)
{
return hs->hs_hops->hop_hhead(hs, bd);
}
struct hlist_node *
cfs_hash_bd_lookup_locked(struct cfs_hash *hs, struct cfs_hash_bd *bd,
const void *key);
struct hlist_node *
cfs_hash_bd_peek_locked(struct cfs_hash *hs, struct cfs_hash_bd *bd,
const void *key);
/**
* operations on cfs_hash bucket (bd: bucket descriptor),
* they are safe for hash-table with rehash
*/
void cfs_hash_dual_bd_get(struct cfs_hash *hs, const void *key,
struct cfs_hash_bd *bds);
void cfs_hash_dual_bd_lock(struct cfs_hash *hs, struct cfs_hash_bd *bds,
int excl);
void cfs_hash_dual_bd_unlock(struct cfs_hash *hs, struct cfs_hash_bd *bds,
int excl);
static inline void
cfs_hash_dual_bd_get_and_lock(struct cfs_hash *hs, const void *key,
struct cfs_hash_bd *bds, int excl)
{
cfs_hash_dual_bd_get(hs, key, bds);
cfs_hash_dual_bd_lock(hs, bds, excl);
}
struct hlist_node *
cfs_hash_dual_bd_lookup_locked(struct cfs_hash *hs, struct cfs_hash_bd *bds,
const void *key);
struct hlist_node *
cfs_hash_dual_bd_findadd_locked(struct cfs_hash *hs, struct cfs_hash_bd *bds,
const void *key, struct hlist_node *hnode,
int insist_add);
struct hlist_node *
cfs_hash_dual_bd_finddel_locked(struct cfs_hash *hs, struct cfs_hash_bd *bds,
const void *key, struct hlist_node *hnode);
/* Hash init/cleanup functions */
struct cfs_hash *
cfs_hash_create(char *name, unsigned int cur_bits, unsigned int max_bits,
unsigned int bkt_bits, unsigned int extra_bytes,
unsigned int min_theta, unsigned int max_theta,
struct cfs_hash_ops *ops, unsigned int flags);
struct cfs_hash *cfs_hash_getref(struct cfs_hash *hs);
void cfs_hash_putref(struct cfs_hash *hs);
/* Hash addition functions */
void cfs_hash_add(struct cfs_hash *hs, const void *key,
struct hlist_node *hnode);
int cfs_hash_add_unique(struct cfs_hash *hs, const void *key,
struct hlist_node *hnode);
void *cfs_hash_findadd_unique(struct cfs_hash *hs, const void *key,
struct hlist_node *hnode);
/* Hash deletion functions */
void *cfs_hash_del(struct cfs_hash *hs, const void *key,
struct hlist_node *hnode);
void *cfs_hash_del_key(struct cfs_hash *hs, const void *key);
/* Hash lookup/for_each functions */
#define CFS_HASH_LOOP_HOG 1024
typedef int (*cfs_hash_for_each_cb_t)(struct cfs_hash *hs,
struct cfs_hash_bd *bd,
struct hlist_node *node,
void *data);
void *
cfs_hash_lookup(struct cfs_hash *hs, const void *key);
void
cfs_hash_for_each(struct cfs_hash *hs, cfs_hash_for_each_cb_t cb, void *data);
void
cfs_hash_for_each_safe(struct cfs_hash *hs, cfs_hash_for_each_cb_t cb,
void *data);
int
cfs_hash_for_each_nolock(struct cfs_hash *hs, cfs_hash_for_each_cb_t cb,
void *data, int start);
int
cfs_hash_for_each_empty(struct cfs_hash *hs, cfs_hash_for_each_cb_t cb,
void *data);
void
cfs_hash_for_each_key(struct cfs_hash *hs, const void *key,
cfs_hash_for_each_cb_t cb, void *data);
typedef int (*cfs_hash_cond_opt_cb_t)(void *obj, void *data);
void
cfs_hash_cond_del(struct cfs_hash *hs, cfs_hash_cond_opt_cb_t cb, void *data);
void
cfs_hash_hlist_for_each(struct cfs_hash *hs, unsigned int hindex,
cfs_hash_for_each_cb_t cb, void *data);
int cfs_hash_is_empty(struct cfs_hash *hs);
u64 cfs_hash_size_get(struct cfs_hash *hs);
/*
* Rehash - Theta is calculated to be the average chained
* hash depth assuming a perfectly uniform hash function.
*/
void cfs_hash_rehash_cancel_locked(struct cfs_hash *hs);
void cfs_hash_rehash_cancel(struct cfs_hash *hs);
void cfs_hash_rehash(struct cfs_hash *hs, int do_rehash);
void cfs_hash_rehash_key(struct cfs_hash *hs, const void *old_key,
void *new_key, struct hlist_node *hnode);
#if CFS_HASH_DEBUG_LEVEL > CFS_HASH_DEBUG_1
/* Validate hnode references the correct key */
static inline void
cfs_hash_key_validate(struct cfs_hash *hs, const void *key,
struct hlist_node *hnode)
{
LASSERT(cfs_hash_keycmp(hs, key, hnode));
}
/* Validate hnode is in the correct bucket */
static inline void
cfs_hash_bucket_validate(struct cfs_hash *hs, struct cfs_hash_bd *bd,
struct hlist_node *hnode)
{
struct cfs_hash_bd bds[2];
cfs_hash_dual_bd_get(hs, cfs_hash_key(hs, hnode), bds);
LASSERT(bds[0].bd_bucket == bd->bd_bucket ||
bds[1].bd_bucket == bd->bd_bucket);
}
#else /* CFS_HASH_DEBUG_LEVEL > CFS_HASH_DEBUG_1 */
static inline void
cfs_hash_key_validate(struct cfs_hash *hs, const void *key,
struct hlist_node *hnode) {}
static inline void
cfs_hash_bucket_validate(struct cfs_hash *hs, struct cfs_hash_bd *bd,
struct hlist_node *hnode) {}
#endif /* CFS_HASH_DEBUG_LEVEL */
#define CFS_HASH_THETA_BITS 10
#define CFS_HASH_MIN_THETA BIT(CFS_HASH_THETA_BITS - 1)
#define CFS_HASH_MAX_THETA BIT(CFS_HASH_THETA_BITS + 1)
/* Return integer component of theta */
static inline int __cfs_hash_theta_int(int theta)
{
return (theta >> CFS_HASH_THETA_BITS);
}
/* Return a fractional value between 0 and 999 */
static inline int __cfs_hash_theta_frac(int theta)
{
return ((theta * 1000) >> CFS_HASH_THETA_BITS) -
(__cfs_hash_theta_int(theta) * 1000);
}
static inline int __cfs_hash_theta(struct cfs_hash *hs)
{
return (atomic_read(&hs->hs_count) <<
CFS_HASH_THETA_BITS) >> hs->hs_cur_bits;
}
static inline void
__cfs_hash_set_theta(struct cfs_hash *hs, int min, int max)
{
LASSERT(min < max);
hs->hs_min_theta = (u16)min;
hs->hs_max_theta = (u16)max;
}
/* Generic debug formatting routines mainly for proc handler */
struct seq_file;
void cfs_hash_debug_header(struct seq_file *m);
void cfs_hash_debug_str(struct cfs_hash *hs, struct seq_file *m);
/*
* Generic djb2 hash algorithm for character arrays.
*/
static inline unsigned
cfs_hash_djb2_hash(const void *key, size_t size, unsigned int mask)
{
unsigned int i, hash = 5381;
LASSERT(key);
for (i = 0; i < size; i++)
hash = hash * 33 + ((char *)key)[i];
return (hash & mask);
}
/*
* Generic u32 hash algorithm.
*/
static inline unsigned
cfs_hash_u32_hash(const u32 key, unsigned int mask)
{
return ((key * CFS_GOLDEN_RATIO_PRIME_32) & mask);
}
/*
* Generic u64 hash algorithm.
*/
static inline unsigned
cfs_hash_u64_hash(const u64 key, unsigned int mask)
{
return ((unsigned int)(key * CFS_GOLDEN_RATIO_PRIME_64) & mask);
}
/** iterate over all buckets in @bds (array of struct cfs_hash_bd) */
#define cfs_hash_for_each_bd(bds, n, i) \
for (i = 0; i < n && (bds)[i].bd_bucket != NULL; i++)
/** iterate over all buckets of @hs */
#define cfs_hash_for_each_bucket(hs, bd, pos) \
for (pos = 0; \
pos < CFS_HASH_NBKT(hs) && \
((bd)->bd_bucket = (hs)->hs_buckets[pos]) != NULL; pos++)
/** iterate over all hlist of bucket @bd */
#define cfs_hash_bd_for_each_hlist(hs, bd, hlist) \
for ((bd)->bd_offset = 0; \
(bd)->bd_offset < CFS_HASH_BKT_NHLIST(hs) && \
(hlist = cfs_hash_bd_hhead(hs, bd)) != NULL; \
(bd)->bd_offset++)
/* !__LIBCFS__HASH_H__ */
#endif
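
The theta helpers in the hash header above store the table's load factor as a fixed-point number with CFS_HASH_THETA_BITS fractional bits. A minimal user-space sketch of that arithmetic follows. The names theta_of, theta_int and theta_frac are illustrative stand-ins, not part of libcfs, and the item count and table size are passed in directly rather than read from a struct cfs_hash.

```c
#include <assert.h>

#define THETA_BITS 10	/* mirrors CFS_HASH_THETA_BITS */

/*
 * Fixed-point load factor: count / 2^cur_bits, scaled by 2^THETA_BITS.
 * Assumes count is small enough that the left shift does not overflow.
 */
static int theta_of(int count, unsigned int cur_bits)
{
	return (count << THETA_BITS) >> cur_bits;
}

/* Integer part of the load factor. */
static int theta_int(int theta)
{
	return theta >> THETA_BITS;
}

/* Fractional part of the load factor, in thousandths (0..999). */
static int theta_frac(int theta)
{
	return ((theta * 1000) >> THETA_BITS) - theta_int(theta) * 1000;
}
```

With 24 items in a 16-slot table (cur_bits = 4), theta_of() yields 1536, which decodes to 1.500. In the same encoding, CFS_HASH_MIN_THETA and CFS_HASH_MAX_THETA correspond to 0.5 and 2.0.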


@ -1,200 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2008, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2011, 2012, Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Sun Microsystems, Inc.
*
* libcfs/include/libcfs/libcfs_private.h
*
* Various defines for libcfs.
*
*/
#ifndef __LIBCFS_PRIVATE_H__
#define __LIBCFS_PRIVATE_H__
#ifndef DEBUG_SUBSYSTEM
# define DEBUG_SUBSYSTEM S_UNDEFINED
#endif
#define LASSERTF(cond, fmt, ...) \
do { \
if (unlikely(!(cond))) { \
LIBCFS_DEBUG_MSG_DATA_DECL(__msg_data, D_EMERG, NULL); \
libcfs_debug_msg(&__msg_data, \
"ASSERTION( %s ) failed: " fmt, #cond, \
## __VA_ARGS__); \
lbug_with_loc(&__msg_data); \
} \
} while (0)
#define LASSERT(cond) LASSERTF(cond, "\n")
#ifdef CONFIG_LUSTRE_DEBUG_EXPENSIVE_CHECK
/**
* This is for more expensive checks that one doesn't want to be enabled all
* the time. LINVRNT() has to be explicitly enabled by
* CONFIG_LUSTRE_DEBUG_EXPENSIVE_CHECK option.
*/
# define LINVRNT(exp) LASSERT(exp)
#else
# define LINVRNT(exp) ((void)sizeof !!(exp))
#endif
void __noreturn lbug_with_loc(struct libcfs_debug_msg_data *msg);
#define LBUG() \
do { \
LIBCFS_DEBUG_MSG_DATA_DECL(msgdata, D_EMERG, NULL); \
lbug_with_loc(&msgdata); \
} while (0)
/*
* Use #define rather than inline, as lnet_cpt_table() might
* not be defined yet
*/
#define kmalloc_cpt(size, flags, cpt) \
kmalloc_node(size, flags, cfs_cpt_spread_node(lnet_cpt_table(), cpt))
#define kzalloc_cpt(size, flags, cpt) \
kmalloc_node(size, flags | __GFP_ZERO, \
cfs_cpt_spread_node(lnet_cpt_table(), cpt))
#define kvmalloc_cpt(size, flags, cpt) \
kvmalloc_node(size, flags, \
cfs_cpt_spread_node(lnet_cpt_table(), cpt))
#define kvzalloc_cpt(size, flags, cpt) \
kvmalloc_node(size, flags | __GFP_ZERO, \
cfs_cpt_spread_node(lnet_cpt_table(), cpt))
/******************************************************************************/
void libcfs_debug_dumplog(void);
int libcfs_debug_init(unsigned long bufsize);
int libcfs_debug_cleanup(void);
int libcfs_debug_clear_buffer(void);
int libcfs_debug_mark_buffer(const char *text);
/*
* Allocate a variable-length array; the returned value is an array of pointers.
* The caller specifies the length of the array with count.
*/
void *cfs_array_alloc(int count, unsigned int size);
void cfs_array_free(void *vars);
#define LASSERT_ATOMIC_ENABLED (1)
#if LASSERT_ATOMIC_ENABLED
/** assert value of @a is equal to @v */
#define LASSERT_ATOMIC_EQ(a, v) \
LASSERTF(atomic_read(a) == v, "value: %d\n", atomic_read((a)))
/** assert value of @a is unequal to @v */
#define LASSERT_ATOMIC_NE(a, v) \
LASSERTF(atomic_read(a) != v, "value: %d\n", atomic_read((a)))
/** assert value of @a is less than @v */
#define LASSERT_ATOMIC_LT(a, v) \
LASSERTF(atomic_read(a) < v, "value: %d\n", atomic_read((a)))
/** assert value of @a is less than or equal to @v */
#define LASSERT_ATOMIC_LE(a, v) \
LASSERTF(atomic_read(a) <= v, "value: %d\n", atomic_read((a)))
/** assert value of @a is greater than @v */
#define LASSERT_ATOMIC_GT(a, v) \
LASSERTF(atomic_read(a) > v, "value: %d\n", atomic_read((a)))
/** assert value of @a is greater than or equal to @v */
#define LASSERT_ATOMIC_GE(a, v) \
LASSERTF(atomic_read(a) >= v, "value: %d\n", atomic_read((a)))
/** assert value of @a is greater than @v1 and less than @v2 */
#define LASSERT_ATOMIC_GT_LT(a, v1, v2) \
do { \
int __v = atomic_read(a); \
LASSERTF(__v > v1 && __v < v2, "value: %d\n", __v); \
} while (0)
/** assert value of @a is greater than @v1 and less than or equal to @v2 */
#define LASSERT_ATOMIC_GT_LE(a, v1, v2) \
do { \
int __v = atomic_read(a); \
LASSERTF(__v > v1 && __v <= v2, "value: %d\n", __v); \
} while (0)
/** assert value of @a is greater than or equal to @v1 and less than @v2 */
#define LASSERT_ATOMIC_GE_LT(a, v1, v2) \
do { \
int __v = atomic_read(a); \
LASSERTF(__v >= v1 && __v < v2, "value: %d\n", __v); \
} while (0)
/** assert value of @a is greater than or equal to @v1 and less than or equal to @v2 */
#define LASSERT_ATOMIC_GE_LE(a, v1, v2) \
do { \
int __v = atomic_read(a); \
LASSERTF(__v >= v1 && __v <= v2, "value: %d\n", __v); \
} while (0)
#else /* !LASSERT_ATOMIC_ENABLED */
#define LASSERT_ATOMIC_EQ(a, v) do {} while (0)
#define LASSERT_ATOMIC_NE(a, v) do {} while (0)
#define LASSERT_ATOMIC_LT(a, v) do {} while (0)
#define LASSERT_ATOMIC_LE(a, v) do {} while (0)
#define LASSERT_ATOMIC_GT(a, v) do {} while (0)
#define LASSERT_ATOMIC_GE(a, v) do {} while (0)
#define LASSERT_ATOMIC_GT_LT(a, v1, v2) do {} while (0)
#define LASSERT_ATOMIC_GT_LE(a, v1, v2) do {} while (0)
#define LASSERT_ATOMIC_GE_LT(a, v1, v2) do {} while (0)
#define LASSERT_ATOMIC_GE_LE(a, v1, v2) do {} while (0)
#endif /* LASSERT_ATOMIC_ENABLED */
#define LASSERT_ATOMIC_ZERO(a) LASSERT_ATOMIC_EQ(a, 0)
#define LASSERT_ATOMIC_POS(a) LASSERT_ATOMIC_GT(a, 0)
/* implication */
#define ergo(a, b) (!(a) || (b))
/* logical equivalence */
#define equi(a, b) (!!(a) == !!(b))
#ifndef HAVE_CFS_SIZE_ROUND
static inline size_t cfs_size_round(int val)
{
return round_up(val, 8);
}
#define HAVE_CFS_SIZE_ROUND
#endif
#endif
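
The ergo() and equi() macros in the header above encode logical implication and equivalence, which makes invariants inside assertions easier to read. The sketch below shows how they might be used; struct demo_buf and both checker functions are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Same definitions as in libcfs_private.h. */
#define ergo(a, b) (!(a) || (b))
#define equi(a, b) (!!(a) == !!(b))

/* Hypothetical buffer type used only for this illustration. */
struct demo_buf {
	char *data;
	size_t len;
};

/* A non-zero length implies a data pointer; an empty buffer may have one or not. */
static int demo_buf_len_implies_data(const struct demo_buf *b)
{
	return ergo(b->len > 0, b->data != NULL);
}

/* Stricter: the buffer has a data pointer exactly when it has a length. */
static int demo_buf_data_iff_len(const struct demo_buf *b)
{
	return equi(b->data != NULL, b->len > 0);
}
```

The difference shows up for an allocated but empty buffer: it satisfies the implication but not the equivalence.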


@ -1,102 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2008, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2012, Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Sun Microsystems, Inc.
*
* libcfs/include/libcfs/libcfs_string.h
*
* Generic string manipulation functions.
*
* Author: Nathan Rutman <nathan.rutman@sun.com>
*/
#ifndef __LIBCFS_STRING_H__
#define __LIBCFS_STRING_H__
#include <linux/mm.h>
/* libcfs_string.c */
/* Convert a text string to a bitmask */
int cfs_str2mask(const char *str, const char *(*bit2str)(int bit),
int *oldmask, int minmask, int allmask);
/* trim leading and trailing space characters */
char *cfs_firststr(char *str, size_t size);
/**
* Structure to represent strings that are not NUL-terminated.
*/
struct cfs_lstr {
char *ls_str;
int ls_len;
};
/*
* Structure to represent \<range_expr\> token of the syntax.
*/
struct cfs_range_expr {
/*
* Link to cfs_expr_list::el_exprs.
*/
struct list_head re_link;
u32 re_lo;
u32 re_hi;
u32 re_stride;
};
struct cfs_expr_list {
struct list_head el_link;
struct list_head el_exprs;
};
int cfs_gettok(struct cfs_lstr *next, char delim, struct cfs_lstr *res);
int cfs_str2num_check(char *str, int nob, unsigned int *num,
unsigned int min, unsigned int max);
int cfs_expr_list_match(u32 value, struct cfs_expr_list *expr_list);
int cfs_expr_list_print(char *buffer, int count,
struct cfs_expr_list *expr_list);
int cfs_expr_list_values(struct cfs_expr_list *expr_list,
int max, u32 **values);
static inline void
cfs_expr_list_values_free(u32 *values, int num)
{
/*
* This array is allocated by kvmalloc(), so it shouldn't be freed
* by OBD_FREE() if it's called by a module other than libcfs & LNet,
* otherwise we will see a fake memory leak
*/
kvfree(values);
}
void cfs_expr_list_free(struct cfs_expr_list *expr_list);
int cfs_expr_list_parse(char *str, int len, unsigned int min, unsigned int max,
struct cfs_expr_list **elpp);
void cfs_expr_list_free_list(struct list_head *list);
#endif
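
The header above declares struct cfs_range_expr with re_lo, re_hi and re_stride but does not spell out how a value is tested against a range. The sketch below assumes the usual reading: a value matches when it lies in [lo, hi] and is a whole number of strides away from lo. The struct and function names are hypothetical stand-ins, and the list linkage is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, list-free stand-in for struct cfs_range_expr. */
struct demo_range_expr {
	uint32_t lo;
	uint32_t hi;
	uint32_t stride;
};

/*
 * Return 1 if value lies in [lo, hi] and sits on a stride boundary
 * counted from lo; return 0 otherwise. A zero stride never matches,
 * which also avoids a division by zero.
 */
static int demo_range_expr_match(const struct demo_range_expr *re,
				 uint32_t value)
{
	if (re->stride == 0)
		return 0;
	return value >= re->lo && value <= re->hi &&
	       (value - re->lo) % re->stride == 0;
}
```

Under this assumption, a range of 0 to 8 with stride 2 matches 0, 2, 4, 6 and 8, and rejects 5 and 10.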


@ -1,212 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2003, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2011 - 2015, Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Seagate, Inc.
*/
#ifndef __LNET_API_H__
#define __LNET_API_H__
/** \defgroup lnet LNet
*
* The Lustre Networking subsystem.
*
* LNet is an asynchronous message-passing API, which provides an unreliable
* connectionless service that can't guarantee any order. It supports OFA IB,
* TCP/IP, and Cray Interconnects, and routes between heterogeneous networks.
*
* @{
*/
#include <uapi/linux/lnet/lnet-types.h>
/** \defgroup lnet_init_fini Initialization and cleanup
* LNet must be properly initialized before any LNet calls can be made.
* @{
*/
int LNetNIInit(lnet_pid_t requested_pid);
int LNetNIFini(void);
/** @} lnet_init_fini */
/** \defgroup lnet_addr LNet addressing and basic types
*
* Addressing scheme and basic data types of LNet.
*
* The LNet API is memory-oriented, so LNet must be able to address not only
* end-points but also memory regions within a process address space.
* An ::lnet_nid_t addresses an end-point. An ::lnet_pid_t identifies a process
* in a node. A portal represents an opening in the address space of a
* process. Match bits are the criteria that identify a region of memory inside
* a portal, and offset specifies an offset within that memory region.
*
* LNet creates a table of portals for each process during initialization.
* This table has MAX_PORTALS entries and its size can't be dynamically
* changed. A portal stays empty until the owning process starts to add
* memory regions to it. A portal is sometimes called an index because
* it's an entry in the portals table of a process.
*
* \see LNetMEAttach
* @{
*/
int LNetGetId(unsigned int index, struct lnet_process_id *id);
int LNetDist(lnet_nid_t nid, lnet_nid_t *srcnid, __u32 *order);
/** @} lnet_addr */
/** \defgroup lnet_me Match entries
*
* A match entry (abbreviated as ME) describes a set of criteria to accept
* incoming requests.
*
* A portal is essentially a match list plus a set of attributes. A match
* list is a chain of MEs. Each ME includes a pointer to a memory descriptor
* and a set of match criteria. The match criteria can be used to reject
* incoming requests based on process ID or the match bits provided in the
* request. MEs can be dynamically inserted into a match list by LNetMEAttach()
* and LNetMEInsert(), and removed from its list by LNetMEUnlink().
* @{
*/
int LNetMEAttach(unsigned int portal,
struct lnet_process_id match_id_in,
__u64 match_bits_in,
__u64 ignore_bits_in,
enum lnet_unlink unlink_in,
enum lnet_ins_pos pos_in,
struct lnet_handle_me *handle_out);
int LNetMEInsert(struct lnet_handle_me current_in,
struct lnet_process_id match_id_in,
__u64 match_bits_in,
__u64 ignore_bits_in,
enum lnet_unlink unlink_in,
enum lnet_ins_pos position_in,
struct lnet_handle_me *handle_out);
int LNetMEUnlink(struct lnet_handle_me current_in);
/** @} lnet_me */
/** \defgroup lnet_md Memory descriptors
*
* A memory descriptor contains information about a region of a user's
* memory (either in kernel or user space) and optionally points to an
* event queue where information about the operations performed on the
* memory descriptor is recorded. Memory descriptor is abbreviated as
* MD and can be used interchangeably with the memory region it describes.
*
* The LNet API provides two operations to create MDs: LNetMDAttach()
* and LNetMDBind(); one operation to unlink and release the resources
* associated with a MD: LNetMDUnlink().
* @{
*/
int LNetMDAttach(struct lnet_handle_me current_in,
struct lnet_md md_in,
enum lnet_unlink unlink_in,
struct lnet_handle_md *md_handle_out);
int LNetMDBind(struct lnet_md md_in,
enum lnet_unlink unlink_in,
struct lnet_handle_md *md_handle_out);
int LNetMDUnlink(struct lnet_handle_md md_in);
/** @} lnet_md */
/** \defgroup lnet_eq Events and event queues
*
* Event queues (abbreviated as EQ) are used to log operations performed on
* local MDs. In particular, they signal the completion of a data transmission
* into or out of a MD. They can also be used to hold acknowledgments for
* completed PUT operations and indicate when a MD has been unlinked. Multiple
* MDs can share a single EQ. An EQ may have an optional event handler
* associated with it. If an event handler exists, it will be run for each
* event that is deposited into the EQ.
*
* In addition to the lnet_handle_eq, the LNet API defines two types
* associated with events: The ::lnet_event_kind defines the kinds of events
* that can be stored in an EQ. The lnet_event defines a structure that
* holds the information about an event.
*
* There are five functions for dealing with EQs: LNetEQAlloc() is used to
* create an EQ and allocate the resources needed, while LNetEQFree()
* releases these resources and frees the EQ. LNetEQGet() retrieves the next
* event from an EQ, and LNetEQWait() can be used to block a process until
* an EQ has at least one event. LNetEQPoll() can be used to test or wait
* on multiple EQs.
* @{
*/
int LNetEQAlloc(unsigned int count_in,
lnet_eq_handler_t handler,
struct lnet_handle_eq *handle_out);
int LNetEQFree(struct lnet_handle_eq eventq_in);
int LNetEQPoll(struct lnet_handle_eq *eventqs_in,
int neq_in,
int timeout_ms,
int interruptible,
struct lnet_event *event_out,
int *which_eq_out);
/** @} lnet_eq */
/** \defgroup lnet_data Data movement operations
*
* The LNet API provides two data movement operations: LNetPut()
* and LNetGet().
* @{
*/
int LNetPut(lnet_nid_t self,
struct lnet_handle_md md_in,
enum lnet_ack_req ack_req_in,
struct lnet_process_id target_in,
unsigned int portal_in,
__u64 match_bits_in,
unsigned int offset_in,
__u64 hdr_data_in);
int LNetGet(lnet_nid_t self,
struct lnet_handle_md md_in,
struct lnet_process_id target_in,
unsigned int portal_in,
__u64 match_bits_in,
unsigned int offset_in);
/** @} lnet_data */
/** \defgroup lnet_misc Miscellaneous operations.
* Miscellaneous operations.
* @{
*/
int LNetSetLazyPortal(int portal);
int LNetClearLazyPortal(int portal);
int LNetCtl(unsigned int cmd, void *arg);
void LNetDebugPeer(struct lnet_process_id id);
/** @} lnet_misc */
/** @} lnet */
#endif
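
The match-entry documentation above says incoming requests can be rejected based on their match bits, and LNetMEAttach() takes both match_bits_in and ignore_bits_in. The header does not state how the two combine; the sketch below assumes the conventional rule that every bit not set in the ignore mask must agree. The struct and function are hypothetical and carry none of the other ME state.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified match entry: only the two bit fields. */
struct demo_me_bits {
	uint64_t match_bits;
	uint64_t ignore_bits;
};

/*
 * Return 1 if the incoming match bits agree with the entry on every
 * bit that is not masked out by ignore_bits; return 0 otherwise.
 */
static int demo_me_bits_match(const struct demo_me_bits *me,
			      uint64_t incoming)
{
	return ((me->match_bits ^ incoming) & ~me->ignore_bits) == 0;
}
```

With match bits 0x1200 and ignore bits 0xff, an incoming value of 0x12ab is accepted because only the ignored low byte differs, while 0x13ab is rejected.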


@ -1,652 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2003, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2012, 2015, Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Seagate, Inc.
*
* lnet/include/lnet/lib-lnet.h
*/
#ifndef __LNET_LIB_LNET_H__
#define __LNET_LIB_LNET_H__
#include <linux/libcfs/libcfs.h>
#include <linux/libcfs/libcfs_cpu.h>
#include <linux/libcfs/libcfs_string.h>
#include <net/sock.h>
#include <linux/lnet/api.h>
#include <linux/lnet/lib-types.h>
#include <uapi/linux/lnet/lnet-dlc.h>
#include <uapi/linux/lnet/lnet-types.h>
#include <uapi/linux/lnet/lnetctl.h>
#include <uapi/linux/lnet/nidstr.h>
extern struct lnet the_lnet; /* THE network */
#if (BITS_PER_LONG == 32)
/* 2 CPTs; allowing more CPTs might put us under memory pressure */
#define LNET_CPT_MAX_BITS 1
#else /* 64-bit system */
/*
* 256 CPTs for thousands of CPUs; allowing more CPTs might put us
* at risk of consuming all of lh_cookie.
*/
#define LNET_CPT_MAX_BITS 8
#endif /* BITS_PER_LONG == 32 */
/* max allowed CPT number */
#define LNET_CPT_MAX (1 << LNET_CPT_MAX_BITS)
#define LNET_CPT_NUMBER (the_lnet.ln_cpt_number)
#define LNET_CPT_BITS (the_lnet.ln_cpt_bits)
#define LNET_CPT_MASK ((1ULL << LNET_CPT_BITS) - 1)
/** exclusive lock */
#define LNET_LOCK_EX CFS_PERCPT_LOCK_EX
/* need both kernel and user-land acceptor */
#define LNET_ACCEPTOR_MIN_RESERVED_PORT 512
#define LNET_ACCEPTOR_MAX_RESERVED_PORT 1023
static inline int lnet_is_route_alive(struct lnet_route *route)
{
/* gateway is down */
if (!route->lr_gateway->lp_alive)
return 0;
/* no NI status, assume it's alive */
if ((route->lr_gateway->lp_ping_feats &
LNET_PING_FEAT_NI_STATUS) == 0)
return 1;
/* has NI status, check # down NIs */
return route->lr_downis == 0;
}
static inline int lnet_is_wire_handle_none(struct lnet_handle_wire *wh)
{
return (wh->wh_interface_cookie == LNET_WIRE_HANDLE_COOKIE_NONE &&
wh->wh_object_cookie == LNET_WIRE_HANDLE_COOKIE_NONE);
}
static inline int lnet_md_exhausted(struct lnet_libmd *md)
{
return (!md->md_threshold ||
((md->md_options & LNET_MD_MAX_SIZE) &&
md->md_offset + md->md_max_size > md->md_length));
}
static inline int lnet_md_unlinkable(struct lnet_libmd *md)
{
/*
* Should unlink md when its refcount is 0 and either:
* - md has been flagged for deletion (by auto unlink or
* LNetM[DE]Unlink, in the latter case md may not be exhausted).
* - auto unlink is on and md is exhausted.
*/
if (md->md_refcount)
return 0;
if (md->md_flags & LNET_MD_FLAG_ZOMBIE)
return 1;
return ((md->md_flags & LNET_MD_FLAG_AUTO_UNLINK) &&
lnet_md_exhausted(md));
}
#define lnet_cpt_table() (the_lnet.ln_cpt_table)
#define lnet_cpt_current() cfs_cpt_current(the_lnet.ln_cpt_table, 1)
static inline int
lnet_cpt_of_cookie(__u64 cookie)
{
unsigned int cpt = (cookie >> LNET_COOKIE_TYPE_BITS) & LNET_CPT_MASK;
/*
* LNET_CPT_NUMBER doesn't have to be a power of 2, which means an
* invalid cookie can yield an illegal cpt
*/
return cpt < LNET_CPT_NUMBER ? cpt : cpt % LNET_CPT_NUMBER;
}
static inline void
lnet_res_lock(int cpt)
{
cfs_percpt_lock(the_lnet.ln_res_lock, cpt);
}
static inline void
lnet_res_unlock(int cpt)
{
cfs_percpt_unlock(the_lnet.ln_res_lock, cpt);
}
static inline int
lnet_res_lock_current(void)
{
int cpt = lnet_cpt_current();
lnet_res_lock(cpt);
return cpt;
}
static inline void
lnet_net_lock(int cpt)
{
cfs_percpt_lock(the_lnet.ln_net_lock, cpt);
}
static inline void
lnet_net_unlock(int cpt)
{
cfs_percpt_unlock(the_lnet.ln_net_lock, cpt);
}
static inline int
lnet_net_lock_current(void)
{
int cpt = lnet_cpt_current();
lnet_net_lock(cpt);
return cpt;
}
#define LNET_LOCK() lnet_net_lock(LNET_LOCK_EX)
#define LNET_UNLOCK() lnet_net_unlock(LNET_LOCK_EX)
#define lnet_ptl_lock(ptl) spin_lock(&(ptl)->ptl_lock)
#define lnet_ptl_unlock(ptl) spin_unlock(&(ptl)->ptl_lock)
#define lnet_eq_wait_lock() spin_lock(&the_lnet.ln_eq_wait_lock)
#define lnet_eq_wait_unlock() spin_unlock(&the_lnet.ln_eq_wait_lock)
#define lnet_ni_lock(ni) spin_lock(&(ni)->ni_lock)
#define lnet_ni_unlock(ni) spin_unlock(&(ni)->ni_lock)
#define MAX_PORTALS 64
static inline struct lnet_libmd *
lnet_md_alloc(struct lnet_md *umd)
{
struct lnet_libmd *md;
unsigned int size;
unsigned int niov;
if (umd->options & LNET_MD_KIOV) {
niov = umd->length;
size = offsetof(struct lnet_libmd, md_iov.kiov[niov]);
} else {
niov = umd->options & LNET_MD_IOVEC ? umd->length : 1;
size = offsetof(struct lnet_libmd, md_iov.iov[niov]);
}
md = kzalloc(size, GFP_NOFS);
if (md) {
/* Set here in case of early free */
md->md_options = umd->options;
md->md_niov = niov;
INIT_LIST_HEAD(&md->md_list);
}
return md;
}
struct lnet_libhandle *lnet_res_lh_lookup(struct lnet_res_container *rec,
__u64 cookie);
void lnet_res_lh_initialize(struct lnet_res_container *rec,
struct lnet_libhandle *lh);
static inline void
lnet_res_lh_invalidate(struct lnet_libhandle *lh)
{
/* NB: cookie is still useful, don't reset it */
list_del(&lh->lh_hash_chain);
}
static inline void
lnet_eq2handle(struct lnet_handle_eq *handle, struct lnet_eq *eq)
{
if (!eq) {
LNetInvalidateEQHandle(handle);
return;
}
handle->cookie = eq->eq_lh.lh_cookie;
}
static inline struct lnet_eq *
lnet_handle2eq(struct lnet_handle_eq *handle)
{
struct lnet_libhandle *lh;
lh = lnet_res_lh_lookup(&the_lnet.ln_eq_container, handle->cookie);
if (!lh)
return NULL;
return lh_entry(lh, struct lnet_eq, eq_lh);
}
static inline void
lnet_md2handle(struct lnet_handle_md *handle, struct lnet_libmd *md)
{
handle->cookie = md->md_lh.lh_cookie;
}
static inline struct lnet_libmd *
lnet_handle2md(struct lnet_handle_md *handle)
{
/* ALWAYS called with resource lock held */
struct lnet_libhandle *lh;
int cpt;
cpt = lnet_cpt_of_cookie(handle->cookie);
lh = lnet_res_lh_lookup(the_lnet.ln_md_containers[cpt],
handle->cookie);
if (!lh)
return NULL;
return lh_entry(lh, struct lnet_libmd, md_lh);
}
static inline struct lnet_libmd *
lnet_wire_handle2md(struct lnet_handle_wire *wh)
{
/* ALWAYS called with resource lock held */
struct lnet_libhandle *lh;
int cpt;
if (wh->wh_interface_cookie != the_lnet.ln_interface_cookie)
return NULL;
cpt = lnet_cpt_of_cookie(wh->wh_object_cookie);
lh = lnet_res_lh_lookup(the_lnet.ln_md_containers[cpt],
wh->wh_object_cookie);
if (!lh)
return NULL;
return lh_entry(lh, struct lnet_libmd, md_lh);
}
static inline void
lnet_me2handle(struct lnet_handle_me *handle, struct lnet_me *me)
{
handle->cookie = me->me_lh.lh_cookie;
}
static inline struct lnet_me *
lnet_handle2me(struct lnet_handle_me *handle)
{
/* ALWAYS called with resource lock held */
struct lnet_libhandle *lh;
int cpt;
cpt = lnet_cpt_of_cookie(handle->cookie);
lh = lnet_res_lh_lookup(the_lnet.ln_me_containers[cpt],
handle->cookie);
if (!lh)
return NULL;
return lh_entry(lh, struct lnet_me, me_lh);
}
static inline void
lnet_peer_addref_locked(struct lnet_peer *lp)
{
LASSERT(lp->lp_refcount > 0);
lp->lp_refcount++;
}
void lnet_destroy_peer_locked(struct lnet_peer *lp);
static inline void
lnet_peer_decref_locked(struct lnet_peer *lp)
{
LASSERT(lp->lp_refcount > 0);
lp->lp_refcount--;
if (!lp->lp_refcount)
lnet_destroy_peer_locked(lp);
}
static inline int
lnet_isrouter(struct lnet_peer *lp)
{
return lp->lp_rtr_refcount ? 1 : 0;
}
static inline void
lnet_ni_addref_locked(struct lnet_ni *ni, int cpt)
{
LASSERT(cpt >= 0 && cpt < LNET_CPT_NUMBER);
LASSERT(*ni->ni_refs[cpt] >= 0);
(*ni->ni_refs[cpt])++;
}
static inline void
lnet_ni_addref(struct lnet_ni *ni)
{
lnet_net_lock(0);
lnet_ni_addref_locked(ni, 0);
lnet_net_unlock(0);
}
static inline void
lnet_ni_decref_locked(struct lnet_ni *ni, int cpt)
{
LASSERT(cpt >= 0 && cpt < LNET_CPT_NUMBER);
LASSERT(*ni->ni_refs[cpt] > 0);
(*ni->ni_refs[cpt])--;
}
static inline void
lnet_ni_decref(struct lnet_ni *ni)
{
lnet_net_lock(0);
lnet_ni_decref_locked(ni, 0);
lnet_net_unlock(0);
}
void lnet_ni_free(struct lnet_ni *ni);
struct lnet_ni *
lnet_ni_alloc(__u32 net, struct cfs_expr_list *el, struct list_head *nilist);
static inline int
lnet_nid2peerhash(lnet_nid_t nid)
{
return hash_long(nid, LNET_PEER_HASH_BITS);
}
static inline struct list_head *
lnet_net2rnethash(__u32 net)
{
return &the_lnet.ln_remote_nets_hash[(LNET_NETNUM(net) +
LNET_NETTYP(net)) &
((1U << the_lnet.ln_remote_nets_hbits) - 1)];
}
extern struct lnet_lnd the_lolnd;
extern int avoid_asym_router_failure;
int lnet_cpt_of_nid_locked(lnet_nid_t nid);
int lnet_cpt_of_nid(lnet_nid_t nid);
struct lnet_ni *lnet_nid2ni_locked(lnet_nid_t nid, int cpt);
struct lnet_ni *lnet_net2ni_locked(__u32 net, int cpt);
struct lnet_ni *lnet_net2ni(__u32 net);
extern int portal_rotor;
int lnet_lib_init(void);
void lnet_lib_exit(void);
int lnet_notify(struct lnet_ni *ni, lnet_nid_t peer, int alive,
unsigned long when);
void lnet_notify_locked(struct lnet_peer *lp, int notifylnd, int alive,
unsigned long when);
int lnet_add_route(__u32 net, __u32 hops, lnet_nid_t gateway_nid,
unsigned int priority);
int lnet_check_routes(void);
int lnet_del_route(__u32 net, lnet_nid_t gw_nid);
void lnet_destroy_routes(void);
int lnet_get_route(int idx, __u32 *net, __u32 *hops,
lnet_nid_t *gateway, __u32 *alive, __u32 *priority);
int lnet_get_rtr_pool_cfg(int idx, struct lnet_ioctl_pool_cfg *pool_cfg);
void lnet_router_debugfs_init(void);
void lnet_router_debugfs_fini(void);
int lnet_rtrpools_alloc(int im_a_router);
void lnet_destroy_rtrbuf(struct lnet_rtrbuf *rb, int npages);
int lnet_rtrpools_adjust(int tiny, int small, int large);
int lnet_rtrpools_enable(void);
void lnet_rtrpools_disable(void);
void lnet_rtrpools_free(int keep_pools);
struct lnet_remotenet *lnet_find_net_locked(__u32 net);
int lnet_dyn_add_ni(lnet_pid_t requested_pid,
struct lnet_ioctl_config_data *conf);
int lnet_dyn_del_ni(__u32 net);
int lnet_clear_lazy_portal(struct lnet_ni *ni, int portal, char *reason);
int lnet_islocalnid(lnet_nid_t nid);
int lnet_islocalnet(__u32 net);
void lnet_msg_attach_md(struct lnet_msg *msg, struct lnet_libmd *md,
unsigned int offset, unsigned int mlen);
void lnet_msg_detach_md(struct lnet_msg *msg, int status);
void lnet_build_unlink_event(struct lnet_libmd *md, struct lnet_event *ev);
void lnet_build_msg_event(struct lnet_msg *msg, enum lnet_event_kind ev_type);
void lnet_msg_commit(struct lnet_msg *msg, int cpt);
void lnet_msg_decommit(struct lnet_msg *msg, int cpt, int status);
void lnet_eq_enqueue_event(struct lnet_eq *eq, struct lnet_event *ev);
void lnet_prep_send(struct lnet_msg *msg, int type,
struct lnet_process_id target, unsigned int offset,
unsigned int len);
int lnet_send(lnet_nid_t nid, struct lnet_msg *msg, lnet_nid_t rtr_nid);
void lnet_return_tx_credits_locked(struct lnet_msg *msg);
void lnet_return_rx_credits_locked(struct lnet_msg *msg);
void lnet_schedule_blocked_locked(struct lnet_rtrbufpool *rbp);
void lnet_drop_routed_msgs_locked(struct list_head *list, int cpt);
/* portals functions */
/* portals attributes */
static inline int
lnet_ptl_is_lazy(struct lnet_portal *ptl)
{
return !!(ptl->ptl_options & LNET_PTL_LAZY);
}
static inline int
lnet_ptl_is_unique(struct lnet_portal *ptl)
{
return !!(ptl->ptl_options & LNET_PTL_MATCH_UNIQUE);
}
static inline int
lnet_ptl_is_wildcard(struct lnet_portal *ptl)
{
return !!(ptl->ptl_options & LNET_PTL_MATCH_WILDCARD);
}
static inline void
lnet_ptl_setopt(struct lnet_portal *ptl, int opt)
{
ptl->ptl_options |= opt;
}
static inline void
lnet_ptl_unsetopt(struct lnet_portal *ptl, int opt)
{
ptl->ptl_options &= ~opt;
}
/* match-table functions */
struct list_head *lnet_mt_match_head(struct lnet_match_table *mtable,
struct lnet_process_id id, __u64 mbits);
struct lnet_match_table *lnet_mt_of_attach(unsigned int index,
struct lnet_process_id id,
__u64 mbits, __u64 ignore_bits,
enum lnet_ins_pos pos);
int lnet_mt_match_md(struct lnet_match_table *mtable,
struct lnet_match_info *info, struct lnet_msg *msg);
/* portals match/attach functions */
void lnet_ptl_attach_md(struct lnet_me *me, struct lnet_libmd *md,
struct list_head *matches, struct list_head *drops);
void lnet_ptl_detach_md(struct lnet_me *me, struct lnet_libmd *md);
int lnet_ptl_match_md(struct lnet_match_info *info, struct lnet_msg *msg);
/* initialize and finalize portals */
int lnet_portals_create(void);
void lnet_portals_destroy(void);
/* message functions */
int lnet_parse(struct lnet_ni *ni, struct lnet_hdr *hdr,
lnet_nid_t fromnid, void *private, int rdma_req);
int lnet_parse_local(struct lnet_ni *ni, struct lnet_msg *msg);
int lnet_parse_forward_locked(struct lnet_ni *ni, struct lnet_msg *msg);
void lnet_recv(struct lnet_ni *ni, void *private, struct lnet_msg *msg,
int delayed, unsigned int offset, unsigned int mlen,
unsigned int rlen);
void lnet_ni_recv(struct lnet_ni *ni, void *private, struct lnet_msg *msg,
int delayed, unsigned int offset,
unsigned int mlen, unsigned int rlen);
struct lnet_msg *lnet_create_reply_msg(struct lnet_ni *ni,
struct lnet_msg *get_msg);
void lnet_set_reply_msg_len(struct lnet_ni *ni, struct lnet_msg *msg,
unsigned int len);
void lnet_finalize(struct lnet_ni *ni, struct lnet_msg *msg, int rc);
void lnet_drop_message(struct lnet_ni *ni, int cpt, void *private,
unsigned int nob);
void lnet_drop_delayed_msg_list(struct list_head *head, char *reason);
void lnet_recv_delayed_msg_list(struct list_head *head);
int lnet_msg_container_setup(struct lnet_msg_container *container, int cpt);
void lnet_msg_container_cleanup(struct lnet_msg_container *container);
void lnet_msg_containers_destroy(void);
int lnet_msg_containers_create(void);
char *lnet_msgtyp2str(int type);
void lnet_print_hdr(struct lnet_hdr *hdr);
int lnet_fail_nid(lnet_nid_t nid, unsigned int threshold);
/** \addtogroup lnet_fault_simulation @{ */
int lnet_fault_ctl(int cmd, struct libcfs_ioctl_data *data);
int lnet_fault_init(void);
void lnet_fault_fini(void);
bool lnet_drop_rule_match(struct lnet_hdr *hdr);
int lnet_delay_rule_add(struct lnet_fault_attr *attr);
int lnet_delay_rule_del(lnet_nid_t src, lnet_nid_t dst, bool shutdown);
int lnet_delay_rule_list(int pos, struct lnet_fault_attr *attr,
struct lnet_fault_stat *stat);
void lnet_delay_rule_reset(void);
void lnet_delay_rule_check(void);
bool lnet_delay_rule_match_locked(struct lnet_hdr *hdr, struct lnet_msg *msg);
/** @} lnet_fault_simulation */
void lnet_counters_get(struct lnet_counters *counters);
void lnet_counters_reset(void);
unsigned int lnet_iov_nob(unsigned int niov, struct kvec *iov);
int lnet_extract_iov(int dst_niov, struct kvec *dst,
int src_niov, const struct kvec *src,
unsigned int offset, unsigned int len);
unsigned int lnet_kiov_nob(unsigned int niov, struct bio_vec *iov);
int lnet_extract_kiov(int dst_niov, struct bio_vec *dst,
int src_niov, const struct bio_vec *src,
unsigned int offset, unsigned int len);
void lnet_copy_iov2iter(struct iov_iter *to,
unsigned int nsiov, const struct kvec *siov,
unsigned int soffset, unsigned int nob);
void lnet_copy_kiov2iter(struct iov_iter *to,
unsigned int nkiov, const struct bio_vec *kiov,
unsigned int kiovoffset, unsigned int nob);
void lnet_me_unlink(struct lnet_me *me);
void lnet_md_unlink(struct lnet_libmd *md);
void lnet_md_deconstruct(struct lnet_libmd *lmd, struct lnet_md *umd);
void lnet_register_lnd(struct lnet_lnd *lnd);
void lnet_unregister_lnd(struct lnet_lnd *lnd);
int lnet_connect(struct socket **sockp, lnet_nid_t peer_nid,
__u32 local_ip, __u32 peer_ip, int peer_port);
void lnet_connect_console_error(int rc, lnet_nid_t peer_nid,
__u32 peer_ip, int port);
int lnet_count_acceptor_nis(void);
int lnet_acceptor_timeout(void);
int lnet_acceptor_port(void);
int lnet_count_acceptor_nis(void);
int lnet_acceptor_port(void);
int lnet_acceptor_start(void);
void lnet_acceptor_stop(void);
int lnet_ipif_query(char *name, int *up, __u32 *ip, __u32 *mask);
int lnet_ipif_enumerate(char ***names);
void lnet_ipif_free_enumeration(char **names, int n);
int lnet_sock_setbuf(struct socket *socket, int txbufsize, int rxbufsize);
int lnet_sock_getbuf(struct socket *socket, int *txbufsize, int *rxbufsize);
int lnet_sock_getaddr(struct socket *socket, bool remote, __u32 *ip, int *port);
int lnet_sock_write(struct socket *sock, void *buffer, int nob, int timeout);
int lnet_sock_read(struct socket *sock, void *buffer, int nob, int timeout);
int lnet_sock_listen(struct socket **sockp, __u32 ip, int port, int backlog);
int lnet_sock_accept(struct socket **newsockp, struct socket *sock);
int lnet_sock_connect(struct socket **sockp, int *fatal,
__u32 local_ip, int local_port,
__u32 peer_ip, int peer_port);
void libcfs_sock_release(struct socket *sock);
int lnet_peers_start_down(void);
int lnet_peer_buffer_credits(struct lnet_ni *ni);
int lnet_router_checker_start(void);
void lnet_router_checker_stop(void);
void lnet_router_ni_update_locked(struct lnet_peer *gw, __u32 net);
void lnet_swap_pinginfo(struct lnet_ping_info *info);
int lnet_parse_ip2nets(char **networksp, char *ip2nets);
int lnet_parse_routes(char *route_str, int *im_a_router);
int lnet_parse_networks(struct list_head *nilist, char *networks);
int lnet_net_unique(__u32 net, struct list_head *nilist);
int lnet_nid2peer_locked(struct lnet_peer **lpp, lnet_nid_t nid, int cpt);
struct lnet_peer *lnet_find_peer_locked(struct lnet_peer_table *ptable,
lnet_nid_t nid);
void lnet_peer_tables_cleanup(struct lnet_ni *ni);
void lnet_peer_tables_destroy(void);
int lnet_peer_tables_create(void);
void lnet_debug_peer(lnet_nid_t nid);
int lnet_get_peer_info(__u32 peer_index, __u64 *nid,
char alivness[LNET_MAX_STR_LEN],
__u32 *cpt_iter, __u32 *refcount,
__u32 *ni_peer_tx_credits, __u32 *peer_tx_credits,
__u32 *peer_rtr_credits, __u32 *peer_min_rtr_credtis,
__u32 *peer_tx_qnob);
static inline void
lnet_peer_set_alive(struct lnet_peer *lp)
{
lp->lp_last_query = jiffies;
lp->lp_last_alive = jiffies;
if (!lp->lp_alive)
lnet_notify_locked(lp, 0, 1, lp->lp_last_alive);
}
#endif
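
The `*_addref_locked()` / `*_decref_locked()` helpers in the header above all follow one pattern: assert that the count is sane, adjust it, and destroy the object when the last reference goes away. A minimal user-space sketch of that pattern follows. The `demo_*` names are hypothetical stand-ins rather than LNet symbols, `assert()` stands in for `LASSERT()`, and no locking is modelled (the real helpers expect the caller to hold the relevant LNet lock).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct lnet_peer; only the refcount matters. */
struct demo_peer {
	int  refcount;   /* number of references currently held */
	bool destroyed;  /* set by the destructor, for demonstration only */
};

/* Stand-in for lnet_destroy_peer_locked(); a real one would free the peer. */
static void demo_peer_destroy(struct demo_peer *p)
{
	p->destroyed = true;
}

/* Take an extra reference; the caller must already hold one. */
static void demo_peer_addref(struct demo_peer *p)
{
	assert(p->refcount > 0);
	p->refcount++;
}

/* Drop a reference and destroy the peer when the count reaches zero. */
static void demo_peer_decref(struct demo_peer *p)
{
	assert(p->refcount > 0);
	if (--p->refcount == 0)
		demo_peer_destroy(p);
}
```

The `refcount > 0` check on the addref side is what makes it an error to revive an object that has already dropped to zero.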


@ -1,666 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2003, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2012, 2015, Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Seagate, Inc.
*
* lnet/include/lnet/lib-types.h
*/
#ifndef __LNET_LIB_TYPES_H__
#define __LNET_LIB_TYPES_H__
#include <linux/kthread.h>
#include <linux/uio.h>
#include <linux/types.h>
#include <linux/completion.h>
#include <uapi/linux/lnet/lnet-types.h>
#include <uapi/linux/lnet/lnetctl.h>
/* Max payload size */
#define LNET_MAX_PAYLOAD CONFIG_LNET_MAX_PAYLOAD
#if (LNET_MAX_PAYLOAD < LNET_MTU)
# error "LNET_MAX_PAYLOAD too small - error in configure --with-max-payload-mb"
#elif (LNET_MAX_PAYLOAD > (PAGE_SIZE * LNET_MAX_IOV))
# error "LNET_MAX_PAYLOAD too large - error in configure --with-max-payload-mb"
#endif
/* forward refs */
struct lnet_libmd;
struct lnet_msg {
struct list_head msg_activelist;
struct list_head msg_list; /* Q for credits/MD */
struct lnet_process_id msg_target;
/* where is it from, it's only for building event */
lnet_nid_t msg_from;
__u32 msg_type;
/* committed for sending */
unsigned int msg_tx_committed:1;
/* CPT # this message committed for sending */
unsigned int msg_tx_cpt:15;
/* committed for receiving */
unsigned int msg_rx_committed:1;
/* CPT # this message committed for receiving */
unsigned int msg_rx_cpt:15;
/* queued for tx credit */
unsigned int msg_tx_delayed:1;
/* queued for RX buffer */
unsigned int msg_rx_delayed:1;
/* ready for pending on RX delay list */
unsigned int msg_rx_ready_delay:1;
unsigned int msg_vmflush:1; /* VM trying to free memory */
unsigned int msg_target_is_router:1; /* sending to a router */
unsigned int msg_routing:1; /* being forwarded */
unsigned int msg_ack:1; /* ack on finalize (PUT) */
unsigned int msg_sending:1; /* outgoing message */
unsigned int msg_receiving:1; /* being received */
unsigned int msg_txcredit:1; /* taken an NI send credit */
unsigned int msg_peertxcredit:1; /* taken a peer send credit */
unsigned int msg_rtrcredit:1; /* taken a global router credit */
unsigned int msg_peerrtrcredit:1; /* taken a peer router credit */
unsigned int msg_onactivelist:1; /* on the activelist */
unsigned int msg_rdma_get:1;
struct lnet_peer *msg_txpeer; /* peer I'm sending to */
struct lnet_peer *msg_rxpeer; /* peer I received from */
void *msg_private;
struct lnet_libmd *msg_md;
unsigned int msg_len;
unsigned int msg_wanted;
unsigned int msg_offset;
unsigned int msg_niov;
struct kvec *msg_iov;
struct bio_vec *msg_kiov;
struct lnet_event msg_ev;
struct lnet_hdr msg_hdr;
};
struct lnet_libhandle {
struct list_head lh_hash_chain;
__u64 lh_cookie;
};
#define lh_entry(ptr, type, member) \
((type *)((char *)(ptr) - (char *)(&((type *)0)->member)))
struct lnet_eq {
struct list_head eq_list;
struct lnet_libhandle eq_lh;
unsigned long eq_enq_seq;
unsigned long eq_deq_seq;
unsigned int eq_size;
lnet_eq_handler_t eq_callback;
struct lnet_event *eq_events;
int **eq_refs; /* percpt refcount for EQ */
};
struct lnet_me {
struct list_head me_list;
struct lnet_libhandle me_lh;
struct lnet_process_id me_match_id;
unsigned int me_portal;
unsigned int me_pos; /* hash offset in mt_hash */
__u64 me_match_bits;
__u64 me_ignore_bits;
enum lnet_unlink me_unlink;
struct lnet_libmd *me_md;
};
struct lnet_libmd {
struct list_head md_list;
struct lnet_libhandle md_lh;
struct lnet_me *md_me;
char *md_start;
unsigned int md_offset;
unsigned int md_length;
unsigned int md_max_size;
int md_threshold;
int md_refcount;
unsigned int md_options;
unsigned int md_flags;
void *md_user_ptr;
struct lnet_eq *md_eq;
unsigned int md_niov; /* # frags */
union {
struct kvec iov[LNET_MAX_IOV];
struct bio_vec kiov[LNET_MAX_IOV];
} md_iov;
};
#define LNET_MD_FLAG_ZOMBIE BIT(0)
#define LNET_MD_FLAG_AUTO_UNLINK BIT(1)
#define LNET_MD_FLAG_ABORTED BIT(2)
struct lnet_test_peer {
/* info about peers we are trying to fail */
struct list_head tp_list; /* ln_test_peers */
lnet_nid_t tp_nid; /* matching nid */
unsigned int tp_threshold; /* # failures to simulate */
};
#define LNET_COOKIE_TYPE_MD 1
#define LNET_COOKIE_TYPE_ME 2
#define LNET_COOKIE_TYPE_EQ 3
#define LNET_COOKIE_TYPE_BITS 2
#define LNET_COOKIE_MASK ((1ULL << LNET_COOKIE_TYPE_BITS) - 1ULL)
struct lnet_ni; /* forward ref */
struct lnet_lnd {
/* fields managed by portals */
struct list_head lnd_list; /* stash in the LND table */
int lnd_refcount; /* # active instances */
/* fields initialised by the LND */
__u32 lnd_type;
int (*lnd_startup)(struct lnet_ni *ni);
void (*lnd_shutdown)(struct lnet_ni *ni);
int (*lnd_ctl)(struct lnet_ni *ni, unsigned int cmd, void *arg);
/*
* In data movement APIs below, payload buffers are described as a set
* of 'niov' fragments which are...
* EITHER
* in virtual memory (struct iovec *iov != NULL)
* OR
* in pages (kernel only: plt_kiov_t *kiov != NULL).
* The LND may NOT overwrite these fragment descriptors.
* An 'offset' may specify a byte offset within the set of
* fragments to start from.
*/
/*
* Start sending a preformatted message. 'private' is NULL for PUT and
* GET messages; otherwise this is a response to an incoming message
* and 'private' is the 'private' passed to lnet_parse(). Return
* non-zero for immediate failure, otherwise complete later with
* lnet_finalize()
*/
int (*lnd_send)(struct lnet_ni *ni, void *private,
struct lnet_msg *msg);
/*
* Start receiving 'mlen' bytes of payload data, skipping the following
* 'rlen' - 'mlen' bytes. 'private' is the 'private' passed to
* lnet_parse(). Return non-zero for immediate failure, otherwise
* complete later with lnet_finalize(). This also gives back a receive
* credit if the LND does flow control.
*/
int (*lnd_recv)(struct lnet_ni *ni, void *private, struct lnet_msg *msg,
int delayed, struct iov_iter *to, unsigned int rlen);
/*
* lnet_parse() has had to delay processing of this message
* (e.g. waiting for a forwarding buffer or send credits). Give the
* LND a chance to free urgently needed resources. If called, return 0
* for success and do NOT give back a receive credit; that has to wait
* until lnd_recv() gets called. On failure return < 0 and
* release resources; lnd_recv() will not be called.
*/
int (*lnd_eager_recv)(struct lnet_ni *ni, void *private,
struct lnet_msg *msg, void **new_privatep);
/* notification of peer health */
void (*lnd_notify)(struct lnet_ni *ni, lnet_nid_t peer, int alive);
/* query of peer aliveness */
void (*lnd_query)(struct lnet_ni *ni, lnet_nid_t peer,
unsigned long *when);
/* accept a new connection */
int (*lnd_accept)(struct lnet_ni *ni, struct socket *sock);
};
struct lnet_tx_queue {
int tq_credits; /* # tx credits free */
int tq_credits_min; /* lowest it's been */
int tq_credits_max; /* total # tx credits */
struct list_head tq_delayed; /* delayed TXs */
};
struct lnet_ni {
spinlock_t ni_lock;
struct list_head ni_list; /* chain on ln_nis */
struct list_head ni_cptlist; /* chain on ln_nis_cpt */
int ni_maxtxcredits; /* # tx credits */
/* # per-peer send credits */
int ni_peertxcredits;
/* # per-peer router buffer credits */
int ni_peerrtrcredits;
/* seconds to consider peer dead */
int ni_peertimeout;
int ni_ncpts; /* number of CPTs */
__u32 *ni_cpts; /* bond NI on some CPTs */
lnet_nid_t ni_nid; /* interface's NID */
void *ni_data; /* instance-specific data */
struct lnet_lnd *ni_lnd; /* procedural interface */
struct lnet_tx_queue **ni_tx_queues; /* percpt TX queues */
int **ni_refs; /* percpt reference count */
time64_t ni_last_alive;/* when I was last alive */
struct lnet_ni_status *ni_status; /* my health status */
/* per NI LND tunables */
struct lnet_ioctl_config_lnd_tunables *ni_lnd_tunables;
/* equivalent interfaces to use */
char *ni_interfaces[LNET_MAX_INTERFACES];
/* original net namespace */
struct net *ni_net_ns;
};
#define LNET_PROTO_PING_MATCHBITS 0x8000000000000000LL
/*
* NB: value of these features equal to LNET_PROTO_PING_VERSION_x
* of old LNet, so there shouldn't be any compatibility issue
*/
#define LNET_PING_FEAT_INVAL (0) /* no feature */
#define LNET_PING_FEAT_BASE BIT(0) /* just a ping */
#define LNET_PING_FEAT_NI_STATUS BIT(1) /* return NI status */
#define LNET_PING_FEAT_RTE_DISABLED BIT(2) /* routing disabled */
#define LNET_PING_FEAT_MASK (LNET_PING_FEAT_BASE | \
LNET_PING_FEAT_NI_STATUS)
/* router checker data, per router */
#define LNET_MAX_RTR_NIS 16
#define LNET_PINGINFO_SIZE offsetof(struct lnet_ping_info, pi_ni[LNET_MAX_RTR_NIS])
struct lnet_rc_data {
/* chain on the_lnet.ln_zombie_rcd or ln_deathrow_rcd */
struct list_head rcd_list;
struct lnet_handle_md rcd_mdh; /* ping buffer MD */
struct lnet_peer *rcd_gateway; /* reference to gateway */
struct lnet_ping_info *rcd_pinginfo; /* ping buffer */
};
struct lnet_peer {
struct list_head lp_hashlist; /* chain on peer hash */
struct list_head lp_txq; /* messages blocking for
* tx credits
*/
struct list_head lp_rtrq; /* messages blocking for
* router credits
*/
struct list_head lp_rtr_list; /* chain on router list */
int lp_txcredits; /* # tx credits available */
int lp_mintxcredits; /* low water mark */
int lp_rtrcredits; /* # router credits */
int lp_minrtrcredits; /* low water mark */
unsigned int lp_alive:1; /* alive/dead? */
unsigned int lp_notify:1; /* notification outstanding? */
unsigned int lp_notifylnd:1;/* outstanding notification
* for LND?
*/
unsigned int lp_notifying:1; /* some thread is handling
* notification
*/
unsigned int lp_ping_notsent;/* SEND event outstanding
* from ping
*/
int lp_alive_count; /* # times router went
* dead<->alive
*/
long lp_txqnob; /* bytes queued for sending */
unsigned long lp_timestamp; /* time of last aliveness
* news
*/
unsigned long lp_ping_timestamp;/* time of last ping
* attempt
*/
unsigned long lp_ping_deadline; /* != 0 if ping reply
* expected
*/
unsigned long lp_last_alive; /* when I was last alive */
unsigned long lp_last_query; /* when lp_ni was queried
* last time
*/
struct lnet_ni *lp_ni; /* interface peer is on */
lnet_nid_t lp_nid; /* peer's NID */
int lp_refcount; /* # refs */
int lp_cpt; /* CPT this peer attached on */
/* # refs from lnet_route::lr_gateway */
int lp_rtr_refcount;
/* returned RC ping features */
unsigned int lp_ping_feats;
struct list_head lp_routes; /* routers on this peer */
struct lnet_rc_data *lp_rcd; /* router checker state */
};
/* peer hash size */
#define LNET_PEER_HASH_BITS 9
#define LNET_PEER_HASH_SIZE (1 << LNET_PEER_HASH_BITS)
/* peer hash table */
struct lnet_peer_table {
int pt_version; /* /proc validity stamp */
int pt_number; /* # peers extant */
/* # zombies to go to deathrow (and not there yet) */
int pt_zombies;
struct list_head pt_deathrow; /* zombie peers */
struct list_head *pt_hash; /* NID->peer hash */
};
/*
* peer aliveness is enabled only on routers for peers in a network where the
* lnet_ni::ni_peertimeout has been set to a positive value
*/
#define lnet_peer_aliveness_enabled(lp) (the_lnet.ln_routing && \
(lp)->lp_ni->ni_peertimeout > 0)
struct lnet_route {
struct list_head lr_list; /* chain on net */
struct list_head lr_gwlist; /* chain on gateway */
struct lnet_peer *lr_gateway; /* router node */
__u32 lr_net; /* remote network number */
int lr_seq; /* sequence for round-robin */
unsigned int lr_downis; /* number of down NIs */
__u32 lr_hops; /* how far I am */
unsigned int lr_priority; /* route priority */
};
#define LNET_REMOTE_NETS_HASH_DEFAULT (1U << 7)
#define LNET_REMOTE_NETS_HASH_MAX (1U << 16)
#define LNET_REMOTE_NETS_HASH_SIZE (1 << the_lnet.ln_remote_nets_hbits)
struct lnet_remotenet {
struct list_head lrn_list; /* chain on
* ln_remote_nets_hash
*/
struct list_head lrn_routes; /* routes to me */
__u32 lrn_net; /* my net number */
};
/** lnet message has credit and can be submitted to lnd for send/receive */
#define LNET_CREDIT_OK 0
/** lnet message is waiting for credit */
#define LNET_CREDIT_WAIT 1
struct lnet_rtrbufpool {
struct list_head rbp_bufs; /* my free buffer pool */
struct list_head rbp_msgs; /* messages blocking
* for a buffer
*/
int rbp_npages; /* # pages in each buffer */
/* requested number of buffers */
int rbp_req_nbuffers;
/* # buffers actually allocated */
int rbp_nbuffers;
int rbp_credits; /* # free buffers
* blocked messages
*/
int rbp_mincredits; /* low water mark */
};
struct lnet_rtrbuf {
struct list_head rb_list; /* chain on rbp_bufs */
struct lnet_rtrbufpool *rb_pool; /* owning pool */
struct bio_vec rb_kiov[0]; /* the buffer space */
};
#define LNET_PEER_HASHSIZE 503 /* prime! */
#define LNET_TINY_BUF_IDX 0
#define LNET_SMALL_BUF_IDX 1
#define LNET_LARGE_BUF_IDX 2
/* # different router buffer pools */
#define LNET_NRBPOOLS (LNET_LARGE_BUF_IDX + 1)
enum lnet_match_flags {
/* Didn't match anything */
LNET_MATCHMD_NONE = BIT(0),
/* Matched OK */
LNET_MATCHMD_OK = BIT(1),
/* Must be discarded */
LNET_MATCHMD_DROP = BIT(2),
/* match and buffer is exhausted */
LNET_MATCHMD_EXHAUSTED = BIT(3),
/* match or drop */
LNET_MATCHMD_FINISH = (LNET_MATCHMD_OK | LNET_MATCHMD_DROP),
};
/* Options for lnet_portal::ptl_options */
#define LNET_PTL_LAZY BIT(0)
#define LNET_PTL_MATCH_UNIQUE BIT(1) /* unique match, for RDMA */
#define LNET_PTL_MATCH_WILDCARD BIT(2) /* wildcard match, request portal */
/* parameter for matching operations (GET, PUT) */
struct lnet_match_info {
__u64 mi_mbits;
struct lnet_process_id mi_id;
unsigned int mi_opc;
unsigned int mi_portal;
unsigned int mi_rlength;
unsigned int mi_roffset;
};
/* ME hash of RDMA portal */
#define LNET_MT_HASH_BITS 8
#define LNET_MT_HASH_SIZE (1 << LNET_MT_HASH_BITS)
#define LNET_MT_HASH_MASK (LNET_MT_HASH_SIZE - 1)
/*
* we allocate (LNET_MT_HASH_SIZE + 1) entries for lnet_match_table::mt_hash,
* the last entry is reserved for MEs with ignore-bits
*/
#define LNET_MT_HASH_IGNORE LNET_MT_HASH_SIZE
/*
* __u64 has 2^6 bits, so need 2^(LNET_MT_HASH_BITS - LNET_MT_BITS_U64) which
* is 4 __u64s as bit-map, and add an extra __u64 (only use one bit) for the
* ME-list with ignore-bits, which is mtable::mt_hash[LNET_MT_HASH_IGNORE]
*/
#define LNET_MT_BITS_U64 6 /* 2^6 bits */
#define LNET_MT_EXHAUSTED_BITS (LNET_MT_HASH_BITS - LNET_MT_BITS_U64)
#define LNET_MT_EXHAUSTED_BMAP ((1 << LNET_MT_EXHAUSTED_BITS) + 1)
/* portal match table */
struct lnet_match_table {
/* reserved for upcoming patches, CPU partition ID */
unsigned int mt_cpt;
unsigned int mt_portal; /* portal index */
/*
* match table is set as "enabled" if there's non-exhausted MD
* attached on mt_mhash, it's only valid for wildcard portal
*/
unsigned int mt_enabled;
/* bitmap to flag whether MEs on mt_hash are exhausted or not */
__u64 mt_exhausted[LNET_MT_EXHAUSTED_BMAP];
struct list_head *mt_mhash; /* matching hash */
};
/* these are only useful for wildcard portal */
/* Turn off message rotor for wildcard portals */
#define LNET_PTL_ROTOR_OFF 0
/* round-robin dispatch all PUT messages for wildcard portals */
#define LNET_PTL_ROTOR_ON 1
/* round-robin dispatch routed PUT message for wildcard portals */
#define LNET_PTL_ROTOR_RR_RT 2
/* dispatch routed PUT message by hashing source NID for wildcard portals */
#define LNET_PTL_ROTOR_HASH_RT 3
struct lnet_portal {
spinlock_t ptl_lock;
unsigned int ptl_index; /* portal ID, reserved */
/* flags on this portal: lazy, unique... */
unsigned int ptl_options;
/* list of messages which are stealing buffer */
struct list_head ptl_msg_stealing;
/* messages blocking for MD */
struct list_head ptl_msg_delayed;
/* Match table for each CPT */
struct lnet_match_table **ptl_mtables;
/* spread rotor of incoming "PUT" */
unsigned int ptl_rotor;
/* # active entries for this portal */
int ptl_mt_nmaps;
/* array of active entries' cpu-partition-id */
int ptl_mt_maps[0];
};
#define LNET_LH_HASH_BITS 12
#define LNET_LH_HASH_SIZE (1ULL << LNET_LH_HASH_BITS)
#define LNET_LH_HASH_MASK (LNET_LH_HASH_SIZE - 1)
/* resource container (ME, MD, EQ) */
struct lnet_res_container {
unsigned int rec_type; /* container type */
__u64 rec_lh_cookie; /* cookie generator */
struct list_head rec_active; /* active resource list */
struct list_head *rec_lh_hash; /* handle hash */
};
/* message container */
struct lnet_msg_container {
int msc_init; /* initialized or not */
/* max # threads finalizing */
int msc_nfinalizers;
/* msgs waiting to complete finalizing */
struct list_head msc_finalizing;
struct list_head msc_active; /* active message list */
/* threads doing finalization */
void **msc_finalizers;
};
/* Router Checker states */
#define LNET_RC_STATE_SHUTDOWN 0 /* not started */
#define LNET_RC_STATE_RUNNING 1 /* started up OK */
#define LNET_RC_STATE_STOPPING 2 /* telling thread to stop */
struct lnet {
/* CPU partition table of LNet */
struct cfs_cpt_table *ln_cpt_table;
/* number of CPTs in ln_cpt_table */
unsigned int ln_cpt_number;
unsigned int ln_cpt_bits;
/* protect LNet resources (ME/MD/EQ) */
struct cfs_percpt_lock *ln_res_lock;
/* # portals */
int ln_nportals;
/* the vector of portals */
struct lnet_portal **ln_portals;
/* percpt ME containers */
struct lnet_res_container **ln_me_containers;
/* percpt MD container */
struct lnet_res_container **ln_md_containers;
/* Event Queue container */
struct lnet_res_container ln_eq_container;
wait_queue_head_t ln_eq_waitq;
spinlock_t ln_eq_wait_lock;
unsigned int ln_remote_nets_hbits;
/* protect NI, peer table, credits, routers, rtrbuf... */
struct cfs_percpt_lock *ln_net_lock;
/* percpt message containers for active/finalizing/freed message */
struct lnet_msg_container **ln_msg_containers;
struct lnet_counters **ln_counters;
struct lnet_peer_table **ln_peer_tables;
/* failure simulation */
struct list_head ln_test_peers;
struct list_head ln_drop_rules;
struct list_head ln_delay_rules;
struct list_head ln_nis; /* LND instances */
/* NIs bond on specific CPT(s) */
struct list_head ln_nis_cpt;
/* dying LND instances */
struct list_head ln_nis_zombie;
struct lnet_ni *ln_loni; /* the loopback NI */
/* remote networks with routes to them */
struct list_head *ln_remote_nets_hash;
/* validity stamp */
__u64 ln_remote_nets_version;
/* list of all known routers */
struct list_head ln_routers;
/* validity stamp */
__u64 ln_routers_version;
/* percpt router buffer pools */
struct lnet_rtrbufpool **ln_rtrpools;
struct lnet_handle_md ln_ping_target_md;
struct lnet_handle_eq ln_ping_target_eq;
struct lnet_ping_info *ln_ping_info;
/* router checker startup/shutdown state */
int ln_rc_state;
/* router checker's event queue */
struct lnet_handle_eq ln_rc_eqh;
/* rcd still pending on net */
struct list_head ln_rcd_deathrow;
/* rcd ready for free */
struct list_head ln_rcd_zombie;
/* serialise startup/shutdown */
struct completion ln_rc_signal;
struct mutex ln_api_mutex;
struct mutex ln_lnd_mutex;
struct mutex ln_delay_mutex;
/* Have I called LNetNIInit myself? */
int ln_niinit_self;
/* LNetNIInit/LNetNIFini counter */
int ln_refcount;
/* shutdown in progress */
int ln_shutdown;
int ln_routing; /* am I a router? */
lnet_pid_t ln_pid; /* requested pid */
/* uniquely identifies this ni in this epoch */
__u64 ln_interface_cookie;
/* registered LNDs */
struct list_head ln_lnds;
/* test protocol compatibility flags */
int ln_testprotocompat;
/*
* 0 - load the NIs from the mod params
* 1 - do not load the NIs from the mod params
* Reverse logic to ensure that other calls to LNetNIInit
* need no change
*/
bool ln_nis_from_mod_params;
/*
* waitq for router checker. As long as there are no routes in
* the list, the router checker will sleep on this queue. when
* routes are added the thread will wake up
*/
wait_queue_head_t ln_rc_waitq;
};
#endif
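
The comment on `LNET_MT_EXHAUSTED_BMAP` in the header above packs some arithmetic into one sentence: 2^8 hash buckets at 2^6 bits per `__u64` need 4 words, plus one more word whose single used bit covers the ignore-bits list at index `LNET_MT_HASH_IGNORE`. The sketch below works through that indexing in user-space C. The `DEMO_*` macros mirror the values in the header, but `demo_table`, `demo_set_exhausted()` and `demo_is_exhausted()` are hypothetical names for illustration, not the functions LNet itself used.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_HASH_BITS      8                         /* LNET_MT_HASH_BITS */
#define DEMO_HASH_SIZE      (1 << DEMO_HASH_BITS)     /* 256 buckets */
#define DEMO_HASH_IGNORE    DEMO_HASH_SIZE            /* extra bucket, index 256 */
#define DEMO_BITS_U64       6                         /* a 64-bit word has 2^6 bits */
#define DEMO_EXHAUSTED_BITS (DEMO_HASH_BITS - DEMO_BITS_U64)
#define DEMO_EXHAUSTED_BMAP ((1 << DEMO_EXHAUSTED_BITS) + 1) /* 4 + 1 words */

/* Hypothetical stand-in for the mt_exhausted member of the match table. */
struct demo_table {
	uint64_t exhausted[DEMO_EXHAUSTED_BMAP];
};

/* Mark or clear the "exhausted" bit for hash position pos (0..256). */
static void demo_set_exhausted(struct demo_table *t, unsigned int pos, int on)
{
	assert(pos <= DEMO_HASH_IGNORE);

	/* High bits of pos select the word, low 6 bits select the bit. */
	uint64_t *word = &t->exhausted[pos >> DEMO_BITS_U64];
	uint64_t bit = 1ULL << (pos & ((1U << DEMO_BITS_U64) - 1));

	if (on)
		*word |= bit;
	else
		*word &= ~bit;
}

/* Return non-zero if the bit for hash position pos is set. */
static int demo_is_exhausted(const struct demo_table *t, unsigned int pos)
{
	assert(pos <= DEMO_HASH_IGNORE);
	return (t->exhausted[pos >> DEMO_BITS_U64] >>
		(pos & ((1U << DEMO_BITS_U64) - 1))) & 1;
}
```

Position 255 lands in the top bit of word 3, while position 256 (the ignore-bits list) lands in bit 0 of word 4, which is why the fifth word only ever uses one bit.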


@ -1,87 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2003, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2012 - 2015, Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Seagate, Inc.
*
* lnet/include/lnet/socklnd.h
*/
#ifndef __LNET_LNET_SOCKLND_H__
#define __LNET_LNET_SOCKLND_H__
#include <uapi/linux/lnet/lnet-types.h>
#include <uapi/linux/lnet/socklnd.h>
struct ksock_hello_msg {
__u32 kshm_magic; /* magic number of socklnd message */
__u32 kshm_version; /* version of socklnd message */
lnet_nid_t kshm_src_nid; /* sender's nid */
lnet_nid_t kshm_dst_nid; /* destination nid */
lnet_pid_t kshm_src_pid; /* sender's pid */
lnet_pid_t kshm_dst_pid; /* destination pid */
__u64 kshm_src_incarnation; /* sender's incarnation */
__u64 kshm_dst_incarnation; /* destination's incarnation */
__u32 kshm_ctype; /* connection type */
__u32 kshm_nips; /* # IP addrs */
__u32 kshm_ips[0]; /* IP addrs */
} WIRE_ATTR;
struct ksock_lnet_msg {
struct lnet_hdr ksnm_hdr; /* lnet hdr */
/*
* ksnm_payload is removed because of winnt compiler's limitation:
* zero-sized array can only be placed at the tail of [nested]
* structure definitions. lnet payload will be stored just after
* the body of struct ksock_lnet_msg
*/
} WIRE_ATTR;
struct ksock_msg {
__u32 ksm_type; /* type of socklnd message */
__u32 ksm_csum; /* checksum if != 0 */
__u64 ksm_zc_cookies[2]; /* Zero-Copy request/ACK cookie */
union {
struct ksock_lnet_msg lnetmsg; /* lnet message, it's empty if
* it's NOOP
*/
} WIRE_ATTR ksm_u;
} WIRE_ATTR;
#define KSOCK_MSG_NOOP 0xC0 /* ksm_u empty */
#define KSOCK_MSG_LNET 0xC1 /* lnet msg */
/*
* We need to know this number to parse hello msg from ksocklnd in
* other LND (usocklnd, for example)
*/
#define KSOCK_PROTO_V2 2
#define KSOCK_PROTO_V3 3
#endif
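
The hello message above ends in a zero-length array of IP addresses, so its size on the wire depends on kshm_nips. The following is a minimal userspace sketch of that size calculation, not code from the removed tree: the stdint types stand in for the kernel's __u32/__u64, ksock_hello_wire_size() is a hypothetical helper, and bounding kshm_nips by LNET_MAX_INTERFACES (16 in lnet-types.h) is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel types used by socklnd.h. */
typedef uint64_t lnet_nid_t;
typedef uint32_t lnet_pid_t;

#define LNET_MAX_INTERFACES 16	/* value taken from lnet-types.h */

struct ksock_hello_msg {
	uint32_t   kshm_magic;
	uint32_t   kshm_version;
	lnet_nid_t kshm_src_nid;
	lnet_nid_t kshm_dst_nid;
	lnet_pid_t kshm_src_pid;
	lnet_pid_t kshm_dst_pid;
	uint64_t   kshm_src_incarnation;
	uint64_t   kshm_dst_incarnation;
	uint32_t   kshm_ctype;
	uint32_t   kshm_nips;
	uint32_t   kshm_ips[];	/* C99 flexible array instead of [0] */
} __attribute__((packed));

/*
 * Hypothetical helper: number of bytes a hello message occupies when it
 * carries nips IP addresses after the fixed-size part.
 */
static inline size_t ksock_hello_wire_size(uint32_t nips)
{
	/* Assumed bound for this sketch, not taken from the header itself. */
	assert(nips <= LNET_MAX_INTERFACES);
	return offsetof(struct ksock_hello_msg, kshm_ips) +
	       (size_t)nips * sizeof(uint32_t);
}
```

Because the struct is packed, the fixed part is the plain sum of its members; a receiver would read that much first, then use kshm_nips to decide how many more bytes to expect.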


@ -1,149 +0,0 @@
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2008, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2012, 2014, Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Sun Microsystems, Inc.
*
* libcfs/include/libcfs/libcfs_debug.h
*
* Debug messages and assertions
*
*/
#ifndef __UAPI_LIBCFS_DEBUG_H__
#define __UAPI_LIBCFS_DEBUG_H__
/**
* Format for debug message headers
*/
struct ptldebug_header {
__u32 ph_len;
__u32 ph_flags;
__u32 ph_subsys;
__u32 ph_mask;
__u16 ph_cpu_id;
__u16 ph_type;
/* time_t overflow in 2106 */
__u32 ph_sec;
__u64 ph_usec;
__u32 ph_stack;
__u32 ph_pid;
__u32 ph_extern_pid;
__u32 ph_line_num;
} __attribute__((packed));
#define PH_FLAG_FIRST_RECORD 1
/* Debugging subsystems (32 bits, non-overlapping) */
#define S_UNDEFINED 0x00000001
#define S_MDC 0x00000002
#define S_MDS 0x00000004
#define S_OSC 0x00000008
#define S_OST 0x00000010
#define S_CLASS 0x00000020
#define S_LOG 0x00000040
#define S_LLITE 0x00000080
#define S_RPC 0x00000100
#define S_MGMT 0x00000200
#define S_LNET 0x00000400
#define S_LND 0x00000800 /* ALL LNDs */
#define S_PINGER 0x00001000
#define S_FILTER 0x00002000
#define S_LIBCFS 0x00004000
#define S_ECHO 0x00008000
#define S_LDLM 0x00010000
#define S_LOV 0x00020000
#define S_LQUOTA 0x00040000
#define S_OSD 0x00080000
#define S_LFSCK 0x00100000
#define S_SNAPSHOT 0x00200000
/* unused */
#define S_LMV 0x00800000 /* b_new_cmd */
/* unused */
#define S_SEC 0x02000000 /* upcall cache */
#define S_GSS 0x04000000 /* b_new_cmd */
/* unused */
#define S_MGC 0x10000000
#define S_MGS 0x20000000
#define S_FID 0x40000000 /* b_new_cmd */
#define S_FLD 0x80000000 /* b_new_cmd */
#define LIBCFS_DEBUG_SUBSYS_NAMES { \
"undefined", "mdc", "mds", "osc", "ost", "class", "log", \
"llite", "rpc", "mgmt", "lnet", "lnd", "pinger", "filter", \
"libcfs", "echo", "ldlm", "lov", "lquota", "osd", "lfsck", \
"snapshot", "", "lmv", "", "sec", "gss", "", "mgc", "mgs", \
"fid", "fld", NULL }
/* Debugging masks (32 bits, non-overlapping) */
#define D_TRACE 0x00000001 /* ENTRY/EXIT markers */
#define D_INODE 0x00000002
#define D_SUPER 0x00000004
#define D_EXT2 0x00000008 /* anything from ext2_debug */
#define D_MALLOC 0x00000010 /* print malloc, free information */
#define D_CACHE 0x00000020 /* cache-related items */
#define D_INFO 0x00000040 /* general information */
#define D_IOCTL 0x00000080 /* ioctl related information */
#define D_NETERROR 0x00000100 /* network errors */
#define D_NET 0x00000200 /* network communications */
#define D_WARNING 0x00000400 /* CWARN(...) == CDEBUG (D_WARNING, ...) */
#define D_BUFFS 0x00000800
#define D_OTHER 0x00001000
#define D_DENTRY 0x00002000
#define D_NETTRACE 0x00004000
#define D_PAGE 0x00008000 /* bulk page handling */
#define D_DLMTRACE 0x00010000
#define D_ERROR 0x00020000 /* CERROR(...) == CDEBUG (D_ERROR, ...) */
#define D_EMERG 0x00040000 /* CEMERG(...) == CDEBUG (D_EMERG, ...) */
#define D_HA 0x00080000 /* recovery and failover */
#define D_RPCTRACE 0x00100000 /* for distributed debugging */
#define D_VFSTRACE 0x00200000
#define D_READA 0x00400000 /* read-ahead */
#define D_MMAP 0x00800000
#define D_CONFIG 0x01000000
#define D_CONSOLE 0x02000000
#define D_QUOTA 0x04000000
#define D_SEC 0x08000000
#define D_LFSCK 0x10000000 /* For both OI scrub and LFSCK */
#define D_HSM 0x20000000
#define D_SNAPSHOT 0x40000000 /* snapshot */
#define D_LAYOUT 0x80000000
#define LIBCFS_DEBUG_MASKS_NAMES { \
"trace", "inode", "super", "ext2", "malloc", "cache", "info", \
"ioctl", "neterror", "net", "warning", "buffs", "other", \
"dentry", "nettrace", "page", "dlmtrace", "error", "emerg", \
"ha", "rpctrace", "vfstrace", "reada", "mmap", "config", \
"console", "quota", "sec", "lfsck", "hsm", "snapshot", "layout",\
NULL }
#define D_CANTMASK (D_ERROR | D_EMERG | D_WARNING | D_CONSOLE)
#define LIBCFS_DEBUG_FILE_PATH_DEFAULT "/tmp/lustre-log"
#endif /* __UAPI_LIBCFS_DEBUG_H__ */
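
The mask values and LIBCFS_DEBUG_MASKS_NAMES above are kept in bit order, so bit N of a mask corresponds to entry N of the name table. A minimal sketch of how that pairing could be used to print a mask follows; debug_mask_to_str() and debug_mask_names are hypothetical names introduced here, and only the four D_CANTMASK bits are redefined to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define D_WARNING  0x00000400
#define D_ERROR    0x00020000
#define D_EMERG    0x00040000
#define D_CONSOLE  0x02000000
#define D_CANTMASK (D_ERROR | D_EMERG | D_WARNING | D_CONSOLE)

/* Same contents and order as LIBCFS_DEBUG_MASKS_NAMES in the header. */
static const char *const debug_mask_names[] = {
	"trace", "inode", "super", "ext2", "malloc", "cache", "info",
	"ioctl", "neterror", "net", "warning", "buffs", "other",
	"dentry", "nettrace", "page", "dlmtrace", "error", "emerg",
	"ha", "rpctrace", "vfstrace", "reada", "mmap", "config",
	"console", "quota", "sec", "lfsck", "hsm", "snapshot", "layout",
	NULL };

/*
 * Hypothetical helper: write the space-separated names of the bits set
 * in mask into buf. Returns the number of characters written, not
 * counting the terminating NUL.
 */
static size_t debug_mask_to_str(uint32_t mask, char *buf, size_t len)
{
	size_t used = 0;

	assert(buf && len > 0);
	buf[0] = '\0';
	for (int bit = 0; bit < 32; bit++) {
		int n;

		if (!(mask & ((uint32_t)1 << bit)))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? " " : "", debug_mask_names[bit]);
		if (n < 0 || (size_t)n >= len - used) {
			buf[used] = '\0';	/* drop the partial name */
			break;
		}
		used += (size_t)n;
	}
	return used;
}
```

Passing D_CANTMASK with a large enough buffer yields the names of the four bits that the header declares cannot be masked off, in ascending bit order.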


@ -1,141 +0,0 @@
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2008, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Sun Microsystems, Inc.
*
* libcfs/include/libcfs/libcfs_ioctl.h
*
* Low-level ioctl data structures. Kernel ioctl functions declared here,
* and user space functions are in libcfs/util/ioctl.h.
*
*/
#ifndef __LIBCFS_IOCTL_H__
#define __LIBCFS_IOCTL_H__
#include <linux/types.h>
#include <linux/ioctl.h>
#define LIBCFS_IOCTL_VERSION 0x0001000a
#define LIBCFS_IOCTL_VERSION2 0x0001000b
struct libcfs_ioctl_hdr {
__u32 ioc_len;
__u32 ioc_version;
};
/** max size to copy from userspace */
#define LIBCFS_IOC_DATA_MAX (128 * 1024)
struct libcfs_ioctl_data {
struct libcfs_ioctl_hdr ioc_hdr;
__u64 ioc_nid;
__u64 ioc_u64[1];
__u32 ioc_flags;
__u32 ioc_count;
__u32 ioc_net;
__u32 ioc_u32[7];
__u32 ioc_inllen1;
char *ioc_inlbuf1;
__u32 ioc_inllen2;
char *ioc_inlbuf2;
__u32 ioc_plen1; /* buffers in userspace */
void __user *ioc_pbuf1;
__u32 ioc_plen2; /* buffers in userspace */
void __user *ioc_pbuf2;
char ioc_bulk[0];
};
struct libcfs_debug_ioctl_data {
struct libcfs_ioctl_hdr hdr;
unsigned int subs;
unsigned int debug;
};
/* 'f' ioctls are defined in lustre_ioctl.h and lustre_user.h except for: */
#define LIBCFS_IOC_DEBUG_MASK _IOWR('f', 250, long)
#define IOCTL_LIBCFS_TYPE long
#define IOC_LIBCFS_TYPE ('e')
#define IOC_LIBCFS_MIN_NR 30
/* libcfs ioctls */
/* IOC_LIBCFS_PANIC obsolete in 2.8.0, was _IOWR('e', 30, IOCTL_LIBCFS_TYPE) */
#define IOC_LIBCFS_CLEAR_DEBUG _IOWR('e', 31, IOCTL_LIBCFS_TYPE)
#define IOC_LIBCFS_MARK_DEBUG _IOWR('e', 32, IOCTL_LIBCFS_TYPE)
/* IOC_LIBCFS_MEMHOG obsolete in 2.8.0, was _IOWR('e', 36, IOCTL_LIBCFS_TYPE) */
/* lnet ioctls */
#define IOC_LIBCFS_GET_NI _IOWR('e', 50, IOCTL_LIBCFS_TYPE)
#define IOC_LIBCFS_FAIL_NID _IOWR('e', 51, IOCTL_LIBCFS_TYPE)
#define IOC_LIBCFS_NOTIFY_ROUTER _IOWR('e', 55, IOCTL_LIBCFS_TYPE)
#define IOC_LIBCFS_UNCONFIGURE _IOWR('e', 56, IOCTL_LIBCFS_TYPE)
/* IOC_LIBCFS_PORTALS_COMPATIBILITY _IOWR('e', 57, IOCTL_LIBCFS_TYPE) */
#define IOC_LIBCFS_LNET_DIST _IOWR('e', 58, IOCTL_LIBCFS_TYPE)
#define IOC_LIBCFS_CONFIGURE _IOWR('e', 59, IOCTL_LIBCFS_TYPE)
#define IOC_LIBCFS_TESTPROTOCOMPAT _IOWR('e', 60, IOCTL_LIBCFS_TYPE)
#define IOC_LIBCFS_PING _IOWR('e', 61, IOCTL_LIBCFS_TYPE)
/* IOC_LIBCFS_DEBUG_PEER _IOWR('e', 62, IOCTL_LIBCFS_TYPE) */
#define IOC_LIBCFS_LNETST _IOWR('e', 63, IOCTL_LIBCFS_TYPE)
#define IOC_LIBCFS_LNET_FAULT _IOWR('e', 64, IOCTL_LIBCFS_TYPE)
/* lnd ioctls */
#define IOC_LIBCFS_REGISTER_MYNID _IOWR('e', 70, IOCTL_LIBCFS_TYPE)
#define IOC_LIBCFS_CLOSE_CONNECTION _IOWR('e', 71, IOCTL_LIBCFS_TYPE)
#define IOC_LIBCFS_PUSH_CONNECTION _IOWR('e', 72, IOCTL_LIBCFS_TYPE)
#define IOC_LIBCFS_GET_CONN _IOWR('e', 73, IOCTL_LIBCFS_TYPE)
#define IOC_LIBCFS_DEL_PEER _IOWR('e', 74, IOCTL_LIBCFS_TYPE)
#define IOC_LIBCFS_ADD_PEER _IOWR('e', 75, IOCTL_LIBCFS_TYPE)
#define IOC_LIBCFS_GET_PEER _IOWR('e', 76, IOCTL_LIBCFS_TYPE)
/* ioctl 77 is free for use */
#define IOC_LIBCFS_ADD_INTERFACE _IOWR('e', 78, IOCTL_LIBCFS_TYPE)
#define IOC_LIBCFS_DEL_INTERFACE _IOWR('e', 79, IOCTL_LIBCFS_TYPE)
#define IOC_LIBCFS_GET_INTERFACE _IOWR('e', 80, IOCTL_LIBCFS_TYPE)
/*
* DLC Specific IOCTL numbers.
* In order to maintain backward compatibility with any possible external
* tools which might be accessing the IOCTL numbers, a new group of IOCTL
* numbers has been allocated.
*/
#define IOCTL_CONFIG_SIZE struct lnet_ioctl_config_data
#define IOC_LIBCFS_ADD_ROUTE _IOWR(IOC_LIBCFS_TYPE, 81, IOCTL_CONFIG_SIZE)
#define IOC_LIBCFS_DEL_ROUTE _IOWR(IOC_LIBCFS_TYPE, 82, IOCTL_CONFIG_SIZE)
#define IOC_LIBCFS_GET_ROUTE _IOWR(IOC_LIBCFS_TYPE, 83, IOCTL_CONFIG_SIZE)
#define IOC_LIBCFS_ADD_NET _IOWR(IOC_LIBCFS_TYPE, 84, IOCTL_CONFIG_SIZE)
#define IOC_LIBCFS_DEL_NET _IOWR(IOC_LIBCFS_TYPE, 85, IOCTL_CONFIG_SIZE)
#define IOC_LIBCFS_GET_NET _IOWR(IOC_LIBCFS_TYPE, 86, IOCTL_CONFIG_SIZE)
#define IOC_LIBCFS_CONFIG_RTR _IOWR(IOC_LIBCFS_TYPE, 87, IOCTL_CONFIG_SIZE)
#define IOC_LIBCFS_ADD_BUF _IOWR(IOC_LIBCFS_TYPE, 88, IOCTL_CONFIG_SIZE)
#define IOC_LIBCFS_GET_BUF _IOWR(IOC_LIBCFS_TYPE, 89, IOCTL_CONFIG_SIZE)
#define IOC_LIBCFS_GET_PEER_INFO _IOWR(IOC_LIBCFS_TYPE, 90, IOCTL_CONFIG_SIZE)
#define IOC_LIBCFS_GET_LNET_STATS _IOWR(IOC_LIBCFS_TYPE, 91, IOCTL_CONFIG_SIZE)
#define IOC_LIBCFS_MAX_NR 91
#endif /* __LIBCFS_IOCTL_H__ */
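
Every command above is built with _IOWR(), which packs a direction, an argument size, a type character and a number into one integer. The sketch below re-implements that packing so it can be inspected outside the kernel. It assumes the asm-generic bit layout (number in bits 0-7, type in bits 8-15, size in bits 16-29, direction in bits 30-31); some architectures use a different layout, and the sketch_* names and is_libcfs_ioctl() are hypothetical helpers, not part of libcfs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IOC_LIBCFS_TYPE   ('e')
#define IOC_LIBCFS_MIN_NR 30
#define IOC_LIBCFS_MAX_NR 91

/* Read and write direction bits, as in the asm-generic layout (assumed). */
#define SKETCH_IOC_READWRITE 3u

/* Pack a command the way _IOWR(type, nr, T) does with size = sizeof(T). */
static inline uint32_t sketch_iowr(unsigned int type, unsigned int nr,
				   size_t size)
{
	assert(type < 256 && nr < 256 && size < ((size_t)1 << 14));
	return (SKETCH_IOC_READWRITE << 30) | ((uint32_t)size << 16) |
	       ((uint32_t)type << 8) | (uint32_t)nr;
}

static inline unsigned int sketch_ioc_nr(uint32_t cmd)
{
	return cmd & 0xff;
}

static inline unsigned int sketch_ioc_type(uint32_t cmd)
{
	return (cmd >> 8) & 0xff;
}

static inline unsigned int sketch_ioc_size(uint32_t cmd)
{
	return (cmd >> 16) & 0x3fff;
}

/* Hypothetical range check built on the MIN_NR/MAX_NR limits above. */
static inline int is_libcfs_ioctl(uint32_t cmd)
{
	return sketch_ioc_type(cmd) == IOC_LIBCFS_TYPE &&
	       sketch_ioc_nr(cmd) >= IOC_LIBCFS_MIN_NR &&
	       sketch_ioc_nr(cmd) <= IOC_LIBCFS_MAX_NR;
}
```

For instance, sketch_iowr('e', 61, sizeof(long)) mirrors IOC_LIBCFS_PING and passes the range check, while a command built with type 'f' and number 250, as LIBCFS_IOC_DEBUG_MASK is, does not.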


@ -1,150 +0,0 @@
/*
* LGPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public
* License as published by the Free Software Foundation; either
* version 2.1 of the License, or (at your option) any later version.
*
* This library is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* Lesser General Public License for more details.
*
* You should have received a copy of the GNU Lesser General Public
* License along with this library.
*
* LGPL HEADER END
*
*/
/*
* Copyright (c) 2014, Intel Corporation.
*/
/*
* Author: Amir Shehata <amir.shehata@intel.com>
*/
#ifndef LNET_DLC_H
#define LNET_DLC_H
#include <uapi/linux/lnet/libcfs_ioctl.h>
#include <uapi/linux/lnet/lnet-types.h>
#define MAX_NUM_SHOW_ENTRIES 32
#define LNET_MAX_STR_LEN 128
#define LNET_MAX_SHOW_NUM_CPT 128
#define LNET_UNDEFINED_HOPS ((__u32)(-1))
struct lnet_ioctl_config_lnd_cmn_tunables {
__u32 lct_version;
__u32 lct_peer_timeout;
__u32 lct_peer_tx_credits;
__u32 lct_peer_rtr_credits;
__u32 lct_max_tx_credits;
};
struct lnet_ioctl_config_o2iblnd_tunables {
__u32 lnd_version;
__u32 lnd_peercredits_hiw;
__u32 lnd_map_on_demand;
__u32 lnd_concurrent_sends;
__u32 lnd_fmr_pool_size;
__u32 lnd_fmr_flush_trigger;
__u32 lnd_fmr_cache;
__u16 lnd_conns_per_peer;
__u16 pad;
};
struct lnet_ioctl_config_lnd_tunables {
struct lnet_ioctl_config_lnd_cmn_tunables lt_cmn;
union {
struct lnet_ioctl_config_o2iblnd_tunables lt_o2ib;
} lt_tun_u;
};
struct lnet_ioctl_net_config {
char ni_interfaces[LNET_MAX_INTERFACES][LNET_MAX_STR_LEN];
__u32 ni_status;
__u32 ni_cpts[LNET_MAX_SHOW_NUM_CPT];
char cfg_bulk[0];
};
#define LNET_TINY_BUF_IDX 0
#define LNET_SMALL_BUF_IDX 1
#define LNET_LARGE_BUF_IDX 2
/* # different router buffer pools */
#define LNET_NRBPOOLS (LNET_LARGE_BUF_IDX + 1)
struct lnet_ioctl_pool_cfg {
struct {
__u32 pl_npages;
__u32 pl_nbuffers;
__u32 pl_credits;
__u32 pl_mincredits;
} pl_pools[LNET_NRBPOOLS];
__u32 pl_routing;
};
struct lnet_ioctl_config_data {
struct libcfs_ioctl_hdr cfg_hdr;
__u32 cfg_net;
__u32 cfg_count;
__u64 cfg_nid;
__u32 cfg_ncpts;
union {
struct {
__u32 rtr_hop;
__u32 rtr_priority;
__u32 rtr_flags;
} cfg_route;
struct {
char net_intf[LNET_MAX_STR_LEN];
__s32 net_peer_timeout;
__s32 net_peer_tx_credits;
__s32 net_peer_rtr_credits;
__s32 net_max_tx_credits;
__u32 net_cksum_algo;
__u32 net_interface_count;
} cfg_net;
struct {
__u32 buf_enable;
__s32 buf_tiny;
__s32 buf_small;
__s32 buf_large;
} cfg_buffers;
} cfg_config_u;
char cfg_bulk[0];
};
struct lnet_ioctl_peer {
struct libcfs_ioctl_hdr pr_hdr;
__u32 pr_count;
__u32 pr_pad;
__u64 pr_nid;
union {
struct {
char cr_aliveness[LNET_MAX_STR_LEN];
__u32 cr_refcount;
__u32 cr_ni_peer_tx_credits;
__u32 cr_peer_tx_credits;
__u32 cr_peer_rtr_credits;
__u32 cr_peer_min_rtr_credits;
__u32 cr_peer_tx_qnob;
__u32 cr_ncpt;
} pr_peer_credits;
} pr_lnd_u;
};
struct lnet_ioctl_lnet_stats {
struct libcfs_ioctl_hdr st_hdr;
struct lnet_counters st_cntrs;
};
#endif /* LNET_DLC_H */


@ -1,669 +0,0 @@
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2003, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2012 - 2015, Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Seagate, Inc.
*/
#ifndef __LNET_TYPES_H__
#define __LNET_TYPES_H__
#include <linux/types.h>
#include <linux/bvec.h>
/** \addtogroup lnet
* @{
*/
#define LNET_VERSION "0.6.0"
/** \addtogroup lnet_addr
* @{
*/
/** Portal reserved for LNet's own use.
* \see lustre/include/lustre/lustre_idl.h for Lustre portal assignments.
*/
#define LNET_RESERVED_PORTAL 0
/**
* Address of an end-point in an LNet network.
*
* A node can have multiple end-points and hence multiple addresses.
* An LNet network can be a simple network (e.g. tcp0) or a network of
* LNet networks connected by LNet routers. Therefore an end-point address
* has two parts: network ID, and address within a network.
*
* \see LNET_NIDNET, LNET_NIDADDR, and LNET_MKNID.
*/
typedef __u64 lnet_nid_t;
/**
* ID of a process in a node. Shortened as PID to distinguish from
* lnet_process_id, the global process ID.
*/
typedef __u32 lnet_pid_t;
/** wildcard NID that matches any end-point address */
#define LNET_NID_ANY ((lnet_nid_t)(-1))
/** wildcard PID that matches any lnet_pid_t */
#define LNET_PID_ANY ((lnet_pid_t)(-1))
#define LNET_PID_RESERVED 0xf0000000 /* reserved bits in PID */
#define LNET_PID_USERFLAG 0x80000000 /* set in userspace peers */
#define LNET_PID_LUSTRE 12345
#define LNET_TIME_FOREVER (-1)
/* how an LNET NID encodes net:address */
/** extract the address part of an lnet_nid_t */
static inline __u32 LNET_NIDADDR(lnet_nid_t nid)
{
return nid & 0xffffffff;
}
static inline __u32 LNET_NIDNET(lnet_nid_t nid)
{
return (nid >> 32) & 0xffffffff;
}
static inline lnet_nid_t LNET_MKNID(__u32 net, __u32 addr)
{
return (((__u64)net) << 32) | addr;
}
static inline __u32 LNET_NETNUM(__u32 net)
{
return net & 0xffff;
}
static inline __u32 LNET_NETTYP(__u32 net)
{
return (net >> 16) & 0xffff;
}
static inline __u32 LNET_MKNET(__u32 type, __u32 num)
{
return (type << 16) | num;
}
#define WIRE_ATTR __packed
/* Packed version of lnet_process_id to transfer via network */
struct lnet_process_id_packed {
/* node id / process id */
lnet_nid_t nid;
lnet_pid_t pid;
} WIRE_ATTR;
/*
* The wire handle's interface cookie only matches one network interface in
* one epoch (i.e. new cookie when the interface restarts or the node
* reboots). The object cookie only matches one object on that interface
* during that object's lifetime (i.e. no cookie re-use).
*/
struct lnet_handle_wire {
__u64 wh_interface_cookie;
__u64 wh_object_cookie;
} WIRE_ATTR;
enum lnet_msg_type {
LNET_MSG_ACK = 0,
LNET_MSG_PUT,
LNET_MSG_GET,
LNET_MSG_REPLY,
LNET_MSG_HELLO,
};
/*
* The variant fields of the portals message header are aligned on an 8
* byte boundary in the message header. Note that all types used in these
* wire structs MUST be fixed size and the smaller types are placed at the
* end.
*/
struct lnet_ack {
struct lnet_handle_wire dst_wmd;
__u64 match_bits;
__u32 mlength;
} WIRE_ATTR;
struct lnet_put {
struct lnet_handle_wire ack_wmd;
__u64 match_bits;
__u64 hdr_data;
__u32 ptl_index;
__u32 offset;
} WIRE_ATTR;
struct lnet_get {
struct lnet_handle_wire return_wmd;
__u64 match_bits;
__u32 ptl_index;
__u32 src_offset;
__u32 sink_length;
} WIRE_ATTR;
struct lnet_reply {
struct lnet_handle_wire dst_wmd;
} WIRE_ATTR;
struct lnet_hello {
__u64 incarnation;
__u32 type;
} WIRE_ATTR;
struct lnet_hdr {
lnet_nid_t dest_nid;
lnet_nid_t src_nid;
lnet_pid_t dest_pid;
lnet_pid_t src_pid;
__u32 type; /* enum lnet_msg_type */
__u32 payload_length; /* payload data to follow */
/*<------__u64 aligned------->*/
union {
struct lnet_ack ack;
struct lnet_put put;
struct lnet_get get;
struct lnet_reply reply;
struct lnet_hello hello;
} msg;
} WIRE_ATTR;
/*
* A HELLO message contains a magic number and protocol version
* code in the header's dest_nid, the peer's NID in the src_nid, and
* LNET_MSG_HELLO in the type field. All other common fields are zero
* (including payload_length; i.e. no payload).
* This is for use by byte-stream LNDs (e.g. TCP/IP) to check the peer is
* running the same protocol and to find out its NID. These LNDs should
* exchange HELLO messages when a connection is first established. Individual
* LNDs can put whatever else they fancy in struct lnet_hdr::msg.
*/
struct lnet_magicversion {
__u32 magic; /* LNET_PROTO_TCP_MAGIC */
__u16 version_major; /* increment on incompatible change */
__u16 version_minor; /* increment on compatible change */
} WIRE_ATTR;
/* PROTO MAGIC for LNDs */
#define LNET_PROTO_IB_MAGIC 0x0be91b91
#define LNET_PROTO_GNI_MAGIC 0xb00fbabe /* ask Kim */
#define LNET_PROTO_TCP_MAGIC 0xeebc0ded
#define LNET_PROTO_ACCEPTOR_MAGIC 0xacce7100
#define LNET_PROTO_PING_MAGIC 0x70696E67 /* 'ping' */
/* Placeholder for a future "unified" protocol across all LNDs */
/*
* Current LNDs that receive a request with this magic will respond with a
* "stub" reply using their current protocol
*/
#define LNET_PROTO_MAGIC 0x45726963 /* ! */
#define LNET_PROTO_TCP_VERSION_MAJOR 1
#define LNET_PROTO_TCP_VERSION_MINOR 0
/* Acceptor connection request */
struct lnet_acceptor_connreq {
__u32 acr_magic; /* LNET_PROTO_ACCEPTOR_MAGIC */
__u32 acr_version; /* protocol version */
__u64 acr_nid; /* target NID */
} WIRE_ATTR;
#define LNET_PROTO_ACCEPTOR_VERSION 1
struct lnet_ni_status {
lnet_nid_t ns_nid;
__u32 ns_status;
__u32 ns_unused;
} WIRE_ATTR;
struct lnet_ping_info {
__u32 pi_magic;
__u32 pi_features;
lnet_pid_t pi_pid;
__u32 pi_nnis;
struct lnet_ni_status pi_ni[0];
} WIRE_ATTR;
struct lnet_counters {
__u32 msgs_alloc;
__u32 msgs_max;
__u32 errors;
__u32 send_count;
__u32 recv_count;
__u32 route_count;
__u32 drop_count;
__u64 send_length;
__u64 recv_length;
__u64 route_length;
__u64 drop_length;
} WIRE_ATTR;
#define LNET_NI_STATUS_UP 0x15aac0de
#define LNET_NI_STATUS_DOWN 0xdeadface
#define LNET_NI_STATUS_INVALID 0x00000000
#define LNET_MAX_INTERFACES 16
/**
* Objects maintained by the LNet are accessed through handles. Handle types
* have names of the form lnet_handle_xx, where xx is one of the two letter
* object type codes ('eq' for event queue, 'md' for memory descriptor, and
* 'me' for match entry). Each type of object is given a unique handle type
* to enhance type checking.
*/
#define LNET_WIRE_HANDLE_COOKIE_NONE (-1)
struct lnet_handle_eq {
u64 cookie;
};
/**
* Invalidate eq handle @h.
*/
static inline void LNetInvalidateEQHandle(struct lnet_handle_eq *h)
{
h->cookie = LNET_WIRE_HANDLE_COOKIE_NONE;
}
/**
* Check whether eq handle @h is invalid.
*
* @return 1 if handle is invalid, 0 if valid.
*/
static inline int LNetEQHandleIsInvalid(struct lnet_handle_eq h)
{
return (LNET_WIRE_HANDLE_COOKIE_NONE == h.cookie);
}
struct lnet_handle_md {
u64 cookie;
};
/**
* Invalidate md handle @h.
*/
static inline void LNetInvalidateMDHandle(struct lnet_handle_md *h)
{
h->cookie = LNET_WIRE_HANDLE_COOKIE_NONE;
}
/**
* Check whether md handle @h is invalid.
*
* @return 1 if handle is invalid, 0 if valid.
*/
static inline int LNetMDHandleIsInvalid(struct lnet_handle_md h)
{
return (LNET_WIRE_HANDLE_COOKIE_NONE == h.cookie);
}
struct lnet_handle_me {
u64 cookie;
};
/**
* Global process ID.
*/
struct lnet_process_id {
/** node id */
lnet_nid_t nid;
/** process id */
lnet_pid_t pid;
};
/** @} lnet_addr */
/** \addtogroup lnet_me
* @{
*/
/**
* Specifies whether the match entry or memory descriptor should be unlinked
* automatically (LNET_UNLINK) or not (LNET_RETAIN).
*/
enum lnet_unlink {
LNET_RETAIN = 0,
LNET_UNLINK
};
/**
* Values of the type lnet_ins_pos are used to control where a new match
* entry is inserted. The value LNET_INS_BEFORE is used to insert the new
* entry before the current entry or before the head of the list. The value
* LNET_INS_AFTER is used to insert the new entry after the current entry
* or after the last item in the list.
*/
enum lnet_ins_pos {
/** insert ME before current position or head of the list */
LNET_INS_BEFORE,
/** insert ME after current position or tail of the list */
LNET_INS_AFTER,
/** attach ME at tail of local CPU partition ME list */
LNET_INS_LOCAL
};
/** @} lnet_me */
/** \addtogroup lnet_md
* @{
*/
/**
* Defines the visible parts of a memory descriptor. Values of this type
* are used to initialize memory descriptors.
*/
struct lnet_md {
/**
* Specify the memory region associated with the memory descriptor.
* If the options field has:
* - LNET_MD_KIOV bit set: The start field points to the starting
* address of an array of struct bio_vec and the length field specifies
* the number of entries in the array. The length can't be bigger
* than LNET_MAX_IOV. The struct bio_vec is used to describe page-based
* fragments that are not necessarily mapped in virtual memory.
* - LNET_MD_IOVEC bit set: The start field points to the starting
* address of an array of struct iovec and the length field specifies
* the number of entries in the array. The length can't be bigger
* than LNET_MAX_IOV. The struct iovec is used to describe fragments
* that have virtual addresses.
* - Otherwise: The memory region is contiguous. The start field
* specifies the starting address for the memory region and the
* length field specifies its length.
*
* When the memory region is fragmented, all fragments but the first
* one must start on page boundary, and all but the last must end on
* page boundary.
*/
void *start;
unsigned int length;
/**
* Specifies the maximum number of operations that can be performed
* on the memory descriptor. An operation is any action that could
* possibly generate an event. In the usual case, the threshold value
* is decremented for each operation on the MD. When the threshold
* drops to zero, the MD becomes inactive and does not respond to
* operations. A threshold value of LNET_MD_THRESH_INF indicates that
* there is no bound on the number of operations that may be applied
* to a MD.
*/
int threshold;
/**
* Specifies the largest incoming request that the memory descriptor
* should respond to. When the unused portion of a MD (length -
* local offset) falls below this value, the MD becomes inactive and
* does not respond to further operations. This value is only used
* if the LNET_MD_MAX_SIZE option is set.
*/
int max_size;
/**
* Specifies the behavior of the memory descriptor. A bitwise OR
* of the following values can be used:
* - LNET_MD_OP_PUT: The LNet PUT operation is allowed on this MD.
* - LNET_MD_OP_GET: The LNet GET operation is allowed on this MD.
* - LNET_MD_MANAGE_REMOTE: The offset used in accessing the memory
* region is provided by the incoming request. By default, the
* offset is maintained locally. When maintained locally, the
* offset is incremented by the length of the request so that
* the next operation (PUT or GET) will access the next part of
* the memory region. Note that only one offset variable exists
* per memory descriptor. If both PUT and GET operations are
* performed on a memory descriptor, the offset is updated each time.
* - LNET_MD_TRUNCATE: The length provided in the incoming request can
* be reduced to match the memory available in the region (determined
* by subtracting the offset from the length of the memory region).
* By default, if the length in the incoming operation is greater
* than the amount of memory available, the operation is rejected.
* - LNET_MD_ACK_DISABLE: An acknowledgment should not be sent for
* incoming PUT operations, even if requested. By default,
* acknowledgments are sent for PUT operations that request an
* acknowledgment. Acknowledgments are never sent for GET operations.
* The data sent in the REPLY serves as an implicit acknowledgment.
* - LNET_MD_KIOV: The start and length fields specify an array of
* struct bio_vec.
* - LNET_MD_IOVEC: The start and length fields specify an array of
* struct iovec.
* - LNET_MD_MAX_SIZE: The max_size field is valid.
*
* Note:
* - LNET_MD_KIOV or LNET_MD_IOVEC allows for a scatter/gather
* capability for memory descriptors. They can't both be set.
* - When LNET_MD_MAX_SIZE is set, the total length of the memory
* region (i.e. sum of all fragment lengths) must not be less than
* \a max_size.
*/
unsigned int options;
/**
* A user-specified value that is associated with the memory
* descriptor. The value does not need to be a pointer, but must fit
* in the space used by a pointer. This value is recorded in events
* associated with operations on this MD.
*/
void *user_ptr;
/**
* A handle for the event queue used to log the operations performed on
* the memory region. If this argument is a NULL handle (i.e. nullified
* by LNetInvalidateEQHandle()), operations performed on this memory
* descriptor are not logged.
*/
struct lnet_handle_eq eq_handle;
};
/*
* Max Transfer Unit (minimum supported everywhere).
* CAVEAT EMPTOR, with multinet (i.e. routers forwarding between networks)
* these limits are system wide and not interface-local.
*/
#define LNET_MTU_BITS 20
#define LNET_MTU (1 << LNET_MTU_BITS)
/** limit on the number of fragments in discontiguous MDs */
#define LNET_MAX_IOV 256
/**
* Options for the MD structure. See lnet_md::options.
*/
#define LNET_MD_OP_PUT (1 << 0)
/** See lnet_md::options. */
#define LNET_MD_OP_GET (1 << 1)
/** See lnet_md::options. */
#define LNET_MD_MANAGE_REMOTE (1 << 2)
/* unused (1 << 3) */
/** See lnet_md::options. */
#define LNET_MD_TRUNCATE (1 << 4)
/** See lnet_md::options. */
#define LNET_MD_ACK_DISABLE (1 << 5)
/** See lnet_md::options. */
#define LNET_MD_IOVEC (1 << 6)
/** See lnet_md::options. */
#define LNET_MD_MAX_SIZE (1 << 7)
/** See lnet_md::options. */
#define LNET_MD_KIOV (1 << 8)
/* For compatibility with Cray Portals */
#define LNET_MD_PHYS 0
/** Infinite threshold on MD operations. See lnet_md::threshold */
#define LNET_MD_THRESH_INF (-1)
/** @} lnet_md */
/** \addtogroup lnet_eq
* @{
*/
/**
* Six types of events can be logged in an event queue.
*/
enum lnet_event_kind {
/** An incoming GET operation has completed on the MD. */
LNET_EVENT_GET = 1,
/**
* An incoming PUT operation has completed on the MD. The
* underlying layers will not alter the memory (on behalf of this
* operation) once this event has been logged.
*/
LNET_EVENT_PUT,
/**
* A REPLY operation has completed. This event is logged after the
* data (if any) from the REPLY has been written into the MD.
*/
LNET_EVENT_REPLY,
/** An acknowledgment has been received. */
LNET_EVENT_ACK,
/**
* An outgoing send (PUT or GET) operation has completed. This event
* is logged after the entire buffer has been sent and it is safe for
* the caller to reuse the buffer.
*
* Note:
* - The LNET_EVENT_SEND doesn't guarantee message delivery. It can
* happen even when the message has not yet been put out on wire.
* - It's unsafe to assume that in an outgoing GET operation
* the LNET_EVENT_SEND event would happen before the
* LNET_EVENT_REPLY event. The same holds for LNET_EVENT_SEND and
* LNET_EVENT_ACK events in an outgoing PUT operation.
*/
LNET_EVENT_SEND,
/**
* A MD has been unlinked. Note that LNetMDUnlink() does not
* necessarily trigger an LNET_EVENT_UNLINK event.
* \see LNetMDUnlink
*/
LNET_EVENT_UNLINK,
};
#define LNET_SEQ_GT(a, b) (((signed long)((a) - (b))) > 0)
/**
* Information about an event on a MD.
*/
struct lnet_event {
/** The identifier (nid, pid) of the target. */
struct lnet_process_id target;
/** The identifier (nid, pid) of the initiator. */
struct lnet_process_id initiator;
/**
* The NID of the immediate sender. If the request has been forwarded
* by routers, this is the NID of the last hop; otherwise it's the
* same as the initiator.
*/
lnet_nid_t sender;
/** Indicates the type of the event. */
enum lnet_event_kind type;
/** The portal table index specified in the request */
unsigned int pt_index;
/** A copy of the match bits specified in the request. */
__u64 match_bits;
/** The length (in bytes) specified in the request. */
unsigned int rlength;
/**
* The length (in bytes) of the data that was manipulated by the
* operation. For truncated operations, the manipulated length will be
* the number of bytes specified by the MD (possibly with an offset,
* see lnet_md). For all other operations, the manipulated length
* will be the length of the requested operation, i.e. rlength.
*/
unsigned int mlength;
/**
* The handle to the MD associated with the event. The handle may be
* invalid if the MD has been unlinked.
*/
struct lnet_handle_md md_handle;
/**
* A snapshot of the state of the MD immediately after the event has
* been processed. In particular, the threshold field in md will
* reflect the value of the threshold after the operation occurred.
*/
struct lnet_md md;
/**
* 64 bits of out-of-band user data. Only valid for LNET_EVENT_PUT.
* \see LNetPut
*/
__u64 hdr_data;
/**
* Indicates the completion status of the operation. It's 0 for
* successful operations, otherwise it's an error code.
*/
int status;
/**
* Indicates whether the MD has been unlinked. Note that:
* - An event with unlinked set is the last event on the MD.
* - This field is also set for an explicit LNET_EVENT_UNLINK event.
* \see LNetMDUnlink
*/
int unlinked;
/**
* The displacement (in bytes) into the memory region that the
* operation used. The offset can be determined by the operation for
* a remote managed MD or by the local MD.
* \see lnet_md::options
*/
unsigned int offset;
/**
* The sequence number for this event. Sequence numbers are unique
* to each event.
*/
volatile unsigned long sequence;
};
/**
* Event queue handler function type.
*
* The EQ handler runs for each event that is deposited into the EQ. The
* handler is supplied with a pointer to the event that triggered the
* handler invocation.
*
* The handler must not block, must be reentrant, and must not call any LNet
* API functions. It should return as quickly as possible.
*/
typedef void (*lnet_eq_handler_t)(struct lnet_event *event);
#define LNET_EQ_HANDLER_NONE NULL
/** @} lnet_eq */
/** \addtogroup lnet_data
* @{
*/
/**
* Specify whether an acknowledgment should be sent by target when the PUT
* operation completes (i.e., when the data has been written to a MD of the
* target process).
*
* \see lnet_md::options for the discussion on LNET_MD_ACK_DISABLE by which
* acknowledgments can be disabled for a MD.
*/
enum lnet_ack_req {
/** Request an acknowledgment */
LNET_ACK_REQ,
/** Request that no acknowledgment should be generated. */
LNET_NOACK_REQ
};
/** @} lnet_data */
/** @} lnet */
#endif
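
Note on LNET_SEQ_GT() in the header above: the macro casts the difference of two sequence numbers to a signed type, so "newer than" still holds after the unsigned counter wraps. The following is a minimal user-space sketch of that idea. `seq_gt()` and `seq_gt_demo()` are hypothetical names used only for illustration, and the cast relies on the usual two's-complement conversion.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical stand-in for LNET_SEQ_GT(): a is newer than b when the
 * unsigned difference, reinterpreted as signed, is positive.
 */
static int seq_gt(unsigned long a, unsigned long b)
{
	return (long)(a - b) > 0;
}

/* Illustrative check: a counter that wrapped still compares as newer. */
static void seq_gt_demo(void)
{
	unsigned long old_seq = ULONG_MAX - 1;
	unsigned long new_seq = old_seq + 3;	/* wraps around to 1 */

	assert(seq_gt(new_seq, old_seq));
	assert(!seq_gt(old_seq, new_seq));
}
```

A plain `a > b` would report the wrapped value as older; the signed-difference form only breaks down when the two values are more than half the counter range apart.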

@ -1,123 +0,0 @@
/*
* This file is part of Portals, http://www.sf.net/projects/lustre/
*
* Portals is free software; you can redistribute it and/or
* modify it under the terms of version 2 of the GNU General Public
* License as published by the Free Software Foundation.
*
* Portals is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* header for lnet ioctl
*/
#ifndef _LNETCTL_H_
#define _LNETCTL_H_
#include <uapi/linux/lnet/lnet-types.h>
/** \addtogroup lnet_fault_simulation
* @{
*/
enum {
LNET_CTL_DROP_ADD,
LNET_CTL_DROP_DEL,
LNET_CTL_DROP_RESET,
LNET_CTL_DROP_LIST,
LNET_CTL_DELAY_ADD,
LNET_CTL_DELAY_DEL,
LNET_CTL_DELAY_RESET,
LNET_CTL_DELAY_LIST,
};
#define LNET_ACK_BIT (1 << 0)
#define LNET_PUT_BIT (1 << 1)
#define LNET_GET_BIT (1 << 2)
#define LNET_REPLY_BIT (1 << 3)
/** ioctl parameter for LNet fault simulation */
struct lnet_fault_attr {
/**
* source NID of drop rule
* LNET_NID_ANY is wildcard for all sources
* 255.255.255.255@net is wildcard for all addresses from @net
*/
lnet_nid_t fa_src;
/** destination NID of drop rule, see \a fa_src for details */
lnet_nid_t fa_dst;
/**
* Portal mask to drop, -1 means all portals, for example:
* fa_ptl_mask = (1 << LDLM_CB_REQUEST_PORTAL) |
* (1 << LDLM_CANCEL_REQUEST_PORTAL)
*
* If it is non-zero then only PUT and GET will be filtered, otherwise
* there is no portal filter, all matched messages will be checked.
*/
__u64 fa_ptl_mask;
/**
* message types to drop, for example:
* fa_msg_mask = LNET_ACK_BIT | LNET_PUT_BIT
*
* If it is non-zero then only specified message types are filtered,
* otherwise all message types will be checked.
*/
__u32 fa_msg_mask;
union {
/** message drop simulation */
struct {
/** drop rate of this rule */
__u32 da_rate;
/**
* time interval of message drop, it is exclusive
* with da_rate
*/
__u32 da_interval;
} drop;
/** message latency simulation */
struct {
__u32 la_rate;
/**
* time interval of message delay, it is exclusive
* with la_rate
*/
__u32 la_interval;
/** latency to delay */
__u32 la_latency;
} delay;
__u64 space[8];
} u;
};
/** fault simulation stats */
struct lnet_fault_stat {
/** total # matched messages */
__u64 fs_count;
/** # dropped LNET_MSG_PUT by this rule */
__u64 fs_put;
/** # dropped LNET_MSG_ACK by this rule */
__u64 fs_ack;
/** # dropped LNET_MSG_GET by this rule */
__u64 fs_get;
/** # dropped LNET_MSG_REPLY by this rule */
__u64 fs_reply;
union {
struct {
/** total # dropped messages */
__u64 ds_dropped;
} drop;
struct {
/** total # delayed messages */
__u64 ls_delayed;
} delay;
__u64 space[8];
} u;
};
/** @} lnet_fault_simulation */
#define LNET_DEV_ID 0
#define LNET_DEV_PATH "/dev/lnet"
#endif
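
Note on fa_msg_mask in the header above: its comment says a non-zero mask restricts a rule to the listed message types, while zero means every type is checked. A minimal sketch of that test follows. `msg_mask_matches()` is a hypothetical helper written for this note, not a function from LNet; the bit values are copied from the header.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values copied from the header above. */
#define LNET_ACK_BIT	(1 << 0)
#define LNET_PUT_BIT	(1 << 1)
#define LNET_GET_BIT	(1 << 2)
#define LNET_REPLY_BIT	(1 << 3)

/* Hypothetical helper: a zero mask places no restriction on message type. */
static int msg_mask_matches(uint32_t fa_msg_mask, uint32_t type_bit)
{
	assert(type_bit != 0);
	return fa_msg_mask == 0 || (fa_msg_mask & type_bit) != 0;
}
```

With `fa_msg_mask = LNET_ACK_BIT | LNET_PUT_BIT`, a PUT matches the rule and a GET does not; with a mask of 0, any message type matches.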

@ -1,556 +0,0 @@
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2007, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2011 - 2015, Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Seagate, Inc.
*
* lnet/include/lnet/lnetst.h
*
* Author: Liang Zhen <liang.zhen@intel.com>
*/
#ifndef __LNET_ST_H__
#define __LNET_ST_H__
#include <linux/types.h>
#define LST_FEAT_NONE (0)
#define LST_FEAT_BULK_LEN (1 << 0) /* enable variable page size */
#define LST_FEATS_EMPTY (LST_FEAT_NONE)
#define LST_FEATS_MASK (LST_FEAT_NONE | LST_FEAT_BULK_LEN)
#define LST_NAME_SIZE 32 /* max name buffer length */
#define LSTIO_DEBUG 0xC00 /* debug */
#define LSTIO_SESSION_NEW 0xC01 /* create session */
#define LSTIO_SESSION_END 0xC02 /* end session */
#define LSTIO_SESSION_INFO 0xC03 /* query session */
#define LSTIO_GROUP_ADD 0xC10 /* add group */
#define LSTIO_GROUP_LIST 0xC11 /* list all groups in session */
#define LSTIO_GROUP_INFO 0xC12 /* query default information of
* specified group
*/
#define LSTIO_GROUP_DEL 0xC13 /* delete group */
#define LSTIO_NODES_ADD 0xC14 /* add nodes to specified group */
#define LSTIO_GROUP_UPDATE 0xC15 /* update group */
#define LSTIO_BATCH_ADD 0xC20 /* add batch */
#define LSTIO_BATCH_START 0xC21 /* start batch */
#define LSTIO_BATCH_STOP 0xC22 /* stop batch */
#define LSTIO_BATCH_DEL 0xC23 /* delete batch */
#define LSTIO_BATCH_LIST 0xC24 /* show all batches in the session */
#define LSTIO_BATCH_INFO 0xC25 /* show detail of specified batch */
#define LSTIO_TEST_ADD 0xC26 /* add test (to batch) */
#define LSTIO_BATCH_QUERY 0xC27 /* query batch status */
#define LSTIO_STAT_QUERY 0xC30 /* get stats */
struct lst_sid {
lnet_nid_t ses_nid; /* nid of console node */
__u64 ses_stamp; /* time stamp */
}; /*** session id */
extern struct lst_sid LST_INVALID_SID;
struct lst_bid {
__u64 bat_id; /* unique id in session */
}; /*** batch id (group of tests) */
/* Status of test node */
#define LST_NODE_ACTIVE 0x1 /* node in this session */
#define LST_NODE_BUSY 0x2 /* node is taken by other session */
#define LST_NODE_DOWN 0x4 /* node is down */
#define LST_NODE_UNKNOWN 0x8 /* node not in session */
struct lstcon_node_ent {
struct lnet_process_id nde_id; /* id of node */
int nde_state; /* state of node */
}; /*** node entry, for list_group command */
struct lstcon_ndlist_ent {
int nle_nnode; /* # of nodes */
int nle_nactive; /* # of active nodes */
int nle_nbusy; /* # of busy nodes */
int nle_ndown; /* # of down nodes */
int nle_nunknown; /* # of unknown nodes */
}; /*** node_list entry, for list_batch command */
struct lstcon_test_ent {
int tse_type; /* test type */
int tse_loop; /* loop count */
int tse_concur; /* concurrency of test */
}; /* test summary entry, for
* list_batch command
*/
struct lstcon_batch_ent {
int bae_state; /* batch status */
int bae_timeout; /* batch timeout */
int bae_ntest; /* # of tests in the batch */
}; /* batch summary entry, for
* list_batch command
*/
struct lstcon_test_batch_ent {
struct lstcon_ndlist_ent tbe_cli_nle; /* client (group) node_list
* entry
*/
struct lstcon_ndlist_ent tbe_srv_nle; /* server (group) node_list
* entry
*/
union {
struct lstcon_test_ent tbe_test; /* test entry */
struct lstcon_batch_ent tbe_batch;/* batch entry */
} u;
}; /* test/batch verbose information entry,
* for list_batch command
*/
struct lstcon_rpc_ent {
struct list_head rpe_link; /* link chain */
struct lnet_process_id rpe_peer; /* peer's id */
struct timeval rpe_stamp; /* time stamp of RPC */
int rpe_state; /* peer's state */
int rpe_rpc_errno; /* RPC errno */
struct lst_sid rpe_sid; /* peer's session id */
int rpe_fwk_errno; /* framework errno */
int rpe_priv[4]; /* private data */
char rpe_payload[0]; /* private reply payload */
};
struct lstcon_trans_stat {
int trs_rpc_stat[4]; /* RPCs stat (0: total
* 1: succeeded
* 2: failed
* 3: reserved)
*/
int trs_rpc_errno; /* RPC errno */
int trs_fwk_stat[8]; /* framework stat */
int trs_fwk_errno; /* errno of the first remote error */
void *trs_fwk_private; /* private framework stat */
};
static inline int
lstcon_rpc_stat_total(struct lstcon_trans_stat *stat, int inc)
{
return inc ? ++stat->trs_rpc_stat[0] : stat->trs_rpc_stat[0];
}
static inline int
lstcon_rpc_stat_success(struct lstcon_trans_stat *stat, int inc)
{
return inc ? ++stat->trs_rpc_stat[1] : stat->trs_rpc_stat[1];
}
static inline int
lstcon_rpc_stat_failure(struct lstcon_trans_stat *stat, int inc)
{
return inc ? ++stat->trs_rpc_stat[2] : stat->trs_rpc_stat[2];
}
static inline int
lstcon_sesop_stat_success(struct lstcon_trans_stat *stat, int inc)
{
return inc ? ++stat->trs_fwk_stat[0] : stat->trs_fwk_stat[0];
}
static inline int
lstcon_sesop_stat_failure(struct lstcon_trans_stat *stat, int inc)
{
return inc ? ++stat->trs_fwk_stat[1] : stat->trs_fwk_stat[1];
}
static inline int
lstcon_sesqry_stat_active(struct lstcon_trans_stat *stat, int inc)
{
return inc ? ++stat->trs_fwk_stat[0] : stat->trs_fwk_stat[0];
}
static inline int
lstcon_sesqry_stat_busy(struct lstcon_trans_stat *stat, int inc)
{
return inc ? ++stat->trs_fwk_stat[1] : stat->trs_fwk_stat[1];
}
static inline int
lstcon_sesqry_stat_unknown(struct lstcon_trans_stat *stat, int inc)
{
return inc ? ++stat->trs_fwk_stat[2] : stat->trs_fwk_stat[2];
}
static inline int
lstcon_tsbop_stat_success(struct lstcon_trans_stat *stat, int inc)
{
return inc ? ++stat->trs_fwk_stat[0] : stat->trs_fwk_stat[0];
}
static inline int
lstcon_tsbop_stat_failure(struct lstcon_trans_stat *stat, int inc)
{
return inc ? ++stat->trs_fwk_stat[1] : stat->trs_fwk_stat[1];
}
static inline int
lstcon_tsbqry_stat_idle(struct lstcon_trans_stat *stat, int inc)
{
return inc ? ++stat->trs_fwk_stat[0] : stat->trs_fwk_stat[0];
}
static inline int
lstcon_tsbqry_stat_run(struct lstcon_trans_stat *stat, int inc)
{
return inc ? ++stat->trs_fwk_stat[1] : stat->trs_fwk_stat[1];
}
static inline int
lstcon_tsbqry_stat_failure(struct lstcon_trans_stat *stat, int inc)
{
return inc ? ++stat->trs_fwk_stat[2] : stat->trs_fwk_stat[2];
}
static inline int
lstcon_statqry_stat_success(struct lstcon_trans_stat *stat, int inc)
{
return inc ? ++stat->trs_fwk_stat[0] : stat->trs_fwk_stat[0];
}
static inline int
lstcon_statqry_stat_failure(struct lstcon_trans_stat *stat, int inc)
{
return inc ? ++stat->trs_fwk_stat[1] : stat->trs_fwk_stat[1];
}
/* create a session */
struct lstio_session_new_args {
int lstio_ses_key; /* IN: local key */
int lstio_ses_timeout; /* IN: session timeout */
int lstio_ses_force; /* IN: force create ? */
/** IN: session features */
unsigned int lstio_ses_feats;
struct lst_sid __user *lstio_ses_idp; /* OUT: session id */
int lstio_ses_nmlen; /* IN: name length */
char __user *lstio_ses_namep; /* IN: session name */
};
/* query current session */
struct lstio_session_info_args {
struct lst_sid __user *lstio_ses_idp; /* OUT: session id */
int __user *lstio_ses_keyp; /* OUT: local key */
/** OUT: session features */
unsigned int __user *lstio_ses_featp;
struct lstcon_ndlist_ent __user *lstio_ses_ndinfo;/* OUT: */
int lstio_ses_nmlen; /* IN: name length */
char __user *lstio_ses_namep; /* OUT: session name */
};
/* delete a session */
struct lstio_session_end_args {
int lstio_ses_key; /* IN: session key */
};
#define LST_OPC_SESSION 1
#define LST_OPC_GROUP 2
#define LST_OPC_NODES 3
#define LST_OPC_BATCHCLI 4
#define LST_OPC_BATCHSRV 5
struct lstio_debug_args {
int lstio_dbg_key; /* IN: session key */
int lstio_dbg_type; /* IN: debug
* session|batch|
* group|nodes list
*/
int lstio_dbg_flags; /* IN: reserved debug
* flags
*/
int lstio_dbg_timeout; /* IN: timeout of
* debug
*/
int lstio_dbg_nmlen; /* IN: len of name */
char __user *lstio_dbg_namep; /* IN: name of
* group|batch
*/
int lstio_dbg_count; /* IN: # of test nodes
* to debug
*/
struct lnet_process_id __user *lstio_dbg_idsp; /* IN: id of test
* nodes
*/
struct list_head __user *lstio_dbg_resultp; /* OUT: list head of
* result buffer
*/
};
struct lstio_group_add_args {
int lstio_grp_key; /* IN: session key */
int lstio_grp_nmlen; /* IN: name length */
char __user *lstio_grp_namep; /* IN: group name */
};
struct lstio_group_del_args {
int lstio_grp_key; /* IN: session key */
int lstio_grp_nmlen; /* IN: name length */
char __user *lstio_grp_namep; /* IN: group name */
};
#define LST_GROUP_CLEAN 1 /* remove inactive nodes in the group */
#define LST_GROUP_REFRESH 2 /* refresh inactive nodes
* in the group
*/
#define LST_GROUP_RMND 3 /* delete nodes from the group */
struct lstio_group_update_args {
int lstio_grp_key; /* IN: session key */
int lstio_grp_opc; /* IN: OPC */
int lstio_grp_args; /* IN: arguments */
int lstio_grp_nmlen; /* IN: name length */
char __user *lstio_grp_namep; /* IN: group name */
int lstio_grp_count; /* IN: # of nodes id */
struct lnet_process_id __user *lstio_grp_idsp; /* IN: array of nodes */
struct list_head __user *lstio_grp_resultp; /* OUT: list head of
* result buffer
*/
};
struct lstio_group_nodes_args {
int lstio_grp_key; /* IN: session key */
int lstio_grp_nmlen; /* IN: name length */
char __user *lstio_grp_namep; /* IN: group name */
int lstio_grp_count; /* IN: # of nodes */
/** OUT: session features */
unsigned int __user *lstio_grp_featp;
struct lnet_process_id __user *lstio_grp_idsp; /* IN: nodes */
struct list_head __user *lstio_grp_resultp; /* OUT: list head of
* result buffer
*/
};
struct lstio_group_list_args {
int lstio_grp_key; /* IN: session key */
int lstio_grp_idx; /* IN: group idx */
int lstio_grp_nmlen; /* IN: name len */
char __user *lstio_grp_namep; /* OUT: name */
};
struct lstio_group_info_args {
int lstio_grp_key; /* IN: session key */
int lstio_grp_nmlen; /* IN: name len */
char __user *lstio_grp_namep; /* IN: name */
struct lstcon_ndlist_ent __user *lstio_grp_entp;/* OUT: description
* of group
*/
int __user *lstio_grp_idxp; /* IN/OUT: node index */
int __user *lstio_grp_ndentp; /* IN/OUT: # of nodent */
struct lstcon_node_ent __user *lstio_grp_dentsp;/* OUT: nodent array */
};
#define LST_DEFAULT_BATCH "batch" /* default batch name */
struct lstio_batch_add_args {
int lstio_bat_key; /* IN: session key */
int lstio_bat_nmlen; /* IN: name length */
char __user *lstio_bat_namep; /* IN: batch name */
};
struct lstio_batch_del_args {
int lstio_bat_key; /* IN: session key */
int lstio_bat_nmlen; /* IN: name length */
char __user *lstio_bat_namep; /* IN: batch name */
};
struct lstio_batch_run_args {
int lstio_bat_key; /* IN: session key */
int lstio_bat_timeout; /* IN: timeout for
* the batch
*/
int lstio_bat_nmlen; /* IN: name length */
char __user *lstio_bat_namep; /* IN: batch name */
struct list_head __user *lstio_bat_resultp; /* OUT: list head of
* result buffer
*/
};
struct lstio_batch_stop_args {
int lstio_bat_key; /* IN: session key */
int lstio_bat_force; /* IN: abort unfinished
* test RPC
*/
int lstio_bat_nmlen; /* IN: name length */
char __user *lstio_bat_namep; /* IN: batch name */
struct list_head __user *lstio_bat_resultp; /* OUT: list head of
* result buffer
*/
};
struct lstio_batch_query_args {
int lstio_bat_key; /* IN: session key */
int lstio_bat_testidx; /* IN: test index */
int lstio_bat_client; /* IN: are we testing
* the client?
*/
int lstio_bat_timeout; /* IN: timeout for
* waiting
*/
int lstio_bat_nmlen; /* IN: name length */
char __user *lstio_bat_namep; /* IN: batch name */
struct list_head __user *lstio_bat_resultp; /* OUT: list head of
* result buffer
*/
};
struct lstio_batch_list_args {
int lstio_bat_key; /* IN: session key */
int lstio_bat_idx; /* IN: index */
int lstio_bat_nmlen; /* IN: name length */
char __user *lstio_bat_namep; /* IN: batch name */
};
struct lstio_batch_info_args {
int lstio_bat_key; /* IN: session key */
int lstio_bat_nmlen; /* IN: name length */
char __user *lstio_bat_namep; /* IN: name */
int lstio_bat_server; /* IN: query server
* or not
*/
int lstio_bat_testidx; /* IN: test index */
struct lstcon_test_batch_ent __user *lstio_bat_entp;/* OUT: batch ent */
int __user *lstio_bat_idxp; /* IN/OUT: index of node */
int __user *lstio_bat_ndentp; /* IN/OUT: # of nodent */
struct lstcon_node_ent __user *lstio_bat_dentsp;/* array of nodent */
};
/* add stat in session */
struct lstio_stat_args {
int lstio_sta_key; /* IN: session key */
int lstio_sta_timeout; /* IN: timeout for
* stat request
*/
int lstio_sta_nmlen; /* IN: group name
* length
*/
char __user *lstio_sta_namep; /* IN: group name */
int lstio_sta_count; /* IN: # of pid */
struct lnet_process_id __user *lstio_sta_idsp; /* IN: pid */
struct list_head __user *lstio_sta_resultp; /* OUT: list head of
* result buffer
*/
};
enum lst_test_type {
LST_TEST_BULK = 1,
LST_TEST_PING = 2
};
/* create a test in a batch */
#define LST_MAX_CONCUR 1024 /* Max concurrency of test */
struct lstio_test_args {
int lstio_tes_key; /* IN: session key */
int lstio_tes_bat_nmlen; /* IN: batch name len */
char __user *lstio_tes_bat_name; /* IN: batch name */
int lstio_tes_type; /* IN: test type */
int lstio_tes_oneside; /* IN: one sided test */
int lstio_tes_loop; /* IN: loop count */
int lstio_tes_concur; /* IN: concurrency */
int lstio_tes_dist; /* IN: node distribution in
* destination groups
*/
int lstio_tes_span; /* IN: node span in
* destination groups
*/
int lstio_tes_sgrp_nmlen; /* IN: source group
* name length
*/
char __user *lstio_tes_sgrp_name; /* IN: group name */
int lstio_tes_dgrp_nmlen; /* IN: destination group
* name length
*/
char __user *lstio_tes_dgrp_name; /* IN: group name */
int lstio_tes_param_len; /* IN: param buffer len */
void __user *lstio_tes_param; /* IN: parameter for specified
* test: struct lst_test_bulk_param,
* struct lst_test_ping_param,
* ... more
*/
int __user *lstio_tes_retp; /* OUT: private returned
* value
*/
struct list_head __user *lstio_tes_resultp;/* OUT: list head of
* result buffer
*/
};
enum lst_brw_type {
LST_BRW_READ = 1,
LST_BRW_WRITE = 2
};
enum lst_brw_flags {
LST_BRW_CHECK_NONE = 1,
LST_BRW_CHECK_SIMPLE = 2,
LST_BRW_CHECK_FULL = 3
};
struct lst_test_bulk_param {
int blk_opc; /* bulk operation code */
int blk_size; /* size (bytes) */
int blk_time; /* time of running the test*/
int blk_flags; /* reserved flags */
int blk_cli_off; /* bulk offset on client */
int blk_srv_off; /* reserved: bulk offset on server */
};
struct lst_test_ping_param {
int png_size; /* size of ping message */
int png_time; /* time */
int png_loop; /* loop */
int png_flags; /* reserved flags */
};
struct srpc_counters {
__u32 errors;
__u32 rpcs_sent;
__u32 rpcs_rcvd;
__u32 rpcs_dropped;
__u32 rpcs_expired;
__u64 bulk_get;
__u64 bulk_put;
} WIRE_ATTR;
struct sfw_counters {
/** milliseconds since current session started */
__u32 running_ms;
__u32 active_batches;
__u32 zombie_sessions;
__u32 brw_errors;
__u32 ping_errors;
} WIRE_ATTR;
#endif
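
Note on the lstcon_*_stat_*() accessors in the header above: each one doubles as a getter and an incrementer, selected by the `inc` argument. The sketch below reproduces that shape on a reduced structure. `struct trans_stat` and the two function names are hypothetical, trimmed-down stand-ins for `struct lstcon_trans_stat` and its helpers.

```c
#include <assert.h>

/* Reduced copy of struct lstcon_trans_stat: only the RPC counters. */
struct trans_stat {
	int trs_rpc_stat[4];
};

/* Same shape as lstcon_rpc_stat_total(): bump when inc is set, else read. */
static int rpc_stat_total(struct trans_stat *stat, int inc)
{
	assert(stat);
	return inc ? ++stat->trs_rpc_stat[0] : stat->trs_rpc_stat[0];
}

/* Same shape as lstcon_rpc_stat_failure(): slot 2 holds the failures. */
static int rpc_stat_failure(struct trans_stat *stat, int inc)
{
	assert(stat);
	return inc ? ++stat->trs_rpc_stat[2] : stat->trs_rpc_stat[2];
}
```

Calling `rpc_stat_total(&s, 1)` twice on a zeroed structure and then `rpc_stat_total(&s, 0)` reads back 2. In the header, several framework accessors share the same `trs_fwk_stat[]` slots, so which meaning a slot carries depends on the kind of transaction being counted.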

@ -1,119 +0,0 @@
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2003, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2011, 2015, Intel Corporation.
*/
#ifndef _LNET_NIDSTRINGS_H
#define _LNET_NIDSTRINGS_H
#include <uapi/linux/lnet/lnet-types.h>
/**
* Lustre Network Driver types.
*/
enum {
/*
* Only add to these values (i.e. don't ever change or redefine them):
* network addresses depend on them...
*/
QSWLND = 1,
SOCKLND = 2,
GMLND = 3,
PTLLND = 4,
O2IBLND = 5,
CIBLND = 6,
OPENIBLND = 7,
IIBLND = 8,
LOLND = 9,
RALND = 10,
VIBLND = 11,
MXLND = 12,
GNILND = 13,
GNIIPLND = 14,
};
struct list_head;
#define LNET_NIDSTR_COUNT 1024 /* # of nidstrings */
#define LNET_NIDSTR_SIZE 32 /* size of each one (see below for usage) */
/* support decl needed by both kernel and user space */
char *libcfs_next_nidstring(void);
int libcfs_isknown_lnd(__u32 lnd);
char *libcfs_lnd2modname(__u32 lnd);
char *libcfs_lnd2str_r(__u32 lnd, char *buf, size_t buf_size);
static inline char *libcfs_lnd2str(__u32 lnd)
{
return libcfs_lnd2str_r(lnd, libcfs_next_nidstring(),
LNET_NIDSTR_SIZE);
}
int libcfs_str2lnd(const char *str);
char *libcfs_net2str_r(__u32 net, char *buf, size_t buf_size);
static inline char *libcfs_net2str(__u32 net)
{
return libcfs_net2str_r(net, libcfs_next_nidstring(),
LNET_NIDSTR_SIZE);
}
char *libcfs_nid2str_r(lnet_nid_t nid, char *buf, size_t buf_size);
static inline char *libcfs_nid2str(lnet_nid_t nid)
{
return libcfs_nid2str_r(nid, libcfs_next_nidstring(),
LNET_NIDSTR_SIZE);
}
__u32 libcfs_str2net(const char *str);
lnet_nid_t libcfs_str2nid(const char *str);
int libcfs_str2anynid(lnet_nid_t *nid, const char *str);
char *libcfs_id2str(struct lnet_process_id id);
void cfs_free_nidlist(struct list_head *list);
int cfs_parse_nidlist(char *str, int len, struct list_head *list);
int cfs_print_nidlist(char *buffer, int count, struct list_head *list);
int cfs_match_nid(lnet_nid_t nid, struct list_head *list);
int cfs_ip_addr_parse(char *str, int len, struct list_head *list);
int cfs_ip_addr_match(__u32 addr, struct list_head *list);
bool cfs_nidrange_is_contiguous(struct list_head *nidlist);
void cfs_nidrange_find_min_max(struct list_head *nidlist, char *min_nid,
char *max_nid, size_t nidstr_length);
struct netstrfns {
__u32 nf_type;
char *nf_name;
char *nf_modname;
void (*nf_addr2str)(__u32 addr, char *str, size_t size);
int (*nf_str2addr)(const char *str, int nob, __u32 *addr);
int (*nf_parse_addrlist)(char *str, int len,
struct list_head *list);
int (*nf_print_addrlist)(char *buffer, int count,
struct list_head *list);
int (*nf_match_addr)(__u32 addr, struct list_head *list);
bool (*nf_is_contiguous)(struct list_head *nidlist);
void (*nf_min_max)(struct list_head *nidlist, __u32 *min_nid,
__u32 *max_nid);
};
#endif /* _LNET_NIDSTRINGS_H */
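
Note on the `*_r()` and non-`_r` pairs in the header above: each convenience wrapper passes a buffer from libcfs_next_nidstring() to its reentrant counterpart. The sketch below shows one way such a rotating pool of static buffers can work. It is an assumption about the general pattern, not the libcfs implementation: the names, the small pool size, and the integer-to-string conversion are all hypothetical, and the sketch is not thread-safe.

```c
#include <assert.h>
#include <stdio.h>

#define NIDSTR_COUNT	4	/* illustrative; the header uses 1024 */
#define NIDSTR_SIZE	32	/* same value as LNET_NIDSTR_SIZE */

static char nidstr_pool[NIDSTR_COUNT][NIDSTR_SIZE];
static unsigned int nidstr_next;

/* Hypothetical stand-in for libcfs_next_nidstring(). */
static char *next_nidstring(void)
{
	char *buf = nidstr_pool[nidstr_next];

	nidstr_next = (nidstr_next + 1) % NIDSTR_COUNT;
	return buf;
}

/* Reentrant form: the caller owns the buffer. */
static char *num2str_r(unsigned int value, char *buf, size_t buf_size)
{
	assert(buf && buf_size > 0);
	snprintf(buf, buf_size, "%u", value);
	return buf;
}

/* Convenience form: borrows a slot that is reused after NIDSTR_COUNT calls. */
static char *num2str(unsigned int value)
{
	return num2str_r(value, next_nidstring(), NIDSTR_SIZE);
}
```

The trade-off is that a string returned by the convenience form is only valid until the pool wraps around, which is why callers that keep the result should prefer the `_r` form with their own buffer.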

@ -1,44 +0,0 @@
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2003, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Sun Microsystems, Inc.
*
* #defines shared between socknal implementation and utilities
*/
#ifndef __UAPI_LNET_SOCKLND_H__
#define __UAPI_LNET_SOCKLND_H__
#define SOCKLND_CONN_NONE (-1)
#define SOCKLND_CONN_ANY 0
#define SOCKLND_CONN_CONTROL 1
#define SOCKLND_CONN_BULK_IN 2
#define SOCKLND_CONN_BULK_OUT 3
#define SOCKLND_CONN_NTYPES 4
#define SOCKLND_CONN_ACK SOCKLND_CONN_BULK_IN
#endif

@ -1,261 +0,0 @@
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2003, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2012, Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Sun Microsystems, Inc.
*/
#ifndef _UAPI_LUSTRE_CFG_H_
#define _UAPI_LUSTRE_CFG_H_
#include <linux/errno.h>
#include <linux/kernel.h>
#include <uapi/linux/lustre/lustre_user.h>
/** \defgroup cfg cfg
*
* @{
*/
/*
* 1cf6
* lcfG
*/
#define LUSTRE_CFG_VERSION 0x1cf60001
#define LUSTRE_CFG_MAX_BUFCOUNT 8
#define LCFG_HDR_SIZE(count) \
__ALIGN_KERNEL(offsetof(struct lustre_cfg, lcfg_buflens[(count)]), 8)
/** If the LCFG_REQUIRED bit is set in a configuration command,
* then the client is required to understand this parameter
* in order to mount the filesystem. If it does not understand
* a REQUIRED command the client mount will fail.
*/
#define LCFG_REQUIRED 0x0001000
enum lcfg_command_type {
LCFG_ATTACH = 0x00cf001, /**< create a new obd instance */
LCFG_DETACH = 0x00cf002, /**< destroy obd instance */
LCFG_SETUP = 0x00cf003, /**< call type-specific setup */
LCFG_CLEANUP = 0x00cf004, /**< call type-specific cleanup
*/
LCFG_ADD_UUID = 0x00cf005, /**< add a nid to a niduuid */
LCFG_DEL_UUID = 0x00cf006, /**< remove a nid from
* a niduuid
*/
LCFG_MOUNTOPT = 0x00cf007, /**< create a profile
* (mdc, osc)
*/
LCFG_DEL_MOUNTOPT = 0x00cf008, /**< destroy a profile */
LCFG_SET_TIMEOUT = 0x00cf009, /**< set obd_timeout */
LCFG_SET_UPCALL = 0x00cf00a, /**< deprecated */
LCFG_ADD_CONN = 0x00cf00b, /**< add a failover niduuid to
* an obd
*/
LCFG_DEL_CONN = 0x00cf00c, /**< remove a failover niduuid */
LCFG_LOV_ADD_OBD = 0x00cf00d, /**< add an osc to a lov */
LCFG_LOV_DEL_OBD = 0x00cf00e, /**< remove an osc from a lov */
LCFG_PARAM = 0x00cf00f, /**< set a proc parameter */
LCFG_MARKER = 0x00cf010, /**< metadata about next
* cfg rec
*/
LCFG_LOG_START = 0x00ce011, /**< mgc only, process a
* cfg log
*/
LCFG_LOG_END = 0x00ce012, /**< stop processing updates */
LCFG_LOV_ADD_INA = 0x00ce013, /**< like LOV_ADD_OBD,
* inactive
*/
LCFG_ADD_MDC = 0x00cf014, /**< add an mdc to a lmv */
LCFG_DEL_MDC = 0x00cf015, /**< remove an mdc from a lmv */
LCFG_SPTLRPC_CONF = 0x00ce016, /**< security */
LCFG_POOL_NEW = 0x00ce020, /**< create an ost pool name */
LCFG_POOL_ADD = 0x00ce021, /**< add an ost to a pool */
LCFG_POOL_REM = 0x00ce022, /**< remove an ost from a pool */
LCFG_POOL_DEL = 0x00ce023, /**< destroy an ost pool name */
LCFG_SET_LDLM_TIMEOUT = 0x00ce030, /**< set ldlm_timeout */
LCFG_PRE_CLEANUP = 0x00cf031, /**< call type-specific pre
* cleanup
*/
LCFG_SET_PARAM = 0x00ce032, /**< use set_param syntax to set
* a proc parameter
*/
};
struct lustre_cfg_bufs {
void *lcfg_buf[LUSTRE_CFG_MAX_BUFCOUNT];
__u32 lcfg_buflen[LUSTRE_CFG_MAX_BUFCOUNT];
__u32 lcfg_bufcount;
};
struct lustre_cfg {
__u32 lcfg_version;
__u32 lcfg_command;
__u32 lcfg_num;
__u32 lcfg_flags;
__u64 lcfg_nid;
__u32 lcfg_nal; /* not used any more */
__u32 lcfg_bufcount;
__u32 lcfg_buflens[0];
};
enum cfg_record_type {
PORTALS_CFG_TYPE = 1,
LUSTRE_CFG_TYPE = 123,
};
#define LUSTRE_CFG_BUFLEN(lcfg, idx) \
((lcfg)->lcfg_bufcount <= (idx) ? 0 : (lcfg)->lcfg_buflens[(idx)])
static inline void lustre_cfg_bufs_set(struct lustre_cfg_bufs *bufs,
__u32 index, void *buf, __u32 buflen)
{
if (index >= LUSTRE_CFG_MAX_BUFCOUNT)
return;
if (!bufs)
return;
if (bufs->lcfg_bufcount <= index)
bufs->lcfg_bufcount = index + 1;
bufs->lcfg_buf[index] = buf;
bufs->lcfg_buflen[index] = buflen;
}
static inline void lustre_cfg_bufs_set_string(struct lustre_cfg_bufs *bufs,
__u32 index, char *str)
{
lustre_cfg_bufs_set(bufs, index, str, str ? strlen(str) + 1 : 0);
}
static inline void lustre_cfg_bufs_reset(struct lustre_cfg_bufs *bufs,
char *name)
{
memset((bufs), 0, sizeof(*bufs));
if (name)
lustre_cfg_bufs_set_string(bufs, 0, name);
}
static inline void *lustre_cfg_buf(struct lustre_cfg *lcfg, __u32 index)
{
__u32 i;
size_t offset;
__u32 bufcount;
if (!lcfg)
return NULL;
bufcount = lcfg->lcfg_bufcount;
if (index >= bufcount)
return NULL;
offset = LCFG_HDR_SIZE(lcfg->lcfg_bufcount);
for (i = 0; i < index; i++)
offset += __ALIGN_KERNEL(lcfg->lcfg_buflens[i], 8);
return (char *)lcfg + offset;
}
static inline void lustre_cfg_bufs_init(struct lustre_cfg_bufs *bufs,
struct lustre_cfg *lcfg)
{
__u32 i;
bufs->lcfg_bufcount = lcfg->lcfg_bufcount;
for (i = 0; i < bufs->lcfg_bufcount; i++) {
bufs->lcfg_buflen[i] = lcfg->lcfg_buflens[i];
bufs->lcfg_buf[i] = lustre_cfg_buf(lcfg, i);
}
}
static inline __u32 lustre_cfg_len(__u32 bufcount, __u32 *buflens)
{
__u32 i;
__u32 len;
len = LCFG_HDR_SIZE(bufcount);
for (i = 0; i < bufcount; i++)
len += __ALIGN_KERNEL(buflens[i], 8);
return __ALIGN_KERNEL(len, 8);
}
static inline void lustre_cfg_init(struct lustre_cfg *lcfg, int cmd,
struct lustre_cfg_bufs *bufs)
{
char *ptr;
__u32 i;
lcfg->lcfg_version = LUSTRE_CFG_VERSION;
lcfg->lcfg_command = cmd;
lcfg->lcfg_bufcount = bufs->lcfg_bufcount;
ptr = (char *)lcfg + LCFG_HDR_SIZE(lcfg->lcfg_bufcount);
for (i = 0; i < lcfg->lcfg_bufcount; i++) {
lcfg->lcfg_buflens[i] = bufs->lcfg_buflen[i];
if (bufs->lcfg_buf[i]) {
memcpy(ptr, bufs->lcfg_buf[i], bufs->lcfg_buflen[i]);
ptr += __ALIGN_KERNEL(bufs->lcfg_buflen[i], 8);
}
}
}
static inline int lustre_cfg_sanity_check(void *buf, size_t len)
{
struct lustre_cfg *lcfg = (struct lustre_cfg *)buf;
if (!lcfg)
return -EINVAL;
/* check that the first bits of the struct are valid */
if (len < LCFG_HDR_SIZE(0))
return -EINVAL;
if (lcfg->lcfg_version != LUSTRE_CFG_VERSION)
return -EINVAL;
if (lcfg->lcfg_bufcount >= LUSTRE_CFG_MAX_BUFCOUNT)
return -EINVAL;
/* check that the buflens are valid */
if (len < LCFG_HDR_SIZE(lcfg->lcfg_bufcount))
return -EINVAL;
/* make sure all the pointers point inside the data */
if (len < lustre_cfg_len(lcfg->lcfg_bufcount, lcfg->lcfg_buflens))
return -EINVAL;
return 0;
}
/** @} cfg */
#endif /* _UAPI_LUSTRE_CFG_H_ */
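
Note on LCFG_HDR_SIZE() and lustre_cfg_len() in the header above: a packed configuration record is the header, padded to 8 bytes, followed by each buffer padded to 8 bytes. The following user-space sketch works through that arithmetic. `ALIGN8`, `struct cfg_hdr`, `cfg_hdr_size()` and `cfg_len()` are hypothetical stand-ins for `__ALIGN_KERNEL(x, 8)`, `struct lustre_cfg`, `LCFG_HDR_SIZE()` and `lustre_cfg_len()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-in for __ALIGN_KERNEL(x, 8). */
#define ALIGN8(x)	(((x) + 7u) & ~(size_t)7)

/* Same field layout as struct lustre_cfg, with fixed-width types. */
struct cfg_hdr {
	uint32_t version;
	uint32_t command;
	uint32_t num;
	uint32_t flags;
	uint64_t nid;
	uint32_t nal;
	uint32_t bufcount;
	uint32_t buflens[];
};

/* Mirrors LCFG_HDR_SIZE(count): fixed fields plus count length slots. */
static size_t cfg_hdr_size(uint32_t count)
{
	return ALIGN8(offsetof(struct cfg_hdr, buflens) +
		      count * sizeof(uint32_t));
}

/* Mirrors lustre_cfg_len(): header plus each buffer padded to 8 bytes. */
static size_t cfg_len(uint32_t count, const uint32_t *buflens)
{
	size_t len = cfg_hdr_size(count);
	uint32_t i;

	assert(count == 0 || buflens);
	for (i = 0; i < count; i++)
		len += ALIGN8((size_t)buflens[i]);
	return len;
}
```

For two buffers of 5 and 9 bytes, the fixed fields take 32 bytes, the two length slots bring the header to 40, and the padded buffers add 8 and 16, giving a record of 64 bytes.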

@ -1,293 +0,0 @@
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2007, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2011, 2014, Intel Corporation.
*
* Copyright 2016 Cray Inc, all rights reserved.
* Author: Ben Evans.
*
* all fid manipulation functions go here
*
* FIDs are globally unique within a Lustre filesystem, and are made up
* of three parts: sequence, object ID, and version.
*
*/
#ifndef _UAPI_LUSTRE_FID_H_
#define _UAPI_LUSTRE_FID_H_
#include <uapi/linux/lustre/lustre_idl.h>
/** returns fid object sequence */
static inline __u64 fid_seq(const struct lu_fid *fid)
{
return fid->f_seq;
}
/** returns fid object id */
static inline __u32 fid_oid(const struct lu_fid *fid)
{
return fid->f_oid;
}
/** returns fid object version */
static inline __u32 fid_ver(const struct lu_fid *fid)
{
return fid->f_ver;
}
static inline void fid_zero(struct lu_fid *fid)
{
memset(fid, 0, sizeof(*fid));
}
static inline __u64 fid_ver_oid(const struct lu_fid *fid)
{
return (__u64)fid_ver(fid) << 32 | fid_oid(fid);
}
static inline bool fid_seq_is_mdt0(__u64 seq)
{
return seq == FID_SEQ_OST_MDT0;
}
static inline bool fid_seq_is_mdt(__u64 seq)
{
return seq == FID_SEQ_OST_MDT0 || seq >= FID_SEQ_NORMAL;
}
static inline bool fid_seq_is_echo(__u64 seq)
{
return seq == FID_SEQ_ECHO;
}
static inline bool fid_is_echo(const struct lu_fid *fid)
{
return fid_seq_is_echo(fid_seq(fid));
}
static inline bool fid_seq_is_llog(__u64 seq)
{
return seq == FID_SEQ_LLOG;
}
static inline bool fid_is_llog(const struct lu_fid *fid)
{
/* file with OID == 0 is not llog but contains last oid */
return fid_seq_is_llog(fid_seq(fid)) && fid_oid(fid) > 0;
}
static inline bool fid_seq_is_rsvd(__u64 seq)
{
return seq > FID_SEQ_OST_MDT0 && seq <= FID_SEQ_RSVD;
}
static inline bool fid_seq_is_special(__u64 seq)
{
return seq == FID_SEQ_SPECIAL;
}
static inline bool fid_seq_is_local_file(__u64 seq)
{
return seq == FID_SEQ_LOCAL_FILE ||
seq == FID_SEQ_LOCAL_NAME;
}
static inline bool fid_seq_is_root(__u64 seq)
{
return seq == FID_SEQ_ROOT;
}
static inline bool fid_seq_is_dot(__u64 seq)
{
return seq == FID_SEQ_DOT_LUSTRE;
}
static inline bool fid_seq_is_default(__u64 seq)
{
return seq == FID_SEQ_LOV_DEFAULT;
}
static inline bool fid_is_mdt0(const struct lu_fid *fid)
{
return fid_seq_is_mdt0(fid_seq(fid));
}
/**
* Check if a sequence number falls in the IGIF range.
* \param seq the sequence number to be tested.
* \return true if the sequence is an IGIF sequence; otherwise false.
*/
static inline bool fid_seq_is_igif(__u64 seq)
{
return seq >= FID_SEQ_IGIF && seq <= FID_SEQ_IGIF_MAX;
}
static inline bool fid_is_igif(const struct lu_fid *fid)
{
return fid_seq_is_igif(fid_seq(fid));
}
/**
* Check if a sequence number falls in the IDIF range.
* \param seq the sequence number to be tested.
* \return true if the sequence is an IDIF sequence; otherwise false.
*/
static inline bool fid_seq_is_idif(__u64 seq)
{
return seq >= FID_SEQ_IDIF && seq <= FID_SEQ_IDIF_MAX;
}
static inline bool fid_is_idif(const struct lu_fid *fid)
{
return fid_seq_is_idif(fid_seq(fid));
}
static inline bool fid_is_local_file(const struct lu_fid *fid)
{
return fid_seq_is_local_file(fid_seq(fid));
}
static inline bool fid_seq_is_norm(__u64 seq)
{
return (seq >= FID_SEQ_NORMAL);
}
static inline bool fid_is_norm(const struct lu_fid *fid)
{
return fid_seq_is_norm(fid_seq(fid));
}
/* convert an OST objid into an IDIF FID SEQ number */
static inline __u64 fid_idif_seq(__u64 id, __u32 ost_idx)
{
return FID_SEQ_IDIF | (ost_idx << 16) | ((id >> 32) & 0xffff);
}
/* convert a packed IDIF FID into an OST objid */
static inline __u64 fid_idif_id(__u64 seq, __u32 oid, __u32 ver)
{
return ((__u64)ver << 48) | ((seq & 0xffff) << 32) | oid;
}
static inline __u32 idif_ost_idx(__u64 seq)
{
return (seq >> 16) & 0xffff;
}
/* extract ost index from IDIF FID */
static inline __u32 fid_idif_ost_idx(const struct lu_fid *fid)
{
return idif_ost_idx(fid_seq(fid));
}
/**
* Get inode number from an igif.
* \param fid an igif to get inode number from.
* \return inode number for the igif.
*/
static inline ino_t lu_igif_ino(const struct lu_fid *fid)
{
return fid_seq(fid);
}
/**
* Get inode generation from an igif.
* \param fid an igif to get inode generation from.
* \return inode generation for the igif.
*/
static inline __u32 lu_igif_gen(const struct lu_fid *fid)
{
return fid_oid(fid);
}
/**
* Build igif from the inode number/generation.
*/
static inline void lu_igif_build(struct lu_fid *fid, __u32 ino, __u32 gen)
{
fid->f_seq = ino;
fid->f_oid = gen;
fid->f_ver = 0;
}
/*
* Fids are transmitted across network (in the sender byte-ordering),
* and stored on disk in big-endian order.
*/
static inline void fid_cpu_to_le(struct lu_fid *dst, const struct lu_fid *src)
{
dst->f_seq = __cpu_to_le64(fid_seq(src));
dst->f_oid = __cpu_to_le32(fid_oid(src));
dst->f_ver = __cpu_to_le32(fid_ver(src));
}
static inline void fid_le_to_cpu(struct lu_fid *dst, const struct lu_fid *src)
{
dst->f_seq = __le64_to_cpu(fid_seq(src));
dst->f_oid = __le32_to_cpu(fid_oid(src));
dst->f_ver = __le32_to_cpu(fid_ver(src));
}
static inline void fid_cpu_to_be(struct lu_fid *dst, const struct lu_fid *src)
{
dst->f_seq = __cpu_to_be64(fid_seq(src));
dst->f_oid = __cpu_to_be32(fid_oid(src));
dst->f_ver = __cpu_to_be32(fid_ver(src));
}
static inline void fid_be_to_cpu(struct lu_fid *dst, const struct lu_fid *src)
{
dst->f_seq = __be64_to_cpu(fid_seq(src));
dst->f_oid = __be32_to_cpu(fid_oid(src));
dst->f_ver = __be32_to_cpu(fid_ver(src));
}
static inline bool fid_is_sane(const struct lu_fid *fid)
{
return fid && ((fid_seq(fid) >= FID_SEQ_START && !fid_ver(fid)) ||
fid_is_igif(fid) || fid_is_idif(fid) ||
fid_seq_is_rsvd(fid_seq(fid)));
}
static inline bool lu_fid_eq(const struct lu_fid *f0, const struct lu_fid *f1)
{
return !memcmp(f0, f1, sizeof(*f0));
}
static inline int lu_fid_cmp(const struct lu_fid *f0,
const struct lu_fid *f1)
{
if (fid_seq(f0) != fid_seq(f1))
return fid_seq(f0) > fid_seq(f1) ? 1 : -1;
if (fid_oid(f0) != fid_oid(f1))
return fid_oid(f0) > fid_oid(f1) ? 1 : -1;
if (fid_ver(f0) != fid_ver(f1))
return fid_ver(f0) > fid_ver(f1) ? 1 : -1;
return 0;
}
#endif
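
The IDIF helpers above (fid_idif_seq(), fid_idif_id(), idif_ost_idx()) pack a 48-bit object id and a 16-bit OST index into a FID and recover them again. A minimal userspace sketch of that round trip follows. The struct and function names are hypothetical, and the value used for FID_SEQ_IDIF is an assumption, since the constant is defined outside this excerpt.

```c
#include <stdint.h>

/* Assumed value; FID_SEQ_IDIF itself is defined outside this excerpt. */
#define SKETCH_FID_SEQ_IDIF 0x100000000ULL

/* Userspace stand-in for struct lu_fid. */
struct fid_sketch {
	uint64_t f_seq;
	uint32_t f_oid;
	uint32_t f_ver;
};

/* Pack an object id and an OST index (at most 0xffff) into a FID. */
static inline struct fid_sketch idif_pack(uint64_t id, uint32_t ost_idx)
{
	struct fid_sketch fid;

	/* bits 32..47 of the id go into the low 16 bits of the sequence */
	fid.f_seq = SKETCH_FID_SEQ_IDIF | ((uint64_t)ost_idx << 16) |
		    ((id >> 32) & 0xffff);
	fid.f_oid = (uint32_t)id;		/* low 32 bits of the id */
	fid.f_ver = (uint32_t)(id >> 48);	/* zero for ids below 2^48 */
	return fid;
}

/* Recover the object id from a packed FID. */
static inline uint64_t idif_unpack_id(const struct fid_sketch *fid)
{
	return ((uint64_t)fid->f_ver << 48) |
	       ((fid->f_seq & 0xffff) << 32) | fid->f_oid;
}

/* Recover the OST index from a packed FID. */
static inline uint32_t idif_unpack_ost(const struct fid_sketch *fid)
{
	return (uint32_t)((fid->f_seq >> 16) & 0xffff);
}
```

Under the assumed constant, packing id 0x123456789abc for OST 7 gives a sequence of 0x100071234 and an oid of 0x56789abc, and both inputs can be read back unchanged.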


@ -1,72 +0,0 @@
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2008, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2014, 2015, Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Sun Microsystems, Inc.
*
* FIEMAP data structures and flags. This header file will be used until
* fiemap.h is available in the upstream kernel.
*
* Author: Kalpak Shah <kalpak.shah@sun.com>
* Author: Andreas Dilger <adilger@sun.com>
*/
#ifndef _LUSTRE_FIEMAP_H
#define _LUSTRE_FIEMAP_H
#include <stddef.h>
#include <linux/fiemap.h>
/* XXX: We use fiemap_extent::fe_reserved[0] */
#define fe_device fe_reserved[0]
static inline size_t fiemap_count_to_size(size_t extent_count)
{
return sizeof(struct fiemap) + extent_count *
sizeof(struct fiemap_extent);
}
static inline unsigned int fiemap_size_to_count(size_t array_size)
{
return (array_size - sizeof(struct fiemap)) /
sizeof(struct fiemap_extent);
}
#define FIEMAP_FLAG_DEVICE_ORDER 0x40000000 /* return device ordered mapping */
#ifdef FIEMAP_FLAGS_COMPAT
#undef FIEMAP_FLAGS_COMPAT
#endif
/* Lustre specific flags - use a high bit, don't conflict with upstream flag */
#define FIEMAP_EXTENT_NO_DIRECT 0x40000000 /* Data mapping undefined */
#define FIEMAP_EXTENT_NET 0x80000000 /* Data stored remotely.
* Sets NO_DIRECT flag
*/
#endif /* _LUSTRE_FIEMAP_H */
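
The two sizing helpers above are inverses of each other as long as the buffer is at least sizeof(struct fiemap) bytes. Note that fiemap_size_to_count() subtracts without checking, so a smaller array_size would wrap around. A hedged sketch with an explicit guard is shown below; the function names are hypothetical and the code assumes the Linux UAPI header linux/fiemap.h is available.

```c
#include <stddef.h>
#include <linux/fiemap.h>

/* Bytes needed for a struct fiemap followed by extent_count extents. */
static inline size_t fiemap_bytes_for(size_t extent_count)
{
	return sizeof(struct fiemap) +
	       extent_count * sizeof(struct fiemap_extent);
}

/*
 * Number of whole extents that fit after the header in a buffer of
 * array_size bytes. Returns 0 when the buffer cannot even hold the
 * header, instead of letting the unsigned subtraction wrap.
 */
static inline size_t fiemap_extents_in(size_t array_size)
{
	if (array_size < sizeof(struct fiemap))
		return 0;
	return (array_size - sizeof(struct fiemap)) /
	       sizeof(struct fiemap_extent);
}
```

Whether the guard is needed in practice depends on the callers, which are not visible in this excerpt.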

File diff suppressed because it is too large


@ -1,229 +0,0 @@
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2007, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2011, 2015, Intel Corporation.
*/
#ifndef _UAPI_LUSTRE_IOCTL_H_
#define _UAPI_LUSTRE_IOCTL_H_
#include <linux/ioctl.h>
#include <linux/kernel.h>
#include <linux/types.h>
#include <uapi/linux/lustre/lustre_idl.h>
#if !defined(__KERNEL__) && !defined(LUSTRE_UTILS)
# error This file is for Lustre internal use only.
#endif
enum md_echo_cmd {
ECHO_MD_CREATE = 1, /* Open/Create file on MDT */
ECHO_MD_MKDIR = 2, /* Mkdir on MDT */
ECHO_MD_DESTROY = 3, /* Unlink file on MDT */
ECHO_MD_RMDIR = 4, /* Rmdir on MDT */
ECHO_MD_LOOKUP = 5, /* Lookup on MDT */
ECHO_MD_GETATTR = 6, /* Getattr on MDT */
ECHO_MD_SETATTR = 7, /* Setattr on MDT */
ECHO_MD_ALLOC_FID = 8, /* Get FIDs from MDT */
};
#define OBD_DEV_ID 1
#define OBD_DEV_NAME "obd"
#define OBD_DEV_PATH "/dev/" OBD_DEV_NAME
#define OBD_IOCTL_VERSION 0x00010004
#define OBD_DEV_BY_DEVNAME 0xffffd0de
struct obd_ioctl_data {
__u32 ioc_len;
__u32 ioc_version;
union {
__u64 ioc_cookie;
__u64 ioc_u64_1;
};
union {
__u32 ioc_conn1;
__u32 ioc_u32_1;
};
union {
__u32 ioc_conn2;
__u32 ioc_u32_2;
};
struct obdo ioc_obdo1;
struct obdo ioc_obdo2;
__u64 ioc_count;
__u64 ioc_offset;
__u32 ioc_dev;
__u32 ioc_command;
__u64 ioc_nid;
__u32 ioc_nal;
__u32 ioc_type;
/* buffers the kernel will treat as user pointers */
__u32 ioc_plen1;
char __user *ioc_pbuf1;
__u32 ioc_plen2;
char __user *ioc_pbuf2;
/* inline buffers for various arguments */
__u32 ioc_inllen1;
char *ioc_inlbuf1;
__u32 ioc_inllen2;
char *ioc_inlbuf2;
__u32 ioc_inllen3;
char *ioc_inlbuf3;
__u32 ioc_inllen4;
char *ioc_inlbuf4;
char ioc_bulk[0];
};
struct obd_ioctl_hdr {
__u32 ioc_len;
__u32 ioc_version;
};
static inline __u32 obd_ioctl_packlen(struct obd_ioctl_data *data)
{
__u32 len = __ALIGN_KERNEL(sizeof(*data), 8);
len += __ALIGN_KERNEL(data->ioc_inllen1, 8);
len += __ALIGN_KERNEL(data->ioc_inllen2, 8);
len += __ALIGN_KERNEL(data->ioc_inllen3, 8);
len += __ALIGN_KERNEL(data->ioc_inllen4, 8);
return len;
}
/*
* OBD_IOC_DATA_TYPE is only for compatibility reasons with older
* Linux Lustre user tools. New ioctls should NOT use this macro as
* the ioctl "size". Instead the ioctl should get a "size" argument
* which is the actual data type used by the ioctl, to ensure the
* ioctl interface is versioned correctly.
*/
#define OBD_IOC_DATA_TYPE long
/* IOC_LDLM_TEST _IOWR('f', 40, long) */
/* IOC_LDLM_DUMP _IOWR('f', 41, long) */
/* IOC_LDLM_REGRESS_START _IOWR('f', 42, long) */
/* IOC_LDLM_REGRESS_STOP _IOWR('f', 43, long) */
#define OBD_IOC_CREATE _IOWR('f', 101, OBD_IOC_DATA_TYPE)
#define OBD_IOC_DESTROY _IOW('f', 104, OBD_IOC_DATA_TYPE)
/* OBD_IOC_PREALLOCATE _IOWR('f', 105, OBD_IOC_DATA_TYPE) */
#define OBD_IOC_SETATTR _IOW('f', 107, OBD_IOC_DATA_TYPE)
#define OBD_IOC_GETATTR _IOWR('f', 108, OBD_IOC_DATA_TYPE)
#define OBD_IOC_READ _IOWR('f', 109, OBD_IOC_DATA_TYPE)
#define OBD_IOC_WRITE _IOWR('f', 110, OBD_IOC_DATA_TYPE)
#define OBD_IOC_STATFS _IOWR('f', 113, OBD_IOC_DATA_TYPE)
#define OBD_IOC_SYNC _IOW('f', 114, OBD_IOC_DATA_TYPE)
/* OBD_IOC_READ2 _IOWR('f', 115, OBD_IOC_DATA_TYPE) */
/* OBD_IOC_FORMAT _IOWR('f', 116, OBD_IOC_DATA_TYPE) */
/* OBD_IOC_PARTITION _IOWR('f', 117, OBD_IOC_DATA_TYPE) */
/* OBD_IOC_COPY _IOWR('f', 120, OBD_IOC_DATA_TYPE) */
/* OBD_IOC_MIGR _IOWR('f', 121, OBD_IOC_DATA_TYPE) */
/* OBD_IOC_PUNCH _IOWR('f', 122, OBD_IOC_DATA_TYPE) */
/* OBD_IOC_MODULE_DEBUG _IOWR('f', 124, OBD_IOC_DATA_TYPE) */
#define OBD_IOC_BRW_READ _IOWR('f', 125, OBD_IOC_DATA_TYPE)
#define OBD_IOC_BRW_WRITE _IOWR('f', 126, OBD_IOC_DATA_TYPE)
#define OBD_IOC_NAME2DEV _IOWR('f', 127, OBD_IOC_DATA_TYPE)
#define OBD_IOC_UUID2DEV _IOWR('f', 130, OBD_IOC_DATA_TYPE)
#define OBD_IOC_GETNAME _IOWR('f', 131, OBD_IOC_DATA_TYPE)
#define OBD_IOC_GETMDNAME _IOR('f', 131, char[MAX_OBD_NAME])
#define OBD_IOC_GETDTNAME OBD_IOC_GETNAME
#define OBD_IOC_LOV_GET_CONFIG _IOWR('f', 132, OBD_IOC_DATA_TYPE)
#define OBD_IOC_CLIENT_RECOVER _IOW('f', 133, OBD_IOC_DATA_TYPE)
#define OBD_IOC_PING_TARGET _IOW('f', 136, OBD_IOC_DATA_TYPE)
/* OBD_IOC_DEC_FS_USE_COUNT _IO('f', 139) */
#define OBD_IOC_NO_TRANSNO _IOW('f', 140, OBD_IOC_DATA_TYPE)
#define OBD_IOC_SET_READONLY _IOW('f', 141, OBD_IOC_DATA_TYPE)
#define OBD_IOC_ABORT_RECOVERY _IOR('f', 142, OBD_IOC_DATA_TYPE)
/* OBD_IOC_ROOT_SQUASH _IOWR('f', 143, OBD_IOC_DATA_TYPE) */
#define OBD_GET_VERSION _IOWR('f', 144, OBD_IOC_DATA_TYPE)
/* OBD_IOC_GSS_SUPPORT _IOWR('f', 145, OBD_IOC_DATA_TYPE) */
/* OBD_IOC_CLOSE_UUID _IOWR('f', 147, OBD_IOC_DATA_TYPE) */
#define OBD_IOC_CHANGELOG_SEND _IOW('f', 148, OBD_IOC_DATA_TYPE)
#define OBD_IOC_GETDEVICE _IOWR('f', 149, OBD_IOC_DATA_TYPE)
#define OBD_IOC_FID2PATH _IOWR('f', 150, OBD_IOC_DATA_TYPE)
/* lustre/lustre_user.h 151-153 */
/* OBD_IOC_LOV_SETSTRIPE 154 LL_IOC_LOV_SETSTRIPE */
/* OBD_IOC_LOV_GETSTRIPE 155 LL_IOC_LOV_GETSTRIPE */
/* OBD_IOC_LOV_SETEA 156 LL_IOC_LOV_SETEA */
/* lustre/lustre_user.h 157-159 */
/* OBD_IOC_QUOTACHECK _IOW('f', 160, int) */
/* OBD_IOC_POLL_QUOTACHECK _IOR('f', 161, struct if_quotacheck *) */
#define OBD_IOC_QUOTACTL _IOWR('f', 162, struct if_quotactl)
/* lustre/lustre_user.h 163-176 */
#define OBD_IOC_CHANGELOG_REG _IOW('f', 177, struct obd_ioctl_data)
#define OBD_IOC_CHANGELOG_DEREG _IOW('f', 178, struct obd_ioctl_data)
#define OBD_IOC_CHANGELOG_CLEAR _IOW('f', 179, struct obd_ioctl_data)
/* OBD_IOC_RECORD _IOWR('f', 180, OBD_IOC_DATA_TYPE) */
/* OBD_IOC_ENDRECORD _IOWR('f', 181, OBD_IOC_DATA_TYPE) */
/* OBD_IOC_PARSE _IOWR('f', 182, OBD_IOC_DATA_TYPE) */
/* OBD_IOC_DORECORD _IOWR('f', 183, OBD_IOC_DATA_TYPE) */
#define OBD_IOC_PROCESS_CFG _IOWR('f', 184, OBD_IOC_DATA_TYPE)
/* OBD_IOC_DUMP_LOG _IOWR('f', 185, OBD_IOC_DATA_TYPE) */
/* OBD_IOC_CLEAR_LOG _IOWR('f', 186, OBD_IOC_DATA_TYPE) */
#define OBD_IOC_PARAM _IOW('f', 187, OBD_IOC_DATA_TYPE)
#define OBD_IOC_POOL _IOWR('f', 188, OBD_IOC_DATA_TYPE)
#define OBD_IOC_REPLACE_NIDS _IOWR('f', 189, OBD_IOC_DATA_TYPE)
#define OBD_IOC_CATLOGLIST _IOWR('f', 190, OBD_IOC_DATA_TYPE)
#define OBD_IOC_LLOG_INFO _IOWR('f', 191, OBD_IOC_DATA_TYPE)
#define OBD_IOC_LLOG_PRINT _IOWR('f', 192, OBD_IOC_DATA_TYPE)
#define OBD_IOC_LLOG_CANCEL _IOWR('f', 193, OBD_IOC_DATA_TYPE)
#define OBD_IOC_LLOG_REMOVE _IOWR('f', 194, OBD_IOC_DATA_TYPE)
#define OBD_IOC_LLOG_CHECK _IOWR('f', 195, OBD_IOC_DATA_TYPE)
/* OBD_IOC_LLOG_CATINFO _IOWR('f', 196, OBD_IOC_DATA_TYPE) */
#define OBD_IOC_NODEMAP _IOWR('f', 197, OBD_IOC_DATA_TYPE)
/* ECHO_IOC_GET_STRIPE _IOWR('f', 200, OBD_IOC_DATA_TYPE) */
/* ECHO_IOC_SET_STRIPE _IOWR('f', 201, OBD_IOC_DATA_TYPE) */
/* ECHO_IOC_ENQUEUE _IOWR('f', 202, OBD_IOC_DATA_TYPE) */
/* ECHO_IOC_CANCEL _IOWR('f', 203, OBD_IOC_DATA_TYPE) */
#define OBD_IOC_GET_OBJ_VERSION _IOR('f', 210, OBD_IOC_DATA_TYPE)
/* lustre/lustre_user.h 212-217 */
#define OBD_IOC_GET_MNTOPT _IOW('f', 220, mntopt_t)
#define OBD_IOC_ECHO_MD _IOR('f', 221, struct obd_ioctl_data)
#define OBD_IOC_ECHO_ALLOC_SEQ _IOWR('f', 222, struct obd_ioctl_data)
#define OBD_IOC_START_LFSCK _IOWR('f', 230, OBD_IOC_DATA_TYPE)
#define OBD_IOC_STOP_LFSCK _IOW('f', 231, OBD_IOC_DATA_TYPE)
#define OBD_IOC_QUERY_LFSCK _IOR('f', 232, struct obd_ioctl_data)
/* lustre/lustre_user.h 240-249 */
/* LIBCFS_IOC_DEBUG_MASK 250 */
#define IOC_OSC_SET_ACTIVE _IOWR('h', 21, void *)
#endif /* _UAPI_LUSTRE_IOCTL_H_ */
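
The comment on OBD_IOC_DATA_TYPE above warns that the "size" encoded in these ioctl numbers is sizeof(long), not the size of the data actually passed. The sketch below decodes one such number with the standard Linux _IOC_* macros to make that visible. SKETCH_IOC_CREATE and the helper names are hypothetical; only the macro shape is taken from the header.

```c
#include <linux/ioctl.h>

/* Hypothetical command with the same shape as _IOWR('f', 101, long). */
#define SKETCH_IOC_CREATE _IOWR('f', 101, long)

/* Magic character ('f' for the commands above). */
static inline unsigned int sketch_ioc_type(unsigned int cmd)
{
	return _IOC_TYPE(cmd);
}

/* Command number within the magic (101 here). */
static inline unsigned int sketch_ioc_nr(unsigned int cmd)
{
	return _IOC_NR(cmd);
}

/* Encoded argument size: sizeof(long), not the real payload size. */
static inline unsigned int sketch_ioc_size(unsigned int cmd)
{
	return _IOC_SIZE(cmd);
}

/* Transfer direction bits (_IOC_READ and/or _IOC_WRITE). */
static inline unsigned int sketch_ioc_dir(unsigned int cmd)
{
	return _IOC_DIR(cmd);
}
```

Since sizeof(long) differs between 32-bit and 64-bit userspace, the same command name can encode to different numbers on each, which is one reason the header asks new ioctls to use the real data type instead.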


@ -1,94 +0,0 @@
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2009, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2013, Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
*
* Author: Nathan Rutman <nathan.rutman@sun.com>
*
* Kernel <-> userspace communication routines.
* The definitions below are used in the kernel and userspace.
*/
#ifndef __UAPI_LUSTRE_KERNELCOMM_H__
#define __UAPI_LUSTRE_KERNELCOMM_H__
#include <linux/types.h>
/* KUC message header.
* All current and future KUC messages should use this header.
* To avoid having to include Lustre headers from libcfs, define this here.
*/
struct kuc_hdr {
__u16 kuc_magic;
/* Each new Lustre feature should use a different transport */
__u8 kuc_transport;
__u8 kuc_flags;
/* Message type or opcode, transport-specific */
__u16 kuc_msgtype;
/* Including header */
__u16 kuc_msglen;
} __aligned(sizeof(__u64));
#define KUC_CHANGELOG_MSG_MAXSIZE (sizeof(struct kuc_hdr) + CR_MAXSIZE)
#define KUC_MAGIC 0x191C /*Lustre9etLinC */
/* kuc_msgtype values are defined in each transport */
enum kuc_transport_type {
KUC_TRANSPORT_GENERIC = 1,
KUC_TRANSPORT_HSM = 2,
KUC_TRANSPORT_CHANGELOG = 3,
};
enum kuc_generic_message_type {
KUC_MSG_SHUTDOWN = 1,
};
/* KUC Broadcast Groups. This determines which userspace process hears which
* messages. Multiple transports may be used within a group, or multiple
* groups may use the same transport. Broadcast
* groups need not be used if e.g. a UID is specified instead;
* use group 0 to signify unicast.
*/
#define KUC_GRP_HSM 0x02
#define KUC_GRP_MAX KUC_GRP_HSM
#define LK_FLG_STOP 0x01
#define LK_NOFD -1U
/* kernelcomm control structure, passed from userspace to kernel */
struct lustre_kernelcomm {
__u32 lk_wfd;
__u32 lk_rfd;
__u32 lk_uid;
__u32 lk_group;
__u32 lk_data;
__u32 lk_flags;
} __packed;
#endif /* __UAPI_LUSTRE_KERNELCOMM_H__ */
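
A receiver of KUC messages would typically validate the header before trusting kuc_msglen. The following is a minimal sketch of such a check, not the validation Lustre itself performs. struct kuc_hdr_sketch is a hypothetical userspace mirror of struct kuc_hdr, and kuc_hdr_check() is an illustrative name.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_KUC_MAGIC 0x191C	/* same value as KUC_MAGIC above */

/* Userspace mirror of struct kuc_hdr using fixed-width types. */
struct kuc_hdr_sketch {
	uint16_t kuc_magic;
	uint8_t  kuc_transport;
	uint8_t  kuc_flags;
	uint16_t kuc_msgtype;
	uint16_t kuc_msglen;	/* including header */
} __attribute__((aligned(8)));

/*
 * Returns 0 if the header is plausible for a buffer of buf_len bytes,
 * -1 otherwise. The message length must cover at least the header and
 * must not exceed the buffer.
 */
static inline int kuc_hdr_check(const struct kuc_hdr_sketch *hdr,
				size_t buf_len)
{
	if (buf_len < sizeof(*hdr))
		return -1;
	if (hdr->kuc_magic != SKETCH_KUC_MAGIC)
		return -1;
	if (hdr->kuc_msglen < sizeof(*hdr) || hdr->kuc_msglen > buf_len)
		return -1;
	return 0;
}
```

The four fields add up to 8 bytes, so the 8-byte alignment adds no padding here.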


@ -1,236 +0,0 @@
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2007, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2011, 2014, Intel Corporation.
*
* Copyright 2015 Cray Inc, all rights reserved.
* Author: Ben Evans.
*
* Define ost_id associated functions
*/
#ifndef _UAPI_LUSTRE_OSTID_H_
#define _UAPI_LUSTRE_OSTID_H_
#include <linux/errno.h>
#include <uapi/linux/lustre/lustre_fid.h>
static inline __u64 lmm_oi_id(const struct ost_id *oi)
{
return oi->oi.oi_id;
}
static inline __u64 lmm_oi_seq(const struct ost_id *oi)
{
return oi->oi.oi_seq;
}
static inline void lmm_oi_set_seq(struct ost_id *oi, __u64 seq)
{
oi->oi.oi_seq = seq;
}
static inline void lmm_oi_set_id(struct ost_id *oi, __u64 oid)
{
oi->oi.oi_id = oid;
}
static inline void lmm_oi_le_to_cpu(struct ost_id *dst_oi,
const struct ost_id *src_oi)
{
dst_oi->oi.oi_id = __le64_to_cpu(src_oi->oi.oi_id);
dst_oi->oi.oi_seq = __le64_to_cpu(src_oi->oi.oi_seq);
}
static inline void lmm_oi_cpu_to_le(struct ost_id *dst_oi,
const struct ost_id *src_oi)
{
dst_oi->oi.oi_id = __cpu_to_le64(src_oi->oi.oi_id);
dst_oi->oi.oi_seq = __cpu_to_le64(src_oi->oi.oi_seq);
}
/* extract OST sequence (group) from a wire ost_id (id/seq) pair */
static inline __u64 ostid_seq(const struct ost_id *ostid)
{
if (fid_seq_is_mdt0(ostid->oi.oi_seq))
return FID_SEQ_OST_MDT0;
if (fid_seq_is_default(ostid->oi.oi_seq))
return FID_SEQ_LOV_DEFAULT;
if (fid_is_idif(&ostid->oi_fid))
return FID_SEQ_OST_MDT0;
return fid_seq(&ostid->oi_fid);
}
/* extract OST objid from a wire ost_id (id/seq) pair */
static inline __u64 ostid_id(const struct ost_id *ostid)
{
if (fid_seq_is_mdt0(ostid->oi.oi_seq))
return ostid->oi.oi_id & IDIF_OID_MASK;
if (fid_seq_is_default(ostid->oi.oi_seq))
return ostid->oi.oi_id;
if (fid_is_idif(&ostid->oi_fid))
return fid_idif_id(fid_seq(&ostid->oi_fid),
fid_oid(&ostid->oi_fid), 0);
return fid_oid(&ostid->oi_fid);
}
static inline void ostid_set_seq(struct ost_id *oi, __u64 seq)
{
if (fid_seq_is_mdt0(seq) || fid_seq_is_default(seq)) {
oi->oi.oi_seq = seq;
} else {
oi->oi_fid.f_seq = seq;
/*
* Note: if f_oid and f_ver are both zero, we need to initialize
* f_oid to 1; otherwise ostid_seq() will treat this as an old
* ostid (oi_seq == 0).
*/
if (!oi->oi_fid.f_oid && !oi->oi_fid.f_ver)
oi->oi_fid.f_oid = LUSTRE_FID_INIT_OID;
}
}
static inline void ostid_set_seq_mdt0(struct ost_id *oi)
{
ostid_set_seq(oi, FID_SEQ_OST_MDT0);
}
static inline void ostid_set_seq_echo(struct ost_id *oi)
{
ostid_set_seq(oi, FID_SEQ_ECHO);
}
static inline void ostid_set_seq_llog(struct ost_id *oi)
{
ostid_set_seq(oi, FID_SEQ_LLOG);
}
static inline void ostid_cpu_to_le(const struct ost_id *src_oi,
struct ost_id *dst_oi)
{
if (fid_seq_is_mdt0(src_oi->oi.oi_seq)) {
dst_oi->oi.oi_id = __cpu_to_le64(src_oi->oi.oi_id);
dst_oi->oi.oi_seq = __cpu_to_le64(src_oi->oi.oi_seq);
} else {
fid_cpu_to_le(&dst_oi->oi_fid, &src_oi->oi_fid);
}
}
static inline void ostid_le_to_cpu(const struct ost_id *src_oi,
struct ost_id *dst_oi)
{
if (fid_seq_is_mdt0(src_oi->oi.oi_seq)) {
dst_oi->oi.oi_id = __le64_to_cpu(src_oi->oi.oi_id);
dst_oi->oi.oi_seq = __le64_to_cpu(src_oi->oi.oi_seq);
} else {
fid_le_to_cpu(&dst_oi->oi_fid, &src_oi->oi_fid);
}
}
/**
* Sigh, because pre-2.4 uses
* struct lov_mds_md_v1 {
* ........
* __u64 lmm_object_id;
* __u64 lmm_object_seq;
* ......
* }
* to identify the LOV(MDT) object, and lmm_object_seq will
* be normal_fid, which makes it hard to combine these conversions
* with the ostid-to-FID conversion, so we do the lmm_oi/fid conversion
* separately.
*
* We can tell the lmm_oi formats apart this way:
* 1.8: lmm_object_id = {inode}, lmm_object_gr = 0
* 2.1: lmm_object_id = {oid < 128k}, lmm_object_seq = FID_SEQ_NORMAL
* 2.4: lmm_oi.f_seq = FID_SEQ_NORMAL, lmm_oi.f_oid = {oid < 128k},
* lmm_oi.f_ver = 0
*
* But currently lmm_oi/lsm_oi does not have any "real" usages,
* except for printing some information, and the user can always
* get the real FID from LMA. Besides, this multiple-case check might
* make swab more complicated. So we will keep using id/seq for lmm_oi.
*/
static inline void fid_to_lmm_oi(const struct lu_fid *fid,
struct ost_id *oi)
{
oi->oi.oi_id = fid_oid(fid);
oi->oi.oi_seq = fid_seq(fid);
}
/**
* Unpack an OST object id/seq (group) into a FID. This is needed for
* converting all obdo, lmm, lsm, etc. 64-bit id/seq pairs into proper
* FIDs. Note that if an id/seq is already in FID/IDIF format it will
* be passed through unchanged. Only legacy OST objects in "group 0"
* will be mapped into the IDIF namespace so that they can fit into the
* struct lu_fid fields without loss.
*/
static inline int ostid_to_fid(struct lu_fid *fid, const struct ost_id *ostid,
__u32 ost_idx)
{
__u64 seq = ostid_seq(ostid);
if (ost_idx > 0xffff)
return -EBADF;
if (fid_seq_is_mdt0(seq)) {
__u64 oid = ostid_id(ostid);
/* This is a "legacy" (old 1.x/2.early) OST object in "group 0"
* that we map into the IDIF namespace. It allows up to 2^48
* objects per OST, as this is the object namespace that has
* been in production for years. This can handle create rates
* of 1M objects/s/OST for 9 years, or combinations thereof.
*/
if (oid >= IDIF_MAX_OID)
return -EBADF;
fid->f_seq = fid_idif_seq(oid, ost_idx);
/* truncate to 32 bits by assignment */
fid->f_oid = oid;
/* in theory, not currently used */
fid->f_ver = oid >> 48;
} else if (!fid_seq_is_default(seq)) {
/* This is either an IDIF object, which identifies objects
* across all OSTs, or a regular FID. The IDIF namespace
* maps legacy OST objects into the FID namespace. In both
* cases, we just pass the FID through, no conversion needed.
*/
if (ostid->oi_fid.f_ver)
return -EBADF;
*fid = ostid->oi_fid;
}
return 0;
}
#endif /* _UAPI_LUSTRE_OSTID_H_ */


@ -1,94 +0,0 @@
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2007, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2011, 2015, Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Sun Microsystems, Inc.
*
* User-settable parameter keys
*
* Author: Nathan Rutman <nathan@clusterfs.com>
*/
#ifndef _UAPI_LUSTRE_PARAM_H_
#define _UAPI_LUSTRE_PARAM_H_
/** \defgroup param param
*
* @{
*/
/****************** User-settable parameter keys *********************/
/* e.g.
* tunefs.lustre --param="failover.node=192.168.0.13@tcp0" /dev/sda
* lctl conf_param testfs-OST0000 failover.node=3@elan,192.168.0.3@tcp0
* ... testfs-MDT0000.lov.stripesize=4M
* ... testfs-OST0000.ost.client_cache_seconds=15
* ... testfs.sys.timeout=<secs>
* ... testfs.llite.max_read_ahead_mb=16
*/
/* System global or special params not handled in obd's proc
* See mgs_write_log_sys()
*/
#define PARAM_TIMEOUT "timeout=" /* global */
#define PARAM_LDLM_TIMEOUT "ldlm_timeout=" /* global */
#define PARAM_AT_MIN "at_min=" /* global */
#define PARAM_AT_MAX "at_max=" /* global */
#define PARAM_AT_EXTRA "at_extra=" /* global */
#define PARAM_AT_EARLY_MARGIN "at_early_margin=" /* global */
#define PARAM_AT_HISTORY "at_history=" /* global */
#define PARAM_JOBID_VAR "jobid_var=" /* global */
#define PARAM_MGSNODE "mgsnode=" /* only at mounttime */
#define PARAM_FAILNODE "failover.node=" /* add failover nid */
#define PARAM_FAILMODE "failover.mode=" /* initial mount only */
#define PARAM_ACTIVE "active=" /* activate/deactivate */
#define PARAM_NETWORK "network=" /* bind on nid */
#define PARAM_ID_UPCALL "identity_upcall=" /* identity upcall */
/* Prefixes for parameters handled by obd's proc methods (XXX_process_config) */
#define PARAM_OST "ost."
#define PARAM_OSD "osd."
#define PARAM_OSC "osc."
#define PARAM_MDT "mdt."
#define PARAM_HSM "mdt.hsm."
#define PARAM_MDD "mdd."
#define PARAM_MDC "mdc."
#define PARAM_LLITE "llite."
#define PARAM_LOV "lov."
#define PARAM_LOD "lod."
#define PARAM_OSP "osp."
#define PARAM_SYS "sys." /* global */
#define PARAM_SRPC "srpc."
#define PARAM_SRPC_FLVR "srpc.flavor."
#define PARAM_SRPC_UDESC "srpc.udesc.cli2mdt"
#define PARAM_SEC "security."
#define PARAM_QUOTA "quota." /* global */
/** @} param */
#endif /* _UAPI_LUSTRE_PARAM_H_ */

File diff suppressed because it is too large


@ -1,27 +0,0 @@
#ifndef _LUSTRE_VER_H_
#define _LUSTRE_VER_H_
#define LUSTRE_MAJOR 2
#define LUSTRE_MINOR 6
#define LUSTRE_PATCH 99
#define LUSTRE_FIX 0
#define LUSTRE_VERSION_STRING "2.6.99"
#define OBD_OCD_VERSION(major, minor, patch, fix) \
(((major) << 24) + ((minor) << 16) + ((patch) << 8) + (fix))
#define OBD_OCD_VERSION_MAJOR(version) ((int)((version) >> 24) & 255)
#define OBD_OCD_VERSION_MINOR(version) ((int)((version) >> 16) & 255)
#define OBD_OCD_VERSION_PATCH(version) ((int)((version) >> 8) & 255)
#define OBD_OCD_VERSION_FIX(version) ((int)((version) >> 0) & 255)
#define LUSTRE_VERSION_CODE \
OBD_OCD_VERSION(LUSTRE_MAJOR, LUSTRE_MINOR, LUSTRE_PATCH, LUSTRE_FIX)
/*
* If the Lustre version of a client and of the servers it connects to
* differ by more than this amount, the client issues a warning.
*/
#define LUSTRE_VERSION_OFFSET_WARN OBD_OCD_VERSION(0, 4, 0, 0)
#endif
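
The version macros above pack four 8-bit fields into one integer, most significant first. A short sketch of the encoding and of one possible reading of the LUSTRE_VERSION_OFFSET_WARN comparison follows. The SKETCH_ names and version_gap_warn() are hypothetical, and how the kernel actually applies the offset is not shown in this excerpt.

```c
/* Same packing as OBD_OCD_VERSION(): one byte per component. */
#define SKETCH_VERSION(major, minor, patch, fix) \
	(((major) << 24) + ((minor) << 16) + ((patch) << 8) + (fix))

#define SKETCH_VERSION_MAJOR(v) ((int)((v) >> 24) & 255)
#define SKETCH_VERSION_MINOR(v) ((int)((v) >> 16) & 255)
#define SKETCH_VERSION_PATCH(v) ((int)((v) >> 8) & 255)
#define SKETCH_VERSION_FIX(v)   ((int)((v) >> 0) & 255)

/* Same threshold as LUSTRE_VERSION_OFFSET_WARN: four minor versions. */
#define SKETCH_OFFSET_WARN SKETCH_VERSION(0, 4, 0, 0)

/*
 * Assumed interpretation: warn when the absolute difference between
 * the two packed versions exceeds the threshold. Returns 1 or 0.
 */
static inline int version_gap_warn(unsigned int client, unsigned int server)
{
	unsigned int gap = client > server ? client - server
					   : server - client;

	return gap > SKETCH_OFFSET_WARN;
}
```

With this packing, version 2.6.99.0 encodes to 0x02066300, so comparing packed values orders versions the same way as comparing them component by component.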


@ -1,46 +0,0 @@
config LNET
tristate "Lustre networking subsystem (LNet)"
depends on INET
help
The Lustre network layer, also known as LNet, is a networking abstraction
level API that was initially created to allow Lustre Filesystem to utilize
very different networks like tcp and ib verbs in a uniform way. In the
case of Lustre routers only the LNet layer is required. Lately other
projects are also looking into using LNet as their networking API as well.
config LNET_MAX_PAYLOAD
int "Lustre lnet max transfer payload (default 1MB)"
depends on LNET
default "1048576"
help
This option defines the maximum size of payload in bytes that lnet
can put into its transport.
If unsure, use default.
config LNET_SELFTEST
tristate "Lustre networking self testing"
depends on LNET
help
Choose Y here if you want to do lnet self testing. To compile this
as a module, choose M here: the module will be called lnet_selftest.
If unsure, say N.
See also http://wiki.lustre.org/
config LNET_XPRT_IB
tristate "LNET infiniband support"
depends on LNET && PCI && INFINIBAND && INFINIBAND_ADDR_TRANS
default LNET && INFINIBAND
help
This option allows the LNET users to use infiniband as an
RDMA-enabled transport.
To compile this as a kernel module, choose M here and it will be
called ko2iblnd.
If unsure, say N.


@ -1 +0,0 @@
obj-$(CONFIG_LNET) += libcfs/ lnet/ klnds/ selftest/


@ -1 +0,0 @@
obj-$(CONFIG_LNET) += o2iblnd/ socklnd/


@ -1,5 +0,0 @@
subdir-ccflags-y += -I$(srctree)/drivers/staging/lustre/include
subdir-ccflags-y += -I$(srctree)/drivers/staging/lustre/lustre/include
obj-$(CONFIG_LNET_XPRT_IB) += ko2iblnd.o
ko2iblnd-y := o2iblnd.o o2iblnd_cb.o o2iblnd_modparams.o

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large


@ -1,296 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2007, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2012, Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Sun Microsystems, Inc.
*
* lnet/klnds/o2iblnd/o2iblnd_modparams.c
*
* Author: Eric Barton <eric@bartonsoftware.com>
*/
#include "o2iblnd.h"
static int service = 987;
module_param(service, int, 0444);
MODULE_PARM_DESC(service, "service number (within RDMA_PS_TCP)");
static int cksum;
module_param(cksum, int, 0644);
MODULE_PARM_DESC(cksum, "set non-zero to enable message (not RDMA) checksums");
static int timeout = 50;
module_param(timeout, int, 0644);
MODULE_PARM_DESC(timeout, "timeout (seconds)");
/*
* Number of threads in each scheduler pool (pools are per-CPT).
* If set to zero, a reasonable value is estimated from the number of CPUs.
*/
static int nscheds;
module_param(nscheds, int, 0444);
MODULE_PARM_DESC(nscheds, "number of threads in each scheduler pool");
static unsigned int conns_per_peer = 1;
module_param(conns_per_peer, uint, 0444);
MODULE_PARM_DESC(conns_per_peer, "number of connections per peer");
/* NB: this value is shared by all CPTs, it can grow at runtime */
static int ntx = 512;
module_param(ntx, int, 0444);
MODULE_PARM_DESC(ntx, "# of message descriptors allocated for each pool");
/* NB: this value is shared by all CPTs */
static int credits = 256;
module_param(credits, int, 0444);
MODULE_PARM_DESC(credits, "# concurrent sends");
static int peer_credits = 8;
module_param(peer_credits, int, 0444);
MODULE_PARM_DESC(peer_credits, "# concurrent sends to 1 peer");
static int peer_credits_hiw;
module_param(peer_credits_hiw, int, 0444);
MODULE_PARM_DESC(peer_credits_hiw, "when to eagerly return credits");
static int peer_buffer_credits;
module_param(peer_buffer_credits, int, 0444);
MODULE_PARM_DESC(peer_buffer_credits, "# per-peer router buffer credits");
static int peer_timeout = 180;
module_param(peer_timeout, int, 0444);
MODULE_PARM_DESC(peer_timeout, "Seconds without aliveness news to declare peer dead (<=0 to disable)");
static char *ipif_name = "ib0";
module_param(ipif_name, charp, 0444);
MODULE_PARM_DESC(ipif_name, "IPoIB interface name");
static int retry_count = 5;
module_param(retry_count, int, 0644);
MODULE_PARM_DESC(retry_count, "Retransmissions when no ACK received");
static int rnr_retry_count = 6;
module_param(rnr_retry_count, int, 0644);
MODULE_PARM_DESC(rnr_retry_count, "RNR retransmissions");
static int keepalive = 100;
module_param(keepalive, int, 0644);
MODULE_PARM_DESC(keepalive, "Idle time in seconds before sending a keepalive");
static int ib_mtu;
module_param(ib_mtu, int, 0444);
MODULE_PARM_DESC(ib_mtu, "IB MTU 256/512/1024/2048/4096");
static int concurrent_sends;
module_param(concurrent_sends, int, 0444);
MODULE_PARM_DESC(concurrent_sends, "send work-queue sizing");
#define IBLND_DEFAULT_MAP_ON_DEMAND IBLND_MAX_RDMA_FRAGS
static int map_on_demand = IBLND_DEFAULT_MAP_ON_DEMAND;
module_param(map_on_demand, int, 0444);
MODULE_PARM_DESC(map_on_demand, "map on demand");
/* NB: this value is shared by all CPTs, it can grow at runtime */
static int fmr_pool_size = 512;
module_param(fmr_pool_size, int, 0444);
MODULE_PARM_DESC(fmr_pool_size, "size of fmr pool on each CPT (>= ntx / 4)");
/* NB: this value is shared by all CPTs, it can grow at runtime */
static int fmr_flush_trigger = 384;
module_param(fmr_flush_trigger, int, 0444);
MODULE_PARM_DESC(fmr_flush_trigger, "# dirty FMRs that triggers pool flush");
static int fmr_cache = 1;
module_param(fmr_cache, int, 0444);
MODULE_PARM_DESC(fmr_cache, "non-zero to enable FMR caching");
/*
* 0: disable failover
* 1: enable failover if necessary
* 2: force to failover (for debug)
*/
static int dev_failover;
module_param(dev_failover, int, 0444);
MODULE_PARM_DESC(dev_failover, "HCA failover for bonding (0 off, 1 on, other values reserved)");
static int require_privileged_port;
module_param(require_privileged_port, int, 0644);
MODULE_PARM_DESC(require_privileged_port, "require privileged port when accepting connection");
static int use_privileged_port = 1;
module_param(use_privileged_port, int, 0644);
MODULE_PARM_DESC(use_privileged_port, "use privileged port when initiating connection");
struct kib_tunables kiblnd_tunables = {
.kib_dev_failover = &dev_failover,
.kib_service = &service,
.kib_cksum = &cksum,
.kib_timeout = &timeout,
.kib_keepalive = &keepalive,
.kib_ntx = &ntx,
.kib_default_ipif = &ipif_name,
.kib_retry_count = &retry_count,
.kib_rnr_retry_count = &rnr_retry_count,
.kib_ib_mtu = &ib_mtu,
.kib_require_priv_port = &require_privileged_port,
.kib_use_priv_port = &use_privileged_port,
.kib_nscheds = &nscheds
};
static struct lnet_ioctl_config_o2iblnd_tunables default_tunables;
/* # messages/RDMAs in-flight */
int kiblnd_msg_queue_size(int version, struct lnet_ni *ni)
{
if (version == IBLND_MSG_VERSION_1)
return IBLND_MSG_QUEUE_SIZE_V1;
else if (ni)
return ni->ni_peertxcredits;
else
return peer_credits;
}
int kiblnd_tunables_setup(struct lnet_ni *ni)
{
struct lnet_ioctl_config_o2iblnd_tunables *tunables;
/*
* If no tunables were specified, set them up from the
* defaults.
*/
if (!ni->ni_lnd_tunables) {
ni->ni_lnd_tunables = kzalloc(sizeof(*ni->ni_lnd_tunables),
GFP_NOFS);
if (!ni->ni_lnd_tunables)
return -ENOMEM;
memcpy(&ni->ni_lnd_tunables->lt_tun_u.lt_o2ib,
&default_tunables, sizeof(*tunables));
}
tunables = &ni->ni_lnd_tunables->lt_tun_u.lt_o2ib;
/* Current API version */
tunables->lnd_version = 0;
if (kiblnd_translate_mtu(*kiblnd_tunables.kib_ib_mtu) < 0) {
CERROR("Invalid ib_mtu %d, expected 256/512/1024/2048/4096\n",
*kiblnd_tunables.kib_ib_mtu);
return -EINVAL;
}
if (!ni->ni_peertimeout)
ni->ni_peertimeout = peer_timeout;
if (!ni->ni_maxtxcredits)
ni->ni_maxtxcredits = credits;
if (!ni->ni_peertxcredits)
ni->ni_peertxcredits = peer_credits;
if (!ni->ni_peerrtrcredits)
ni->ni_peerrtrcredits = peer_buffer_credits;
if (ni->ni_peertxcredits < IBLND_CREDITS_DEFAULT)
ni->ni_peertxcredits = IBLND_CREDITS_DEFAULT;
if (ni->ni_peertxcredits > IBLND_CREDITS_MAX)
ni->ni_peertxcredits = IBLND_CREDITS_MAX;
if (ni->ni_peertxcredits > credits)
ni->ni_peertxcredits = credits;
if (!tunables->lnd_peercredits_hiw)
tunables->lnd_peercredits_hiw = peer_credits_hiw;
if (tunables->lnd_peercredits_hiw < ni->ni_peertxcredits / 2)
tunables->lnd_peercredits_hiw = ni->ni_peertxcredits / 2;
if (tunables->lnd_peercredits_hiw >= ni->ni_peertxcredits)
tunables->lnd_peercredits_hiw = ni->ni_peertxcredits - 1;
if (tunables->lnd_map_on_demand <= 0 ||
tunables->lnd_map_on_demand > IBLND_MAX_RDMA_FRAGS) {
/* Use the default */
CWARN("Invalid map_on_demand (%d), expects 1 - %d. Using default of %d\n",
tunables->lnd_map_on_demand,
IBLND_MAX_RDMA_FRAGS, IBLND_DEFAULT_MAP_ON_DEMAND);
tunables->lnd_map_on_demand = IBLND_DEFAULT_MAP_ON_DEMAND;
}
if (tunables->lnd_map_on_demand == 1) {
/* it doesn't make sense to create a map for only one fragment */
tunables->lnd_map_on_demand = 2;
}
if (!tunables->lnd_concurrent_sends) {
if (tunables->lnd_map_on_demand > 0 &&
tunables->lnd_map_on_demand <= IBLND_MAX_RDMA_FRAGS / 8) {
tunables->lnd_concurrent_sends =
ni->ni_peertxcredits * 2;
} else {
tunables->lnd_concurrent_sends = ni->ni_peertxcredits;
}
}
if (tunables->lnd_concurrent_sends > ni->ni_peertxcredits * 2)
tunables->lnd_concurrent_sends = ni->ni_peertxcredits * 2;
if (tunables->lnd_concurrent_sends < ni->ni_peertxcredits / 2)
tunables->lnd_concurrent_sends = ni->ni_peertxcredits / 2;
if (tunables->lnd_concurrent_sends < ni->ni_peertxcredits) {
CWARN("Concurrent sends %d is lower than message queue size: %d, performance may drop slightly.\n",
tunables->lnd_concurrent_sends, ni->ni_peertxcredits);
}
if (!tunables->lnd_fmr_pool_size)
tunables->lnd_fmr_pool_size = fmr_pool_size;
if (!tunables->lnd_fmr_flush_trigger)
tunables->lnd_fmr_flush_trigger = fmr_flush_trigger;
if (!tunables->lnd_fmr_cache)
tunables->lnd_fmr_cache = fmr_cache;
if (!tunables->lnd_conns_per_peer) {
tunables->lnd_conns_per_peer = (conns_per_peer) ?
conns_per_peer : 1;
}
return 0;
}
void kiblnd_tunables_init(void)
{
default_tunables.lnd_version = 0;
default_tunables.lnd_peercredits_hiw = peer_credits_hiw;
default_tunables.lnd_map_on_demand = map_on_demand;
default_tunables.lnd_concurrent_sends = concurrent_sends;
default_tunables.lnd_fmr_pool_size = fmr_pool_size;
default_tunables.lnd_fmr_flush_trigger = fmr_flush_trigger;
default_tunables.lnd_fmr_cache = fmr_cache;
default_tunables.lnd_conns_per_peer = conns_per_peer;
}
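
The credit handling in kiblnd_tunables_setup() above is a fixed sequence of clamps: per-peer credits get a floor, a ceiling, and a cap at the global credit budget, and the high-water mark is then forced into the range [peer credits / 2, peer credits - 1]. A minimal user-space sketch of that ordering follows. The `sketch_` names are hypothetical, and the two constants are assumed stand-ins for IBLND_CREDITS_DEFAULT and IBLND_CREDITS_MAX, whose real definitions are in o2iblnd.h and do not appear in this diff.

```c
#include <assert.h>

/* Assumed placeholder values; the real ones come from o2iblnd.h (not shown). */
#define SKETCH_CREDITS_DEFAULT 8
#define SKETCH_CREDITS_MAX     255

struct sketch_credits {
	int peer_tx; /* plays the role of ni->ni_peertxcredits */
	int hiw;     /* plays the role of tunables->lnd_peercredits_hiw */
};

/*
 * Apply the clamps in the same order as the source: floor, ceiling,
 * cap at the global budget, then fit the high-water mark.
 * Inputs are assumed to have had their zero-means-default step applied.
 */
static struct sketch_credits
sketch_normalize(int peer_tx, int hiw, int credits)
{
	struct sketch_credits out;

	if (peer_tx < SKETCH_CREDITS_DEFAULT)
		peer_tx = SKETCH_CREDITS_DEFAULT;
	if (peer_tx > SKETCH_CREDITS_MAX)
		peer_tx = SKETCH_CREDITS_MAX;
	if (peer_tx > credits)
		peer_tx = credits;

	if (hiw < peer_tx / 2)
		hiw = peer_tx / 2;
	if (hiw >= peer_tx)
		hiw = peer_tx - 1;

	out.peer_tx = peer_tx;
	out.hiw = hiw;
	return out;
}

/* Small self-check; call it from a test harness or main(). */
static void sketch_self_check(void)
{
	struct sketch_credits c = sketch_normalize(64, 100, 32);

	assert(c.peer_tx == 32); /* capped at the global budget */
	assert(c.hiw == 31);     /* always strictly below peer_tx */
}
```

Because the cap at the global budget runs last, a small `credits` value can push the per-peer count below the floor applied two steps earlier; the sketch preserves that behaviour rather than correcting it.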

View File

@ -1,6 +0,0 @@
subdir-ccflags-y += -I$(srctree)/drivers/staging/lustre/include
subdir-ccflags-y += -I$(srctree)/drivers/staging/lustre/lustre/include
obj-$(CONFIG_LNET) += ksocklnd.o
ksocklnd-y := socklnd.o socklnd_cb.o socklnd_proto.o socklnd_modparams.o socklnd_lib.o

File diff suppressed because it is too large

View File

@ -1,704 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* Copyright (c) 2003, 2010, Oracle and/or its affiliates. All rights reserved.
*
* Copyright (c) 2011, 2012, Intel Corporation.
*
* Author: Zach Brown <zab@zabbo.net>
* Author: Peter J. Braam <braam@clusterfs.com>
* Author: Phil Schwan <phil@clusterfs.com>
* Author: Eric Barton <eric@bartonsoftware.com>
*
* This file is part of Lustre, http://www.lustre.org
*
* Portals is free software; you can redistribute it and/or
* modify it under the terms of version 2 of the GNU General Public
* License as published by the Free Software Foundation.
*
* Portals is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
*/
#ifndef _SOCKLND_SOCKLND_H_
#define _SOCKLND_SOCKLND_H_
#define DEBUG_PORTAL_ALLOC
#define DEBUG_SUBSYSTEM S_LND
#include <linux/crc32.h>
#include <linux/errno.h>
#include <linux/if.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/kmod.h>
#include <linux/list.h>
#include <linux/mm.h>
#include <linux/module.h>
#include <linux/stat.h>
#include <linux/string.h>
#include <linux/syscalls.h>
#include <linux/sysctl.h>
#include <linux/uio.h>
#include <linux/unistd.h>
#include <asm/irq.h>
#include <net/sock.h>
#include <net/tcp.h>
#include <linux/lnet/lib-lnet.h>
#include <linux/lnet/socklnd.h>
/* assume one thread for each connection type */
#define SOCKNAL_NSCHEDS 3
#define SOCKNAL_NSCHEDS_HIGH (SOCKNAL_NSCHEDS << 1)
#define SOCKNAL_PEER_HASH_SIZE 101 /* # peer lists */
#define SOCKNAL_RESCHED 100 /* # scheduler loops before reschedule */
#define SOCKNAL_INSANITY_RECONN 5000 /* connd is trying to reconnect infinitely */
#define SOCKNAL_ENOMEM_RETRY 1 /* jiffies between retries */
#define SOCKNAL_SINGLE_FRAG_TX 0 /* disable multi-fragment sends */
#define SOCKNAL_SINGLE_FRAG_RX 0 /* disable multi-fragment receives */
#define SOCKNAL_VERSION_DEBUG 0 /* enable protocol version debugging */
/*
* risk kmap deadlock on multi-frag I/O (backs off to single-frag if disabled).
* no risk if we're not running on a CONFIG_HIGHMEM platform.
*/
#ifdef CONFIG_HIGHMEM
# define SOCKNAL_RISK_KMAP_DEADLOCK 0
#else
# define SOCKNAL_RISK_KMAP_DEADLOCK 1
#endif
struct ksock_sched_info;
struct ksock_sched { /* per scheduler state */
spinlock_t kss_lock; /* serialise */
struct list_head kss_rx_conns; /* conn waiting to be read */
struct list_head kss_tx_conns; /* conn waiting to be written */
struct list_head kss_zombie_noop_txs; /* zombie noop tx list */
wait_queue_head_t kss_waitq; /* where scheduler sleeps */
int kss_nconns; /* # connections assigned to
* this scheduler
*/
struct ksock_sched_info *kss_info; /* owner of it */
};
struct ksock_sched_info {
int ksi_nthreads_max; /* max allowed threads */
int ksi_nthreads; /* number of threads */
int ksi_cpt; /* CPT id */
struct ksock_sched *ksi_scheds; /* array of schedulers */
};
#define KSOCK_CPT_SHIFT 16
#define KSOCK_THREAD_ID(cpt, sid) (((cpt) << KSOCK_CPT_SHIFT) | (sid))
#define KSOCK_THREAD_CPT(id) ((id) >> KSOCK_CPT_SHIFT)
#define KSOCK_THREAD_SID(id) ((id) & ((1UL << KSOCK_CPT_SHIFT) - 1))
struct ksock_interface { /* in-use interface */
__u32 ksni_ipaddr; /* interface's IP address */
__u32 ksni_netmask; /* interface's network mask */
int ksni_nroutes; /* # routes using (active) */
int ksni_npeers; /* # peers using (passive) */
char ksni_name[IFNAMSIZ]; /* interface name */
};
struct ksock_tunables {
int *ksnd_timeout; /* "stuck" socket timeout
* (seconds)
*/
int *ksnd_nscheds; /* # scheduler threads in each
* pool while starting
*/
int *ksnd_nconnds; /* # connection daemons */
int *ksnd_nconnds_max; /* max # connection daemons */
int *ksnd_min_reconnectms; /* first connection retry after
* (ms)...
*/
int *ksnd_max_reconnectms; /* ...exponentially increasing to
* this
*/
int *ksnd_eager_ack; /* make TCP ack eagerly? */
int *ksnd_typed_conns; /* drive sockets by type? */
int *ksnd_min_bulk; /* smallest "large" message */
int *ksnd_tx_buffer_size; /* socket tx buffer size */
int *ksnd_rx_buffer_size; /* socket rx buffer size */
int *ksnd_nagle; /* enable NAGLE? */
int *ksnd_round_robin; /* round robin for multiple
* interfaces
*/
int *ksnd_keepalive; /* # secs for sending keepalive
* NOOP
*/
int *ksnd_keepalive_idle; /* # idle secs before 1st probe
*/
int *ksnd_keepalive_count; /* # probes */
int *ksnd_keepalive_intvl; /* time between probes */
int *ksnd_credits; /* # concurrent sends */
int *ksnd_peertxcredits; /* # concurrent sends to 1 peer
*/
int *ksnd_peerrtrcredits; /* # per-peer router buffer
* credits
*/
int *ksnd_peertimeout; /* seconds to consider peer dead
*/
int *ksnd_enable_csum; /* enable check sum */
int *ksnd_inject_csum_error; /* set non-zero to inject
* checksum error
*/
int *ksnd_nonblk_zcack; /* always send zc-ack on
* non-blocking connection
*/
unsigned int *ksnd_zc_min_payload; /* minimum zero copy payload
* size
*/
int *ksnd_zc_recv; /* enable ZC receive (for
* Chelsio TOE)
*/
int *ksnd_zc_recv_min_nfrags; /* minimum # of fragments to
* enable ZC receive
*/
};
struct ksock_net {
__u64 ksnn_incarnation; /* my epoch */
spinlock_t ksnn_lock; /* serialise */
struct list_head ksnn_list; /* chain on global list */
int ksnn_npeers; /* # peers */
int ksnn_shutdown; /* shutting down? */
int ksnn_ninterfaces; /* IP interfaces */
struct ksock_interface ksnn_interfaces[LNET_MAX_INTERFACES];
};
/** connd timeout */
#define SOCKNAL_CONND_TIMEOUT 120
/** reserved thread for accepting & creating new connd */
#define SOCKNAL_CONND_RESV 1
struct ksock_nal_data {
int ksnd_init; /* initialisation state
*/
int ksnd_nnets; /* # networks set up */
struct list_head ksnd_nets; /* list of nets */
rwlock_t ksnd_global_lock; /* stabilize peer/conn
* ops
*/
struct list_head *ksnd_peers; /* hash table of all my
* known peers
*/
int ksnd_peer_hash_size; /* size of ksnd_peers */
int ksnd_nthreads; /* # live threads */
int ksnd_shuttingdown; /* tell threads to exit
*/
struct ksock_sched_info **ksnd_sched_info; /* schedulers info */
atomic_t ksnd_nactive_txs; /* #active txs */
struct list_head ksnd_deathrow_conns; /* conns to close:
* reaper_lock
*/
struct list_head ksnd_zombie_conns; /* conns to free:
* reaper_lock
*/
struct list_head ksnd_enomem_conns; /* conns to retry:
* reaper_lock
*/
wait_queue_head_t ksnd_reaper_waitq; /* reaper sleeps here */
unsigned long ksnd_reaper_waketime; /* when reaper will wake
*/
spinlock_t ksnd_reaper_lock; /* serialise */
int ksnd_enomem_tx; /* test ENOMEM sender */
int ksnd_stall_tx; /* test sluggish sender
*/
int ksnd_stall_rx; /* test sluggish
* receiver
*/
struct list_head ksnd_connd_connreqs; /* incoming connection
* requests
*/
struct list_head ksnd_connd_routes; /* routes waiting to be
* connected
*/
wait_queue_head_t ksnd_connd_waitq; /* connds sleep here */
int ksnd_connd_connecting; /* # connds connecting
*/
time64_t ksnd_connd_failed_stamp;/* time stamp of the
* last failed
* connecting attempt
*/
time64_t ksnd_connd_starting_stamp;/* time stamp of the
* last starting connd
*/
unsigned int ksnd_connd_starting; /* # starting connd */
unsigned int ksnd_connd_running; /* # running connd */
spinlock_t ksnd_connd_lock; /* serialise */
struct list_head ksnd_idle_noop_txs; /* list head for freed
* noop tx
*/
spinlock_t ksnd_tx_lock; /* serialise, g_lock
* unsafe
*/
};
#define SOCKNAL_INIT_NOTHING 0
#define SOCKNAL_INIT_DATA 1
#define SOCKNAL_INIT_ALL 2
/*
* A packet just assembled for transmission is represented by 1 or more
* struct iovec fragments (the first frag contains the portals header),
* followed by 0 or more struct bio_vec fragments.
*
* On the receive side, initially 1 struct iovec fragment is posted for
* receive (the header). Once the header has been received, the payload is
* received into either struct iovec or struct bio_vec fragments, depending on
* what the header matched or whether the message needs forwarding.
*/
struct ksock_conn; /* forward ref */
struct ksock_peer; /* forward ref */
struct ksock_route; /* forward ref */
struct ksock_proto; /* forward ref */
struct ksock_tx { /* transmit packet */
struct list_head tx_list; /* queue on conn for transmission etc
*/
struct list_head tx_zc_list; /* queue on peer for ZC request */
atomic_t tx_refcount; /* tx reference count */
int tx_nob; /* # packet bytes */
int tx_resid; /* residual bytes */
int tx_niov; /* # packet iovec frags */
struct kvec *tx_iov; /* packet iovec frags */
int tx_nkiov; /* # packet page frags */
unsigned short tx_zc_aborted; /* aborted ZC request */
unsigned short tx_zc_capable:1; /* payload is large enough for ZC */
unsigned short tx_zc_checked:1; /* Have I checked if I should ZC? */
unsigned short tx_nonblk:1; /* it's a non-blocking ACK */
struct bio_vec *tx_kiov; /* packet page frags */
struct ksock_conn *tx_conn; /* owning conn */
struct lnet_msg *tx_lnetmsg; /* lnet message for lnet_finalize()
*/
unsigned long tx_deadline; /* when (in jiffies) tx times out */
struct ksock_msg tx_msg; /* socklnd message buffer */
int tx_desc_size; /* size of this descriptor */
union {
struct {
struct kvec iov; /* virt hdr */
struct bio_vec kiov[0]; /* paged payload */
} paged;
struct {
struct kvec iov[1]; /* virt hdr + payload */
} virt;
} tx_frags;
};
#define KSOCK_NOOP_TX_SIZE (offsetof(struct ksock_tx, tx_frags.paged.kiov[0]))
/* network zero copy callback descriptor embedded in struct ksock_tx */
#define SOCKNAL_RX_KSM_HEADER 1 /* reading ksock message header */
#define SOCKNAL_RX_LNET_HEADER 2 /* reading lnet message header */
#define SOCKNAL_RX_PARSE 3 /* Calling lnet_parse() */
#define SOCKNAL_RX_PARSE_WAIT 4 /* waiting to be told to read the body */
#define SOCKNAL_RX_LNET_PAYLOAD 5 /* reading lnet payload (to deliver here) */
#define SOCKNAL_RX_SLOP 6 /* skipping body */
struct ksock_conn {
struct ksock_peer *ksnc_peer; /* owning peer */
struct ksock_route *ksnc_route; /* owning route */
struct list_head ksnc_list; /* stash on peer's conn list */
struct socket *ksnc_sock; /* actual socket */
void *ksnc_saved_data_ready; /* socket's original
* data_ready() callback
*/
void *ksnc_saved_write_space; /* socket's original
* write_space() callback
*/
atomic_t ksnc_conn_refcount;/* conn refcount */
atomic_t ksnc_sock_refcount;/* sock refcount */
struct ksock_sched *ksnc_scheduler; /* who schedules this connection
*/
__u32 ksnc_myipaddr; /* my IP */
__u32 ksnc_ipaddr; /* peer's IP */
int ksnc_port; /* peer's port */
signed int ksnc_type:3; /* type of connection, should be
* signed value
*/
unsigned int ksnc_closing:1; /* being shut down */
unsigned int ksnc_flip:1; /* flip or not, only for V2.x */
unsigned int ksnc_zc_capable:1; /* zero-copy (ZC) capable */
struct ksock_proto *ksnc_proto; /* protocol for the connection */
/* reader */
struct list_head ksnc_rx_list; /* where I enq waiting input or a
* forwarding descriptor
*/
unsigned long ksnc_rx_deadline; /* when (in jiffies) receive times
* out
*/
__u8 ksnc_rx_started; /* started receiving a message */
__u8 ksnc_rx_ready; /* data ready to read */
__u8 ksnc_rx_scheduled; /* being progressed */
__u8 ksnc_rx_state; /* what is being read */
int ksnc_rx_nob_left; /* # bytes to next hdr/body */
struct iov_iter ksnc_rx_to; /* copy destination */
struct kvec ksnc_rx_iov_space[LNET_MAX_IOV]; /* space for frag descriptors */
__u32 ksnc_rx_csum; /* partial checksum for incoming
* data
*/
void *ksnc_cookie; /* rx lnet_finalize passthru arg
*/
struct ksock_msg ksnc_msg; /* incoming message buffer:
* V2.x message takes the
* whole struct
* V1.x message is a bare
* struct lnet_hdr, it's stored in
* ksnc_msg.ksm_u.lnetmsg
*/
/* WRITER */
struct list_head ksnc_tx_list; /* where I enq waiting for output
* space
*/
struct list_head ksnc_tx_queue; /* packets waiting to be sent */
struct ksock_tx *ksnc_tx_carrier; /* next TX that can carry a LNet
* message or ZC-ACK
*/
unsigned long ksnc_tx_deadline; /* when (in jiffies) tx times out
*/
int ksnc_tx_bufnob; /* send buffer marker */
atomic_t ksnc_tx_nob; /* # bytes queued */
int ksnc_tx_ready; /* write space */
int ksnc_tx_scheduled; /* being progressed */
unsigned long ksnc_tx_last_post; /* time stamp of the last posted
* TX
*/
};
struct ksock_route {
struct list_head ksnr_list; /* chain on peer route list */
struct list_head ksnr_connd_list; /* chain on ksnr_connd_routes */
struct ksock_peer *ksnr_peer; /* owning peer */
atomic_t ksnr_refcount; /* # users */
unsigned long ksnr_timeout; /* when (in jiffies) reconnection
* can happen next
*/
long ksnr_retry_interval; /* how long between retries */
__u32 ksnr_myipaddr; /* my IP */
__u32 ksnr_ipaddr; /* IP address to connect to */
int ksnr_port; /* port to connect to */
unsigned int ksnr_scheduled:1; /* scheduled for attention */
unsigned int ksnr_connecting:1; /* connection establishment in
* progress
*/
unsigned int ksnr_connected:4; /* connections established by
* type
*/
unsigned int ksnr_deleted:1; /* been removed from peer? */
unsigned int ksnr_share_count; /* created explicitly? */
int ksnr_conn_count; /* # conns established by this
* route
*/
};
#define SOCKNAL_KEEPALIVE_PING 1 /* cookie for keepalive ping */
struct ksock_peer {
struct list_head ksnp_list; /* stash on global peer list */
unsigned long ksnp_last_alive; /* when (in jiffies) I was last
* alive
*/
struct lnet_process_id ksnp_id; /* who's on the other end(s) */
atomic_t ksnp_refcount; /* # users */
int ksnp_sharecount; /* lconf usage counter */
int ksnp_closing; /* being closed */
int ksnp_accepting; /* # passive connections pending
*/
int ksnp_error; /* errno on closing last conn */
__u64 ksnp_zc_next_cookie; /* ZC completion cookie */
__u64 ksnp_incarnation; /* latest known peer incarnation
*/
struct ksock_proto *ksnp_proto; /* latest known peer protocol */
struct list_head ksnp_conns; /* all active connections */
struct list_head ksnp_routes; /* routes */
struct list_head ksnp_tx_queue; /* waiting packets */
spinlock_t ksnp_lock; /* serialize, g_lock unsafe */
struct list_head ksnp_zc_req_list; /* zero copy requests wait for
* ACK
*/
unsigned long ksnp_send_keepalive; /* time to send keepalive */
struct lnet_ni *ksnp_ni; /* which network */
int ksnp_n_passive_ips; /* # of... */
/* preferred local interfaces */
__u32 ksnp_passive_ips[LNET_MAX_INTERFACES];
};
struct ksock_connreq {
struct list_head ksncr_list; /* stash on ksnd_connd_connreqs */
struct lnet_ni *ksncr_ni; /* chosen NI */
struct socket *ksncr_sock; /* accepted socket */
};
extern struct ksock_nal_data ksocknal_data;
extern struct ksock_tunables ksocknal_tunables;
#define SOCKNAL_MATCH_NO 0 /* TX can't match type of connection */
#define SOCKNAL_MATCH_YES 1 /* TX matches type of connection */
#define SOCKNAL_MATCH_MAY 2 /* TX can be sent on the connection, but not
* preferred
*/
struct ksock_proto {
/* version number of protocol */
int pro_version;
/* handshake function */
int (*pro_send_hello)(struct ksock_conn *, struct ksock_hello_msg *);
/* handshake function */
int (*pro_recv_hello)(struct ksock_conn *, struct ksock_hello_msg *, int);
/* message pack */
void (*pro_pack)(struct ksock_tx *);
/* message unpack */
void (*pro_unpack)(struct ksock_msg *);
/* queue tx on the connection */
struct ksock_tx *(*pro_queue_tx_msg)(struct ksock_conn *, struct ksock_tx *);
/* queue ZC ack on the connection */
int (*pro_queue_tx_zcack)(struct ksock_conn *, struct ksock_tx *, __u64);
/* handle ZC request */
int (*pro_handle_zcreq)(struct ksock_conn *, __u64, int);
/* handle ZC ACK */
int (*pro_handle_zcack)(struct ksock_conn *, __u64, __u64);
/*
* msg type matches the connection type:
* return value:
* return MATCH_NO : no
* return MATCH_YES : matching type
* return MATCH_MAY : can be backup
*/
int (*pro_match_tx)(struct ksock_conn *, struct ksock_tx *, int);
};
extern struct ksock_proto ksocknal_protocol_v1x;
extern struct ksock_proto ksocknal_protocol_v2x;
extern struct ksock_proto ksocknal_protocol_v3x;
#define KSOCK_PROTO_V1_MAJOR LNET_PROTO_TCP_VERSION_MAJOR
#define KSOCK_PROTO_V1_MINOR LNET_PROTO_TCP_VERSION_MINOR
#define KSOCK_PROTO_V1 KSOCK_PROTO_V1_MAJOR
#ifndef CPU_MASK_NONE
#define CPU_MASK_NONE 0UL
#endif
static inline int
ksocknal_route_mask(void)
{
if (!*ksocknal_tunables.ksnd_typed_conns)
return (1 << SOCKLND_CONN_ANY);
return ((1 << SOCKLND_CONN_CONTROL) |
(1 << SOCKLND_CONN_BULK_IN) |
(1 << SOCKLND_CONN_BULK_OUT));
}
static inline struct list_head *
ksocknal_nid2peerlist(lnet_nid_t nid)
{
unsigned int hash = ((unsigned int)nid) % ksocknal_data.ksnd_peer_hash_size;
return &ksocknal_data.ksnd_peers[hash];
}
static inline void
ksocknal_conn_addref(struct ksock_conn *conn)
{
LASSERT(atomic_read(&conn->ksnc_conn_refcount) > 0);
atomic_inc(&conn->ksnc_conn_refcount);
}
void ksocknal_queue_zombie_conn(struct ksock_conn *conn);
void ksocknal_finalize_zcreq(struct ksock_conn *conn);
static inline void
ksocknal_conn_decref(struct ksock_conn *conn)
{
LASSERT(atomic_read(&conn->ksnc_conn_refcount) > 0);
if (atomic_dec_and_test(&conn->ksnc_conn_refcount))
ksocknal_queue_zombie_conn(conn);
}
static inline int
ksocknal_connsock_addref(struct ksock_conn *conn)
{
int rc = -ESHUTDOWN;
read_lock(&ksocknal_data.ksnd_global_lock);
if (!conn->ksnc_closing) {
LASSERT(atomic_read(&conn->ksnc_sock_refcount) > 0);
atomic_inc(&conn->ksnc_sock_refcount);
rc = 0;
}
read_unlock(&ksocknal_data.ksnd_global_lock);
return rc;
}
static inline void
ksocknal_connsock_decref(struct ksock_conn *conn)
{
LASSERT(atomic_read(&conn->ksnc_sock_refcount) > 0);
if (atomic_dec_and_test(&conn->ksnc_sock_refcount)) {
LASSERT(conn->ksnc_closing);
sock_release(conn->ksnc_sock);
conn->ksnc_sock = NULL;
ksocknal_finalize_zcreq(conn);
}
}
static inline void
ksocknal_tx_addref(struct ksock_tx *tx)
{
LASSERT(atomic_read(&tx->tx_refcount) > 0);
atomic_inc(&tx->tx_refcount);
}
void ksocknal_tx_prep(struct ksock_conn *, struct ksock_tx *tx);
void ksocknal_tx_done(struct lnet_ni *ni, struct ksock_tx *tx);
static inline void
ksocknal_tx_decref(struct ksock_tx *tx)
{
LASSERT(atomic_read(&tx->tx_refcount) > 0);
if (atomic_dec_and_test(&tx->tx_refcount))
ksocknal_tx_done(NULL, tx);
}
static inline void
ksocknal_route_addref(struct ksock_route *route)
{
LASSERT(atomic_read(&route->ksnr_refcount) > 0);
atomic_inc(&route->ksnr_refcount);
}
void ksocknal_destroy_route(struct ksock_route *route);
static inline void
ksocknal_route_decref(struct ksock_route *route)
{
LASSERT(atomic_read(&route->ksnr_refcount) > 0);
if (atomic_dec_and_test(&route->ksnr_refcount))
ksocknal_destroy_route(route);
}
static inline void
ksocknal_peer_addref(struct ksock_peer *peer)
{
LASSERT(atomic_read(&peer->ksnp_refcount) > 0);
atomic_inc(&peer->ksnp_refcount);
}
void ksocknal_destroy_peer(struct ksock_peer *peer);
static inline void
ksocknal_peer_decref(struct ksock_peer *peer)
{
LASSERT(atomic_read(&peer->ksnp_refcount) > 0);
if (atomic_dec_and_test(&peer->ksnp_refcount))
ksocknal_destroy_peer(peer);
}
int ksocknal_startup(struct lnet_ni *ni);
void ksocknal_shutdown(struct lnet_ni *ni);
int ksocknal_ctl(struct lnet_ni *ni, unsigned int cmd, void *arg);
int ksocknal_send(struct lnet_ni *ni, void *private, struct lnet_msg *lntmsg);
int ksocknal_recv(struct lnet_ni *ni, void *private, struct lnet_msg *lntmsg,
int delayed, struct iov_iter *to, unsigned int rlen);
int ksocknal_accept(struct lnet_ni *ni, struct socket *sock);
int ksocknal_add_peer(struct lnet_ni *ni, struct lnet_process_id id, __u32 ip,
int port);
struct ksock_peer *ksocknal_find_peer_locked(struct lnet_ni *ni,
struct lnet_process_id id);
struct ksock_peer *ksocknal_find_peer(struct lnet_ni *ni,
struct lnet_process_id id);
void ksocknal_peer_failed(struct ksock_peer *peer);
int ksocknal_create_conn(struct lnet_ni *ni, struct ksock_route *route,
struct socket *sock, int type);
void ksocknal_close_conn_locked(struct ksock_conn *conn, int why);
void ksocknal_terminate_conn(struct ksock_conn *conn);
void ksocknal_destroy_conn(struct ksock_conn *conn);
int ksocknal_close_peer_conns_locked(struct ksock_peer *peer,
__u32 ipaddr, int why);
int ksocknal_close_conn_and_siblings(struct ksock_conn *conn, int why);
int ksocknal_close_matching_conns(struct lnet_process_id id, __u32 ipaddr);
struct ksock_conn *ksocknal_find_conn_locked(struct ksock_peer *peer,
struct ksock_tx *tx, int nonblk);
int ksocknal_launch_packet(struct lnet_ni *ni, struct ksock_tx *tx,
struct lnet_process_id id);
struct ksock_tx *ksocknal_alloc_tx(int type, int size);
void ksocknal_free_tx(struct ksock_tx *tx);
struct ksock_tx *ksocknal_alloc_tx_noop(__u64 cookie, int nonblk);
void ksocknal_next_tx_carrier(struct ksock_conn *conn);
void ksocknal_queue_tx_locked(struct ksock_tx *tx, struct ksock_conn *conn);
void ksocknal_txlist_done(struct lnet_ni *ni, struct list_head *txlist, int error);
void ksocknal_notify(struct lnet_ni *ni, lnet_nid_t gw_nid, int alive);
void ksocknal_query(struct lnet_ni *ni, lnet_nid_t nid, unsigned long *when);
int ksocknal_thread_start(int (*fn)(void *arg), void *arg, char *name);
void ksocknal_thread_fini(void);
void ksocknal_launch_all_connections_locked(struct ksock_peer *peer);
struct ksock_route *ksocknal_find_connectable_route_locked(struct ksock_peer *peer);
struct ksock_route *ksocknal_find_connecting_route_locked(struct ksock_peer *peer);
int ksocknal_new_packet(struct ksock_conn *conn, int skip);
int ksocknal_scheduler(void *arg);
int ksocknal_connd(void *arg);
int ksocknal_reaper(void *arg);
int ksocknal_send_hello(struct lnet_ni *ni, struct ksock_conn *conn,
lnet_nid_t peer_nid, struct ksock_hello_msg *hello);
int ksocknal_recv_hello(struct lnet_ni *ni, struct ksock_conn *conn,
struct ksock_hello_msg *hello,
struct lnet_process_id *id,
__u64 *incarnation);
void ksocknal_read_callback(struct ksock_conn *conn);
void ksocknal_write_callback(struct ksock_conn *conn);
int ksocknal_lib_zc_capable(struct ksock_conn *conn);
void ksocknal_lib_save_callback(struct socket *sock, struct ksock_conn *conn);
void ksocknal_lib_set_callback(struct socket *sock, struct ksock_conn *conn);
void ksocknal_lib_reset_callback(struct socket *sock, struct ksock_conn *conn);
void ksocknal_lib_push_conn(struct ksock_conn *conn);
int ksocknal_lib_get_conn_addrs(struct ksock_conn *conn);
int ksocknal_lib_setup_sock(struct socket *so);
int ksocknal_lib_send_iov(struct ksock_conn *conn, struct ksock_tx *tx);
int ksocknal_lib_send_kiov(struct ksock_conn *conn, struct ksock_tx *tx);
void ksocknal_lib_eager_ack(struct ksock_conn *conn);
int ksocknal_lib_recv(struct ksock_conn *conn);
int ksocknal_lib_get_conn_tunables(struct ksock_conn *conn, int *txmem,
int *rxmem, int *nagle);
void ksocknal_read_callback(struct ksock_conn *conn);
void ksocknal_write_callback(struct ksock_conn *conn);
int ksocknal_tunables_init(void);
void ksocknal_lib_csum_tx(struct ksock_tx *tx);
int ksocknal_lib_memory_pressure(struct ksock_conn *conn);
int ksocknal_lib_bind_thread_to_cpu(int id);
#endif /* _SOCKLND_SOCKLND_H_ */

File diff suppressed because it is too large

@ -1,534 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2011, 2012, Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Sun Microsystems, Inc.
*/
#include <linux/highmem.h>
#include "socklnd.h"
int
ksocknal_lib_get_conn_addrs(struct ksock_conn *conn)
{
int rc = lnet_sock_getaddr(conn->ksnc_sock, 1, &conn->ksnc_ipaddr,
&conn->ksnc_port);
/* Didn't need the {get,put}connsock dance to deref ksnc_sock... */
LASSERT(!conn->ksnc_closing);
if (rc) {
CERROR("Error %d getting sock peer IP\n", rc);
return rc;
}
rc = lnet_sock_getaddr(conn->ksnc_sock, 0, &conn->ksnc_myipaddr, NULL);
if (rc) {
CERROR("Error %d getting sock local IP\n", rc);
return rc;
}
return 0;
}
int
ksocknal_lib_zc_capable(struct ksock_conn *conn)
{
int caps = conn->ksnc_sock->sk->sk_route_caps;
if (conn->ksnc_proto == &ksocknal_protocol_v1x)
return 0;
/*
* ZC if the socket supports scatter/gather and doesn't need software
* checksums
*/
return ((caps & NETIF_F_SG) && (caps & NETIF_F_CSUM_MASK));
}
int
ksocknal_lib_send_iov(struct ksock_conn *conn, struct ksock_tx *tx)
{
struct msghdr msg = {.msg_flags = MSG_DONTWAIT};
struct socket *sock = conn->ksnc_sock;
int nob, i;
if (*ksocknal_tunables.ksnd_enable_csum && /* checksum enabled */
conn->ksnc_proto == &ksocknal_protocol_v2x && /* V2.x connection */
tx->tx_nob == tx->tx_resid && /* first send */
!tx->tx_msg.ksm_csum) /* not checksummed */
ksocknal_lib_csum_tx(tx);
for (nob = i = 0; i < tx->tx_niov; i++)
nob += tx->tx_iov[i].iov_len;
if (!list_empty(&conn->ksnc_tx_queue) ||
nob < tx->tx_resid)
msg.msg_flags |= MSG_MORE;
iov_iter_kvec(&msg.msg_iter, WRITE | ITER_KVEC,
tx->tx_iov, tx->tx_niov, nob);
return sock_sendmsg(sock, &msg);
}
int
ksocknal_lib_send_kiov(struct ksock_conn *conn, struct ksock_tx *tx)
{
struct socket *sock = conn->ksnc_sock;
struct bio_vec *kiov = tx->tx_kiov;
int rc;
int nob;
/* Not NOOP message */
LASSERT(tx->tx_lnetmsg);
if (tx->tx_msg.ksm_zc_cookies[0]) {
/* Zero copy is enabled */
struct sock *sk = sock->sk;
struct page *page = kiov->bv_page;
int offset = kiov->bv_offset;
int fragsize = kiov->bv_len;
int msgflg = MSG_DONTWAIT;
CDEBUG(D_NET, "page %p + offset %x for %d\n",
page, offset, kiov->bv_len);
if (!list_empty(&conn->ksnc_tx_queue) ||
fragsize < tx->tx_resid)
msgflg |= MSG_MORE;
if (sk->sk_prot->sendpage) {
rc = sk->sk_prot->sendpage(sk, page,
offset, fragsize, msgflg);
} else {
rc = tcp_sendpage(sk, page, offset, fragsize, msgflg);
}
} else {
struct msghdr msg = {.msg_flags = MSG_DONTWAIT};
int i;
for (nob = i = 0; i < tx->tx_nkiov; i++)
nob += kiov[i].bv_len;
if (!list_empty(&conn->ksnc_tx_queue) ||
nob < tx->tx_resid)
msg.msg_flags |= MSG_MORE;
iov_iter_bvec(&msg.msg_iter, WRITE | ITER_BVEC,
kiov, tx->tx_nkiov, nob);
rc = sock_sendmsg(sock, &msg);
}
return rc;
}
void
ksocknal_lib_eager_ack(struct ksock_conn *conn)
{
int opt = 1;
struct socket *sock = conn->ksnc_sock;
/*
* Remind the socket to ACK eagerly. If I don't, the socket might
* think I'm about to send something it could piggy-back the ACK
* on, introducing delay in completing zero-copy sends in my
* peer.
*/
kernel_setsockopt(sock, SOL_TCP, TCP_QUICKACK, (char *)&opt,
sizeof(opt));
}
static int lustre_csum(struct kvec *v, void *context)
{
struct ksock_conn *conn = context;
conn->ksnc_rx_csum = crc32_le(conn->ksnc_rx_csum,
v->iov_base, v->iov_len);
return 0;
}
int
ksocknal_lib_recv(struct ksock_conn *conn)
{
struct msghdr msg = { .msg_iter = conn->ksnc_rx_to };
__u32 saved_csum;
int rc;
rc = sock_recvmsg(conn->ksnc_sock, &msg, MSG_DONTWAIT);
if (rc <= 0)
return rc;
saved_csum = conn->ksnc_msg.ksm_csum;
if (!saved_csum)
return rc;
/* The header is checksummed only in V2; V3 checksums only the bulk data */
if (!(conn->ksnc_rx_to.type & ITER_BVEC) &&
conn->ksnc_proto != &ksocknal_protocol_v2x)
return rc;
/* accumulate checksum */
conn->ksnc_msg.ksm_csum = 0;
iov_iter_for_each_range(&conn->ksnc_rx_to, rc, lustre_csum, conn);
conn->ksnc_msg.ksm_csum = saved_csum;
return rc;
}
void
ksocknal_lib_csum_tx(struct ksock_tx *tx)
{
int i;
__u32 csum;
void *base;
LASSERT(tx->tx_iov[0].iov_base == &tx->tx_msg);
LASSERT(tx->tx_conn);
LASSERT(tx->tx_conn->ksnc_proto == &ksocknal_protocol_v2x);
tx->tx_msg.ksm_csum = 0;
csum = crc32_le(~0, tx->tx_iov[0].iov_base,
tx->tx_iov[0].iov_len);
if (tx->tx_kiov) {
for (i = 0; i < tx->tx_nkiov; i++) {
base = kmap(tx->tx_kiov[i].bv_page) +
tx->tx_kiov[i].bv_offset;
csum = crc32_le(csum, base, tx->tx_kiov[i].bv_len);
kunmap(tx->tx_kiov[i].bv_page);
}
} else {
for (i = 1; i < tx->tx_niov; i++)
csum = crc32_le(csum, tx->tx_iov[i].iov_base,
tx->tx_iov[i].iov_len);
}
if (*ksocknal_tunables.ksnd_inject_csum_error) {
csum++;
*ksocknal_tunables.ksnd_inject_csum_error = 0;
}
tx->tx_msg.ksm_csum = csum;
}
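The checksum in ksocknal_lib_csum_tx() above is built by seeding crc32_le() with ~0 and feeding each fragment's result into the next call. The following is a minimal userspace sketch of that chaining, not the kernel code: sketch_crc32_le(), struct sketch_frag and sketch_csum_frags() are hypothetical names, and the bitwise loop is assumed to match the kernel's crc32_le() (reflected polynomial 0xEDB88320, no implicit inversion of seed or result).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise little-endian CRC-32, a userspace stand-in for the kernel's
 * crc32_le(): the caller supplies the seed and no final XOR is applied.
 */
static uint32_t sketch_crc32_le(uint32_t crc, const void *buf, size_t len)
{
	const unsigned char *p = buf;

	while (len--) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ ((crc & 1) ? 0xEDB88320u : 0);
	}
	return crc;
}

/* Hypothetical fragment descriptor, standing in for struct kvec. */
struct sketch_frag {
	const void *base;
	size_t len;
};

/* Chain the CRC across all fragments, seeding with ~0 as the code above does. */
static uint32_t sketch_csum_frags(const struct sketch_frag *frags, int nfrags)
{
	uint32_t csum = ~0u;

	for (int i = 0; i < nfrags; i++)
		csum = sketch_crc32_le(csum, frags[i].base, frags[i].len);
	return csum;
}
```

Because the running value is carried from one call to the next, the result does not depend on how the data is split into fragments, which is why the sender can checksum the header kvec and the payload pages separately.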
int
ksocknal_lib_get_conn_tunables(struct ksock_conn *conn, int *txmem,
int *rxmem, int *nagle)
{
struct socket *sock = conn->ksnc_sock;
int len;
int rc;
rc = ksocknal_connsock_addref(conn);
if (rc) {
LASSERT(conn->ksnc_closing);
*txmem = *rxmem = *nagle = 0;
return -ESHUTDOWN;
}
rc = lnet_sock_getbuf(sock, txmem, rxmem);
if (!rc) {
len = sizeof(*nagle);
rc = kernel_getsockopt(sock, SOL_TCP, TCP_NODELAY,
(char *)nagle, &len);
}
ksocknal_connsock_decref(conn);
if (!rc)
*nagle = !*nagle;
else
*txmem = *rxmem = *nagle = 0;
return rc;
}
int
ksocknal_lib_setup_sock(struct socket *sock)
{
int rc;
int option;
int keep_idle;
int keep_intvl;
int keep_count;
int do_keepalive;
struct linger linger;
sock->sk->sk_allocation = GFP_NOFS;
/*
* Ensure this socket aborts active sends immediately when we close
* it.
*/
linger.l_onoff = 0;
linger.l_linger = 0;
rc = kernel_setsockopt(sock, SOL_SOCKET, SO_LINGER, (char *)&linger,
sizeof(linger));
if (rc) {
CERROR("Can't set SO_LINGER: %d\n", rc);
return rc;
}
option = -1;
rc = kernel_setsockopt(sock, SOL_TCP, TCP_LINGER2, (char *)&option,
sizeof(option));
if (rc) {
CERROR("Can't set TCP_LINGER2: %d\n", rc);
return rc;
}
if (!*ksocknal_tunables.ksnd_nagle) {
option = 1;
rc = kernel_setsockopt(sock, SOL_TCP, TCP_NODELAY,
(char *)&option, sizeof(option));
if (rc) {
CERROR("Can't disable nagle: %d\n", rc);
return rc;
}
}
rc = lnet_sock_setbuf(sock, *ksocknal_tunables.ksnd_tx_buffer_size,
*ksocknal_tunables.ksnd_rx_buffer_size);
if (rc) {
CERROR("Can't set buffer tx %d, rx %d buffers: %d\n",
*ksocknal_tunables.ksnd_tx_buffer_size,
*ksocknal_tunables.ksnd_rx_buffer_size, rc);
return rc;
}
/* TCP_BACKOFF_* sockopt tunables unsupported in stock kernels */
/* snapshot tunables */
keep_idle = *ksocknal_tunables.ksnd_keepalive_idle;
keep_count = *ksocknal_tunables.ksnd_keepalive_count;
keep_intvl = *ksocknal_tunables.ksnd_keepalive_intvl;
do_keepalive = (keep_idle > 0 && keep_count > 0 && keep_intvl > 0);
option = (do_keepalive ? 1 : 0);
rc = kernel_setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, (char *)&option,
sizeof(option));
if (rc) {
CERROR("Can't set SO_KEEPALIVE: %d\n", rc);
return rc;
}
if (!do_keepalive)
return 0;
rc = kernel_setsockopt(sock, SOL_TCP, TCP_KEEPIDLE, (char *)&keep_idle,
sizeof(keep_idle));
if (rc) {
CERROR("Can't set TCP_KEEPIDLE: %d\n", rc);
return rc;
}
rc = kernel_setsockopt(sock, SOL_TCP, TCP_KEEPINTVL,
(char *)&keep_intvl, sizeof(keep_intvl));
if (rc) {
CERROR("Can't set TCP_KEEPINTVL: %d\n", rc);
return rc;
}
rc = kernel_setsockopt(sock, SOL_TCP, TCP_KEEPCNT, (char *)&keep_count,
sizeof(keep_count));
if (rc) {
CERROR("Can't set TCP_KEEPCNT: %d\n", rc);
return rc;
}
return 0;
}
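For readers more familiar with the userspace socket API, the keepalive part of ksocknal_lib_setup_sock() above corresponds roughly to the sketch below. It assumes a Linux system (TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are Linux socket options); sketch_setup_keepalive() is a hypothetical helper, not part of Lustre, and it uses setsockopt() rather than the in-kernel kernel_setsockopt().

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Enable TCP keepalive only if all three tunables are positive, mirroring
 * the do_keepalive test above. Returns 0 on success, -1 on failure with
 * errno set by setsockopt().
 */
static int sketch_setup_keepalive(int fd, int idle, int count, int intvl)
{
	int on = (idle > 0 && count > 0 && intvl > 0);

	if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)))
		return -1;
	if (!on)
		return 0; /* keepalive disabled, nothing more to configure */

	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)))
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)))
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof(count)))
		return -1;
	return 0;
}
```

With the module defaults listed later in this diff (keepalive_idle 30, keepalive_count 5, keepalive_intvl 5), a silent peer would be declared dead roughly 30 + 5 * 5 seconds after the last traffic.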
void
ksocknal_lib_push_conn(struct ksock_conn *conn)
{
struct sock *sk;
struct tcp_sock *tp;
int nonagle;
int val = 1;
int rc;
rc = ksocknal_connsock_addref(conn);
if (rc) /* being shut down */
return;
sk = conn->ksnc_sock->sk;
tp = tcp_sk(sk);
lock_sock(sk);
nonagle = tp->nonagle;
tp->nonagle = 1;
release_sock(sk);
rc = kernel_setsockopt(conn->ksnc_sock, SOL_TCP, TCP_NODELAY,
(char *)&val, sizeof(val));
LASSERT(!rc);
lock_sock(sk);
tp->nonagle = nonagle;
release_sock(sk);
ksocknal_connsock_decref(conn);
}
/*
* socket call back in Linux
*/
static void
ksocknal_data_ready(struct sock *sk)
{
struct ksock_conn *conn;
/* interleave correctly with closing sockets... */
LASSERT(!in_irq());
read_lock(&ksocknal_data.ksnd_global_lock);
conn = sk->sk_user_data;
if (!conn) { /* raced with ksocknal_terminate_conn */
LASSERT(sk->sk_data_ready != &ksocknal_data_ready);
sk->sk_data_ready(sk);
} else {
ksocknal_read_callback(conn);
}
read_unlock(&ksocknal_data.ksnd_global_lock);
}
static void
ksocknal_write_space(struct sock *sk)
{
struct ksock_conn *conn;
int wspace;
int min_wpace;
/* interleave correctly with closing sockets... */
LASSERT(!in_irq());
read_lock(&ksocknal_data.ksnd_global_lock);
conn = sk->sk_user_data;
wspace = sk_stream_wspace(sk);
min_wpace = sk_stream_min_wspace(sk);
CDEBUG(D_NET, "sk %p wspace %d low water %d conn %p%s%s%s\n",
sk, wspace, min_wpace, conn,
!conn ? "" : (conn->ksnc_tx_ready ?
" ready" : " blocked"),
!conn ? "" : (conn->ksnc_tx_scheduled ?
" scheduled" : " idle"),
!conn ? "" : (list_empty(&conn->ksnc_tx_queue) ?
" empty" : " queued"));
if (!conn) { /* raced with ksocknal_terminate_conn */
LASSERT(sk->sk_write_space != &ksocknal_write_space);
sk->sk_write_space(sk);
read_unlock(&ksocknal_data.ksnd_global_lock);
return;
}
if (wspace >= min_wpace) { /* got enough space */
ksocknal_write_callback(conn);
/*
* Clear SOCK_NOSPACE _after_ ksocknal_write_callback so the
* ENOMEM check in ksocknal_transmit is race-free (think about
* it).
*/
clear_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
}
read_unlock(&ksocknal_data.ksnd_global_lock);
}
void
ksocknal_lib_save_callback(struct socket *sock, struct ksock_conn *conn)
{
conn->ksnc_saved_data_ready = sock->sk->sk_data_ready;
conn->ksnc_saved_write_space = sock->sk->sk_write_space;
}
void
ksocknal_lib_set_callback(struct socket *sock, struct ksock_conn *conn)
{
sock->sk->sk_user_data = conn;
sock->sk->sk_data_ready = ksocknal_data_ready;
sock->sk->sk_write_space = ksocknal_write_space;
}
void
ksocknal_lib_reset_callback(struct socket *sock, struct ksock_conn *conn)
{
/*
* Remove conn's network callbacks.
* NB I _have_ to restore the callback, rather than storing a noop,
* since the socket could survive past this module being unloaded!!
*/
sock->sk->sk_data_ready = conn->ksnc_saved_data_ready;
sock->sk->sk_write_space = conn->ksnc_saved_write_space;
/*
* A callback could be in progress already; they hold a read lock
* on ksnd_global_lock (to serialise with me) and NOOP if
* sk_user_data is NULL.
*/
sock->sk->sk_user_data = NULL;
}
int
ksocknal_lib_memory_pressure(struct ksock_conn *conn)
{
int rc = 0;
struct ksock_sched *sched;
sched = conn->ksnc_scheduler;
spin_lock_bh(&sched->kss_lock);
if (!test_bit(SOCK_NOSPACE, &conn->ksnc_sock->flags) &&
!conn->ksnc_tx_ready) {
/*
* SOCK_NOSPACE is set when the socket fills
* and cleared in the write_space callback
* (which also sets ksnc_tx_ready). If
* SOCK_NOSPACE and ksnc_tx_ready are BOTH
* zero, I didn't fill the socket and
* write_space won't reschedule me, so I
* return -ENOMEM to get my caller to retry
* after a timeout
*/
rc = -ENOMEM;
}
spin_unlock_bh(&sched->kss_lock);
return rc;
}

@ -1,184 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* Copyright (c) 2007, 2010, Oracle and/or its affiliates. All rights reserved.
*
* Copyright (c) 2011, 2012, Intel Corporation.
*
* Author: Eric Barton <eric@bartonsoftware.com>
*
* Portals is free software; you can redistribute it and/or
* modify it under the terms of version 2 of the GNU General Public
* License as published by the Free Software Foundation.
*
* Portals is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
*/
#include "socklnd.h"
static int sock_timeout = 50;
module_param(sock_timeout, int, 0644);
MODULE_PARM_DESC(sock_timeout, "dead socket timeout (seconds)");
static int credits = 256;
module_param(credits, int, 0444);
MODULE_PARM_DESC(credits, "# concurrent sends");
static int peer_credits = 8;
module_param(peer_credits, int, 0444);
MODULE_PARM_DESC(peer_credits, "# concurrent sends to 1 peer");
static int peer_buffer_credits;
module_param(peer_buffer_credits, int, 0444);
MODULE_PARM_DESC(peer_buffer_credits, "# per-peer router buffer credits");
static int peer_timeout = 180;
module_param(peer_timeout, int, 0444);
MODULE_PARM_DESC(peer_timeout, "Seconds without aliveness news to declare peer dead (<=0 to disable)");
/*
* Number of daemons in each thread pool, which is per-CPT;
* a reasonable value is estimated from the number of CPUs if it is not set.
*/
static unsigned int nscheds;
module_param(nscheds, int, 0444);
MODULE_PARM_DESC(nscheds, "# scheduler daemons in each pool while starting");
static int nconnds = 4;
module_param(nconnds, int, 0444);
MODULE_PARM_DESC(nconnds, "# connection daemons while starting");
static int nconnds_max = 64;
module_param(nconnds_max, int, 0444);
MODULE_PARM_DESC(nconnds_max, "max # connection daemons");
static int min_reconnectms = 1000;
module_param(min_reconnectms, int, 0644);
MODULE_PARM_DESC(min_reconnectms, "min connection retry interval (mS)");
static int max_reconnectms = 60000;
module_param(max_reconnectms, int, 0644);
MODULE_PARM_DESC(max_reconnectms, "max connection retry interval (mS)");
# define DEFAULT_EAGER_ACK 0
static int eager_ack = DEFAULT_EAGER_ACK;
module_param(eager_ack, int, 0644);
MODULE_PARM_DESC(eager_ack, "send tcp ack packets eagerly");
static int typed_conns = 1;
module_param(typed_conns, int, 0444);
MODULE_PARM_DESC(typed_conns, "use different sockets for bulk");
static int min_bulk = 1 << 10;
module_param(min_bulk, int, 0644);
MODULE_PARM_DESC(min_bulk, "smallest 'large' message");
# define DEFAULT_BUFFER_SIZE 0
static int tx_buffer_size = DEFAULT_BUFFER_SIZE;
module_param(tx_buffer_size, int, 0644);
MODULE_PARM_DESC(tx_buffer_size, "socket tx buffer size (0 for system default)");
static int rx_buffer_size = DEFAULT_BUFFER_SIZE;
module_param(rx_buffer_size, int, 0644);
MODULE_PARM_DESC(rx_buffer_size, "socket rx buffer size (0 for system default)");
static int nagle;
module_param(nagle, int, 0644);
MODULE_PARM_DESC(nagle, "enable NAGLE?");
static int round_robin = 1;
module_param(round_robin, int, 0644);
MODULE_PARM_DESC(round_robin, "Round robin for multiple interfaces");
static int keepalive = 30;
module_param(keepalive, int, 0644);
MODULE_PARM_DESC(keepalive, "# seconds before send keepalive");
static int keepalive_idle = 30;
module_param(keepalive_idle, int, 0644);
MODULE_PARM_DESC(keepalive_idle, "# idle seconds before probe");
#define DEFAULT_KEEPALIVE_COUNT 5
static int keepalive_count = DEFAULT_KEEPALIVE_COUNT;
module_param(keepalive_count, int, 0644);
MODULE_PARM_DESC(keepalive_count, "# missed probes == dead");
static int keepalive_intvl = 5;
module_param(keepalive_intvl, int, 0644);
MODULE_PARM_DESC(keepalive_intvl, "seconds between probes");
static int enable_csum;
module_param(enable_csum, int, 0644);
MODULE_PARM_DESC(enable_csum, "enable check sum");
static int inject_csum_error;
module_param(inject_csum_error, int, 0644);
MODULE_PARM_DESC(inject_csum_error, "set non-zero to inject a checksum error");
static int nonblk_zcack = 1;
module_param(nonblk_zcack, int, 0644);
MODULE_PARM_DESC(nonblk_zcack, "always send ZC-ACK on non-blocking connection");
static unsigned int zc_min_payload = 16 << 10;
module_param(zc_min_payload, int, 0644);
MODULE_PARM_DESC(zc_min_payload, "minimum payload size to zero copy");
static unsigned int zc_recv;
module_param(zc_recv, int, 0644);
MODULE_PARM_DESC(zc_recv, "enable ZC recv for Chelsio driver");
static unsigned int zc_recv_min_nfrags = 16;
module_param(zc_recv_min_nfrags, int, 0644);
MODULE_PARM_DESC(zc_recv_min_nfrags, "minimum # of fragments to enable ZC recv");
#if SOCKNAL_VERSION_DEBUG
static int protocol = 3;
module_param(protocol, int, 0644);
MODULE_PARM_DESC(protocol, "protocol version");
#endif
struct ksock_tunables ksocknal_tunables;
int ksocknal_tunables_init(void)
{
/* initialize ksocknal_tunables structure */
ksocknal_tunables.ksnd_timeout = &sock_timeout;
ksocknal_tunables.ksnd_nscheds = &nscheds;
ksocknal_tunables.ksnd_nconnds = &nconnds;
ksocknal_tunables.ksnd_nconnds_max = &nconnds_max;
ksocknal_tunables.ksnd_min_reconnectms = &min_reconnectms;
ksocknal_tunables.ksnd_max_reconnectms = &max_reconnectms;
ksocknal_tunables.ksnd_eager_ack = &eager_ack;
ksocknal_tunables.ksnd_typed_conns = &typed_conns;
ksocknal_tunables.ksnd_min_bulk = &min_bulk;
ksocknal_tunables.ksnd_tx_buffer_size = &tx_buffer_size;
ksocknal_tunables.ksnd_rx_buffer_size = &rx_buffer_size;
ksocknal_tunables.ksnd_nagle = &nagle;
ksocknal_tunables.ksnd_round_robin = &round_robin;
ksocknal_tunables.ksnd_keepalive = &keepalive;
ksocknal_tunables.ksnd_keepalive_idle = &keepalive_idle;
ksocknal_tunables.ksnd_keepalive_count = &keepalive_count;
ksocknal_tunables.ksnd_keepalive_intvl = &keepalive_intvl;
ksocknal_tunables.ksnd_credits = &credits;
ksocknal_tunables.ksnd_peertxcredits = &peer_credits;
ksocknal_tunables.ksnd_peerrtrcredits = &peer_buffer_credits;
ksocknal_tunables.ksnd_peertimeout = &peer_timeout;
ksocknal_tunables.ksnd_enable_csum = &enable_csum;
ksocknal_tunables.ksnd_inject_csum_error = &inject_csum_error;
ksocknal_tunables.ksnd_nonblk_zcack = &nonblk_zcack;
ksocknal_tunables.ksnd_zc_min_payload = &zc_min_payload;
ksocknal_tunables.ksnd_zc_recv = &zc_recv;
ksocknal_tunables.ksnd_zc_recv_min_nfrags = &zc_recv_min_nfrags;
#if SOCKNAL_VERSION_DEBUG
ksocknal_tunables.ksnd_protocol = &protocol;
#endif
if (*ksocknal_tunables.ksnd_zc_min_payload < (2 << 10))
*ksocknal_tunables.ksnd_zc_min_payload = 2 << 10;
return 0;
}
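ksocknal_tunables_init() above stores pointers to the module parameters rather than copies, so a later write to a 0644 parameter is seen by every reader that dereferences the tunables structure. A minimal userspace sketch of that pattern follows; every name prefixed with sketch_ is hypothetical, and only two of the parameters are modelled.

```c
/* Hypothetical cut-down version of struct ksock_tunables. */
struct sketch_tunables {
	int *timeout;
	unsigned int *zc_min_payload;
};

/* Stand-ins for the static module parameters. */
static int sketch_sock_timeout = 50;
static unsigned int sketch_zc_min_payload = 16 << 10;

static struct sketch_tunables sketch_tunables;

static int sketch_tunables_init(void)
{
	/* Point at the parameters; do not copy their current values. */
	sketch_tunables.timeout = &sketch_sock_timeout;
	sketch_tunables.zc_min_payload = &sketch_zc_min_payload;

	/* Enforce the same 2 KiB floor the original applies. */
	if (*sketch_tunables.zc_min_payload < (2u << 10))
		*sketch_tunables.zc_min_payload = 2u << 10;
	return 0;
}
```

Note that the floor is applied only once, at init time; a value written to the parameter afterwards is not clamped again in this sketch, and nothing in the code shown above re-checks it either.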

@ -1,810 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* Copyright (c) 2009, 2010, Oracle and/or its affiliates. All rights reserved.
*
* Copyright (c) 2012, Intel Corporation.
*
* Author: Zach Brown <zab@zabbo.net>
* Author: Peter J. Braam <braam@clusterfs.com>
* Author: Phil Schwan <phil@clusterfs.com>
* Author: Eric Barton <eric@bartonsoftware.com>
*
* This file is part of Portals, http://www.sf.net/projects/sandiaportals/
*
* Portals is free software; you can redistribute it and/or
* modify it under the terms of version 2 of the GNU General Public
* License as published by the Free Software Foundation.
*
* Portals is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
*/
#include "socklnd.h"
/*
* Protocol entries :
* pro_send_hello : send hello message
* pro_recv_hello : receive hello message
* pro_pack : pack message header
* pro_unpack : unpack message header
* pro_queue_tx_zcack() : Called holding BH lock: kss_lock
* return 1 if ACK is piggybacked, otherwise return 0
* pro_queue_tx_msg() : Called holding BH lock: kss_lock
* return the ACK that is piggybacked by my message, or NULL
* pro_handle_zcreq() : handler of incoming ZC-REQ
* pro_handle_zcack() : handler of incoming ZC-ACK
* pro_match_tx() : Called holding glock
*/
static struct ksock_tx *
ksocknal_queue_tx_msg_v1(struct ksock_conn *conn, struct ksock_tx *tx_msg)
{
/* V1.x, just enqueue it */
list_add_tail(&tx_msg->tx_list, &conn->ksnc_tx_queue);
return NULL;
}
void
ksocknal_next_tx_carrier(struct ksock_conn *conn)
{
struct ksock_tx *tx = conn->ksnc_tx_carrier;
/* Called holding BH lock: conn->ksnc_scheduler->kss_lock */
LASSERT(!list_empty(&conn->ksnc_tx_queue));
LASSERT(tx);
/* Next TX that can carry ZC-ACK or LNet message */
if (tx->tx_list.next == &conn->ksnc_tx_queue) {
/* no more packets queued */
conn->ksnc_tx_carrier = NULL;
} else {
conn->ksnc_tx_carrier = list_next_entry(tx, tx_list);
LASSERT(conn->ksnc_tx_carrier->tx_msg.ksm_type == tx->tx_msg.ksm_type);
}
}
static int
ksocknal_queue_tx_zcack_v2(struct ksock_conn *conn,
struct ksock_tx *tx_ack, __u64 cookie)
{
struct ksock_tx *tx = conn->ksnc_tx_carrier;
LASSERT(!tx_ack ||
tx_ack->tx_msg.ksm_type == KSOCK_MSG_NOOP);
/*
* Enqueue or piggyback tx_ack / cookie:
* . If no tx can piggyback the cookie of tx_ack (or cookie), just
* enqueue the tx_ack (if tx_ack != NULL) and return 0.
* . If there is a tx that can piggyback the cookie of tx_ack (or
* cookie), piggyback the cookie and return 1.
*/
if (!tx) {
if (tx_ack) {
list_add_tail(&tx_ack->tx_list,
&conn->ksnc_tx_queue);
conn->ksnc_tx_carrier = tx_ack;
}
return 0;
}
if (tx->tx_msg.ksm_type == KSOCK_MSG_NOOP) {
/* tx is noop zc-ack, can't piggyback zc-ack cookie */
if (tx_ack)
list_add_tail(&tx_ack->tx_list,
&conn->ksnc_tx_queue);
return 0;
}
LASSERT(tx->tx_msg.ksm_type == KSOCK_MSG_LNET);
LASSERT(!tx->tx_msg.ksm_zc_cookies[1]);
if (tx_ack)
cookie = tx_ack->tx_msg.ksm_zc_cookies[1];
/* piggyback the zc-ack cookie */
tx->tx_msg.ksm_zc_cookies[1] = cookie;
/* move on to the next TX which can carry cookie */
ksocknal_next_tx_carrier(conn);
return 1;
}
static struct ksock_tx *
ksocknal_queue_tx_msg_v2(struct ksock_conn *conn, struct ksock_tx *tx_msg)
{
struct ksock_tx *tx = conn->ksnc_tx_carrier;
/*
* Enqueue tx_msg:
* . If there is no NOOP on the connection, just enqueue
* tx_msg and return NULL
* . If there is NOOP on the connection, piggyback the cookie
* and replace the NOOP tx, and return the NOOP tx.
*/
if (!tx) { /* nothing on queue */
list_add_tail(&tx_msg->tx_list, &conn->ksnc_tx_queue);
conn->ksnc_tx_carrier = tx_msg;
return NULL;
}
if (tx->tx_msg.ksm_type == KSOCK_MSG_LNET) { /* nothing to carry */
list_add_tail(&tx_msg->tx_list, &conn->ksnc_tx_queue);
return NULL;
}
LASSERT(tx->tx_msg.ksm_type == KSOCK_MSG_NOOP);
/* There is a noop zc-ack that can be piggybacked */
tx_msg->tx_msg.ksm_zc_cookies[1] = tx->tx_msg.ksm_zc_cookies[1];
ksocknal_next_tx_carrier(conn);
/* use tx_msg to replace the noop zc-ack packet */
list_add(&tx_msg->tx_list, &tx->tx_list);
list_del(&tx->tx_list);
return tx;
}
static int
ksocknal_queue_tx_zcack_v3(struct ksock_conn *conn,
struct ksock_tx *tx_ack, __u64 cookie)
{
struct ksock_tx *tx;
if (conn->ksnc_type != SOCKLND_CONN_ACK)
return ksocknal_queue_tx_zcack_v2(conn, tx_ack, cookie);
/* non-blocking ZC-ACK (to router) */
LASSERT(!tx_ack ||
tx_ack->tx_msg.ksm_type == KSOCK_MSG_NOOP);
tx = conn->ksnc_tx_carrier;
if (!tx) {
if (tx_ack) {
list_add_tail(&tx_ack->tx_list,
&conn->ksnc_tx_queue);
conn->ksnc_tx_carrier = tx_ack;
}
return 0;
}
/* conn->ksnc_tx_carrier */
if (tx_ack)
cookie = tx_ack->tx_msg.ksm_zc_cookies[1];
if (cookie == SOCKNAL_KEEPALIVE_PING) /* ignore keepalive PING */
return 1;
if (tx->tx_msg.ksm_zc_cookies[1] == SOCKNAL_KEEPALIVE_PING) {
/* replace the keepalive PING with a real ACK */
LASSERT(!tx->tx_msg.ksm_zc_cookies[0]);
tx->tx_msg.ksm_zc_cookies[1] = cookie;
return 1;
}
if (cookie == tx->tx_msg.ksm_zc_cookies[0] ||
cookie == tx->tx_msg.ksm_zc_cookies[1]) {
CWARN("%s: duplicated ZC cookie: %llu\n",
libcfs_id2str(conn->ksnc_peer->ksnp_id), cookie);
return 1; /* XXX return error in the future */
}
if (!tx->tx_msg.ksm_zc_cookies[0]) {
/*
* NOOP tx has only one ZC-ACK cookie,
* can carry at least one more
*/
if (tx->tx_msg.ksm_zc_cookies[1] > cookie) {
tx->tx_msg.ksm_zc_cookies[0] = tx->tx_msg.ksm_zc_cookies[1];
tx->tx_msg.ksm_zc_cookies[1] = cookie;
} else {
tx->tx_msg.ksm_zc_cookies[0] = cookie;
}
if (tx->tx_msg.ksm_zc_cookies[0] - tx->tx_msg.ksm_zc_cookies[1] > 2) {
/*
* not likely to carry more ACKs, skip it
* to simplify logic
*/
ksocknal_next_tx_carrier(conn);
}
return 1;
}
/* takes two or more cookies already */
if (tx->tx_msg.ksm_zc_cookies[0] > tx->tx_msg.ksm_zc_cookies[1]) {
__u64 tmp = 0;
/* two separated cookies: (a+2, a) or (a+1, a) */
LASSERT(tx->tx_msg.ksm_zc_cookies[0] -
tx->tx_msg.ksm_zc_cookies[1] <= 2);
if (tx->tx_msg.ksm_zc_cookies[0] -
tx->tx_msg.ksm_zc_cookies[1] == 2) {
if (cookie == tx->tx_msg.ksm_zc_cookies[1] + 1)
tmp = cookie;
} else if (cookie == tx->tx_msg.ksm_zc_cookies[1] - 1) {
tmp = tx->tx_msg.ksm_zc_cookies[1];
} else if (cookie == tx->tx_msg.ksm_zc_cookies[0] + 1) {
tmp = tx->tx_msg.ksm_zc_cookies[0];
}
if (tmp) {
/* range of cookies */
tx->tx_msg.ksm_zc_cookies[0] = tmp - 1;
tx->tx_msg.ksm_zc_cookies[1] = tmp + 1;
return 1;
}
} else {
/*
* ksm_zc_cookies[0] < ksm_zc_cookies[1],
* it is range of cookies
*/
if (cookie >= tx->tx_msg.ksm_zc_cookies[0] &&
cookie <= tx->tx_msg.ksm_zc_cookies[1]) {
CWARN("%s: duplicated ZC cookie: %llu\n",
libcfs_id2str(conn->ksnc_peer->ksnp_id), cookie);
return 1; /* XXX: return error in the future */
}
if (cookie == tx->tx_msg.ksm_zc_cookies[1] + 1) {
tx->tx_msg.ksm_zc_cookies[1] = cookie;
return 1;
}
if (cookie == tx->tx_msg.ksm_zc_cookies[0] - 1) {
tx->tx_msg.ksm_zc_cookies[0] = cookie;
return 1;
}
}
/* failed to piggyback ZC-ACK */
if (tx_ack) {
list_add_tail(&tx_ack->tx_list, &conn->ksnc_tx_queue);
/* the next tx can piggyback at least 1 ACK */
ksocknal_next_tx_carrier(conn);
}
return 0;
}
static int
ksocknal_match_tx(struct ksock_conn *conn, struct ksock_tx *tx, int nonblk)
{
int nob;
#if SOCKNAL_VERSION_DEBUG
if (!*ksocknal_tunables.ksnd_typed_conns)
return SOCKNAL_MATCH_YES;
#endif
if (!tx || !tx->tx_lnetmsg) {
/* noop packet */
nob = offsetof(struct ksock_msg, ksm_u);
} else {
nob = tx->tx_lnetmsg->msg_len +
((conn->ksnc_proto == &ksocknal_protocol_v1x) ?
sizeof(struct lnet_hdr) : sizeof(struct ksock_msg));
}
/* default checking for typed connection */
switch (conn->ksnc_type) {
default:
CERROR("ksnc_type bad: %u\n", conn->ksnc_type);
LBUG();
case SOCKLND_CONN_ANY:
return SOCKNAL_MATCH_YES;
case SOCKLND_CONN_BULK_IN:
return SOCKNAL_MATCH_MAY;
case SOCKLND_CONN_BULK_OUT:
if (nob < *ksocknal_tunables.ksnd_min_bulk)
return SOCKNAL_MATCH_MAY;
else
return SOCKNAL_MATCH_YES;
case SOCKLND_CONN_CONTROL:
if (nob >= *ksocknal_tunables.ksnd_min_bulk)
return SOCKNAL_MATCH_MAY;
else
return SOCKNAL_MATCH_YES;
}
}
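The typed-connection check in ksocknal_match_tx() above reduces to a small decision table keyed on the connection type and on whether the message size reaches min_bulk. The sketch below restates that table in standalone C; the enum names and values are hypothetical stand-ins for the SOCKLND_CONN_* and SOCKNAL_MATCH_* constants, whose real values are not shown in this diff.

```c
/* Hypothetical stand-ins for SOCKLND_CONN_* and SOCKNAL_MATCH_*. */
enum sketch_conn_type { CONN_ANY, CONN_CONTROL, CONN_BULK_IN, CONN_BULK_OUT };
enum sketch_match { MATCH_NO, MATCH_YES, MATCH_MAY };

/*
 * nob is the number of bytes on the wire for this message; min_bulk is
 * the "smallest 'large' message" tunable.
 */
static enum sketch_match sketch_match_tx(enum sketch_conn_type type,
					 int nob, int min_bulk)
{
	switch (type) {
	case CONN_ANY:
		return MATCH_YES;
	case CONN_BULK_IN:
		return MATCH_MAY;
	case CONN_BULK_OUT:
		/* prefer bulk-out connections for large messages */
		return nob < min_bulk ? MATCH_MAY : MATCH_YES;
	case CONN_CONTROL:
		/* prefer control connections for small messages */
		return nob >= min_bulk ? MATCH_MAY : MATCH_YES;
	}
	/* Unknown type: the original logs an error and calls LBUG() here. */
	return MATCH_NO;
}
```

With the default min_bulk of 1 << 10, a 512-byte message is a definite match for a control connection and only a possible match for a bulk-out connection; a 4096-byte message is the reverse.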
static int
ksocknal_match_tx_v3(struct ksock_conn *conn, struct ksock_tx *tx, int nonblk)
{
int nob;
if (!tx || !tx->tx_lnetmsg)
nob = offsetof(struct ksock_msg, ksm_u);
else
nob = tx->tx_lnetmsg->msg_len + sizeof(struct ksock_msg);
switch (conn->ksnc_type) {
default:
CERROR("ksnc_type bad: %u\n", conn->ksnc_type);
LBUG();
case SOCKLND_CONN_ANY:
return SOCKNAL_MATCH_NO;
case SOCKLND_CONN_ACK:
if (nonblk)
return SOCKNAL_MATCH_YES;
else if (!tx || !tx->tx_lnetmsg)
return SOCKNAL_MATCH_MAY;
else
return SOCKNAL_MATCH_NO;
case SOCKLND_CONN_BULK_OUT:
if (nonblk)
return SOCKNAL_MATCH_NO;
else if (nob < *ksocknal_tunables.ksnd_min_bulk)
return SOCKNAL_MATCH_MAY;
else
return SOCKNAL_MATCH_YES;
case SOCKLND_CONN_CONTROL:
if (nonblk)
return SOCKNAL_MATCH_NO;
else if (nob >= *ksocknal_tunables.ksnd_min_bulk)
return SOCKNAL_MATCH_MAY;
else
return SOCKNAL_MATCH_YES;
}
}
/* (Sink) handle incoming ZC request from sender */
static int
ksocknal_handle_zcreq(struct ksock_conn *c, __u64 cookie, int remote)
{
struct ksock_peer *peer = c->ksnc_peer;
struct ksock_conn *conn;
struct ksock_tx *tx;
int rc;
read_lock(&ksocknal_data.ksnd_global_lock);
conn = ksocknal_find_conn_locked(peer, NULL, !!remote);
if (conn) {
struct ksock_sched *sched = conn->ksnc_scheduler;
LASSERT(conn->ksnc_proto->pro_queue_tx_zcack);
spin_lock_bh(&sched->kss_lock);
rc = conn->ksnc_proto->pro_queue_tx_zcack(conn, NULL, cookie);
spin_unlock_bh(&sched->kss_lock);
if (rc) { /* piggybacked */
read_unlock(&ksocknal_data.ksnd_global_lock);
return 0;
}
}
read_unlock(&ksocknal_data.ksnd_global_lock);
/* ACK connection is not ready, or can't piggyback the ACK */
tx = ksocknal_alloc_tx_noop(cookie, !!remote);
if (!tx)
return -ENOMEM;
rc = ksocknal_launch_packet(peer->ksnp_ni, tx, peer->ksnp_id);
if (!rc)
return 0;
ksocknal_free_tx(tx);
return rc;
}
/* (Sender) handle ZC_ACK from sink */
static int
ksocknal_handle_zcack(struct ksock_conn *conn, __u64 cookie1, __u64 cookie2)
{
struct ksock_peer *peer = conn->ksnc_peer;
struct ksock_tx *tx;
struct ksock_tx *temp;
struct ksock_tx *tmp;
LIST_HEAD(zlist);
int count;
if (!cookie1)
cookie1 = cookie2;
count = (cookie1 > cookie2) ? 2 : (cookie2 - cookie1 + 1);
if (cookie2 == SOCKNAL_KEEPALIVE_PING &&
conn->ksnc_proto == &ksocknal_protocol_v3x) {
/* keepalive PING for V3.x, just ignore it */
return count == 1 ? 0 : -EPROTO;
}
spin_lock(&peer->ksnp_lock);
list_for_each_entry_safe(tx, tmp, &peer->ksnp_zc_req_list,
tx_zc_list) {
__u64 c = tx->tx_msg.ksm_zc_cookies[0];
if (c == cookie1 || c == cookie2 ||
(cookie1 < c && c < cookie2)) {
tx->tx_msg.ksm_zc_cookies[0] = 0;
list_del(&tx->tx_zc_list);
list_add(&tx->tx_zc_list, &zlist);
if (!--count)
break;
}
}
spin_unlock(&peer->ksnp_lock);
list_for_each_entry_safe(tx, temp, &zlist, tx_zc_list) {
list_del(&tx->tx_zc_list);
ksocknal_tx_decref(tx);
}
return !count ? 0 : -EPROTO;
}
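ksocknal_handle_zcack() above interprets the two cookies of a ZC-ACK in one of three ways: a single cookie (cookie1 is zero), two separate cookies (cookie1 > cookie2), or an inclusive range (cookie1 <= cookie2). The sketch below isolates just that arithmetic so the cases can be checked by hand; sketch_zcack_count() and sketch_zcack_covers() are hypothetical helpers, not functions from the original file.

```c
#include <stdint.h>

/* Number of zero-copy requests a (cookie1, cookie2) pair acknowledges. */
static uint64_t sketch_zcack_count(uint64_t cookie1, uint64_t cookie2)
{
	if (!cookie1)
		cookie1 = cookie2; /* single cookie */
	/* two separate cookies, or an inclusive range */
	return (cookie1 > cookie2) ? 2 : (cookie2 - cookie1 + 1);
}

/* Whether request cookie c is acknowledged by (cookie1, cookie2). */
static int sketch_zcack_covers(uint64_t c, uint64_t cookie1, uint64_t cookie2)
{
	if (!cookie1)
		cookie1 = cookie2;
	return c == cookie1 || c == cookie2 ||
	       (cookie1 < c && c < cookie2);
}
```

For example, (0, 5) acknowledges one request, (7, 5) acknowledges the two requests 7 and 5 but not 6, and (3, 6) acknowledges the four requests 3 through 6. The original returns -EPROTO when fewer matching requests are found on the peer's list than this count predicts.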
static int
ksocknal_send_hello_v1(struct ksock_conn *conn, struct ksock_hello_msg *hello)
{
struct socket *sock = conn->ksnc_sock;
struct lnet_hdr *hdr;
struct lnet_magicversion *hmv;
int rc;
int i;
BUILD_BUG_ON(sizeof(struct lnet_magicversion) != offsetof(struct lnet_hdr, src_nid));
hdr = kzalloc(sizeof(*hdr), GFP_NOFS);
if (!hdr) {
CERROR("Can't allocate struct lnet_hdr\n");
return -ENOMEM;
}
hmv = (struct lnet_magicversion *)&hdr->dest_nid;
/*
* Re-organize V2.x message header to V1.x (struct lnet_hdr)
* header and send out
*/
hmv->magic = cpu_to_le32(LNET_PROTO_TCP_MAGIC);
hmv->version_major = cpu_to_le16(KSOCK_PROTO_V1_MAJOR);
hmv->version_minor = cpu_to_le16(KSOCK_PROTO_V1_MINOR);
if (the_lnet.ln_testprotocompat) {
/* single-shot proto check */
LNET_LOCK();
if (the_lnet.ln_testprotocompat & 1) {
hmv->version_major++; /* just different! */
the_lnet.ln_testprotocompat &= ~1;
}
if (the_lnet.ln_testprotocompat & 2) {
hmv->magic = LNET_PROTO_MAGIC;
the_lnet.ln_testprotocompat &= ~2;
}
LNET_UNLOCK();
}
hdr->src_nid = cpu_to_le64(hello->kshm_src_nid);
hdr->src_pid = cpu_to_le32(hello->kshm_src_pid);
hdr->type = cpu_to_le32(LNET_MSG_HELLO);
hdr->payload_length = cpu_to_le32(hello->kshm_nips * sizeof(__u32));
hdr->msg.hello.type = cpu_to_le32(hello->kshm_ctype);
hdr->msg.hello.incarnation = cpu_to_le64(hello->kshm_src_incarnation);
rc = lnet_sock_write(sock, hdr, sizeof(*hdr), lnet_acceptor_timeout());
if (rc) {
CNETERR("Error %d sending HELLO hdr to %pI4h/%d\n",
rc, &conn->ksnc_ipaddr, conn->ksnc_port);
goto out;
}
if (!hello->kshm_nips)
goto out;
for (i = 0; i < (int)hello->kshm_nips; i++)
hello->kshm_ips[i] = __cpu_to_le32(hello->kshm_ips[i]);
rc = lnet_sock_write(sock, hello->kshm_ips,
hello->kshm_nips * sizeof(__u32),
lnet_acceptor_timeout());
if (rc) {
CNETERR("Error %d sending HELLO payload (%d) to %pI4h/%d\n",
rc, hello->kshm_nips,
&conn->ksnc_ipaddr, conn->ksnc_port);
}
out:
kfree(hdr);
return rc;
}
static int
ksocknal_send_hello_v2(struct ksock_conn *conn, struct ksock_hello_msg *hello)
{
struct socket *sock = conn->ksnc_sock;
int rc;
hello->kshm_magic = LNET_PROTO_MAGIC;
hello->kshm_version = conn->ksnc_proto->pro_version;
if (the_lnet.ln_testprotocompat) {
/* single-shot proto check */
LNET_LOCK();
if (the_lnet.ln_testprotocompat & 1) {
hello->kshm_version++; /* just different! */
the_lnet.ln_testprotocompat &= ~1;
}
LNET_UNLOCK();
}
rc = lnet_sock_write(sock, hello, offsetof(struct ksock_hello_msg, kshm_ips),
lnet_acceptor_timeout());
if (rc) {
CNETERR("Error %d sending HELLO hdr to %pI4h/%d\n",
rc, &conn->ksnc_ipaddr, conn->ksnc_port);
return rc;
}
if (!hello->kshm_nips)
return 0;
rc = lnet_sock_write(sock, hello->kshm_ips,
hello->kshm_nips * sizeof(__u32),
lnet_acceptor_timeout());
if (rc) {
CNETERR("Error %d sending HELLO payload (%d) to %pI4h/%d\n",
rc, hello->kshm_nips,
&conn->ksnc_ipaddr, conn->ksnc_port);
}
return rc;
}
static int
ksocknal_recv_hello_v1(struct ksock_conn *conn, struct ksock_hello_msg *hello,
int timeout)
{
struct socket *sock = conn->ksnc_sock;
struct lnet_hdr *hdr;
int rc;
int i;
hdr = kzalloc(sizeof(*hdr), GFP_NOFS);
if (!hdr) {
CERROR("Can't allocate struct lnet_hdr\n");
return -ENOMEM;
}
rc = lnet_sock_read(sock, &hdr->src_nid,
sizeof(*hdr) - offsetof(struct lnet_hdr, src_nid),
timeout);
if (rc) {
CERROR("Error %d reading rest of HELLO hdr from %pI4h\n",
rc, &conn->ksnc_ipaddr);
LASSERT(rc < 0 && rc != -EALREADY);
goto out;
}
/* ...and check we got what we expected */
if (hdr->type != cpu_to_le32(LNET_MSG_HELLO)) {
CERROR("Expecting a HELLO hdr, but got type %d from %pI4h\n",
le32_to_cpu(hdr->type),
&conn->ksnc_ipaddr);
rc = -EPROTO;
goto out;
}
hello->kshm_src_nid = le64_to_cpu(hdr->src_nid);
hello->kshm_src_pid = le32_to_cpu(hdr->src_pid);
hello->kshm_src_incarnation = le64_to_cpu(hdr->msg.hello.incarnation);
hello->kshm_ctype = le32_to_cpu(hdr->msg.hello.type);
hello->kshm_nips = le32_to_cpu(hdr->payload_length) /
sizeof(__u32);
if (hello->kshm_nips > LNET_MAX_INTERFACES) {
CERROR("Bad nips %d from ip %pI4h\n",
hello->kshm_nips, &conn->ksnc_ipaddr);
rc = -EPROTO;
goto out;
}
if (!hello->kshm_nips)
goto out;
rc = lnet_sock_read(sock, hello->kshm_ips,
hello->kshm_nips * sizeof(__u32), timeout);
if (rc) {
CERROR("Error %d reading IPs from ip %pI4h\n",
rc, &conn->ksnc_ipaddr);
LASSERT(rc < 0 && rc != -EALREADY);
goto out;
}
for (i = 0; i < (int)hello->kshm_nips; i++) {
hello->kshm_ips[i] = __le32_to_cpu(hello->kshm_ips[i]);
if (!hello->kshm_ips[i]) {
CERROR("Zero IP[%d] from ip %pI4h\n",
i, &conn->ksnc_ipaddr);
rc = -EPROTO;
break;
}
}
out:
kfree(hdr);
return rc;
}
static int
ksocknal_recv_hello_v2(struct ksock_conn *conn, struct ksock_hello_msg *hello,
int timeout)
{
struct socket *sock = conn->ksnc_sock;
int rc;
int i;
if (hello->kshm_magic == LNET_PROTO_MAGIC)
conn->ksnc_flip = 0;
else
conn->ksnc_flip = 1;
rc = lnet_sock_read(sock, &hello->kshm_src_nid,
offsetof(struct ksock_hello_msg, kshm_ips) -
offsetof(struct ksock_hello_msg, kshm_src_nid),
timeout);
if (rc) {
CERROR("Error %d reading HELLO from %pI4h\n",
rc, &conn->ksnc_ipaddr);
LASSERT(rc < 0 && rc != -EALREADY);
return rc;
}
if (conn->ksnc_flip) {
__swab32s(&hello->kshm_src_pid);
__swab64s(&hello->kshm_src_nid);
__swab32s(&hello->kshm_dst_pid);
__swab64s(&hello->kshm_dst_nid);
__swab64s(&hello->kshm_src_incarnation);
__swab64s(&hello->kshm_dst_incarnation);
__swab32s(&hello->kshm_ctype);
__swab32s(&hello->kshm_nips);
}
if (hello->kshm_nips > LNET_MAX_INTERFACES) {
CERROR("Bad nips %d from ip %pI4h\n",
hello->kshm_nips, &conn->ksnc_ipaddr);
return -EPROTO;
}
if (!hello->kshm_nips)
return 0;
rc = lnet_sock_read(sock, hello->kshm_ips,
hello->kshm_nips * sizeof(__u32), timeout);
if (rc) {
CERROR("Error %d reading IPs from ip %pI4h\n",
rc, &conn->ksnc_ipaddr);
LASSERT(rc < 0 && rc != -EALREADY);
return rc;
}
for (i = 0; i < (int)hello->kshm_nips; i++) {
if (conn->ksnc_flip)
__swab32s(&hello->kshm_ips[i]);
if (!hello->kshm_ips[i]) {
CERROR("Zero IP[%d] from ip %pI4h\n",
i, &conn->ksnc_ipaddr);
return -EPROTO;
}
}
return 0;
}
static void
ksocknal_pack_msg_v1(struct ksock_tx *tx)
{
/* V1.x has no KSOCK_MSG_NOOP */
LASSERT(tx->tx_msg.ksm_type != KSOCK_MSG_NOOP);
LASSERT(tx->tx_lnetmsg);
tx->tx_iov[0].iov_base = &tx->tx_lnetmsg->msg_hdr;
tx->tx_iov[0].iov_len = sizeof(struct lnet_hdr);
tx->tx_nob = tx->tx_lnetmsg->msg_len + sizeof(struct lnet_hdr);
tx->tx_resid = tx->tx_lnetmsg->msg_len + sizeof(struct lnet_hdr);
}
static void
ksocknal_pack_msg_v2(struct ksock_tx *tx)
{
tx->tx_iov[0].iov_base = &tx->tx_msg;
if (tx->tx_lnetmsg) {
LASSERT(tx->tx_msg.ksm_type != KSOCK_MSG_NOOP);
tx->tx_msg.ksm_u.lnetmsg.ksnm_hdr = tx->tx_lnetmsg->msg_hdr;
tx->tx_iov[0].iov_len = sizeof(struct ksock_msg);
tx->tx_nob = sizeof(struct ksock_msg) + tx->tx_lnetmsg->msg_len;
tx->tx_resid = sizeof(struct ksock_msg) + tx->tx_lnetmsg->msg_len;
} else {
LASSERT(tx->tx_msg.ksm_type == KSOCK_MSG_NOOP);
tx->tx_iov[0].iov_len = offsetof(struct ksock_msg, ksm_u.lnetmsg.ksnm_hdr);
tx->tx_nob = offsetof(struct ksock_msg, ksm_u.lnetmsg.ksnm_hdr);
tx->tx_resid = offsetof(struct ksock_msg, ksm_u.lnetmsg.ksnm_hdr);
}
/*
* Don't checksum before sending starts, because the packet can still be
* piggybacked with an ACK
*/
}
static void
ksocknal_unpack_msg_v1(struct ksock_msg *msg)
{
msg->ksm_csum = 0;
msg->ksm_type = KSOCK_MSG_LNET;
msg->ksm_zc_cookies[0] = 0;
msg->ksm_zc_cookies[1] = 0;
}
static void
ksocknal_unpack_msg_v2(struct ksock_msg *msg)
{
return; /* Do nothing */
}
struct ksock_proto ksocknal_protocol_v1x = {
.pro_version = KSOCK_PROTO_V1,
.pro_send_hello = ksocknal_send_hello_v1,
.pro_recv_hello = ksocknal_recv_hello_v1,
.pro_pack = ksocknal_pack_msg_v1,
.pro_unpack = ksocknal_unpack_msg_v1,
.pro_queue_tx_msg = ksocknal_queue_tx_msg_v1,
.pro_handle_zcreq = NULL,
.pro_handle_zcack = NULL,
.pro_queue_tx_zcack = NULL,
.pro_match_tx = ksocknal_match_tx
};
struct ksock_proto ksocknal_protocol_v2x = {
.pro_version = KSOCK_PROTO_V2,
.pro_send_hello = ksocknal_send_hello_v2,
.pro_recv_hello = ksocknal_recv_hello_v2,
.pro_pack = ksocknal_pack_msg_v2,
.pro_unpack = ksocknal_unpack_msg_v2,
.pro_queue_tx_msg = ksocknal_queue_tx_msg_v2,
.pro_queue_tx_zcack = ksocknal_queue_tx_zcack_v2,
.pro_handle_zcreq = ksocknal_handle_zcreq,
.pro_handle_zcack = ksocknal_handle_zcack,
.pro_match_tx = ksocknal_match_tx
};
struct ksock_proto ksocknal_protocol_v3x = {
.pro_version = KSOCK_PROTO_V3,
.pro_send_hello = ksocknal_send_hello_v2,
.pro_recv_hello = ksocknal_recv_hello_v2,
.pro_pack = ksocknal_pack_msg_v2,
.pro_unpack = ksocknal_unpack_msg_v2,
.pro_queue_tx_msg = ksocknal_queue_tx_msg_v2,
.pro_queue_tx_zcack = ksocknal_queue_tx_zcack_v3,
.pro_handle_zcreq = ksocknal_handle_zcreq,
.pro_handle_zcack = ksocknal_handle_zcack,
.pro_match_tx = ksocknal_match_tx_v3
};
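
The v2 hello receive path in the file above infers the peer's byte order from the magic value and byte-swaps the remaining fields when the magic arrives reversed. A minimal userspace sketch of that idea follows. `DEMO_MAGIC`, `struct demo_hello` and the `demo_*` helpers are hypothetical names for illustration, not LNet definitions, and unlike the deleted code the sketch also rejects a magic that matches in neither byte order.

```c
#include <stdint.h>

/* Hypothetical magic value, chosen only for this sketch. */
#define DEMO_MAGIC 0x45726963u

/* Hypothetical, cut-down stand-in for a hello message. */
struct demo_hello {
	uint32_t magic;
	uint32_t nips;
	uint64_t src_nid;
};

static uint32_t demo_swab32(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
	       ((v & 0x00ff0000u) >> 8) | ((v & 0xff000000u) >> 24);
}

static uint64_t demo_swab64(uint64_t v)
{
	return ((uint64_t)demo_swab32((uint32_t)v) << 32) |
	       demo_swab32((uint32_t)(v >> 32));
}

/*
 * Convert a received hello to host byte order in place.
 * Returns 0 if no swap was needed, 1 if the fields were swapped,
 * -1 if the magic matches in neither byte order.
 */
static int demo_hello_fixup(struct demo_hello *h)
{
	if (h->magic == DEMO_MAGIC)
		return 0;
	if (h->magic != demo_swab32(DEMO_MAGIC))
		return -1;
	h->magic = demo_swab32(h->magic);
	h->nips = demo_swab32(h->nips);
	h->src_nid = demo_swab64(h->src_nid);
	return 1;
}
```

A receiver would call `demo_hello_fixup()` once after reading the fixed-size part of the message and before validating fields such as the address count.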


@@ -1,16 +0,0 @@
# SPDX-License-Identifier: GPL-2.0
subdir-ccflags-y += -I$(srctree)/drivers/staging/lustre/include
subdir-ccflags-y += -I$(srctree)/drivers/staging/lustre/lustre/include
obj-$(CONFIG_LNET) += libcfs.o
libcfs-obj-y += linux-tracefile.o linux-debug.o
libcfs-obj-y += linux-crypto.o
libcfs-obj-y += linux-crypto-adler.o
libcfs-obj-y += debug.o fail.o module.o tracefile.o
libcfs-obj-y += libcfs_string.o hash.o
libcfs-obj-$(CONFIG_SMP) += libcfs_cpu.o
libcfs-obj-y += libcfs_mem.o libcfs_lock.o
libcfs-objs := $(libcfs-obj-y)


@@ -1,461 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2008, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2011, 2012, Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Sun Microsystems, Inc.
*
* libcfs/libcfs/debug.c
*
* Author: Phil Schwan <phil@clusterfs.com>
*
*/
# define DEBUG_SUBSYSTEM S_LNET
#include <linux/module.h>
#include <linux/ctype.h>
#include <linux/libcfs/libcfs_string.h>
#include <linux/kthread.h>
#include "tracefile.h"
static char debug_file_name[1024];
unsigned int libcfs_subsystem_debug = ~0;
EXPORT_SYMBOL(libcfs_subsystem_debug);
module_param(libcfs_subsystem_debug, int, 0644);
MODULE_PARM_DESC(libcfs_subsystem_debug, "Lustre kernel debug subsystem mask");
unsigned int libcfs_debug = (D_CANTMASK |
D_NETERROR | D_HA | D_CONFIG | D_IOCTL);
EXPORT_SYMBOL(libcfs_debug);
module_param(libcfs_debug, int, 0644);
MODULE_PARM_DESC(libcfs_debug, "Lustre kernel debug mask");
static int libcfs_param_debug_mb_set(const char *val,
const struct kernel_param *kp)
{
int rc;
unsigned int num;
rc = kstrtouint(val, 0, &num);
if (rc < 0)
return rc;
if (!*((unsigned int *)kp->arg)) {
*((unsigned int *)kp->arg) = num;
return 0;
}
rc = cfs_trace_set_debug_mb(num);
if (!rc)
*((unsigned int *)kp->arg) = cfs_trace_get_debug_mb();
return rc;
}
/* While the debug_mb setting looks like an unsigned int, in fact
* it needs quite a bit of extra processing, so we define a special
* debugmb parameter type with corresponding methods to handle this case
*/
static const struct kernel_param_ops param_ops_debugmb = {
.set = libcfs_param_debug_mb_set,
.get = param_get_uint,
};
#define param_check_debugmb(name, p) \
__param_check(name, p, unsigned int)
static unsigned int libcfs_debug_mb;
module_param(libcfs_debug_mb, debugmb, 0644);
MODULE_PARM_DESC(libcfs_debug_mb, "Total debug buffer size.");
unsigned int libcfs_printk = D_CANTMASK;
module_param(libcfs_printk, uint, 0644);
MODULE_PARM_DESC(libcfs_printk, "Lustre kernel debug console mask");
unsigned int libcfs_console_ratelimit = 1;
module_param(libcfs_console_ratelimit, uint, 0644);
MODULE_PARM_DESC(libcfs_console_ratelimit, "Lustre kernel debug console ratelimit (0 to disable)");
static int param_set_delay_minmax(const char *val,
const struct kernel_param *kp,
long min, long max)
{
long d;
int sec;
int rc;
rc = kstrtoint(val, 0, &sec);
if (rc)
return -EINVAL;
d = sec * HZ / 100;
if (d < min || d > max)
return -EINVAL;
*((unsigned int *)kp->arg) = d;
return 0;
}
static int param_get_delay(char *buffer, const struct kernel_param *kp)
{
unsigned int d = *(unsigned int *)kp->arg;
return sprintf(buffer, "%u", (unsigned int)(d * 100) / HZ);
}
unsigned int libcfs_console_max_delay;
unsigned int libcfs_console_min_delay;
static int param_set_console_max_delay(const char *val,
const struct kernel_param *kp)
{
return param_set_delay_minmax(val, kp,
libcfs_console_min_delay, INT_MAX);
}
static const struct kernel_param_ops param_ops_console_max_delay = {
.set = param_set_console_max_delay,
.get = param_get_delay,
};
#define param_check_console_max_delay(name, p) \
__param_check(name, p, unsigned int)
module_param(libcfs_console_max_delay, console_max_delay, 0644);
MODULE_PARM_DESC(libcfs_console_max_delay, "Lustre kernel debug console max delay (jiffies)");
static int param_set_console_min_delay(const char *val,
const struct kernel_param *kp)
{
return param_set_delay_minmax(val, kp,
1, libcfs_console_max_delay);
}
static const struct kernel_param_ops param_ops_console_min_delay = {
.set = param_set_console_min_delay,
.get = param_get_delay,
};
#define param_check_console_min_delay(name, p) \
__param_check(name, p, unsigned int)
module_param(libcfs_console_min_delay, console_min_delay, 0644);
MODULE_PARM_DESC(libcfs_console_min_delay, "Lustre kernel debug console min delay (jiffies)");
static int param_set_uint_minmax(const char *val,
const struct kernel_param *kp,
unsigned int min, unsigned int max)
{
unsigned int num;
int ret;
if (!val)
return -EINVAL;
ret = kstrtouint(val, 0, &num);
if (ret < 0 || num < min || num > max)
return -EINVAL;
*((unsigned int *)kp->arg) = num;
return 0;
}
static int param_set_uintpos(const char *val, const struct kernel_param *kp)
{
return param_set_uint_minmax(val, kp, 1, -1);
}
static const struct kernel_param_ops param_ops_uintpos = {
.set = param_set_uintpos,
.get = param_get_uint,
};
#define param_check_uintpos(name, p) \
__param_check(name, p, unsigned int)
unsigned int libcfs_console_backoff = CDEBUG_DEFAULT_BACKOFF;
module_param(libcfs_console_backoff, uintpos, 0644);
MODULE_PARM_DESC(libcfs_console_backoff, "Lustre kernel debug console backoff factor");
unsigned int libcfs_debug_binary = 1;
unsigned int libcfs_stack = 3 * THREAD_SIZE / 4;
EXPORT_SYMBOL(libcfs_stack);
unsigned int libcfs_catastrophe;
EXPORT_SYMBOL(libcfs_catastrophe);
unsigned int libcfs_panic_on_lbug = 1;
module_param(libcfs_panic_on_lbug, uint, 0644);
MODULE_PARM_DESC(libcfs_panic_on_lbug, "Lustre kernel panic on LBUG");
static wait_queue_head_t debug_ctlwq;
char libcfs_debug_file_path_arr[PATH_MAX] = LIBCFS_DEBUG_FILE_PATH_DEFAULT;
/* We need to pass a pointer here, but elsewhere this must be a const */
static char *libcfs_debug_file_path;
module_param(libcfs_debug_file_path, charp, 0644);
MODULE_PARM_DESC(libcfs_debug_file_path,
"Path for dumping debug logs, set 'NONE' to prevent log dumping");
int libcfs_panic_in_progress;
/* libcfs_debug_token2mask() expects the returned string in lower-case */
static const char *
libcfs_debug_subsys2str(int subsys)
{
static const char * const libcfs_debug_subsystems[] =
LIBCFS_DEBUG_SUBSYS_NAMES;
if (subsys >= ARRAY_SIZE(libcfs_debug_subsystems))
return NULL;
return libcfs_debug_subsystems[subsys];
}
/* libcfs_debug_token2mask() expects the returned string in lower-case */
static const char *
libcfs_debug_dbg2str(int debug)
{
static const char * const libcfs_debug_masks[] =
LIBCFS_DEBUG_MASKS_NAMES;
if (debug >= ARRAY_SIZE(libcfs_debug_masks))
return NULL;
return libcfs_debug_masks[debug];
}
int
libcfs_debug_mask2str(char *str, int size, int mask, int is_subsys)
{
const char *(*fn)(int bit) = is_subsys ? libcfs_debug_subsys2str :
libcfs_debug_dbg2str;
int len = 0;
const char *token;
int i;
if (!mask) { /* "0" */
if (size > 0)
str[0] = '0';
len = 1;
} else { /* space-separated tokens */
for (i = 0; i < 32; i++) {
if (!(mask & (1 << i)))
continue;
token = fn(i);
if (!token) /* unused bit */
continue;
if (len > 0) { /* separator? */
if (len < size)
str[len] = ' ';
len++;
}
while (*token) {
if (len < size)
str[len] = *token;
token++;
len++;
}
}
}
/* terminate 'str' */
if (len < size)
str[len] = 0;
else
str[size - 1] = 0;
return len;
}
int
libcfs_debug_str2mask(int *mask, const char *str, int is_subsys)
{
const char *(*fn)(int bit) = is_subsys ? libcfs_debug_subsys2str :
libcfs_debug_dbg2str;
int m = 0;
int matched;
int n;
int t;
/* Allow a number for backwards compatibility */
for (n = strlen(str); n > 0; n--)
if (!isspace(str[n - 1]))
break;
matched = n;
t = sscanf(str, "%i%n", &m, &matched);
if (t >= 1 && matched == n) {
/* don't print warning for lctl set_param debug=0 or -1 */
if (m && m != -1)
CWARN("You are trying to use a numerical value for the mask - this will be deprecated in a future release.\n");
*mask = m;
return 0;
}
return cfs_str2mask(str, fn, mask, is_subsys ? 0 : D_CANTMASK,
0xffffffff);
}
/**
* Dump Lustre log to ::debug_file_path by calling cfs_tracefile_dump_all_pages()
*/
void libcfs_debug_dumplog_internal(void *arg)
{
static time64_t last_dump_time;
time64_t current_time;
void *journal_info;
journal_info = current->journal_info;
current->journal_info = NULL;
current_time = ktime_get_real_seconds();
if (strncmp(libcfs_debug_file_path_arr, "NONE", 4) &&
current_time > last_dump_time) {
last_dump_time = current_time;
snprintf(debug_file_name, sizeof(debug_file_name) - 1,
"%s.%lld.%ld", libcfs_debug_file_path_arr,
(s64)current_time, (long)arg);
pr_alert("LustreError: dumping log to %s\n", debug_file_name);
cfs_tracefile_dump_all_pages(debug_file_name);
libcfs_run_debug_log_upcall(debug_file_name);
}
current->journal_info = journal_info;
}
static int libcfs_debug_dumplog_thread(void *arg)
{
libcfs_debug_dumplog_internal(arg);
wake_up(&debug_ctlwq);
return 0;
}
void libcfs_debug_dumplog(void)
{
wait_queue_entry_t wait;
struct task_struct *dumper;
/* we're being careful to ensure that the kernel thread is
* able to set our state to running as it exits before we
* get to schedule()
*/
init_waitqueue_entry(&wait, current);
add_wait_queue(&debug_ctlwq, &wait);
dumper = kthread_run(libcfs_debug_dumplog_thread,
(void *)(long)current->pid,
"libcfs_debug_dumper");
set_current_state(TASK_INTERRUPTIBLE);
if (IS_ERR(dumper))
pr_err("LustreError: cannot start log dump thread: %ld\n",
PTR_ERR(dumper));
else
schedule();
/* be sure to tear down if kthread_run() failed */
remove_wait_queue(&debug_ctlwq, &wait);
set_current_state(TASK_RUNNING);
}
EXPORT_SYMBOL(libcfs_debug_dumplog);
int libcfs_debug_init(unsigned long bufsize)
{
unsigned int max = libcfs_debug_mb;
int rc = 0;
init_waitqueue_head(&debug_ctlwq);
if (libcfs_console_max_delay <= 0 || /* not set by user or */
libcfs_console_min_delay <= 0 || /* set to invalid values */
libcfs_console_min_delay >= libcfs_console_max_delay) {
libcfs_console_max_delay = CDEBUG_DEFAULT_MAX_DELAY;
libcfs_console_min_delay = CDEBUG_DEFAULT_MIN_DELAY;
}
if (libcfs_debug_file_path) {
strlcpy(libcfs_debug_file_path_arr,
libcfs_debug_file_path,
sizeof(libcfs_debug_file_path_arr));
}
/* If libcfs_debug_mb is set to an invalid value or uninitialized
* then just make the total buffers smp_num_cpus * TCD_MAX_PAGES
*/
if (max > cfs_trace_max_debug_mb() || max < num_possible_cpus()) {
max = TCD_MAX_PAGES;
} else {
max = max / num_possible_cpus();
max <<= (20 - PAGE_SHIFT);
}
rc = cfs_tracefile_init(max);
if (!rc) {
libcfs_register_panic_notifier();
libcfs_debug_mb = cfs_trace_get_debug_mb();
}
return rc;
}
int libcfs_debug_cleanup(void)
{
libcfs_unregister_panic_notifier();
cfs_tracefile_exit();
return 0;
}
int libcfs_debug_clear_buffer(void)
{
cfs_trace_flush_pages();
return 0;
}
/* Debug markers, although printed by S_LNET, should not be marked as such. */
#undef DEBUG_SUBSYSTEM
#define DEBUG_SUBSYSTEM S_UNDEFINED
int libcfs_debug_mark_buffer(const char *text)
{
CDEBUG(D_TRACE,
"***************************************************\n");
LCONSOLE(D_WARNING, "DEBUG MARKER: %s\n", text);
CDEBUG(D_TRACE,
"***************************************************\n");
return 0;
}
#undef DEBUG_SUBSYSTEM
#define DEBUG_SUBSYSTEM S_LNET
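
`libcfs_debug_mask2str()` above turns a bitmask into space-separated names and returns the length the full string needs, so a caller can detect truncation in the same way as with `snprintf()`. A minimal userspace sketch of that pattern follows. The name table and `demo_mask2str()` are hypothetical, and the sketch adds a `size > 0` guard before writing the terminator, which is an assumption of this example and not something the deleted code does.

```c
#include <stddef.h>

/* Hypothetical bit names for this sketch; NULL marks an unused bit. */
static const char *const demo_names[] = { "trace", "inode", NULL, "info" };

/*
 * Write the names of the bits set in mask into str, separated by spaces.
 * Returns the length the full string needs (excluding the terminator),
 * so a result >= size means the output was truncated.
 */
static int demo_mask2str(char *str, int size, unsigned int mask)
{
	int len = 0;
	unsigned int i;

	if (!mask) {		/* an empty mask is printed as "0" */
		if (size > 0)
			str[0] = '0';
		len = 1;
	}
	for (i = 0; mask && i < sizeof(demo_names) / sizeof(demo_names[0]); i++) {
		const char *token = demo_names[i];

		if (!(mask & (1u << i)) || !token)
			continue;
		if (len > 0) {	/* separator between tokens */
			if (len < size)
				str[len] = ' ';
			len++;
		}
		for (; *token; token++, len++)
			if (len < size)
				str[len] = *token;
	}
	/* always terminate, even when the output was truncated */
	if (size > 0)
		str[len < size ? len : size - 1] = '\0';
	return len;
}
```

Counting characters that do not fit, instead of stopping at the end of the buffer, lets the caller size a retry buffer from the return value.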


@@ -1,146 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see http://www.gnu.org/licenses
*
* GPL HEADER END
*/
/*
* Copyright (c) 2007, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2011, 2015, Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Oracle Corporation, Inc.
*/
#include <linux/types.h>
#include <linux/slab.h>
#include <linux/module.h>
#include <linux/libcfs/libcfs.h>
#include <linux/random.h>
unsigned long cfs_fail_loc;
EXPORT_SYMBOL(cfs_fail_loc);
unsigned int cfs_fail_val;
EXPORT_SYMBOL(cfs_fail_val);
int cfs_fail_err;
EXPORT_SYMBOL(cfs_fail_err);
DECLARE_WAIT_QUEUE_HEAD(cfs_race_waitq);
EXPORT_SYMBOL(cfs_race_waitq);
int cfs_race_state;
EXPORT_SYMBOL(cfs_race_state);
int __cfs_fail_check_set(u32 id, u32 value, int set)
{
static atomic_t cfs_fail_count = ATOMIC_INIT(0);
LASSERT(!(id & CFS_FAIL_ONCE));
if ((cfs_fail_loc & (CFS_FAILED | CFS_FAIL_ONCE)) ==
(CFS_FAILED | CFS_FAIL_ONCE)) {
atomic_set(&cfs_fail_count, 0); /* paranoia */
return 0;
}
/* Fail 1/cfs_fail_val times */
if (cfs_fail_loc & CFS_FAIL_RAND) {
if (cfs_fail_val < 2 || prandom_u32_max(cfs_fail_val) > 0)
return 0;
}
/* Skip the first cfs_fail_val, then fail */
if (cfs_fail_loc & CFS_FAIL_SKIP) {
if (atomic_inc_return(&cfs_fail_count) <= cfs_fail_val)
return 0;
}
/* check cfs_fail_val... */
if (set == CFS_FAIL_LOC_VALUE) {
if (cfs_fail_val != -1 && cfs_fail_val != value)
return 0;
}
/* Fail cfs_fail_val times, overridden by FAIL_ONCE */
if (cfs_fail_loc & CFS_FAIL_SOME &&
(!(cfs_fail_loc & CFS_FAIL_ONCE) || cfs_fail_val <= 1)) {
int count = atomic_inc_return(&cfs_fail_count);
if (count >= cfs_fail_val) {
set_bit(CFS_FAIL_ONCE_BIT, &cfs_fail_loc);
atomic_set(&cfs_fail_count, 0);
/* we lost the race to increment */
if (count > cfs_fail_val)
return 0;
}
}
/* Take into account the current call for FAIL_ONCE for ORSET only,
* as RESET is a new fail_loc, it does not change the current call
*/
if ((set == CFS_FAIL_LOC_ORSET) && (value & CFS_FAIL_ONCE))
set_bit(CFS_FAIL_ONCE_BIT, &cfs_fail_loc);
/* Lost race to set CFS_FAILED_BIT. */
if (test_and_set_bit(CFS_FAILED_BIT, &cfs_fail_loc)) {
/* If CFS_FAIL_ONCE is valid, only one process can fail,
* otherwise multiple processes can fail at the same time.
*/
if (cfs_fail_loc & CFS_FAIL_ONCE)
return 0;
}
switch (set) {
case CFS_FAIL_LOC_NOSET:
case CFS_FAIL_LOC_VALUE:
break;
case CFS_FAIL_LOC_ORSET:
cfs_fail_loc |= value & ~(CFS_FAILED | CFS_FAIL_ONCE);
break;
case CFS_FAIL_LOC_RESET:
cfs_fail_loc = value;
atomic_set(&cfs_fail_count, 0);
break;
default:
LASSERTF(0, "called with bad set %u\n", set);
break;
}
return 1;
}
EXPORT_SYMBOL(__cfs_fail_check_set);
int __cfs_fail_timeout_set(u32 id, u32 value, int ms, int set)
{
int ret;
ret = __cfs_fail_check_set(id, value, set);
if (ret && likely(ms > 0)) {
CERROR("cfs_fail_timeout id %x sleeping for %dms\n",
id, ms);
set_current_state(TASK_UNINTERRUPTIBLE);
schedule_timeout(ms * HZ / 1000);
CERROR("cfs_fail_timeout id %x awake\n", id);
}
return ret;
}
EXPORT_SYMBOL(__cfs_fail_timeout_set);
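
`__cfs_fail_check_set()` above combines several fault-injection modes in one function. Two of them, "skip the first N checks" and "fail only once", can be sketched in userspace with C11 atomics as shown below. `struct demo_fail` and the `demo_fail_*` functions are hypothetical and deliberately omit the random, value-matching and fail-N-times modes, so this is an illustration of the idea and not a port of the deleted function.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical fault-injection state for this sketch. */
struct demo_fail {
	unsigned int skip;	/* let this many checks pass first */
	bool once;		/* fail at most one caller */
	atomic_uint count;	/* checks seen so far */
	atomic_flag failed;	/* set by the first caller that fails */
};

static void demo_fail_init(struct demo_fail *f, unsigned int skip, bool once)
{
	f->skip = skip;
	f->once = once;
	atomic_init(&f->count, 0);
	atomic_flag_clear(&f->failed);
}

/* Returns true if the caller should take its injected failure path. */
static bool demo_fail_check(struct demo_fail *f)
{
	/* Skip the first f->skip checks. */
	if (atomic_fetch_add(&f->count, 1) < f->skip)
		return false;
	/* With once set, only the caller that wins the flag fails. */
	if (atomic_flag_test_and_set(&f->failed) && f->once)
		return false;
	return true;
}
```

The test-and-set on `failed` plays the role that `test_and_set_bit(CFS_FAILED_BIT, ...)` plays in the kernel code: concurrent callers race for the flag, and with `once` set only the winner reports a failure.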

File diff suppressed because it is too large

File diff suppressed because it is too large


@@ -1,155 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* GPL HEADER END
*/
/* Copyright (c) 2010, Oracle and/or its affiliates. All rights reserved.
* Copyright (c) 2012, 2015 Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Sun Microsystems, Inc.
*
* Author: liang@whamcloud.com
*/
#define DEBUG_SUBSYSTEM S_LNET
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/libcfs/libcfs.h>
#include <linux/libcfs/libcfs_cpu.h>
/** destroy cpu-partition lock, see libcfs_private.h for more detail */
void
cfs_percpt_lock_free(struct cfs_percpt_lock *pcl)
{
LASSERT(pcl->pcl_locks);
LASSERT(!pcl->pcl_locked);
cfs_percpt_free(pcl->pcl_locks);
kfree(pcl);
}
EXPORT_SYMBOL(cfs_percpt_lock_free);
/**
* create cpu-partition lock, see libcfs_private.h for more detail.
*
* The cpu-partition lock is designed for large-scale SMP systems, so we need to
* reduce cacheline conflicts as much as possible; that is the
* reason we always allocate cacheline-aligned memory blocks.
*/
struct cfs_percpt_lock *
cfs_percpt_lock_create(struct cfs_cpt_table *cptab,
struct lock_class_key *keys)
{
struct cfs_percpt_lock *pcl;
spinlock_t *lock;
int i;
/* NB: cptab can be NULL, pcl will be for HW CPUs in that case */
pcl = kzalloc(sizeof(*pcl), GFP_NOFS);
if (!pcl)
return NULL;
pcl->pcl_cptab = cptab;
pcl->pcl_locks = cfs_percpt_alloc(cptab, sizeof(*lock));
if (!pcl->pcl_locks) {
kfree(pcl);
return NULL;
}
if (!keys)
CWARN("Cannot setup class key for percpt lock, you may see recursive locking warnings which are actually fake.\n");
cfs_percpt_for_each(lock, i, pcl->pcl_locks) {
spin_lock_init(lock);
if (keys)
lockdep_set_class(lock, &keys[i]);
}
return pcl;
}
EXPORT_SYMBOL(cfs_percpt_lock_create);
/**
* lock a CPU partition
*
* \a index != CFS_PERCPT_LOCK_EX
* hold private lock indexed by \a index
*
* \a index == CFS_PERCPT_LOCK_EX
* exclusively lock @pcl and nobody can take private lock
*/
void
cfs_percpt_lock(struct cfs_percpt_lock *pcl, int index)
__acquires(pcl->pcl_locks)
{
int ncpt = cfs_cpt_number(pcl->pcl_cptab);
int i;
LASSERT(index >= CFS_PERCPT_LOCK_EX && index < ncpt);
if (ncpt == 1) {
index = 0;
} else { /* serialize with exclusive lock */
while (pcl->pcl_locked)
cpu_relax();
}
if (likely(index != CFS_PERCPT_LOCK_EX)) {
spin_lock(pcl->pcl_locks[index]);
return;
}
/* exclusive lock request */
for (i = 0; i < ncpt; i++) {
spin_lock(pcl->pcl_locks[i]);
if (!i) {
LASSERT(!pcl->pcl_locked);
/* nobody should take a private lock after this,
* so we won't starve for too long
*/
pcl->pcl_locked = 1;
}
}
}
EXPORT_SYMBOL(cfs_percpt_lock);
/** unlock a CPU partition */
void
cfs_percpt_unlock(struct cfs_percpt_lock *pcl, int index)
__releases(pcl->pcl_locks)
{
int ncpt = cfs_cpt_number(pcl->pcl_cptab);
int i;
index = ncpt == 1 ? 0 : index;
if (likely(index != CFS_PERCPT_LOCK_EX)) {
spin_unlock(pcl->pcl_locks[index]);
return;
}
for (i = ncpt - 1; i >= 0; i--) {
if (!i) {
LASSERT(pcl->pcl_locked);
pcl->pcl_locked = 0;
}
spin_unlock(pcl->pcl_locks[i]);
}
}
EXPORT_SYMBOL(cfs_percpt_unlock);
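
The partition lock above gives each CPU partition a private lock and implements the exclusive mode by taking every private lock in ascending order and releasing them in descending order. A minimal userspace sketch of that ordering follows, using C11 `atomic_flag` spinlocks. `DEMO_NPARTS`, `DEMO_LOCK_EX` and the `demo_pcl_*` names are hypothetical, and the sketch leaves out the `pcl_locked` hint that the kernel code uses to keep private lockers from starving an exclusive locker.

```c
#include <stdatomic.h>

#define DEMO_NPARTS	4	/* hypothetical number of partitions */
#define DEMO_LOCK_EX	(-1)	/* hypothetical "lock every partition" index */

struct demo_percpt_lock {
	atomic_flag locks[DEMO_NPARTS];
};

static void demo_pcl_init(struct demo_percpt_lock *pcl)
{
	int i;

	for (i = 0; i < DEMO_NPARTS; i++)
		atomic_flag_clear(&pcl->locks[i]);
}

static void demo_spin_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;	/* spin until the holder clears the flag */
}

static void demo_spin_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Lock one partition, or all of them in ascending order for DEMO_LOCK_EX. */
static void demo_pcl_lock(struct demo_percpt_lock *pcl, int index)
{
	int i;

	if (index != DEMO_LOCK_EX) {
		demo_spin_lock(&pcl->locks[index]);
		return;
	}
	for (i = 0; i < DEMO_NPARTS; i++)
		demo_spin_lock(&pcl->locks[i]);
}

/* Unlock one partition, or all of them in descending order. */
static void demo_pcl_unlock(struct demo_percpt_lock *pcl, int index)
{
	int i;

	if (index != DEMO_LOCK_EX) {
		demo_spin_unlock(&pcl->locks[index]);
		return;
	}
	for (i = DEMO_NPARTS - 1; i >= 0; i--)
		demo_spin_unlock(&pcl->locks[i]);
}
```

Taking the private locks in one fixed order means two exclusive lockers cannot deadlock against each other, while threads that only touch their own partition never contend with threads on other partitions.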


@@ -1,171 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* GPL HEADER END
*/
/*
* Copyright (c) 2010, Oracle and/or its affiliates. All rights reserved.
* Copyright (c) 2012, Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Sun Microsystems, Inc.
*
* Author: liang@whamcloud.com
*/
#define DEBUG_SUBSYSTEM S_LNET
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/libcfs/libcfs_cpu.h>
#include <linux/slab.h>
#include <linux/mm.h>
struct cfs_var_array {
unsigned int va_count; /* # of buffers */
unsigned int va_size; /* size of each var */
struct cfs_cpt_table *va_cptab; /* cpu partition table */
void *va_ptrs[0]; /* buffer addresses */
};
/*
* free per-cpu data, see more detail in cfs_percpt_alloc
*/
void
cfs_percpt_free(void *vars)
{
struct cfs_var_array *arr;
int i;
arr = container_of(vars, struct cfs_var_array, va_ptrs[0]);
for (i = 0; i < arr->va_count; i++)
kfree(arr->va_ptrs[i]);
kvfree(arr);
}
EXPORT_SYMBOL(cfs_percpt_free);
/*
* allocate per cpu-partition variables; the returned value is an array of
* pointers, and each variable can be indexed by CPU partition ID, e.g.:
*
* arr = cfs_percpt_alloc(cfs_cpu_pt, size);
* then the caller can access the memory block for CPU partition 0 by arr[0],
* the memory block for CPU partition 1 by arr[1]...
* the memory block for CPU partition N by arr[N]...
*
* Each block is cacheline aligned.
*/
void *
cfs_percpt_alloc(struct cfs_cpt_table *cptab, unsigned int size)
{
struct cfs_var_array *arr;
int count;
int i;
count = cfs_cpt_number(cptab);
arr = kvzalloc(offsetof(struct cfs_var_array, va_ptrs[count]),
GFP_KERNEL);
if (!arr)
return NULL;
size = L1_CACHE_ALIGN(size);
arr->va_size = size;
arr->va_count = count;
arr->va_cptab = cptab;
for (i = 0; i < count; i++) {
arr->va_ptrs[i] = kzalloc_node(size, GFP_KERNEL,
cfs_cpt_spread_node(cptab, i));
if (!arr->va_ptrs[i]) {
cfs_percpt_free((void *)&arr->va_ptrs[0]);
return NULL;
}
}
return (void *)&arr->va_ptrs[0];
}
EXPORT_SYMBOL(cfs_percpt_alloc);
/*
* return number of CPUs (or number of elements in per-cpu data)
* according to cptab of @vars
*/
int
cfs_percpt_number(void *vars)
{
struct cfs_var_array *arr;
arr = container_of(vars, struct cfs_var_array, va_ptrs[0]);
return arr->va_count;
}
EXPORT_SYMBOL(cfs_percpt_number);
/*
* free variable array, see more detail in cfs_array_alloc
*/
void
cfs_array_free(void *vars)
{
struct cfs_var_array *arr;
int i;
arr = container_of(vars, struct cfs_var_array, va_ptrs[0]);
for (i = 0; i < arr->va_count; i++) {
if (!arr->va_ptrs[i])
continue;
kvfree(arr->va_ptrs[i]);
}
kvfree(arr);
}
EXPORT_SYMBOL(cfs_array_free);
/*
* allocate a variable array, returned value is an array of pointers.
* Caller can specify length of array by @count, @size is size of each
* memory block in array.
*/
void *
cfs_array_alloc(int count, unsigned int size)
{
struct cfs_var_array *arr;
int i;
/* zeroed so cfs_array_free() sees NULL for slots not yet allocated */
arr = kvzalloc(offsetof(struct cfs_var_array, va_ptrs[count]), GFP_KERNEL);
if (!arr)
return NULL;
arr->va_count = count;
arr->va_size = size;
for (i = 0; i < count; i++) {
arr->va_ptrs[i] = kvzalloc(size, GFP_KERNEL);
if (!arr->va_ptrs[i]) {
cfs_array_free((void *)&arr->va_ptrs[0]);
return NULL;
}
}
return (void *)&arr->va_ptrs[0];
}
EXPORT_SYMBOL(cfs_array_alloc);
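
The allocators above hand the caller a pointer to the `va_ptrs[]` member and later recover the enclosing header with `container_of()`, so the caller sees a plain array of pointers while the element count travels with it. A minimal userspace sketch of that layout follows. `struct demo_var_array` and the `demo_array_*` functions are hypothetical, and `calloc()`/`free()` stand in for the kernel allocators.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical header that sits directly in front of the pointer array. */
struct demo_var_array {
	unsigned int count;	/* number of buffers */
	void *ptrs[];		/* buffer addresses, handed to the caller */
};

/* Recover the header from the pointer array given to the caller. */
static struct demo_var_array *demo_array_hdr(void **vars)
{
	return (struct demo_var_array *)((char *)vars -
					 offsetof(struct demo_var_array, ptrs));
}

static void demo_array_free(void **vars)
{
	struct demo_var_array *arr = demo_array_hdr(vars);
	unsigned int i;

	for (i = 0; i < arr->count; i++)
		free(arr->ptrs[i]);	/* free(NULL) is a no-op */
	free(arr);
}

/* Allocate count zeroed buffers of size bytes; returns the pointer array. */
static void **demo_array_alloc(unsigned int count, size_t size)
{
	struct demo_var_array *arr;
	unsigned int i;

	/* zeroed header, so unfilled slots are NULL on the error path */
	arr = calloc(1, sizeof(*arr) + count * sizeof(arr->ptrs[0]));
	if (!arr)
		return NULL;
	arr->count = count;
	for (i = 0; i < count; i++) {
		arr->ptrs[i] = calloc(1, size);
		if (!arr->ptrs[i]) {
			demo_array_free(arr->ptrs);
			return NULL;
		}
	}
	return arr->ptrs;
}

static unsigned int demo_array_number(void **vars)
{
	return demo_array_hdr(vars)->count;
}
```

Zeroing the header before filling the slots matters for the error path: the free routine walks all `count` slots, so any slot that has not been allocated yet has to read as NULL.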


@@ -1,562 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2008, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2012, 2015 Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Sun Microsystems, Inc.
*
* String manipulation functions.
*
* libcfs/libcfs/libcfs_string.c
*
* Author: Nathan Rutman <nathan.rutman@sun.com>
*/
#include <linux/ctype.h>
#include <linux/string.h>
#include <linux/errno.h>
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/libcfs/libcfs.h>
#include <linux/libcfs/libcfs_string.h>
/* Convert a text string to a bitmask */
int cfs_str2mask(const char *str, const char *(*bit2str)(int bit),
int *oldmask, int minmask, int allmask)
{
const char *debugstr;
char op = '\0';
int newmask = minmask, i, len, found = 0;
/* <str> must be a list of tokens separated by whitespace
* and optionally an operator ('+' or '-'). If an operator
* appears first in <str>, '*oldmask' is used as the starting point
* (relative), otherwise minmask is used (absolute). An operator
* applies to all following tokens up to the next operator.
*/
while (*str != '\0') {
while (isspace(*str))
str++;
if (*str == '\0')
break;
if (*str == '+' || *str == '-') {
op = *str++;
if (!found)
/* only if first token is relative */
newmask = *oldmask;
while (isspace(*str))
str++;
if (*str == '\0') /* trailing op */
return -EINVAL;
}
/* find token length */
len = 0;
while (str[len] != '\0' && !isspace(str[len]) &&
str[len] != '+' && str[len] != '-')
len++;
/* match token */
found = 0;
for (i = 0; i < 32; i++) {
debugstr = bit2str(i);
if (debugstr && strlen(debugstr) == len &&
!strncasecmp(str, debugstr, len)) {
if (op == '-')
newmask &= ~(1 << i);
else
newmask |= (1 << i);
found = 1;
break;
}
}
if (!found && len == 3 &&
!strncasecmp(str, "ALL", len)) {
if (op == '-')
newmask = minmask;
else
newmask = allmask;
found = 1;
}
if (!found) {
CWARN("unknown mask '%.*s'.\n"
"mask usage: [+|-]<all|type> ...\n", len, str);
return -EINVAL;
}
str += len;
}
*oldmask = newmask;
return 0;
}
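/*
 * Illustrative userspace sketch, not part of the deleted file: a port of the
 * mask grammar parsed by cfs_str2mask() above. The names demo_str2mask,
 * demo_bit2str and the three bit names are hypothetical and exist only for
 * this example. A leading '+' or '-' makes the update relative to the
 * current mask; otherwise parsing starts from minmask.
 */
```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical bit names for the demo; bit i is named demo_names[i]. */
static const char *const demo_names[] = { "trace", "inode", "super" };

static const char *demo_bit2str(int bit)
{
	if (bit < 0 || bit >= (int)(sizeof(demo_names) / sizeof(demo_names[0])))
		return NULL;
	return demo_names[bit];
}

/* Returns 0 and updates *mask on success, -1 on a parse error. */
int demo_str2mask(const char *str, unsigned int *mask,
		  unsigned int minmask, unsigned int allmask)
{
	unsigned int newmask = minmask;
	char op = '\0';
	int found = 0;

	assert(str && mask);
	while (*str != '\0') {
		size_t len = 0;
		int i;

		while (isspace((unsigned char)*str))
			str++;
		if (*str == '\0')
			break;
		if (*str == '+' || *str == '-') {
			op = *str++;
			if (!found)	/* relative only if an operator leads */
				newmask = *mask;
			while (isspace((unsigned char)*str))
				str++;
			if (*str == '\0')
				return -1;	/* trailing operator */
		}
		/* token ends at whitespace, an operator, or end of string */
		while (str[len] != '\0' && !isspace((unsigned char)str[len]) &&
		       str[len] != '+' && str[len] != '-')
			len++;

		found = 0;
		for (i = 0; i < 32; i++) {
			const char *name = demo_bit2str(i);

			if (name && strlen(name) == len &&
			    !strncasecmp(str, name, len)) {
				if (op == '-')
					newmask &= ~(1u << i);
				else
					newmask |= 1u << i;
				found = 1;
				break;
			}
		}
		if (!found && len == 3 && !strncasecmp(str, "all", 3)) {
			newmask = (op == '-') ? minmask : allmask;
			found = 1;
		}
		if (!found)
			return -1;	/* unknown token */
		str += len;
	}
	*mask = newmask;
	return 0;
}
```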
/* get the first string out of @str */
char *cfs_firststr(char *str, size_t size)
{
size_t i = 0;
char *end;
/* trim leading spaces */
while (i < size && *str && isspace(*str)) {
++i;
++str;
}
/* string with all spaces */
if (*str == '\0')
goto out;
end = str;
while (i < size && *end != '\0' && !isspace(*end)) {
++i;
++end;
}
*end = '\0';
out:
return str;
}
EXPORT_SYMBOL(cfs_firststr);
/**
* Extracts tokens from strings.
*
* Looks for \a delim in string \a next, sets \a res to point to
* substring before the delimiter, sets \a next right after the found
* delimiter.
*
* \retval 1 if \a res points to a string of non-whitespace characters
* \retval 0 otherwise
*/
int
cfs_gettok(struct cfs_lstr *next, char delim, struct cfs_lstr *res)
{
char *end;
if (!next->ls_str)
return 0;
/* skip leading white spaces */
while (next->ls_len) {
if (!isspace(*next->ls_str))
break;
next->ls_str++;
next->ls_len--;
}
if (!next->ls_len) /* whitespaces only */
return 0;
if (*next->ls_str == delim) {
/* first non-whitespace is the delimiter */
return 0;
}
res->ls_str = next->ls_str;
end = memchr(next->ls_str, delim, next->ls_len);
if (!end) {
/* there is no delimiter in the string */
end = next->ls_str + next->ls_len;
next->ls_str = NULL;
} else {
next->ls_str = end + 1;
next->ls_len -= (end - res->ls_str + 1);
}
/* skip ending whitespaces */
while (--end != res->ls_str) {
if (!isspace(*end))
break;
}
res->ls_len = end - res->ls_str + 1;
return 1;
}
EXPORT_SYMBOL(cfs_gettok);
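/*
 * Illustrative userspace sketch, not part of the deleted file: the same
 * split-and-trim logic as cfs_gettok() on a length-counted slice. The names
 * struct demo_lstr and demo_gettok are hypothetical stand-ins for
 * struct cfs_lstr and cfs_gettok.
 */
```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Length-counted string slice, playing the role of struct cfs_lstr. */
struct demo_lstr {
	const char *str;
	size_t len;
};

/* Returns 1 and fills *res with the next trimmed token, 0 when none. */
int demo_gettok(struct demo_lstr *next, char delim, struct demo_lstr *res)
{
	const char *end;

	assert(next && res);
	if (!next->str)
		return 0;
	/* skip leading whitespace */
	while (next->len && isspace((unsigned char)*next->str)) {
		next->str++;
		next->len--;
	}
	if (!next->len || *next->str == delim)
		return 0;

	res->str = next->str;
	end = memchr(next->str, delim, next->len);
	if (!end) {
		end = next->str + next->len;
		next->str = NULL;	/* input exhausted */
	} else {
		next->len -= (size_t)(end - res->str) + 1;
		next->str = end + 1;
	}
	/* trim trailing whitespace; the first char is known to be non-space */
	while (end - 1 != res->str && isspace((unsigned char)end[-1]))
		end--;
	res->len = (size_t)(end - res->str);
	return 1;
}
```
/*
 * Calling demo_gettok() repeatedly with ',' on "1-3, 7 ,9" yields the slices
 * "1-3", "7" and "9"; the call after that returns 0 because next->str is NULL.
 */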
/**
* Converts string to integer.
*
* Accepts decimal numbers only; the conversion uses base 10.
*
* \retval 1 if first \a nob chars of \a str convert to a decimal
* integer in the range [\a min, \a max]
* \retval 0 otherwise
*/
int
cfs_str2num_check(char *str, int nob, unsigned int *num,
unsigned int min, unsigned int max)
{
bool all_numbers = true;
char *endp, cache;
int rc;
/**
* kstrtouint can only handle strings composed
* of only numbers. We need to scan the string
* passed in for the first non-digit character
* and end the string at that location. If we
* don't find any non-digit character we still
* need to place a '\0' at position nob since
* we are not interested in the rest of the
* string which is longer than nob in size.
* After we are done the character at the
* position we placed '\0' must be restored.
*/
for (endp = str; endp < str + nob; endp++) {
if (!isdigit(*endp)) {
all_numbers = false;
break;
}
}
cache = *endp;
*endp = '\0';
rc = kstrtouint(str, 10, num);
*endp = cache;
if (rc || !all_numbers)
return 0;
return (*num >= min && *num <= max);
}
EXPORT_SYMBOL(cfs_str2num_check);
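/*
 * Illustrative userspace sketch, not part of the deleted file: the same
 * check as cfs_str2num_check() (all of the first nob characters must be
 * decimal digits and the value must lie in [min, max]), written so that the
 * input string is never modified. demo_str2num_check is a hypothetical
 * name; the explicit overflow test is an assumption standing in for the
 * range checking that kstrtouint() does in the kernel.
 */
```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

/* Returns 1 if str[0..nob) is a decimal number in [min, max], else 0. */
int demo_str2num_check(const char *str, size_t nob, unsigned int *num,
		       unsigned int min, unsigned int max)
{
	unsigned int val = 0;
	size_t i;

	assert(str && num);
	if (nob == 0)
		return 0;	/* an empty token is not a number */
	for (i = 0; i < nob; i++) {
		unsigned int digit;

		if (!isdigit((unsigned char)str[i]))
			return 0;
		digit = (unsigned int)(str[i] - '0');
		if (val > (UINT_MAX - digit) / 10)
			return 0;	/* would overflow unsigned int */
		val = val * 10 + digit;
	}
	*num = val;
	return val >= min && val <= max;
}
```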
/**
* Parses \<range_expr\> token of the syntax. If \a bracketed is false,
* \a src should only have a single token which can be \<number\> or \*
*
* On success \a expr points to an allocated range_expr with initialized
* range_expr::re_lo, range_expr::re_hi and range_expr::re_stride if \a
* src parses to
* \<number\> |
* \<number\> '-' \<number\> |
* \<number\> '-' \<number\> '/' \<number\>
* \retval 0 will be returned if it can be parsed, otherwise -EINVAL or
* -ENOMEM will be returned.
*/
static int
cfs_range_expr_parse(struct cfs_lstr *src, unsigned int min, unsigned int max,
int bracketed, struct cfs_range_expr **expr)
{
struct cfs_range_expr *re;
struct cfs_lstr tok;
re = kzalloc(sizeof(*re), GFP_NOFS);
if (!re)
return -ENOMEM;
if (src->ls_len == 1 && src->ls_str[0] == '*') {
re->re_lo = min;
re->re_hi = max;
re->re_stride = 1;
goto out;
}
if (cfs_str2num_check(src->ls_str, src->ls_len,
&re->re_lo, min, max)) {
/* <number> is parsed */
re->re_hi = re->re_lo;
re->re_stride = 1;
goto out;
}
if (!bracketed || !cfs_gettok(src, '-', &tok))
goto failed;
if (!cfs_str2num_check(tok.ls_str, tok.ls_len,
&re->re_lo, min, max))
goto failed;
/* <number> - */
if (cfs_str2num_check(src->ls_str, src->ls_len,
&re->re_hi, min, max)) {
/* <number> - <number> is parsed */
re->re_stride = 1;
goto out;
}
/* go to check <number> '-' <number> '/' <number> */
if (cfs_gettok(src, '/', &tok)) {
if (!cfs_str2num_check(tok.ls_str, tok.ls_len,
&re->re_hi, min, max))
goto failed;
/* <number> - <number> / ... */
if (cfs_str2num_check(src->ls_str, src->ls_len,
&re->re_stride, min, max)) {
/* <number> - <number> / <number> is parsed */
goto out;
}
}
/* anything else is not a valid range expression */
goto failed;
out:
*expr = re;
return 0;
failed:
kfree(re);
return -EINVAL;
}
/**
* Print the range expression \a re into specified \a buffer.
* If \a bracketed is true, expression does not need additional
* brackets.
*
* \retval number of characters written
*/
static int
cfs_range_expr_print(char *buffer, int count, struct cfs_range_expr *expr,
bool bracketed)
{
int i;
char s[] = "[";
char e[] = "]";
if (bracketed) {
s[0] = '\0';
e[0] = '\0';
}
if (expr->re_lo == expr->re_hi)
i = scnprintf(buffer, count, "%u", expr->re_lo);
else if (expr->re_stride == 1)
i = scnprintf(buffer, count, "%s%u-%u%s",
s, expr->re_lo, expr->re_hi, e);
else
i = scnprintf(buffer, count, "%s%u-%u/%u%s",
s, expr->re_lo, expr->re_hi, expr->re_stride, e);
return i;
}
/**
* Print a list of range expressions (\a expr_list) into specified \a buffer.
* If the list contains several expressions, separate them with comma
* and surround the list with brackets.
*
* \retval number of characters written
*/
int
cfs_expr_list_print(char *buffer, int count, struct cfs_expr_list *expr_list)
{
struct cfs_range_expr *expr;
int i = 0, j = 0;
int numexprs = 0;
if (count <= 0)
return 0;
list_for_each_entry(expr, &expr_list->el_exprs, re_link)
numexprs++;
if (numexprs > 1)
i += scnprintf(buffer + i, count - i, "[");
list_for_each_entry(expr, &expr_list->el_exprs, re_link) {
if (j++)
i += scnprintf(buffer + i, count - i, ",");
i += cfs_range_expr_print(buffer + i, count - i, expr,
numexprs > 1);
}
if (numexprs > 1)
i += scnprintf(buffer + i, count - i, "]");
return i;
}
EXPORT_SYMBOL(cfs_expr_list_print);
/**
* Matches value (\a value) against ranges expression list \a expr_list.
*
* \retval 1 if \a value matches
* \retval 0 otherwise
*/
int
cfs_expr_list_match(u32 value, struct cfs_expr_list *expr_list)
{
struct cfs_range_expr *expr;
list_for_each_entry(expr, &expr_list->el_exprs, re_link) {
if (value >= expr->re_lo && value <= expr->re_hi &&
!((value - expr->re_lo) % expr->re_stride))
return 1;
}
return 0;
}
EXPORT_SYMBOL(cfs_expr_list_match);
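/*
 * Illustrative userspace sketch, not part of the deleted file: the
 * per-range test used by cfs_expr_list_match(). A value matches the range
 * lo-hi/stride when it lies in [lo, hi] and its offset from lo is a
 * multiple of stride. struct demo_range and demo_range_match are
 * hypothetical names; the assert on a non-zero stride is an added
 * assumption, since a zero stride would divide by zero.
 */
```c
#include <assert.h>

/* One range expression: lo-hi/stride, as in "[2-10/4]". */
struct demo_range {
	unsigned int lo;
	unsigned int hi;
	unsigned int stride;
};

/* Returns 1 if value is one of lo, lo + stride, ... not exceeding hi. */
int demo_range_match(unsigned int value, const struct demo_range *r)
{
	assert(r && r->stride != 0);
	return value >= r->lo && value <= r->hi &&
	       (value - r->lo) % r->stride == 0;
}
```
/* For "[2-10/4]" the matching values are 2, 6 and 10. */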
/**
* Convert expression list (\a expr_list) to an array of all matched values
*
* \retval N N is total number of all matched values
* \retval 0 if expression list is empty
* \retval < 0 for failure
*/
int
cfs_expr_list_values(struct cfs_expr_list *expr_list, int max, u32 **valpp)
{
struct cfs_range_expr *expr;
u32 *val;
int count = 0;
int i;
list_for_each_entry(expr, &expr_list->el_exprs, re_link) {
for (i = expr->re_lo; i <= expr->re_hi; i++) {
if (!((i - expr->re_lo) % expr->re_stride))
count++;
}
}
if (!count) /* empty expression list */
return 0;
if (count > max) {
CERROR("Number of values %d exceeds max allowed %d\n",
count, max);
return -EINVAL;
}
val = kvmalloc_array(count, sizeof(val[0]), GFP_KERNEL | __GFP_ZERO);
if (!val)
return -ENOMEM;
count = 0;
list_for_each_entry(expr, &expr_list->el_exprs, re_link) {
for (i = expr->re_lo; i <= expr->re_hi; i++) {
if (!((i - expr->re_lo) % expr->re_stride))
val[count++] = i;
}
}
*valpp = val;
return count;
}
EXPORT_SYMBOL(cfs_expr_list_values);
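/*
 * Illustrative userspace sketch, not part of the deleted file: the
 * count-then-fill pattern of cfs_expr_list_values() over a plain array of
 * ranges. All demo_* names are hypothetical. Unlike the loop above, the
 * count per range is computed in closed form as (hi - lo) / stride + 1,
 * which is a substitution made for this example.
 */
```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct demo_range {
	unsigned int lo;
	unsigned int hi;
	unsigned int stride;
};

/* Number of values lo, lo + stride, ... that do not exceed hi. */
static size_t demo_range_count(const struct demo_range *r)
{
	if (r->stride == 0 || r->lo > r->hi)
		return 0;
	return (size_t)((r->hi - r->lo) / r->stride) + 1;
}

/*
 * Returns the number of values stored in a newly allocated *valpp,
 * 0 for an empty list, or -1 on error. The caller frees *valpp.
 */
long demo_ranges_values(const struct demo_range *ranges, size_t nranges,
			size_t max, unsigned int **valpp)
{
	size_t count = 0, n = 0, i, k;
	unsigned int *val;

	assert(ranges && valpp);
	/* first pass: count, so the array can be sized exactly */
	for (i = 0; i < nranges; i++)
		count += demo_range_count(&ranges[i]);
	if (count == 0)
		return 0;
	if (count > max)
		return -1;

	val = calloc(count, sizeof(val[0]));
	if (!val)
		return -1;
	/* second pass: fill in the values in list order */
	for (i = 0; i < nranges; i++) {
		size_t cnt = demo_range_count(&ranges[i]);

		for (k = 0; k < cnt; k++)
			val[n++] = ranges[i].lo +
				   (unsigned int)k * ranges[i].stride;
	}
	*valpp = val;
	return (long)n;
}
```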
/**
* Frees cfs_range_expr structures of \a expr_list.
*
* \retval none
*/
void
cfs_expr_list_free(struct cfs_expr_list *expr_list)
{
while (!list_empty(&expr_list->el_exprs)) {
struct cfs_range_expr *expr;
expr = list_entry(expr_list->el_exprs.next,
struct cfs_range_expr, re_link);
list_del(&expr->re_link);
kfree(expr);
}
kfree(expr_list);
}
EXPORT_SYMBOL(cfs_expr_list_free);
/**
* Parses \<cfs_expr_list\> token of the syntax.
*
* \retval 0 if \a str parses to \<number\> | \<expr_list\>
* \retval -errno otherwise
*/
int
cfs_expr_list_parse(char *str, int len, unsigned int min, unsigned int max,
struct cfs_expr_list **elpp)
{
struct cfs_expr_list *expr_list;
struct cfs_range_expr *expr;
struct cfs_lstr src;
int rc;
expr_list = kzalloc(sizeof(*expr_list), GFP_NOFS);
if (!expr_list)
return -ENOMEM;
src.ls_str = str;
src.ls_len = len;
INIT_LIST_HEAD(&expr_list->el_exprs);
if (src.ls_str[0] == '[' &&
src.ls_str[src.ls_len - 1] == ']') {
src.ls_str++;
src.ls_len -= 2;
rc = -EINVAL;
while (src.ls_str) {
struct cfs_lstr tok;
if (!cfs_gettok(&src, ',', &tok)) {
rc = -EINVAL;
break;
}
rc = cfs_range_expr_parse(&tok, min, max, 1, &expr);
if (rc)
break;
list_add_tail(&expr->re_link, &expr_list->el_exprs);
}
} else {
rc = cfs_range_expr_parse(&src, min, max, 0, &expr);
if (!rc)
list_add_tail(&expr->re_link, &expr_list->el_exprs);
}
if (rc)
cfs_expr_list_free(expr_list);
else
*elpp = expr_list;
return rc;
}
EXPORT_SYMBOL(cfs_expr_list_parse);
/**
* Frees cfs_expr_list structures of \a list.
*
* For each struct cfs_expr_list structure found on \a list it frees
* range_expr list attached to it and frees the cfs_expr_list itself.
*
* \retval none
*/
void
cfs_expr_list_free_list(struct list_head *list)
{
struct cfs_expr_list *el;
while (!list_empty(list)) {
el = list_entry(list->next, struct cfs_expr_list, el_link);
list_del(&el->el_link);
cfs_expr_list_free(el);
}
}
EXPORT_SYMBOL(cfs_expr_list_free_list);


@ -1,139 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see http://www.gnu.org/licenses
*
* Please visit http://www.xyratex.com/contact if you need additional
* information or have any questions.
*
* GPL HEADER END
*/
/*
* Copyright 2012 Xyratex Technology Limited
*/
/*
* These are crypto API shash wrappers around zlib_adler32().
*/
#include <linux/module.h>
#include <linux/zutil.h>
#include <crypto/internal/hash.h>
#include "linux-crypto.h"
#define CHKSUM_BLOCK_SIZE 1
#define CHKSUM_DIGEST_SIZE 4
static int adler32_cra_init(struct crypto_tfm *tfm)
{
u32 *key = crypto_tfm_ctx(tfm);
*key = 1;
return 0;
}
static int adler32_setkey(struct crypto_shash *hash, const u8 *key,
unsigned int keylen)
{
u32 *mctx = crypto_shash_ctx(hash);
if (keylen != sizeof(u32)) {
crypto_shash_set_flags(hash, CRYPTO_TFM_RES_BAD_KEY_LEN);
return -EINVAL;
}
*mctx = *(u32 *)key;
return 0;
}
static int adler32_init(struct shash_desc *desc)
{
u32 *mctx = crypto_shash_ctx(desc->tfm);
u32 *cksump = shash_desc_ctx(desc);
*cksump = *mctx;
return 0;
}
static int adler32_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
u32 *cksump = shash_desc_ctx(desc);
*cksump = zlib_adler32(*cksump, data, len);
return 0;
}
static int __adler32_finup(u32 *cksump, const u8 *data, unsigned int len,
u8 *out)
{
*(u32 *)out = zlib_adler32(*cksump, data, len);
return 0;
}
static int adler32_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
return __adler32_finup(shash_desc_ctx(desc), data, len, out);
}
static int adler32_final(struct shash_desc *desc, u8 *out)
{
u32 *cksump = shash_desc_ctx(desc);
*(u32 *)out = *cksump;
return 0;
}
static int adler32_digest(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
return __adler32_finup(crypto_shash_ctx(desc->tfm), data, len,
out);
}
static struct shash_alg alg = {
.setkey = adler32_setkey,
.init = adler32_init,
.update = adler32_update,
.final = adler32_final,
.finup = adler32_finup,
.digest = adler32_digest,
.descsize = sizeof(u32),
.digestsize = CHKSUM_DIGEST_SIZE,
.base = {
.cra_name = "adler32",
.cra_driver_name = "adler32-zlib",
.cra_priority = 100,
.cra_flags = CRYPTO_ALG_OPTIONAL_KEY,
.cra_blocksize = CHKSUM_BLOCK_SIZE,
.cra_ctxsize = sizeof(u32),
.cra_module = THIS_MODULE,
.cra_init = adler32_cra_init,
}
};
int cfs_crypto_adler32_register(void)
{
return crypto_register_shash(&alg);
}
void cfs_crypto_adler32_unregister(void)
{
crypto_unregister_shash(&alg);
}
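/*
 * Illustrative userspace sketch, not part of the deleted file: a
 * byte-at-a-time reference Adler-32, showing what the wrappers above
 * delegate to zlib_adler32(). demo_adler32 is a hypothetical name and this
 * is not the zlib implementation, which batches the modulo for speed. The
 * seed value 1 corresponds to the default key set in adler32_cra_init(),
 * and feeding the previous result back in as the seed corresponds to
 * chaining adler32_update() calls.
 */
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ADLER_MOD 65521u	/* largest prime below 2^16 */

/* Continue an Adler-32 checksum; pass 1 as adler for a fresh checksum. */
uint32_t demo_adler32(uint32_t adler, const unsigned char *buf, size_t len)
{
	uint32_t a = adler & 0xffffu;		/* low half: running byte sum */
	uint32_t b = (adler >> 16) & 0xffffu;	/* high half: sum of the a's */
	size_t i;

	assert(buf || len == 0);
	for (i = 0; i < len; i++) {
		a = (a + buf[i]) % DEMO_ADLER_MOD;
		b = (b + a) % DEMO_ADLER_MOD;
	}
	return (b << 16) | a;
}
```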


@ -1,447 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see http://www.gnu.org/licenses
*
* Please visit http://www.xyratex.com/contact if you need additional
* information or have any questions.
*
* GPL HEADER END
*/
/*
* Copyright 2012 Xyratex Technology Limited
*
* Copyright (c) 2012, Intel Corporation.
*/
#include <crypto/hash.h>
#include <linux/scatterlist.h>
#include <linux/highmem.h>
#include <linux/module.h>
#include <linux/libcfs/libcfs_crypto.h>
#include <linux/libcfs/libcfs.h>
#include "linux-crypto.h"
/**
* Array of hash algorithm speed in MByte per second
*/
static int cfs_crypto_hash_speeds[CFS_HASH_ALG_MAX];
/**
* Initialize the state descriptor for the specified hash algorithm.
*
* An internal routine to allocate the hash-specific state in \a req for
* use with cfs_crypto_hash_digest() to compute the hash of a single message,
* though possibly in multiple chunks. The descriptor internal state should
* be freed with cfs_crypto_hash_final().
*
* \param[in] hash_alg hash algorithm id (CFS_HASH_ALG_*)
* \param[out] type pointer to the hash description in hash_types[]
* array
* \param[in,out] req hash state descriptor to be initialized
* \param[in] key initial hash value/state, NULL to use default
* value
* \param[in] key_len length of \a key
*
* \retval 0 on success
* \retval negative errno on failure
*/
static int cfs_crypto_hash_alloc(enum cfs_crypto_hash_alg hash_alg,
const struct cfs_crypto_hash_type **type,
struct ahash_request **req,
unsigned char *key,
unsigned int key_len)
{
struct crypto_ahash *tfm;
int err = 0;
*type = cfs_crypto_hash_type(hash_alg);
if (!*type) {
CWARN("Unsupported hash algorithm id = %d, max id is %d\n",
hash_alg, CFS_HASH_ALG_MAX);
return -EINVAL;
}
tfm = crypto_alloc_ahash((*type)->cht_name, 0, CRYPTO_ALG_ASYNC);
if (IS_ERR(tfm)) {
CDEBUG(D_INFO, "Failed to alloc crypto hash %s\n",
(*type)->cht_name);
return PTR_ERR(tfm);
}
*req = ahash_request_alloc(tfm, GFP_KERNEL);
if (!*req) {
CDEBUG(D_INFO, "Failed to alloc ahash_request for %s\n",
(*type)->cht_name);
crypto_free_ahash(tfm);
return -ENOMEM;
}
ahash_request_set_callback(*req, 0, NULL, NULL);
if (key)
err = crypto_ahash_setkey(tfm, key, key_len);
else if ((*type)->cht_key)
err = crypto_ahash_setkey(tfm,
(unsigned char *)&((*type)->cht_key),
(*type)->cht_size);
if (err) {
ahash_request_free(*req);
crypto_free_ahash(tfm);
return err;
}
CDEBUG(D_INFO, "Using crypto hash: %s (%s) speed %d MB/s\n",
crypto_ahash_alg_name(tfm), crypto_ahash_driver_name(tfm),
cfs_crypto_hash_speeds[hash_alg]);
err = crypto_ahash_init(*req);
if (err) {
ahash_request_free(*req);
crypto_free_ahash(tfm);
}
return err;
}
/**
* Calculate hash digest for the passed buffer.
*
* This should be used when computing the hash on a single contiguous buffer.
* It combines the hash initialization, computation, and cleanup.
*
* \param[in] hash_alg id of hash algorithm (CFS_HASH_ALG_*)
* \param[in] buf data buffer on which to compute hash
* \param[in] buf_len length of \a buf in bytes
* \param[in] key initial value/state for algorithm,
* if \a key = NULL use default initial value
* \param[in] key_len length of \a key in bytes
* \param[out] hash pointer to computed hash value,
* if \a hash = NULL then \a hash_len is set to the
* digest size in bytes and -ENOSPC is returned
* \param[in,out] hash_len size of \a hash buffer
*
* \retval -EINVAL \a buf, \a buf_len, \a hash_len,
* \a hash_alg invalid
* \retval -ENOENT \a hash_alg is unsupported
* \retval -ENOSPC \a hash is NULL, or \a hash_len less than
* digest size
* \retval 0 for success
* \retval negative errno for other errors from lower
* layers.
*/
int cfs_crypto_hash_digest(enum cfs_crypto_hash_alg hash_alg,
const void *buf, unsigned int buf_len,
unsigned char *key, unsigned int key_len,
unsigned char *hash, unsigned int *hash_len)
{
struct scatterlist sl;
struct ahash_request *req;
int err;
const struct cfs_crypto_hash_type *type;
if (!buf || !buf_len || !hash_len)
return -EINVAL;
err = cfs_crypto_hash_alloc(hash_alg, &type, &req, key, key_len);
if (err)
return err;
if (!hash || *hash_len < type->cht_size) {
*hash_len = type->cht_size;
crypto_free_ahash(crypto_ahash_reqtfm(req));
ahash_request_free(req);
return -ENOSPC;
}
sg_init_one(&sl, buf, buf_len);
ahash_request_set_crypt(req, &sl, hash, sl.length);
err = crypto_ahash_digest(req);
crypto_free_ahash(crypto_ahash_reqtfm(req));
ahash_request_free(req);
return err;
}
EXPORT_SYMBOL(cfs_crypto_hash_digest);
/**
* Allocate and initialize descriptor for hash algorithm.
*
* This should be used to initialize a hash descriptor for multiple calls
* to a single hash function when computing the hash across multiple
* separate buffers or pages using cfs_crypto_hash_update{,_page}().
*
* The hash descriptor should be freed with cfs_crypto_hash_final().
*
* \param[in] hash_alg algorithm id (CFS_HASH_ALG_*)
* \param[in] key initial value/state for algorithm, if \a key = NULL
* use default initial value
* \param[in] key_len length of \a key in bytes
*
* \retval pointer to descriptor of hash instance
* \retval ERR_PTR(errno) in case of error
*/
struct ahash_request *
cfs_crypto_hash_init(enum cfs_crypto_hash_alg hash_alg,
unsigned char *key, unsigned int key_len)
{
struct ahash_request *req;
int err;
const struct cfs_crypto_hash_type *type;
err = cfs_crypto_hash_alloc(hash_alg, &type, &req, key, key_len);
if (err)
return ERR_PTR(err);
return req;
}
EXPORT_SYMBOL(cfs_crypto_hash_init);
/**
* Update hash digest computed on data within the given \a page
*
* \param[in] req hash state descriptor
* \param[in] page data page on which to compute the hash
* \param[in] offset offset within \a page at which to start hash
* \param[in] len length of data on which to compute hash
*
* \retval 0 for success
* \retval negative errno on failure
*/
int cfs_crypto_hash_update_page(struct ahash_request *req,
struct page *page, unsigned int offset,
unsigned int len)
{
struct scatterlist sl;
sg_init_table(&sl, 1);
sg_set_page(&sl, page, len, offset & ~PAGE_MASK);
ahash_request_set_crypt(req, &sl, NULL, sl.length);
return crypto_ahash_update(req);
}
EXPORT_SYMBOL(cfs_crypto_hash_update_page);
/**
* Update hash digest computed on the specified data
*
* \param[in] req hash state descriptor
* \param[in] buf data buffer on which to compute the hash
* \param[in] buf_len length of \a buf on which to compute hash
*
* \retval 0 for success
* \retval negative errno on failure
*/
int cfs_crypto_hash_update(struct ahash_request *req,
const void *buf, unsigned int buf_len)
{
struct scatterlist sl;
sg_init_one(&sl, buf, buf_len);
ahash_request_set_crypt(req, &sl, NULL, sl.length);
return crypto_ahash_update(req);
}
EXPORT_SYMBOL(cfs_crypto_hash_update);
/**
* Finish hash calculation, copy hash digest to buffer, clean up hash descriptor
*
* \param[in] req hash descriptor
* \param[out] hash pointer to hash buffer to store hash digest
* \param[in,out] hash_len pointer to hash buffer size, if \a hash = NULL
* only free \a req instead of computing the hash
*
* \retval 0 for success
* \retval -EOVERFLOW if hash_len is too small for the hash digest
* \retval negative errno for other errors from lower layers
*/
int cfs_crypto_hash_final(struct ahash_request *req,
unsigned char *hash, unsigned int *hash_len)
{
int err;
int size = crypto_ahash_digestsize(crypto_ahash_reqtfm(req));
if (!hash || !hash_len) {
err = 0;
goto free_ahash;
}
if (*hash_len < size) {
err = -EOVERFLOW;
goto free_ahash;
}
ahash_request_set_crypt(req, NULL, hash, 0);
err = crypto_ahash_final(req);
if (!err)
*hash_len = size;
free_ahash:
crypto_free_ahash(crypto_ahash_reqtfm(req));
ahash_request_free(req);
return err;
}
EXPORT_SYMBOL(cfs_crypto_hash_final);
/**
* Compute the speed of specified hash function
*
* Run a speed test on the given hash algorithm using a 1MB buffer (or one
* page, if pages are larger), hashed one page at a time for about a second.
* The speed is stored internally in the cfs_crypto_hash_speeds[] array, and
* is available through the cfs_crypto_hash_speed() function.
*
* \param[in] hash_alg hash algorithm id (CFS_HASH_ALG_*)
*/
static void cfs_crypto_performance_test(enum cfs_crypto_hash_alg hash_alg)
{
int buf_len = max(PAGE_SIZE, 1048576UL);
void *buf;
unsigned long start, end;
int bcount, err = 0;
struct page *page;
unsigned char hash[CFS_CRYPTO_HASH_DIGESTSIZE_MAX];
unsigned int hash_len = sizeof(hash);
page = alloc_page(GFP_KERNEL);
if (!page) {
err = -ENOMEM;
goto out_err;
}
buf = kmap(page);
memset(buf, 0xAD, PAGE_SIZE);
kunmap(page);
for (start = jiffies, end = start + msecs_to_jiffies(MSEC_PER_SEC),
bcount = 0; time_before(jiffies, end); bcount++) {
struct ahash_request *hdesc;
int i;
hdesc = cfs_crypto_hash_init(hash_alg, NULL, 0);
if (IS_ERR(hdesc)) {
err = PTR_ERR(hdesc);
break;
}
for (i = 0; i < buf_len / PAGE_SIZE; i++) {
err = cfs_crypto_hash_update_page(hdesc, page, 0,
PAGE_SIZE);
if (err)
break;
}
err = cfs_crypto_hash_final(hdesc, hash, &hash_len);
if (err)
break;
}
end = jiffies;
__free_page(page);
out_err:
if (err) {
cfs_crypto_hash_speeds[hash_alg] = err;
CDEBUG(D_INFO, "Crypto hash algorithm %s test error: rc = %d\n",
cfs_crypto_hash_name(hash_alg), err);
} else {
unsigned long tmp;
tmp = ((bcount * buf_len / jiffies_to_msecs(end - start)) *
1000) / (1024 * 1024);
cfs_crypto_hash_speeds[hash_alg] = (int)tmp;
CDEBUG(D_CONFIG, "Crypto hash algorithm %s speed = %d MB/s\n",
cfs_crypto_hash_name(hash_alg),
cfs_crypto_hash_speeds[hash_alg]);
}
}
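/*
 * Illustrative userspace sketch, not part of the deleted file: the MB/s
 * arithmetic from cfs_crypto_performance_test(), namely
 * (iterations * bytes per iteration / elapsed ms) * 1000 / 2^20. In the
 * function above bcount and buf_len are both int, so their product is
 * evaluated as an int and may overflow once bcount reaches 2048 with a 1MB
 * buffer. The sketch uses 64-bit operands instead; demo_hash_speed_mbps
 * and the zero-elapsed-time guard are assumptions made for this example.
 */
```c
#include <assert.h>
#include <stdint.h>

/* Returns throughput in MB/s, or 0 if no time elapsed. */
uint64_t demo_hash_speed_mbps(uint64_t bcount, uint64_t buf_len,
			      uint64_t elapsed_ms)
{
	/* the byte total must fit in 64 bits */
	assert(buf_len == 0 || bcount <= UINT64_MAX / buf_len);
	if (elapsed_ms == 0)
		return 0;
	return (bcount * buf_len / elapsed_ms) * 1000 / (1024 * 1024);
}
```
/*
 * With 3000 iterations over a 1MB buffer in 1000 ms the result is 3000 MB/s.
 */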
/**
* hash speed in Mbytes per second for valid hash algorithm
*
* Return the performance of the specified \a hash_alg that was previously
* computed using cfs_crypto_performance_test().
*
* \param[in] hash_alg hash algorithm id (CFS_HASH_ALG_*)
*
* \retval positive speed of the hash function in MB/s
* \retval -ENOENT if \a hash_alg is unsupported
* \retval negative errno if \a hash_alg speed is unavailable
*/
int cfs_crypto_hash_speed(enum cfs_crypto_hash_alg hash_alg)
{
if (hash_alg < CFS_HASH_ALG_MAX)
return cfs_crypto_hash_speeds[hash_alg];
return -ENOENT;
}
EXPORT_SYMBOL(cfs_crypto_hash_speed);
/**
* Run the performance test for all hash algorithms.
*
* Run the cfs_crypto_performance_test() benchmark for all of the available
* hash functions using a 1MB buffer size. This is a reasonable buffer size
* for Lustre RPCs, even if the actual RPC size is larger or smaller.
*
* Since the setup cost and computation speed of various hash algorithms is
* a function of the buffer size (and possibly internal contention of offload
* engines), this speed only represents an estimate of the actual speed under
* actual usage, but is reasonable for comparing available algorithms.
*
* The actual speeds are available via cfs_crypto_hash_speed() for later
* comparison.
*
* \retval 0 on success
* \retval -ENOMEM if no memory is available for test buffer
*/
static int cfs_crypto_test_hashes(void)
{
enum cfs_crypto_hash_alg hash_alg;
for (hash_alg = 0; hash_alg < CFS_HASH_ALG_MAX; hash_alg++)
cfs_crypto_performance_test(hash_alg);
return 0;
}
static int adler32;
/**
* Register available hash functions
*
* \retval 0
*/
int cfs_crypto_register(void)
{
request_module("crc32c");
if (cfs_crypto_adler32_register() == 0)
adler32 = 1;
/* check all algorithms and do performance test */
cfs_crypto_test_hashes();
return 0;
}
/**
* Unregister previously registered hash functions
*/
void cfs_crypto_unregister(void)
{
if (adler32)
cfs_crypto_adler32_unregister();
adler32 = 0;
}


@ -1,30 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see http://www.gnu.org/licenses
*
* Please visit http://www.xyratex.com/contact if you need additional
* information or have any questions.
*
* GPL HEADER END
*/
/**
* Functions to register and unregister the shash adler32 algorithm.
*/
int cfs_crypto_adler32_register(void);
void cfs_crypto_adler32_unregister(void);


@ -1,142 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2008, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2012, Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Sun Microsystems, Inc.
*
* libcfs/libcfs/linux/linux-debug.c
*
* Author: Phil Schwan <phil@clusterfs.com>
*/
#include <linux/module.h>
#include <linux/kmod.h>
#include <linux/notifier.h>
#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/string.h>
#include <linux/stat.h>
#include <linux/errno.h>
#include <linux/unistd.h>
#include <linux/interrupt.h>
#include <linux/completion.h>
#include <linux/fs.h>
#include <linux/uaccess.h>
# define DEBUG_SUBSYSTEM S_LNET
#include "tracefile.h"
#include <linux/kallsyms.h>
char lnet_debug_log_upcall[1024] = "/usr/lib/lustre/lnet_debug_log_upcall";
/**
* Upcall function once a Lustre log has been dumped.
*
* \param file path of the dumped log
*/
void libcfs_run_debug_log_upcall(char *file)
{
char *argv[3];
int rc;
static const char * const envp[] = {
"HOME=/",
"PATH=/sbin:/bin:/usr/sbin:/usr/bin",
NULL
};
argv[0] = lnet_debug_log_upcall;
LASSERTF(file, "called on a null filename\n");
argv[1] = file; /* only need to pass the path of the file */
argv[2] = NULL;
rc = call_usermodehelper(argv[0], argv, (char **)envp, 1);
if (rc < 0 && rc != -ENOENT) {
CERROR("Error %d invoking LNET debug log upcall %s %s; check /sys/kernel/debug/lnet/debug_log_upcall\n",
rc, argv[0], argv[1]);
} else {
CDEBUG(D_HA, "Invoked LNET debug log upcall %s %s\n",
argv[0], argv[1]);
}
}
/* coverity[+kill] */
void __noreturn lbug_with_loc(struct libcfs_debug_msg_data *msgdata)
{
libcfs_catastrophe = 1;
libcfs_debug_msg(msgdata, "LBUG\n");
if (in_interrupt()) {
panic("LBUG in interrupt.\n");
/* not reached */
}
dump_stack();
if (!libcfs_panic_on_lbug)
libcfs_debug_dumplog();
if (libcfs_panic_on_lbug)
panic("LBUG");
set_current_state(TASK_UNINTERRUPTIBLE);
while (1)
schedule();
}
EXPORT_SYMBOL(lbug_with_loc);
static int panic_notifier(struct notifier_block *self, unsigned long unused1,
void *unused2)
{
if (libcfs_panic_in_progress)
return 0;
libcfs_panic_in_progress = 1;
mb();
return 0;
}
static struct notifier_block libcfs_panic_notifier = {
.notifier_call = panic_notifier,
.next = NULL,
.priority = 10000,
};
void libcfs_register_panic_notifier(void)
{
atomic_notifier_chain_register(&panic_notifier_list,
&libcfs_panic_notifier);
}
void libcfs_unregister_panic_notifier(void)
{
atomic_notifier_chain_unregister(&panic_notifier_list,
&libcfs_panic_notifier);
}


@ -1,258 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2008, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2012, Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Sun Microsystems, Inc.
*/
#define DEBUG_SUBSYSTEM S_LNET
#define LUSTRE_TRACEFILE_PRIVATE
#include <linux/slab.h>
#include <linux/mm.h>
#include "tracefile.h"
/* percentage of the total debug memory given to each type */
static unsigned int pages_factor[CFS_TCD_TYPE_MAX] = {
80, /* 80% pages for CFS_TCD_TYPE_PROC */
10, /* 10% pages for CFS_TCD_TYPE_SOFTIRQ */
10 /* 10% pages for CFS_TCD_TYPE_IRQ */
};
char *cfs_trace_console_buffers[NR_CPUS][CFS_TCD_TYPE_MAX];
static DECLARE_RWSEM(cfs_tracefile_sem);
int cfs_tracefile_init_arch(void)
{
int i;
int j;
struct cfs_trace_cpu_data *tcd;
/* initialize trace_data */
memset(cfs_trace_data, 0, sizeof(cfs_trace_data));
for (i = 0; i < CFS_TCD_TYPE_MAX; i++) {
cfs_trace_data[i] =
kmalloc_array(num_possible_cpus(),
sizeof(union cfs_trace_data_union),
GFP_KERNEL);
if (!cfs_trace_data[i])
goto out;
}
/* arch related info initialized */
cfs_tcd_for_each(tcd, i, j) {
spin_lock_init(&tcd->tcd_lock);
tcd->tcd_pages_factor = pages_factor[i];
tcd->tcd_type = i;
tcd->tcd_cpu = j;
}
for (i = 0; i < num_possible_cpus(); i++)
for (j = 0; j < CFS_TCD_TYPE_MAX; j++) {
cfs_trace_console_buffers[i][j] =
kmalloc(CFS_TRACE_CONSOLE_BUFFER_SIZE,
GFP_KERNEL);
if (!cfs_trace_console_buffers[i][j])
goto out;
}
return 0;
out:
cfs_tracefile_fini_arch();
pr_err("lnet: Not enough memory\n");
return -ENOMEM;
}
void cfs_tracefile_fini_arch(void)
{
int i;
int j;
for (i = 0; i < num_possible_cpus(); i++)
for (j = 0; j < CFS_TCD_TYPE_MAX; j++) {
kfree(cfs_trace_console_buffers[i][j]);
cfs_trace_console_buffers[i][j] = NULL;
}
for (i = 0; cfs_trace_data[i]; i++) {
kfree(cfs_trace_data[i]);
cfs_trace_data[i] = NULL;
}
}
void cfs_tracefile_read_lock(void)
{
down_read(&cfs_tracefile_sem);
}
void cfs_tracefile_read_unlock(void)
{
up_read(&cfs_tracefile_sem);
}
void cfs_tracefile_write_lock(void)
{
down_write(&cfs_tracefile_sem);
}
void cfs_tracefile_write_unlock(void)
{
up_write(&cfs_tracefile_sem);
}
enum cfs_trace_buf_type cfs_trace_buf_idx_get(void)
{
if (in_irq())
return CFS_TCD_TYPE_IRQ;
if (in_softirq())
return CFS_TCD_TYPE_SOFTIRQ;
return CFS_TCD_TYPE_PROC;
}
/*
* The walking argument indicates that the lock is being taken by the
* iterator over all tcd types, so we must take the lock and disable local
* irqs to avoid deadlocks with other interrupt locks that might be
* happening. See LU-1311 for details.
*/
int cfs_trace_lock_tcd(struct cfs_trace_cpu_data *tcd, int walking)
__acquires(&tcd->tcd_lock)
{
__LASSERT(tcd->tcd_type < CFS_TCD_TYPE_MAX);
if (tcd->tcd_type == CFS_TCD_TYPE_IRQ)
spin_lock_irqsave(&tcd->tcd_lock, tcd->tcd_lock_flags);
else if (tcd->tcd_type == CFS_TCD_TYPE_SOFTIRQ)
spin_lock_bh(&tcd->tcd_lock);
else if (unlikely(walking))
spin_lock_irq(&tcd->tcd_lock);
else
spin_lock(&tcd->tcd_lock);
return 1;
}
void cfs_trace_unlock_tcd(struct cfs_trace_cpu_data *tcd, int walking)
__releases(&tcd->tcd_lock)
{
__LASSERT(tcd->tcd_type < CFS_TCD_TYPE_MAX);
if (tcd->tcd_type == CFS_TCD_TYPE_IRQ)
spin_unlock_irqrestore(&tcd->tcd_lock, tcd->tcd_lock_flags);
else if (tcd->tcd_type == CFS_TCD_TYPE_SOFTIRQ)
spin_unlock_bh(&tcd->tcd_lock);
else if (unlikely(walking))
spin_unlock_irq(&tcd->tcd_lock);
else
spin_unlock(&tcd->tcd_lock);
}
void
cfs_set_ptldebug_header(struct ptldebug_header *header,
struct libcfs_debug_msg_data *msgdata,
unsigned long stack)
{
struct timespec64 ts;
ktime_get_real_ts64(&ts);
header->ph_subsys = msgdata->msg_subsys;
header->ph_mask = msgdata->msg_mask;
header->ph_cpu_id = smp_processor_id();
header->ph_type = cfs_trace_buf_idx_get();
/* y2038 safe since all user space treats this as unsigned, but
* will overflow in 2106
*/
header->ph_sec = (u32)ts.tv_sec;
header->ph_usec = ts.tv_nsec / NSEC_PER_USEC;
header->ph_stack = stack;
header->ph_pid = current->pid;
header->ph_line_num = msgdata->msg_line;
header->ph_extern_pid = 0;
}
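The comment in the function above notes that the seconds field is truncated to 32 bits and will wrap in 2106. A small userspace sketch of that split is given below. The struct sketch_stamp and the function sketch_make_stamp() are hypothetical stand-ins, and the field widths are assumptions rather than the real layout of struct ptldebug_header.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two time fields of the debug header. */
struct sketch_stamp {
	uint32_t sec;  /* seconds since the epoch, truncated to 32 bits */
	uint64_t usec; /* microseconds within the second (width assumed) */
};

/* Split a 64-bit seconds / nanoseconds pair the way the code above does. */
struct sketch_stamp sketch_make_stamp(int64_t tv_sec, long tv_nsec)
{
	struct sketch_stamp s;

	s.sec = (uint32_t)tv_sec;            /* wraps once tv_sec exceeds UINT32_MAX */
	s.usec = (uint64_t)(tv_nsec / 1000); /* 1000 ns per microsecond */
	return s;
}
```

A reader that treats the seconds field as unsigned sees correct values until the 64-bit count passes 2^32 - 1, after which the stored value starts again from zero.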
static char *
dbghdr_to_err_string(struct ptldebug_header *hdr)
{
switch (hdr->ph_subsys) {
case S_LND:
case S_LNET:
return "LNetError";
default:
return "LustreError";
}
}
static char *
dbghdr_to_info_string(struct ptldebug_header *hdr)
{
switch (hdr->ph_subsys) {
case S_LND:
case S_LNET:
return "LNet";
default:
return "Lustre";
}
}
void cfs_print_to_console(struct ptldebug_header *hdr, int mask,
const char *buf, int len, const char *file,
const char *fn)
{
char *prefix = "Lustre", *ptype = NULL;
if (mask & D_EMERG) {
prefix = dbghdr_to_err_string(hdr);
ptype = KERN_EMERG;
} else if (mask & D_ERROR) {
prefix = dbghdr_to_err_string(hdr);
ptype = KERN_ERR;
} else if (mask & D_WARNING) {
prefix = dbghdr_to_info_string(hdr);
ptype = KERN_WARNING;
} else if (mask & (D_CONSOLE | libcfs_printk)) {
prefix = dbghdr_to_info_string(hdr);
ptype = KERN_INFO;
}
if (mask & D_CONSOLE) {
pr_info("%s%s: %.*s", ptype, prefix, len, buf);
} else {
pr_info("%s%s: %d:%d:(%s:%d:%s()) %.*s", ptype, prefix,
hdr->ph_pid, hdr->ph_extern_pid, file,
hdr->ph_line_num, fn, len, buf);
}
}
int cfs_trace_max_debug_mb(void)
{
int total_mb = (totalram_pages >> (20 - PAGE_SHIFT));
return max(512, (total_mb * 80) / 100);
}
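The limit computed above converts a page count to MiB and then takes 80% of it, with a floor of 512. A hedged userspace sketch of the same arithmetic follows; sketch_max_debug_mb() is a hypothetical name, and the page count and page shift are passed in as parameters because totalram_pages and PAGE_SHIFT are only available inside the kernel.

```c
/* Hypothetical userspace version of the calculation above. */
int sketch_max_debug_mb(unsigned long total_pages, unsigned int page_shift)
{
	/* pages -> MiB: one MiB is 2^20 bytes, one page is 2^page_shift bytes */
	int total_mb = (int)(total_pages >> (20 - page_shift));
	int eighty_percent = (total_mb * 80) / 100;

	/* same result as max(512, eighty_percent) */
	return eighty_percent > 512 ? eighty_percent : 512;
}
```

Because of the floor, any machine with less than 640 MiB of RAM gets a limit of 512 in this sketch, which is more than 80% of its memory.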


@ -1,758 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2008, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2012, 2015 Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Sun Microsystems, Inc.
*/
#include <linux/miscdevice.h>
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/string.h>
#include <linux/stat.h>
#include <linux/errno.h>
#include <linux/unistd.h>
#include <net/sock.h>
#include <linux/uio.h>
#include <linux/uaccess.h>
#include <linux/fs.h>
#include <linux/file.h>
#include <linux/list.h>
#include <linux/sysctl.h>
#include <linux/debugfs.h>
# define DEBUG_SUBSYSTEM S_LNET
#include <asm/div64.h>
#include <linux/libcfs/libcfs_crypto.h>
#include <linux/lnet/lib-lnet.h>
#include <uapi/linux/lnet/lnet-dlc.h>
#include "tracefile.h"
struct lnet_debugfs_symlink_def {
char *name;
char *target;
};
static struct dentry *lnet_debugfs_root;
BLOCKING_NOTIFIER_HEAD(libcfs_ioctl_list);
EXPORT_SYMBOL(libcfs_ioctl_list);
static inline size_t libcfs_ioctl_packlen(struct libcfs_ioctl_data *data)
{
size_t len = sizeof(*data);
len += cfs_size_round(data->ioc_inllen1);
len += cfs_size_round(data->ioc_inllen2);
return len;
}
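The packed length above is the fixed structure size plus both inline buffers, each padded by cfs_size_round(). The sketch below assumes that cfs_size_round() rounds up to the next multiple of 8 bytes; that definition is not shown in this diff, so treat it as an assumption. The names sketch_size_round() and sketch_packlen() are hypothetical, and hdr_size stands in for sizeof(struct libcfs_ioctl_data).

```c
#include <stddef.h>

/* Assumption: cfs_size_round() rounds up to the next multiple of 8 bytes. */
size_t sketch_size_round(size_t val)
{
	return (val + 7) & ~(size_t)7;
}

/* Hypothetical: hdr_size stands in for sizeof(struct libcfs_ioctl_data). */
size_t sketch_packlen(size_t hdr_size, size_t inllen1, size_t inllen2)
{
	return hdr_size + sketch_size_round(inllen1) +
	       sketch_size_round(inllen2);
}
```

Under this assumption the second inline buffer starts at the rounded length of the first, which matches how libcfs_ioctl_data_adjust() locates ioc_inlbuf2.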
static inline bool libcfs_ioctl_is_invalid(struct libcfs_ioctl_data *data)
{
if (data->ioc_hdr.ioc_len > BIT(30)) {
CERROR("LIBCFS ioctl: ioc_len larger than 1<<30\n");
return true;
}
if (data->ioc_inllen1 > BIT(30)) {
CERROR("LIBCFS ioctl: ioc_inllen1 larger than 1<<30\n");
return true;
}
if (data->ioc_inllen2 > BIT(30)) {
CERROR("LIBCFS ioctl: ioc_inllen2 larger than 1<<30\n");
return true;
}
if (data->ioc_inlbuf1 && !data->ioc_inllen1) {
CERROR("LIBCFS ioctl: inlbuf1 pointer but 0 length\n");
return true;
}
if (data->ioc_inlbuf2 && !data->ioc_inllen2) {
CERROR("LIBCFS ioctl: inlbuf2 pointer but 0 length\n");
return true;
}
if (data->ioc_pbuf1 && !data->ioc_plen1) {
CERROR("LIBCFS ioctl: pbuf1 pointer but 0 length\n");
return true;
}
if (data->ioc_pbuf2 && !data->ioc_plen2) {
CERROR("LIBCFS ioctl: pbuf2 pointer but 0 length\n");
return true;
}
if (data->ioc_plen1 && !data->ioc_pbuf1) {
CERROR("LIBCFS ioctl: plen1 nonzero but no pbuf1 pointer\n");
return true;
}
if (data->ioc_plen2 && !data->ioc_pbuf2) {
CERROR("LIBCFS ioctl: plen2 nonzero but no pbuf2 pointer\n");
return true;
}
if ((u32)libcfs_ioctl_packlen(data) != data->ioc_hdr.ioc_len) {
CERROR("LIBCFS ioctl: packlen != ioc_len\n");
return true;
}
if (data->ioc_inllen1 &&
data->ioc_bulk[data->ioc_inllen1 - 1] != '\0') {
CERROR("LIBCFS ioctl: inlbuf1 not 0 terminated\n");
return true;
}
if (data->ioc_inllen2 &&
data->ioc_bulk[cfs_size_round(data->ioc_inllen1) +
data->ioc_inllen2 - 1] != '\0') {
CERROR("LIBCFS ioctl: inlbuf2 not 0 terminated\n");
return true;
}
return false;
}
static int libcfs_ioctl_data_adjust(struct libcfs_ioctl_data *data)
{
if (libcfs_ioctl_is_invalid(data)) {
CERROR("libcfs ioctl: parameter not correctly formatted\n");
return -EINVAL;
}
if (data->ioc_inllen1)
data->ioc_inlbuf1 = &data->ioc_bulk[0];
if (data->ioc_inllen2)
data->ioc_inlbuf2 = &data->ioc_bulk[0] +
cfs_size_round(data->ioc_inllen1);
return 0;
}
static int libcfs_ioctl_getdata(struct libcfs_ioctl_hdr **hdr_pp,
const struct libcfs_ioctl_hdr __user *uhdr)
{
struct libcfs_ioctl_hdr hdr;
int err;
if (copy_from_user(&hdr, uhdr, sizeof(hdr)))
return -EFAULT;
if (hdr.ioc_version != LIBCFS_IOCTL_VERSION &&
hdr.ioc_version != LIBCFS_IOCTL_VERSION2) {
CERROR("libcfs ioctl: version mismatch expected %#x, got %#x\n",
LIBCFS_IOCTL_VERSION, hdr.ioc_version);
return -EINVAL;
}
if (hdr.ioc_len < sizeof(hdr)) {
CERROR("libcfs ioctl: user buffer too small for ioctl\n");
return -EINVAL;
}
if (hdr.ioc_len > LIBCFS_IOC_DATA_MAX) {
CERROR("libcfs ioctl: user buffer is too large %d/%d\n",
hdr.ioc_len, LIBCFS_IOC_DATA_MAX);
return -EINVAL;
}
*hdr_pp = kvmalloc(hdr.ioc_len, GFP_KERNEL);
if (!*hdr_pp)
return -ENOMEM;
if (copy_from_user(*hdr_pp, uhdr, hdr.ioc_len)) {
err = -EFAULT;
goto free;
}
if ((*hdr_pp)->ioc_version != hdr.ioc_version ||
(*hdr_pp)->ioc_len != hdr.ioc_len) {
err = -EINVAL;
goto free;
}
return 0;
free:
kvfree(*hdr_pp);
return err;
}
static int libcfs_ioctl(unsigned long cmd, void __user *uparam)
{
struct libcfs_ioctl_data *data = NULL;
struct libcfs_ioctl_hdr *hdr;
int err;
/* 'cmd' and permissions get checked in our arch-specific caller */
err = libcfs_ioctl_getdata(&hdr, uparam);
if (err) {
CDEBUG_LIMIT(D_ERROR,
"libcfs ioctl: data header error %d\n", err);
return err;
}
if (hdr->ioc_version == LIBCFS_IOCTL_VERSION) {
/*
* The libcfs_ioctl_data_adjust() function performs adjustment
* operations on the libcfs_ioctl_data structure to make
* it usable by the code. This doesn't need to be called
* for new data structures added.
*/
data = container_of(hdr, struct libcfs_ioctl_data, ioc_hdr);
err = libcfs_ioctl_data_adjust(data);
if (err)
goto out;
}
CDEBUG(D_IOCTL, "libcfs ioctl cmd %lu\n", cmd);
switch (cmd) {
case IOC_LIBCFS_CLEAR_DEBUG:
libcfs_debug_clear_buffer();
break;
case IOC_LIBCFS_MARK_DEBUG:
if (!data || !data->ioc_inlbuf1 ||
data->ioc_inlbuf1[data->ioc_inllen1 - 1] != '\0') {
err = -EINVAL;
goto out;
}
libcfs_debug_mark_buffer(data->ioc_inlbuf1);
break;
default:
err = blocking_notifier_call_chain(&libcfs_ioctl_list,
cmd, hdr);
if (!(err & NOTIFY_STOP_MASK))
/* No-one claimed the ioctl */
err = -EINVAL;
else
err = notifier_to_errno(err);
if (!err)
if (copy_to_user(uparam, hdr, hdr->ioc_len))
err = -EFAULT;
break;
}
out:
kvfree(hdr);
return err;
}
static long
libcfs_psdev_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
if (!capable(CAP_SYS_ADMIN))
return -EACCES;
if (_IOC_TYPE(cmd) != IOC_LIBCFS_TYPE ||
_IOC_NR(cmd) < IOC_LIBCFS_MIN_NR ||
_IOC_NR(cmd) > IOC_LIBCFS_MAX_NR) {
CDEBUG(D_IOCTL, "invalid ioctl ( type %d, nr %d, size %d )\n",
_IOC_TYPE(cmd), _IOC_NR(cmd), _IOC_SIZE(cmd));
return -EINVAL;
}
return libcfs_ioctl(cmd, (void __user *)arg);
}
static const struct file_operations libcfs_fops = {
.owner = THIS_MODULE,
.unlocked_ioctl = libcfs_psdev_ioctl,
};
static struct miscdevice libcfs_dev = {
.minor = MISC_DYNAMIC_MINOR,
.name = "lnet",
.fops = &libcfs_fops,
};
static int libcfs_dev_registered;
int lprocfs_call_handler(void *data, int write, loff_t *ppos,
void __user *buffer, size_t *lenp,
int (*handler)(void *data, int write, loff_t pos,
void __user *buffer, int len))
{
int rc = handler(data, write, *ppos, buffer, *lenp);
if (rc < 0)
return rc;
if (write) {
*ppos += *lenp;
} else {
*lenp = rc;
*ppos += rc;
}
return 0;
}
EXPORT_SYMBOL(lprocfs_call_handler);
static int __proc_dobitmasks(void *data, int write,
loff_t pos, void __user *buffer, int nob)
{
const int tmpstrlen = 512;
char *tmpstr;
int rc;
unsigned int *mask = data;
int is_subsys = (mask == &libcfs_subsystem_debug) ? 1 : 0;
int is_printk = (mask == &libcfs_printk) ? 1 : 0;
rc = cfs_trace_allocate_string_buffer(&tmpstr, tmpstrlen);
if (rc < 0)
return rc;
if (!write) {
libcfs_debug_mask2str(tmpstr, tmpstrlen, *mask, is_subsys);
rc = strlen(tmpstr);
if (pos >= rc) {
rc = 0;
} else {
rc = cfs_trace_copyout_string(buffer, nob,
tmpstr + pos, "\n");
}
} else {
rc = cfs_trace_copyin_string(tmpstr, tmpstrlen, buffer, nob);
if (rc < 0) {
kfree(tmpstr);
return rc;
}
rc = libcfs_debug_str2mask(mask, tmpstr, is_subsys);
/* Always print LBUG/LASSERT to console, so keep this mask */
if (is_printk)
*mask |= D_EMERG;
}
kfree(tmpstr);
return rc;
}
static int proc_dobitmasks(struct ctl_table *table, int write,
void __user *buffer, size_t *lenp, loff_t *ppos)
{
return lprocfs_call_handler(table->data, write, ppos, buffer, lenp,
__proc_dobitmasks);
}
static int __proc_dump_kernel(void *data, int write,
loff_t pos, void __user *buffer, int nob)
{
if (!write)
return 0;
return cfs_trace_dump_debug_buffer_usrstr(buffer, nob);
}
static int proc_dump_kernel(struct ctl_table *table, int write,
void __user *buffer, size_t *lenp, loff_t *ppos)
{
return lprocfs_call_handler(table->data, write, ppos, buffer, lenp,
__proc_dump_kernel);
}
static int __proc_daemon_file(void *data, int write,
loff_t pos, void __user *buffer, int nob)
{
if (!write) {
int len = strlen(cfs_tracefile);
if (pos >= len)
return 0;
return cfs_trace_copyout_string(buffer, nob,
cfs_tracefile + pos, "\n");
}
return cfs_trace_daemon_command_usrstr(buffer, nob);
}
static int proc_daemon_file(struct ctl_table *table, int write,
void __user *buffer, size_t *lenp, loff_t *ppos)
{
return lprocfs_call_handler(table->data, write, ppos, buffer, lenp,
__proc_daemon_file);
}
static int libcfs_force_lbug(struct ctl_table *table, int write,
void __user *buffer,
size_t *lenp, loff_t *ppos)
{
if (write)
LBUG();
return 0;
}
static int proc_fail_loc(struct ctl_table *table, int write,
void __user *buffer,
size_t *lenp, loff_t *ppos)
{
int rc;
long old_fail_loc = cfs_fail_loc;
rc = proc_doulongvec_minmax(table, write, buffer, lenp, ppos);
if (old_fail_loc != cfs_fail_loc)
wake_up(&cfs_race_waitq);
return rc;
}
static int __proc_cpt_table(void *data, int write,
loff_t pos, void __user *buffer, int nob)
{
char *buf = NULL;
int len = 4096;
int rc = 0;
if (write)
return -EPERM;
while (1) {
buf = kzalloc(len, GFP_KERNEL);
if (!buf)
return -ENOMEM;
rc = cfs_cpt_table_print(cfs_cpt_tab, buf, len);
if (rc >= 0)
break;
if (rc == -EFBIG) {
kfree(buf);
len <<= 1;
continue;
}
goto out;
}
if (pos >= rc) {
rc = 0;
goto out;
}
rc = cfs_trace_copyout_string(buffer, nob, buf + pos, NULL);
out:
kfree(buf);
return rc;
}
static int proc_cpt_table(struct ctl_table *table, int write,
void __user *buffer, size_t *lenp, loff_t *ppos)
{
return lprocfs_call_handler(table->data, write, ppos, buffer, lenp,
__proc_cpt_table);
}
static struct ctl_table lnet_table[] = {
{
.procname = "debug",
.data = &libcfs_debug,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = &proc_dobitmasks,
},
{
.procname = "subsystem_debug",
.data = &libcfs_subsystem_debug,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = &proc_dobitmasks,
},
{
.procname = "printk",
.data = &libcfs_printk,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = &proc_dobitmasks,
},
{
.procname = "cpu_partition_table",
.maxlen = 128,
.mode = 0444,
.proc_handler = &proc_cpt_table,
},
{
.procname = "debug_log_upcall",
.data = lnet_debug_log_upcall,
.maxlen = sizeof(lnet_debug_log_upcall),
.mode = 0644,
.proc_handler = &proc_dostring,
},
{
.procname = "catastrophe",
.data = &libcfs_catastrophe,
.maxlen = sizeof(int),
.mode = 0444,
.proc_handler = &proc_dointvec,
},
{
.procname = "dump_kernel",
.maxlen = 256,
.mode = 0200,
.proc_handler = &proc_dump_kernel,
},
{
.procname = "daemon_file",
.mode = 0644,
.maxlen = 256,
.proc_handler = &proc_daemon_file,
},
{
.procname = "force_lbug",
.data = NULL,
.maxlen = 0,
.mode = 0200,
.proc_handler = &libcfs_force_lbug
},
{
.procname = "fail_loc",
.data = &cfs_fail_loc,
.maxlen = sizeof(cfs_fail_loc),
.mode = 0644,
.proc_handler = &proc_fail_loc
},
{
.procname = "fail_val",
.data = &cfs_fail_val,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = &proc_dointvec
},
{
.procname = "fail_err",
.data = &cfs_fail_err,
.maxlen = sizeof(cfs_fail_err),
.mode = 0644,
.proc_handler = &proc_dointvec,
},
{
}
};
static const struct lnet_debugfs_symlink_def lnet_debugfs_symlinks[] = {
{ "console_ratelimit",
"/sys/module/libcfs/parameters/libcfs_console_ratelimit"},
{ "debug_path",
"/sys/module/libcfs/parameters/libcfs_debug_file_path"},
{ "panic_on_lbug",
"/sys/module/libcfs/parameters/libcfs_panic_on_lbug"},
{ "libcfs_console_backoff",
"/sys/module/libcfs/parameters/libcfs_console_backoff"},
{ "debug_mb",
"/sys/module/libcfs/parameters/libcfs_debug_mb"},
{ "console_min_delay_centisecs",
"/sys/module/libcfs/parameters/libcfs_console_min_delay"},
{ "console_max_delay_centisecs",
"/sys/module/libcfs/parameters/libcfs_console_max_delay"},
{},
};
static ssize_t lnet_debugfs_read(struct file *filp, char __user *buf,
size_t count, loff_t *ppos)
{
struct ctl_table *table = filp->private_data;
int error;
error = table->proc_handler(table, 0, (void __user *)buf, &count, ppos);
if (!error)
error = count;
return error;
}
static ssize_t lnet_debugfs_write(struct file *filp, const char __user *buf,
size_t count, loff_t *ppos)
{
struct ctl_table *table = filp->private_data;
int error;
error = table->proc_handler(table, 1, (void __user *)buf, &count, ppos);
if (!error)
error = count;
return error;
}
static const struct file_operations lnet_debugfs_file_operations_rw = {
.open = simple_open,
.read = lnet_debugfs_read,
.write = lnet_debugfs_write,
.llseek = default_llseek,
};
static const struct file_operations lnet_debugfs_file_operations_ro = {
.open = simple_open,
.read = lnet_debugfs_read,
.llseek = default_llseek,
};
static const struct file_operations lnet_debugfs_file_operations_wo = {
.open = simple_open,
.write = lnet_debugfs_write,
.llseek = default_llseek,
};
static const struct file_operations *lnet_debugfs_fops_select(umode_t mode)
{
if (!(mode & 0222))
return &lnet_debugfs_file_operations_ro;
if (!(mode & 0444))
return &lnet_debugfs_file_operations_wo;
return &lnet_debugfs_file_operations_rw;
}
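The selection above depends only on whether any write bit (0222) or any read bit (0444) is set in the table entry's mode. A minimal userspace sketch of the same decision follows; the enum and the function sketch_fops_select() are hypothetical names standing in for the three file_operations tables.

```c
/* Hypothetical labels for the ro / wo / rw file_operations tables. */
enum sketch_fops { SKETCH_RO, SKETCH_WO, SKETCH_RW };

enum sketch_fops sketch_fops_select(unsigned int mode)
{
	if (!(mode & 0222)) /* no write bit for user, group or other */
		return SKETCH_RO;
	if (!(mode & 0444)) /* no read bit for user, group or other */
		return SKETCH_WO;
	return SKETCH_RW;
}
```

With the modes used in lnet_table, 0444 entries such as "catastrophe" would map to the read-only table, 0200 entries such as "dump_kernel" to the write-only table, and 0644 entries to the read-write table.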
void lustre_insert_debugfs(struct ctl_table *table)
{
if (!lnet_debugfs_root)
lnet_debugfs_root = debugfs_create_dir("lnet", NULL);
/* Even if we cannot create it, just ignore the failure altogether */
if (IS_ERR_OR_NULL(lnet_debugfs_root))
return;
/*
* We don't save the dentry returned because we don't call
* debugfs_remove() but rather remove_recursive()
*/
for (; table->procname; table++)
debugfs_create_file(table->procname, table->mode,
lnet_debugfs_root, table,
lnet_debugfs_fops_select(table->mode));
}
EXPORT_SYMBOL_GPL(lustre_insert_debugfs);
static void lustre_insert_debugfs_links(
const struct lnet_debugfs_symlink_def *symlinks)
{
for (; symlinks && symlinks->name; symlinks++)
debugfs_create_symlink(symlinks->name, lnet_debugfs_root,
symlinks->target);
}
static void lustre_remove_debugfs(void)
{
debugfs_remove_recursive(lnet_debugfs_root);
lnet_debugfs_root = NULL;
}
static DEFINE_MUTEX(libcfs_startup);
static int libcfs_active;
int libcfs_setup(void)
{
int rc = -EINVAL;
mutex_lock(&libcfs_startup);
if (libcfs_active)
goto out;
if (!libcfs_dev_registered)
goto err;
rc = libcfs_debug_init(5 * 1024 * 1024);
if (rc < 0) {
pr_err("LustreError: libcfs_debug_init: %d\n", rc);
goto err;
}
rc = cfs_cpu_init();
if (rc)
goto err;
cfs_rehash_wq = alloc_workqueue("cfs_rh", WQ_SYSFS, 4);
if (!cfs_rehash_wq) {
CERROR("Failed to start rehash workqueue.\n");
rc = -ENOMEM;
goto err;
}
rc = cfs_crypto_register();
if (rc) {
CERROR("cfs_crypto_register: error %d\n", rc);
goto err;
}
lustre_insert_debugfs(lnet_table);
if (!IS_ERR_OR_NULL(lnet_debugfs_root))
lustre_insert_debugfs_links(lnet_debugfs_symlinks);
CDEBUG(D_OTHER, "portals setup OK\n");
out:
libcfs_active = 1;
mutex_unlock(&libcfs_startup);
return 0;
err:
cfs_crypto_unregister();
if (cfs_rehash_wq)
destroy_workqueue(cfs_rehash_wq);
cfs_cpu_fini();
libcfs_debug_cleanup();
mutex_unlock(&libcfs_startup);
return rc;
}
EXPORT_SYMBOL(libcfs_setup);
static int libcfs_init(void)
{
int rc;
rc = misc_register(&libcfs_dev);
if (rc)
CERROR("misc_register: error %d\n", rc);
else
libcfs_dev_registered = 1;
return rc;
}
static void libcfs_exit(void)
{
int rc;
lustre_remove_debugfs();
if (cfs_rehash_wq)
destroy_workqueue(cfs_rehash_wq);
cfs_crypto_unregister();
if (libcfs_dev_registered)
misc_deregister(&libcfs_dev);
cfs_cpu_fini();
rc = libcfs_debug_cleanup();
if (rc)
pr_err("LustreError: libcfs_debug_cleanup: %d\n", rc);
}
MODULE_AUTHOR("OpenSFS, Inc. <http://www.lustre.org/>");
MODULE_DESCRIPTION("Lustre helper library");
MODULE_VERSION(LIBCFS_VERSION);
MODULE_LICENSE("GPL");
module_init(libcfs_init);
module_exit(libcfs_exit);

File diff suppressed because it is too large


@ -1,274 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2008, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2012, Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Sun Microsystems, Inc.
*/
#ifndef __LIBCFS_TRACEFILE_H__
#define __LIBCFS_TRACEFILE_H__
#include <linux/spinlock.h>
#include <linux/list.h>
#include <linux/cache.h>
#include <linux/threads.h>
#include <linux/limits.h>
#include <linux/smp.h>
#include <linux/libcfs/libcfs.h>
enum cfs_trace_buf_type {
CFS_TCD_TYPE_PROC = 0,
CFS_TCD_TYPE_SOFTIRQ,
CFS_TCD_TYPE_IRQ,
CFS_TCD_TYPE_MAX
};
/* trace file lock routines */
#define TRACEFILE_NAME_SIZE 1024
extern char cfs_tracefile[TRACEFILE_NAME_SIZE];
extern long long cfs_tracefile_size;
/**
* The path of debug log dump upcall script.
*/
extern char lnet_debug_log_upcall[1024];
void libcfs_run_debug_log_upcall(char *file);
int cfs_tracefile_init_arch(void);
void cfs_tracefile_fini_arch(void);
void cfs_tracefile_read_lock(void);
void cfs_tracefile_read_unlock(void);
void cfs_tracefile_write_lock(void);
void cfs_tracefile_write_unlock(void);
int cfs_tracefile_dump_all_pages(char *filename);
void cfs_trace_debug_print(void);
void cfs_trace_flush_pages(void);
int cfs_trace_start_thread(void);
void cfs_trace_stop_thread(void);
int cfs_tracefile_init(int max_pages);
void cfs_tracefile_exit(void);
int cfs_trace_copyin_string(char *knl_buffer, int knl_buffer_nob,
const char __user *usr_buffer, int usr_buffer_nob);
int cfs_trace_copyout_string(char __user *usr_buffer, int usr_buffer_nob,
const char *knl_str, char *append);
int cfs_trace_allocate_string_buffer(char **str, int nob);
int cfs_trace_dump_debug_buffer_usrstr(void __user *usr_str, int usr_str_nob);
int cfs_trace_daemon_command(char *str);
int cfs_trace_daemon_command_usrstr(void __user *usr_str, int usr_str_nob);
int cfs_trace_set_debug_mb(int mb);
int cfs_trace_get_debug_mb(void);
void libcfs_debug_dumplog_internal(void *arg);
void libcfs_register_panic_notifier(void);
void libcfs_unregister_panic_notifier(void);
extern int libcfs_panic_in_progress;
int cfs_trace_max_debug_mb(void);
#define TCD_MAX_PAGES (5 << (20 - PAGE_SHIFT))
#define TCD_STOCK_PAGES (TCD_MAX_PAGES)
#define CFS_TRACEFILE_SIZE (500 << 20)
#ifdef LUSTRE_TRACEFILE_PRIVATE
/*
* Private declarations for tracefile
*/
#define TCD_MAX_PAGES (5 << (20 - PAGE_SHIFT))
#define TCD_STOCK_PAGES (TCD_MAX_PAGES)
#define CFS_TRACEFILE_SIZE (500 << 20)
/*
* Size of the buffer used to format console messages when a page cannot
* be obtained from the system
*/
#define CFS_TRACE_CONSOLE_BUFFER_SIZE 1024
union cfs_trace_data_union {
struct cfs_trace_cpu_data {
/*
* Even though this structure is meant to be per-CPU, locking
* is needed because in some places the data may be accessed
* from other CPUs. This lock is directly used in trace_get_tcd
* and trace_put_tcd, which are called in libcfs_debug_vmsg2 and
* tcd_for_each_type_lock
*/
spinlock_t tcd_lock;
unsigned long tcd_lock_flags;
/*
* pages with trace records not yet processed by tracefiled.
*/
struct list_head tcd_pages;
/* number of pages on ->tcd_pages */
unsigned long tcd_cur_pages;
/*
* pages with trace records already processed by
* tracefiled. These pages are kept in memory, so that some
* portion of log can be written in the event of LBUG. This
* list is maintained in LRU order.
*
* Pages are moved to ->tcd_daemon_pages by tracefiled()
* (put_pages_on_daemon_list()). LRU pages from this list are
* discarded when list grows too large.
*/
struct list_head tcd_daemon_pages;
/* number of pages on ->tcd_daemon_pages */
unsigned long tcd_cur_daemon_pages;
/*
* Maximal number of pages allowed on ->tcd_pages and
* ->tcd_daemon_pages each.
* Always TCD_MAX_PAGES * tcd_pages_factor / 100 in current
* implementation.
*/
unsigned long tcd_max_pages;
/*
* preallocated pages to write trace records into. Pages from
* ->tcd_stock_pages are moved to ->tcd_pages by
* portals_debug_msg().
*
* This list is necessary, because on some platforms it's
* impossible to perform efficient atomic page allocation in a
* non-blockable context.
*
* Such platforms fill ->tcd_stock_pages "on occasion", when
* tracing code is entered in blockable context.
*
* trace_get_tage_try() tries to get a page from
* ->tcd_stock_pages first and resorts to atomic page
* allocation only if this queue is empty. ->tcd_stock_pages
* is replenished when tracing code is entered in blocking
* context (darwin-tracefile.c:trace_get_tcd()). We try to
* maintain TCD_STOCK_PAGES (defined above as TCD_MAX_PAGES) pages in
* this queue. Atomic allocation is only required if more than
* TCD_STOCK_PAGES pages' worth of trace records are all emitted
* in non-blocking contexts, which is quite unlikely.
*/
struct list_head tcd_stock_pages;
/* number of pages on ->tcd_stock_pages */
unsigned long tcd_cur_stock_pages;
unsigned short tcd_shutting_down;
unsigned short tcd_cpu;
unsigned short tcd_type;
/* The factors to share debug memory. */
unsigned short tcd_pages_factor;
} tcd;
char __pad[L1_CACHE_ALIGN(sizeof(struct cfs_trace_cpu_data))];
};
#define TCD_MAX_TYPES 8
extern union cfs_trace_data_union (*cfs_trace_data[TCD_MAX_TYPES])[NR_CPUS];
#define cfs_tcd_for_each(tcd, i, j) \
for (i = 0; cfs_trace_data[i]; i++) \
for (j = 0, ((tcd) = &(*cfs_trace_data[i])[j].tcd); \
j < num_possible_cpus(); \
j++, (tcd) = &(*cfs_trace_data[i])[j].tcd)
#define cfs_tcd_for_each_type_lock(tcd, i, cpu) \
for (i = 0; cfs_trace_data[i] && \
(tcd = &(*cfs_trace_data[i])[cpu].tcd) && \
cfs_trace_lock_tcd(tcd, 1); cfs_trace_unlock_tcd(tcd, 1), i++)
void cfs_set_ptldebug_header(struct ptldebug_header *header,
struct libcfs_debug_msg_data *m,
unsigned long stack);
void cfs_print_to_console(struct ptldebug_header *hdr, int mask,
const char *buf, int len, const char *file,
const char *fn);
int cfs_trace_lock_tcd(struct cfs_trace_cpu_data *tcd, int walking);
void cfs_trace_unlock_tcd(struct cfs_trace_cpu_data *tcd, int walking);
extern char *cfs_trace_console_buffers[NR_CPUS][CFS_TCD_TYPE_MAX];
enum cfs_trace_buf_type cfs_trace_buf_idx_get(void);
static inline char *
cfs_trace_get_console_buffer(void)
{
unsigned int i = get_cpu();
unsigned int j = cfs_trace_buf_idx_get();
return cfs_trace_console_buffers[i][j];
}
static inline struct cfs_trace_cpu_data *
cfs_trace_get_tcd(void)
{
struct cfs_trace_cpu_data *tcd =
&(*cfs_trace_data[cfs_trace_buf_idx_get()])[get_cpu()].tcd;
cfs_trace_lock_tcd(tcd, 0);
return tcd;
}
static inline void cfs_trace_put_tcd(struct cfs_trace_cpu_data *tcd)
{
cfs_trace_unlock_tcd(tcd, 0);
put_cpu();
}
int cfs_trace_refill_stock(struct cfs_trace_cpu_data *tcd, gfp_t gfp,
struct list_head *stock);
void cfs_trace_assertion_failed(const char *str,
struct libcfs_debug_msg_data *m);
/* ASSERTION that is safe to use within the debug system */
#define __LASSERT(cond) \
do { \
if (unlikely(!(cond))) { \
LIBCFS_DEBUG_MSG_DATA_DECL(msgdata, D_EMERG, NULL); \
cfs_trace_assertion_failed("ASSERTION("#cond") failed", \
&msgdata); \
} \
} while (0)
#define __LASSERT_TAGE_INVARIANT(tage) \
do { \
__LASSERT(tage); \
__LASSERT(tage->page); \
__LASSERT(tage->used <= PAGE_SIZE); \
__LASSERT(page_count(tage->page) > 0); \
} while (0)
#endif /* LUSTRE_TRACEFILE_PRIVATE */
#endif /* __LIBCFS_TRACEFILE_H__ */


@ -1,10 +0,0 @@
# SPDX-License-Identifier: GPL-2.0
subdir-ccflags-y += -I$(srctree)/drivers/staging/lustre/include
subdir-ccflags-y += -I$(srctree)/drivers/staging/lustre/lustre/include
obj-$(CONFIG_LNET) += lnet.o
lnet-y := api-ni.o config.o nidstrings.o net_fault.o \
lib-me.o lib-msg.o lib-eq.o lib-md.o lib-ptl.o \
lib-socket.o lib-move.o module.o lo.o \
router.o router_proc.o acceptor.o peer.o


@ -1,501 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2007, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2011, 2015, Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Sun Microsystems, Inc.
*/
#define DEBUG_SUBSYSTEM S_LNET
#include <linux/completion.h>
#include <net/sock.h>
#include <linux/lnet/lib-lnet.h>
static int accept_port = 988;
static int accept_backlog = 127;
static int accept_timeout = 5;
static struct {
int pta_shutdown;
struct socket *pta_sock;
struct completion pta_signal;
} lnet_acceptor_state = {
.pta_shutdown = 1
};
int
lnet_acceptor_port(void)
{
return accept_port;
}
EXPORT_SYMBOL(lnet_acceptor_port);
static inline int
lnet_accept_magic(__u32 magic, __u32 constant)
{
return (magic == constant ||
magic == __swab32(constant));
}
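The check above accepts a magic number in either byte order, so a peer with the opposite endianness is still recognised. A self-contained userspace sketch of the same test follows; sketch_swab32() and sketch_accept_magic() are hypothetical names, and the constant used in any call is an arbitrary example rather than a real LNet magic value.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value (userspace stand-in for __swab32). */
uint32_t sketch_swab32(uint32_t x)
{
	return ((x & 0x000000ffu) << 24) |
	       ((x & 0x0000ff00u) << 8) |
	       ((x & 0x00ff0000u) >> 8) |
	       ((x & 0xff000000u) >> 24);
}

/* Nonzero when magic equals constant in native or byte-swapped order. */
int sketch_accept_magic(uint32_t magic, uint32_t constant)
{
	return magic == constant || magic == sketch_swab32(constant);
}
```

A caller that gets a match on the swapped form knows the peer uses the other byte order and can swap the remaining fields of the request accordingly.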
static char *accept = "secure";
module_param(accept, charp, 0444);
MODULE_PARM_DESC(accept, "Accept connections (secure|all|none)");
module_param(accept_port, int, 0444);
MODULE_PARM_DESC(accept_port, "Acceptor's port (same on all nodes)");
module_param(accept_backlog, int, 0444);
MODULE_PARM_DESC(accept_backlog, "Acceptor's listen backlog");
module_param(accept_timeout, int, 0644);
MODULE_PARM_DESC(accept_timeout, "Acceptor's timeout (seconds)");
static char *accept_type;
static int
lnet_acceptor_get_tunables(void)
{
/*
* Userland acceptor uses 'accept_type' instead of 'accept', due to
* conflict with 'accept(2)', but kernel acceptor still uses 'accept'
* for compatibility. Hence the trick.
*/
accept_type = accept;
return 0;
}
int
lnet_acceptor_timeout(void)
{
return accept_timeout;
}
EXPORT_SYMBOL(lnet_acceptor_timeout);
void
lnet_connect_console_error(int rc, lnet_nid_t peer_nid,
__u32 peer_ip, int peer_port)
{
switch (rc) {
/* "normal" errors */
case -ECONNREFUSED:
CNETERR("Connection to %s at host %pI4h on port %d was refused: check that Lustre is running on that node.\n",
libcfs_nid2str(peer_nid),
&peer_ip, peer_port);
break;
case -EHOSTUNREACH:
case -ENETUNREACH:
CNETERR("Connection to %s at host %pI4h was unreachable: the network or that node may be down, or Lustre may be misconfigured.\n",
libcfs_nid2str(peer_nid), &peer_ip);
break;
case -ETIMEDOUT:
CNETERR("Connection to %s at host %pI4h on port %d took too long: that node may be hung or experiencing high load.\n",
libcfs_nid2str(peer_nid),
&peer_ip, peer_port);
break;
case -ECONNRESET:
LCONSOLE_ERROR_MSG(0x11b, "Connection to %s at host %pI4h on port %d was reset: is it running a compatible version of Lustre and is %s one of its NIDs?\n",
libcfs_nid2str(peer_nid),
&peer_ip, peer_port,
libcfs_nid2str(peer_nid));
break;
case -EPROTO:
LCONSOLE_ERROR_MSG(0x11c, "Protocol error connecting to %s at host %pI4h on port %d: is it running a compatible version of Lustre?\n",
libcfs_nid2str(peer_nid),
&peer_ip, peer_port);
break;
case -EADDRINUSE:
LCONSOLE_ERROR_MSG(0x11d, "No privileged ports available to connect to %s at host %pI4h on port %d\n",
libcfs_nid2str(peer_nid),
&peer_ip, peer_port);
break;
default:
LCONSOLE_ERROR_MSG(0x11e, "Unexpected error %d connecting to %s at host %pI4h on port %d\n",
rc, libcfs_nid2str(peer_nid),
&peer_ip, peer_port);
break;
}
}
EXPORT_SYMBOL(lnet_connect_console_error);
int
lnet_connect(struct socket **sockp, lnet_nid_t peer_nid,
__u32 local_ip, __u32 peer_ip, int peer_port)
{
struct lnet_acceptor_connreq cr;
struct socket *sock;
int rc;
int port;
int fatal;
BUILD_BUG_ON(sizeof(cr) > 16); /* too big to be on the stack */
for (port = LNET_ACCEPTOR_MAX_RESERVED_PORT;
port >= LNET_ACCEPTOR_MIN_RESERVED_PORT;
--port) {
/* Iterate through reserved ports. */
rc = lnet_sock_connect(&sock, &fatal, local_ip, port, peer_ip,
peer_port);
if (rc) {
if (fatal)
goto failed;
continue;
}
BUILD_BUG_ON(LNET_PROTO_ACCEPTOR_VERSION != 1);
cr.acr_magic = LNET_PROTO_ACCEPTOR_MAGIC;
cr.acr_version = LNET_PROTO_ACCEPTOR_VERSION;
cr.acr_nid = peer_nid;
if (the_lnet.ln_testprotocompat) {
/* single-shot proto check */
lnet_net_lock(LNET_LOCK_EX);
if (the_lnet.ln_testprotocompat & 4) {
cr.acr_version++;
the_lnet.ln_testprotocompat &= ~4;
}
if (the_lnet.ln_testprotocompat & 8) {
cr.acr_magic = LNET_PROTO_MAGIC;
the_lnet.ln_testprotocompat &= ~8;
}
lnet_net_unlock(LNET_LOCK_EX);
}
rc = lnet_sock_write(sock, &cr, sizeof(cr), accept_timeout);
if (rc)
goto failed_sock;
*sockp = sock;
return 0;
}
rc = -EADDRINUSE;
goto failed;
failed_sock:
sock_release(sock);
failed:
lnet_connect_console_error(rc, peer_nid, peer_ip, peer_port);
return rc;
}
EXPORT_SYMBOL(lnet_connect);
static int
lnet_accept(struct socket *sock, __u32 magic)
{
struct lnet_acceptor_connreq cr;
__u32 peer_ip;
int peer_port;
int rc;
int flip;
struct lnet_ni *ni;
char *str;
LASSERT(sizeof(cr) <= 16); /* not too big for the stack */
rc = lnet_sock_getaddr(sock, 1, &peer_ip, &peer_port);
LASSERT(!rc); /* we succeeded before */
if (!lnet_accept_magic(magic, LNET_PROTO_ACCEPTOR_MAGIC)) {
if (lnet_accept_magic(magic, LNET_PROTO_MAGIC)) {
/*
* future version compatibility!
* When LNET unifies protocols over all LNDs, the first
* thing sent will be a version query. I send back
* LNET_PROTO_ACCEPTOR_MAGIC to tell her I'm "old"
*/
memset(&cr, 0, sizeof(cr));
cr.acr_magic = LNET_PROTO_ACCEPTOR_MAGIC;
cr.acr_version = LNET_PROTO_ACCEPTOR_VERSION;
rc = lnet_sock_write(sock, &cr, sizeof(cr),
accept_timeout);
if (rc)
CERROR("Error sending magic+version in response to LNET magic from %pI4h: %d\n",
&peer_ip, rc);
return -EPROTO;
}
if (lnet_accept_magic(magic, LNET_PROTO_TCP_MAGIC))
str = "'old' socknal/tcpnal";
else
str = "unrecognised";
LCONSOLE_ERROR_MSG(0x11f, "Refusing connection from %pI4h magic %08x: %s acceptor protocol\n",
&peer_ip, magic, str);
return -EPROTO;
}
flip = (magic != LNET_PROTO_ACCEPTOR_MAGIC);
rc = lnet_sock_read(sock, &cr.acr_version, sizeof(cr.acr_version),
accept_timeout);
if (rc) {
CERROR("Error %d reading connection request version from %pI4h\n",
rc, &peer_ip);
return -EIO;
}
if (flip)
__swab32s(&cr.acr_version);
if (cr.acr_version != LNET_PROTO_ACCEPTOR_VERSION) {
/*
* future version compatibility!
* An acceptor-specific protocol rev will first send a version
* query. I send back my current version to tell her I'm
* "old".
*/
int peer_version = cr.acr_version;
memset(&cr, 0, sizeof(cr));
cr.acr_magic = LNET_PROTO_ACCEPTOR_MAGIC;
cr.acr_version = LNET_PROTO_ACCEPTOR_VERSION;
rc = lnet_sock_write(sock, &cr, sizeof(cr), accept_timeout);
if (rc)
CERROR("Error sending magic+version in response to version %d from %pI4h: %d\n",
peer_version, &peer_ip, rc);
return -EPROTO;
}
rc = lnet_sock_read(sock, &cr.acr_nid,
sizeof(cr) -
offsetof(struct lnet_acceptor_connreq, acr_nid),
accept_timeout);
if (rc) {
CERROR("Error %d reading connection request from %pI4h\n",
rc, &peer_ip);
return -EIO;
}
if (flip)
__swab64s(&cr.acr_nid);
ni = lnet_net2ni(LNET_NIDNET(cr.acr_nid));
if (!ni || /* no matching net */
ni->ni_nid != cr.acr_nid) { /* right NET, wrong NID! */
if (ni)
lnet_ni_decref(ni);
LCONSOLE_ERROR_MSG(0x120, "Refusing connection from %pI4h for %s: No matching NI\n",
&peer_ip, libcfs_nid2str(cr.acr_nid));
return -EPERM;
}
if (!ni->ni_lnd->lnd_accept) {
/* This catches a request for the loopback LND */
lnet_ni_decref(ni);
LCONSOLE_ERROR_MSG(0x121, "Refusing connection from %pI4h for %s: NI does not accept IP connections\n",
&peer_ip, libcfs_nid2str(cr.acr_nid));
return -EPERM;
}
CDEBUG(D_NET, "Accept %s from %pI4h\n",
libcfs_nid2str(cr.acr_nid), &peer_ip);
rc = ni->ni_lnd->lnd_accept(ni, sock);
lnet_ni_decref(ni);
return rc;
}
static int
lnet_acceptor(void *arg)
{
struct socket *newsock;
int rc;
__u32 magic;
__u32 peer_ip;
int peer_port;
int secure = (int)((long)arg);
LASSERT(!lnet_acceptor_state.pta_sock);
rc = lnet_sock_listen(&lnet_acceptor_state.pta_sock, 0, accept_port,
accept_backlog);
if (rc) {
if (rc == -EADDRINUSE)
LCONSOLE_ERROR_MSG(0x122, "Can't start acceptor on port %d: port already in use\n",
accept_port);
else
LCONSOLE_ERROR_MSG(0x123, "Can't start acceptor on port %d: unexpected error %d\n",
accept_port, rc);
lnet_acceptor_state.pta_sock = NULL;
} else {
LCONSOLE(0, "Accept %s, port %d\n", accept_type, accept_port);
}
/* set init status and unblock parent */
lnet_acceptor_state.pta_shutdown = rc;
complete(&lnet_acceptor_state.pta_signal);
if (rc)
return rc;
while (!lnet_acceptor_state.pta_shutdown) {
rc = lnet_sock_accept(&newsock, lnet_acceptor_state.pta_sock);
if (rc) {
if (rc != -EAGAIN) {
CWARN("Accept error %d: pausing...\n", rc);
set_current_state(TASK_UNINTERRUPTIBLE);
schedule_timeout(HZ);
}
continue;
}
/* maybe the LNet acceptor thread has been woken */
if (lnet_acceptor_state.pta_shutdown) {
sock_release(newsock);
break;
}
rc = lnet_sock_getaddr(newsock, 1, &peer_ip, &peer_port);
if (rc) {
CERROR("Can't determine new connection's address\n");
goto failed;
}
if (secure && peer_port > LNET_ACCEPTOR_MAX_RESERVED_PORT) {
CERROR("Refusing connection from %pI4h: insecure port %d\n",
&peer_ip, peer_port);
goto failed;
}
rc = lnet_sock_read(newsock, &magic, sizeof(magic),
accept_timeout);
if (rc) {
CERROR("Error %d reading connection request from %pI4h\n",
rc, &peer_ip);
goto failed;
}
rc = lnet_accept(newsock, magic);
if (rc)
goto failed;
continue;
failed:
sock_release(newsock);
}
sock_release(lnet_acceptor_state.pta_sock);
lnet_acceptor_state.pta_sock = NULL;
CDEBUG(D_NET, "Acceptor stopping\n");
/* unblock lnet_acceptor_stop() */
complete(&lnet_acceptor_state.pta_signal);
return 0;
}
static inline int
accept2secure(const char *acc, long *sec)
{
if (!strcmp(acc, "secure")) {
*sec = 1;
return 1;
} else if (!strcmp(acc, "all")) {
*sec = 0;
return 1;
} else if (!strcmp(acc, "none")) {
return 0;
}
LCONSOLE_ERROR_MSG(0x124, "Can't parse 'accept=\"%s\"'\n",
acc);
return -EINVAL;
}
int
lnet_acceptor_start(void)
{
struct task_struct *task;
int rc;
long rc2;
long secure;
/* if acceptor is already running return immediately */
if (!lnet_acceptor_state.pta_shutdown)
return 0;
LASSERT(!lnet_acceptor_state.pta_sock);
rc = lnet_acceptor_get_tunables();
if (rc)
return rc;
init_completion(&lnet_acceptor_state.pta_signal);
rc = accept2secure(accept_type, &secure);
if (rc <= 0)
return rc;
if (!lnet_count_acceptor_nis()) /* not required */
return 0;
task = kthread_run(lnet_acceptor, (void *)(uintptr_t)secure,
"acceptor_%03ld", secure);
if (IS_ERR(task)) {
rc2 = PTR_ERR(task);
CERROR("Can't start acceptor thread: %ld\n", rc2);
return -ESRCH;
}
/* wait for acceptor to startup */
wait_for_completion(&lnet_acceptor_state.pta_signal);
if (!lnet_acceptor_state.pta_shutdown) {
/* started OK */
LASSERT(lnet_acceptor_state.pta_sock);
return 0;
}
LASSERT(!lnet_acceptor_state.pta_sock);
return -ENETDOWN;
}
void
lnet_acceptor_stop(void)
{
struct sock *sk;
if (lnet_acceptor_state.pta_shutdown) /* not running */
return;
lnet_acceptor_state.pta_shutdown = 1;
sk = lnet_acceptor_state.pta_sock->sk;
/* awake any sleepers using safe method */
sk->sk_state_change(sk);
/* block until acceptor signals exit */
wait_for_completion(&lnet_acceptor_state.pta_signal);
}
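
The byte-order check done by lnet_accept_magic() and the later "flip" decision in lnet_accept() can be sketched in userspace as follows. This is a minimal illustration only: the demo_* names and the DEMO_MAGIC value are hypothetical and are not taken from the Lustre headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical placeholder; NOT the real LNET_PROTO_ACCEPTOR_MAGIC value. */
#define DEMO_MAGIC 0x12345678u

/* Portable 32-bit byte swap, standing in for the kernel's __swab32(). */
static uint32_t demo_swab32(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) |
	       ((v & 0x0000ff00u) << 8) |
	       ((v & 0x00ff0000u) >> 8) |
	       ((v & 0xff000000u) >> 24);
}

/* Accept the magic in either byte order, as lnet_accept_magic() does. */
static int demo_accept_magic(uint32_t magic, uint32_t constant)
{
	return magic == constant || magic == demo_swab32(constant);
}

/*
 * Once the magic is known to match in some byte order, a mismatch with
 * the native constant means the peer has the opposite endianness, so the
 * remaining fields must be byte-swapped after reading.
 */
static int demo_needs_flip(uint32_t magic, uint32_t constant)
{
	return magic != constant;
}
```

A caller would first reject the connection if demo_accept_magic() fails, and only then use demo_needs_flip() to decide whether to swap the version and NID fields.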

File diff suppressed because it is too large

File diff suppressed because it is too large

@ -1,426 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2003, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2012, Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Sun Microsystems, Inc.
*
* lnet/lnet/lib-eq.c
*
* Library level Event queue management routines
*/
#define DEBUG_SUBSYSTEM S_LNET
#include <linux/lnet/lib-lnet.h>
/**
* Create an event queue that has room for \a count number of events.
*
* The event queue is circular and older events will be overwritten by new
* ones if they are not removed in time by the user using the functions
* LNetEQGet(), LNetEQWait(), or LNetEQPoll(). It is up to the user to
* determine the appropriate size of the event queue to prevent this loss
* of events. Note that when EQ handler is specified in \a callback, no
* event loss can happen, since the handler is run for each event deposited
* into the EQ.
*
* \param count The number of events to be stored in the event queue. It
* will be rounded up to the next power of two.
* \param callback A handler function that runs when an event is deposited
* into the EQ. The constant value LNET_EQ_HANDLER_NONE can be used to
* indicate that no event handler is desired.
* \param handle On successful return, this location will hold a handle for
* the newly created EQ.
*
* \retval 0 On success.
* \retval -EINVAL If a parameter is not valid.
* \retval -ENOMEM If memory for the EQ can't be allocated.
*
* \see lnet_eq_handler_t for the discussion on EQ handler semantics.
*/
int
LNetEQAlloc(unsigned int count, lnet_eq_handler_t callback,
struct lnet_handle_eq *handle)
{
struct lnet_eq *eq;
LASSERT(the_lnet.ln_refcount > 0);
/*
* We need count to be a power of 2 so that when eq_{enq,deq}_seq
* overflow, they don't skip entries, so the queue has the same
* apparent capacity at all times
*/
if (count)
count = roundup_pow_of_two(count);
if (callback != LNET_EQ_HANDLER_NONE && count)
CWARN("EQ callback is guaranteed to get every event, do you still want to set eqcount %d for polling event which will have locking overhead? Please contact with developer to confirm\n", count);
/*
* count can be 0 if only the callback is needed, which eliminates
* the overhead of enqueueing events
*/
if (!count && callback == LNET_EQ_HANDLER_NONE)
return -EINVAL;
eq = kzalloc(sizeof(*eq), GFP_NOFS);
if (!eq)
return -ENOMEM;
if (count) {
eq->eq_events = kvmalloc_array(count, sizeof(struct lnet_event),
GFP_KERNEL | __GFP_ZERO);
if (!eq->eq_events)
goto failed;
/*
* NB allocator has set all event sequence numbers to 0,
* so all of them should be earlier than eq_deq_seq
*/
}
eq->eq_deq_seq = 1;
eq->eq_enq_seq = 1;
eq->eq_size = count;
eq->eq_callback = callback;
eq->eq_refs = cfs_percpt_alloc(lnet_cpt_table(),
sizeof(*eq->eq_refs[0]));
if (!eq->eq_refs)
goto failed;
/* MUST hold the exclusive lnet_res_lock */
lnet_res_lock(LNET_LOCK_EX);
/*
* NB: hold lnet_eq_wait_lock for EQ link/unlink, so we can do
* both EQ lookup and poll event with only lnet_eq_wait_lock
*/
lnet_eq_wait_lock();
lnet_res_lh_initialize(&the_lnet.ln_eq_container, &eq->eq_lh);
list_add(&eq->eq_list, &the_lnet.ln_eq_container.rec_active);
lnet_eq_wait_unlock();
lnet_res_unlock(LNET_LOCK_EX);
lnet_eq2handle(handle, eq);
return 0;
failed:
kvfree(eq->eq_events);
if (eq->eq_refs)
cfs_percpt_free(eq->eq_refs);
kfree(eq);
return -ENOMEM;
}
EXPORT_SYMBOL(LNetEQAlloc);
/**
* Release the resources associated with an event queue if it's idle;
* otherwise do nothing and it's up to the user to try again.
*
* \param eqh A handle for the event queue to be released.
*
* \retval 0 If the EQ is not in use and freed.
* \retval -ENOENT If \a eqh does not point to a valid EQ.
* \retval -EBUSY If the EQ is still in use by some MDs.
*/
int
LNetEQFree(struct lnet_handle_eq eqh)
{
struct lnet_eq *eq;
struct lnet_event *events = NULL;
int **refs = NULL;
int *ref;
int rc = 0;
int size = 0;
int i;
LASSERT(the_lnet.ln_refcount > 0);
lnet_res_lock(LNET_LOCK_EX);
/*
* NB: hold lnet_eq_wait_lock for EQ link/unlink, so we can do
* both EQ lookup and poll event with only lnet_eq_wait_lock
*/
lnet_eq_wait_lock();
eq = lnet_handle2eq(&eqh);
if (!eq) {
rc = -ENOENT;
goto out;
}
cfs_percpt_for_each(ref, i, eq->eq_refs) {
LASSERT(*ref >= 0);
if (!*ref)
continue;
CDEBUG(D_NET, "Event queue (%d: %d) busy on destroy.\n",
i, *ref);
rc = -EBUSY;
goto out;
}
/* stash for free after lock dropped */
events = eq->eq_events;
size = eq->eq_size;
refs = eq->eq_refs;
lnet_res_lh_invalidate(&eq->eq_lh);
list_del(&eq->eq_list);
kfree(eq);
out:
lnet_eq_wait_unlock();
lnet_res_unlock(LNET_LOCK_EX);
kvfree(events);
if (refs)
cfs_percpt_free(refs);
return rc;
}
EXPORT_SYMBOL(LNetEQFree);
void
lnet_eq_enqueue_event(struct lnet_eq *eq, struct lnet_event *ev)
{
/* MUST be called with the resource lock held but without lnet_eq_wait_lock */
int index;
if (!eq->eq_size) {
LASSERT(eq->eq_callback != LNET_EQ_HANDLER_NONE);
eq->eq_callback(ev);
return;
}
lnet_eq_wait_lock();
ev->sequence = eq->eq_enq_seq++;
LASSERT(is_power_of_2(eq->eq_size));
index = ev->sequence & (eq->eq_size - 1);
eq->eq_events[index] = *ev;
if (eq->eq_callback != LNET_EQ_HANDLER_NONE)
eq->eq_callback(ev);
/* Wake anyone waiting in LNetEQPoll() */
if (waitqueue_active(&the_lnet.ln_eq_waitq))
wake_up_all(&the_lnet.ln_eq_waitq);
lnet_eq_wait_unlock();
}
static int
lnet_eq_dequeue_event(struct lnet_eq *eq, struct lnet_event *ev)
{
int new_index = eq->eq_deq_seq & (eq->eq_size - 1);
struct lnet_event *new_event = &eq->eq_events[new_index];
int rc;
/* must be called with lnet_eq_wait_lock held */
if (LNET_SEQ_GT(eq->eq_deq_seq, new_event->sequence))
return 0;
/* We've got a new event... */
*ev = *new_event;
CDEBUG(D_INFO, "event: %p, sequence: %lu, eq->size: %u\n",
new_event, eq->eq_deq_seq, eq->eq_size);
/* ...but did it overwrite an event we've not seen yet? */
if (eq->eq_deq_seq == new_event->sequence) {
rc = 1;
} else {
/*
* don't complain with CERROR: some EQs are sized small
* anyway; if it's important, the caller should complain
*/
CDEBUG(D_NET, "Event Queue Overflow: eq seq %lu ev seq %lu\n",
eq->eq_deq_seq, new_event->sequence);
rc = -EOVERFLOW;
}
eq->eq_deq_seq = new_event->sequence + 1;
return rc;
}
/**
* A nonblocking function that can be used to get the next event in an EQ.
* If an event handler is associated with the EQ, the handler will run before
* this function returns successfully. The event is removed from the queue.
*
* \param eventq A handle for the event queue.
* \param event On successful return (1 or -EOVERFLOW), this location will
* hold the next event in the EQ.
*
* \retval 0 No pending event in the EQ.
* \retval 1 Indicates success.
* \retval -ENOENT If \a eventq does not point to a valid EQ.
* \retval -EOVERFLOW Indicates success (i.e., an event is returned) and that
* at least one event between this event and the last event obtained from the
* EQ has been dropped due to limited space in the EQ.
*/
/**
* Block the calling process until there is an event in the EQ.
* If an event handler is associated with the EQ, the handler will run before
* this function returns successfully. This function returns the next event
* in the EQ and removes it from the EQ.
*
* \param eventq A handle for the event queue.
* \param event On successful return (1 or -EOVERFLOW), this location will
* hold the next event in the EQ.
*
* \retval 1 Indicates success.
* \retval -ENOENT If \a eventq does not point to a valid EQ.
* \retval -EOVERFLOW Indicates success (i.e., an event is returned) and that
* at least one event between this event and the last event obtained from the
* EQ has been dropped due to limited space in the EQ.
*/
static int
lnet_eq_wait_locked(int *timeout_ms, long state)
__must_hold(&the_lnet.ln_eq_wait_lock)
{
int tms = *timeout_ms;
int wait;
wait_queue_entry_t wl;
unsigned long now;
if (!tms)
return -ENXIO; /* don't want to wait and no new event */
init_waitqueue_entry(&wl, current);
set_current_state(state);
add_wait_queue(&the_lnet.ln_eq_waitq, &wl);
lnet_eq_wait_unlock();
if (tms < 0) {
schedule();
} else {
now = jiffies;
schedule_timeout(msecs_to_jiffies(tms));
tms -= jiffies_to_msecs(jiffies - now);
if (tms < 0) /* no more wait but may have new event */
tms = 0;
}
wait = tms; /* might need to call here again */
*timeout_ms = tms;
lnet_eq_wait_lock();
remove_wait_queue(&the_lnet.ln_eq_waitq, &wl);
return wait;
}
/**
* Block the calling process until there's an event from a set of EQs or
* timeout happens.
*
* If an event handler is associated with the EQ, the handler will run before
* this function returns successfully, in which case the corresponding event
* is consumed.
*
* LNetEQPoll() provides a timeout to allow applications to poll, block for a
* fixed period, or block indefinitely.
*
* \param eventqs,neq An array of EQ handles, and size of the array.
* \param timeout_ms Time in milliseconds to wait for an event to occur on
* one of the EQs. The constant LNET_TIME_FOREVER can be used to indicate an
* infinite timeout.
* \param interruptible, if true, use TASK_INTERRUPTIBLE, else TASK_NOLOAD
* \param event,which On successful return (1 or -EOVERFLOW), \a event will
* hold the next event in the EQs, and \a which will contain the index of the
* EQ from which the event was taken.
*
* \retval 0 No pending event in the EQs after timeout.
* \retval 1 Indicates success.
* \retval -EOVERFLOW Indicates success (i.e., an event is returned) and that
* at least one event between this event and the last event obtained from the
* EQ indicated by \a which has been dropped due to limited space in the EQ.
* \retval -ENOENT If there's an invalid handle in \a eventqs.
*/
int
LNetEQPoll(struct lnet_handle_eq *eventqs, int neq, int timeout_ms,
int interruptible,
struct lnet_event *event, int *which)
{
int wait = 1;
int rc;
int i;
LASSERT(the_lnet.ln_refcount > 0);
if (neq < 1)
return -ENOENT;
lnet_eq_wait_lock();
for (;;) {
for (i = 0; i < neq; i++) {
struct lnet_eq *eq = lnet_handle2eq(&eventqs[i]);
if (!eq) {
lnet_eq_wait_unlock();
return -ENOENT;
}
rc = lnet_eq_dequeue_event(eq, event);
if (rc) {
lnet_eq_wait_unlock();
*which = i;
return rc;
}
}
if (!wait)
break;
/*
* return value of lnet_eq_wait_locked:
* -1 : did nothing and it's sure no new event
* 1 : sleep inside and wait until new event
* 0 : don't want to wait anymore, but might have new event
* so need to call dequeue again
*/
wait = lnet_eq_wait_locked(&timeout_ms,
interruptible ? TASK_INTERRUPTIBLE
: TASK_NOLOAD);
if (wait < 0) /* no new event */
break;
}
lnet_eq_wait_unlock();
return 0;
}
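
The sequence-numbered circular queue used by lnet_eq_enqueue_event() and lnet_eq_dequeue_event() can be sketched in userspace roughly as below. This is a simplified illustration under stated assumptions: the demo_* names, the fixed size of 4, and the int payload are hypothetical, locking and callbacks are omitted, and the wrap-safe comparison is assumed to behave like LNET_SEQ_GT (a signed difference test), whose definition is not shown in this diff.

```c
#include <assert.h>
#include <string.h>

#define DEMO_EQ_SIZE 4u /* must be a power of two so the mask below works */

struct demo_event {
	unsigned long sequence;
	int payload;
};

struct demo_eq {
	unsigned long enq_seq;
	unsigned long deq_seq;
	struct demo_event events[DEMO_EQ_SIZE];
};

static void demo_eq_init(struct demo_eq *eq)
{
	/* Zeroed slots have sequence 0, which is older than deq_seq == 1. */
	memset(eq, 0, sizeof(*eq));
	eq->enq_seq = 1;
	eq->deq_seq = 1;
}

static void demo_eq_enqueue(struct demo_eq *eq, int payload)
{
	unsigned long seq = eq->enq_seq++;
	struct demo_event *slot = &eq->events[seq & (DEMO_EQ_SIZE - 1)];

	/* Overwrites whatever was in the slot, even if it was never read. */
	slot->sequence = seq;
	slot->payload = payload;
}

/* Returns 0 if no event, 1 on success, -1 if older events were lost. */
static int demo_eq_dequeue(struct demo_eq *eq, int *payload)
{
	struct demo_event *slot = &eq->events[eq->deq_seq & (DEMO_EQ_SIZE - 1)];
	int rc;

	/* Wrap-safe "deq_seq > slot->sequence": the slot holds a stale event. */
	if ((long)(eq->deq_seq - slot->sequence) > 0)
		return 0;

	*payload = slot->payload;
	/* An exact match means nothing was overwritten before we got here. */
	rc = (eq->deq_seq == slot->sequence) ? 1 : -1;
	eq->deq_seq = slot->sequence + 1;
	return rc;
}
```

Because the consumer jumps its sequence number past the event it just read, an overflow is reported once and the reader resumes from the newest surviving position rather than replaying stale slots.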


@ -1,463 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2003, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2012, Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Sun Microsystems, Inc.
*
* lnet/lnet/lib-md.c
*
* Memory Descriptor management routines
*/
#define DEBUG_SUBSYSTEM S_LNET
#include <linux/lnet/lib-lnet.h>
/* must be called with lnet_res_lock held */
void
lnet_md_unlink(struct lnet_libmd *md)
{
if (!(md->md_flags & LNET_MD_FLAG_ZOMBIE)) {
/* first unlink attempt... */
struct lnet_me *me = md->md_me;
md->md_flags |= LNET_MD_FLAG_ZOMBIE;
/*
* Disassociate from ME (if any),
* and unlink it if it was created
* with LNET_UNLINK
*/
if (me) {
/* detach MD from portal */
lnet_ptl_detach_md(me, md);
if (me->me_unlink == LNET_UNLINK)
lnet_me_unlink(me);
}
/* ensure all future handle lookups fail */
lnet_res_lh_invalidate(&md->md_lh);
}
if (md->md_refcount) {
CDEBUG(D_NET, "Queueing unlink of md %p\n", md);
return;
}
CDEBUG(D_NET, "Unlinking md %p\n", md);
if (md->md_eq) {
int cpt = lnet_cpt_of_cookie(md->md_lh.lh_cookie);
LASSERT(*md->md_eq->eq_refs[cpt] > 0);
(*md->md_eq->eq_refs[cpt])--;
}
LASSERT(!list_empty(&md->md_list));
list_del_init(&md->md_list);
kfree(md);
}
static int
lnet_md_build(struct lnet_libmd *lmd, struct lnet_md *umd, int unlink)
{
int i;
unsigned int niov;
int total_length = 0;
lmd->md_me = NULL;
lmd->md_start = umd->start;
lmd->md_offset = 0;
lmd->md_max_size = umd->max_size;
lmd->md_options = umd->options;
lmd->md_user_ptr = umd->user_ptr;
lmd->md_eq = NULL;
lmd->md_threshold = umd->threshold;
lmd->md_refcount = 0;
lmd->md_flags = (unlink == LNET_UNLINK) ? LNET_MD_FLAG_AUTO_UNLINK : 0;
if (umd->options & LNET_MD_IOVEC) {
if (umd->options & LNET_MD_KIOV) /* Can't specify both */
return -EINVAL;
niov = umd->length;
lmd->md_niov = umd->length;
memcpy(lmd->md_iov.iov, umd->start,
niov * sizeof(lmd->md_iov.iov[0]));
for (i = 0; i < (int)niov; i++) {
/* We take the base address on trust */
/* invalid length */
if (lmd->md_iov.iov[i].iov_len <= 0)
return -EINVAL;
total_length += lmd->md_iov.iov[i].iov_len;
}
lmd->md_length = total_length;
if ((umd->options & LNET_MD_MAX_SIZE) && /* use max size */
(umd->max_size < 0 ||
umd->max_size > total_length)) /* illegal max_size */
return -EINVAL;
} else if (umd->options & LNET_MD_KIOV) {
niov = umd->length;
lmd->md_niov = umd->length;
memcpy(lmd->md_iov.kiov, umd->start,
niov * sizeof(lmd->md_iov.kiov[0]));
for (i = 0; i < (int)niov; i++) {
/* We take the page pointer on trust */
if (lmd->md_iov.kiov[i].bv_offset +
lmd->md_iov.kiov[i].bv_len > PAGE_SIZE)
return -EINVAL; /* invalid length */
total_length += lmd->md_iov.kiov[i].bv_len;
}
lmd->md_length = total_length;
if ((umd->options & LNET_MD_MAX_SIZE) && /* max size used */
(umd->max_size < 0 ||
umd->max_size > total_length)) /* illegal max_size */
return -EINVAL;
} else { /* contiguous */
lmd->md_length = umd->length;
niov = 1;
lmd->md_niov = 1;
lmd->md_iov.iov[0].iov_base = umd->start;
lmd->md_iov.iov[0].iov_len = umd->length;
if ((umd->options & LNET_MD_MAX_SIZE) && /* max size used */
(umd->max_size < 0 ||
umd->max_size > (int)umd->length)) /* illegal max_size */
return -EINVAL;
}
return 0;
}
/* must be called with resource lock held */
static int
lnet_md_link(struct lnet_libmd *md, struct lnet_handle_eq eq_handle, int cpt)
{
struct lnet_res_container *container = the_lnet.ln_md_containers[cpt];
/*
* NB we are passed an allocated, but inactive md.
* if we return success, caller may lnet_md_unlink() it.
* otherwise caller may only kfree() it.
*/
/*
* This implementation doesn't know how to create START events or
* disable END events. Best to LASSERT our caller is compliant so
* we find out quickly...
*/
/*
* TODO - reevaluate what should be here in light of
* the removal of the start and end events
* (maybe we shouldn't even allow LNET_EQ_NONE!)
* LASSERT(!eq);
*/
if (!LNetEQHandleIsInvalid(eq_handle)) {
md->md_eq = lnet_handle2eq(&eq_handle);
if (!md->md_eq)
return -ENOENT;
(*md->md_eq->eq_refs[cpt])++;
}
lnet_res_lh_initialize(container, &md->md_lh);
LASSERT(list_empty(&md->md_list));
list_add(&md->md_list, &container->rec_active);
return 0;
}
/* must be called with lnet_res_lock held */
void
lnet_md_deconstruct(struct lnet_libmd *lmd, struct lnet_md *umd)
{
/* NB this doesn't copy out all the iov entries so when a
* discontiguous MD is copied out, the target gets to know the
* original iov pointer (in start) and the number of entries it had
* and that's all.
*/
umd->start = lmd->md_start;
umd->length = !(lmd->md_options &
(LNET_MD_IOVEC | LNET_MD_KIOV)) ?
lmd->md_length : lmd->md_niov;
umd->threshold = lmd->md_threshold;
umd->max_size = lmd->md_max_size;
umd->options = lmd->md_options;
umd->user_ptr = lmd->md_user_ptr;
lnet_eq2handle(&umd->eq_handle, lmd->md_eq);
}
static int
lnet_md_validate(struct lnet_md *umd)
{
if (!umd->start && umd->length) {
CERROR("MD start pointer can not be NULL with length %u\n",
umd->length);
return -EINVAL;
}
if ((umd->options & (LNET_MD_KIOV | LNET_MD_IOVEC)) &&
umd->length > LNET_MAX_IOV) {
CERROR("Invalid option: too many fragments %u, %d max\n",
umd->length, LNET_MAX_IOV);
return -EINVAL;
}
return 0;
}
/**
* Create a memory descriptor and attach it to a ME
*
* \param meh A handle for a ME to associate the new MD with.
* \param umd Provides initial values for the user-visible parts of a MD.
* Other than its use for initialization, there is no linkage between this
* structure and the MD maintained by the LNet.
* \param unlink A flag to indicate whether the MD is automatically unlinked
* when it becomes inactive, either because the operation threshold drops to
* zero or because the available memory becomes less than \a umd.max_size.
* (Note that the check for unlinking a MD only occurs after the completion
* of a successful operation on the MD.) The value LNET_UNLINK enables auto
* unlinking; the value LNET_RETAIN disables it.
* \param handle On successful return, a handle to the newly created MD is
* saved here. This handle can be used later in LNetMDUnlink().
*
* \retval 0 On success.
* \retval -EINVAL If \a umd is not valid.
* \retval -ENOMEM If new MD cannot be allocated.
* \retval -ENOENT Either \a meh or \a umd.eq_handle does not point to a
* valid object. Note that it's OK to supply a NULL \a umd.eq_handle by
* calling LNetInvalidateHandle() on it.
* \retval -EBUSY If the ME pointed to by \a meh is already associated with
* a MD.
*/
int
LNetMDAttach(struct lnet_handle_me meh, struct lnet_md umd,
enum lnet_unlink unlink, struct lnet_handle_md *handle)
{
LIST_HEAD(matches);
LIST_HEAD(drops);
struct lnet_me *me;
struct lnet_libmd *md;
int cpt;
int rc;
LASSERT(the_lnet.ln_refcount > 0);
if (lnet_md_validate(&umd))
return -EINVAL;
if (!(umd.options & (LNET_MD_OP_GET | LNET_MD_OP_PUT))) {
CERROR("Invalid option: no MD_OP set\n");
return -EINVAL;
}
md = lnet_md_alloc(&umd);
if (!md)
return -ENOMEM;
rc = lnet_md_build(md, &umd, unlink);
if (rc)
goto out_free;
cpt = lnet_cpt_of_cookie(meh.cookie);
lnet_res_lock(cpt);
me = lnet_handle2me(&meh);
if (!me)
rc = -ENOENT;
else if (me->me_md)
rc = -EBUSY;
else
rc = lnet_md_link(md, umd.eq_handle, cpt);
if (rc)
goto out_unlock;
/*
* attach this MD to portal of ME and check if it matches any
* blocked msgs on this portal
*/
lnet_ptl_attach_md(me, md, &matches, &drops);
lnet_md2handle(handle, md);
lnet_res_unlock(cpt);
lnet_drop_delayed_msg_list(&drops, "Bad match");
lnet_recv_delayed_msg_list(&matches);
return 0;
out_unlock:
lnet_res_unlock(cpt);
out_free:
kfree(md);
return rc;
}
EXPORT_SYMBOL(LNetMDAttach);
/**
* Create a "free floating" memory descriptor - a MD that is not associated
* with a ME. Such MDs are usually used in LNetPut() and LNetGet() operations.
*
* \param umd,unlink See the discussion for LNetMDAttach().
* \param handle On successful return, a handle to the newly created MD is
* saved here. This handle can be used later in LNetMDUnlink(), LNetPut(),
* and LNetGet() operations.
*
* \retval 0 On success.
* \retval -EINVAL If \a umd is not valid.
* \retval -ENOMEM If new MD cannot be allocated.
* \retval -ENOENT \a umd.eq_handle does not point to a valid EQ. Note that
* it's OK to supply a NULL \a umd.eq_handle by calling
* LNetInvalidateHandle() on it.
*/
int
LNetMDBind(struct lnet_md umd, enum lnet_unlink unlink,
struct lnet_handle_md *handle)
{
struct lnet_libmd *md;
int cpt;
int rc;
LASSERT(the_lnet.ln_refcount > 0);
if (lnet_md_validate(&umd))
return -EINVAL;
if ((umd.options & (LNET_MD_OP_GET | LNET_MD_OP_PUT))) {
CERROR("Invalid option: GET|PUT illegal on active MDs\n");
return -EINVAL;
}
md = lnet_md_alloc(&umd);
if (!md)
return -ENOMEM;
rc = lnet_md_build(md, &umd, unlink);
if (rc)
goto out_free;
cpt = lnet_res_lock_current();
rc = lnet_md_link(md, umd.eq_handle, cpt);
if (rc)
goto out_unlock;
lnet_md2handle(handle, md);
lnet_res_unlock(cpt);
return 0;
out_unlock:
lnet_res_unlock(cpt);
out_free:
kfree(md);
return rc;
}
EXPORT_SYMBOL(LNetMDBind);
/**
* Unlink the memory descriptor from any ME it may be linked to and release
* the internal resources associated with it. As a result, active messages
* associated with the MD may get aborted.
*
* This function does not free the memory region associated with the MD;
* i.e., the memory the user allocated for this MD. If the ME associated with
* this MD is not NULL and was created with auto unlink enabled, the ME is
* unlinked as well (see LNetMEAttach()).
*
* Explicitly unlinking a MD via this function call has the same behavior as
* a MD that has been automatically unlinked, except that no LNET_EVENT_UNLINK
* is generated in the latter case.
*
* An unlinked event can be reported in two ways:
* - If there are no pending operations on the MD, it's unlinked immediately
* and an LNET_EVENT_UNLINK event is logged before this function returns.
* - Otherwise, the MD is only marked for deletion when this function
* returns, and the unlinked event will be piggybacked on the event of
* the completion of the last operation by setting the unlinked field of
* the event. No dedicated LNET_EVENT_UNLINK event is generated.
*
* Note that in both cases the unlinked field of the event is always set; no
* more events will happen on the MD after such an event is logged.
*
* \param mdh A handle for the MD to be unlinked.
*
* \retval 0 On success.
* \retval -ENOENT If \a mdh does not point to a valid MD object.
*/
int
LNetMDUnlink(struct lnet_handle_md mdh)
{
struct lnet_event ev;
struct lnet_libmd *md;
int cpt;
LASSERT(the_lnet.ln_refcount > 0);
cpt = lnet_cpt_of_cookie(mdh.cookie);
lnet_res_lock(cpt);
md = lnet_handle2md(&mdh);
if (!md) {
lnet_res_unlock(cpt);
return -ENOENT;
}
md->md_flags |= LNET_MD_FLAG_ABORTED;
/*
* If the MD is busy, lnet_md_unlink just marks it for deletion, and
* when the LND is done, the completion event flags that the MD was
* unlinked. Otherwise, we enqueue an event now...
*/
if (md->md_eq && !md->md_refcount) {
lnet_build_unlink_event(md, &ev);
lnet_eq_enqueue_event(md->md_eq, &ev);
}
lnet_md_unlink(md);
lnet_res_unlock(cpt);
return 0;
}
EXPORT_SYMBOL(LNetMDUnlink);


@ -1,274 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2003, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2012, Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Sun Microsystems, Inc.
*
* lnet/lnet/lib-me.c
*
* Match Entry management routines
*/
#define DEBUG_SUBSYSTEM S_LNET
#include <linux/lnet/lib-lnet.h>
/**
* Create and attach a match entry to the match list of \a portal. The new
* ME is empty, i.e. not associated with a memory descriptor. LNetMDAttach()
* can be used to attach a MD to an empty ME.
*
* \param portal The portal table index where the ME should be attached.
* \param match_id Specifies the match criteria for the process ID of
* the requester. The constants LNET_PID_ANY and LNET_NID_ANY can be
* used to wildcard either of the identifiers in the lnet_process_id
* structure.
* \param match_bits,ignore_bits Specify the match criteria to apply
* to the match bits in the incoming request. The ignore bits are used
* to mask out insignificant bits in the incoming match bits. The resulting
* bits are then compared to the ME's match bits to determine if the
* incoming request meets the match criteria.
* \param unlink Indicates whether the ME should be unlinked when the memory
* descriptor associated with it is unlinked (Note that the check for
* unlinking a ME only occurs when the memory descriptor is unlinked.).
* Valid values are LNET_RETAIN and LNET_UNLINK.
* \param pos Indicates whether the new ME should be prepended or
* appended to the match list. Allowed constants: LNET_INS_BEFORE,
* LNET_INS_AFTER.
* \param handle On successful returns, a handle to the newly created ME
* object is saved here. This handle can be used later in LNetMEInsert(),
* LNetMEUnlink(), or LNetMDAttach() functions.
*
* \retval 0 On success.
* \retval -EINVAL If \a portal is invalid.
* \retval -ENOMEM If new ME object cannot be allocated.
*/
int
LNetMEAttach(unsigned int portal,
struct lnet_process_id match_id,
__u64 match_bits, __u64 ignore_bits,
enum lnet_unlink unlink, enum lnet_ins_pos pos,
struct lnet_handle_me *handle)
{
struct lnet_match_table *mtable;
struct lnet_me *me;
struct list_head *head;
LASSERT(the_lnet.ln_refcount > 0);
if ((int)portal >= the_lnet.ln_nportals)
return -EINVAL;
mtable = lnet_mt_of_attach(portal, match_id,
match_bits, ignore_bits, pos);
if (!mtable) /* can't match portal type */
return -EPERM;
me = kzalloc(sizeof(*me), GFP_NOFS);
if (!me)
return -ENOMEM;
lnet_res_lock(mtable->mt_cpt);
me->me_portal = portal;
me->me_match_id = match_id;
me->me_match_bits = match_bits;
me->me_ignore_bits = ignore_bits;
me->me_unlink = unlink;
me->me_md = NULL;
lnet_res_lh_initialize(the_lnet.ln_me_containers[mtable->mt_cpt],
&me->me_lh);
if (ignore_bits)
head = &mtable->mt_mhash[LNET_MT_HASH_IGNORE];
else
head = lnet_mt_match_head(mtable, match_id, match_bits);
me->me_pos = head - &mtable->mt_mhash[0];
if (pos == LNET_INS_AFTER || pos == LNET_INS_LOCAL)
list_add_tail(&me->me_list, head);
else
list_add(&me->me_list, head);
lnet_me2handle(handle, me);
lnet_res_unlock(mtable->mt_cpt);
return 0;
}
EXPORT_SYMBOL(LNetMEAttach);
/**
* Create a match entry and insert it before or after the ME pointed to by
* \a current_meh. The new ME is empty, i.e. not associated with a memory
* descriptor. LNetMDAttach() can be used to attach a MD to an empty ME.
*
* This function is identical to LNetMEAttach() except for the position
* where the new ME is inserted.
*
* \param current_meh A handle for a ME. The new ME will be inserted
* immediately before or immediately after this ME.
* \param match_id,match_bits,ignore_bits,unlink,pos,handle See the discussion
* for LNetMEAttach().
*
* \retval 0 On success.
* \retval -ENOMEM If new ME object cannot be allocated.
* \retval -ENOENT If \a current_meh does not point to a valid match entry.
*/
int
LNetMEInsert(struct lnet_handle_me current_meh,
struct lnet_process_id match_id,
__u64 match_bits, __u64 ignore_bits,
enum lnet_unlink unlink, enum lnet_ins_pos pos,
struct lnet_handle_me *handle)
{
struct lnet_me *current_me;
struct lnet_me *new_me;
struct lnet_portal *ptl;
int cpt;
LASSERT(the_lnet.ln_refcount > 0);
if (pos == LNET_INS_LOCAL)
return -EPERM;
new_me = kzalloc(sizeof(*new_me), GFP_NOFS);
if (!new_me)
return -ENOMEM;
cpt = lnet_cpt_of_cookie(current_meh.cookie);
lnet_res_lock(cpt);
current_me = lnet_handle2me(&current_meh);
if (!current_me) {
kfree(new_me);
lnet_res_unlock(cpt);
return -ENOENT;
}
LASSERT(current_me->me_portal < the_lnet.ln_nportals);
ptl = the_lnet.ln_portals[current_me->me_portal];
if (lnet_ptl_is_unique(ptl)) {
/* it makes no sense to insert on a unique portal */
kfree(new_me);
lnet_res_unlock(cpt);
return -EPERM;
}
new_me->me_pos = current_me->me_pos;
new_me->me_portal = current_me->me_portal;
new_me->me_match_id = match_id;
new_me->me_match_bits = match_bits;
new_me->me_ignore_bits = ignore_bits;
new_me->me_unlink = unlink;
new_me->me_md = NULL;
lnet_res_lh_initialize(the_lnet.ln_me_containers[cpt], &new_me->me_lh);
if (pos == LNET_INS_AFTER)
list_add(&new_me->me_list, &current_me->me_list);
else
list_add_tail(&new_me->me_list, &current_me->me_list);
lnet_me2handle(handle, new_me);
lnet_res_unlock(cpt);
return 0;
}
EXPORT_SYMBOL(LNetMEInsert);
/**
* Unlink a match entry from its match list.
*
* This operation also releases any resources associated with the ME. If a
* memory descriptor is attached to the ME, then it will be unlinked as well
* and an unlink event will be generated. It is an error to use the ME handle
* after calling LNetMEUnlink().
*
* \param meh A handle for the ME to be unlinked.
*
* \retval 0 On success.
* \retval -ENOENT If \a meh does not point to a valid ME.
* \see LNetMDUnlink() for the discussion on delivering unlink event.
*/
int
LNetMEUnlink(struct lnet_handle_me meh)
{
struct lnet_me *me;
struct lnet_libmd *md;
struct lnet_event ev;
int cpt;
LASSERT(the_lnet.ln_refcount > 0);
cpt = lnet_cpt_of_cookie(meh.cookie);
lnet_res_lock(cpt);
me = lnet_handle2me(&meh);
if (!me) {
lnet_res_unlock(cpt);
return -ENOENT;
}
md = me->me_md;
if (md) {
md->md_flags |= LNET_MD_FLAG_ABORTED;
if (md->md_eq && !md->md_refcount) {
lnet_build_unlink_event(md, &ev);
lnet_eq_enqueue_event(md->md_eq, &ev);
}
}
lnet_me_unlink(me);
lnet_res_unlock(cpt);
return 0;
}
EXPORT_SYMBOL(LNetMEUnlink);
/* call with lnet_res_lock please */
void
lnet_me_unlink(struct lnet_me *me)
{
list_del(&me->me_list);
if (me->me_md) {
struct lnet_libmd *md = me->me_md;
/* detach MD from portal of this ME */
lnet_ptl_detach_md(me, md);
lnet_md_unlink(md);
}
lnet_res_lh_invalidate(&me->me_lh);
kfree(me);
}

File diff suppressed because it is too large


@ -1,625 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2003, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2012, Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Sun Microsystems, Inc.
*
* lnet/lnet/lib-msg.c
*
* Message decoding, parsing and finalizing routines
*/
#define DEBUG_SUBSYSTEM S_LNET
#include <linux/lnet/lib-lnet.h>
void
lnet_build_unlink_event(struct lnet_libmd *md, struct lnet_event *ev)
{
memset(ev, 0, sizeof(*ev));
ev->status = 0;
ev->unlinked = 1;
ev->type = LNET_EVENT_UNLINK;
lnet_md_deconstruct(md, &ev->md);
lnet_md2handle(&ev->md_handle, md);
}
/*
* Don't need any lock, must be called after lnet_commit_md
*/
void
lnet_build_msg_event(struct lnet_msg *msg, enum lnet_event_kind ev_type)
{
struct lnet_hdr *hdr = &msg->msg_hdr;
struct lnet_event *ev = &msg->msg_ev;
LASSERT(!msg->msg_routing);
ev->type = ev_type;
if (ev_type == LNET_EVENT_SEND) {
/* event for active message */
ev->target.nid = le64_to_cpu(hdr->dest_nid);
ev->target.pid = le32_to_cpu(hdr->dest_pid);
ev->initiator.nid = LNET_NID_ANY;
ev->initiator.pid = the_lnet.ln_pid;
ev->sender = LNET_NID_ANY;
} else {
/* event for passive message */
ev->target.pid = hdr->dest_pid;
ev->target.nid = hdr->dest_nid;
ev->initiator.pid = hdr->src_pid;
ev->initiator.nid = hdr->src_nid;
ev->rlength = hdr->payload_length;
ev->sender = msg->msg_from;
ev->mlength = msg->msg_wanted;
ev->offset = msg->msg_offset;
}
switch (ev_type) {
default:
LBUG();
case LNET_EVENT_PUT: /* passive PUT */
ev->pt_index = hdr->msg.put.ptl_index;
ev->match_bits = hdr->msg.put.match_bits;
ev->hdr_data = hdr->msg.put.hdr_data;
return;
case LNET_EVENT_GET: /* passive GET */
ev->pt_index = hdr->msg.get.ptl_index;
ev->match_bits = hdr->msg.get.match_bits;
ev->hdr_data = 0;
return;
case LNET_EVENT_ACK: /* ACK */
ev->match_bits = hdr->msg.ack.match_bits;
ev->mlength = hdr->msg.ack.mlength;
return;
case LNET_EVENT_REPLY: /* REPLY */
return;
case LNET_EVENT_SEND: /* active message */
if (msg->msg_type == LNET_MSG_PUT) {
ev->pt_index = le32_to_cpu(hdr->msg.put.ptl_index);
ev->match_bits = le64_to_cpu(hdr->msg.put.match_bits);
ev->offset = le32_to_cpu(hdr->msg.put.offset);
ev->mlength =
ev->rlength = le32_to_cpu(hdr->payload_length);
ev->hdr_data = le64_to_cpu(hdr->msg.put.hdr_data);
} else {
LASSERT(msg->msg_type == LNET_MSG_GET);
ev->pt_index = le32_to_cpu(hdr->msg.get.ptl_index);
ev->match_bits = le64_to_cpu(hdr->msg.get.match_bits);
ev->mlength =
ev->rlength = le32_to_cpu(hdr->msg.get.sink_length);
ev->offset = le32_to_cpu(hdr->msg.get.src_offset);
ev->hdr_data = 0;
}
return;
}
}
void
lnet_msg_commit(struct lnet_msg *msg, int cpt)
{
struct lnet_msg_container *container = the_lnet.ln_msg_containers[cpt];
struct lnet_counters *counters = the_lnet.ln_counters[cpt];
/* routed message can be committed for both receiving and sending */
LASSERT(!msg->msg_tx_committed);
if (msg->msg_sending) {
LASSERT(!msg->msg_receiving);
msg->msg_tx_cpt = cpt;
msg->msg_tx_committed = 1;
if (msg->msg_rx_committed) { /* routed message REPLY */
LASSERT(msg->msg_onactivelist);
return;
}
} else {
LASSERT(!msg->msg_sending);
msg->msg_rx_cpt = cpt;
msg->msg_rx_committed = 1;
}
LASSERT(!msg->msg_onactivelist);
msg->msg_onactivelist = 1;
list_add(&msg->msg_activelist, &container->msc_active);
counters->msgs_alloc++;
if (counters->msgs_alloc > counters->msgs_max)
counters->msgs_max = counters->msgs_alloc;
}
static void
lnet_msg_decommit_tx(struct lnet_msg *msg, int status)
{
struct lnet_counters *counters;
struct lnet_event *ev = &msg->msg_ev;
LASSERT(msg->msg_tx_committed);
if (status)
goto out;
counters = the_lnet.ln_counters[msg->msg_tx_cpt];
switch (ev->type) {
default: /* routed message */
LASSERT(msg->msg_routing);
LASSERT(msg->msg_rx_committed);
LASSERT(!ev->type);
counters->route_length += msg->msg_len;
counters->route_count++;
goto out;
case LNET_EVENT_PUT:
/* should have been decommitted */
LASSERT(!msg->msg_rx_committed);
/* overwritten while sending ACK */
LASSERT(msg->msg_type == LNET_MSG_ACK);
msg->msg_type = LNET_MSG_PUT; /* fix type */
break;
case LNET_EVENT_SEND:
LASSERT(!msg->msg_rx_committed);
if (msg->msg_type == LNET_MSG_PUT)
counters->send_length += msg->msg_len;
break;
case LNET_EVENT_GET:
LASSERT(msg->msg_rx_committed);
/*
* overwritten while sending reply, we should never be
* here for optimized GET
*/
LASSERT(msg->msg_type == LNET_MSG_REPLY);
msg->msg_type = LNET_MSG_GET; /* fix type */
break;
}
counters->send_count++;
out:
lnet_return_tx_credits_locked(msg);
msg->msg_tx_committed = 0;
}
static void
lnet_msg_decommit_rx(struct lnet_msg *msg, int status)
{
struct lnet_counters *counters;
struct lnet_event *ev = &msg->msg_ev;
LASSERT(!msg->msg_tx_committed); /* decommitted or never committed */
LASSERT(msg->msg_rx_committed);
if (status)
goto out;
counters = the_lnet.ln_counters[msg->msg_rx_cpt];
switch (ev->type) {
default:
LASSERT(!ev->type);
LASSERT(msg->msg_routing);
goto out;
case LNET_EVENT_ACK:
LASSERT(msg->msg_type == LNET_MSG_ACK);
break;
case LNET_EVENT_GET:
/*
* type is "REPLY" if it's an optimized GET on passive side,
* because optimized GET will never be committed for sending,
* so message type wouldn't be changed back to "GET" by
* lnet_msg_decommit_tx(), see details in lnet_parse_get()
*/
LASSERT(msg->msg_type == LNET_MSG_REPLY ||
msg->msg_type == LNET_MSG_GET);
counters->send_length += msg->msg_wanted;
break;
case LNET_EVENT_PUT:
LASSERT(msg->msg_type == LNET_MSG_PUT);
break;
case LNET_EVENT_REPLY:
/*
* type is "GET" if it's an optimized GET on active side,
* see details in lnet_create_reply_msg()
*/
LASSERT(msg->msg_type == LNET_MSG_GET ||
msg->msg_type == LNET_MSG_REPLY);
break;
}
counters->recv_count++;
if (ev->type == LNET_EVENT_PUT || ev->type == LNET_EVENT_REPLY)
counters->recv_length += msg->msg_wanted;
out:
lnet_return_rx_credits_locked(msg);
msg->msg_rx_committed = 0;
}
void
lnet_msg_decommit(struct lnet_msg *msg, int cpt, int status)
{
int cpt2 = cpt;
LASSERT(msg->msg_tx_committed || msg->msg_rx_committed);
LASSERT(msg->msg_onactivelist);
if (msg->msg_tx_committed) { /* always decommit for sending first */
LASSERT(cpt == msg->msg_tx_cpt);
lnet_msg_decommit_tx(msg, status);
}
if (msg->msg_rx_committed) {
/* forwarding msg committed for both receiving and sending */
if (cpt != msg->msg_rx_cpt) {
lnet_net_unlock(cpt);
cpt2 = msg->msg_rx_cpt;
lnet_net_lock(cpt2);
}
lnet_msg_decommit_rx(msg, status);
}
list_del(&msg->msg_activelist);
msg->msg_onactivelist = 0;
the_lnet.ln_counters[cpt2]->msgs_alloc--;
if (cpt2 != cpt) {
lnet_net_unlock(cpt2);
lnet_net_lock(cpt);
}
}
void
lnet_msg_attach_md(struct lnet_msg *msg, struct lnet_libmd *md,
unsigned int offset, unsigned int mlen)
{
/* NB: @offset and @mlen are only useful for receiving */
/*
* Here, we attach the MD on lnet_msg, mark it busy and
* decrement its threshold. Come what may, the lnet_msg "owns"
* the MD until a call to lnet_msg_detach_md or lnet_finalize()
* signals completion.
*/
LASSERT(!msg->msg_routing);
msg->msg_md = md;
if (msg->msg_receiving) { /* committed for receiving */
msg->msg_offset = offset;
msg->msg_wanted = mlen;
}
md->md_refcount++;
if (md->md_threshold != LNET_MD_THRESH_INF) {
LASSERT(md->md_threshold > 0);
md->md_threshold--;
}
/* build umd in event */
lnet_md2handle(&msg->msg_ev.md_handle, md);
lnet_md_deconstruct(md, &msg->msg_ev.md);
}
void
lnet_msg_detach_md(struct lnet_msg *msg, int status)
{
struct lnet_libmd *md = msg->msg_md;
int unlink;
/* Now it's safe to drop my caller's ref */
md->md_refcount--;
LASSERT(md->md_refcount >= 0);
unlink = lnet_md_unlinkable(md);
if (md->md_eq) {
msg->msg_ev.status = status;
msg->msg_ev.unlinked = unlink;
lnet_eq_enqueue_event(md->md_eq, &msg->msg_ev);
}
if (unlink)
lnet_md_unlink(md);
msg->msg_md = NULL;
}
static int
lnet_complete_msg_locked(struct lnet_msg *msg, int cpt)
{
struct lnet_handle_wire ack_wmd;
int rc;
int status = msg->msg_ev.status;
LASSERT(msg->msg_onactivelist);
if (!status && msg->msg_ack) {
/* Only send an ACK if the PUT completed successfully */
lnet_msg_decommit(msg, cpt, 0);
msg->msg_ack = 0;
lnet_net_unlock(cpt);
LASSERT(msg->msg_ev.type == LNET_EVENT_PUT);
LASSERT(!msg->msg_routing);
ack_wmd = msg->msg_hdr.msg.put.ack_wmd;
lnet_prep_send(msg, LNET_MSG_ACK, msg->msg_ev.initiator, 0, 0);
msg->msg_hdr.msg.ack.dst_wmd = ack_wmd;
msg->msg_hdr.msg.ack.match_bits = msg->msg_ev.match_bits;
msg->msg_hdr.msg.ack.mlength = cpu_to_le32(msg->msg_ev.mlength);
/*
* NB: we probably want to use NID of msg::msg_from as 3rd
* parameter (router NID) if it's routed message
*/
rc = lnet_send(msg->msg_ev.target.nid, msg, LNET_NID_ANY);
lnet_net_lock(cpt);
/*
* NB: message is committed for sending, we should return
* on success because LND will finalize this message later.
*
* Also, there is a possibility that the message is committed for
* sending and also failed before delivering to LND,
* e.g. ENOMEM; in that case we can't fall through either
* because the CPT for sending can be different from the CPT for
* receiving, so we should return back to lnet_finalize()
* to make sure we are locking the correct partition.
*/
return rc;
} else if (!status && /* OK so far */
(msg->msg_routing && !msg->msg_sending)) {
/* not forwarded */
LASSERT(!msg->msg_receiving); /* called back recv already */
lnet_net_unlock(cpt);
rc = lnet_send(LNET_NID_ANY, msg, LNET_NID_ANY);
lnet_net_lock(cpt);
/*
* NB: message is committed for sending, we should return
* on success because LND will finalize this message later.
*
* Also, there is a possibility that the message is committed for
* sending and also failed before delivering to LND,
* e.g. ENOMEM; in that case we can't fall through either:
* - The rule is that a message must decommit for sending first if
* it's committed for both sending and receiving
* - The CPT for sending can be different from the CPT for receiving,
* so we should return back to lnet_finalize() to make
* sure we are locking the correct partition.
*/
return rc;
}
lnet_msg_decommit(msg, cpt, status);
kfree(msg);
return 0;
}
void
lnet_finalize(struct lnet_ni *ni, struct lnet_msg *msg, int status)
{
struct lnet_msg_container *container;
int my_slot;
int cpt;
int rc;
int i;
LASSERT(!in_interrupt());
if (!msg)
return;
msg->msg_ev.status = status;
if (msg->msg_md) {
cpt = lnet_cpt_of_cookie(msg->msg_md->md_lh.lh_cookie);
lnet_res_lock(cpt);
lnet_msg_detach_md(msg, status);
lnet_res_unlock(cpt);
}
again:
rc = 0;
if (!msg->msg_tx_committed && !msg->msg_rx_committed) {
/* not committed to network yet */
LASSERT(!msg->msg_onactivelist);
kfree(msg);
return;
}
/*
* NB: routed message can be committed for both receiving and sending,
* we should finalize in LIFO order and keep counters correct.
* (finalize sending first then finalize receiving)
*/
cpt = msg->msg_tx_committed ? msg->msg_tx_cpt : msg->msg_rx_cpt;
lnet_net_lock(cpt);
container = the_lnet.ln_msg_containers[cpt];
list_add_tail(&msg->msg_list, &container->msc_finalizing);
/*
* Recursion breaker. Don't complete the message here if I am (or
* enough other threads are) already completing messages
*/
my_slot = -1;
for (i = 0; i < container->msc_nfinalizers; i++) {
if (container->msc_finalizers[i] == current)
break;
if (my_slot < 0 && !container->msc_finalizers[i])
my_slot = i;
}
if (i < container->msc_nfinalizers || my_slot < 0) {
lnet_net_unlock(cpt);
return;
}
container->msc_finalizers[my_slot] = current;
while (!list_empty(&container->msc_finalizing)) {
msg = list_entry(container->msc_finalizing.next,
struct lnet_msg, msg_list);
list_del(&msg->msg_list);
/*
* NB drops and regains the lnet lock if it actually does
* anything, so my finalizing friends can chomp along too
*/
rc = lnet_complete_msg_locked(msg, cpt);
if (rc)
break;
}
if (unlikely(!list_empty(&the_lnet.ln_delay_rules))) {
lnet_net_unlock(cpt);
lnet_delay_rule_check();
lnet_net_lock(cpt);
}
container->msc_finalizers[my_slot] = NULL;
lnet_net_unlock(cpt);
if (rc)
goto again;
}
EXPORT_SYMBOL(lnet_finalize);
void
lnet_msg_container_cleanup(struct lnet_msg_container *container)
{
int count = 0;
if (!container->msc_init)
return;
while (!list_empty(&container->msc_active)) {
struct lnet_msg *msg;
msg = list_entry(container->msc_active.next,
struct lnet_msg, msg_activelist);
LASSERT(msg->msg_onactivelist);
msg->msg_onactivelist = 0;
list_del(&msg->msg_activelist);
kfree(msg);
count++;
}
if (count > 0)
CERROR("%d active msg on exit\n", count);
kvfree(container->msc_finalizers);
container->msc_finalizers = NULL;
container->msc_init = 0;
}
int
lnet_msg_container_setup(struct lnet_msg_container *container, int cpt)
{
container->msc_init = 1;
INIT_LIST_HEAD(&container->msc_active);
INIT_LIST_HEAD(&container->msc_finalizing);
/* number of CPUs */
container->msc_nfinalizers = cfs_cpt_weight(lnet_cpt_table(), cpt);
container->msc_finalizers = kvzalloc_cpt(container->msc_nfinalizers *
sizeof(*container->msc_finalizers),
GFP_KERNEL, cpt);
if (!container->msc_finalizers) {
CERROR("Failed to allocate message finalizers\n");
lnet_msg_container_cleanup(container);
return -ENOMEM;
}
return 0;
}
void
lnet_msg_containers_destroy(void)
{
struct lnet_msg_container *container;
int i;
if (!the_lnet.ln_msg_containers)
return;
cfs_percpt_for_each(container, i, the_lnet.ln_msg_containers)
lnet_msg_container_cleanup(container);
cfs_percpt_free(the_lnet.ln_msg_containers);
the_lnet.ln_msg_containers = NULL;
}
int
lnet_msg_containers_create(void)
{
struct lnet_msg_container *container;
int rc;
int i;
the_lnet.ln_msg_containers = cfs_percpt_alloc(lnet_cpt_table(),
sizeof(*container));
if (!the_lnet.ln_msg_containers) {
CERROR("Failed to allocate cpu-partition data for network\n");
return -ENOMEM;
}
cfs_percpt_for_each(container, i, the_lnet.ln_msg_containers) {
rc = lnet_msg_container_setup(container, i);
if (rc) {
lnet_msg_containers_destroy();
return rc;
}
}
return 0;
}


@ -1,987 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* GPL HEADER END
*/
/*
* Copyright (c) 2012, 2015, Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Sun Microsystems, Inc.
*
* lnet/lnet/lib-ptl.c
*
* portal & match routines
*
* Author: liang@whamcloud.com
*/
#define DEBUG_SUBSYSTEM S_LNET
#include <linux/lnet/lib-lnet.h>
/* NB: add /proc interfaces in upcoming patches */
int portal_rotor = LNET_PTL_ROTOR_HASH_RT;
module_param(portal_rotor, int, 0644);
MODULE_PARM_DESC(portal_rotor, "redirect PUTs to different cpu-partitions");
static int
lnet_ptl_match_type(unsigned int index, struct lnet_process_id match_id,
__u64 mbits, __u64 ignore_bits)
{
struct lnet_portal *ptl = the_lnet.ln_portals[index];
int unique;
unique = !ignore_bits &&
match_id.nid != LNET_NID_ANY &&
match_id.pid != LNET_PID_ANY;
LASSERT(!lnet_ptl_is_unique(ptl) || !lnet_ptl_is_wildcard(ptl));
/* prefer to check w/o any lock */
if (likely(lnet_ptl_is_unique(ptl) || lnet_ptl_is_wildcard(ptl)))
goto match;
/* unset, new portal */
lnet_ptl_lock(ptl);
/* check again with lock */
if (unlikely(lnet_ptl_is_unique(ptl) || lnet_ptl_is_wildcard(ptl))) {
lnet_ptl_unlock(ptl);
goto match;
}
/* still not set */
if (unique)
lnet_ptl_setopt(ptl, LNET_PTL_MATCH_UNIQUE);
else
lnet_ptl_setopt(ptl, LNET_PTL_MATCH_WILDCARD);
lnet_ptl_unlock(ptl);
return 1;
match:
if ((lnet_ptl_is_unique(ptl) && !unique) ||
(lnet_ptl_is_wildcard(ptl) && unique))
return 0;
return 1;
}
static void
lnet_ptl_enable_mt(struct lnet_portal *ptl, int cpt)
{
struct lnet_match_table *mtable = ptl->ptl_mtables[cpt];
int i;
/* called with both lnet_res_lock(cpt) and lnet_ptl_lock held */
LASSERT(lnet_ptl_is_wildcard(ptl));
mtable->mt_enabled = 1;
ptl->ptl_mt_maps[ptl->ptl_mt_nmaps] = cpt;
for (i = ptl->ptl_mt_nmaps - 1; i >= 0; i--) {
LASSERT(ptl->ptl_mt_maps[i] != cpt);
if (ptl->ptl_mt_maps[i] < cpt)
break;
/* swap to order */
ptl->ptl_mt_maps[i + 1] = ptl->ptl_mt_maps[i];
ptl->ptl_mt_maps[i] = cpt;
}
ptl->ptl_mt_nmaps++;
}
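The ordered insertion that lnet_ptl_enable_mt() performs on ptl_mt_maps can be sketched in isolation. This is a minimal user-space sketch, not the kernel code: sorted_insert() and its parameters are hypothetical names, and the caller is assumed to guarantee that the array has room for one more entry and that the value is not already present.

```c
#include <assert.h>

/*
 * Hypothetical stand-alone version of the "append, then swap backwards"
 * loop: place @cpt at the end of @maps and bubble it towards the front
 * until the element before it is smaller. @maps must already be sorted
 * in ascending order and must have space for *@nmaps + 1 entries.
 */
static void sorted_insert(int *maps, int *nmaps, int cpt)
{
	int i;

	maps[*nmaps] = cpt;
	for (i = *nmaps - 1; i >= 0; i--) {
		if (maps[i] < cpt)
			break;
		/* swap to keep ascending order */
		maps[i + 1] = maps[i];
		maps[i] = cpt;
	}
	(*nmaps)++;
}
```

Because the array is sorted before each call, a single backward pass is enough to restore the order after appending one element.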
static void
lnet_ptl_disable_mt(struct lnet_portal *ptl, int cpt)
{
struct lnet_match_table *mtable = ptl->ptl_mtables[cpt];
int i;
/* called with both lnet_res_lock(cpt) and lnet_ptl_lock held */
LASSERT(lnet_ptl_is_wildcard(ptl));
if (LNET_CPT_NUMBER == 1)
return; /* never disable the only match-table */
mtable->mt_enabled = 0;
LASSERT(ptl->ptl_mt_nmaps > 0 &&
ptl->ptl_mt_nmaps <= LNET_CPT_NUMBER);
/* remove it from mt_maps */
ptl->ptl_mt_nmaps--;
for (i = 0; i < ptl->ptl_mt_nmaps; i++) {
if (ptl->ptl_mt_maps[i] >= cpt) /* overwrite it */
ptl->ptl_mt_maps[i] = ptl->ptl_mt_maps[i + 1];
}
}
static int
lnet_try_match_md(struct lnet_libmd *md,
struct lnet_match_info *info, struct lnet_msg *msg)
{
/*
* ALWAYS called holding the lnet_res_lock, and can't lnet_res_unlock;
* lnet_match_blocked_msg() relies on this to avoid races
*/
unsigned int offset;
unsigned int mlength;
struct lnet_me *me = md->md_me;
/* MD exhausted */
if (lnet_md_exhausted(md))
return LNET_MATCHMD_NONE | LNET_MATCHMD_EXHAUSTED;
/* mismatched MD op */
if (!(md->md_options & info->mi_opc))
return LNET_MATCHMD_NONE;
/* mismatched ME nid/pid? */
if (me->me_match_id.nid != LNET_NID_ANY &&
me->me_match_id.nid != info->mi_id.nid)
return LNET_MATCHMD_NONE;
if (me->me_match_id.pid != LNET_PID_ANY &&
me->me_match_id.pid != info->mi_id.pid)
return LNET_MATCHMD_NONE;
/* mismatched ME matchbits? */
if ((me->me_match_bits ^ info->mi_mbits) & ~me->me_ignore_bits)
return LNET_MATCHMD_NONE;
/* Hurrah! This _is_ a match; check it out... */
if (!(md->md_options & LNET_MD_MANAGE_REMOTE))
offset = md->md_offset;
else
offset = info->mi_roffset;
if (md->md_options & LNET_MD_MAX_SIZE) {
mlength = md->md_max_size;
LASSERT(md->md_offset + mlength <= md->md_length);
} else {
mlength = md->md_length - offset;
}
if (info->mi_rlength <= mlength) { /* fits in allowed space */
mlength = info->mi_rlength;
} else if (!(md->md_options & LNET_MD_TRUNCATE)) {
/* this packet _really_ is too big */
CERROR("Matching packet from %s, match %llu length %d too big: %d left, %d allowed\n",
libcfs_id2str(info->mi_id), info->mi_mbits,
info->mi_rlength, md->md_length - offset, mlength);
return LNET_MATCHMD_DROP;
}
/* Commit to this ME/MD */
CDEBUG(D_NET, "Incoming %s index %x from %s of length %d/%d into md %#llx [%d] + %d\n",
(info->mi_opc == LNET_MD_OP_PUT) ? "put" : "get",
info->mi_portal, libcfs_id2str(info->mi_id), mlength,
info->mi_rlength, md->md_lh.lh_cookie, md->md_niov, offset);
lnet_msg_attach_md(msg, md, offset, mlength);
md->md_offset = offset + mlength;
if (!lnet_md_exhausted(md))
return LNET_MATCHMD_OK;
/*
* Auto-unlink NOW, so the ME gets unlinked if required.
* We bumped md->md_refcount above so the MD just gets flagged
* for unlink when it is finalized.
*/
if (md->md_flags & LNET_MD_FLAG_AUTO_UNLINK)
lnet_md_unlink(md);
return LNET_MATCHMD_OK | LNET_MATCHMD_EXHAUSTED;
}
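The match-bits test in lnet_try_match_md() compares only the bit positions that the ME does not ignore. A minimal user-space sketch of that single check follows; bits_match() is a hypothetical helper name introduced for illustration, and plain uint64_t stands in for the kernel's __u64.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 when @incoming agrees with
 * @match_bits in every position that is not set in @ignore_bits.
 * XOR marks the differing bits; masking with ~ignore_bits drops the
 * differences the ME has declared insignificant.
 */
static int bits_match(uint64_t match_bits, uint64_t ignore_bits,
		      uint64_t incoming)
{
	return ((match_bits ^ incoming) & ~ignore_bits) == 0;
}
```

With match bits 0x1234 and ignore bits 0x00ff, an incoming value of 0x12ab matches because it differs only in the low byte, while 0x13ab does not because bit 8 differs and is not ignored.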
static struct lnet_match_table *
lnet_match2mt(struct lnet_portal *ptl, struct lnet_process_id id, __u64 mbits)
{
if (LNET_CPT_NUMBER == 1)
return ptl->ptl_mtables[0]; /* the only one */
/* if it's a unique portal, return match-table hashed by NID */
return lnet_ptl_is_unique(ptl) ?
ptl->ptl_mtables[lnet_cpt_of_nid(id.nid)] : NULL;
}
struct lnet_match_table *
lnet_mt_of_attach(unsigned int index, struct lnet_process_id id,
__u64 mbits, __u64 ignore_bits, enum lnet_ins_pos pos)
{
struct lnet_portal *ptl;
struct lnet_match_table *mtable;
/* NB: called w/o lock */
LASSERT(index < the_lnet.ln_nportals);
if (!lnet_ptl_match_type(index, id, mbits, ignore_bits))
return NULL;
ptl = the_lnet.ln_portals[index];
mtable = lnet_match2mt(ptl, id, mbits);
if (mtable) /* unique portal or only one match-table */
return mtable;
/* it's a wildcard portal */
switch (pos) {
default:
return NULL;
case LNET_INS_BEFORE:
case LNET_INS_AFTER:
/*
* posted by a thread without CPU affinity, always hash to specific
* match-table to avoid buffer stealing which is heavy
*/
return ptl->ptl_mtables[ptl->ptl_index % LNET_CPT_NUMBER];
case LNET_INS_LOCAL:
/* posted by cpu-affinity thread */
return ptl->ptl_mtables[lnet_cpt_current()];
}
}
static struct lnet_match_table *
lnet_mt_of_match(struct lnet_match_info *info, struct lnet_msg *msg)
{
struct lnet_match_table *mtable;
struct lnet_portal *ptl;
unsigned int nmaps;
unsigned int rotor;
unsigned int cpt;
bool routed;
/* NB: called w/o lock */
LASSERT(info->mi_portal < the_lnet.ln_nportals);
ptl = the_lnet.ln_portals[info->mi_portal];
LASSERT(lnet_ptl_is_wildcard(ptl) || lnet_ptl_is_unique(ptl));
mtable = lnet_match2mt(ptl, info->mi_id, info->mi_mbits);
if (mtable)
return mtable;
/* it's a wildcard portal */
routed = LNET_NIDNET(msg->msg_hdr.src_nid) !=
LNET_NIDNET(msg->msg_hdr.dest_nid);
if (portal_rotor == LNET_PTL_ROTOR_OFF ||
(portal_rotor != LNET_PTL_ROTOR_ON && !routed)) {
cpt = lnet_cpt_current();
if (ptl->ptl_mtables[cpt]->mt_enabled)
return ptl->ptl_mtables[cpt];
}
rotor = ptl->ptl_rotor++; /* get round-robin factor */
if (portal_rotor == LNET_PTL_ROTOR_HASH_RT && routed)
cpt = lnet_cpt_of_nid(msg->msg_hdr.src_nid);
else
cpt = rotor % LNET_CPT_NUMBER;
if (!ptl->ptl_mtables[cpt]->mt_enabled) {
/* is there any active entry for this portal? */
nmaps = ptl->ptl_mt_nmaps;
/* map to an active mtable to avoid heavy "stealing" */
if (nmaps) {
/*
* NB: there is a possibility that ptl_mt_maps is being
* changed because we are not under protection of
* lnet_ptl_lock, but it shouldn't hurt anything
*/
cpt = ptl->ptl_mt_maps[rotor % nmaps];
}
}
return ptl->ptl_mtables[cpt];
}
static int
lnet_mt_test_exhausted(struct lnet_match_table *mtable, int pos)
{
__u64 *bmap;
int i;
if (!lnet_ptl_is_wildcard(the_lnet.ln_portals[mtable->mt_portal]))
return 0;
if (pos < 0) { /* check all bits */
for (i = 0; i < LNET_MT_EXHAUSTED_BMAP; i++) {
if (mtable->mt_exhausted[i] != (__u64)(-1))
return 0;
}
return 1;
}
LASSERT(pos <= LNET_MT_HASH_IGNORE);
/* mtable::mt_mhash[pos] is marked as exhausted or not */
bmap = &mtable->mt_exhausted[pos >> LNET_MT_BITS_U64];
pos &= (1 << LNET_MT_BITS_U64) - 1;
return (*bmap & BIT(pos));
}
static void
lnet_mt_set_exhausted(struct lnet_match_table *mtable, int pos, int exhausted)
{
__u64 *bmap;
LASSERT(lnet_ptl_is_wildcard(the_lnet.ln_portals[mtable->mt_portal]));
LASSERT(pos <= LNET_MT_HASH_IGNORE);
/* set mtable::mt_mhash[pos] as exhausted/non-exhausted */
bmap = &mtable->mt_exhausted[pos >> LNET_MT_BITS_U64];
pos &= (1 << LNET_MT_BITS_U64) - 1;
if (!exhausted)
*bmap &= ~(1ULL << pos);
else
*bmap |= 1ULL << pos;
}
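The exhausted-bitmap helpers above split a hash-slot position into a word index (pos >> 6) and a bit index (pos & 63). The following user-space sketch shows the same indexing under assumed sizes: BITS_U64, NSLOTS, struct slot_bitmap and the slot_* functions are hypothetical names, not the LNet definitions.

```c
#include <assert.h>
#include <stdint.h>

#define BITS_U64 6                       /* 2^6 = 64 bits per word */
#define NSLOTS   256                     /* assumed number of hash slots */
#define NWORDS   (NSLOTS >> BITS_U64)    /* 4 words of 64 bits */

struct slot_bitmap {
	uint64_t words[NWORDS];
};

/* Set (on != 0) or clear (on == 0) the bit for slot @pos. */
static void slot_set(struct slot_bitmap *b, unsigned int pos, int on)
{
	uint64_t *w = &b->words[pos >> BITS_U64];
	uint64_t mask = 1ULL << (pos & ((1u << BITS_U64) - 1));

	if (on)
		*w |= mask;
	else
		*w &= ~mask;
}

/* Return 1 if the bit for slot @pos is set, else 0. */
static int slot_test(const struct slot_bitmap *b, unsigned int pos)
{
	unsigned int bit = pos & ((1u << BITS_U64) - 1);

	return (int)((b->words[pos >> BITS_U64] >> bit) & 1);
}

/* Return 1 only when every slot is marked. */
static int slot_all_set(const struct slot_bitmap *b)
{
	int i;

	for (i = 0; i < NWORDS; i++) {
		if (b->words[i] != UINT64_MAX)
			return 0;
	}
	return 1;
}
```

The sketch normalizes the test result to 0 or 1 before narrowing it to int, so the answer does not depend on which bit of the 64-bit word is being tested.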
struct list_head *
lnet_mt_match_head(struct lnet_match_table *mtable,
struct lnet_process_id id, __u64 mbits)
{
struct lnet_portal *ptl = the_lnet.ln_portals[mtable->mt_portal];
unsigned long hash = mbits;
if (!lnet_ptl_is_wildcard(ptl)) {
hash += id.nid + id.pid;
LASSERT(lnet_ptl_is_unique(ptl));
hash = hash_long(hash, LNET_MT_HASH_BITS);
}
return &mtable->mt_mhash[hash & LNET_MT_HASH_MASK];
}
int
lnet_mt_match_md(struct lnet_match_table *mtable,
struct lnet_match_info *info, struct lnet_msg *msg)
{
struct list_head *head;
struct lnet_me *me;
struct lnet_me *tmp;
int exhausted = 0;
int rc;
/* any ME with ignore bits? */
if (!list_empty(&mtable->mt_mhash[LNET_MT_HASH_IGNORE]))
head = &mtable->mt_mhash[LNET_MT_HASH_IGNORE];
else
head = lnet_mt_match_head(mtable, info->mi_id, info->mi_mbits);
again:
/* NB: only wildcard portal needs to return LNET_MATCHMD_EXHAUSTED */
if (lnet_ptl_is_wildcard(the_lnet.ln_portals[mtable->mt_portal]))
exhausted = LNET_MATCHMD_EXHAUSTED;
list_for_each_entry_safe(me, tmp, head, me_list) {
/* ME attached but MD not attached yet */
if (!me->me_md)
continue;
LASSERT(me == me->me_md->md_me);
rc = lnet_try_match_md(me->me_md, info, msg);
if (!(rc & LNET_MATCHMD_EXHAUSTED))
exhausted = 0; /* mlist is not empty */
if (rc & LNET_MATCHMD_FINISH) {
/*
* don't return EXHAUSTED bit because we don't know
* whether the mlist is empty or not
*/
return rc & ~LNET_MATCHMD_EXHAUSTED;
}
}
if (exhausted == LNET_MATCHMD_EXHAUSTED) { /* @head is exhausted */
lnet_mt_set_exhausted(mtable, head - mtable->mt_mhash, 1);
if (!lnet_mt_test_exhausted(mtable, -1))
exhausted = 0;
}
if (!exhausted && head == &mtable->mt_mhash[LNET_MT_HASH_IGNORE]) {
head = lnet_mt_match_head(mtable, info->mi_id, info->mi_mbits);
goto again; /* re-check MEs w/o ignore-bits */
}
if (info->mi_opc == LNET_MD_OP_GET ||
!lnet_ptl_is_lazy(the_lnet.ln_portals[info->mi_portal]))
return exhausted | LNET_MATCHMD_DROP;
return exhausted | LNET_MATCHMD_NONE;
}
static int
lnet_ptl_match_early(struct lnet_portal *ptl, struct lnet_msg *msg)
{
int rc;
/*
* message arrived before any buffer posting on this portal,
* simply delay or drop this message
*/
if (likely(lnet_ptl_is_wildcard(ptl) || lnet_ptl_is_unique(ptl)))
return 0;
lnet_ptl_lock(ptl);
/* check again with the lock held */
if (lnet_ptl_is_wildcard(ptl) || lnet_ptl_is_unique(ptl)) {
lnet_ptl_unlock(ptl);
return 0;
}
if (lnet_ptl_is_lazy(ptl)) {
if (msg->msg_rx_ready_delay) {
msg->msg_rx_delayed = 1;
list_add_tail(&msg->msg_list,
&ptl->ptl_msg_delayed);
}
rc = LNET_MATCHMD_NONE;
} else {
rc = LNET_MATCHMD_DROP;
}
lnet_ptl_unlock(ptl);
return rc;
}
static int
lnet_ptl_match_delay(struct lnet_portal *ptl,
struct lnet_match_info *info, struct lnet_msg *msg)
{
int first = ptl->ptl_mt_maps[0]; /* read w/o lock */
int rc = 0;
int i;
/**
* Steal a buffer from other CPTs, and delay msg if there is nothing
* to steal. This function is more expensive than a regular match,
* but we don't expect it to be called often. The return code
* contains one of LNET_MATCHMD_OK, LNET_MATCHMD_DROP, or
* LNET_MATCHMD_NONE.
*/
LASSERT(lnet_ptl_is_wildcard(ptl));
for (i = 0; i < LNET_CPT_NUMBER; i++) {
struct lnet_match_table *mtable;
int cpt;
cpt = (first + i) % LNET_CPT_NUMBER;
mtable = ptl->ptl_mtables[cpt];
if (i && i != LNET_CPT_NUMBER - 1 && !mtable->mt_enabled)
continue;
lnet_res_lock(cpt);
lnet_ptl_lock(ptl);
if (!i) {
/* The first try, add to stealing list. */
list_add_tail(&msg->msg_list,
&ptl->ptl_msg_stealing);
}
if (!list_empty(&msg->msg_list)) {
/* On stealing list. */
rc = lnet_mt_match_md(mtable, info, msg);
if ((rc & LNET_MATCHMD_EXHAUSTED) &&
mtable->mt_enabled)
lnet_ptl_disable_mt(ptl, cpt);
if (rc & LNET_MATCHMD_FINISH) {
/* Match found, remove from stealing list. */
list_del_init(&msg->msg_list);
} else if (i == LNET_CPT_NUMBER - 1 || /* (1) */
!ptl->ptl_mt_nmaps || /* (2) */
(ptl->ptl_mt_nmaps == 1 && /* (3) */
ptl->ptl_mt_maps[0] == cpt)) {
/**
* No match found, and this is either
* (1) the last cpt to check, or
* (2) there is no active cpt, or
* (3) this is the only active cpt.
* There is nothing to steal: delay or
* drop the message.
*/
list_del_init(&msg->msg_list);
if (lnet_ptl_is_lazy(ptl)) {
msg->msg_rx_delayed = 1;
list_add_tail(&msg->msg_list,
&ptl->ptl_msg_delayed);
rc = LNET_MATCHMD_NONE;
} else {
rc = LNET_MATCHMD_DROP;
}
} else {
/* Do another iteration. */
rc = 0;
}
} else {
/**
* No longer on stealing list: another thread
* matched the message in lnet_ptl_attach_md().
* We are now expected to handle the message.
*/
rc = !msg->msg_md ?
LNET_MATCHMD_DROP : LNET_MATCHMD_OK;
}
lnet_ptl_unlock(ptl);
lnet_res_unlock(cpt);
/**
* Note that test (1) above ensures that we always
* exit the loop through this break statement.
*
* LNET_MATCHMD_NONE means msg was added to the
* delayed queue, and we may no longer reference it
* after lnet_ptl_unlock() and lnet_res_unlock().
*/
if (rc & (LNET_MATCHMD_FINISH | LNET_MATCHMD_NONE))
break;
}
return rc;
}
int
lnet_ptl_match_md(struct lnet_match_info *info, struct lnet_msg *msg)
{
struct lnet_match_table *mtable;
struct lnet_portal *ptl;
int rc;
CDEBUG(D_NET, "Request from %s of length %d into portal %d MB=%#llx\n",
libcfs_id2str(info->mi_id), info->mi_rlength, info->mi_portal,
info->mi_mbits);
if (info->mi_portal >= the_lnet.ln_nportals) {
CERROR("Invalid portal %d not in [0-%d]\n",
info->mi_portal, the_lnet.ln_nportals);
return LNET_MATCHMD_DROP;
}
ptl = the_lnet.ln_portals[info->mi_portal];
rc = lnet_ptl_match_early(ptl, msg);
if (rc) /* matched or delayed early message */
return rc;
mtable = lnet_mt_of_match(info, msg);
lnet_res_lock(mtable->mt_cpt);
if (the_lnet.ln_shutdown) {
rc = LNET_MATCHMD_DROP;
goto out1;
}
rc = lnet_mt_match_md(mtable, info, msg);
if ((rc & LNET_MATCHMD_EXHAUSTED) && mtable->mt_enabled) {
lnet_ptl_lock(ptl);
lnet_ptl_disable_mt(ptl, mtable->mt_cpt);
lnet_ptl_unlock(ptl);
}
if (rc & LNET_MATCHMD_FINISH) /* matched or dropping */
goto out1;
if (!msg->msg_rx_ready_delay)
goto out1;
LASSERT(lnet_ptl_is_lazy(ptl));
LASSERT(!msg->msg_rx_delayed);
/* NB: we don't expect "delay" to happen often */
if (lnet_ptl_is_unique(ptl) || LNET_CPT_NUMBER == 1) {
lnet_ptl_lock(ptl);
msg->msg_rx_delayed = 1;
list_add_tail(&msg->msg_list, &ptl->ptl_msg_delayed);
lnet_ptl_unlock(ptl);
lnet_res_unlock(mtable->mt_cpt);
rc = LNET_MATCHMD_NONE;
} else {
lnet_res_unlock(mtable->mt_cpt);
rc = lnet_ptl_match_delay(ptl, info, msg);
}
/* LNET_MATCHMD_NONE means msg was added to the delay queue */
if (rc & LNET_MATCHMD_NONE) {
CDEBUG(D_NET,
"Delaying %s from %s ptl %d MB %#llx off %d len %d\n",
info->mi_opc == LNET_MD_OP_PUT ? "PUT" : "GET",
libcfs_id2str(info->mi_id), info->mi_portal,
info->mi_mbits, info->mi_roffset, info->mi_rlength);
}
goto out0;
out1:
lnet_res_unlock(mtable->mt_cpt);
out0:
/* EXHAUSTED bit is only meaningful for internal functions */
return rc & ~LNET_MATCHMD_EXHAUSTED;
}
void
lnet_ptl_detach_md(struct lnet_me *me, struct lnet_libmd *md)
{
LASSERT(me->me_md == md && md->md_me == me);
me->me_md = NULL;
md->md_me = NULL;
}
/* called with lnet_res_lock held */
void
lnet_ptl_attach_md(struct lnet_me *me, struct lnet_libmd *md,
struct list_head *matches, struct list_head *drops)
{
struct lnet_portal *ptl = the_lnet.ln_portals[me->me_portal];
struct lnet_match_table *mtable;
struct list_head *head;
struct lnet_msg *tmp;
struct lnet_msg *msg;
int exhausted = 0;
int cpt;
LASSERT(!md->md_refcount); /* a brand new MD */
me->me_md = md;
md->md_me = me;
cpt = lnet_cpt_of_cookie(md->md_lh.lh_cookie);
mtable = ptl->ptl_mtables[cpt];
if (list_empty(&ptl->ptl_msg_stealing) &&
list_empty(&ptl->ptl_msg_delayed) &&
!lnet_mt_test_exhausted(mtable, me->me_pos))
return;
lnet_ptl_lock(ptl);
head = &ptl->ptl_msg_stealing;
again:
list_for_each_entry_safe(msg, tmp, head, msg_list) {
struct lnet_match_info info;
struct lnet_hdr *hdr;
int rc;
LASSERT(msg->msg_rx_delayed || head == &ptl->ptl_msg_stealing);
hdr = &msg->msg_hdr;
info.mi_id.nid = hdr->src_nid;
info.mi_id.pid = hdr->src_pid;
info.mi_opc = LNET_MD_OP_PUT;
info.mi_portal = hdr->msg.put.ptl_index;
info.mi_rlength = hdr->payload_length;
info.mi_roffset = hdr->msg.put.offset;
info.mi_mbits = hdr->msg.put.match_bits;
rc = lnet_try_match_md(md, &info, msg);
exhausted = (rc & LNET_MATCHMD_EXHAUSTED);
if (rc & LNET_MATCHMD_NONE) {
if (exhausted)
break;
continue;
}
/* Hurrah! This _is_ a match */
LASSERT(rc & LNET_MATCHMD_FINISH);
list_del_init(&msg->msg_list);
if (head == &ptl->ptl_msg_stealing) {
if (exhausted)
break;
/* stealing thread will handle the message */
continue;
}
if (rc & LNET_MATCHMD_OK) {
list_add_tail(&msg->msg_list, matches);
CDEBUG(D_NET, "Resuming delayed PUT from %s portal %d match %llu offset %d length %d.\n",
libcfs_id2str(info.mi_id),
info.mi_portal, info.mi_mbits,
info.mi_roffset, info.mi_rlength);
} else {
list_add_tail(&msg->msg_list, drops);
}
if (exhausted)
break;
}
if (!exhausted && head == &ptl->ptl_msg_stealing) {
head = &ptl->ptl_msg_delayed;
goto again;
}
if (lnet_ptl_is_wildcard(ptl) && !exhausted) {
lnet_mt_set_exhausted(mtable, me->me_pos, 0);
if (!mtable->mt_enabled)
lnet_ptl_enable_mt(ptl, cpt);
}
lnet_ptl_unlock(ptl);
}
static void
lnet_ptl_cleanup(struct lnet_portal *ptl)
{
struct lnet_match_table *mtable;
int i;
if (!ptl->ptl_mtables) /* uninitialized portal */
return;
LASSERT(list_empty(&ptl->ptl_msg_delayed));
LASSERT(list_empty(&ptl->ptl_msg_stealing));
cfs_percpt_for_each(mtable, i, ptl->ptl_mtables) {
struct list_head *mhash;
struct lnet_me *me;
int j;
if (!mtable->mt_mhash) /* uninitialized match-table */
continue;
mhash = mtable->mt_mhash;
/* cleanup ME */
for (j = 0; j < LNET_MT_HASH_SIZE + 1; j++) {
while (!list_empty(&mhash[j])) {
me = list_entry(mhash[j].next,
struct lnet_me, me_list);
CERROR("Active ME %p on exit\n", me);
list_del(&me->me_list);
kfree(me);
}
}
/* the extra entry is for MEs with ignore bits */
kvfree(mhash);
}
cfs_percpt_free(ptl->ptl_mtables);
ptl->ptl_mtables = NULL;
}
static int
lnet_ptl_setup(struct lnet_portal *ptl, int index)
{
struct lnet_match_table *mtable;
struct list_head *mhash;
int i;
int j;
ptl->ptl_mtables = cfs_percpt_alloc(lnet_cpt_table(),
sizeof(struct lnet_match_table));
if (!ptl->ptl_mtables) {
CERROR("Failed to create match table for portal %d\n", index);
return -ENOMEM;
}
ptl->ptl_index = index;
INIT_LIST_HEAD(&ptl->ptl_msg_delayed);
INIT_LIST_HEAD(&ptl->ptl_msg_stealing);
spin_lock_init(&ptl->ptl_lock);
cfs_percpt_for_each(mtable, i, ptl->ptl_mtables) {
/* the extra entry is for MEs with ignore bits */
mhash = kvzalloc_cpt(sizeof(*mhash) * (LNET_MT_HASH_SIZE + 1),
GFP_KERNEL, i);
if (!mhash) {
CERROR("Failed to create match hash for portal %d\n",
index);
goto failed;
}
memset(&mtable->mt_exhausted[0], -1,
sizeof(mtable->mt_exhausted[0]) *
LNET_MT_EXHAUSTED_BMAP);
mtable->mt_mhash = mhash;
for (j = 0; j < LNET_MT_HASH_SIZE + 1; j++)
INIT_LIST_HEAD(&mhash[j]);
mtable->mt_portal = index;
mtable->mt_cpt = i;
}
return 0;
failed:
lnet_ptl_cleanup(ptl);
return -ENOMEM;
}
void
lnet_portals_destroy(void)
{
int i;
if (!the_lnet.ln_portals)
return;
for (i = 0; i < the_lnet.ln_nportals; i++)
lnet_ptl_cleanup(the_lnet.ln_portals[i]);
cfs_array_free(the_lnet.ln_portals);
the_lnet.ln_portals = NULL;
the_lnet.ln_nportals = 0;
}
int
lnet_portals_create(void)
{
int size;
int i;
size = offsetof(struct lnet_portal, ptl_mt_maps[LNET_CPT_NUMBER]);
the_lnet.ln_portals = cfs_array_alloc(MAX_PORTALS, size);
if (!the_lnet.ln_portals) {
CERROR("Failed to allocate portals table\n");
return -ENOMEM;
}
the_lnet.ln_nportals = MAX_PORTALS;
for (i = 0; i < the_lnet.ln_nportals; i++) {
if (lnet_ptl_setup(the_lnet.ln_portals[i], i)) {
lnet_portals_destroy();
return -ENOMEM;
}
}
return 0;
}
/**
* Turn on the lazy portal attribute. Use with caution!
*
* This portal attribute only affects incoming PUT requests to the portal,
* and is off by default. By default, if there's no matching MD for an
* incoming PUT request, it is simply dropped. With the lazy attribute on,
* such requests are queued indefinitely until either a matching MD is
* posted to the portal or the lazy attribute is turned off.
*
* This prevents dropped requests; however, it should be regarded as the
* last line of defense, i.e. users must keep a close watch on the active
* buffers on a lazy portal and post more as soon as their number becomes
* too low. This is because delayed requests usually have detrimental
* effects on underlying network connections. A few delayed requests often
* suffice to bring an underlying connection to a complete halt, due to flow
* control mechanisms.
*
* There's also a DoS attack risk. If users don't post match-all MDs on a
* lazy portal, a malicious peer can easily stop a service by sending some
* PUT requests with match bits that won't match any MD. A routed server is
* especially vulnerable since the connections to its neighbor routers are
* shared among all clients.
*
* \param portal Index of the portal to enable the lazy attribute on.
*
* \retval 0 On success.
* \retval -EINVAL If \a portal is not a valid index.
*/
int
LNetSetLazyPortal(int portal)
{
struct lnet_portal *ptl;
if (portal < 0 || portal >= the_lnet.ln_nportals)
return -EINVAL;
CDEBUG(D_NET, "Setting portal %d lazy\n", portal);
ptl = the_lnet.ln_portals[portal];
lnet_res_lock(LNET_LOCK_EX);
lnet_ptl_lock(ptl);
lnet_ptl_setopt(ptl, LNET_PTL_LAZY);
lnet_ptl_unlock(ptl);
lnet_res_unlock(LNET_LOCK_EX);
return 0;
}
EXPORT_SYMBOL(LNetSetLazyPortal);
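The lazy attribute described in the comment above amounts to "queue instead of drop". The sketch below is a deliberately simplified user-space model, assuming a single posted buffer and a bounded queue (the real portal queues without a bound); `struct lazy_portal`, `handle_put()`, `post_buffer()` and `clear_lazy()` are hypothetical names, not LNet API.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_DELAYED 8 /* hypothetical bound; the real queue is unbounded */

enum put_result { PUT_MATCHED, PUT_DELAYED, PUT_DROPPED };

struct lazy_portal {
	int lazy;                      /* lazy attribute on/off */
	int has_buffer;                /* at most one posted buffer in this model */
	uint64_t buffer_bits;          /* match bits of the posted buffer */
	uint64_t delayed[MAX_DELAYED]; /* match bits of queued PUTs */
	size_t ndelayed;
};

/* Incoming PUT: match a buffer, else queue if lazy, else drop. */
static enum put_result handle_put(struct lazy_portal *p, uint64_t mbits)
{
	if (p->has_buffer && p->buffer_bits == mbits) {
		p->has_buffer = 0;
		return PUT_MATCHED;
	}
	if (p->lazy && p->ndelayed < MAX_DELAYED) {
		p->delayed[p->ndelayed++] = mbits;
		return PUT_DELAYED;
	}
	return PUT_DROPPED;
}

/* Post a buffer; returns 1 if a delayed PUT consumed it immediately. */
static int post_buffer(struct lazy_portal *p, uint64_t mbits)
{
	for (size_t i = 0; i < p->ndelayed; i++) {
		if (p->delayed[i] != mbits)
			continue;
		/* Remove entry i while keeping arrival order. */
		for (size_t j = i + 1; j < p->ndelayed; j++)
			p->delayed[j - 1] = p->delayed[j];
		p->ndelayed--;
		return 1;
	}
	p->has_buffer = 1;
	p->buffer_bits = mbits;
	return 0;
}

/* Turn lazy off; every queued PUT is dropped. Returns how many were dropped. */
static size_t clear_lazy(struct lazy_portal *p)
{
	size_t dropped = p->ndelayed;

	p->ndelayed = 0;
	p->lazy = 0;
	return dropped;
}
```

In this model a PUT whose match bits never appear in a posted buffer stays queued until `clear_lazy()` runs, which is the situation the DoS warning above describes.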
int
lnet_clear_lazy_portal(struct lnet_ni *ni, int portal, char *reason)
{
struct lnet_portal *ptl;
LIST_HEAD(zombies);
if (portal < 0 || portal >= the_lnet.ln_nportals)
return -EINVAL;
ptl = the_lnet.ln_portals[portal];
lnet_res_lock(LNET_LOCK_EX);
lnet_ptl_lock(ptl);
if (!lnet_ptl_is_lazy(ptl)) {
lnet_ptl_unlock(ptl);
lnet_res_unlock(LNET_LOCK_EX);
return 0;
}
if (ni) {
struct lnet_msg *msg, *tmp;
/* grab all messages which are on the NI passed in */
list_for_each_entry_safe(msg, tmp, &ptl->ptl_msg_delayed,
msg_list) {
if (msg->msg_rxpeer->lp_ni == ni)
list_move(&msg->msg_list, &zombies);
}
} else {
if (the_lnet.ln_shutdown)
CWARN("Active lazy portal %d on exit\n", portal);
else
CDEBUG(D_NET, "clearing portal %d lazy\n", portal);
/* grab all the blocked messages atomically */
list_splice_init(&ptl->ptl_msg_delayed, &zombies);
lnet_ptl_unsetopt(ptl, LNET_PTL_LAZY);
}
lnet_ptl_unlock(ptl);
lnet_res_unlock(LNET_LOCK_EX);
lnet_drop_delayed_msg_list(&zombies, reason);
return 0;
}
/**
* Turn off the lazy portal attribute. Any delayed requests on the portal
* will all be dropped by the time this function returns.
*
* \param portal Index of the portal to disable the lazy attribute on.
*
* \retval 0 On success.
* \retval -EINVAL If \a portal is not a valid index.
*/
int
LNetClearLazyPortal(int portal)
{
return lnet_clear_lazy_portal(NULL, portal,
"Clearing lazy portal attr");
}
EXPORT_SYMBOL(LNetClearLazyPortal);

@ -1,585 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2008, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2012, 2015, Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Seagate, Inc.
*/
#define DEBUG_SUBSYSTEM S_LNET
#include <linux/if.h>
#include <linux/in.h>
#include <linux/net.h>
#include <linux/file.h>
#include <linux/pagemap.h>
/* For sys_open & sys_close */
#include <linux/syscalls.h>
#include <net/sock.h>
#include <linux/lnet/lib-lnet.h>
static int
kernel_sock_unlocked_ioctl(struct file *filp, int cmd, unsigned long arg)
{
mm_segment_t oldfs = get_fs();
int err;
set_fs(KERNEL_DS);
err = filp->f_op->unlocked_ioctl(filp, cmd, arg);
set_fs(oldfs);
return err;
}
static int
lnet_sock_ioctl(int cmd, unsigned long arg)
{
struct file *sock_filp;
struct socket *sock;
int rc;
rc = sock_create(PF_INET, SOCK_STREAM, 0, &sock);
if (rc) {
CERROR("Can't create socket: %d\n", rc);
return rc;
}
sock_filp = sock_alloc_file(sock, 0, NULL);
if (IS_ERR(sock_filp))
return PTR_ERR(sock_filp);
rc = kernel_sock_unlocked_ioctl(sock_filp, cmd, arg);
fput(sock_filp);
return rc;
}
int
lnet_ipif_query(char *name, int *up, __u32 *ip, __u32 *mask)
{
struct ifreq ifr;
int nob;
int rc;
__be32 val;
nob = strnlen(name, IFNAMSIZ);
if (nob == IFNAMSIZ) {
CERROR("Interface name %s too long\n", name);
return -EINVAL;
}
BUILD_BUG_ON(sizeof(ifr.ifr_name) < IFNAMSIZ);
if (strlen(name) > sizeof(ifr.ifr_name) - 1)
return -E2BIG;
strncpy(ifr.ifr_name, name, sizeof(ifr.ifr_name));
rc = lnet_sock_ioctl(SIOCGIFFLAGS, (unsigned long)&ifr);
if (rc) {
CERROR("Can't get flags for interface %s\n", name);
return rc;
}
if (!(ifr.ifr_flags & IFF_UP)) {
CDEBUG(D_NET, "Interface %s down\n", name);
*up = 0;
*ip = *mask = 0;
return 0;
}
*up = 1;
if (strlen(name) > sizeof(ifr.ifr_name) - 1)
return -E2BIG;
strncpy(ifr.ifr_name, name, sizeof(ifr.ifr_name));
ifr.ifr_addr.sa_family = AF_INET;
rc = lnet_sock_ioctl(SIOCGIFADDR, (unsigned long)&ifr);
if (rc) {
CERROR("Can't get IP address for interface %s\n", name);
return rc;
}
val = ((struct sockaddr_in *)&ifr.ifr_addr)->sin_addr.s_addr;
*ip = ntohl(val);
if (strlen(name) > sizeof(ifr.ifr_name) - 1)
return -E2BIG;
strncpy(ifr.ifr_name, name, sizeof(ifr.ifr_name));
ifr.ifr_addr.sa_family = AF_INET;
rc = lnet_sock_ioctl(SIOCGIFNETMASK, (unsigned long)&ifr);
if (rc) {
CERROR("Can't get netmask for interface %s\n", name);
return rc;
}
val = ((struct sockaddr_in *)&ifr.ifr_netmask)->sin_addr.s_addr;
*mask = ntohl(val);
return 0;
}
EXPORT_SYMBOL(lnet_ipif_query);
int
lnet_ipif_enumerate(char ***namesp)
{
/* Allocate and fill in 'names', returning # interfaces/error */
char **names;
int toobig;
int nalloc;
int nfound;
struct ifreq *ifr;
struct ifconf ifc;
int rc;
int nob;
int i;
nalloc = 16; /* first guess at max interfaces */
toobig = 0;
for (;;) {
if (nalloc * sizeof(*ifr) > PAGE_SIZE) {
toobig = 1;
nalloc = PAGE_SIZE / sizeof(*ifr);
CWARN("Too many interfaces: only enumerating first %d\n",
nalloc);
}
ifr = kzalloc(nalloc * sizeof(*ifr), GFP_KERNEL);
if (!ifr) {
CERROR("ENOMEM enumerating up to %d interfaces\n",
nalloc);
rc = -ENOMEM;
goto out0;
}
ifc.ifc_buf = (char *)ifr;
ifc.ifc_len = nalloc * sizeof(*ifr);
rc = lnet_sock_ioctl(SIOCGIFCONF, (unsigned long)&ifc);
if (rc < 0) {
CERROR("Error %d enumerating interfaces\n", rc);
goto out1;
}
LASSERT(!rc);
nfound = ifc.ifc_len / sizeof(*ifr);
LASSERT(nfound <= nalloc);
if (nfound < nalloc || toobig)
break;
kfree(ifr);
nalloc *= 2;
}
if (!nfound)
goto out1;
names = kzalloc(nfound * sizeof(*names), GFP_KERNEL);
if (!names) {
rc = -ENOMEM;
goto out1;
}
for (i = 0; i < nfound; i++) {
nob = strnlen(ifr[i].ifr_name, IFNAMSIZ);
if (nob == IFNAMSIZ) {
/* no space for terminating NUL */
CERROR("interface name %.*s too long (%d max)\n",
nob, ifr[i].ifr_name, IFNAMSIZ);
rc = -ENAMETOOLONG;
goto out2;
}
names[i] = kmalloc(IFNAMSIZ, GFP_KERNEL);
if (!names[i]) {
rc = -ENOMEM;
goto out2;
}
memcpy(names[i], ifr[i].ifr_name, nob);
names[i][nob] = 0;
}
*namesp = names;
rc = nfound;
out2:
if (rc < 0)
lnet_ipif_free_enumeration(names, nfound);
out1:
kfree(ifr);
out0:
return rc;
}
EXPORT_SYMBOL(lnet_ipif_enumerate);
void
lnet_ipif_free_enumeration(char **names, int n)
{
int i;
LASSERT(n > 0);
for (i = 0; i < n && names[i]; i++)
kfree(names[i]);
kfree(names);
}
EXPORT_SYMBOL(lnet_ipif_free_enumeration);
int
lnet_sock_write(struct socket *sock, void *buffer, int nob, int timeout)
{
int rc;
long jiffies_left = timeout * msecs_to_jiffies(MSEC_PER_SEC);
unsigned long then;
struct timeval tv;
struct kvec iov = { .iov_base = buffer, .iov_len = nob };
struct msghdr msg = {NULL,};
LASSERT(nob > 0);
/*
* Caller may pass a zero timeout if she thinks the socket buffer is
* empty enough to take the whole message immediately
*/
iov_iter_kvec(&msg.msg_iter, WRITE | ITER_KVEC, &iov, 1, nob);
for (;;) {
msg.msg_flags = !timeout ? MSG_DONTWAIT : 0;
if (timeout) {
/* Set send timeout to remaining time */
jiffies_to_timeval(jiffies_left, &tv);
rc = kernel_setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO,
(char *)&tv, sizeof(tv));
if (rc) {
CERROR("Can't set socket send timeout %ld.%06d: %d\n",
(long)tv.tv_sec, (int)tv.tv_usec, rc);
return rc;
}
}
then = jiffies;
rc = kernel_sendmsg(sock, &msg, &iov, 1, nob);
jiffies_left -= jiffies - then;
if (rc < 0)
return rc;
if (!rc) {
CERROR("Unexpected zero rc\n");
return -ECONNABORTED;
}
if (!msg_data_left(&msg))
break;
if (jiffies_left <= 0)
return -EAGAIN;
}
return 0;
}
EXPORT_SYMBOL(lnet_sock_write);
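The write loop above charges the time spent in each send against a shrinking budget and gives up once the budget is gone with data still outstanding. A minimal user-space sketch of that pattern follows, assuming a toy transport in place of a socket; `struct chunked_sink`, `chunked_send()` and `write_all()` are hypothetical, and elapsed time is reported by the toy transport rather than measured.

```c
#include <errno.h>
#include <stddef.h>

/* Toy transport (hypothetical): accepts at most `chunk` bytes per call and
 * reports that every call took `cost_ms` milliseconds. */
struct chunked_sink {
	size_t chunk;
	long cost_ms;
	size_t total; /* bytes accepted so far */
};

static long chunked_send(struct chunked_sink *s, const char *buf, size_t len,
			 long *elapsed_ms)
{
	size_t n = len < s->chunk ? len : s->chunk;

	(void)buf; /* the toy transport discards the payload */
	s->total += n;
	*elapsed_ms = s->cost_ms;
	return (long)n;
}

/* Send all `len` bytes (len must be > 0) within `budget_ms`.
 * Returns 0 on success or a negative errno value. */
static int write_all(struct chunked_sink *s, const char *buf, size_t len,
		     long budget_ms)
{
	size_t done = 0;

	for (;;) {
		long elapsed = 0;
		long n = chunked_send(s, buf + done, len - done, &elapsed);

		budget_ms -= elapsed;      /* charge this attempt to the budget */
		if (n < 0)
			return (int)n;     /* transport error */
		if (n == 0)
			return -ECONNABORTED; /* no progress at all */
		done += (size_t)n;
		if (done == len)
			return 0;          /* everything was sent */
		if (budget_ms <= 0)
			return -EAGAIN;    /* out of time with data left */
	}
}
```

Checking for completion before checking the budget means a send that finishes exactly as the budget runs out still counts as a success, matching the order of the checks in the loop above.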
int
lnet_sock_read(struct socket *sock, void *buffer, int nob, int timeout)
{
int rc;
long jiffies_left = timeout * msecs_to_jiffies(MSEC_PER_SEC);
unsigned long then;
struct timeval tv;
struct kvec iov = {
.iov_base = buffer,
.iov_len = nob
};
struct msghdr msg = {
.msg_flags = 0
};
LASSERT(nob > 0);
LASSERT(jiffies_left > 0);
iov_iter_kvec(&msg.msg_iter, READ | ITER_KVEC, &iov, 1, nob);
for (;;) {
/* Set receive timeout to remaining time */
jiffies_to_timeval(jiffies_left, &tv);
rc = kernel_setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO,
(char *)&tv, sizeof(tv));
if (rc) {
CERROR("Can't set socket recv timeout %ld.%06d: %d\n",
(long)tv.tv_sec, (int)tv.tv_usec, rc);
return rc;
}
then = jiffies;
rc = sock_recvmsg(sock, &msg, 0);
jiffies_left -= jiffies - then;
if (rc < 0)
return rc;
if (!rc)
return -ECONNRESET;
if (!msg_data_left(&msg))
return 0;
if (jiffies_left <= 0)
return -ETIMEDOUT;
}
}
EXPORT_SYMBOL(lnet_sock_read);
static int
lnet_sock_create(struct socket **sockp, int *fatal, __u32 local_ip,
int local_port)
{
struct sockaddr_in locaddr;
struct socket *sock;
int rc;
int option;
/* All errors are fatal except bind failure if the port is in use */
*fatal = 1;
rc = sock_create(PF_INET, SOCK_STREAM, 0, &sock);
*sockp = sock;
if (rc) {
CERROR("Can't create socket: %d\n", rc);
return rc;
}
option = 1;
rc = kernel_setsockopt(sock, SOL_SOCKET, SO_REUSEADDR,
(char *)&option, sizeof(option));
if (rc) {
CERROR("Can't set SO_REUSEADDR for socket: %d\n", rc);
goto failed;
}
if (local_ip || local_port) {
memset(&locaddr, 0, sizeof(locaddr));
locaddr.sin_family = AF_INET;
locaddr.sin_port = htons(local_port);
if (!local_ip)
locaddr.sin_addr.s_addr = htonl(INADDR_ANY);
else
locaddr.sin_addr.s_addr = htonl(local_ip);
rc = kernel_bind(sock, (struct sockaddr *)&locaddr,
sizeof(locaddr));
if (rc == -EADDRINUSE) {
CDEBUG(D_NET, "Port %d already in use\n", local_port);
*fatal = 0;
goto failed;
}
if (rc) {
CERROR("Error trying to bind to port %d: %d\n",
local_port, rc);
goto failed;
}
}
return 0;
failed:
sock_release(sock);
return rc;
}
int
lnet_sock_setbuf(struct socket *sock, int txbufsize, int rxbufsize)
{
int option;
int rc;
if (txbufsize) {
option = txbufsize;
rc = kernel_setsockopt(sock, SOL_SOCKET, SO_SNDBUF,
(char *)&option, sizeof(option));
if (rc) {
CERROR("Can't set send buffer %d: %d\n",
option, rc);
return rc;
}
}
if (rxbufsize) {
option = rxbufsize;
rc = kernel_setsockopt(sock, SOL_SOCKET, SO_RCVBUF,
(char *)&option, sizeof(option));
if (rc) {
CERROR("Can't set receive buffer %d: %d\n",
option, rc);
return rc;
}
}
return 0;
}
EXPORT_SYMBOL(lnet_sock_setbuf);
int
lnet_sock_getaddr(struct socket *sock, bool remote, __u32 *ip, int *port)
{
struct sockaddr_in sin;
int rc;
if (remote)
rc = kernel_getpeername(sock, (struct sockaddr *)&sin);
else
rc = kernel_getsockname(sock, (struct sockaddr *)&sin);
if (rc < 0) {
CERROR("Error %d getting sock %s IP/port\n",
rc, remote ? "peer" : "local");
return rc;
}
if (ip)
*ip = ntohl(sin.sin_addr.s_addr);
if (port)
*port = ntohs(sin.sin_port);
return 0;
}
EXPORT_SYMBOL(lnet_sock_getaddr);
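The socket helpers in this file take and return IPv4 addresses and ports in host byte order and convert at the `struct sockaddr_in` boundary. The user-space sketch below illustrates that convention with the standard `htonl`/`htons`/`ntohl`/`ntohs` calls; `addr_pack()` and `addr_unpack()` are hypothetical helper names, not part of LNet.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Build a sockaddr_in from host-order values (hypothetical helper). */
static void addr_pack(struct sockaddr_in *sin, uint32_t ip_host,
		      uint16_t port_host)
{
	memset(sin, 0, sizeof(*sin));
	sin->sin_family = AF_INET;
	sin->sin_port = htons(port_host);      /* wire format is big-endian */
	sin->sin_addr.s_addr = htonl(ip_host);
}

/* Read host-order values back out; either output pointer may be NULL. */
static void addr_unpack(const struct sockaddr_in *sin, uint32_t *ip_host,
			uint16_t *port_host)
{
	if (ip_host)
		*ip_host = ntohl(sin->sin_addr.s_addr);
	if (port_host)
		*port_host = ntohs(sin->sin_port);
}
```

Keeping the conversion in one pair of helpers means callers never handle network-order values directly, so they cannot forget to swap them.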
int
lnet_sock_getbuf(struct socket *sock, int *txbufsize, int *rxbufsize)
{
if (txbufsize)
*txbufsize = sock->sk->sk_sndbuf;
if (rxbufsize)
*rxbufsize = sock->sk->sk_rcvbuf;
return 0;
}
EXPORT_SYMBOL(lnet_sock_getbuf);
int
lnet_sock_listen(struct socket **sockp, __u32 local_ip, int local_port,
int backlog)
{
int fatal;
int rc;
rc = lnet_sock_create(sockp, &fatal, local_ip, local_port);
if (rc) {
if (!fatal)
CERROR("Can't create socket: port %d already in use\n",
local_port);
return rc;
}
rc = kernel_listen(*sockp, backlog);
if (!rc)
return 0;
CERROR("Can't set listen backlog %d: %d\n", backlog, rc);
sock_release(*sockp);
return rc;
}
int
lnet_sock_accept(struct socket **newsockp, struct socket *sock)
{
wait_queue_entry_t wait;
struct socket *newsock;
int rc;
/*
* XXX this should add a ref to sock->ops->owner, if
* TCP could be a module
*/
rc = sock_create_lite(PF_PACKET, sock->type, IPPROTO_TCP, &newsock);
if (rc) {
CERROR("Can't allocate socket\n");
return rc;
}
newsock->ops = sock->ops;
rc = sock->ops->accept(sock, newsock, O_NONBLOCK, false);
if (rc == -EAGAIN) {
/* Nothing ready, so wait for activity */
init_waitqueue_entry(&wait, current);
add_wait_queue(sk_sleep(sock->sk), &wait);
set_current_state(TASK_INTERRUPTIBLE);
schedule();
remove_wait_queue(sk_sleep(sock->sk), &wait);
rc = sock->ops->accept(sock, newsock, O_NONBLOCK, false);
}
if (rc)
goto failed;
*newsockp = newsock;
return 0;
failed:
sock_release(newsock);
return rc;
}
int
lnet_sock_connect(struct socket **sockp, int *fatal, __u32 local_ip,
int local_port, __u32 peer_ip, int peer_port)
{
struct sockaddr_in srvaddr;
int rc;
rc = lnet_sock_create(sockp, fatal, local_ip, local_port);
if (rc)
return rc;
memset(&srvaddr, 0, sizeof(srvaddr));
srvaddr.sin_family = AF_INET;
srvaddr.sin_port = htons(peer_port);
srvaddr.sin_addr.s_addr = htonl(peer_ip);
rc = kernel_connect(*sockp, (struct sockaddr *)&srvaddr,
sizeof(srvaddr), 0);
if (!rc)
return 0;
/*
* EADDRNOTAVAIL probably means we're already connected to the same
* peer/port on the same local port on a differently typed
* connection. Let our caller retry with a different local
* port...
*/
*fatal = !(rc == -EADDRNOTAVAIL);
CDEBUG_LIMIT(*fatal ? D_NETERROR : D_NET,
"Error %d connecting %pI4h/%d -> %pI4h/%d\n", rc,
&local_ip, local_port, &peer_ip, peer_port);
sock_release(*sockp);
return rc;
}

@ -1,105 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2007, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Sun Microsystems, Inc.
*/
#define DEBUG_SUBSYSTEM S_LNET
#include <linux/lnet/lib-lnet.h>
static int
lolnd_send(struct lnet_ni *ni, void *private, struct lnet_msg *lntmsg)
{
LASSERT(!lntmsg->msg_routing);
LASSERT(!lntmsg->msg_target_is_router);
return lnet_parse(ni, &lntmsg->msg_hdr, ni->ni_nid, lntmsg, 0);
}
static int
lolnd_recv(struct lnet_ni *ni, void *private, struct lnet_msg *lntmsg,
int delayed, struct iov_iter *to, unsigned int rlen)
{
struct lnet_msg *sendmsg = private;
if (lntmsg) { /* not discarding */
if (sendmsg->msg_iov)
lnet_copy_iov2iter(to,
sendmsg->msg_niov,
sendmsg->msg_iov,
sendmsg->msg_offset,
iov_iter_count(to));
else
lnet_copy_kiov2iter(to,
sendmsg->msg_niov,
sendmsg->msg_kiov,
sendmsg->msg_offset,
iov_iter_count(to));
lnet_finalize(ni, lntmsg, 0);
}
lnet_finalize(ni, sendmsg, 0);
return 0;
}
static int lolnd_instanced;
static void
lolnd_shutdown(struct lnet_ni *ni)
{
CDEBUG(D_NET, "shutdown\n");
LASSERT(lolnd_instanced);
lolnd_instanced = 0;
}
static int
lolnd_startup(struct lnet_ni *ni)
{
LASSERT(ni->ni_lnd == &the_lolnd);
LASSERT(!lolnd_instanced);
lolnd_instanced = 1;
return 0;
}
struct lnet_lnd the_lolnd = {
/* .lnd_list = */ {&the_lolnd.lnd_list, &the_lolnd.lnd_list},
/* .lnd_refcount = */ 0,
/* .lnd_type = */ LOLND,
/* .lnd_startup = */ lolnd_startup,
/* .lnd_shutdown = */ lolnd_shutdown,
/* .lnd_ctl = */ NULL,
/* .lnd_send = */ lolnd_send,
/* .lnd_recv = */ lolnd_recv,
/* .lnd_eager_recv = */ NULL,
/* .lnd_notify = */ NULL,
/* .lnd_accept = */ NULL
};

@ -1,239 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2004, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2012, 2015, Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Sun Microsystems, Inc.
*/
#define DEBUG_SUBSYSTEM S_LNET
#include <linux/lnet/lib-lnet.h>
#include <uapi/linux/lnet/lnet-dlc.h>
static int config_on_load;
module_param(config_on_load, int, 0444);
MODULE_PARM_DESC(config_on_load, "configure network at module load");
static struct mutex lnet_config_mutex;
static int
lnet_configure(void *arg)
{
/* 'arg' is only there so this function can be passed to kthread_run() */
int rc = 0;
mutex_lock(&lnet_config_mutex);
if (!the_lnet.ln_niinit_self) {
rc = try_module_get(THIS_MODULE);
if (rc != 1)
goto out;
rc = LNetNIInit(LNET_PID_LUSTRE);
if (rc >= 0) {
the_lnet.ln_niinit_self = 1;
rc = 0;
} else {
module_put(THIS_MODULE);
}
}
out:
mutex_unlock(&lnet_config_mutex);
return rc;
}
static int
lnet_unconfigure(void)
{
int refcount;
mutex_lock(&lnet_config_mutex);
if (the_lnet.ln_niinit_self) {
the_lnet.ln_niinit_self = 0;
LNetNIFini();
module_put(THIS_MODULE);
}
mutex_lock(&the_lnet.ln_api_mutex);
refcount = the_lnet.ln_refcount;
mutex_unlock(&the_lnet.ln_api_mutex);
mutex_unlock(&lnet_config_mutex);
return !refcount ? 0 : -EBUSY;
}
static int
lnet_dyn_configure(struct libcfs_ioctl_hdr *hdr)
{
struct lnet_ioctl_config_data *conf =
(struct lnet_ioctl_config_data *)hdr;
int rc;
if (conf->cfg_hdr.ioc_len < sizeof(*conf))
return -EINVAL;
mutex_lock(&lnet_config_mutex);
if (!the_lnet.ln_niinit_self) {
rc = -EINVAL;
goto out_unlock;
}
rc = lnet_dyn_add_ni(LNET_PID_LUSTRE, conf);
out_unlock:
mutex_unlock(&lnet_config_mutex);
return rc;
}
static int
lnet_dyn_unconfigure(struct libcfs_ioctl_hdr *hdr)
{
struct lnet_ioctl_config_data *conf =
(struct lnet_ioctl_config_data *)hdr;
int rc;
if (conf->cfg_hdr.ioc_len < sizeof(*conf))
return -EINVAL;
mutex_lock(&lnet_config_mutex);
if (!the_lnet.ln_niinit_self) {
rc = -EINVAL;
goto out_unlock;
}
rc = lnet_dyn_del_ni(conf->cfg_net);
out_unlock:
mutex_unlock(&lnet_config_mutex);
return rc;
}
static int
lnet_ioctl(struct notifier_block *nb,
unsigned long cmd, void *vdata)
{
int rc;
struct libcfs_ioctl_hdr *hdr = vdata;
switch (cmd) {
case IOC_LIBCFS_CONFIGURE: {
struct libcfs_ioctl_data *data =
(struct libcfs_ioctl_data *)hdr;
if (data->ioc_hdr.ioc_len < sizeof(*data)) {
rc = -EINVAL;
} else {
the_lnet.ln_nis_from_mod_params = data->ioc_flags;
rc = lnet_configure(NULL);
}
break;
}
case IOC_LIBCFS_UNCONFIGURE:
rc = lnet_unconfigure();
break;
case IOC_LIBCFS_ADD_NET:
rc = lnet_dyn_configure(hdr);
break;
case IOC_LIBCFS_DEL_NET:
rc = lnet_dyn_unconfigure(hdr);
break;
default:
/*
* Passing LNET_PID_ANY only gives me a ref if the net is up
* already; I'll need it to ensure the net can't go down while
* I'm called into it
*/
rc = LNetNIInit(LNET_PID_ANY);
if (rc >= 0) {
rc = LNetCtl(cmd, hdr);
LNetNIFini();
}
break;
}
return notifier_from_ioctl_errno(rc);
}
static struct notifier_block lnet_ioctl_handler = {
.notifier_call = lnet_ioctl,
};
static int __init lnet_init(void)
{
int rc;
mutex_init(&lnet_config_mutex);
rc = libcfs_setup();
if (rc)
return rc;
rc = lnet_lib_init();
if (rc) {
CERROR("lnet_lib_init: error %d\n", rc);
return rc;
}
rc = blocking_notifier_chain_register(&libcfs_ioctl_list,
&lnet_ioctl_handler);
LASSERT(!rc);
if (config_on_load) {
/*
* Have to schedule a separate thread to avoid deadlocking
* in modload
*/
(void)kthread_run(lnet_configure, NULL, "lnet_initd");
}
return 0;
}
static void __exit lnet_exit(void)
{
int rc;
rc = blocking_notifier_chain_unregister(&libcfs_ioctl_list,
&lnet_ioctl_handler);
LASSERT(!rc);
lnet_lib_exit();
}
MODULE_AUTHOR("OpenSFS, Inc. <http://www.lustre.org/>");
MODULE_DESCRIPTION("Lustre Networking layer");
MODULE_VERSION(LNET_VERSION);
MODULE_LICENSE("GPL");
module_init(lnet_init);
module_exit(lnet_exit);
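The dynamic configuration handlers above (lnet_dyn_configure, lnet_dyn_unconfigure, and the IOC_LIBCFS_CONFIGURE case) all check the length claimed in the ioctl header before treating the buffer as a larger structure. The following is a minimal user-space sketch of that check, not the kernel code: `demo_hdr`, `demo_config`, and `demo_config_from_hdr` are hypothetical stand-ins for `libcfs_ioctl_hdr`, `lnet_ioctl_config_data`, and the inline checks in the handlers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct libcfs_ioctl_hdr. */
struct demo_hdr {
	uint32_t ioc_len;	/* total buffer length claimed by the caller */
	uint32_t ioc_version;
};

/* Hypothetical stand-in for struct lnet_ioctl_config_data. */
struct demo_config {
	struct demo_hdr cfg_hdr;	/* header must be the first member */
	uint32_t cfg_net;
};

/*
 * Return 0 and set *out when the buffer described by hdr is long enough
 * to be read as a struct demo_config; return -EINVAL otherwise.
 */
static int demo_config_from_hdr(struct demo_hdr *hdr, struct demo_config **out)
{
	assert(hdr && out);

	/* Reject short buffers before touching any field past the header. */
	if (hdr->ioc_len < sizeof(struct demo_config))
		return -EINVAL;

	*out = (struct demo_config *)hdr;
	return 0;
}
```

The point of the ordering is that the cast alone proves nothing about the buffer size; only the length comparison makes it safe to read `cfg_net`.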

File diff suppressed because it is too large

File diff suppressed because it is too large

@ -1,456 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2007, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2012, Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Sun Microsystems, Inc.
*
* lnet/lnet/peer.c
*/
#define DEBUG_SUBSYSTEM S_LNET
#include <linux/lnet/lib-lnet.h>
#include <uapi/linux/lnet/lnet-dlc.h>
int
lnet_peer_tables_create(void)
{
struct lnet_peer_table *ptable;
struct list_head *hash;
int i;
int j;
the_lnet.ln_peer_tables = cfs_percpt_alloc(lnet_cpt_table(),
sizeof(*ptable));
if (!the_lnet.ln_peer_tables) {
CERROR("Failed to allocate cpu-partition peer tables\n");
return -ENOMEM;
}
cfs_percpt_for_each(ptable, i, the_lnet.ln_peer_tables) {
INIT_LIST_HEAD(&ptable->pt_deathrow);
hash = kvmalloc_cpt(LNET_PEER_HASH_SIZE * sizeof(*hash),
GFP_KERNEL, i);
if (!hash) {
CERROR("Failed to create peer hash table\n");
lnet_peer_tables_destroy();
return -ENOMEM;
}
for (j = 0; j < LNET_PEER_HASH_SIZE; j++)
INIT_LIST_HEAD(&hash[j]);
ptable->pt_hash = hash; /* sign of initialization */
}
return 0;
}
void
lnet_peer_tables_destroy(void)
{
struct lnet_peer_table *ptable;
struct list_head *hash;
int i;
int j;
if (!the_lnet.ln_peer_tables)
return;
cfs_percpt_for_each(ptable, i, the_lnet.ln_peer_tables) {
hash = ptable->pt_hash;
if (!hash) /* not initialized */
break;
LASSERT(list_empty(&ptable->pt_deathrow));
ptable->pt_hash = NULL;
for (j = 0; j < LNET_PEER_HASH_SIZE; j++)
LASSERT(list_empty(&hash[j]));
kvfree(hash);
}
cfs_percpt_free(the_lnet.ln_peer_tables);
the_lnet.ln_peer_tables = NULL;
}
static void
lnet_peer_table_cleanup_locked(struct lnet_ni *ni,
struct lnet_peer_table *ptable)
{
int i;
struct lnet_peer *lp;
struct lnet_peer *tmp;
for (i = 0; i < LNET_PEER_HASH_SIZE; i++) {
list_for_each_entry_safe(lp, tmp, &ptable->pt_hash[i],
lp_hashlist) {
if (ni && ni != lp->lp_ni)
continue;
list_del_init(&lp->lp_hashlist);
/* Lose hash table's ref */
ptable->pt_zombies++;
lnet_peer_decref_locked(lp);
}
}
}
static void
lnet_peer_table_deathrow_wait_locked(struct lnet_peer_table *ptable,
int cpt_locked)
{
int i;
for (i = 3; ptable->pt_zombies; i++) {
lnet_net_unlock(cpt_locked);
if (is_power_of_2(i)) {
CDEBUG(D_WARNING,
"Waiting for %d zombies on peer table\n",
ptable->pt_zombies);
}
set_current_state(TASK_UNINTERRUPTIBLE);
schedule_timeout(HZ >> 1);
lnet_net_lock(cpt_locked);
}
}
static void
lnet_peer_table_del_rtrs_locked(struct lnet_ni *ni,
struct lnet_peer_table *ptable,
int cpt_locked)
{
struct lnet_peer *lp;
struct lnet_peer *tmp;
lnet_nid_t lp_nid;
int i;
for (i = 0; i < LNET_PEER_HASH_SIZE; i++) {
list_for_each_entry_safe(lp, tmp, &ptable->pt_hash[i],
lp_hashlist) {
if (ni != lp->lp_ni)
continue;
if (!lp->lp_rtr_refcount)
continue;
lp_nid = lp->lp_nid;
lnet_net_unlock(cpt_locked);
lnet_del_route(LNET_NIDNET(LNET_NID_ANY), lp_nid);
lnet_net_lock(cpt_locked);
}
}
}
void
lnet_peer_tables_cleanup(struct lnet_ni *ni)
{
struct lnet_peer_table *ptable;
struct list_head deathrow;
struct lnet_peer *lp;
struct lnet_peer *temp;
int i;
INIT_LIST_HEAD(&deathrow);
LASSERT(the_lnet.ln_shutdown || ni);
/*
* If just deleting the peers for a NI, get rid of any routes these
* peers are gateways for.
*/
cfs_percpt_for_each(ptable, i, the_lnet.ln_peer_tables) {
lnet_net_lock(i);
lnet_peer_table_del_rtrs_locked(ni, ptable, i);
lnet_net_unlock(i);
}
/*
* Start the process of moving the applicable peers to
* deathrow.
*/
cfs_percpt_for_each(ptable, i, the_lnet.ln_peer_tables) {
lnet_net_lock(i);
lnet_peer_table_cleanup_locked(ni, ptable);
lnet_net_unlock(i);
}
/* Cleanup all entries on deathrow. */
cfs_percpt_for_each(ptable, i, the_lnet.ln_peer_tables) {
lnet_net_lock(i);
lnet_peer_table_deathrow_wait_locked(ptable, i);
list_splice_init(&ptable->pt_deathrow, &deathrow);
lnet_net_unlock(i);
}
list_for_each_entry_safe(lp, temp, &deathrow, lp_hashlist) {
list_del(&lp->lp_hashlist);
kfree(lp);
}
}
void
lnet_destroy_peer_locked(struct lnet_peer *lp)
{
struct lnet_peer_table *ptable;
LASSERT(!lp->lp_refcount);
LASSERT(!lp->lp_rtr_refcount);
LASSERT(list_empty(&lp->lp_txq));
LASSERT(list_empty(&lp->lp_hashlist));
LASSERT(!lp->lp_txqnob);
ptable = the_lnet.ln_peer_tables[lp->lp_cpt];
LASSERT(ptable->pt_number > 0);
ptable->pt_number--;
lnet_ni_decref_locked(lp->lp_ni, lp->lp_cpt);
lp->lp_ni = NULL;
list_add(&lp->lp_hashlist, &ptable->pt_deathrow);
LASSERT(ptable->pt_zombies > 0);
ptable->pt_zombies--;
}
struct lnet_peer *
lnet_find_peer_locked(struct lnet_peer_table *ptable, lnet_nid_t nid)
{
struct list_head *peers;
struct lnet_peer *lp;
LASSERT(!the_lnet.ln_shutdown);
peers = &ptable->pt_hash[lnet_nid2peerhash(nid)];
list_for_each_entry(lp, peers, lp_hashlist) {
if (lp->lp_nid == nid) {
lnet_peer_addref_locked(lp);
return lp;
}
}
return NULL;
}
int
lnet_nid2peer_locked(struct lnet_peer **lpp, lnet_nid_t nid, int cpt)
{
struct lnet_peer_table *ptable;
struct lnet_peer *lp = NULL;
struct lnet_peer *lp2;
int cpt2;
int rc = 0;
*lpp = NULL;
if (the_lnet.ln_shutdown) /* it's shutting down */
return -ESHUTDOWN;
/* cpt can be LNET_LOCK_EX if it's called from router functions */
cpt2 = cpt != LNET_LOCK_EX ? cpt : lnet_cpt_of_nid_locked(nid);
ptable = the_lnet.ln_peer_tables[cpt2];
lp = lnet_find_peer_locked(ptable, nid);
if (lp) {
*lpp = lp;
return 0;
}
if (!list_empty(&ptable->pt_deathrow)) {
lp = list_entry(ptable->pt_deathrow.next,
struct lnet_peer, lp_hashlist);
list_del(&lp->lp_hashlist);
}
/*
* take extra refcount in case another thread has shut down LNet
* and destroyed locks and peer-table before I finish the allocation
*/
ptable->pt_number++;
lnet_net_unlock(cpt);
if (lp)
memset(lp, 0, sizeof(*lp));
else
lp = kzalloc_cpt(sizeof(*lp), GFP_NOFS, cpt2);
if (!lp) {
rc = -ENOMEM;
lnet_net_lock(cpt);
goto out;
}
INIT_LIST_HEAD(&lp->lp_txq);
INIT_LIST_HEAD(&lp->lp_rtrq);
INIT_LIST_HEAD(&lp->lp_routes);
lp->lp_notify = 0;
lp->lp_notifylnd = 0;
lp->lp_notifying = 0;
lp->lp_alive_count = 0;
lp->lp_timestamp = 0;
lp->lp_alive = !lnet_peers_start_down(); /* 1 bit!! */
lp->lp_last_alive = jiffies; /* assumes alive */
lp->lp_last_query = 0; /* haven't asked NI yet */
lp->lp_ping_timestamp = 0;
lp->lp_ping_feats = LNET_PING_FEAT_INVAL;
lp->lp_nid = nid;
lp->lp_cpt = cpt2;
lp->lp_refcount = 2; /* 1 for caller; 1 for hash */
lp->lp_rtr_refcount = 0;
lnet_net_lock(cpt);
if (the_lnet.ln_shutdown) {
rc = -ESHUTDOWN;
goto out;
}
lp2 = lnet_find_peer_locked(ptable, nid);
if (lp2) {
*lpp = lp2;
goto out;
}
lp->lp_ni = lnet_net2ni_locked(LNET_NIDNET(nid), cpt2);
if (!lp->lp_ni) {
rc = -EHOSTUNREACH;
goto out;
}
lp->lp_txcredits = lp->lp_ni->ni_peertxcredits;
lp->lp_mintxcredits = lp->lp_ni->ni_peertxcredits;
lp->lp_rtrcredits = lnet_peer_buffer_credits(lp->lp_ni);
lp->lp_minrtrcredits = lnet_peer_buffer_credits(lp->lp_ni);
list_add_tail(&lp->lp_hashlist,
&ptable->pt_hash[lnet_nid2peerhash(nid)]);
ptable->pt_version++;
*lpp = lp;
return 0;
out:
if (lp)
list_add(&lp->lp_hashlist, &ptable->pt_deathrow);
ptable->pt_number--;
return rc;
}
void
lnet_debug_peer(lnet_nid_t nid)
{
char *aliveness = "NA";
struct lnet_peer *lp;
int rc;
int cpt;
cpt = lnet_cpt_of_nid(nid);
lnet_net_lock(cpt);
rc = lnet_nid2peer_locked(&lp, nid, cpt);
if (rc) {
lnet_net_unlock(cpt);
CDEBUG(D_WARNING, "No peer %s\n", libcfs_nid2str(nid));
return;
}
if (lnet_isrouter(lp) || lnet_peer_aliveness_enabled(lp))
aliveness = lp->lp_alive ? "up" : "down";
CDEBUG(D_WARNING, "%-24s %4d %5s %5d %5d %5d %5d %5d %ld\n",
libcfs_nid2str(lp->lp_nid), lp->lp_refcount,
aliveness, lp->lp_ni->ni_peertxcredits,
lp->lp_rtrcredits, lp->lp_minrtrcredits,
lp->lp_txcredits, lp->lp_mintxcredits, lp->lp_txqnob);
lnet_peer_decref_locked(lp);
lnet_net_unlock(cpt);
}
int
lnet_get_peer_info(__u32 peer_index, __u64 *nid,
char aliveness[LNET_MAX_STR_LEN],
__u32 *cpt_iter, __u32 *refcount,
__u32 *ni_peer_tx_credits, __u32 *peer_tx_credits,
__u32 *peer_rtr_credits, __u32 *peer_min_rtr_credits,
__u32 *peer_tx_qnob)
{
struct lnet_peer_table *peer_table;
struct lnet_peer *lp;
bool found = false;
int lncpt, j;
/* get the number of CPTs */
lncpt = cfs_percpt_number(the_lnet.ln_peer_tables);
/*
* if the cpt number to be examined is >= the number of cpts in
* the system then indicate that there are no more cpts to examine
*/
if (*cpt_iter >= lncpt)
return -ENOENT;
/* get the current table */
peer_table = the_lnet.ln_peer_tables[*cpt_iter];
/* if the ptable is NULL then there are no more cpts to examine */
if (!peer_table)
return -ENOENT;
lnet_net_lock(*cpt_iter);
for (j = 0; j < LNET_PEER_HASH_SIZE && !found; j++) {
struct list_head *peers = &peer_table->pt_hash[j];
list_for_each_entry(lp, peers, lp_hashlist) {
if (peer_index-- > 0)
continue;
snprintf(aliveness, LNET_MAX_STR_LEN, "NA");
if (lnet_isrouter(lp) ||
lnet_peer_aliveness_enabled(lp))
snprintf(aliveness, LNET_MAX_STR_LEN,
lp->lp_alive ? "up" : "down");
*nid = lp->lp_nid;
*refcount = lp->lp_refcount;
*ni_peer_tx_credits = lp->lp_ni->ni_peertxcredits;
*peer_tx_credits = lp->lp_txcredits;
*peer_rtr_credits = lp->lp_rtrcredits;
*peer_min_rtr_credits = lp->lp_minrtrcredits;
*peer_tx_qnob = lp->lp_txqnob;
found = true;
}
}
lnet_net_unlock(*cpt_iter);
*cpt_iter = lncpt;
return found ? 0 : -ENOENT;
}
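The peer table above is a chained hash keyed by NID, where a newly created peer starts with two references ("1 for caller; 1 for hash") and a successful lookup takes one more. The sketch below illustrates only that find-or-create and reference-counting shape in user space. It omits the locking, the deathrow recycling, and the shutdown checks entirely. All `demo_*` names are hypothetical, and the hash function is an assumption, since `lnet_nid2peerhash()` is not shown in this diff.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_HASH_BITS	4
#define DEMO_HASH_SIZE	(1u << DEMO_HASH_BITS)

struct demo_peer {
	struct demo_peer *next;	/* bucket chain */
	uint64_t nid;
	int refcount;
};

struct demo_table {
	struct demo_peer *buckets[DEMO_HASH_SIZE];
};

/* Assumed multiplicative hash; not the real lnet_nid2peerhash(). */
static unsigned int demo_hash(uint64_t nid)
{
	return (unsigned int)((nid * 0x9E3779B97F4A7C15ull) >>
			      (64 - DEMO_HASH_BITS));
}

/* Look up nid; on a hit, take a reference on behalf of the caller. */
static struct demo_peer *demo_find(struct demo_table *t, uint64_t nid)
{
	struct demo_peer *p;

	for (p = t->buckets[demo_hash(nid)]; p; p = p->next) {
		if (p->nid == nid) {
			p->refcount++;
			return p;
		}
	}
	return NULL;
}

/* Find nid, or create it holding one caller ref and one table ref. */
static struct demo_peer *demo_get(struct demo_table *t, uint64_t nid)
{
	struct demo_peer *p = demo_find(t, nid);
	unsigned int h;

	if (p)
		return p;

	p = calloc(1, sizeof(*p));
	if (!p)
		return NULL;

	h = demo_hash(nid);
	p->nid = nid;
	p->refcount = 2;	/* 1 for caller; 1 for hash table */
	p->next = t->buckets[h];	/* head insert; the original appends */
	t->buckets[h] = p;
	return p;
}

/* Drop one reference; freeing at zero is left out of this sketch. */
static void demo_put(struct demo_peer *p)
{
	assert(p->refcount > 0);
	p->refcount--;
}
```

Because the table keeps its own reference, a peer whose callers have all called `demo_put()` still has a count of one and stays findable, which mirrors why lnet_peer_table_cleanup_locked has to drop the hash table's reference explicitly.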

File diff suppressed because it is too large

@ -1,907 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* Copyright (c) 2007, 2010, Oracle and/or its affiliates. All rights reserved.
*
* Copyright (c) 2011, 2012, Intel Corporation.
*
* This file is part of Portals
* http://sourceforge.net/projects/sandiaportals/
*
* Portals is free software; you can redistribute it and/or
* modify it under the terms of version 2 of the GNU General Public
* License as published by the Free Software Foundation.
*
* Portals is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
*/
#define DEBUG_SUBSYSTEM S_LNET
#include <linux/lnet/lib-lnet.h>
/*
* This is really lnet_proc.c. You might need to update sanity test 215
* if any file format is changed.
*/
#define LNET_LOFFT_BITS (sizeof(loff_t) * 8)
/*
* NB: max allowed LNET_CPT_BITS is 8 on 64-bit system and 2 on 32-bit system
*/
#define LNET_PROC_CPT_BITS (LNET_CPT_BITS + 1)
/* change version, 16 bits or 8 bits */
#define LNET_PROC_VER_BITS max_t(size_t, min_t(size_t, LNET_LOFFT_BITS, 64) / 4, 8)
#define LNET_PROC_HASH_BITS LNET_PEER_HASH_BITS
/*
* bits for peer hash offset
* NB: we don't use the highest bit of *ppos because it's signed
*/
#define LNET_PROC_HOFF_BITS	(LNET_LOFFT_BITS - \
				 LNET_PROC_CPT_BITS - \
				 LNET_PROC_VER_BITS - \
				 LNET_PROC_HASH_BITS - 1)
/* bits for hash index + position */
#define LNET_PROC_HPOS_BITS (LNET_PROC_HASH_BITS + LNET_PROC_HOFF_BITS)
/* bits for peer hash table + hash version */
#define LNET_PROC_VPOS_BITS (LNET_PROC_HPOS_BITS + LNET_PROC_VER_BITS)
#define LNET_PROC_CPT_MASK ((1ULL << LNET_PROC_CPT_BITS) - 1)
#define LNET_PROC_VER_MASK ((1ULL << LNET_PROC_VER_BITS) - 1)
#define LNET_PROC_HASH_MASK ((1ULL << LNET_PROC_HASH_BITS) - 1)
#define LNET_PROC_HOFF_MASK ((1ULL << LNET_PROC_HOFF_BITS) - 1)
#define LNET_PROC_CPT_GET(pos) \
	(int)(((pos) >> LNET_PROC_VPOS_BITS) & LNET_PROC_CPT_MASK)
#define LNET_PROC_VER_GET(pos) \
	(int)(((pos) >> LNET_PROC_HPOS_BITS) & LNET_PROC_VER_MASK)
#define LNET_PROC_HASH_GET(pos) \
	(int)(((pos) >> LNET_PROC_HOFF_BITS) & LNET_PROC_HASH_MASK)
#define LNET_PROC_HOFF_GET(pos) \
	(int)((pos) & LNET_PROC_HOFF_MASK)
#define LNET_PROC_POS_MAKE(cpt, ver, hash, off) \
	(((((loff_t)(cpt)) & LNET_PROC_CPT_MASK) << LNET_PROC_VPOS_BITS) | \
	 ((((loff_t)(ver)) & LNET_PROC_VER_MASK) << LNET_PROC_HPOS_BITS) | \
	 ((((loff_t)(hash)) & LNET_PROC_HASH_MASK) << LNET_PROC_HOFF_BITS) | \
	 ((off) & LNET_PROC_HOFF_MASK))
#define LNET_PROC_VERSION(v) ((unsigned int)((v) & LNET_PROC_VER_MASK))
static int __proc_lnet_stats(void *data, int write,
loff_t pos, void __user *buffer, int nob)
{
int rc;
struct lnet_counters *ctrs;
int len;
char *tmpstr;
const int tmpsiz = 256; /* 7 %u and 4 %llu */
if (write) {
lnet_counters_reset();
return 0;
}
/* read */
ctrs = kzalloc(sizeof(*ctrs), GFP_NOFS);
if (!ctrs)
return -ENOMEM;
tmpstr = kmalloc(tmpsiz, GFP_KERNEL);
if (!tmpstr) {
kfree(ctrs);
return -ENOMEM;
}
lnet_counters_get(ctrs);
len = snprintf(tmpstr, tmpsiz,
"%u %u %u %u %u %u %u %llu %llu %llu %llu",
ctrs->msgs_alloc, ctrs->msgs_max,
ctrs->errors,
ctrs->send_count, ctrs->recv_count,
ctrs->route_count, ctrs->drop_count,
ctrs->send_length, ctrs->recv_length,
ctrs->route_length, ctrs->drop_length);
if (pos >= min_t(int, len, strlen(tmpstr)))
rc = 0;
else
rc = cfs_trace_copyout_string(buffer, nob,
tmpstr + pos, "\n");
kfree(tmpstr);
kfree(ctrs);
return rc;
}
static int proc_lnet_stats(struct ctl_table *table, int write,
void __user *buffer, size_t *lenp, loff_t *ppos)
{
return lprocfs_call_handler(table->data, write, ppos, buffer, lenp,
__proc_lnet_stats);
}
static int proc_lnet_routes(struct ctl_table *table, int write,
void __user *buffer, size_t *lenp, loff_t *ppos)
{
const int tmpsiz = 256;
char *tmpstr;
char *s;
int rc = 0;
int len;
int ver;
int off;
BUILD_BUG_ON(sizeof(loff_t) < 4);
off = LNET_PROC_HOFF_GET(*ppos);
ver = LNET_PROC_VER_GET(*ppos);
LASSERT(!write);
if (!*lenp)
return 0;
tmpstr = kmalloc(tmpsiz, GFP_KERNEL);
if (!tmpstr)
return -ENOMEM;
s = tmpstr; /* points to current position in tmpstr[] */
if (!*ppos) {
s += snprintf(s, tmpstr + tmpsiz - s, "Routing %s\n",
the_lnet.ln_routing ? "enabled" : "disabled");
LASSERT(tmpstr + tmpsiz - s > 0);
s += snprintf(s, tmpstr + tmpsiz - s, "%-8s %4s %8s %7s %s\n",
"net", "hops", "priority", "state", "router");
LASSERT(tmpstr + tmpsiz - s > 0);
lnet_net_lock(0);
ver = (unsigned int)the_lnet.ln_remote_nets_version;
lnet_net_unlock(0);
*ppos = LNET_PROC_POS_MAKE(0, ver, 0, off);
} else {
struct list_head *n;
struct list_head *r;
struct lnet_route *route = NULL;
struct lnet_remotenet *rnet = NULL;
int skip = off - 1;
struct list_head *rn_list;
int i;
lnet_net_lock(0);
if (ver != LNET_PROC_VERSION(the_lnet.ln_remote_nets_version)) {
lnet_net_unlock(0);
kfree(tmpstr);
return -ESTALE;
}
for (i = 0; i < LNET_REMOTE_NETS_HASH_SIZE && !route; i++) {
rn_list = &the_lnet.ln_remote_nets_hash[i];
n = rn_list->next;
while (n != rn_list && !route) {
rnet = list_entry(n, struct lnet_remotenet,
lrn_list);
r = rnet->lrn_routes.next;
while (r != &rnet->lrn_routes) {
struct lnet_route *re;
re = list_entry(r, struct lnet_route,
lr_list);
if (!skip) {
route = re;
break;
}
skip--;
r = r->next;
}
n = n->next;
}
}
if (route) {
__u32 net = rnet->lrn_net;
__u32 hops = route->lr_hops;
unsigned int priority = route->lr_priority;
lnet_nid_t nid = route->lr_gateway->lp_nid;
int alive = lnet_is_route_alive(route);
s += snprintf(s, tmpstr + tmpsiz - s,
"%-8s %4u %8u %7s %s\n",
libcfs_net2str(net), hops,
priority,
alive ? "up" : "down",
libcfs_nid2str(nid));
LASSERT(tmpstr + tmpsiz - s > 0);
}
lnet_net_unlock(0);
}
len = s - tmpstr; /* how many bytes were written */
if (len > *lenp) { /* linux-supplied buffer is too small */
rc = -EINVAL;
} else if (len > 0) { /* wrote something */
if (copy_to_user(buffer, tmpstr, len)) {
rc = -EFAULT;
} else {
off += 1;
*ppos = LNET_PROC_POS_MAKE(0, ver, 0, off);
}
}
kfree(tmpstr);
if (!rc)
*lenp = len;
return rc;
}
static int proc_lnet_routers(struct ctl_table *table, int write,
void __user *buffer, size_t *lenp, loff_t *ppos)
{
int rc = 0;
char *tmpstr;
char *s;
const int tmpsiz = 256;
int len;
int ver;
int off;
off = LNET_PROC_HOFF_GET(*ppos);
ver = LNET_PROC_VER_GET(*ppos);
LASSERT(!write);
if (!*lenp)
return 0;
tmpstr = kmalloc(tmpsiz, GFP_KERNEL);
if (!tmpstr)
return -ENOMEM;
s = tmpstr; /* points to current position in tmpstr[] */
if (!*ppos) {
s += snprintf(s, tmpstr + tmpsiz - s,
"%-4s %7s %9s %6s %12s %9s %8s %7s %s\n",
"ref", "rtr_ref", "alive_cnt", "state",
"last_ping", "ping_sent", "deadline",
"down_ni", "router");
LASSERT(tmpstr + tmpsiz - s > 0);
lnet_net_lock(0);
ver = (unsigned int)the_lnet.ln_routers_version;
lnet_net_unlock(0);
*ppos = LNET_PROC_POS_MAKE(0, ver, 0, off);
} else {
struct list_head *r;
struct lnet_peer *peer = NULL;
int skip = off - 1;
lnet_net_lock(0);
if (ver != LNET_PROC_VERSION(the_lnet.ln_routers_version)) {
lnet_net_unlock(0);
kfree(tmpstr);
return -ESTALE;
}
r = the_lnet.ln_routers.next;
while (r != &the_lnet.ln_routers) {
struct lnet_peer *lp;
lp = list_entry(r, struct lnet_peer, lp_rtr_list);
if (!skip) {
peer = lp;
break;
}
skip--;
r = r->next;
}
if (peer) {
lnet_nid_t nid = peer->lp_nid;
unsigned long now = jiffies;
unsigned long deadline = peer->lp_ping_deadline;
int nrefs = peer->lp_refcount;
int nrtrrefs = peer->lp_rtr_refcount;
int alive_cnt = peer->lp_alive_count;
int alive = peer->lp_alive;
int pingsent = !peer->lp_ping_notsent;
int last_ping = (now - peer->lp_ping_timestamp) / HZ;
int down_ni = 0;
struct lnet_route *rtr;
if ((peer->lp_ping_feats &
LNET_PING_FEAT_NI_STATUS)) {
list_for_each_entry(rtr, &peer->lp_routes,
lr_gwlist) {
/*
* downis on any route should be the
* number of downis on the gateway
*/
if (rtr->lr_downis) {
down_ni = rtr->lr_downis;
break;
}
}
}
if (!deadline)
s += snprintf(s, tmpstr + tmpsiz - s,
"%-4d %7d %9d %6s %12d %9d %8s %7d %s\n",
nrefs, nrtrrefs, alive_cnt,
alive ? "up" : "down", last_ping,
pingsent, "NA", down_ni,
libcfs_nid2str(nid));
else
s += snprintf(s, tmpstr + tmpsiz - s,
"%-4d %7d %9d %6s %12d %9d %8lu %7d %s\n",
nrefs, nrtrrefs, alive_cnt,
alive ? "up" : "down", last_ping,
pingsent,
(deadline - now) / HZ,
down_ni, libcfs_nid2str(nid));
LASSERT(tmpstr + tmpsiz - s > 0);
}
lnet_net_unlock(0);
}
len = s - tmpstr; /* how many bytes were written */
if (len > *lenp) { /* linux-supplied buffer is too small */
rc = -EINVAL;
} else if (len > 0) { /* wrote something */
if (copy_to_user(buffer, tmpstr, len)) {
rc = -EFAULT;
} else {
off += 1;
*ppos = LNET_PROC_POS_MAKE(0, ver, 0, off);
}
}
kfree(tmpstr);
if (!rc)
*lenp = len;
return rc;
}
static int proc_lnet_peers(struct ctl_table *table, int write,
void __user *buffer, size_t *lenp, loff_t *ppos)
{
const int tmpsiz = 256;
struct lnet_peer_table *ptable;
char *tmpstr;
char *s;
int cpt = LNET_PROC_CPT_GET(*ppos);
int ver = LNET_PROC_VER_GET(*ppos);
int hash = LNET_PROC_HASH_GET(*ppos);
int hoff = LNET_PROC_HOFF_GET(*ppos);
int rc = 0;
int len;
BUILD_BUG_ON(LNET_PROC_HASH_BITS < LNET_PEER_HASH_BITS);
LASSERT(!write);
if (!*lenp)
return 0;
if (cpt >= LNET_CPT_NUMBER) {
*lenp = 0;
return 0;
}
tmpstr = kmalloc(tmpsiz, GFP_KERNEL);
if (!tmpstr)
return -ENOMEM;
s = tmpstr; /* points to current position in tmpstr[] */
if (!*ppos) {
s += snprintf(s, tmpstr + tmpsiz - s,
"%-24s %4s %5s %5s %5s %5s %5s %5s %5s %s\n",
"nid", "refs", "state", "last", "max",
"rtr", "min", "tx", "min", "queue");
LASSERT(tmpstr + tmpsiz - s > 0);
hoff++;
} else {
struct lnet_peer *peer;
struct list_head *p;
int skip;
again:
p = NULL;
peer = NULL;
skip = hoff - 1;
lnet_net_lock(cpt);
ptable = the_lnet.ln_peer_tables[cpt];
if (hoff == 1)
ver = LNET_PROC_VERSION(ptable->pt_version);
if (ver != LNET_PROC_VERSION(ptable->pt_version)) {
lnet_net_unlock(cpt);
kfree(tmpstr);
return -ESTALE;
}
while (hash < LNET_PEER_HASH_SIZE) {
if (!p)
p = ptable->pt_hash[hash].next;
while (p != &ptable->pt_hash[hash]) {
struct lnet_peer *lp;
lp = list_entry(p, struct lnet_peer,
lp_hashlist);
if (!skip) {
peer = lp;
/*
* minor optimization: start from idx+1
* on next iteration if we've just
* drained lp_hashlist
*/
if (lp->lp_hashlist.next ==
&ptable->pt_hash[hash]) {
hoff = 1;
hash++;
} else {
hoff++;
}
break;
}
skip--;
p = lp->lp_hashlist.next;
}
if (peer)
break;
p = NULL;
hoff = 1;
hash++;
}
if (peer) {
lnet_nid_t nid = peer->lp_nid;
int nrefs = peer->lp_refcount;
int lastalive = -1;
char *aliveness = "NA";
int maxcr = peer->lp_ni->ni_peertxcredits;
int txcr = peer->lp_txcredits;
int mintxcr = peer->lp_mintxcredits;
int rtrcr = peer->lp_rtrcredits;
int minrtrcr = peer->lp_minrtrcredits;
int txqnob = peer->lp_txqnob;
if (lnet_isrouter(peer) ||
lnet_peer_aliveness_enabled(peer))
aliveness = peer->lp_alive ? "up" : "down";
if (lnet_peer_aliveness_enabled(peer)) {
unsigned long now = jiffies;
long delta;
delta = now - peer->lp_last_alive;
lastalive = (delta) / HZ;
/* No need to mess up peers contents with
* arbitrarily long integers - it suffices to
* know that lastalive is more than 10000s old
*/
if (lastalive >= 10000)
lastalive = 9999;
}
lnet_net_unlock(cpt);
s += snprintf(s, tmpstr + tmpsiz - s,
"%-24s %4d %5s %5d %5d %5d %5d %5d %5d %d\n",
libcfs_nid2str(nid), nrefs, aliveness,
lastalive, maxcr, rtrcr, minrtrcr, txcr,
mintxcr, txqnob);
LASSERT(tmpstr + tmpsiz - s > 0);
} else { /* peer is NULL */
lnet_net_unlock(cpt);
}
if (hash == LNET_PEER_HASH_SIZE) {
cpt++;
hash = 0;
hoff = 1;
if (!peer && cpt < LNET_CPT_NUMBER)
goto again;
}
}
len = s - tmpstr; /* how many bytes were written */
if (len > *lenp) { /* linux-supplied buffer is too small */
rc = -EINVAL;
} else if (len > 0) { /* wrote something */
if (copy_to_user(buffer, tmpstr, len))
rc = -EFAULT;
else
*ppos = LNET_PROC_POS_MAKE(cpt, ver, hash, hoff);
}
kfree(tmpstr);
if (!rc)
*lenp = len;
return rc;
}
static int __proc_lnet_buffers(void *data, int write,
loff_t pos, void __user *buffer, int nob)
{
char *s;
char *tmpstr;
int tmpsiz;
int idx;
int len;
int rc;
int i;
LASSERT(!write);
/* (4 %d) * 4 * LNET_CPT_NUMBER */
tmpsiz = 64 * (LNET_NRBPOOLS + 1) * LNET_CPT_NUMBER;
tmpstr = kvmalloc(tmpsiz, GFP_KERNEL);
if (!tmpstr)
return -ENOMEM;
s = tmpstr; /* points to current position in tmpstr[] */
s += snprintf(s, tmpstr + tmpsiz - s,
"%5s %5s %7s %7s\n",
"pages", "count", "credits", "min");
LASSERT(tmpstr + tmpsiz - s > 0);
if (!the_lnet.ln_rtrpools)
goto out; /* I'm not a router */
for (idx = 0; idx < LNET_NRBPOOLS; idx++) {
struct lnet_rtrbufpool *rbp;
lnet_net_lock(LNET_LOCK_EX);
cfs_percpt_for_each(rbp, i, the_lnet.ln_rtrpools) {
s += snprintf(s, tmpstr + tmpsiz - s,
"%5d %5d %7d %7d\n",
rbp[idx].rbp_npages,
rbp[idx].rbp_nbuffers,
rbp[idx].rbp_credits,
rbp[idx].rbp_mincredits);
LASSERT(tmpstr + tmpsiz - s > 0);
}
lnet_net_unlock(LNET_LOCK_EX);
}
out:
len = s - tmpstr;
if (pos >= min_t(int, len, strlen(tmpstr)))
rc = 0;
else
rc = cfs_trace_copyout_string(buffer, nob,
tmpstr + pos, NULL);
kvfree(tmpstr);
return rc;
}
static int proc_lnet_buffers(struct ctl_table *table, int write,
void __user *buffer, size_t *lenp, loff_t *ppos)
{
return lprocfs_call_handler(table->data, write, ppos, buffer, lenp,
__proc_lnet_buffers);
}
static int proc_lnet_nis(struct ctl_table *table, int write,
void __user *buffer, size_t *lenp, loff_t *ppos)
{
int tmpsiz = 128 * LNET_CPT_NUMBER;
int rc = 0;
char *tmpstr;
char *s;
int len;
LASSERT(!write);
if (!*lenp)
return 0;
tmpstr = kvmalloc(tmpsiz, GFP_KERNEL);
if (!tmpstr)
return -ENOMEM;
s = tmpstr; /* points to current position in tmpstr[] */
if (!*ppos) {
s += snprintf(s, tmpstr + tmpsiz - s,
"%-24s %6s %5s %4s %4s %4s %5s %5s %5s\n",
"nid", "status", "alive", "refs", "peer",
"rtr", "max", "tx", "min");
LASSERT(tmpstr + tmpsiz - s > 0);
} else {
struct list_head *n;
struct lnet_ni *ni = NULL;
int skip = *ppos - 1;
lnet_net_lock(0);
n = the_lnet.ln_nis.next;
while (n != &the_lnet.ln_nis) {
struct lnet_ni *a_ni;
a_ni = list_entry(n, struct lnet_ni, ni_list);
if (!skip) {
ni = a_ni;
break;
}
skip--;
n = n->next;
}
if (ni) {
struct lnet_tx_queue *tq;
char *stat;
time64_t now = ktime_get_real_seconds();
int last_alive = -1;
int i;
int j;
if (the_lnet.ln_routing)
last_alive = now - ni->ni_last_alive;
/* @lo forever alive */
if (ni->ni_lnd->lnd_type == LOLND)
last_alive = 0;
lnet_ni_lock(ni);
LASSERT(ni->ni_status);
stat = (ni->ni_status->ns_status ==
LNET_NI_STATUS_UP) ? "up" : "down";
lnet_ni_unlock(ni);
/*
* we actually output credits information for
* TX queue of each partition
*/
cfs_percpt_for_each(tq, i, ni->ni_tx_queues) {
for (j = 0; ni->ni_cpts &&
j < ni->ni_ncpts; j++) {
if (i == ni->ni_cpts[j])
break;
}
if (j == ni->ni_ncpts)
continue;
if (i)
lnet_net_lock(i);
s += snprintf(s, tmpstr + tmpsiz - s,
"%-24s %6s %5d %4d %4d %4d %5d %5d %5d\n",
libcfs_nid2str(ni->ni_nid), stat,
last_alive, *ni->ni_refs[i],
ni->ni_peertxcredits,
ni->ni_peerrtrcredits,
tq->tq_credits_max,
tq->tq_credits,
tq->tq_credits_min);
if (i)
lnet_net_unlock(i);
}
LASSERT(tmpstr + tmpsiz - s > 0);
}
lnet_net_unlock(0);
}
len = s - tmpstr; /* how many bytes were written */
if (len > *lenp) { /* linux-supplied buffer is too small */
rc = -EINVAL;
} else if (len > 0) { /* wrote something */
if (copy_to_user(buffer, tmpstr, len))
rc = -EFAULT;
else
*ppos += 1;
}
kvfree(tmpstr);
if (!rc)
*lenp = len;
return rc;
}
struct lnet_portal_rotors {
int pr_value;
const char *pr_name;
const char *pr_desc;
};
static struct lnet_portal_rotors portal_rotors[] = {
{
.pr_value = LNET_PTL_ROTOR_OFF,
.pr_name = "OFF",
.pr_desc = "Turn off message rotor for wildcard portals"
},
{
.pr_value = LNET_PTL_ROTOR_ON,
.pr_name = "ON",
.pr_desc = "round-robin dispatch all PUT messages for wildcard portals"
},
{
.pr_value = LNET_PTL_ROTOR_RR_RT,
.pr_name = "RR_RT",
.pr_desc = "round-robin dispatch routed PUT message for wildcard portals"
},
{
.pr_value = LNET_PTL_ROTOR_HASH_RT,
.pr_name = "HASH_RT",
.pr_desc = "dispatch routed PUT message by hashing source NID for wildcard portals"
},
{
.pr_value = -1,
.pr_name = NULL,
.pr_desc = NULL
},
};
static int __proc_lnet_portal_rotor(void *data, int write,
loff_t pos, void __user *buffer, int nob)
{
const int buf_len = 128;
char *buf;
char *tmp;
int rc;
int i;
buf = kmalloc(buf_len, GFP_KERNEL);
if (!buf)
return -ENOMEM;
if (!write) {
lnet_res_lock(0);
for (i = 0; portal_rotors[i].pr_value >= 0; i++) {
if (portal_rotors[i].pr_value == portal_rotor)
break;
}
LASSERT(portal_rotors[i].pr_value == portal_rotor);
lnet_res_unlock(0);
rc = snprintf(buf, buf_len,
"{\n\tportals: all\n"
"\trotor: %s\n\tdescription: %s\n}",
portal_rotors[i].pr_name,
portal_rotors[i].pr_desc);
if (pos >= min_t(int, rc, buf_len)) {
rc = 0;
} else {
rc = cfs_trace_copyout_string(buffer, nob,
buf + pos, "\n");
}
goto out;
}
rc = cfs_trace_copyin_string(buf, buf_len, buffer, nob);
if (rc < 0)
goto out;
tmp = strim(buf);
rc = -EINVAL;
lnet_res_lock(0);
for (i = 0; portal_rotors[i].pr_name; i++) {
if (!strncasecmp(portal_rotors[i].pr_name, tmp,
strlen(portal_rotors[i].pr_name))) {
portal_rotor = portal_rotors[i].pr_value;
rc = 0;
break;
}
}
lnet_res_unlock(0);
out:
kfree(buf);
return rc;
}
static int proc_lnet_portal_rotor(struct ctl_table *table, int write,
void __user *buffer, size_t *lenp,
loff_t *ppos)
{
return lprocfs_call_handler(table->data, write, ppos, buffer, lenp,
__proc_lnet_portal_rotor);
}
static struct ctl_table lnet_table[] = {
/*
* NB No .strategy entries have been provided since sysctl(8) prefers
* to go via /proc for portability.
*/
{
.procname = "stats",
.mode = 0644,
.proc_handler = &proc_lnet_stats,
},
{
.procname = "routes",
.mode = 0444,
.proc_handler = &proc_lnet_routes,
},
{
.procname = "routers",
.mode = 0444,
.proc_handler = &proc_lnet_routers,
},
{
.procname = "peers",
.mode = 0444,
.proc_handler = &proc_lnet_peers,
},
{
.procname = "buffers",
.mode = 0444,
.proc_handler = &proc_lnet_buffers,
},
{
.procname = "nis",
.mode = 0444,
.proc_handler = &proc_lnet_nis,
},
{
.procname = "portal_rotor",
.mode = 0644,
.proc_handler = &proc_lnet_portal_rotor,
},
{
}
};
void lnet_router_debugfs_init(void)
{
lustre_insert_debugfs(lnet_table);
}
void lnet_router_debugfs_fini(void)
{
}
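The LNET_PROC_POS_MAKE and LNET_PROC_*_GET macros near the top of this file pack four fields (CPT index, table version, hash bucket, offset within the bucket) into the single loff_t file position, leaving the sign bit unused, so that a later read can resume where the previous one stopped and detect a changed table through the version field. The following user-space sketch shows the same packing with fixed, purely illustrative field widths. The `DEMO_*` widths and `demo_*` helpers are assumptions; the real widths derive from LNET_CPT_BITS, LNET_PEER_HASH_BITS, and sizeof(loff_t).

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative widths only; not the values used by the kernel code. */
#define DEMO_CPT_BITS	9
#define DEMO_VER_BITS	16
#define DEMO_HASH_BITS	9
/* One bit is left over so the packed position never goes negative. */
#define DEMO_HOFF_BITS	(64 - DEMO_CPT_BITS - DEMO_VER_BITS - DEMO_HASH_BITS - 1)

#define DEMO_HPOS_BITS	(DEMO_HASH_BITS + DEMO_HOFF_BITS)	/* hash + offset */
#define DEMO_VPOS_BITS	(DEMO_HPOS_BITS + DEMO_VER_BITS)	/* + version */

#define DEMO_MASK(bits)	((1ULL << (bits)) - 1)

/* Pack the four cursor fields into one signed 64-bit position. */
static int64_t demo_pos_make(unsigned int cpt, unsigned int ver,
			     unsigned int hash, uint64_t off)
{
	uint64_t pos;

	pos  = ((uint64_t)cpt & DEMO_MASK(DEMO_CPT_BITS)) << DEMO_VPOS_BITS;
	pos |= ((uint64_t)ver & DEMO_MASK(DEMO_VER_BITS)) << DEMO_HPOS_BITS;
	pos |= ((uint64_t)hash & DEMO_MASK(DEMO_HASH_BITS)) << DEMO_HOFF_BITS;
	pos |= off & DEMO_MASK(DEMO_HOFF_BITS);
	return (int64_t)pos;
}

static unsigned int demo_pos_cpt(int64_t pos)
{
	return (unsigned int)(((uint64_t)pos >> DEMO_VPOS_BITS) &
			      DEMO_MASK(DEMO_CPT_BITS));
}

static unsigned int demo_pos_ver(int64_t pos)
{
	return (unsigned int)(((uint64_t)pos >> DEMO_HPOS_BITS) &
			      DEMO_MASK(DEMO_VER_BITS));
}

static unsigned int demo_pos_hash(int64_t pos)
{
	return (unsigned int)(((uint64_t)pos >> DEMO_HOFF_BITS) &
			      DEMO_MASK(DEMO_HASH_BITS));
}

static uint64_t demo_pos_hoff(int64_t pos)
{
	return (uint64_t)pos & DEMO_MASK(DEMO_HOFF_BITS);
}
```

Note that the version field is truncated to its mask on packing, which is why the handlers above compare against LNET_PROC_VERSION(...) of the live counter and not against the raw counter value.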

@ -1,7 +0,0 @@
subdir-ccflags-y += -I$(srctree)/drivers/staging/lustre/include
subdir-ccflags-y += -I$(srctree)/drivers/staging/lustre/lustre/include
obj-$(CONFIG_LNET_SELFTEST) := lnet_selftest.o
lnet_selftest-y := console.o conrpc.o conctl.o framework.o timer.o rpc.o \
module.o ping_test.o brw_test.o

@ -1,526 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2007, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2011, 2015, Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Sun Microsystems, Inc.
*
* lnet/selftest/brw_test.c
*
* Author: Isaac Huang <isaac@clusterfs.com>
*/
#include "selftest.h"
static int brw_srv_workitems = SFW_TEST_WI_MAX;
module_param(brw_srv_workitems, int, 0644);
MODULE_PARM_DESC(brw_srv_workitems, "# BRW server workitems");
static int brw_inject_errors;
module_param(brw_inject_errors, int, 0644);
MODULE_PARM_DESC(brw_inject_errors, "# data errors to inject randomly, zero by default");
#define BRW_POISON 0xbeefbeefbeefbeefULL
#define BRW_MAGIC 0xeeb0eeb1eeb2eeb3ULL
#define BRW_MSIZE sizeof(u64)
static void
brw_client_fini(struct sfw_test_instance *tsi)
{
struct srpc_bulk *bulk;
struct sfw_test_unit *tsu;
LASSERT(tsi->tsi_is_client);
list_for_each_entry(tsu, &tsi->tsi_units, tsu_list) {
bulk = tsu->tsu_private;
if (!bulk)
continue;
srpc_free_bulk(bulk);
tsu->tsu_private = NULL;
}
}
static int
brw_client_init(struct sfw_test_instance *tsi)
{
struct sfw_session *sn = tsi->tsi_batch->bat_session;
int flags;
int off;
int npg;
int len;
int opc;
struct srpc_bulk *bulk;
struct sfw_test_unit *tsu;
LASSERT(sn);
LASSERT(tsi->tsi_is_client);
if (!(sn->sn_features & LST_FEAT_BULK_LEN)) {
struct test_bulk_req *breq = &tsi->tsi_u.bulk_v0;
opc = breq->blk_opc;
flags = breq->blk_flags;
npg = breq->blk_npg;
/*
* NB: this is not going to work for variable page size,
* but we have to keep it for compatibility
*/
len = npg * PAGE_SIZE;
off = 0;
} else {
struct test_bulk_req_v1 *breq = &tsi->tsi_u.bulk_v1;
/*
* This step should never be reached with an unknown feature,
* because make_session rejects unknown features
*/
LASSERT(!(sn->sn_features & ~LST_FEATS_MASK));
opc = breq->blk_opc;
flags = breq->blk_flags;
len = breq->blk_len;
off = breq->blk_offset & ~PAGE_MASK;
npg = (off + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
}
if (off % BRW_MSIZE)
return -EINVAL;
if (npg > LNET_MAX_IOV || npg <= 0)
return -EINVAL;
if (opc != LST_BRW_READ && opc != LST_BRW_WRITE)
return -EINVAL;
if (flags != LST_BRW_CHECK_NONE &&
flags != LST_BRW_CHECK_FULL && flags != LST_BRW_CHECK_SIMPLE)
return -EINVAL;
list_for_each_entry(tsu, &tsi->tsi_units, tsu_list) {
bulk = srpc_alloc_bulk(lnet_cpt_of_nid(tsu->tsu_dest.nid),
off, npg, len, opc == LST_BRW_READ);
if (!bulk) {
brw_client_fini(tsi);
return -ENOMEM;
}
tsu->tsu_private = bulk;
}
return 0;
}
static int brw_inject_one_error(void)
{
struct timespec64 ts;
if (brw_inject_errors <= 0)
return 0;
ktime_get_ts64(&ts);
if (!((ts.tv_nsec / NSEC_PER_USEC) & 1))
return 0;
return brw_inject_errors--;
}
static void
brw_fill_page(struct page *pg, int off, int len, int pattern, __u64 magic)
{
char *addr = page_address(pg) + off;
int i;
LASSERT(addr);
LASSERT(!(off % BRW_MSIZE) && !(len % BRW_MSIZE));
if (pattern == LST_BRW_CHECK_NONE)
return;
if (magic == BRW_MAGIC)
magic += brw_inject_one_error();
if (pattern == LST_BRW_CHECK_SIMPLE) {
memcpy(addr, &magic, BRW_MSIZE);
if (len > BRW_MSIZE) {
addr += PAGE_SIZE - BRW_MSIZE;
memcpy(addr, &magic, BRW_MSIZE);
}
return;
}
if (pattern == LST_BRW_CHECK_FULL) {
for (i = 0; i < len; i += BRW_MSIZE)
memcpy(addr + i, &magic, BRW_MSIZE);
return;
}
LBUG();
}
static int
brw_check_page(struct page *pg, int off, int len, int pattern, __u64 magic)
{
char *addr = page_address(pg) + off;
__u64 data = 0; /* make compiler happy */
int i;
LASSERT(addr);
LASSERT(!(off % BRW_MSIZE) && !(len % BRW_MSIZE));
if (pattern == LST_BRW_CHECK_NONE)
return 0;
if (pattern == LST_BRW_CHECK_SIMPLE) {
data = *((__u64 *)addr);
if (data != magic)
goto bad_data;
if (len > BRW_MSIZE) {
addr += PAGE_SIZE - BRW_MSIZE;
data = *((__u64 *)addr);
if (data != magic)
goto bad_data;
}
return 0;
}
if (pattern == LST_BRW_CHECK_FULL) {
for (i = 0; i < len; i += BRW_MSIZE) {
data = *(u64 *)(addr + i);
if (data != magic)
goto bad_data;
}
return 0;
}
LBUG();
bad_data:
CERROR("Bad data in page %p: %#llx, %#llx expected\n",
pg, data, magic);
return 1;
}
static void
brw_fill_bulk(struct srpc_bulk *bk, int pattern, __u64 magic)
{
int i;
struct page *pg;
for (i = 0; i < bk->bk_niov; i++) {
int off, len;
pg = bk->bk_iovs[i].bv_page;
off = bk->bk_iovs[i].bv_offset;
len = bk->bk_iovs[i].bv_len;
brw_fill_page(pg, off, len, pattern, magic);
}
}
static int
brw_check_bulk(struct srpc_bulk *bk, int pattern, __u64 magic)
{
int i;
struct page *pg;
for (i = 0; i < bk->bk_niov; i++) {
int off, len;
pg = bk->bk_iovs[i].bv_page;
off = bk->bk_iovs[i].bv_offset;
len = bk->bk_iovs[i].bv_len;
if (brw_check_page(pg, off, len, pattern, magic)) {
CERROR("Bulk page %p (%d/%d) is corrupted!\n",
pg, i, bk->bk_niov);
return 1;
}
}
return 0;
}
static int
brw_client_prep_rpc(struct sfw_test_unit *tsu, struct lnet_process_id dest,
struct srpc_client_rpc **rpcpp)
{
struct srpc_bulk *bulk = tsu->tsu_private;
struct sfw_test_instance *tsi = tsu->tsu_instance;
struct sfw_session *sn = tsi->tsi_batch->bat_session;
struct srpc_client_rpc *rpc;
struct srpc_brw_reqst *req;
int flags;
int npg;
int len;
int opc;
int rc;
LASSERT(sn);
LASSERT(bulk);
if (!(sn->sn_features & LST_FEAT_BULK_LEN)) {
struct test_bulk_req *breq = &tsi->tsi_u.bulk_v0;
opc = breq->blk_opc;
flags = breq->blk_flags;
npg = breq->blk_npg;
len = npg * PAGE_SIZE;
} else {
struct test_bulk_req_v1 *breq = &tsi->tsi_u.bulk_v1;
int off;
/*
* This step should never be reached with an unknown feature,
* because make_session rejects unknown features
*/
LASSERT(!(sn->sn_features & ~LST_FEATS_MASK));
opc = breq->blk_opc;
flags = breq->blk_flags;
len = breq->blk_len;
off = breq->blk_offset;
npg = (off + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
}
rc = sfw_create_test_rpc(tsu, dest, sn->sn_features, npg, len, &rpc);
if (rc)
return rc;
memcpy(&rpc->crpc_bulk, bulk, offsetof(struct srpc_bulk, bk_iovs[npg]));
if (opc == LST_BRW_WRITE)
brw_fill_bulk(&rpc->crpc_bulk, flags, BRW_MAGIC);
else
brw_fill_bulk(&rpc->crpc_bulk, flags, BRW_POISON);
req = &rpc->crpc_reqstmsg.msg_body.brw_reqst;
req->brw_flags = flags;
req->brw_rw = opc;
req->brw_len = len;
*rpcpp = rpc;
return 0;
}
static void
brw_client_done_rpc(struct sfw_test_unit *tsu, struct srpc_client_rpc *rpc)
{
__u64 magic = BRW_MAGIC;
struct sfw_test_instance *tsi = tsu->tsu_instance;
struct sfw_session *sn = tsi->tsi_batch->bat_session;
struct srpc_msg *msg = &rpc->crpc_replymsg;
struct srpc_brw_reply *reply = &msg->msg_body.brw_reply;
struct srpc_brw_reqst *reqst = &rpc->crpc_reqstmsg.msg_body.brw_reqst;
LASSERT(sn);
if (rpc->crpc_status) {
CERROR("BRW RPC to %s failed with %d\n",
libcfs_id2str(rpc->crpc_dest), rpc->crpc_status);
if (!tsi->tsi_stopping) /* rpc could have been aborted */
atomic_inc(&sn->sn_brw_errors);
return;
}
if (msg->msg_magic != SRPC_MSG_MAGIC) {
__swab64s(&magic);
__swab32s(&reply->brw_status);
}
CDEBUG(reply->brw_status ? D_WARNING : D_NET,
"BRW RPC to %s finished with brw_status: %d\n",
libcfs_id2str(rpc->crpc_dest), reply->brw_status);
if (reply->brw_status) {
atomic_inc(&sn->sn_brw_errors);
rpc->crpc_status = -(int)reply->brw_status;
return;
}
if (reqst->brw_rw == LST_BRW_WRITE)
return;
if (brw_check_bulk(&rpc->crpc_bulk, reqst->brw_flags, magic)) {
CERROR("Bulk data from %s is corrupted!\n",
libcfs_id2str(rpc->crpc_dest));
atomic_inc(&sn->sn_brw_errors);
rpc->crpc_status = -EBADMSG;
}
}
static void
brw_server_rpc_done(struct srpc_server_rpc *rpc)
{
struct srpc_bulk *blk = rpc->srpc_bulk;
if (!blk)
return;
if (rpc->srpc_status)
CERROR("Bulk transfer %s %s has failed: %d\n",
blk->bk_sink ? "from" : "to",
libcfs_id2str(rpc->srpc_peer), rpc->srpc_status);
else
CDEBUG(D_NET, "Transferred %d pages bulk data %s %s\n",
blk->bk_niov, blk->bk_sink ? "from" : "to",
libcfs_id2str(rpc->srpc_peer));
sfw_free_pages(rpc);
}
static int
brw_bulk_ready(struct srpc_server_rpc *rpc, int status)
{
__u64 magic = BRW_MAGIC;
struct srpc_brw_reply *reply = &rpc->srpc_replymsg.msg_body.brw_reply;
struct srpc_brw_reqst *reqst;
struct srpc_msg *reqstmsg;
LASSERT(rpc->srpc_bulk);
LASSERT(rpc->srpc_reqstbuf);
reqstmsg = &rpc->srpc_reqstbuf->buf_msg;
reqst = &reqstmsg->msg_body.brw_reqst;
if (status) {
CERROR("BRW bulk %s failed for RPC from %s: %d\n",
reqst->brw_rw == LST_BRW_READ ? "READ" : "WRITE",
libcfs_id2str(rpc->srpc_peer), status);
return -EIO;
}
if (reqst->brw_rw == LST_BRW_READ)
return 0;
if (reqstmsg->msg_magic != SRPC_MSG_MAGIC)
__swab64s(&magic);
if (brw_check_bulk(rpc->srpc_bulk, reqst->brw_flags, magic)) {
CERROR("Bulk data from %s is corrupted!\n",
libcfs_id2str(rpc->srpc_peer));
reply->brw_status = EBADMSG;
}
return 0;
}
static int
brw_server_handle(struct srpc_server_rpc *rpc)
{
struct srpc_service *sv = rpc->srpc_scd->scd_svc;
struct srpc_msg *replymsg = &rpc->srpc_replymsg;
struct srpc_msg *reqstmsg = &rpc->srpc_reqstbuf->buf_msg;
struct srpc_brw_reply *reply = &replymsg->msg_body.brw_reply;
struct srpc_brw_reqst *reqst = &reqstmsg->msg_body.brw_reqst;
int npg;
int rc;
LASSERT(sv->sv_id == SRPC_SERVICE_BRW);
if (reqstmsg->msg_magic != SRPC_MSG_MAGIC) {
LASSERT(reqstmsg->msg_magic == __swab32(SRPC_MSG_MAGIC));
__swab32s(&reqst->brw_rw);
__swab32s(&reqst->brw_len);
__swab32s(&reqst->brw_flags);
__swab64s(&reqst->brw_rpyid);
__swab64s(&reqst->brw_bulkid);
}
LASSERT(reqstmsg->msg_type == (__u32)srpc_service2request(sv->sv_id));
reply->brw_status = 0;
rpc->srpc_done = brw_server_rpc_done;
if ((reqst->brw_rw != LST_BRW_READ && reqst->brw_rw != LST_BRW_WRITE) ||
(reqst->brw_flags != LST_BRW_CHECK_NONE &&
reqst->brw_flags != LST_BRW_CHECK_FULL &&
reqst->brw_flags != LST_BRW_CHECK_SIMPLE)) {
reply->brw_status = EINVAL;
return 0;
}
if (reqstmsg->msg_ses_feats & ~LST_FEATS_MASK) {
replymsg->msg_ses_feats = LST_FEATS_MASK;
reply->brw_status = EPROTO;
return 0;
}
if (!(reqstmsg->msg_ses_feats & LST_FEAT_BULK_LEN)) {
/* compat with old version */
if (reqst->brw_len & ~PAGE_MASK) {
reply->brw_status = EINVAL;
return 0;
}
npg = reqst->brw_len >> PAGE_SHIFT;
} else {
npg = (reqst->brw_len + PAGE_SIZE - 1) >> PAGE_SHIFT;
}
replymsg->msg_ses_feats = reqstmsg->msg_ses_feats;
if (!reqst->brw_len || npg > LNET_MAX_IOV) {
reply->brw_status = EINVAL;
return 0;
}
rc = sfw_alloc_pages(rpc, rpc->srpc_scd->scd_cpt, npg,
reqst->brw_len,
reqst->brw_rw == LST_BRW_WRITE);
if (rc)
return rc;
if (reqst->brw_rw == LST_BRW_READ)
brw_fill_bulk(rpc->srpc_bulk, reqst->brw_flags, BRW_MAGIC);
else
brw_fill_bulk(rpc->srpc_bulk, reqst->brw_flags, BRW_POISON);
return 0;
}
struct sfw_test_client_ops brw_test_client;
void brw_init_test_client(void)
{
brw_test_client.tso_init = brw_client_init;
brw_test_client.tso_fini = brw_client_fini;
brw_test_client.tso_prep_rpc = brw_client_prep_rpc;
brw_test_client.tso_done_rpc = brw_client_done_rpc;
}
struct srpc_service brw_test_service;
void brw_init_test_service(void)
{
brw_test_service.sv_id = SRPC_SERVICE_BRW;
brw_test_service.sv_name = "brw_test";
brw_test_service.sv_handler = brw_server_handle;
brw_test_service.sv_bulk_ready = brw_bulk_ready;
brw_test_service.sv_wi_total = brw_srv_workitems;
}
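
Reviewer note on the removed brw_test.c: the fill/check pair above is easier to follow in isolation. The following is a minimal user-space sketch, not the kernel code. Every SKETCH_* name is hypothetical, the 4 KiB page size is an assumption, and the "simple" pattern here stamps the first and last 8-byte words of the segment, which differs from the removed code (that code advanced the address by PAGE_SIZE - BRW_MSIZE).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* All SKETCH_* names are hypothetical; they only mirror the removed code. */
#define SKETCH_PAGE_SIZE 4096u                 /* assumption: 4 KiB pages */
#define SKETCH_MSIZE     sizeof(uint64_t)      /* mirrors BRW_MSIZE */
#define SKETCH_MAGIC     0xeeb0eeb1eeb2eeb3ULL /* same value as BRW_MAGIC */

enum sketch_pattern {
	SKETCH_PATTERN_NONE,   /* no data verification */
	SKETCH_PATTERN_SIMPLE, /* stamp first and last word only */
	SKETCH_PATTERN_FULL,   /* stamp every word */
};

/* Pages needed to cover len bytes starting at in-page offset off. */
static size_t sketch_npages(size_t off, size_t len)
{
	return (off + len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/* Write the magic word into the segment according to the pattern. */
static void sketch_fill(unsigned char *page, size_t off, size_t len,
			enum sketch_pattern pattern, uint64_t magic)
{
	unsigned char *addr = page + off;
	size_t i;

	assert(off % SKETCH_MSIZE == 0 && len % SKETCH_MSIZE == 0);
	assert(off + len <= SKETCH_PAGE_SIZE);

	if (pattern == SKETCH_PATTERN_NONE)
		return;

	if (pattern == SKETCH_PATTERN_SIMPLE) {
		memcpy(addr, &magic, SKETCH_MSIZE);
		if (len > SKETCH_MSIZE)
			memcpy(addr + len - SKETCH_MSIZE, &magic, SKETCH_MSIZE);
		return;
	}

	for (i = 0; i < len; i += SKETCH_MSIZE)
		memcpy(addr + i, &magic, SKETCH_MSIZE);
}

/* Return 0 if the segment matches the pattern, 1 on the first mismatch. */
static int sketch_verify(const unsigned char *page, size_t off, size_t len,
			 enum sketch_pattern pattern, uint64_t magic)
{
	const unsigned char *addr = page + off;
	uint64_t data;
	size_t i;

	if (pattern == SKETCH_PATTERN_NONE)
		return 0;

	if (pattern == SKETCH_PATTERN_SIMPLE) {
		memcpy(&data, addr, SKETCH_MSIZE);
		if (data != magic)
			return 1;
		if (len > SKETCH_MSIZE) {
			memcpy(&data, addr + len - SKETCH_MSIZE, SKETCH_MSIZE);
			if (data != magic)
				return 1;
		}
		return 0;
	}

	for (i = 0; i < len; i += SKETCH_MSIZE) {
		memcpy(&data, addr + i, SKETCH_MSIZE);
		if (data != magic)
			return 1;
	}
	return 0;
}
```

The trade-off the sketch makes visible: the simple pattern touches two words per segment, so it is cheap but misses corruption in the middle of a page, while the full pattern catches any corrupted word at the cost of scanning the whole segment.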

@ -1,801 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2007, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2012, Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Sun Microsystems, Inc.
*
* lnet/selftest/conctl.c
*
* IOC handle in kernel
*
* Author: Liang Zhen <liangzhen@clusterfs.com>
*/
#include <linux/lnet/lib-lnet.h>
#include <uapi/linux/lnet/lnetst.h>
#include "console.h"
static int
lst_session_new_ioctl(struct lstio_session_new_args *args)
{
char name[LST_NAME_SIZE + 1];
int rc;
if (!args->lstio_ses_idp || /* address for output sid */
!args->lstio_ses_key || /* no key is specified */
!args->lstio_ses_namep || /* session name */
args->lstio_ses_nmlen <= 0 ||
args->lstio_ses_nmlen > LST_NAME_SIZE)
return -EINVAL;
if (copy_from_user(name, args->lstio_ses_namep,
args->lstio_ses_nmlen)) {
return -EFAULT;
}
name[args->lstio_ses_nmlen] = 0;
rc = lstcon_session_new(name,
args->lstio_ses_key,
args->lstio_ses_feats,
args->lstio_ses_timeout,
args->lstio_ses_force,
args->lstio_ses_idp);
return rc;
}
static int
lst_session_end_ioctl(struct lstio_session_end_args *args)
{
if (args->lstio_ses_key != console_session.ses_key)
return -EACCES;
return lstcon_session_end();
}
static int
lst_session_info_ioctl(struct lstio_session_info_args *args)
{
/* no checking of key */
if (!args->lstio_ses_idp || /* address for output sid */
!args->lstio_ses_keyp || /* address for output key */
!args->lstio_ses_featp || /* address for output features */
!args->lstio_ses_ndinfo || /* address for output ndinfo */
!args->lstio_ses_namep || /* address for output name */
args->lstio_ses_nmlen <= 0 ||
args->lstio_ses_nmlen > LST_NAME_SIZE)
return -EINVAL;
return lstcon_session_info(args->lstio_ses_idp,
args->lstio_ses_keyp,
args->lstio_ses_featp,
args->lstio_ses_ndinfo,
args->lstio_ses_namep,
args->lstio_ses_nmlen);
}
static int
lst_debug_ioctl(struct lstio_debug_args *args)
{
char name[LST_NAME_SIZE + 1];
int client = 1;
int rc;
if (args->lstio_dbg_key != console_session.ses_key)
return -EACCES;
if (!args->lstio_dbg_resultp)
return -EINVAL;
if (args->lstio_dbg_namep && /* name of batch/group */
(args->lstio_dbg_nmlen <= 0 ||
args->lstio_dbg_nmlen > LST_NAME_SIZE))
return -EINVAL;
if (args->lstio_dbg_namep) {
if (copy_from_user(name, args->lstio_dbg_namep,
args->lstio_dbg_nmlen))
return -EFAULT;
name[args->lstio_dbg_nmlen] = 0;
}
rc = -EINVAL;
switch (args->lstio_dbg_type) {
case LST_OPC_SESSION:
rc = lstcon_session_debug(args->lstio_dbg_timeout,
args->lstio_dbg_resultp);
break;
case LST_OPC_BATCHSRV:
client = 0;
/* fall through */
case LST_OPC_BATCHCLI:
if (!args->lstio_dbg_namep)
goto out;
rc = lstcon_batch_debug(args->lstio_dbg_timeout,
name, client, args->lstio_dbg_resultp);
break;
case LST_OPC_GROUP:
if (!args->lstio_dbg_namep)
goto out;
rc = lstcon_group_debug(args->lstio_dbg_timeout,
name, args->lstio_dbg_resultp);
break;
case LST_OPC_NODES:
if (args->lstio_dbg_count <= 0 ||
!args->lstio_dbg_idsp)
goto out;
rc = lstcon_nodes_debug(args->lstio_dbg_timeout,
args->lstio_dbg_count,
args->lstio_dbg_idsp,
args->lstio_dbg_resultp);
break;
default:
break;
}
out:
return rc;
}
static int
lst_group_add_ioctl(struct lstio_group_add_args *args)
{
char name[LST_NAME_SIZE + 1];
int rc;
if (args->lstio_grp_key != console_session.ses_key)
return -EACCES;
if (!args->lstio_grp_namep ||
args->lstio_grp_nmlen <= 0 ||
args->lstio_grp_nmlen > LST_NAME_SIZE)
return -EINVAL;
if (copy_from_user(name, args->lstio_grp_namep,
args->lstio_grp_nmlen))
return -EFAULT;
name[args->lstio_grp_nmlen] = 0;
rc = lstcon_group_add(name);
return rc;
}
static int
lst_group_del_ioctl(struct lstio_group_del_args *args)
{
int rc;
char name[LST_NAME_SIZE + 1];
if (args->lstio_grp_key != console_session.ses_key)
return -EACCES;
if (!args->lstio_grp_namep ||
args->lstio_grp_nmlen <= 0 ||
args->lstio_grp_nmlen > LST_NAME_SIZE)
return -EINVAL;
if (copy_from_user(name, args->lstio_grp_namep,
args->lstio_grp_nmlen))
return -EFAULT;
name[args->lstio_grp_nmlen] = 0;
rc = lstcon_group_del(name);
return rc;
}
static int
lst_group_update_ioctl(struct lstio_group_update_args *args)
{
int rc;
char name[LST_NAME_SIZE + 1];
if (args->lstio_grp_key != console_session.ses_key)
return -EACCES;
if (!args->lstio_grp_resultp ||
!args->lstio_grp_namep ||
args->lstio_grp_nmlen <= 0 ||
args->lstio_grp_nmlen > LST_NAME_SIZE)
return -EINVAL;
if (copy_from_user(name, args->lstio_grp_namep,
args->lstio_grp_nmlen))
return -EFAULT;
name[args->lstio_grp_nmlen] = 0;
switch (args->lstio_grp_opc) {
case LST_GROUP_CLEAN:
rc = lstcon_group_clean(name, args->lstio_grp_args);
break;
case LST_GROUP_REFRESH:
rc = lstcon_group_refresh(name, args->lstio_grp_resultp);
break;
case LST_GROUP_RMND:
if (args->lstio_grp_count <= 0 ||
!args->lstio_grp_idsp) {
rc = -EINVAL;
break;
}
rc = lstcon_nodes_remove(name, args->lstio_grp_count,
args->lstio_grp_idsp,
args->lstio_grp_resultp);
break;
default:
rc = -EINVAL;
break;
}
return rc;
}
static int
lst_nodes_add_ioctl(struct lstio_group_nodes_args *args)
{
unsigned int feats;
int rc;
char name[LST_NAME_SIZE + 1];
if (args->lstio_grp_key != console_session.ses_key)
return -EACCES;
if (!args->lstio_grp_idsp || /* array of ids */
args->lstio_grp_count <= 0 ||
!args->lstio_grp_resultp ||
!args->lstio_grp_featp ||
!args->lstio_grp_namep ||
args->lstio_grp_nmlen <= 0 ||
args->lstio_grp_nmlen > LST_NAME_SIZE)
return -EINVAL;
if (copy_from_user(name, args->lstio_grp_namep,
args->lstio_grp_nmlen))
return -EFAULT;
name[args->lstio_grp_nmlen] = 0;
rc = lstcon_nodes_add(name, args->lstio_grp_count,
args->lstio_grp_idsp, &feats,
args->lstio_grp_resultp);
if (!rc &&
copy_to_user(args->lstio_grp_featp, &feats, sizeof(feats))) {
return -EFAULT;
}
return rc;
}
static int
lst_group_list_ioctl(struct lstio_group_list_args *args)
{
if (args->lstio_grp_key != console_session.ses_key)
return -EACCES;
if (args->lstio_grp_idx < 0 ||
!args->lstio_grp_namep ||
args->lstio_grp_nmlen <= 0 ||
args->lstio_grp_nmlen > LST_NAME_SIZE)
return -EINVAL;
return lstcon_group_list(args->lstio_grp_idx,
args->lstio_grp_nmlen,
args->lstio_grp_namep);
}
static int
lst_group_info_ioctl(struct lstio_group_info_args *args)
{
char name[LST_NAME_SIZE + 1];
int ndent;
int index;
int rc;
if (args->lstio_grp_key != console_session.ses_key)
return -EACCES;
if (!args->lstio_grp_namep ||
args->lstio_grp_nmlen <= 0 ||
args->lstio_grp_nmlen > LST_NAME_SIZE)
return -EINVAL;
if (!args->lstio_grp_entp && /* output: group entry */
!args->lstio_grp_dentsp) /* output: node entry */
return -EINVAL;
if (args->lstio_grp_dentsp) { /* have node entry */
if (!args->lstio_grp_idxp || /* node index */
!args->lstio_grp_ndentp) /* # of node entry */
return -EINVAL;
if (copy_from_user(&ndent, args->lstio_grp_ndentp,
sizeof(ndent)) ||
copy_from_user(&index, args->lstio_grp_idxp,
sizeof(index)))
return -EFAULT;
if (ndent <= 0 || index < 0)
return -EINVAL;
}
if (copy_from_user(name, args->lstio_grp_namep,
args->lstio_grp_nmlen))
return -EFAULT;
name[args->lstio_grp_nmlen] = 0;
rc = lstcon_group_info(name, args->lstio_grp_entp,
&index, &ndent, args->lstio_grp_dentsp);
if (rc)
return rc;
if (args->lstio_grp_dentsp &&
(copy_to_user(args->lstio_grp_idxp, &index, sizeof(index)) ||
copy_to_user(args->lstio_grp_ndentp, &ndent, sizeof(ndent))))
return -EFAULT;
return 0;
}
static int
lst_batch_add_ioctl(struct lstio_batch_add_args *args)
{
int rc;
char name[LST_NAME_SIZE + 1];
if (args->lstio_bat_key != console_session.ses_key)
return -EACCES;
if (!args->lstio_bat_namep ||
args->lstio_bat_nmlen <= 0 ||
args->lstio_bat_nmlen > LST_NAME_SIZE)
return -EINVAL;
if (copy_from_user(name, args->lstio_bat_namep,
args->lstio_bat_nmlen))
return -EFAULT;
name[args->lstio_bat_nmlen] = 0;
rc = lstcon_batch_add(name);
return rc;
}
static int
lst_batch_run_ioctl(struct lstio_batch_run_args *args)
{
int rc;
char name[LST_NAME_SIZE + 1];
if (args->lstio_bat_key != console_session.ses_key)
return -EACCES;
if (!args->lstio_bat_namep ||
args->lstio_bat_nmlen <= 0 ||
args->lstio_bat_nmlen > LST_NAME_SIZE)
return -EINVAL;
if (copy_from_user(name, args->lstio_bat_namep,
args->lstio_bat_nmlen))
return -EFAULT;
name[args->lstio_bat_nmlen] = 0;
rc = lstcon_batch_run(name, args->lstio_bat_timeout,
args->lstio_bat_resultp);
return rc;
}
static int
lst_batch_stop_ioctl(struct lstio_batch_stop_args *args)
{
int rc;
char name[LST_NAME_SIZE + 1];
if (args->lstio_bat_key != console_session.ses_key)
return -EACCES;
if (!args->lstio_bat_resultp ||
!args->lstio_bat_namep ||
args->lstio_bat_nmlen <= 0 ||
args->lstio_bat_nmlen > LST_NAME_SIZE)
return -EINVAL;
if (copy_from_user(name, args->lstio_bat_namep,
args->lstio_bat_nmlen))
return -EFAULT;
name[args->lstio_bat_nmlen] = 0;
rc = lstcon_batch_stop(name, args->lstio_bat_force,
args->lstio_bat_resultp);
return rc;
}
static int
lst_batch_query_ioctl(struct lstio_batch_query_args *args)
{
char name[LST_NAME_SIZE + 1];
int rc;
if (args->lstio_bat_key != console_session.ses_key)
return -EACCES;
if (!args->lstio_bat_resultp ||
!args->lstio_bat_namep ||
args->lstio_bat_nmlen <= 0 ||
args->lstio_bat_nmlen > LST_NAME_SIZE)
return -EINVAL;
if (args->lstio_bat_testidx < 0)
return -EINVAL;
if (copy_from_user(name, args->lstio_bat_namep,
args->lstio_bat_nmlen))
return -EFAULT;
name[args->lstio_bat_nmlen] = 0;
rc = lstcon_test_batch_query(name,
args->lstio_bat_testidx,
args->lstio_bat_client,
args->lstio_bat_timeout,
args->lstio_bat_resultp);
return rc;
}
static int
lst_batch_list_ioctl(struct lstio_batch_list_args *args)
{
if (args->lstio_bat_key != console_session.ses_key)
return -EACCES;
if (args->lstio_bat_idx < 0 ||
!args->lstio_bat_namep ||
args->lstio_bat_nmlen <= 0 ||
args->lstio_bat_nmlen > LST_NAME_SIZE)
return -EINVAL;
return lstcon_batch_list(args->lstio_bat_idx,
args->lstio_bat_nmlen,
args->lstio_bat_namep);
}
static int
lst_batch_info_ioctl(struct lstio_batch_info_args *args)
{
char name[LST_NAME_SIZE + 1];
int rc;
int index;
int ndent;
if (args->lstio_bat_key != console_session.ses_key)
return -EACCES;
if (!args->lstio_bat_namep || /* batch name */
args->lstio_bat_nmlen <= 0 ||
args->lstio_bat_nmlen > LST_NAME_SIZE)
return -EINVAL;
if (!args->lstio_bat_entp && /* output: batch entry */
!args->lstio_bat_dentsp) /* output: node entry */
return -EINVAL;
if (args->lstio_bat_dentsp) { /* have node entry */
if (!args->lstio_bat_idxp || /* node index */
!args->lstio_bat_ndentp) /* # of node entry */
return -EINVAL;
if (copy_from_user(&index, args->lstio_bat_idxp,
sizeof(index)) ||
copy_from_user(&ndent, args->lstio_bat_ndentp,
sizeof(ndent)))
return -EFAULT;
if (ndent <= 0 || index < 0)
return -EINVAL;
}
if (copy_from_user(name, args->lstio_bat_namep,
args->lstio_bat_nmlen))
return -EFAULT;
name[args->lstio_bat_nmlen] = 0;
rc = lstcon_batch_info(name, args->lstio_bat_entp,
args->lstio_bat_server, args->lstio_bat_testidx,
&index, &ndent, args->lstio_bat_dentsp);
if (rc)
return rc;
if (args->lstio_bat_dentsp &&
(copy_to_user(args->lstio_bat_idxp, &index, sizeof(index)) ||
copy_to_user(args->lstio_bat_ndentp, &ndent, sizeof(ndent))))
rc = -EFAULT;
return rc;
}
static int
lst_stat_query_ioctl(struct lstio_stat_args *args)
{
int rc;
char name[LST_NAME_SIZE + 1];
/* TODO: not finished */
if (args->lstio_sta_key != console_session.ses_key)
return -EACCES;
if (!args->lstio_sta_resultp)
return -EINVAL;
if (args->lstio_sta_idsp) {
if (args->lstio_sta_count <= 0)
return -EINVAL;
rc = lstcon_nodes_stat(args->lstio_sta_count,
args->lstio_sta_idsp,
args->lstio_sta_timeout,
args->lstio_sta_resultp);
} else if (args->lstio_sta_namep) {
if (args->lstio_sta_nmlen <= 0 ||
args->lstio_sta_nmlen > LST_NAME_SIZE)
return -EINVAL;
rc = copy_from_user(name, args->lstio_sta_namep,
args->lstio_sta_nmlen);
if (!rc)
rc = lstcon_group_stat(name, args->lstio_sta_timeout,
args->lstio_sta_resultp);
else
rc = -EFAULT;
} else {
rc = -EINVAL;
}
return rc;
}
static int lst_test_add_ioctl(struct lstio_test_args *args)
{
char batch_name[LST_NAME_SIZE + 1];
char src_name[LST_NAME_SIZE + 1];
char dst_name[LST_NAME_SIZE + 1];
void *param = NULL;
int ret = 0;
int rc = -ENOMEM;
if (!args->lstio_tes_resultp ||
!args->lstio_tes_retp ||
!args->lstio_tes_bat_name || /* no specified batch */
args->lstio_tes_bat_nmlen <= 0 ||
args->lstio_tes_bat_nmlen > LST_NAME_SIZE ||
!args->lstio_tes_sgrp_name || /* no source group */
args->lstio_tes_sgrp_nmlen <= 0 ||
args->lstio_tes_sgrp_nmlen > LST_NAME_SIZE ||
!args->lstio_tes_dgrp_name || /* no target group */
args->lstio_tes_dgrp_nmlen <= 0 ||
args->lstio_tes_dgrp_nmlen > LST_NAME_SIZE)
return -EINVAL;
if (!args->lstio_tes_loop || /* negative is infinite */
args->lstio_tes_concur <= 0 ||
args->lstio_tes_dist <= 0 ||
args->lstio_tes_span <= 0)
return -EINVAL;
/* have parameter, check if parameter length is valid */
if (args->lstio_tes_param &&
(args->lstio_tes_param_len <= 0 ||
args->lstio_tes_param_len >
PAGE_SIZE - sizeof(struct lstcon_test)))
return -EINVAL;
/* Enforce zero parameter length if there's no parameter */
if (!args->lstio_tes_param && args->lstio_tes_param_len)
return -EINVAL;
if (args->lstio_tes_param) {
param = memdup_user(args->lstio_tes_param,
args->lstio_tes_param_len);
if (IS_ERR(param))
return PTR_ERR(param);
}
rc = -EFAULT;
if (copy_from_user(batch_name, args->lstio_tes_bat_name,
args->lstio_tes_bat_nmlen) ||
copy_from_user(src_name, args->lstio_tes_sgrp_name,
args->lstio_tes_sgrp_nmlen) ||
copy_from_user(dst_name, args->lstio_tes_dgrp_name,
args->lstio_tes_dgrp_nmlen))
goto out;
rc = lstcon_test_add(batch_name, args->lstio_tes_type,
args->lstio_tes_loop, args->lstio_tes_concur,
args->lstio_tes_dist, args->lstio_tes_span,
src_name, dst_name, param,
args->lstio_tes_param_len,
&ret, args->lstio_tes_resultp);
if (!rc && ret)
rc = (copy_to_user(args->lstio_tes_retp, &ret,
sizeof(ret))) ? -EFAULT : 0;
out:
kfree(param);
return rc;
}
int
lstcon_ioctl_entry(struct notifier_block *nb,
unsigned long cmd, void *vdata)
{
struct libcfs_ioctl_hdr *hdr = vdata;
char *buf = NULL;
struct libcfs_ioctl_data *data;
int opc;
int rc = -EINVAL;
if (cmd != IOC_LIBCFS_LNETST)
goto err;
data = container_of(hdr, struct libcfs_ioctl_data, ioc_hdr);
opc = data->ioc_u32[0];
if (data->ioc_plen1 > PAGE_SIZE)
goto err;
buf = kmalloc(data->ioc_plen1, GFP_KERNEL);
rc = -ENOMEM;
if (!buf)
goto err;
/* copy in parameter */
rc = -EFAULT;
if (copy_from_user(buf, data->ioc_pbuf1, data->ioc_plen1))
goto err;
mutex_lock(&console_session.ses_mutex);
console_session.ses_laststamp = ktime_get_real_seconds();
if (console_session.ses_shutdown) {
rc = -ESHUTDOWN;
goto out;
}
if (console_session.ses_expired)
lstcon_session_end();
if (opc != LSTIO_SESSION_NEW &&
console_session.ses_state == LST_SESSION_NONE) {
CDEBUG(D_NET, "LST no active session\n");
rc = -ESRCH;
goto out;
}
memset(&console_session.ses_trans_stat, 0, sizeof(struct lstcon_trans_stat));
switch (opc) {
case LSTIO_SESSION_NEW:
rc = lst_session_new_ioctl((struct lstio_session_new_args *)buf);
break;
case LSTIO_SESSION_END:
rc = lst_session_end_ioctl((struct lstio_session_end_args *)buf);
break;
case LSTIO_SESSION_INFO:
rc = lst_session_info_ioctl((struct lstio_session_info_args *)buf);
break;
case LSTIO_DEBUG:
rc = lst_debug_ioctl((struct lstio_debug_args *)buf);
break;
case LSTIO_GROUP_ADD:
rc = lst_group_add_ioctl((struct lstio_group_add_args *)buf);
break;
case LSTIO_GROUP_DEL:
rc = lst_group_del_ioctl((struct lstio_group_del_args *)buf);
break;
case LSTIO_GROUP_UPDATE:
rc = lst_group_update_ioctl((struct lstio_group_update_args *)buf);
break;
case LSTIO_NODES_ADD:
rc = lst_nodes_add_ioctl((struct lstio_group_nodes_args *)buf);
break;
case LSTIO_GROUP_LIST:
rc = lst_group_list_ioctl((struct lstio_group_list_args *)buf);
break;
case LSTIO_GROUP_INFO:
rc = lst_group_info_ioctl((struct lstio_group_info_args *)buf);
break;
case LSTIO_BATCH_ADD:
rc = lst_batch_add_ioctl((struct lstio_batch_add_args *)buf);
break;
case LSTIO_BATCH_START:
rc = lst_batch_run_ioctl((struct lstio_batch_run_args *)buf);
break;
case LSTIO_BATCH_STOP:
rc = lst_batch_stop_ioctl((struct lstio_batch_stop_args *)buf);
break;
case LSTIO_BATCH_QUERY:
rc = lst_batch_query_ioctl((struct lstio_batch_query_args *)buf);
break;
case LSTIO_BATCH_LIST:
rc = lst_batch_list_ioctl((struct lstio_batch_list_args *)buf);
break;
case LSTIO_BATCH_INFO:
rc = lst_batch_info_ioctl((struct lstio_batch_info_args *)buf);
break;
case LSTIO_TEST_ADD:
rc = lst_test_add_ioctl((struct lstio_test_args *)buf);
break;
case LSTIO_STAT_QUERY:
rc = lst_stat_query_ioctl((struct lstio_stat_args *)buf);
break;
default:
rc = -EINVAL;
goto out;
}
if (copy_to_user(data->ioc_pbuf2, &console_session.ses_trans_stat,
sizeof(struct lstcon_trans_stat)))
rc = -EFAULT;
out:
mutex_unlock(&console_session.ses_mutex);
err:
kfree(buf);
return notifier_from_ioctl_errno(rc);
}
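
Reviewer note on the removed conctl.c: nearly every handler above repeats the same three steps for a user-supplied name (bounds-check the length, copy, NUL-terminate). A minimal user-space sketch of that pattern follows. The helper name sketch_copy_name and the value of SKETCH_NAME_SIZE are assumptions made for illustration, and memcpy stands in for copy_from_user, so the -EFAULT path of the real handlers has no equivalent here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for LST_NAME_SIZE; the real value is an assumption. */
#define SKETCH_NAME_SIZE 32

/*
 * Copy nmlen bytes of src into dst and NUL-terminate.
 * dst must hold SKETCH_NAME_SIZE + 1 bytes, as in the removed handlers.
 * Returns 0 on success or -EINVAL for a missing name or bad length.
 */
static int sketch_copy_name(char *dst, const char *src, int nmlen)
{
	assert(dst != NULL);

	/* Reject before copying, so dst can never be overrun. */
	if (!src || nmlen <= 0 || nmlen > SKETCH_NAME_SIZE)
		return -EINVAL;

	memcpy(dst, src, (size_t)nmlen); /* kernel code: copy_from_user() */
	dst[nmlen] = '\0';
	return 0;
}
```

Sizing the buffer as SKETCH_NAME_SIZE + 1 is what makes the unconditional terminator write safe even when the caller passes the maximum length.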

File diff suppressed because it is too large

@ -1,142 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2007, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2011, Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Sun Microsystems, Inc.
*
* /lnet/selftest/conrpc.h
*
* Console rpc
*
* Author: Liang Zhen <liang@whamcloud.com>
*/
#ifndef __LST_CONRPC_H__
#define __LST_CONRPC_H__
#include <linux/lnet/lib-types.h>
#include <uapi/linux/lnet/lnetst.h>
#include "rpc.h"
#include "selftest.h"
/* Console rpc and rpc transaction */
#define LST_TRANS_TIMEOUT 30
#define LST_TRANS_MIN_TIMEOUT 3
#define LST_VALIDATE_TIMEOUT(t) min(max(t, LST_TRANS_MIN_TIMEOUT), LST_TRANS_TIMEOUT)
#define LST_PING_INTERVAL 8
struct lstcon_rpc_trans;
struct lstcon_tsb_hdr;
struct lstcon_test;
struct lstcon_node;
struct lstcon_rpc {
struct list_head crp_link; /* chain on rpc transaction */
struct srpc_client_rpc *crp_rpc; /* client rpc */
struct lstcon_node *crp_node; /* destination node */
struct lstcon_rpc_trans *crp_trans; /* conrpc transaction */
unsigned int crp_posted:1; /* rpc is posted */
unsigned int crp_finished:1; /* rpc is finished */
unsigned int crp_unpacked:1; /* reply is unpacked */
/** RPC is embedded in other structure and can't free it */
unsigned int crp_embedded:1;
int crp_status; /* console rpc errors */
unsigned long crp_stamp; /* replied time stamp */
};
struct lstcon_rpc_trans {
struct list_head tas_olink; /* link chain on owner list */
struct list_head tas_link; /* link chain on global list */
int tas_opc; /* operation code of transaction */
unsigned int tas_feats_updated; /* features mask is uptodate */
unsigned int tas_features; /* test features mask */
wait_queue_head_t tas_waitq; /* wait queue head */
atomic_t tas_remaining; /* # of un-scheduled rpcs */
struct list_head tas_rpcs_list; /* queued requests */
};
#define LST_TRANS_PRIVATE 0x1000
#define LST_TRANS_SESNEW (LST_TRANS_PRIVATE | 0x01)
#define LST_TRANS_SESEND (LST_TRANS_PRIVATE | 0x02)
#define LST_TRANS_SESQRY 0x03
#define LST_TRANS_SESPING 0x04
#define LST_TRANS_TSBCLIADD (LST_TRANS_PRIVATE | 0x11)
#define LST_TRANS_TSBSRVADD (LST_TRANS_PRIVATE | 0x12)
#define LST_TRANS_TSBRUN (LST_TRANS_PRIVATE | 0x13)
#define LST_TRANS_TSBSTOP (LST_TRANS_PRIVATE | 0x14)
#define LST_TRANS_TSBCLIQRY 0x15
#define LST_TRANS_TSBSRVQRY 0x16
#define LST_TRANS_STATQRY 0x21
typedef int (*lstcon_rpc_cond_func_t)(int, struct lstcon_node *, void *);
typedef int (*lstcon_rpc_readent_func_t)(int, struct srpc_msg *,
struct lstcon_rpc_ent __user *);
int lstcon_sesrpc_prep(struct lstcon_node *nd, int transop,
unsigned int version, struct lstcon_rpc **crpc);
int lstcon_dbgrpc_prep(struct lstcon_node *nd,
unsigned int version, struct lstcon_rpc **crpc);
int lstcon_batrpc_prep(struct lstcon_node *nd, int transop,
unsigned int version, struct lstcon_tsb_hdr *tsb,
struct lstcon_rpc **crpc);
int lstcon_testrpc_prep(struct lstcon_node *nd, int transop,
unsigned int version, struct lstcon_test *test,
struct lstcon_rpc **crpc);
int lstcon_statrpc_prep(struct lstcon_node *nd, unsigned int version,
struct lstcon_rpc **crpc);
void lstcon_rpc_put(struct lstcon_rpc *crpc);
int lstcon_rpc_trans_prep(struct list_head *translist,
int transop, struct lstcon_rpc_trans **transpp);
int lstcon_rpc_trans_ndlist(struct list_head *ndlist,
struct list_head *translist, int transop,
void *arg, lstcon_rpc_cond_func_t condition,
struct lstcon_rpc_trans **transpp);
void lstcon_rpc_trans_stat(struct lstcon_rpc_trans *trans,
struct lstcon_trans_stat *stat);
int lstcon_rpc_trans_interpreter(struct lstcon_rpc_trans *trans,
struct list_head __user *head_up,
lstcon_rpc_readent_func_t readent);
void lstcon_rpc_trans_abort(struct lstcon_rpc_trans *trans, int error);
void lstcon_rpc_trans_destroy(struct lstcon_rpc_trans *trans);
void lstcon_rpc_trans_addreq(struct lstcon_rpc_trans *trans,
struct lstcon_rpc *req);
int lstcon_rpc_trans_postwait(struct lstcon_rpc_trans *trans, int timeout);
int lstcon_rpc_pinger_start(void);
void lstcon_rpc_pinger_stop(void);
void lstcon_rpc_cleanup_wait(void);
int lstcon_rpc_module_init(void);
void lstcon_rpc_module_fini(void);
#endif
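The transaction opcodes in the header above are built by OR-ing LST_TRANS_PRIVATE into some of the small operation numbers. A minimal sketch of how such a flag bit can be tested and masked off follows. The two helper names are hypothetical and do not appear in the original code, and the header itself does not say what the private bit means to the console.

```c
#include <assert.h>
#include <stdbool.h>

/* Values copied from the header above. */
#define LST_TRANS_PRIVATE 0x1000
#define LST_TRANS_SESNEW  (LST_TRANS_PRIVATE | 0x01)
#define LST_TRANS_SESQRY  0x03

/* Hypothetical helper: true if the opcode carries the private bit. */
static bool lst_trans_is_private(int transop)
{
	return (transop & LST_TRANS_PRIVATE) != 0;
}

/* Hypothetical helper: the opcode with the private bit masked off. */
static int lst_trans_base(int transop)
{
	return transop & ~LST_TRANS_PRIVATE;
}
```

Keeping the flag in a high bit leaves the low bits free for the operation number, so both can travel in a single int such as tas_opc.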

File diff suppressed because it is too large

@ -1,244 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2007, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2011, 2012, Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Sun Microsystems, Inc.
*
* lnet/selftest/console.h
*
* kernel structure for LST console
*
* Author: Liang Zhen <liangzhen@clusterfs.com>
*/
#ifndef __LST_CONSOLE_H__
#define __LST_CONSOLE_H__
#include <linux/lnet/lib-types.h>
#include <uapi/linux/lnet/lnetst.h>
#include "selftest.h"
#include "conrpc.h"
/* node descriptor */
struct lstcon_node {
struct lnet_process_id nd_id; /* id of the node */
int nd_ref; /* reference count */
int nd_state; /* state of the node */
int nd_timeout; /* session timeout */
unsigned long nd_stamp; /* timestamp of last replied RPC */
struct lstcon_rpc nd_ping; /* ping rpc */
};
/* node link descriptor */
struct lstcon_ndlink {
struct list_head ndl_link; /* chain on list */
struct list_head ndl_hlink; /* chain on hash */
struct lstcon_node *ndl_node; /* pointer to node */
};
/* (alias of nodes) group descriptor */
struct lstcon_group {
struct list_head grp_link; /* chain on global group list
*/
int grp_ref; /* reference count */
int grp_userland; /* has userland nodes */
int grp_nnode; /* # of nodes */
char grp_name[LST_NAME_SIZE]; /* group name */
struct list_head grp_trans_list; /* transaction list */
struct list_head grp_ndl_list; /* nodes list */
struct list_head grp_ndl_hash[0]; /* hash table for nodes */
};
#define LST_BATCH_IDLE 0xB0 /* idle batch */
#define LST_BATCH_RUNNING 0xB1 /* running batch */
struct lstcon_tsb_hdr {
struct lst_bid tsb_id; /* batch ID */
int tsb_index; /* test index */
};
/* (tests) batch descriptor */
struct lstcon_batch {
struct lstcon_tsb_hdr bat_hdr; /* test_batch header */
struct list_head bat_link; /* chain on session's batches list */
int bat_ntest; /* # of tests */
int bat_state; /* state of the batch */
int bat_arg; /* parameter for run|stop, timeout
* for run, force for stop
*/
char bat_name[LST_NAME_SIZE];/* name of batch */
struct list_head bat_test_list; /* list head of tests (struct lstcon_test)
*/
struct list_head bat_trans_list; /* list head of transaction */
struct list_head bat_cli_list; /* list head of client nodes
* (struct lstcon_node)
*/
struct list_head *bat_cli_hash; /* hash table of client nodes */
struct list_head bat_srv_list; /* list head of server nodes */
struct list_head *bat_srv_hash; /* hash table of server nodes */
};
/* a single test descriptor */
struct lstcon_test {
struct lstcon_tsb_hdr tes_hdr; /* test batch header */
struct list_head tes_link; /* chain on batch's tests list */
struct lstcon_batch *tes_batch; /* pointer to batch */
int tes_type; /* type of the test, e.g. bulk, ping */
int tes_stop_onerr; /* stop on error */
int tes_oneside; /* one-sided test */
int tes_concur; /* concurrency */
int tes_loop; /* loop count */
int tes_dist; /* nodes distribution of target group */
int tes_span; /* nodes span of target group */
int tes_cliidx; /* client index, used for RPC creation */
struct list_head tes_trans_list; /* transaction list */
struct lstcon_group *tes_src_grp; /* group running the test */
struct lstcon_group *tes_dst_grp; /* target group */
int tes_paramlen; /* test parameter length */
char tes_param[0]; /* test parameter */
};
#define LST_GLOBAL_HASHSIZE 503 /* global nodes hash table size */
#define LST_NODE_HASHSIZE 239 /* node hash table (for batch or group) */
#define LST_SESSION_NONE 0x0 /* no session */
#define LST_SESSION_ACTIVE 0x1 /* working session */
#define LST_CONSOLE_TIMEOUT 300 /* default console timeout */
struct lstcon_session {
struct mutex ses_mutex; /* only 1 thread in session */
struct lst_sid ses_id; /* global session id */
int ses_key; /* local session key */
int ses_state; /* state of session */
int ses_timeout; /* timeout in seconds */
time64_t ses_laststamp; /* last operation stamp (seconds)
*/
unsigned int ses_features; /* tests features of the session
*/
unsigned int ses_feats_updated:1; /* features are synced with
* remote test nodes
*/
unsigned int ses_force:1; /* force creation */
unsigned int ses_shutdown:1; /* session is shutting down */
unsigned int ses_expired:1; /* console has timed out */
__u64 ses_id_cookie; /* batch id cookie */
char ses_name[LST_NAME_SIZE];/* session name */
struct lstcon_rpc_trans *ses_ping; /* session pinger */
struct stt_timer ses_ping_timer; /* timer for pinger */
struct lstcon_trans_stat ses_trans_stat; /* transaction stats */
struct list_head ses_trans_list; /* global list of transaction */
struct list_head ses_grp_list; /* global list of groups */
struct list_head ses_bat_list; /* global list of batches */
struct list_head ses_ndl_list; /* global list of nodes */
struct list_head *ses_ndl_hash; /* hash table of nodes */
spinlock_t ses_rpc_lock; /* serialize */
atomic_t ses_rpc_counter; /* # of initialized RPCs */
struct list_head ses_rpc_freelist; /* idle console rpc */
}; /* session descriptor */
extern struct lstcon_session console_session;
static inline struct lstcon_trans_stat *
lstcon_trans_stat(void)
{
return &console_session.ses_trans_stat;
}
static inline struct list_head *
lstcon_id2hash(struct lnet_process_id id, struct list_head *hash)
{
unsigned int idx = LNET_NIDADDR(id.nid) % LST_NODE_HASHSIZE;
return &hash[idx];
}
int lstcon_ioctl_entry(struct notifier_block *nb,
unsigned long cmd, void *vdata);
int lstcon_console_init(void);
int lstcon_console_fini(void);
int lstcon_session_match(struct lst_sid sid);
int lstcon_session_new(char *name, int key, unsigned int version,
int timeout, int flags, struct lst_sid __user *sid_up);
int lstcon_session_info(struct lst_sid __user *sid_up, int __user *key,
unsigned __user *verp, struct lstcon_ndlist_ent __user *entp,
char __user *name_up, int len);
int lstcon_session_end(void);
int lstcon_session_debug(int timeout, struct list_head __user *result_up);
int lstcon_session_feats_check(unsigned int feats);
int lstcon_batch_debug(int timeout, char *name,
int client, struct list_head __user *result_up);
int lstcon_group_debug(int timeout, char *name,
struct list_head __user *result_up);
int lstcon_nodes_debug(int timeout, int nnd,
struct lnet_process_id __user *nds_up,
struct list_head __user *result_up);
int lstcon_group_add(char *name);
int lstcon_group_del(char *name);
int lstcon_group_clean(char *name, int args);
int lstcon_group_refresh(char *name, struct list_head __user *result_up);
int lstcon_nodes_add(char *name, int nnd, struct lnet_process_id __user *nds_up,
unsigned int *featp, struct list_head __user *result_up);
int lstcon_nodes_remove(char *name, int nnd,
struct lnet_process_id __user *nds_up,
struct list_head __user *result_up);
int lstcon_group_info(char *name, struct lstcon_ndlist_ent __user *gent_up,
int *index_p, int *ndent_p,
struct lstcon_node_ent __user *ndents_up);
int lstcon_group_list(int idx, int len, char __user *name_up);
int lstcon_batch_add(char *name);
int lstcon_batch_run(char *name, int timeout,
struct list_head __user *result_up);
int lstcon_batch_stop(char *name, int force,
struct list_head __user *result_up);
int lstcon_test_batch_query(char *name, int testidx,
int client, int timeout,
struct list_head __user *result_up);
int lstcon_batch_del(char *name);
int lstcon_batch_list(int idx, int namelen, char __user *name_up);
int lstcon_batch_info(char *name, struct lstcon_test_batch_ent __user *ent_up,
int server, int testidx, int *index_p,
int *ndent_p, struct lstcon_node_ent __user *dents_up);
int lstcon_group_stat(char *grp_name, int timeout,
struct list_head __user *result_up);
int lstcon_nodes_stat(int count, struct lnet_process_id __user *ids_up,
int timeout, struct list_head __user *result_up);
int lstcon_test_add(char *batch_name, int type, int loop,
int concur, int dist, int span,
char *src_name, char *dst_name,
void *param, int paramlen, int *retp,
struct list_head __user *result_up);
#endif
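lstcon_id2hash() in the header above picks a hash bucket by reducing the address part of a node's NID modulo LST_NODE_HASHSIZE. The sketch below reproduces that calculation in user space. It assumes the address part is the low 32 bits of a 64-bit NID, which is how LNET_NIDADDR is understood here but is not shown in this diff; nid_addr() and node_bucket() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define LST_NODE_HASHSIZE 239 /* value from the header above */

/* Assumption: the address is the low 32 bits of a 64-bit NID. */
static uint32_t nid_addr(uint64_t nid)
{
	return (uint32_t)(nid & 0xffffffffu);
}

/* Bucket index in [0, LST_NODE_HASHSIZE), as in lstcon_id2hash(). */
static unsigned int node_bucket(uint64_t nid)
{
	return nid_addr(nid) % LST_NODE_HASHSIZE;
}
```

A prime table size such as 239 spreads addresses that differ by a regular stride (for example consecutive IPv4 hosts) across buckets instead of clustering them.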

File diff suppressed because it is too large

@ -1,169 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2007, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2012, Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Sun Microsystems, Inc.
*/
#define DEBUG_SUBSYSTEM S_LNET
#include "selftest.h"
#include "console.h"
enum {
LST_INIT_NONE = 0,
LST_INIT_WI_SERIAL,
LST_INIT_WI_TEST,
LST_INIT_RPC,
LST_INIT_FW,
LST_INIT_CONSOLE
};
static int lst_init_step = LST_INIT_NONE;
struct workqueue_struct *lst_serial_wq;
struct workqueue_struct **lst_test_wq;
static void
lnet_selftest_exit(void)
{
int i;
switch (lst_init_step) {
case LST_INIT_CONSOLE:
lstcon_console_fini();
/* fall through */
case LST_INIT_FW:
sfw_shutdown();
/* fall through */
case LST_INIT_RPC:
srpc_shutdown();
/* fall through */
case LST_INIT_WI_TEST:
for (i = 0;
i < cfs_cpt_number(lnet_cpt_table()); i++) {
if (!lst_test_wq[i])
continue;
destroy_workqueue(lst_test_wq[i]);
}
kvfree(lst_test_wq);
lst_test_wq = NULL;
/* fall through */
case LST_INIT_WI_SERIAL:
destroy_workqueue(lst_serial_wq);
lst_serial_wq = NULL;
/* fall through */
case LST_INIT_NONE:
break;
default:
LBUG();
}
}
static int
lnet_selftest_init(void)
{
int nscheds;
int rc;
int i;
rc = libcfs_setup();
if (rc)
return rc;
lst_serial_wq = alloc_ordered_workqueue("lst_s", 0);
if (!lst_serial_wq) {
CERROR("Failed to create serial WI scheduler for LST\n");
return -ENOMEM;
}
lst_init_step = LST_INIT_WI_SERIAL;
nscheds = cfs_cpt_number(lnet_cpt_table());
lst_test_wq = kvmalloc_array(nscheds, sizeof(lst_test_wq[0]),
GFP_KERNEL | __GFP_ZERO);
if (!lst_test_wq) {
rc = -ENOMEM;
goto error;
}
lst_init_step = LST_INIT_WI_TEST;
for (i = 0; i < nscheds; i++) {
int nthrs = cfs_cpt_weight(lnet_cpt_table(), i);
struct workqueue_attrs attrs = {0};
cpumask_var_t *mask = cfs_cpt_cpumask(lnet_cpt_table(), i);
/* reserve at least one CPU for LND */
nthrs = max(nthrs - 1, 1);
lst_test_wq[i] = alloc_workqueue("lst_t", WQ_UNBOUND, nthrs);
if (!lst_test_wq[i]) {
CWARN("Failed to create CPU partition affinity WI scheduler %d for LST\n",
i);
rc = -ENOMEM;
goto error;
}
if (mask && alloc_cpumask_var(&attrs.cpumask, GFP_KERNEL)) {
cpumask_copy(attrs.cpumask, *mask);
apply_workqueue_attrs(lst_test_wq[i], &attrs);
free_cpumask_var(attrs.cpumask);
}
}
rc = srpc_startup();
if (rc) {
CERROR("LST can't startup rpc\n");
goto error;
}
lst_init_step = LST_INIT_RPC;
rc = sfw_startup();
if (rc) {
CERROR("LST can't startup framework\n");
goto error;
}
lst_init_step = LST_INIT_FW;
rc = lstcon_console_init();
if (rc) {
CERROR("LST can't startup console\n");
goto error;
}
lst_init_step = LST_INIT_CONSOLE;
return 0;
error:
lnet_selftest_exit();
return rc;
}
MODULE_AUTHOR("OpenSFS, Inc. <http://www.lustre.org/>");
MODULE_DESCRIPTION("LNet Selftest");
MODULE_VERSION("2.7.0");
MODULE_LICENSE("GPL");
module_init(lnet_selftest_init);
module_exit(lnet_selftest_exit);
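The module above records how far initialization got in lst_init_step and lets lnet_selftest_exit() unwind from that point with a switch that falls through every earlier stage. A minimal user-space sketch of the same pattern follows. The stage names, the torn_down bitmask, and the fail_at parameter are all hypothetical and exist only to make the control flow observable.

```c
#include <assert.h>

enum { STEP_NONE = 0, STEP_A, STEP_B, STEP_C };

static int init_step = STEP_NONE; /* last stage that completed */
static int torn_down;             /* bitmask of stages undone (demo only) */

/* Undo every completed stage, newest first. */
static void staged_exit(void)
{
	switch (init_step) {
	case STEP_C:
		torn_down |= 1 << STEP_C;
		/* fall through */
	case STEP_B:
		torn_down |= 1 << STEP_B;
		/* fall through */
	case STEP_A:
		torn_down |= 1 << STEP_A;
		/* fall through */
	case STEP_NONE:
		break;
	}
	init_step = STEP_NONE;
}

/* Run the stages in order; fail_at names a stage that fails, 0 for none. */
static int staged_init(int fail_at)
{
	int step;

	torn_down = 0;
	for (step = STEP_A; step <= STEP_C; step++) {
		if (step == fail_at) {
			staged_exit(); /* unwind only what was set up */
			return -1;
		}
		init_step = step;
	}
	return 0;
}
```

Because the step variable is advanced only after a stage succeeds, the same exit routine serves both the error path and normal module unload.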

@ -1,228 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2007, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2012, Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Sun Microsystems, Inc.
*
* lnet/selftest/ping_test.c
*
* Test client & Server
*
* Author: Liang Zhen <liangzhen@clusterfs.com>
*/
#include "selftest.h"
#define LST_PING_TEST_MAGIC 0xbabeface
static int ping_srv_workitems = SFW_TEST_WI_MAX;
module_param(ping_srv_workitems, int, 0644);
MODULE_PARM_DESC(ping_srv_workitems, "# PING server workitems");
struct lst_ping_data {
spinlock_t pnd_lock; /* serialize */
int pnd_counter; /* sequence counter */
};
static struct lst_ping_data lst_ping_data;
static int
ping_client_init(struct sfw_test_instance *tsi)
{
struct sfw_session *sn = tsi->tsi_batch->bat_session;
LASSERT(tsi->tsi_is_client);
LASSERT(sn && !(sn->sn_features & ~LST_FEATS_MASK));
spin_lock_init(&lst_ping_data.pnd_lock);
lst_ping_data.pnd_counter = 0;
return 0;
}
static void
ping_client_fini(struct sfw_test_instance *tsi)
{
struct sfw_session *sn = tsi->tsi_batch->bat_session;
int errors;
LASSERT(sn);
LASSERT(tsi->tsi_is_client);
errors = atomic_read(&sn->sn_ping_errors);
if (errors)
CWARN("%d pings have failed.\n", errors);
else
CDEBUG(D_NET, "Ping test finished OK.\n");
}
static int
ping_client_prep_rpc(struct sfw_test_unit *tsu, struct lnet_process_id dest,
struct srpc_client_rpc **rpc)
{
struct srpc_ping_reqst *req;
struct sfw_test_instance *tsi = tsu->tsu_instance;
struct sfw_session *sn = tsi->tsi_batch->bat_session;
struct timespec64 ts;
int rc;
LASSERT(sn);
LASSERT(!(sn->sn_features & ~LST_FEATS_MASK));
rc = sfw_create_test_rpc(tsu, dest, sn->sn_features, 0, 0, rpc);
if (rc)
return rc;
req = &(*rpc)->crpc_reqstmsg.msg_body.ping_reqst;
req->pnr_magic = LST_PING_TEST_MAGIC;
spin_lock(&lst_ping_data.pnd_lock);
req->pnr_seq = lst_ping_data.pnd_counter++;
spin_unlock(&lst_ping_data.pnd_lock);
ktime_get_real_ts64(&ts);
req->pnr_time_sec = ts.tv_sec;
req->pnr_time_usec = ts.tv_nsec / NSEC_PER_USEC;
return rc;
}
static void
ping_client_done_rpc(struct sfw_test_unit *tsu, struct srpc_client_rpc *rpc)
{
struct sfw_test_instance *tsi = tsu->tsu_instance;
struct sfw_session *sn = tsi->tsi_batch->bat_session;
struct srpc_ping_reqst *reqst = &rpc->crpc_reqstmsg.msg_body.ping_reqst;
struct srpc_ping_reply *reply = &rpc->crpc_replymsg.msg_body.ping_reply;
struct timespec64 ts;
LASSERT(sn);
if (rpc->crpc_status) {
if (!tsi->tsi_stopping) /* rpc could have been aborted */
atomic_inc(&sn->sn_ping_errors);
CERROR("Unable to ping %s (%d): %d\n",
libcfs_id2str(rpc->crpc_dest),
reqst->pnr_seq, rpc->crpc_status);
return;
}
if (rpc->crpc_replymsg.msg_magic != SRPC_MSG_MAGIC) {
__swab32s(&reply->pnr_seq);
__swab32s(&reply->pnr_magic);
__swab32s(&reply->pnr_status);
}
if (reply->pnr_magic != LST_PING_TEST_MAGIC) {
rpc->crpc_status = -EBADMSG;
atomic_inc(&sn->sn_ping_errors);
CERROR("Bad magic %u from %s, %u expected.\n",
reply->pnr_magic, libcfs_id2str(rpc->crpc_dest),
LST_PING_TEST_MAGIC);
return;
}
if (reply->pnr_seq != reqst->pnr_seq) {
rpc->crpc_status = -EBADMSG;
atomic_inc(&sn->sn_ping_errors);
CERROR("Bad seq %u from %s, %u expected.\n",
reply->pnr_seq, libcfs_id2str(rpc->crpc_dest),
reqst->pnr_seq);
return;
}
ktime_get_real_ts64(&ts);
CDEBUG(D_NET, "%d reply in %u usec\n", reply->pnr_seq,
(unsigned int)((ts.tv_sec - reqst->pnr_time_sec) * 1000000 +
(ts.tv_nsec / NSEC_PER_USEC - reqst->pnr_time_usec)));
}
static int
ping_server_handle(struct srpc_server_rpc *rpc)
{
struct srpc_service *sv = rpc->srpc_scd->scd_svc;
struct srpc_msg *reqstmsg = &rpc->srpc_reqstbuf->buf_msg;
struct srpc_msg *replymsg = &rpc->srpc_replymsg;
struct srpc_ping_reqst *req = &reqstmsg->msg_body.ping_reqst;
struct srpc_ping_reply *rep = &rpc->srpc_replymsg.msg_body.ping_reply;
LASSERT(sv->sv_id == SRPC_SERVICE_PING);
if (reqstmsg->msg_magic != SRPC_MSG_MAGIC) {
LASSERT(reqstmsg->msg_magic == __swab32(SRPC_MSG_MAGIC));
__swab32s(&req->pnr_seq);
__swab32s(&req->pnr_magic);
__swab64s(&req->pnr_time_sec);
__swab64s(&req->pnr_time_usec);
}
LASSERT(reqstmsg->msg_type == srpc_service2request(sv->sv_id));
if (req->pnr_magic != LST_PING_TEST_MAGIC) {
CERROR("Unexpected magic %08x from %s\n",
req->pnr_magic, libcfs_id2str(rpc->srpc_peer));
return -EINVAL;
}
rep->pnr_seq = req->pnr_seq;
rep->pnr_magic = LST_PING_TEST_MAGIC;
if (reqstmsg->msg_ses_feats & ~LST_FEATS_MASK) {
replymsg->msg_ses_feats = LST_FEATS_MASK;
rep->pnr_status = EPROTO;
return 0;
}
replymsg->msg_ses_feats = reqstmsg->msg_ses_feats;
CDEBUG(D_NET, "Get ping %d from %s\n",
req->pnr_seq, libcfs_id2str(rpc->srpc_peer));
return 0;
}
struct sfw_test_client_ops ping_test_client;
void ping_init_test_client(void)
{
ping_test_client.tso_init = ping_client_init;
ping_test_client.tso_fini = ping_client_fini;
ping_test_client.tso_prep_rpc = ping_client_prep_rpc;
ping_test_client.tso_done_rpc = ping_client_done_rpc;
}
struct srpc_service ping_test_service;
void ping_init_test_service(void)
{
ping_test_service.sv_id = SRPC_SERVICE_PING;
ping_test_service.sv_name = "ping_test";
ping_test_service.sv_handler = ping_server_handle;
ping_test_service.sv_wi_total = ping_srv_workitems;
}
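ping_client_done_rpc() above reports the round-trip time by combining a seconds difference with a microseconds difference, where the current time arrives as seconds plus nanoseconds. The sketch below isolates that arithmetic. struct ts64 is a hypothetical stand-in for the kernel's struct timespec64, and ping_rtt_usec() is not a function from the original file.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_USEC 1000L

/* Hypothetical stand-in for struct timespec64. */
struct ts64 {
	int64_t tv_sec;
	long tv_nsec;
};

/* Elapsed microseconds between the request stamp and 'now'. */
static int64_t ping_rtt_usec(struct ts64 now, uint64_t sent_sec,
			     uint64_t sent_usec)
{
	return (now.tv_sec - (int64_t)sent_sec) * 1000000 +
	       (now.tv_nsec / NSEC_PER_USEC - (int64_t)sent_usec);
}
```

The microsecond term may be negative when the reply lands early in a later second; adding it to the scaled seconds difference still gives the right total. Since the stamps come from the wall clock, a clock adjustment between request and reply would skew the result.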

File diff suppressed because it is too large

@ -1,295 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2007, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2012, Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Sun Microsystems, Inc.
*/
#ifndef __SELFTEST_RPC_H__
#define __SELFTEST_RPC_H__
#include <uapi/linux/lnet/lnetst.h>
/*
* LST wire structures
*
* XXX: *REPLY == *REQST + 1
*/
enum srpc_msg_type {
SRPC_MSG_MKSN_REQST = 0,
SRPC_MSG_MKSN_REPLY = 1,
SRPC_MSG_RMSN_REQST = 2,
SRPC_MSG_RMSN_REPLY = 3,
SRPC_MSG_BATCH_REQST = 4,
SRPC_MSG_BATCH_REPLY = 5,
SRPC_MSG_STAT_REQST = 6,
SRPC_MSG_STAT_REPLY = 7,
SRPC_MSG_TEST_REQST = 8,
SRPC_MSG_TEST_REPLY = 9,
SRPC_MSG_DEBUG_REQST = 10,
SRPC_MSG_DEBUG_REPLY = 11,
SRPC_MSG_BRW_REQST = 12,
SRPC_MSG_BRW_REPLY = 13,
SRPC_MSG_PING_REQST = 14,
SRPC_MSG_PING_REPLY = 15,
SRPC_MSG_JOIN_REQST = 16,
SRPC_MSG_JOIN_REPLY = 17,
};
/* CAVEAT EMPTOR:
* The 1st field of every struct srpc_*_reqst must be the matchbits of the
* reply buffer, and the 2nd field the matchbits of the bulk buffer, if any.
*
* The 1st field of every struct srpc_*_reply must be a __u32 status, and the
* 2nd field the session id, if needed.
*/
struct srpc_generic_reqst {
__u64 rpyid; /* reply buffer matchbits */
__u64 bulkid; /* bulk buffer matchbits */
} WIRE_ATTR;
struct srpc_generic_reply {
__u32 status;
struct lst_sid sid;
} WIRE_ATTR;
/* FRAMEWORK RPCs */
struct srpc_mksn_reqst {
__u64 mksn_rpyid; /* reply buffer matchbits */
struct lst_sid mksn_sid; /* session id */
__u32 mksn_force; /* use brute force */
char mksn_name[LST_NAME_SIZE];
} WIRE_ATTR; /* make session request */
struct srpc_mksn_reply {
__u32 mksn_status; /* session status */
struct lst_sid mksn_sid; /* session id */
__u32 mksn_timeout; /* session timeout */
char mksn_name[LST_NAME_SIZE];
} WIRE_ATTR; /* make session reply */
struct srpc_rmsn_reqst {
__u64 rmsn_rpyid; /* reply buffer matchbits */
struct lst_sid rmsn_sid; /* session id */
} WIRE_ATTR; /* remove session request */
struct srpc_rmsn_reply {
__u32 rmsn_status;
struct lst_sid rmsn_sid; /* session id */
} WIRE_ATTR; /* remove session reply */
struct srpc_join_reqst {
__u64 join_rpyid; /* reply buffer matchbits */
struct lst_sid join_sid; /* session id to join */
char join_group[LST_NAME_SIZE]; /* group name */
} WIRE_ATTR;
struct srpc_join_reply {
__u32 join_status; /* returned status */
struct lst_sid join_sid; /* session id */
__u32 join_timeout; /* # of seconds of inactivity
* before the session expires
*/
char join_session[LST_NAME_SIZE]; /* session name */
} WIRE_ATTR;
struct srpc_debug_reqst {
__u64 dbg_rpyid; /* reply buffer matchbits */
struct lst_sid dbg_sid; /* session id */
__u32 dbg_flags; /* bitmap of debug */
} WIRE_ATTR;
struct srpc_debug_reply {
__u32 dbg_status; /* returned code */
struct lst_sid dbg_sid; /* session id */
__u32 dbg_timeout; /* session timeout */
__u32 dbg_nbatch; /* # of batches in the node */
char dbg_name[LST_NAME_SIZE]; /* session name */
} WIRE_ATTR;
#define SRPC_BATCH_OPC_RUN 1
#define SRPC_BATCH_OPC_STOP 2
#define SRPC_BATCH_OPC_QUERY 3
struct srpc_batch_reqst {
__u64 bar_rpyid; /* reply buffer matchbits */
struct lst_sid bar_sid; /* session id */
struct lst_bid bar_bid; /* batch id */
__u32 bar_opc; /* run/stop/query batch (SRPC_BATCH_OPC_*) */
__u32 bar_testidx; /* index of test */
__u32 bar_arg; /* parameters */
} WIRE_ATTR;
struct srpc_batch_reply {
__u32 bar_status; /* status of request */
struct lst_sid bar_sid; /* session id */
__u32 bar_active; /* # of active tests in batch/test */
__u32 bar_time; /* remaining time */
} WIRE_ATTR;
struct srpc_stat_reqst {
__u64 str_rpyid; /* reply buffer matchbits */
struct lst_sid str_sid; /* session id */
__u32 str_type; /* type of stat */
} WIRE_ATTR;
struct srpc_stat_reply {
__u32 str_status;
struct lst_sid str_sid;
struct sfw_counters str_fw;
struct srpc_counters str_rpc;
struct lnet_counters str_lnet;
} WIRE_ATTR;
struct test_bulk_req {
__u32 blk_opc; /* bulk operation code */
__u32 blk_npg; /* # of pages */
__u32 blk_flags; /* reserved flags */
} WIRE_ATTR;
struct test_bulk_req_v1 {
__u16 blk_opc; /* bulk operation code */
__u16 blk_flags; /* data check flags */
__u32 blk_len; /* data length */
__u32 blk_offset; /* offset */
} WIRE_ATTR;
struct test_ping_req {
__u32 png_size; /* size of ping message */
__u32 png_flags; /* reserved flags */
} WIRE_ATTR;
struct srpc_test_reqst {
__u64 tsr_rpyid; /* reply buffer matchbits */
__u64 tsr_bulkid; /* bulk buffer matchbits */
struct lst_sid tsr_sid; /* session id */
struct lst_bid tsr_bid; /* batch id */
__u32 tsr_service; /* test type: bulk|ping|... */
__u32 tsr_loop; /* test client loop count or
* # server buffers needed
*/
__u32 tsr_concur; /* concurrency of test */
__u8 tsr_is_client; /* is test client or not */
__u8 tsr_stop_onerr; /* stop on error */
__u32 tsr_ndest; /* # of dest nodes */
union {
struct test_ping_req ping;
struct test_bulk_req bulk_v0;
struct test_bulk_req_v1 bulk_v1;
} tsr_u;
} WIRE_ATTR;
struct srpc_test_reply {
__u32 tsr_status; /* returned code */
struct lst_sid tsr_sid;
} WIRE_ATTR;
/* TEST RPCs */
struct srpc_ping_reqst {
__u64 pnr_rpyid;
__u32 pnr_magic;
__u32 pnr_seq;
__u64 pnr_time_sec;
__u64 pnr_time_usec;
} WIRE_ATTR;
struct srpc_ping_reply {
__u32 pnr_status;
__u32 pnr_magic;
__u32 pnr_seq;
} WIRE_ATTR;
struct srpc_brw_reqst {
__u64 brw_rpyid; /* reply buffer matchbits */
__u64 brw_bulkid; /* bulk buffer matchbits */
__u32 brw_rw; /* read or write */
__u32 brw_len; /* bulk data len */
__u32 brw_flags; /* bulk data patterns */
} WIRE_ATTR; /* bulk r/w request */
struct srpc_brw_reply {
__u32 brw_status;
} WIRE_ATTR; /* bulk r/w reply */
#define SRPC_MSG_MAGIC 0xeeb0f00d
#define SRPC_MSG_VERSION 1
struct srpc_msg {
__u32 msg_magic; /* magic number */
__u32 msg_version; /* message version number */
__u32 msg_type; /* type of message body: srpc_msg_type */
__u32 msg_reserved0;
__u32 msg_reserved1;
__u32 msg_ses_feats; /* test session features */
union {
struct srpc_generic_reqst reqst;
struct srpc_generic_reply reply;
struct srpc_mksn_reqst mksn_reqst;
struct srpc_mksn_reply mksn_reply;
struct srpc_rmsn_reqst rmsn_reqst;
struct srpc_rmsn_reply rmsn_reply;
struct srpc_debug_reqst dbg_reqst;
struct srpc_debug_reply dbg_reply;
struct srpc_batch_reqst bat_reqst;
struct srpc_batch_reply bat_reply;
struct srpc_stat_reqst stat_reqst;
struct srpc_stat_reply stat_reply;
struct srpc_test_reqst tes_reqst;
struct srpc_test_reply tes_reply;
struct srpc_join_reqst join_reqst;
struct srpc_join_reply join_reply;
struct srpc_ping_reqst ping_reqst;
struct srpc_ping_reply ping_reply;
struct srpc_brw_reqst brw_reqst;
struct srpc_brw_reply brw_reply;
} msg_body;
} WIRE_ATTR;
static inline void
srpc_unpack_msg_hdr(struct srpc_msg *msg)
{
if (msg->msg_magic == SRPC_MSG_MAGIC)
return; /* no flipping needed */
/*
* We do not swap the magic number here as it is needed to
* determine whether the body needs to be swapped.
*/
/* __swab32s(&msg->msg_magic); */
__swab32s(&msg->msg_type);
__swab32s(&msg->msg_version);
__swab32s(&msg->msg_ses_feats);
__swab32s(&msg->msg_reserved0);
__swab32s(&msg->msg_reserved1);
}
#endif /* __SELFTEST_RPC_H__ */
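srpc_unpack_msg_hdr() above uses the magic number to detect a peer with the opposite byte order: if the magic does not match as read, the remaining header fields are byte-swapped, and the magic is left untouched so later code can still tell that the body needs swapping. A minimal user-space sketch of that idea follows. struct hdr is a trimmed, hypothetical header, swab32() is a local substitute for the kernel's __swab32(), and the -1 return for an unrecognized magic is an addition that the original function does not make.

```c
#include <assert.h>
#include <stdint.h>

#define SRPC_MSG_MAGIC 0xeeb0f00d /* value from the header above */

/* Trimmed, hypothetical header: magic plus two 32-bit fields. */
struct hdr {
	uint32_t magic;
	uint32_t version;
	uint32_t type;
};

/* Local substitute for the kernel's __swab32(). */
static uint32_t swab32(uint32_t x)
{
	return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
	       ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Returns 0 if native order, 1 if fields were swapped, -1 if unknown. */
static int hdr_unpack(struct hdr *h)
{
	if (h->magic == SRPC_MSG_MAGIC)
		return 0; /* no flipping needed */
	if (h->magic != swab32(SRPC_MSG_MAGIC))
		return -1; /* not one of our messages (added check) */

	/* magic is deliberately left as received */
	h->version = swab32(h->version);
	h->type = swab32(h->type);
	return 1;
}
```

Leaving the magic unswapped is what lets handlers such as ping_server_handle() repeat the same comparison later and decide whether their own body fields still need flipping.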

@ -1,622 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2007, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2012, Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Sun Microsystems, Inc.
*
* lnet/selftest/selftest.h
*
* Author: Isaac Huang <isaac@clusterfs.com>
*/
#ifndef __SELFTEST_SELFTEST_H__
#define __SELFTEST_SELFTEST_H__
#define LNET_ONLY
#include <linux/lnet/lib-lnet.h>
#include <linux/lnet/lib-types.h>
#include <uapi/linux/lnet/lnetst.h>
#include "rpc.h"
#include "timer.h"
#ifndef MADE_WITHOUT_COMPROMISE
#define MADE_WITHOUT_COMPROMISE
#endif
#define SWI_STATE_NEWBORN 0
#define SWI_STATE_REPLY_SUBMITTED 1
#define SWI_STATE_REPLY_SENT 2
#define SWI_STATE_REQUEST_SUBMITTED 3
#define SWI_STATE_REQUEST_SENT 4
#define SWI_STATE_REPLY_RECEIVED 5
#define SWI_STATE_BULK_STARTED 6
#define SWI_STATE_DONE 10
/* forward refs */
struct srpc_service;
struct srpc_service_cd;
struct sfw_test_unit;
struct sfw_test_instance;
/* services below SRPC_FRAMEWORK_SERVICE_MAX_ID are framework
* services, e.g. create/modify session.
*/
#define SRPC_SERVICE_DEBUG 0
#define SRPC_SERVICE_MAKE_SESSION 1
#define SRPC_SERVICE_REMOVE_SESSION 2
#define SRPC_SERVICE_BATCH 3
#define SRPC_SERVICE_TEST 4
#define SRPC_SERVICE_QUERY_STAT 5
#define SRPC_SERVICE_JOIN 6
#define SRPC_FRAMEWORK_SERVICE_MAX_ID 10
/* other services start from SRPC_FRAMEWORK_SERVICE_MAX_ID+1 */
#define SRPC_SERVICE_BRW 11
#define SRPC_SERVICE_PING 12
#define SRPC_SERVICE_MAX_ID 12
#define SRPC_REQUEST_PORTAL 50
/* a lazy portal for framework RPC requests */
#define SRPC_FRAMEWORK_REQUEST_PORTAL 51
/* all reply/bulk RDMAs go to this portal */
#define SRPC_RDMA_PORTAL 52
static inline enum srpc_msg_type
srpc_service2request(int service)
{
switch (service) {
default:
LBUG();
case SRPC_SERVICE_DEBUG:
return SRPC_MSG_DEBUG_REQST;
case SRPC_SERVICE_MAKE_SESSION:
return SRPC_MSG_MKSN_REQST;
case SRPC_SERVICE_REMOVE_SESSION:
return SRPC_MSG_RMSN_REQST;
case SRPC_SERVICE_BATCH:
return SRPC_MSG_BATCH_REQST;
case SRPC_SERVICE_TEST:
return SRPC_MSG_TEST_REQST;
case SRPC_SERVICE_QUERY_STAT:
return SRPC_MSG_STAT_REQST;
case SRPC_SERVICE_BRW:
return SRPC_MSG_BRW_REQST;
case SRPC_SERVICE_PING:
return SRPC_MSG_PING_REQST;
case SRPC_SERVICE_JOIN:
return SRPC_MSG_JOIN_REQST;
}
}
static inline enum srpc_msg_type
srpc_service2reply(int service)
{
return srpc_service2request(service) + 1;
}
enum srpc_event_type {
SRPC_BULK_REQ_RCVD = 1, /* passive bulk request(PUT sink/GET source)
* received
*/
SRPC_BULK_PUT_SENT = 2, /* active bulk PUT sent (source) */
SRPC_BULK_GET_RPLD = 3, /* active bulk GET replied (sink) */
SRPC_REPLY_RCVD = 4, /* incoming reply received */
SRPC_REPLY_SENT = 5, /* outgoing reply sent */
SRPC_REQUEST_RCVD = 6, /* incoming request received */
SRPC_REQUEST_SENT = 7, /* outgoing request sent */
};
/* RPC event */
struct srpc_event {
enum srpc_event_type ev_type; /* what's up */
enum lnet_event_kind ev_lnet; /* LNet event type */
int ev_fired; /* LNet event fired? */
int ev_status; /* LNet event status */
void *ev_data; /* owning server/client RPC */
};
/* bulk descriptor */
struct srpc_bulk {
int bk_len; /* len of bulk data */
struct lnet_handle_md bk_mdh;
int bk_sink; /* sink/source */
int bk_niov; /* # iov in bk_iovs */
struct bio_vec bk_iovs[0];
};
/* message buffer descriptor */
struct srpc_buffer {
struct list_head buf_list; /* chain on srpc_service::*_msgq */
struct srpc_msg buf_msg;
struct lnet_handle_md buf_mdh;
lnet_nid_t buf_self;
struct lnet_process_id buf_peer;
};
struct swi_workitem;
typedef void (*swi_action_t) (struct swi_workitem *);
struct swi_workitem {
struct workqueue_struct *swi_wq;
struct work_struct swi_work;
swi_action_t swi_action;
int swi_state;
};
/* server-side state of an RPC */
struct srpc_server_rpc {
/* chain on srpc_service::*_rpcq */
struct list_head srpc_list;
struct srpc_service_cd *srpc_scd;
struct swi_workitem srpc_wi;
struct srpc_event srpc_ev; /* bulk/reply event */
lnet_nid_t srpc_self;
struct lnet_process_id srpc_peer;
struct srpc_msg srpc_replymsg;
struct lnet_handle_md srpc_replymdh;
struct srpc_buffer *srpc_reqstbuf;
struct srpc_bulk *srpc_bulk;
unsigned int srpc_aborted; /* being given up */
int srpc_status;
void (*srpc_done)(struct srpc_server_rpc *);
};
/* client-side state of an RPC */
struct srpc_client_rpc {
struct list_head crpc_list; /* chain on user's lists */
spinlock_t crpc_lock; /* serialize */
int crpc_service;
atomic_t crpc_refcount;
int crpc_timeout; /* # seconds to wait for reply */
struct stt_timer crpc_timer;
struct swi_workitem crpc_wi;
struct lnet_process_id crpc_dest;
void (*crpc_done)(struct srpc_client_rpc *);
void (*crpc_fini)(struct srpc_client_rpc *);
int crpc_status; /* completion status */
void *crpc_priv; /* caller data */
/* state flags */
unsigned int crpc_aborted:1; /* being given up */
unsigned int crpc_closed:1; /* completed */
/* RPC events */
struct srpc_event crpc_bulkev; /* bulk event */
struct srpc_event crpc_reqstev; /* request event */
struct srpc_event crpc_replyev; /* reply event */
/* bulk, request(reqst), and reply exchanged on wire */
struct srpc_msg crpc_reqstmsg;
struct srpc_msg crpc_replymsg;
struct lnet_handle_md crpc_reqstmdh;
struct lnet_handle_md crpc_replymdh;
struct srpc_bulk crpc_bulk;
};
#define srpc_client_rpc_size(rpc) \
offsetof(struct srpc_client_rpc, crpc_bulk.bk_iovs[(rpc)->crpc_bulk.bk_niov])
#define srpc_client_rpc_addref(rpc) \
do { \
CDEBUG(D_NET, "RPC[%p] -> %s (%d)++\n", \
(rpc), libcfs_id2str((rpc)->crpc_dest), \
atomic_read(&(rpc)->crpc_refcount)); \
LASSERT(atomic_read(&(rpc)->crpc_refcount) > 0); \
atomic_inc(&(rpc)->crpc_refcount); \
} while (0)
#define srpc_client_rpc_decref(rpc) \
do { \
CDEBUG(D_NET, "RPC[%p] -> %s (%d)--\n", \
(rpc), libcfs_id2str((rpc)->crpc_dest), \
atomic_read(&(rpc)->crpc_refcount)); \
LASSERT(atomic_read(&(rpc)->crpc_refcount) > 0); \
if (atomic_dec_and_test(&(rpc)->crpc_refcount)) \
srpc_destroy_client_rpc(rpc); \
} while (0)
#define srpc_event_pending(rpc) (!(rpc)->crpc_bulkev.ev_fired || \
!(rpc)->crpc_reqstev.ev_fired || \
!(rpc)->crpc_replyev.ev_fired)
/* CPU partition data of srpc service */
struct srpc_service_cd {
/** serialize */
spinlock_t scd_lock;
/** backref to service */
struct srpc_service *scd_svc;
/** event buffer */
struct srpc_event scd_ev;
/** free RPC descriptors */
struct list_head scd_rpc_free;
/** in-flight RPCs */
struct list_head scd_rpc_active;
/** workitem for posting buffer */
struct swi_workitem scd_buf_wi;
/** CPT id */
int scd_cpt;
/** error code for scd_buf_wi */
int scd_buf_err;
/** timestamp for scd_buf_err */
time64_t scd_buf_err_stamp;
/** total # request buffers */
int scd_buf_total;
/** # posted request buffers */
int scd_buf_nposted;
/** buffer posting in progress */
int scd_buf_posting;
/** allocate more buffers if scd_buf_nposted < scd_buf_low */
int scd_buf_low;
/** increase/decrease some buffers */
int scd_buf_adjust;
/** posted message buffers */
struct list_head scd_buf_posted;
/** blocked for RPC descriptor */
struct list_head scd_buf_blocked;
};
/* number of server workitems (mini-thread) for testing service */
#define SFW_TEST_WI_MIN 256
#define SFW_TEST_WI_MAX 2048
/* extra buffers for tolerating buggy peers, or unbalanced number
* of peers between partitions
*/
#define SFW_TEST_WI_EXTRA 64
/* number of server workitems (mini-thread) for framework service */
#define SFW_FRWK_WI_MIN 16
#define SFW_FRWK_WI_MAX 256
struct srpc_service {
int sv_id; /* service id */
const char *sv_name; /* human readable name */
int sv_wi_total; /* total server workitems */
int sv_shuttingdown;
int sv_ncpts;
/* percpt data for srpc_service */
struct srpc_service_cd **sv_cpt_data;
/* Service callbacks:
* - sv_handler: process incoming RPC request
* - sv_bulk_ready: notify bulk data
*/
int (*sv_handler)(struct srpc_server_rpc *);
int (*sv_bulk_ready)(struct srpc_server_rpc *, int);
};
struct sfw_session {
struct list_head sn_list; /* chain on fw_zombie_sessions */
struct lst_sid sn_id; /* unique identifier */
unsigned int sn_timeout; /* # seconds' inactivity to expire */
int sn_timer_active;
unsigned int sn_features;
struct stt_timer sn_timer;
struct list_head sn_batches; /* list of batches */
char sn_name[LST_NAME_SIZE];
atomic_t sn_refcount;
atomic_t sn_brw_errors;
atomic_t sn_ping_errors;
unsigned long sn_started;
};
#define sfw_sid_equal(sid0, sid1) ((sid0).ses_nid == (sid1).ses_nid && \
(sid0).ses_stamp == (sid1).ses_stamp)
struct sfw_batch {
struct list_head bat_list; /* chain on sn_batches */
struct lst_bid bat_id; /* batch id */
int bat_error; /* error code of batch */
struct sfw_session *bat_session; /* batch's session */
atomic_t bat_nactive; /* # of active tests */
struct list_head bat_tests; /* test instances */
};
struct sfw_test_client_ops {
int (*tso_init)(struct sfw_test_instance *tsi); /* initialize test
* client
*/
void (*tso_fini)(struct sfw_test_instance *tsi); /* finalize test
* client
*/
int (*tso_prep_rpc)(struct sfw_test_unit *tsu,
struct lnet_process_id dest,
struct srpc_client_rpc **rpc); /* prep a test rpc */
void (*tso_done_rpc)(struct sfw_test_unit *tsu,
struct srpc_client_rpc *rpc); /* done a test rpc */
};
struct sfw_test_instance {
struct list_head tsi_list; /* chain on batch */
int tsi_service; /* test type */
struct sfw_batch *tsi_batch; /* batch */
struct sfw_test_client_ops *tsi_ops; /* test client operation
*/
/* public parameter for all test units */
unsigned int tsi_is_client:1; /* is test client */
unsigned int tsi_stoptsu_onerr:1; /* stop tsu on error */
int tsi_concur; /* concurrency */
int tsi_loop; /* loop count */
/* status of test instance */
spinlock_t tsi_lock; /* serialize */
unsigned int tsi_stopping:1; /* test is stopping */
atomic_t tsi_nactive; /* # of active test
* unit
*/
struct list_head tsi_units; /* test units */
struct list_head tsi_free_rpcs; /* free rpcs */
struct list_head tsi_active_rpcs; /* active rpcs */
union {
struct test_ping_req ping; /* ping parameter */
struct test_bulk_req bulk_v0; /* bulk parameter */
struct test_bulk_req_v1 bulk_v1; /* bulk v1 parameter */
} tsi_u;
};
/*
* XXX: the trailing (PAGE_SIZE % sizeof(struct lnet_process_id_packed)) bytes
* at the end of each page are not used
*/
#define SFW_MAX_CONCUR LST_MAX_CONCUR
#define SFW_ID_PER_PAGE (PAGE_SIZE / sizeof(struct lnet_process_id_packed))
#define SFW_MAX_NDESTS (LNET_MAX_IOV * SFW_ID_PER_PAGE)
#define sfw_id_pages(n) (((n) + SFW_ID_PER_PAGE - 1) / SFW_ID_PER_PAGE)
struct sfw_test_unit {
struct list_head tsu_list; /* chain on sfw_test_instance::tsi_units */
struct lnet_process_id tsu_dest; /* id of dest node */
int tsu_loop; /* loop count of the test */
struct sfw_test_instance *tsu_instance; /* pointer to test instance */
void *tsu_private; /* private data */
struct swi_workitem tsu_worker; /* workitem of the test unit */
};
struct sfw_test_case {
struct list_head tsc_list; /* chain on fw_tests */
struct srpc_service *tsc_srv_service; /* test service */
struct sfw_test_client_ops *tsc_cli_ops; /* ops of test client */
};
struct srpc_client_rpc *
sfw_create_rpc(struct lnet_process_id peer, int service,
unsigned int features, int nbulkiov, int bulklen,
void (*done)(struct srpc_client_rpc *), void *priv);
int sfw_create_test_rpc(struct sfw_test_unit *tsu,
struct lnet_process_id peer, unsigned int features,
int nblk, int blklen, struct srpc_client_rpc **rpc);
void sfw_abort_rpc(struct srpc_client_rpc *rpc);
void sfw_post_rpc(struct srpc_client_rpc *rpc);
void sfw_client_rpc_done(struct srpc_client_rpc *rpc);
void sfw_unpack_message(struct srpc_msg *msg);
void sfw_free_pages(struct srpc_server_rpc *rpc);
void sfw_add_bulk_page(struct srpc_bulk *bk, struct page *pg, int i);
int sfw_alloc_pages(struct srpc_server_rpc *rpc, int cpt, int npages, int len,
int sink);
int sfw_make_session(struct srpc_mksn_reqst *request,
struct srpc_mksn_reply *reply);
struct srpc_client_rpc *
srpc_create_client_rpc(struct lnet_process_id peer, int service,
int nbulkiov, int bulklen,
void (*rpc_done)(struct srpc_client_rpc *),
void (*rpc_fini)(struct srpc_client_rpc *), void *priv);
void srpc_post_rpc(struct srpc_client_rpc *rpc);
void srpc_abort_rpc(struct srpc_client_rpc *rpc, int why);
void srpc_free_bulk(struct srpc_bulk *bk);
struct srpc_bulk *srpc_alloc_bulk(int cpt, unsigned int off,
unsigned int bulk_npg, unsigned int bulk_len,
int sink);
void srpc_send_rpc(struct swi_workitem *wi);
int srpc_send_reply(struct srpc_server_rpc *rpc);
int srpc_add_service(struct srpc_service *sv);
int srpc_remove_service(struct srpc_service *sv);
void srpc_shutdown_service(struct srpc_service *sv);
void srpc_abort_service(struct srpc_service *sv);
int srpc_finish_service(struct srpc_service *sv);
int srpc_service_add_buffers(struct srpc_service *sv, int nbuffer);
void srpc_service_remove_buffers(struct srpc_service *sv, int nbuffer);
void srpc_get_counters(struct srpc_counters *cnt);
void srpc_set_counters(const struct srpc_counters *cnt);
extern struct workqueue_struct *lst_serial_wq;
extern struct workqueue_struct **lst_test_wq;
static inline int
srpc_serv_is_framework(struct srpc_service *svc)
{
return svc->sv_id < SRPC_FRAMEWORK_SERVICE_MAX_ID;
}
static void
swi_wi_action(struct work_struct *wi)
{
struct swi_workitem *swi;
swi = container_of(wi, struct swi_workitem, swi_work);
swi->swi_action(swi);
}
static inline void
swi_init_workitem(struct swi_workitem *swi,
swi_action_t action, struct workqueue_struct *wq)
{
swi->swi_wq = wq;
swi->swi_action = action;
swi->swi_state = SWI_STATE_NEWBORN;
INIT_WORK(&swi->swi_work, swi_wi_action);
}
static inline void
swi_schedule_workitem(struct swi_workitem *wi)
{
queue_work(wi->swi_wq, &wi->swi_work);
}
static inline int
swi_cancel_workitem(struct swi_workitem *swi)
{
return cancel_work_sync(&swi->swi_work);
}
int sfw_startup(void);
int srpc_startup(void);
void sfw_shutdown(void);
void srpc_shutdown(void);
static inline void
srpc_destroy_client_rpc(struct srpc_client_rpc *rpc)
{
LASSERT(rpc);
LASSERT(!srpc_event_pending(rpc));
LASSERT(!atomic_read(&rpc->crpc_refcount));
if (!rpc->crpc_fini)
kfree(rpc);
else
(*rpc->crpc_fini)(rpc);
}
static inline void
srpc_init_client_rpc(struct srpc_client_rpc *rpc, struct lnet_process_id peer,
int service, int nbulkiov, int bulklen,
void (*rpc_done)(struct srpc_client_rpc *),
void (*rpc_fini)(struct srpc_client_rpc *), void *priv)
{
LASSERT(nbulkiov <= LNET_MAX_IOV);
memset(rpc, 0, offsetof(struct srpc_client_rpc,
crpc_bulk.bk_iovs[nbulkiov]));
INIT_LIST_HEAD(&rpc->crpc_list);
swi_init_workitem(&rpc->crpc_wi, srpc_send_rpc,
lst_test_wq[lnet_cpt_of_nid(peer.nid)]);
spin_lock_init(&rpc->crpc_lock);
atomic_set(&rpc->crpc_refcount, 1); /* 1 ref for caller */
rpc->crpc_dest = peer;
rpc->crpc_priv = priv;
rpc->crpc_service = service;
rpc->crpc_bulk.bk_len = bulklen;
rpc->crpc_bulk.bk_niov = nbulkiov;
rpc->crpc_done = rpc_done;
rpc->crpc_fini = rpc_fini;
LNetInvalidateMDHandle(&rpc->crpc_reqstmdh);
LNetInvalidateMDHandle(&rpc->crpc_replymdh);
LNetInvalidateMDHandle(&rpc->crpc_bulk.bk_mdh);
/* no event is expected at this point */
rpc->crpc_bulkev.ev_fired = 1;
rpc->crpc_reqstev.ev_fired = 1;
rpc->crpc_replyev.ev_fired = 1;
rpc->crpc_reqstmsg.msg_magic = SRPC_MSG_MAGIC;
rpc->crpc_reqstmsg.msg_version = SRPC_MSG_VERSION;
rpc->crpc_reqstmsg.msg_type = srpc_service2request(service);
}
static inline const char *
swi_state2str(int state)
{
#define STATE2STR(x) case x: return #x
switch (state) {
default:
LBUG();
STATE2STR(SWI_STATE_NEWBORN);
STATE2STR(SWI_STATE_REPLY_SUBMITTED);
STATE2STR(SWI_STATE_REPLY_SENT);
STATE2STR(SWI_STATE_REQUEST_SUBMITTED);
STATE2STR(SWI_STATE_REQUEST_SENT);
STATE2STR(SWI_STATE_REPLY_RECEIVED);
STATE2STR(SWI_STATE_BULK_STARTED);
STATE2STR(SWI_STATE_DONE);
}
#undef STATE2STR
}
#define selftest_wait_events() \
do { \
set_current_state(TASK_UNINTERRUPTIBLE); \
schedule_timeout(HZ / 10); \
} while (0)
#define lst_wait_until(cond, lock, fmt, ...) \
do { \
int __I = 2; \
while (!(cond)) { \
CDEBUG(is_power_of_2(++__I) ? D_WARNING : D_NET, \
fmt, ## __VA_ARGS__); \
spin_unlock(&(lock)); \
\
selftest_wait_events(); \
\
spin_lock(&(lock)); \
} \
} while (0)
static inline void
srpc_wait_service_shutdown(struct srpc_service *sv)
{
int i = 2;
LASSERT(sv->sv_shuttingdown);
while (!srpc_finish_service(sv)) {
i++;
CDEBUG(((i & -i) == i) ? D_WARNING : D_NET,
"Waiting for %s service to shutdown...\n",
sv->sv_name);
selftest_wait_events();
}
}
extern struct sfw_test_client_ops brw_test_client;
void brw_init_test_client(void);
extern struct srpc_service brw_test_service;
void brw_init_test_service(void);
extern struct sfw_test_client_ops ping_test_client;
void ping_init_test_client(void);
extern struct srpc_service ping_test_service;
void ping_init_test_service(void);
#endif /* __SELFTEST_SELFTEST_H__ */
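
The two wait loops at the end of this header, lst_wait_until() and srpc_wait_service_shutdown(), both start a counter at 2, increment it on every pass, and log at D_WARNING only when the counter is a power of two; every other pass is logged at D_NET. Warnings therefore thin out exponentially the longer the wait lasts. A minimal user-space sketch of that throttling, assuming nothing beyond the arithmetic visible above; is_pow2() and warnings_after() are hypothetical names used only for illustration and are not part of the removed code:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * True when n has exactly one bit set. For n > 0 this is the same
 * test as the header's ((i & -i) == i) and the kernel's is_power_of_2().
 */
static bool is_pow2(unsigned int n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Hypothetical helper: how many of the first `iterations` passes of the
 * wait loop would be logged at warning level. The counter starts at 2
 * and is incremented before it is tested, as in the header.
 */
static unsigned int warnings_after(unsigned int iterations)
{
	unsigned int i = 2;
	unsigned int warnings = 0;

	assert(iterations < (1u << 20)); /* keep the illustrative loop bounded */

	while (iterations--) {
		if (is_pow2(++i))
			warnings++;
	}
	return warnings;
}
```

With one pass per selftest_wait_events() call (HZ / 10, about 0.1 s), warnings_after(14) counts the warnings issued at counter values 4, 8 and 16.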


@ -1,244 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* GPL HEADER START
*
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 only,
* as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License version 2 for more details (a copy is included
* in the LICENSE file that accompanied this code).
*
* You should have received a copy of the GNU General Public License
* version 2 along with this program; If not, see
* http://www.gnu.org/licenses/gpl-2.0.html
*
* GPL HEADER END
*/
/*
* Copyright (c) 2007, 2010, Oracle and/or its affiliates. All rights reserved.
* Use is subject to license terms.
*
* Copyright (c) 2011, 2012, Intel Corporation.
*/
/*
* This file is part of Lustre, http://www.lustre.org/
* Lustre is a trademark of Sun Microsystems, Inc.
*
* lnet/selftest/timer.c
*
* Author: Isaac Huang <isaac@clusterfs.com>
*/
#define DEBUG_SUBSYSTEM S_LNET
#include "selftest.h"
/*
* Timers are implemented as a sorted queue of expiry times. The queue
* is slotted, with each slot holding timers which expire in a
* 2**STTIMER_MINPOLL (8) second period. The timers in each slot are
* sorted by increasing expiry time. The number of slots is 2**7 (128),
* to cover a time period of 1024 seconds into the future before wrapping.
*/
#define STTIMER_MINPOLL 3 /* log2 min poll interval (8 s) */
#define STTIMER_SLOTTIME BIT(STTIMER_MINPOLL)
#define STTIMER_SLOTTIMEMASK (~(STTIMER_SLOTTIME - 1))
#define STTIMER_NSLOTS BIT(7)
#define STTIMER_SLOT(t) (&stt_data.stt_hash[(((t) >> STTIMER_MINPOLL) & \
(STTIMER_NSLOTS - 1))])
static struct st_timer_data {
spinlock_t stt_lock;
unsigned long stt_prev_slot; /* start time of the slot processed
* previously
*/
struct list_head stt_hash[STTIMER_NSLOTS];
int stt_shuttingdown;
wait_queue_head_t stt_waitq;
int stt_nthreads;
} stt_data;
void
stt_add_timer(struct stt_timer *timer)
{
struct list_head *pos;
spin_lock(&stt_data.stt_lock);
LASSERT(stt_data.stt_nthreads > 0);
LASSERT(!stt_data.stt_shuttingdown);
LASSERT(timer->stt_func);
LASSERT(list_empty(&timer->stt_list));
LASSERT(timer->stt_expires > ktime_get_real_seconds());
/* a simple insertion sort */
list_for_each_prev(pos, STTIMER_SLOT(timer->stt_expires)) {
struct stt_timer *old = list_entry(pos, struct stt_timer,
stt_list);
if (timer->stt_expires >= old->stt_expires)
break;
}
list_add(&timer->stt_list, pos);
spin_unlock(&stt_data.stt_lock);
}
/*
* The function returns whether it has deactivated a pending timer or not.
* (i.e. del_timer() of an inactive timer returns 0, del_timer() of an
* active timer returns 1.)
*
* CAVEAT EMPTOR:
* When 0 is returned, it is possible that timer->stt_func _is_ running on
* another CPU.
*/
int
stt_del_timer(struct stt_timer *timer)
{
int ret = 0;
spin_lock(&stt_data.stt_lock);
LASSERT(stt_data.stt_nthreads > 0);
LASSERT(!stt_data.stt_shuttingdown);
if (!list_empty(&timer->stt_list)) {
ret = 1;
list_del_init(&timer->stt_list);
}
spin_unlock(&stt_data.stt_lock);
return ret;
}
/* called with stt_data.stt_lock held */
static int
stt_expire_list(struct list_head *slot, time64_t now)
{
int expired = 0;
struct stt_timer *timer;
while (!list_empty(slot)) {
timer = list_entry(slot->next, struct stt_timer, stt_list);
if (timer->stt_expires > now)
break;
list_del_init(&timer->stt_list);
spin_unlock(&stt_data.stt_lock);
expired++;
(*timer->stt_func) (timer->stt_data);
spin_lock(&stt_data.stt_lock);
}
return expired;
}
static int
stt_check_timers(unsigned long *last)
{
int expired = 0;
time64_t now;
unsigned long this_slot;
now = ktime_get_real_seconds();
this_slot = now & STTIMER_SLOTTIMEMASK;
spin_lock(&stt_data.stt_lock);
while (time_after_eq(this_slot, *last)) {
expired += stt_expire_list(STTIMER_SLOT(this_slot), now);
this_slot = this_slot - STTIMER_SLOTTIME;
}
*last = now & STTIMER_SLOTTIMEMASK;
spin_unlock(&stt_data.stt_lock);
return expired;
}
static int
stt_timer_main(void *arg)
{
int rc = 0;
while (!stt_data.stt_shuttingdown) {
stt_check_timers(&stt_data.stt_prev_slot);
rc = wait_event_timeout(stt_data.stt_waitq,
stt_data.stt_shuttingdown,
STTIMER_SLOTTIME * HZ);
}
spin_lock(&stt_data.stt_lock);
stt_data.stt_nthreads--;
spin_unlock(&stt_data.stt_lock);
return rc;
}
static int
stt_start_timer_thread(void)
{
struct task_struct *task;
LASSERT(!stt_data.stt_shuttingdown);
task = kthread_run(stt_timer_main, NULL, "st_timer");
if (IS_ERR(task))
return PTR_ERR(task);
spin_lock(&stt_data.stt_lock);
stt_data.stt_nthreads++;
spin_unlock(&stt_data.stt_lock);
return 0;
}
int
stt_startup(void)
{
int rc = 0;
int i;
stt_data.stt_shuttingdown = 0;
stt_data.stt_prev_slot = ktime_get_real_seconds() & STTIMER_SLOTTIMEMASK;
spin_lock_init(&stt_data.stt_lock);
for (i = 0; i < STTIMER_NSLOTS; i++)
INIT_LIST_HEAD(&stt_data.stt_hash[i]);
stt_data.stt_nthreads = 0;
init_waitqueue_head(&stt_data.stt_waitq);
rc = stt_start_timer_thread();
if (rc)
CERROR("Can't spawn timer thread: %d\n", rc);
return rc;
}
void
stt_shutdown(void)
{
int i;
spin_lock(&stt_data.stt_lock);
for (i = 0; i < STTIMER_NSLOTS; i++)
LASSERT(list_empty(&stt_data.stt_hash[i]));
stt_data.stt_shuttingdown = 1;
wake_up(&stt_data.stt_waitq);
lst_wait_until(!stt_data.stt_nthreads, stt_data.stt_lock,
"waiting for %d threads to terminate\n",
stt_data.stt_nthreads);
spin_unlock(&stt_data.stt_lock);
}
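
The slot arithmetic described in the comment at the top of timer.c can be checked in user space. The sketch below copies the four STTIMER_* constants (with BIT() expanded to a plain shift) and wraps the expressions from STTIMER_SLOT() and stt_check_timers() in small helpers. The names slot_index(), slot_start(), wrap_period() and slots_visited() are hypothetical, and the sketch ignores the wraparound handling that time_after_eq() provides in the original:

```c
#include <assert.h>

/* Values copied from timer.c; BIT(n) is written out as 1UL << n. */
#define STTIMER_MINPOLL      3                        /* log2 of slot width */
#define STTIMER_SLOTTIME     (1UL << STTIMER_MINPOLL) /* 8 seconds per slot */
#define STTIMER_SLOTTIMEMASK (~(STTIMER_SLOTTIME - 1))
#define STTIMER_NSLOTS       (1UL << 7)               /* 128 slots */

/* Hypothetical helper: index of the slot holding a timer that expires at t. */
static unsigned long slot_index(unsigned long t)
{
	return (t >> STTIMER_MINPOLL) & (STTIMER_NSLOTS - 1);
}

/* Hypothetical helper: start time of the slot that contains t. */
static unsigned long slot_start(unsigned long t)
{
	return t & STTIMER_SLOTTIMEMASK;
}

/* Hypothetical helper: seconds covered before slot indices repeat. */
static unsigned long wrap_period(void)
{
	return STTIMER_NSLOTS * STTIMER_SLOTTIME;
}

/*
 * Hypothetical helper: number of slots the backward walk in
 * stt_check_timers() visits, from the slot containing `now` down to
 * the previously processed slot `last`, both inclusive.
 */
static unsigned long slots_visited(unsigned long now, unsigned long last)
{
	unsigned long this_slot = slot_start(now);

	assert(last == slot_start(last)); /* last is always slot-aligned */
	assert(this_slot >= last);        /* no wraparound in this sketch */

	return (this_slot - last) / STTIMER_SLOTTIME + 1;
}
```

For example, with now = 1003 and a previous slot start of 984, the walk visits the slots starting at 1000, 992 and 984. A timer expiring at 1024 lands back in slot 0, because 128 slots of 8 seconds cover 1024 seconds.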

Some files were not shown because too many files have changed in this diff.