Skip to content

Commit

Permalink
Merge branch 'net-smc-parallelism'
Browse files Browse the repository at this point in the history
D. Wythe says:

====================
net/smc: optimize the parallelism of SMC-R connections

This patch set attempts to optimize the parallelism of SMC-R connections,
mainly to reduce unnecessary blocking on locks, and to fix exceptions that
occur after thoses optimization.

According to Off-CPU graph, SMC worker's off-CPU as that:

smc_close_passive_work                  (1.09%)
        smcr_buf_unuse                  (1.08%)
                smc_llc_flow_initiate   (1.02%)

smc_listen_work                         (48.17%)
        __mutex_lock.isra.11            (47.96%)

An ideal SMC-R connection process should only block on the IO events
of the network, but it's quite clear that the SMC-R connection now is
queued on the lock most of the time.

The goal of this patchset is to achieve our ideal situation where
network IO events are blocked for the majority of the connection lifetime.

There are three big locks here:

1. smc_client_lgr_pending & smc_server_lgr_pending

2. llc_conf_mutex

3. rmbs_lock & sndbufs_lock

And an implementation issue:

1. confirm/delete rkey msg can't be sent concurrently while
protocol allows indeed.

Unfortunately,The above problems together affect the parallelism of
SMC-R connection. If any of them are not solved. our goal cannot
be achieved.

After this patch set, we can get a quite ideal off-CPU graph as
following:

smc_close_passive_work                                  (41.58%)
        smcr_buf_unuse                                  (41.57%)
                smc_llc_do_delete_rkey                  (41.57%)

smc_listen_work                                         (39.10%)
        smc_clc_wait_msg                                (13.18%)
                tcp_recvmsg_locked                      (13.18)
        smc_listen_find_device                          (25.87%)
                smcr_lgr_reg_rmbs                       (25.87%)
                        smc_llc_do_confirm_rkey         (25.87%)

We can see that most of the waiting times are waiting for network IO
events. This also has a certain performance improvement on our
short-lived conenction wrk/nginx benchmark test:

+--------------+------+------+-------+--------+------+--------+
|conns/qps     |c4    | c8   |  c16  |  c32   | c64  |  c200  |
+--------------+------+------+-------+--------+------+--------+
|SMC-R before  |9.7k  | 10k  |  10k  |  9.9k  | 9.1k |  8.9k  |
+--------------+------+------+-------+--------+------+--------+
|SMC-R now     |13k   | 19k  |  18k  |  16k   | 15k  |  12k   |
+--------------+------+------+-------+--------+------+--------+
|TCP           |15k   | 35k  |  51k  |  80k   | 100k |  162k  |
+--------------+------+------+-------+--------+------+--------+

The reason why the benefit is not obvious after the number of connections
has increased dues to workqueue. If we try to change workqueue to UNBOUND,
we can obtain at least 4-5 times performance improvement, reach up to half
of TCP. However, this is not an elegant solution, the optimization of it
will be much more complicated. But in any case, we will submit relevant
optimization patches as soon as possible.

Please note that the premise here is that the lock related problem
must be solved first, otherwise, no matter how we optimize the workqueue,
there won't be much improvement.

Because there are a lot of related changes to the code, if you have
any questions or suggestions, please let me know.

Thanks
D. Wythe

v1 -> v2:

1. Fix panic in SMC-D scenario
2. Fix lnkc related hashfn calculation exception, caused by operator
priority
3. Only wake up one connection if the lnk is not active
4. Delete obsolete unlock logic in smc_listen_work()
5. PATCH format, do Reverse Christmas tree
6. PATCH format, change all xxx_lnk_xxx function to xxx_link_xxx
7. PATCH format, add correct fix tag for the patches for fixes.
8. PATCH format, fix some spelling error
9. PATCH format, rename slow to do_slow

v2 -> v3:

1. add SMC-D support, remove the concept of link cluster since SMC-D has
no link at all. Replace it by lgr decision maker, who provides suggestions
to SMC-D and SMC-R on whether to create new link group.

2. Fix the corruption problem described by PATCH 'fix application
data exception' on SMC-D.

v3 -> v4:

1. Fix panic caused by uninitialization map.

v4 -> v5:

1. Make SMC-D buf creation be serial to avoid Potential error
2. Add a flag to synchronize the success of the first contact
with the ready of the link group, including SMC-D and SMC-R.
3. Fixed possible reference count leak in smc_llc_flow_start().
4. reorder the patch, make bugfix PATCH be ahead.

v5 -> v6:

1. Separate the bugfix patches to make it independent.
2. Merge patch 'fix SMC_CLC_DECL_ERR_REGRMB without smc_server_lgr_pending'
with patch 'remove locks smc_client_lgr_pending and smc_server_lgr_pending'
3. Format code styles, including alignment and reverse christmas tree
style.
4. Fix a possible memory leak in smc_llc_rmt_delete_rkey()
and smc_llc_rmt_conf_rkey().

v6 -> v7:

1. Discard patch attempting to remove global locks
2. Discard patch attempting make confirm/delete rkey process concurrently
====================

Signed-off-by: David S. Miller <[email protected]>
  • Loading branch information
davem330 committed Feb 4, 2023
2 parents 88c940c + aff7bfe commit 042b785
Show file tree
Hide file tree
Showing 4 changed files with 78 additions and 62 deletions.
25 changes: 20 additions & 5 deletions net/smc/af_smc.c
Original file line number Diff line number Diff line change
Expand Up @@ -502,15 +502,15 @@ static int smcr_lgr_reg_sndbufs(struct smc_link *link,
return -EINVAL;

/* protect against parallel smcr_link_reg_buf() */
mutex_lock(&lgr->llc_conf_mutex);
down_write(&lgr->llc_conf_mutex);
for (i = 0; i < SMC_LINKS_PER_LGR_MAX; i++) {
if (!smc_link_active(&lgr->lnk[i]))
continue;
rc = smcr_link_reg_buf(&lgr->lnk[i], snd_desc);
if (rc)
break;
}
mutex_unlock(&lgr->llc_conf_mutex);
up_write(&lgr->llc_conf_mutex);
return rc;
}

Expand All @@ -519,23 +519,38 @@ static int smcr_lgr_reg_rmbs(struct smc_link *link,
struct smc_buf_desc *rmb_desc)
{
struct smc_link_group *lgr = link->lgr;
bool do_slow = false;
int i, rc = 0;

rc = smc_llc_flow_initiate(lgr, SMC_LLC_FLOW_RKEY);
if (rc)
return rc;

down_read(&lgr->llc_conf_mutex);
for (i = 0; i < SMC_LINKS_PER_LGR_MAX; i++) {
if (!smc_link_active(&lgr->lnk[i]))
continue;
if (!rmb_desc->is_reg_mr[link->link_idx]) {
up_read(&lgr->llc_conf_mutex);
goto slow_path;
}
}
/* mr register already */
goto fast_path;
slow_path:
do_slow = true;
/* protect against parallel smc_llc_cli_rkey_exchange() and
* parallel smcr_link_reg_buf()
*/
mutex_lock(&lgr->llc_conf_mutex);
down_write(&lgr->llc_conf_mutex);
for (i = 0; i < SMC_LINKS_PER_LGR_MAX; i++) {
if (!smc_link_active(&lgr->lnk[i]))
continue;
rc = smcr_link_reg_buf(&lgr->lnk[i], rmb_desc);
if (rc)
goto out;
}

fast_path:
/* exchange confirm_rkey msg with peer */
rc = smc_llc_do_confirm_rkey(link, rmb_desc);
if (rc) {
Expand All @@ -544,7 +559,7 @@ static int smcr_lgr_reg_rmbs(struct smc_link *link,
}
rmb_desc->is_conf_rkey = true;
out:
mutex_unlock(&lgr->llc_conf_mutex);
do_slow ? up_write(&lgr->llc_conf_mutex) : up_read(&lgr->llc_conf_mutex);
smc_llc_flow_stop(lgr, &lgr->llc_flow_lcl);
return rc;
}
Expand Down
75 changes: 38 additions & 37 deletions net/smc/smc_core.c
Original file line number Diff line number Diff line change
Expand Up @@ -854,8 +854,8 @@ static int smc_lgr_create(struct smc_sock *smc, struct smc_init_info *ini)
lgr->freeing = 0;
lgr->vlan_id = ini->vlan_id;
refcount_set(&lgr->refcnt, 1); /* set lgr refcnt to 1 */
mutex_init(&lgr->sndbufs_lock);
mutex_init(&lgr->rmbs_lock);
init_rwsem(&lgr->sndbufs_lock);
init_rwsem(&lgr->rmbs_lock);
rwlock_init(&lgr->conns_lock);
for (i = 0; i < SMC_RMBE_SIZES; i++) {
INIT_LIST_HEAD(&lgr->sndbufs[i]);
Expand Down Expand Up @@ -1098,18 +1098,18 @@ struct smc_link *smc_switch_conns(struct smc_link_group *lgr,
static void smcr_buf_unuse(struct smc_buf_desc *buf_desc, bool is_rmb,
struct smc_link_group *lgr)
{
struct mutex *lock; /* lock buffer list */
struct rw_semaphore *lock; /* lock buffer list */
int rc;

if (is_rmb && buf_desc->is_conf_rkey && !list_empty(&lgr->list)) {
/* unregister rmb with peer */
rc = smc_llc_flow_initiate(lgr, SMC_LLC_FLOW_RKEY);
if (!rc) {
/* protect against smc_llc_cli_rkey_exchange() */
mutex_lock(&lgr->llc_conf_mutex);
down_read(&lgr->llc_conf_mutex);
smc_llc_do_delete_rkey(lgr, buf_desc);
buf_desc->is_conf_rkey = false;
mutex_unlock(&lgr->llc_conf_mutex);
up_read(&lgr->llc_conf_mutex);
smc_llc_flow_stop(lgr, &lgr->llc_flow_lcl);
}
}
Expand All @@ -1118,9 +1118,9 @@ static void smcr_buf_unuse(struct smc_buf_desc *buf_desc, bool is_rmb,
/* buf registration failed, reuse not possible */
lock = is_rmb ? &lgr->rmbs_lock :
&lgr->sndbufs_lock;
mutex_lock(lock);
down_write(lock);
list_del(&buf_desc->list);
mutex_unlock(lock);
up_write(lock);

smc_buf_free(lgr, is_rmb, buf_desc);
} else {
Expand Down Expand Up @@ -1224,15 +1224,16 @@ static void smcr_buf_unmap_lgr(struct smc_link *lnk)
int i;

for (i = 0; i < SMC_RMBE_SIZES; i++) {
mutex_lock(&lgr->rmbs_lock);
down_write(&lgr->rmbs_lock);
list_for_each_entry_safe(buf_desc, bf, &lgr->rmbs[i], list)
smcr_buf_unmap_link(buf_desc, true, lnk);
mutex_unlock(&lgr->rmbs_lock);
mutex_lock(&lgr->sndbufs_lock);
up_write(&lgr->rmbs_lock);

down_write(&lgr->sndbufs_lock);
list_for_each_entry_safe(buf_desc, bf, &lgr->sndbufs[i],
list)
smcr_buf_unmap_link(buf_desc, false, lnk);
mutex_unlock(&lgr->sndbufs_lock);
up_write(&lgr->sndbufs_lock);
}
}

Expand Down Expand Up @@ -1377,12 +1378,12 @@ static void smc_lgr_free(struct smc_link_group *lgr)
int i;

if (!lgr->is_smcd) {
mutex_lock(&lgr->llc_conf_mutex);
down_write(&lgr->llc_conf_mutex);
for (i = 0; i < SMC_LINKS_PER_LGR_MAX; i++) {
if (lgr->lnk[i].state != SMC_LNK_UNUSED)
smcr_link_clear(&lgr->lnk[i], false);
}
mutex_unlock(&lgr->llc_conf_mutex);
up_write(&lgr->llc_conf_mutex);
smc_llc_lgr_clear(lgr);
}

Expand Down Expand Up @@ -1696,12 +1697,12 @@ static void smcr_link_down(struct smc_link *lnk)
} else {
if (lgr->llc_flow_lcl.type != SMC_LLC_FLOW_NONE) {
/* another llc task is ongoing */
mutex_unlock(&lgr->llc_conf_mutex);
up_write(&lgr->llc_conf_mutex);
wait_event_timeout(lgr->llc_flow_waiter,
(list_empty(&lgr->list) ||
lgr->llc_flow_lcl.type == SMC_LLC_FLOW_NONE),
SMC_LLC_WAIT_TIME);
mutex_lock(&lgr->llc_conf_mutex);
down_write(&lgr->llc_conf_mutex);
}
if (!list_empty(&lgr->list)) {
smc_llc_send_delete_link(to_lnk, del_link_id,
Expand Down Expand Up @@ -1761,9 +1762,9 @@ static void smc_link_down_work(struct work_struct *work)
if (list_empty(&lgr->list))
return;
wake_up_all(&lgr->llc_msg_waiter);
mutex_lock(&lgr->llc_conf_mutex);
down_write(&lgr->llc_conf_mutex);
smcr_link_down(link);
mutex_unlock(&lgr->llc_conf_mutex);
up_write(&lgr->llc_conf_mutex);
}

static int smc_vlan_by_tcpsk_walk(struct net_device *lower_dev,
Expand Down Expand Up @@ -1990,19 +1991,19 @@ int smc_uncompress_bufsize(u8 compressed)
* buffer size; if not available, return NULL
*/
static struct smc_buf_desc *smc_buf_get_slot(int compressed_bufsize,
struct mutex *lock,
struct rw_semaphore *lock,
struct list_head *buf_list)
{
struct smc_buf_desc *buf_slot;

mutex_lock(lock);
down_read(lock);
list_for_each_entry(buf_slot, buf_list, list) {
if (cmpxchg(&buf_slot->used, 0, 1) == 0) {
mutex_unlock(lock);
up_read(lock);
return buf_slot;
}
}
mutex_unlock(lock);
up_read(lock);
return NULL;
}

Expand Down Expand Up @@ -2111,13 +2112,13 @@ int smcr_link_reg_buf(struct smc_link *link, struct smc_buf_desc *buf_desc)
return 0;
}

static int _smcr_buf_map_lgr(struct smc_link *lnk, struct mutex *lock,
static int _smcr_buf_map_lgr(struct smc_link *lnk, struct rw_semaphore *lock,
struct list_head *lst, bool is_rmb)
{
struct smc_buf_desc *buf_desc, *bf;
int rc = 0;

mutex_lock(lock);
down_write(lock);
list_for_each_entry_safe(buf_desc, bf, lst, list) {
if (!buf_desc->used)
continue;
Expand All @@ -2126,7 +2127,7 @@ static int _smcr_buf_map_lgr(struct smc_link *lnk, struct mutex *lock,
goto out;
}
out:
mutex_unlock(lock);
up_write(lock);
return rc;
}

Expand Down Expand Up @@ -2159,37 +2160,37 @@ int smcr_buf_reg_lgr(struct smc_link *lnk)
int i, rc = 0;

/* reg all RMBs for a new link */
mutex_lock(&lgr->rmbs_lock);
down_write(&lgr->rmbs_lock);
for (i = 0; i < SMC_RMBE_SIZES; i++) {
list_for_each_entry_safe(buf_desc, bf, &lgr->rmbs[i], list) {
if (!buf_desc->used)
continue;
rc = smcr_link_reg_buf(lnk, buf_desc);
if (rc) {
mutex_unlock(&lgr->rmbs_lock);
up_write(&lgr->rmbs_lock);
return rc;
}
}
}
mutex_unlock(&lgr->rmbs_lock);
up_write(&lgr->rmbs_lock);

if (lgr->buf_type == SMCR_PHYS_CONT_BUFS)
return rc;

/* reg all vzalloced sndbufs for a new link */
mutex_lock(&lgr->sndbufs_lock);
down_write(&lgr->sndbufs_lock);
for (i = 0; i < SMC_RMBE_SIZES; i++) {
list_for_each_entry_safe(buf_desc, bf, &lgr->sndbufs[i], list) {
if (!buf_desc->used || !buf_desc->is_vm)
continue;
rc = smcr_link_reg_buf(lnk, buf_desc);
if (rc) {
mutex_unlock(&lgr->sndbufs_lock);
up_write(&lgr->sndbufs_lock);
return rc;
}
}
}
mutex_unlock(&lgr->sndbufs_lock);
up_write(&lgr->sndbufs_lock);
return rc;
}

Expand Down Expand Up @@ -2247,7 +2248,7 @@ static int smcr_buf_map_usable_links(struct smc_link_group *lgr,
int i, rc = 0, cnt = 0;

/* protect against parallel link reconfiguration */
mutex_lock(&lgr->llc_conf_mutex);
down_read(&lgr->llc_conf_mutex);
for (i = 0; i < SMC_LINKS_PER_LGR_MAX; i++) {
struct smc_link *lnk = &lgr->lnk[i];

Expand All @@ -2260,7 +2261,7 @@ static int smcr_buf_map_usable_links(struct smc_link_group *lgr,
cnt++;
}
out:
mutex_unlock(&lgr->llc_conf_mutex);
up_read(&lgr->llc_conf_mutex);
if (!rc && !cnt)
rc = -EINVAL;
return rc;
Expand Down Expand Up @@ -2309,8 +2310,8 @@ static int __smc_buf_create(struct smc_sock *smc, bool is_smcd, bool is_rmb)
struct smc_link_group *lgr = conn->lgr;
struct list_head *buf_list;
int bufsize, bufsize_short;
struct rw_semaphore *lock; /* lock buffer list */
bool is_dgraded = false;
struct mutex *lock; /* lock buffer list */
int sk_buf_size;

if (is_rmb)
Expand Down Expand Up @@ -2358,9 +2359,9 @@ static int __smc_buf_create(struct smc_sock *smc, bool is_smcd, bool is_rmb)
SMC_STAT_RMB_ALLOC(smc, is_smcd, is_rmb);
SMC_STAT_RMB_SIZE(smc, is_smcd, is_rmb, bufsize);
buf_desc->used = 1;
mutex_lock(lock);
down_write(lock);
list_add(&buf_desc->list, buf_list);
mutex_unlock(lock);
up_write(lock);
break; /* found */
}

Expand Down Expand Up @@ -2434,9 +2435,9 @@ int smc_buf_create(struct smc_sock *smc, bool is_smcd)
/* create rmb */
rc = __smc_buf_create(smc, is_smcd, true);
if (rc) {
mutex_lock(&smc->conn.lgr->sndbufs_lock);
down_write(&smc->conn.lgr->sndbufs_lock);
list_del(&smc->conn.sndbuf_desc->list);
mutex_unlock(&smc->conn.lgr->sndbufs_lock);
up_write(&smc->conn.lgr->sndbufs_lock);
smc_buf_free(smc->conn.lgr, false, smc->conn.sndbuf_desc);
smc->conn.sndbuf_desc = NULL;
}
Expand Down
6 changes: 3 additions & 3 deletions net/smc/smc_core.h
Original file line number Diff line number Diff line change
Expand Up @@ -252,9 +252,9 @@ struct smc_link_group {
unsigned short vlan_id; /* vlan id of link group */

struct list_head sndbufs[SMC_RMBE_SIZES];/* tx buffers */
struct mutex sndbufs_lock; /* protects tx buffers */
struct rw_semaphore sndbufs_lock; /* protects tx buffers */
struct list_head rmbs[SMC_RMBE_SIZES]; /* rx buffers */
struct mutex rmbs_lock; /* protects rx buffers */
struct rw_semaphore rmbs_lock; /* protects rx buffers */

u8 id[SMC_LGR_ID_SIZE]; /* unique lgr id */
struct delayed_work free_work; /* delayed freeing of an lgr */
Expand Down Expand Up @@ -298,7 +298,7 @@ struct smc_link_group {
/* queue for llc events */
spinlock_t llc_event_q_lock;
/* protects llc_event_q */
struct mutex llc_conf_mutex;
struct rw_semaphore llc_conf_mutex;
/* protects lgr reconfig. */
struct work_struct llc_add_link_work;
struct work_struct llc_del_link_work;
Expand Down
Loading

0 comments on commit 042b785

Please sign in to comment.