
[qp_impl.hpp:131] poll till completion error: 12 transport retry counter exceeded #3

Open
minghust opened this issue Apr 9, 2021 · 4 comments


@minghust

minghust commented Apr 9, 2021

Hi, thanks for open-sourcing rlib! I've run into a problem while using it.

Background: on machine 1, a client thread t1 establishes two RC QP connections (QP1, QP2) with the server on machine 2, and then spawns a new thread t2. Next, t1 issues one-sided RDMA READs to the server over QP1, while t2 issues one-sided RDMA WRITEs over QP2. The READs and WRITEs run in parallel and never touch conflicting addresses.

This should work, but t2's RDMA WRITE never takes effect (I verified that the server-side memory region is left unmodified), and t2's poll on the CQ fails with a "transport retry counter exceeded" error.

The RDMA Aware Networks Programming User Manual (Rev 1.7) explains this error as:

6.2.13 IBV_WC_RETRY_EXC_ERR
This event is generated when a sender is unable to receive feedback from the receiver. This
means that either the receiver just never ACKs sender messages in a specified time period, or it
has been disconnected or it is in a bad state which prevents it from responding.

Strangely, if t2 issues its RDMA WRITE over QP1 instead, the write succeeds and polling works fine (note that QP1 and QP2 were both connected successfully, via two separate calls to the connect function of class RRCQP).

However, I'd rather not have t1 and t2 share one RCQP, because they would then contend for the same completion queue: t1 might poll a completion that actually belongs to t2's request and wrongly conclude that its own RDMA READ has finished, when the remote data may not have arrived yet.

Any insight would be appreciated, thanks!

@wxdwfc
Owner

wxdwfc commented Apr 10, 2021

Hi,

Could you provide concrete code that reproduces the problem? From the description alone I can't spot anything wrong.

PS: this project has moved to https://github.com/wxdwfc/rlibv2 for maintenance; if possible, please use the new version.

Thanks!

@minghust
Author

minghust commented Apr 10, 2021

OK, here is the concrete code.
Server side:

void Server::RDMAConnect(std::string& client_ip,
                         int client_port,
                         int client_id) {
    // Server has already registered two separate memory regions
    /************************************* RDMA Connection ***************************************/
    RDMA_LOG(INFO) << "Waiting for RDMA connecting compute nodes...";
    auto qp0 = rdma_ctrl->create_rc_qp(QPIdx{.node_id = client_id, .worker_id = 0, .index = 0},
                                           rdma_ctrl->get_device(),
                                           nullptr);
    while (qp0->connect(client_ip, client_port) != SUCC) {
        usleep(2000);
    }
    auto qp1 = rdma_ctrl->create_rc_qp(QPIdx{.node_id = client_id, .worker_id = 0, .index = 1},
                                          rdma_ctrl->get_device(),
                                          nullptr);
    while (qp1->connect(client_ip, client_port) != SUCC) {
        usleep(2000);
    }
    RDMA_LOG(INFO) << "Server: QP connected!";
}

Client side, thread t1 (QP setup):

    void PairQPConnect(RdmaCtrl* rdma_ctrl,
                       RemoteNode& remote_node, // struct RemoteNode {int node_id; std::string ip; int port;};
                       MemoryAttr remote_mr0, // has been prefetched via QP::get_remote_mr()
                       MemoryAttr remote_mr1, // has been prefetched via QP::get_remote_mr()
                       RNicHandler* opened_rnic) {
        // Create the two queue pairs
        MemoryAttr local_mr = rdma_ctrl->get_local_mr(CLIENT_MR_ID); // CLIENT_MR_ID is a magic number
        RCQP* qp0 = rdma_ctrl->create_rc_qp(
            QPIdx{.node_id = remote_node.node_id, .worker_id = 0, .index = 0},
            opened_rnic,
            &local_mr);
        qp0->bind_remote_mr(remote_mr0);

        RCQP* qp1 = rdma_ctrl->create_rc_qp(
            QPIdx{.node_id = remote_node.node_id, .worker_id = 0, .index = 1},
            opened_rnic,
            &local_mr);
        qp1->bind_remote_mr(remote_mr1);

        // Queue pair connection, exchange queue pair info via TCP
        while (qp0->connect(remote_node.ip, remote_node.port) != SUCC) {
            usleep(2000);
        }
        while (qp1->connect(remote_node.ip, remote_node.port) != SUCC) {
            usleep(2000);
        }
        RDMA_LOG(INFO) << "Client: QP connected!";
        qp0_array[remote_node.node_id] = qp0;
        qp1_array[remote_node.node_id] = qp1;
    }

Client side, thread t1 (RDMA READ):

int node_id = GetRemoteNodeID();
RCQP* qp = qp0_array[node_id];
size_t data_size = 1024;
char* read_buf = (char*) Rmalloc(data_size);
memset(read_buf, 0, data_size);
uint64_t remote_offset = 0;
auto rc = qp->post_send_to_mr(local_mr, remote_mr0, IBV_WR_RDMA_READ, read_buf, data_size, remote_offset, IBV_SEND_SIGNALED);
if (rc != SUCC) {
    RDMA_LOG(ERROR) << "client: post read fail. rc=" << rc;
}
ibv_wc wc{};
rc = qp->poll_till_completion(wc, no_timeout);
if (rc != SUCC) {
    RDMA_LOG(ERROR) << "client: poll read fail. rc=" << rc;
}
Rfree(read_buf);

Client side, thread t2 (RDMA WRITE):

int node_id = GetRemoteNodeID();
RCQP* qp = qp1_array[node_id];
size_t data_size = 1024;
char* write_buf = (char*) Rmalloc(data_size);
memset(write_buf, 0, data_size);
uint64_t remote_offset = 0;
auto rc = qp->post_send_to_mr(local_mr, remote_mr1, IBV_WR_RDMA_WRITE, write_buf, data_size, remote_offset, IBV_SEND_SIGNALED);
if (rc != SUCC) {
    RDMA_LOG(ERROR) << "client: post write fail. rc=" << rc;
}
ibv_wc wc{};
rc = qp->poll_till_completion(wc, no_timeout); // ERROR: [qp_impl.hpp:131] poll till completion error: 12 transport retry counter exceeded
if (rc != SUCC) {
    RDMA_LOG(ERROR) << "client: poll write fail. rc=" << rc;
}
Rfree(write_buf);

If I replace RCQP* qp = qp1_array[node_id]; in t2 with RCQP* qp = qp0_array[node_id];, everything works.

@wxdwfc
Owner

wxdwfc commented Apr 10, 2021

Hi,

For now, if you need to connect multiple QPs in rlib, I recommend going through RdmaCtrl (./rdma_ctrl.hpp); see the function link_symmetric_rcqps for details. Connecting QPs individually has not been thoroughly tested in rlib.

If you want to connect QPs individually, please use https://github.com/wxdwfc/rlibv2.
That repo makes it straightforward to set up individual QPs (see https://github.com/wxdwfc/rlibv2/blob/master/examples/rc_write/client.cc).
Apart from connection setup, v2 is used in essentially the same way as rlib, and it has been thoroughly tested and is fairly stable.

Finally, since rlib has already been migrated to https://github.com/wxdwfc/rlibv2,
I'd recommend switching to rlibv2: I no longer maintain rlib, whereas v2 is actively maintained.

Thanks!

@minghust
Author

OK, thanks for the explanation!
