
Using TCP/IP for DSM #2

Open
moharaka opened this issue Mar 11, 2020 · 4 comments

Comments

@moharaka

Hi,

I am trying to build with TCP/IP as the network transport. However, the build fails with this error:

$ make
...
arch/x86/kvm/krdma.c: In function ‘krdma_connect_single’:
arch/x86/kvm/krdma.c:325:2: warning: ignoring return value of ‘kstrtol’, declared with attribute warn_unused_result [-Wunused-result]
  kstrtol(port, 10, &portdec);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/x86/kvm/krdma.c: In function ‘krdma_listen’:
arch/x86/kvm/krdma.c:550:2: warning: ignoring return value of ‘kstrtol’, declared with attribute warn_unused_result [-Wunused-result]
  kstrtol(port, 10, &portdec);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~
  CC      arch/x86/kvm/dsm.o
arch/x86/kvm/dsm.c: In function ‘kvm_dsm_init’:
arch/x86/kvm/dsm.c:573:19: error: assignment from incompatible pointer type [-Werror=incompatible-pointer-types]
  network_ops.send = ktcp_send;
                   ^
arch/x86/kvm/dsm.c:574:22: error: assignment from incompatible pointer type [-Werror=incompatible-pointer-types]
  network_ops.receive = ktcp_receive;
                      ^
arch/x86/kvm/dsm.c:553:2: warning: ignoring return value of ‘copy_from_user’, declared with attribute warn_unused_result [-Wunused-result]
  copy_from_user(user_cluster_iplist, params->cluster_iplist, sizeof(void *) *
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    params->cluster_iplist_len);
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/x86/kvm/dsm.c:561:3: warning: ignoring return value of ‘strncpy_from_user’, declared with attribute warn_unused_result [-Wunused-result]
   strncpy_from_user(kvm->arch.cluster_iplist[i], user_cluster_iplist[i], 20);
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
scripts/Makefile.build:293: recipe for target 'arch/x86/kvm/dsm.o' failed
make[2]: *** [arch/x86/kvm/dsm.o] Error 1
scripts/Makefile.build:544: recipe for target 'arch/x86/kvm' failed
make[1]: *** [arch/x86/kvm] Error 2
Makefile:995: recipe for target 'arch/x86' failed
make: *** [arch/x86] Error 2

Any idea what causes this?
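
For readers who hit the same two errors: GCC reports "assignment from incompatible pointer type" when the function being assigned does not have the same signature as the function-pointer field. The log does not show the real declarations of network_ops.send or ktcp_send, so the sketch below uses hypothetical types and names (net_ops, tcp_conn, tcp_send, tcp_send_adapter) purely to illustrate the mismatch and one possible adapter-style fix. It is not the project's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical TCP connection type; the real one is not shown in the log. */
struct tcp_conn { int fd; };

/* Hypothetical ops table: the send slot takes an opaque connection handle. */
struct net_ops {
    int (*send)(void *conn, const char *buf, size_t len);
};

/* A transport function whose first parameter type differs from the slot. */
int tcp_send(struct tcp_conn *conn, const char *buf, size_t len)
{
    (void)conn;
    (void)buf;
    return (int)len; /* pretend every byte was sent */
}

/*
 * "ops->send = tcp_send;" would be diagnosed here, because
 * "struct tcp_conn *" and "void *" make the two function types differ.
 * An adapter with exactly the slot's signature avoids the mismatch.
 */
int tcp_send_adapter(void *conn, const char *buf, size_t len)
{
    assert(conn != NULL);
    return tcp_send((struct tcp_conn *)conn, buf, len);
}

/* Install the adapter into the ops table. */
void net_ops_use_tcp(struct net_ops *ops)
{
    ops->send = tcp_send_adapter;
}
```

Whether an adapter or a change to the transport function's own signature is the better fix depends on declarations that are not visible in this thread.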

@xianliang66
Contributor

Hi,

Unfortunately, we have not completed this part of the code. If you do want to use TCP, here's some advice:

Multiple vCPU threads share a single communication channel (backed by TCP or RDMA) to each remote node. When a thread sends a request over the channel, you must make sure the response it receives belongs to that thread. Protecting the whole send->receive sequence with a mutex is not recommended, unless you want to drown in a swamp of deadlocks. What we do for RDMA is associate each send->receive pair with a transaction id (tx_add->txid). The code in ivy.c guarantees that whenever the DSM software issues a network transmission, a txid is generated in send, and the DSM software then tries to retrieve the response from receive using that txid. You may need to manage a buffer in ktcp.c; consider how TCP itself handles out-of-order packets.
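
As a rough illustration of the txid matching described above, here is a minimal user-space sketch of a response buffer keyed by transaction id. Everything in it is an assumption made for illustration: the names (resp_table, resp_table_put, resp_table_take), the 16-bit txid width, and the fixed capacities are hypothetical and are not taken from ktcp.c or ivy.c. It is single-threaded; a kernel version would also need a way for requesters to sleep until their response arrives, plus protection for the table itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_PENDING 16 /* hypothetical table capacity */
#define MAX_PAYLOAD 64 /* hypothetical maximum response size */

/* One buffered response, tagged with the txid it answers. */
struct pending_resp {
    int      in_use;
    uint16_t txid; /* assumed width; the real txid type is not shown */
    size_t   len;
    char     payload[MAX_PAYLOAD];
};

struct resp_table {
    struct pending_resp slots[MAX_PENDING];
};

void resp_table_init(struct resp_table *t)
{
    memset(t, 0, sizeof(*t));
}

/* Stash a response read from the socket under its txid.
 * Returns 0 on success, -1 if the table is full or the payload is too big. */
int resp_table_put(struct resp_table *t, uint16_t txid,
                   const char *data, size_t len)
{
    assert(t != NULL && data != NULL);
    if (len > MAX_PAYLOAD)
        return -1;
    for (int i = 0; i < MAX_PENDING; i++) {
        struct pending_resp *s = &t->slots[i];
        if (!s->in_use) {
            s->in_use = 1;
            s->txid = txid;
            s->len = len;
            memcpy(s->payload, data, len);
            return 0;
        }
    }
    return -1;
}

/* Take the response for one txid, regardless of arrival order.
 * Returns its length, or -1 if it is not buffered or does not fit in out. */
long resp_table_take(struct resp_table *t, uint16_t txid,
                     char *out, size_t cap)
{
    assert(t != NULL && out != NULL);
    for (int i = 0; i < MAX_PENDING; i++) {
        struct pending_resp *s = &t->slots[i];
        if (s->in_use && s->txid == txid && s->len <= cap) {
            memcpy(out, s->payload, s->len);
            s->in_use = 0;
            return (long)s->len;
        }
    }
    return -1;
}
```

In this sketch, a response for txid 7 that arrives before the response for txid 3 simply waits in the table; the thread that asked with txid 3 never sees it, because resp_table_take only matches its own id.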

In addition, you may be disappointed to find that TCP is too slow to boot a vanilla Linux distribution such as Ubuntu. (Lightweight experimental OSes such as sv6 and Barrelfish are okay.) Swap device setup during boot may time out, soft lockups may be triggered, and so on. You probably know why few people research DSM in the 21st century.

@merimus

merimus commented Jan 9, 2023

TCP shouldn't be that much slower... is this just a current implementation limitation?

@xianliang66
Contributor

Well, for a single packet delivery, TCP is ~10 times slower than RDMA. The end-to-end results might be even worse (think about queuing theory). Some time-sensitive services in Linux (e.g., waiting for certain devices) may fail unless you hack the guest.
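
To make the queuing-theory remark concrete, here is a small sketch that assumes an M/M/1 queue, where the mean time a request spends in the system is 1 / (mu - lambda), with mu the service rate and lambda the arrival rate. The M/M/1 model and the function name mm1_response_time are assumptions chosen for illustration; the thread does not say which model applies to the real system.

```c
#include <assert.h>

/*
 * Mean time in system for an M/M/1 queue (an assumed model).
 * service_time: mean time to serve one request, in seconds.
 * arrival_rate: mean requests per second.
 * Returns -1.0 when the queue is unstable (arrivals >= service rate).
 */
double mm1_response_time(double service_time, double arrival_rate)
{
    assert(service_time > 0.0 && arrival_rate >= 0.0);
    double mu = 1.0 / service_time; /* service rate, requests per second */
    if (arrival_rate >= mu)
        return -1.0;
    return 1.0 / (mu - arrival_rate);
}
```

With purely hypothetical numbers, a 10 microsecond service time versus a 100 microsecond one (the ~10x gap mentioned above) at 5000 requests per second gives mean response times about 19x apart under this model, and at 20000 requests per second the slower transport can no longer keep up at all.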

@merimus

merimus commented Jan 9, 2023

I assume you are referring to latency, then?
