Abnormal packet sending #20

Closed
silencelbl opened this issue Dec 16, 2023 · 14 comments
Labels
bug Something isn't working

Comments

@silencelbl

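During testing, all servers use tnet as the network library. Process A connects to the process B server with tnet.DialTCP. Once a large number of clients have connected to process A and started interacting with it, process A exchanges messages with process B frequently, and after some time process B fails to decode a packet.
Log analysis combined with a tcpdump capture shows that process B received a malformed message packet. The capture shows that the message came from process A, yet no log entry or other record of that message being sent can be found in process A's Send method.
Process A's own message sending and receiving shows no anomalies or errors either. What could cause this? Judging from the captured packets, it looks as though process A's DialTCP connection sent, at a lower level, data belonging to other client connections, which caused the decode failure in process B.
From the code, DialTCP and Service currently use the same TCPHandler callback function.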

silencelbl added the bug label Dec 16, 2023
@WineChord
Contributor

Hello, could you please provide a reproducible repository and specify your working environment, including the Go version, operating system version, and architecture?

@silencelbl
Author

Go version: 1.21.3
Linux version 3.10.0-1160.66.1.el7.x86_64

@silencelbl
Author

silencelbl commented Dec 18, 2023 via email

@silencelbl
Author

Is anyone looking into this issue?

@WineChord
Contributor

@silencelbl Could you please directly provide a reproducible github repo? You can push your code to your public repository and provide a link here.

@WineChord
Contributor

I can think of a common scenario where this issue might occur. The root cause could be port reuse, where multiple processes are listening on the same port. To troubleshoot this, you can check if there are multiple processes using the same port.

@WineChord
Contributor

@silencelbl I still believe it would be best if you could directly provide a minimal reproducible code repository. This would minimize the communication effort required. Otherwise, I would need to ask for details about each issue mentioned in your description in order to reproduce them accurately. Could you please provide a repository? It would save time for both of us. Thank you.

@silencelbl
Author

> @silencelbl I still believe it would be best if you could directly provide a minimal reproducible code repository. This would minimize the communication effort required. Otherwise, I would need to ask for details about each issue mentioned in your description in order to reproduce them accurately. Could you please provide a repository? It would save time for both of us. Thank you.

I'll try to find time over the next couple of days to put together code that reproduces the problem. Could we add each other on WeChat (VX)?

@silencelbl
Author

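> I can think of a common scenario where this issue might occur. The root cause could be port reuse, where multiple processes are listening on the same port. To troubleshoot this, you can check if there are multiple processes using the same port.

Although the test environment runs several service processes on one machine, the client runs on its own on a separate load-generation machine, and the listening port of each service process has been checked. After the failure in the tnet test we switched back to code using the native net package, and the problem no longer reproduced. I'll tidy up the code later and provide a copy for you to look at.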

@silencelbl
Author

> @silencelbl I still believe it would be best if you could directly provide a minimal reproducible code repository. This would minimize the communication effort required. Otherwise, I would need to ask for details about each issue mentioned in your description in order to reproduce them accurately. Could you please provide a repository? It would save time for both of us. Thank you.

The reproduction code has been uploaded, and I have invited you.

@WineChord
Contributor

> > @silencelbl I still believe it would be best if you could directly provide a minimal reproducible code repository. This would minimize the communication effort required. Otherwise, I would need to ask for details about each issue mentioned in your description in order to reproduce them accurately. Could you please provide a repository? It would save time for both of us. Thank you.
>
> The reproduction code has been uploaded, and I have invited you.

OK, I've received the invitation. I am working on it.

@WineChord
Contributor

@silencelbl, I have figured out the solution. You have reused the buffer that is provided to tnet.Conn.Writev. But you only provide SafeWrite option to the server side. You should also enable the SafeWrite option on the client side:

image
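
The screenshot is not preserved here, so the following is only a toy model of why reusing a buffer can put unexpected bytes on the wire. It assumes a writer that queues the caller's slice and sends it later; `queuedWriter`, its `safeWrite` field, and `sendTwo` are hypothetical names written for this illustration and are not tnet's actual types or internals.

```go
package main

import "fmt"

// queuedWriter is a toy stand-in for an asynchronous network writer.
// It is NOT tnet's implementation: Writev only queues data, Flush "sends" it.
type queuedWriter struct {
	safeWrite bool     // when true, Writev copies the caller's bytes before queuing
	pending   [][]byte // data waiting to be flushed
	wire      []byte   // bytes that have "reached the peer"
}

// Writev queues every buffer. Without safeWrite it keeps the caller's slices,
// so the caller must not modify them until Flush has run.
func (w *queuedWriter) Writev(bufs ...[]byte) {
	for _, b := range bufs {
		if w.safeWrite {
			b = append([]byte(nil), b...) // private copy; caller may reuse its buffer
		}
		w.pending = append(w.pending, b)
	}
}

// Flush moves all queued data onto the wire, as a later background flush would.
func (w *queuedWriter) Flush() {
	for _, b := range w.pending {
		w.wire = append(w.wire, b...)
	}
	w.pending = w.pending[:0]
}

// sendTwo writes two messages through one reused buffer and returns
// what the peer would receive.
func sendTwo(safeWrite bool) string {
	w := &queuedWriter{safeWrite: safeWrite}
	buf := make([]byte, 5) // one buffer reused for every message

	copy(buf, "AAAAA")
	w.Writev(buf)
	copy(buf, "BBBBB") // reused before the first message has been flushed
	w.Writev(buf)

	w.Flush()
	return string(w.wire)
}

func main() {
	fmt.Printf("safeWrite=false: %q\n", sendTwo(false))
	fmt.Printf("safeWrite=true:  %q\n", sendTwo(true))
}
```

In this model the unsafe path delivers "BBBBBBBBBB": the first message is overwritten before it is sent, so the peer sees bytes that the sender never logged for that message. With the copy enabled the peer receives "AAAAABBBBB". The copy costs an allocation per write, which is presumably why such an option would be opt-in.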

@silencelbl
Author

> @silencelbl, I have figured out the solution. You have reused the buffer that is provided to tnet.Conn.Writev. But you only provide SafeWrite option to the server side. You should also enable the SafeWrite option on the client side:
>
> image

Thanks, the problem is solved and it has not recurred during testing! So every conn/tcp/udp in the same process should have safeWrite set, right? @WineChord

@WineChord
Contributor

> > @silencelbl, I have figured out the solution. You have reused the buffer that is provided to tnet.Conn.Writev. But you only provide SafeWrite option to the server side. You should also enable the SafeWrite option on the client side:
> > image
>
> Thanks, the problem is solved and it has not recurred during testing! So every conn/tcp/udp in the same process should have safeWrite set, right? @WineChord

As long as you manage the buffer passed to writev on your own, you should always set safewrite to true.
