When a cn node in a k8s cluster crashes and restarts, non-transactional stream load starts to get stuck #49950

snippins · 2024-08-19T05:33:05Z

Steps to reproduce the behavior (Required)

Wait for a cn node to crash and restart (manually evicting a cn node does not seem to cause this problem).

Expected behavior (Required)

Non-transactional stream loads work.

Real behavior (Required)

Non-transactional stream loads start to time out (I know this from checking the fe-proxy logs). At first they still work for some tables and stop working for others, then gradually all stream loads stop working, and then queries start to get slower.

In the fe-proxy logs, I saw that the fe might still be trying to send requests to the old IP of the restarted cn.

StarRocks version (Required)

3.3.1-2b87854

Workarounds:

For now, every time this happens, I have to manually evict the fe leader node for things to gradually return to normal.

I suspect this is related to #40229. In k8s environments the IPs are not static, so could that be causing the problem?

kevincai · 2024-11-23T02:52:59Z

@snippins do you have detailed logs for this issue, such as the fe leader log and the fe-proxy log?

snippins · 2024-11-25T17:29:28Z

Sorry, we found the actual reason the cns were crashing: the default configuration uses 90% of the disk for cache, but sometimes two cns were started on the same k8s node, so a cn would crash because there was not enough disk space. We applied podaffinity settings to avoid this. Since then no cn crashes have happened, so we did not investigate the stream load problem any further.
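
To make the disk arithmetic concrete, here is a minimal sketch. It assumes that every cn on a k8s node shares the same disk and sizes its cache independently as a fraction of the total disk capacity; that accounting is an assumption for illustration, and the function names are hypothetical, not StarRocks APIs.

```python
def cache_demand_fraction(num_cn_on_node: int, cache_fraction: float = 0.9) -> float:
    """Total share of the node's disk claimed by cache.

    Assumption: each co-located cn reserves `cache_fraction` of the whole
    disk on its own, unaware of the other cns on the same node.
    """
    return num_cn_on_node * cache_fraction


def fits_on_node(num_cn_on_node: int, cache_fraction: float = 0.9) -> bool:
    """True if the combined cache demand does not exceed the disk capacity."""
    return cache_demand_fraction(num_cn_on_node, cache_fraction) <= 1.0


if __name__ == "__main__":
    # Compare one cn per node with two cns scheduled onto the same node.
    for n in (1, 2):
        print(n, cache_demand_fraction(n), fits_on_node(n))
```

Under these assumptions a single cn fits, while two cns on one node claim more than the whole disk, which matches the crashes we saw before spreading the pods across nodes.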

kevincai · 2024-11-25T22:36:07Z

Thanks for the update.

Closing this issue for now.

snippins added the type/bug Something isn't working label Aug 19, 2024

kevincai self-assigned this Aug 19, 2024

kevincai closed this as completed Nov 25, 2024
