Translate README.md to English #5

85 changes: 42 additions & 43 deletions README.md
@@ -1,63 +1,59 @@
# Pingmesh System Built on blackbox

## Background

A data center is extremely complex, and its network, with so many devices involved, is more complex still. A large data center contains hundreds or thousands of nodes, network cards, switches, and routers, plus countless network cables and optical fibers. Many software systems are built on top of this hardware, such as search engines, distributed file systems, and distributed storage. Operating these systems raises several questions: How do you determine whether a fault is a network fault? How do you define and track the network SLA? How do you troubleshoot once a fault occurs?

![IDC](https://kubeservice.cn/img/devops/IDC_hu8ec2fdff58b0ea09e7358f84cbaf1df1_175984_filter_3454788233369042773.png)

`Network performance data monitoring` is hard to implement. If we simply collect results with the `ping` command, `each` server pings the remaining `(N-1)` servers, which gives `N^2` complexity and causes both stability and performance problems.

For example, if the IDC has 10,000 servers, there are `10000*9999` ping tasks. If a machine has multiple IPs to probe, the count doubles again.

Data storage is also a problem. If a ping runs every 30 seconds with a 64-byte payload, the data volume per day is: `10000*9999*2*64*24*3600/30` = `3.6860314e+13 bytes` ≈ `33.52 TiB` (about `36.86 TB`).

If only `fail` and `timeout` results are recorded, about `99.99%` of that storage can be saved.
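
The arithmetic above can be reproduced with a short Python sketch. The function names and default parameters are illustrative and not part of the project; the factor of 2 is copied from the formula above, and the `99.99%` saving is taken as given.

```python
def probe_pairs(servers: int) -> int:
    """Ordered (source, target) pairs when every server pings every other one."""
    return servers * (servers - 1)


def daily_storage_bytes(servers: int, payload_bytes: int = 64,
                        interval_s: int = 30, multiplier: int = 2) -> int:
    """Bytes recorded per day if every probe result is stored."""
    rounds_per_day = 24 * 3600 // interval_s  # 2880 rounds at a 30 s interval
    return probe_pairs(servers) * multiplier * payload_bytes * rounds_per_day


total = daily_storage_bytes(10_000)
print(probe_pairs(10_000))             # 99990000
print(total)                           # 36860313600000
print(round(total / 2**40, 2))         # 33.52 (TiB)

# Keeping only the 0.01% of records that are fail/timeout (assumed ratio).
failures_only = total // 10_000
print(round(failures_only / 2**30, 2))  # 3.43 (GiB)
```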

## Industry Implementation

This system is an `enhanced` implementation based on the `Microsoft Pingmesh paper`.

Original Microsoft Pingmesh paper:
[Pingmesh: A Large-Scale System for Data Center Network Latency Measurement and Analysis](https://conferences.sigcomm.org/sigcomm/2015/pdf/papers/p139.pdf)

`Microsoft Pingmesh` is a significant breakthrough in network monitoring (the original paper is worth reading carefully).

However, it has several limitations in practical use:

1. Agent data flow: after every ping, the `Agent` writes the result to a log, and the `log` data is then collected through the infrastructure. Relying on a `log analysis` system increases overall system complexity.

2. Ping mode support: only `UDP` mode is supported; support for `DNS tcp`, `ICMP ping`, and similar modes is lacking.

3. Ping dimensions: only `IPv4` ping is supported, but many scenarios also need `domain/dns` ping, for example to check public network connectivity.

4. No support for manual, real-time ping attempts: this can be implemented with `blackbox-exporter` network probing.

5. No IPv6 support.

## Pingmesh升级后的架构
## Upgraded Pingmesh Architecture

![Pingmesh+](https://kubeservice.cn/img/devops/pingmesh_hu8c196f2563a4108ff3fa8682517063fd_177531_filter_4759638724306006349.png)
[![FOSSA Status](https://app.fossa.com/api/projects/git%2Bgithub.com%2Fkubeservice-stack%2Fpingmesh-agent.svg?type=shield)](https://app.fossa.com/projects/git%2Bgithub.com%2Fkubeservice-stack%2Fpingmesh-agent?ref=badge_shield)

### Controller

The `Controller` is mainly responsible for generating the `pinglist.yaml` file. The `pinglist` is built from three sources:

> The podIP and nodeIP lists of the whole cluster, obtained automatically through the `IP Controller`.

> The `Agent Setting` configuration, obtained through the `Pinglist Controller`.

> External addresses added to the `pinglist.yaml` file through `Custom Define Pinglist`. Supported types include `DNS addresses`, `external HTTP addresses`, `domain addresses`, `NTP addresses`, `Kubernetes apiserver addresses`, and so on.

After generating the `pinglist` file, the `Controller` serves it over `HTTP/HTTPS`. The `Agent` periodically fetches the `pinglist` to update its own configuration; this is the `pull` model. The `Controller` must be highly available, so multiple instances are deployed behind a `Service`. Every instance runs the same algorithm and produces identical `pinglist` content.
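
As a rough illustration of the pull model, the Python sketch below shows an agent-side poller that replaces its local copy only when the served content changes. The class name, the injected `fetch` callable, and the hash comparison are all assumptions made for this example; they do not describe the project's actual agent code.

```python
import hashlib
from typing import Callable, Optional


class PinglistPuller:
    """Hypothetical agent-side poller for the Controller's pinglist."""

    def __init__(self, fetch: Callable[[], bytes]):
        # `fetch` returns the raw pinglist bytes, e.g. from an HTTP GET
        # against the Controller's Service address (not shown here).
        self._fetch = fetch
        self._digest: Optional[str] = None
        self.current: Optional[bytes] = None

    def poll_once(self) -> bool:
        """Fetch the pinglist; return True if the local copy was replaced."""
        try:
            body = self._fetch()
        except OSError:
            # Controller unreachable: keep the last known pinglist.
            return False
        digest = hashlib.sha256(body).hexdigest()
        if digest == self._digest:
            return False  # identical content, nothing to reload
        self._digest, self.current = digest, body
        return True
```

Because every Controller instance is expected to serve identical content, it should not matter which replica answers a given poll; the digest only changes when the pinglist itself changes.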

### Agent

Each ping action opens a new connection. To limit the `TCP` concurrency that `Pingmesh` introduces, the minimum ping interval between any two servers is 10 seconds, and the maximum packet size is 64 KB.

```yaml
setting:
# ... (lines collapsed in the diff view; next hunk: @@ -85,29 +81,32 @@ mesh:) ...
- kubernetes.default.svc.cluster.local
```

`Overload protection` is also implemented:

1. If the `pinglist` contains so many entries that they cannot all be processed within one cycle (for example `10s`), the agent finishes the current round before starting the next one. Completing a full round takes priority.
2. The number of concurrent `agent` threads is configurable, so that the impact of the `pingmesh agent` on the whole cluster stays below `one-thousandth`.
3. Metrics are exposed as `Prometheus Gauge` values, computed independently in each cycle.

```metrics
# HELP ping_fail ping fail
# TYPE ping_fail gauge
ping_fail{target="8.8.8.8",tor="ping-public-demo"} 1
```

4. To ensure that ping requests are sent evenly across each `interval` time window, request jobs are scheduled in memory, and a `ratelimit` is applied to the `concurrent coroutines`.
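
The scheduling ideas in items 1 and 4 can be sketched in Python as follows. This is a minimal illustration under assumed names (`send_offsets`, `next_round_start`); the real agent's in-memory scheduling and rate limiter may work differently.

```python
from typing import List


def send_offsets(num_targets: int, interval_s: float) -> List[float]:
    """Spread one round of pings evenly across the interval window.

    Returns the offset in seconds, from the start of the round, at which
    each target's ping is sent.
    """
    if num_targets <= 0:
        return []
    step = interval_s / num_targets
    return [i * step for i in range(num_targets)]


def next_round_start(round_start: float, round_duration: float,
                     interval_s: float) -> float:
    """Overload rule: never start a new round before the current one ends."""
    return max(round_start + interval_s, round_start + round_duration)


print(send_offsets(4, 10.0))            # [0.0, 2.5, 5.0, 7.5]
print(next_round_start(0.0, 6.0, 10.0))   # 10.0 (round fit in the window)
print(next_round_start(0.0, 14.0, 10.0))  # 14.0 (round overran, so wait)
```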

## Network Condition Design

Within the `interval` time window configured in `pinglist.yaml`:
- A request that exceeds the `timeout` is recorded as `ping_fail`.
- A request that exceeds the `delay` but not the `timeout` is recorded as `ping_duration_milliseconds`.
- A request that does not exceed the `delay` is not recorded in the metrics interface.

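
The three rules above amount to a small classification function. The Python sketch below assumes all values are in milliseconds and that "exceeds" means strictly greater than; the function name is hypothetical.

```python
from typing import Optional


def classify(duration_ms: float, delay_ms: float,
             timeout_ms: float) -> Optional[str]:
    """Return the metric a ping result is recorded under, or None."""
    if duration_ms > timeout_ms:
        return "ping_fail"
    if duration_ms > delay_ms:
        return "ping_duration_milliseconds"
    return None  # fast enough: not recorded in the metrics interface


print(classify(1500, delay_ms=100, timeout_ms=1000))  # ping_fail
print(classify(250, delay_ms=100, timeout_ms=1000))   # ping_duration_milliseconds
print(classify(20, delay_ms=100, timeout_ms=1000))    # None
```
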
## Integration with Prometheus

Add the following to the `scrape_configs` section of `prometheus.yaml`, where `pingmeship` is the IP of the server:

```yaml
scrape_configs:
# ... (lines collapsed in the diff view; hunk: @@ -126,4 +125,4 @@ scrape_configs:) ...
```

## License
[![FOSSA Status](https://app.fossa.com/api/projects/git%2Bgithub.com%2Fkubeservice-stack%2Fpingmesh-agent.svg?type=large)](https://app.fossa.com/projects/git%2Bgithub.com%2Fkubeservice-stack%2Fpingmesh-agent?ref=badge_large)