GitHub - MichaelYin1994/netman-2018-kpi-anomaly-detection: 2018年国际AIOps挑战赛KPI时序异常检测比赛基于OpenMLDB部署的工程化部署实践方案

2018年AIOps国际挑战赛KPI异常检测工程化部署方法实践

作者简介

作者：鱼丸粗面（[email protected]）。整体采用了此项目的编码规范，逐步完善中（20220617）。

系统环境与组件依赖

系统环境:

Ubuntu 20.04 LTS
GPU: NVIDIA Corporation GP104GL Quadro P5000
CPU: Intel® Core™ i9-9920X CPU @ 3.50GHz × 24
RAM: 94G
CUDA: 11.4
swap: 96G

开源组件:

OpenMLDB: 用于stream形式数据的实时特征工程。
cAdivisor: 用于容器状态监控。
Triton inference server: 用于serving XGBoost模型与DL-based模型。
Prometheus + Grafana: 用于容器状态的可视化监控。

源代码说明

start_openmldb_cluster.sh：启动openmldb的docker container service，注意提前修改/work/openmldb/conf/taskmanager.properties的spark.master的local线程数，采用多线程写入。
preprocessing_train_test.py：对原始比赛数据进行重新预处理、数据切分、重新存储。
create_offline_table.py：创建离线表 && 将离线数据导入数据库中。
compute_offline_feats.py：读取sql脚本 && 执行sql脚本，创建离线特征组。
train_xgb.py：读取离线特征组 && 训练XGBoost模型 && 导出模型基本参数。
deploy_realtime_fe.py：读取sql脚本（与offline feats相同） && 创建在线表 && 导入在线数据 && 部署在线特征脚本。
deploy_triton_server.py && start_triton_xgb_server.sh：生成Inference Server的配置文件 && 部署Triton Inference Server后端。
deploy_inference_pipeline.py：Flask部署Inference Pipeline，Flask Server接收数据 --> Preprocessing --> Openmldb特征工程 && 实时数据插入 --> Postprocessing --> 返回inference结果。

Todo List

配置的yaml文件进行统一管理
XGBoost Early Stopping的官方Metric的njit实现
单元测试部分对于部署的特征工程的正确性测试
Triton inference server的XGBoost模型inference测试
刨除Openmldb的window特征，特征工程对原始数据的前处理和后处理脚本部分

References

[1] https://github.com/MichaelYin1994/tianchi-pakdd-aiops-2021

[2] Lam, Siu Kwan, Antoine Pitrou, and Stanley Seibert. "Numba: A llvm-based python jit compiler." Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC. 2015.

[3] https://github.com/johannfaouzi/pyts

[4] https://github.com/blue-yonder/tsfresh

[5] Goldstein M, Dengel A. Histogram-based outlier score (hbos): A fast unsupervised anomaly detection algorithm[J]. KI-2012: Poster and Demo Track, 2012: 59-63.

[6] Bu, Jiahao, et al. "Rapid deployment of anomaly detection models for large number of emerging kpi streams." 2018 IEEE 37th International Performance Computing and Communications Conference (IPCCC). IEEE, 2018.

[7] Ma M, Zhang S, Pei D, et al. Robust and rapid adaption for concept drift in software system anomaly detection[C]//2018 IEEE 29th International Symposium on Software Reliability Engineering (ISSRE). IEEE, 2018: 13-24.

[8] Li, Zhihan, et al. "Robust and rapid clustering of kpis for large-scale anomaly detection." 2018 IEEE/ACM 26th International Symposium on Quality of Service (IWQoS). IEEE, 2018.

[9] Li, Zeyan, Wenxiao Chen, and Dan Pei. "Robust and unsupervised kpi anomaly detection based on conditional variational autoencoder." 2018 IEEE 37th International Performance Computing and Communications Conference (IPCCC). IEEE, 2018.

[10] Liu, Dapeng, et al. "Opprentice: Towards practical and automatic anomaly detection through machine learning." Proceedings of the 2015 Internet Measurement Conference. 2015.

[11] Zhao, Nengwen, et al. "Label-less: A semi-automatic labelling tool for kpi anomalies." IEEE INFOCOM 2019-IEEE Conference on Computer Communications. IEEE, 2019.

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
src		src
.gitignore		.gitignore
README.MD		README.MD

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

2018年AIOps国际挑战赛KPI异常检测工程化部署方法实践

作者简介

系统环境与组件依赖

源代码说明

Todo List

References

About

Releases

Packages

Languages

MichaelYin1994/netman-2018-kpi-anomaly-detection

Folders and files

Latest commit

History

Repository files navigation

2018年AIOps国际挑战赛KPI异常检测工程化部署方法实践

作者简介

系统环境与组件依赖

源代码说明

Todo List

References

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages