
eval_knn: Splitting the training/test datasets with train_test_split is problematic #2

Open
Mer1997 opened this issue Jul 20, 2022 · 1 comment

Comments

@Mer1997

Mer1997 commented Jul 20, 2022

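I'm not an ML specialist, but in disk failure prediction, the SMART data from the same disk (same serial number) in the days before a failure should be very similar from day to day. Doesn't using train_test_split to divide the data into training and test sets then risk "leaking the answers", i.e. data leakage?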
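To make the concern concrete, here is a minimal sketch comparing a row-level shuffle (what train_test_split does by default, since `shuffle=True`) with a group-aware split that keeps every row of one disk serial number on the same side. The `serial_number` and `smart_5_raw` field names and the toy data are hypothetical; scikit-learn's `GroupShuffleSplit`, with the serial numbers passed as `groups`, offers the same group-aware behavior in a library.

```python
import random


def group_split(rows, group_key="serial_number", test_fraction=0.2, seed=0):
    """Split rows so each group (one disk serial number) lands entirely in
    either the training set or the test set, never both."""
    groups = sorted({row[group_key] for row in rows})
    rng = random.Random(seed)
    rng.shuffle(groups)
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [r for r in rows if r[group_key] not in test_groups]
    test = [r for r in rows if r[group_key] in test_groups]
    return train, test


# Hypothetical toy data: 8 disks, 5 daily SMART snapshots each (40 rows).
rows = [
    {"serial_number": f"SN{d:02d}", "day": day, "smart_5_raw": d * 10 + day}
    for d in range(8)
    for day in range(5)
]

# Group-aware split: no serial number appears on both sides.
train, test = group_split(rows)
train_serials = {r["serial_number"] for r in train}
test_serials = {r["serial_number"] for r in test}
print(len(train), len(test), train_serials & test_serials)  # 30 10 set()

# Row-level shuffle: 8 test rows cannot consist only of whole 5-row disks,
# so at least one disk's near-identical snapshots end up on both sides.
shuffled = rows[:]
random.Random(0).shuffle(shuffled)
row_test, row_train = shuffled[:8], shuffled[8:]
leaked = {r["serial_number"] for r in row_test} & {
    r["serial_number"] for r in row_train
}
print("serials in both sets:", sorted(leaked))
```

With the row-level split, a k-NN model can match a test snapshot against an almost identical snapshot of the same disk from the training set, which inflates the measured accuracy.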

In practice, training on the 2021 Q1-Q4 data and validating on the 2022 Q1 data confirmed that the algorithm does not perform well.
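The out-of-time check described above (train on 2021 Q1-Q4, validate on 2022 Q1) can be sketched as a simple date-range filter. This is a minimal sketch assuming each row carries a hypothetical `date` field; the toy data is made up for illustration.

```python
from datetime import date


def time_split(rows, train_range, test_range, date_key="date"):
    """Split rows by date: train on one closed date range, test on a later one."""
    train_start, train_end = train_range
    test_start, test_end = test_range
    # An out-of-time split only makes sense if testing starts after training ends.
    if not train_end < test_start:
        raise ValueError("test period must start after the training period")
    train = [r for r in rows if train_start <= r[date_key] <= train_end]
    test = [r for r in rows if test_start <= r[date_key] <= test_end]
    return train, test


# Hypothetical toy data: 3 disks, one snapshot on the 1st of each month, 2021-2022.
rows = [
    {"serial_number": f"SN{d:02d}", "date": date(year, month, 1)}
    for d in range(3)
    for year in (2021, 2022)
    for month in range(1, 13)
]

train, test = time_split(
    rows,
    train_range=(date(2021, 1, 1), date(2021, 12, 31)),  # 2021 Q1-Q4
    test_range=(date(2022, 1, 1), date(2022, 3, 31)),  # 2022 Q1
)
print(len(train), len(test))  # 36 9
```

Because every test row is dated after every training row, the model is evaluated only on data it could not have seen, which is closer to how a failure predictor is actually deployed.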

@Mer1997
Author

Mer1997 commented Jul 20, 2022

The original author probably hasn't been active on GitHub for a long time; this issue is just a heads-up so that others don't waste time on it.
