[BUG] Abnormal tasks when filtering with simple mode #111
Comments
[Code Review] My own analysis: once a tab has collected all of its links, every one of them is added to the task pool asynchronously, which produces a large number of blocked tasks. My guess: this is why tasks keep executing even after the browser has exited. (Lines 202 to 208 in dbf7064)
Another thing I don't understand: the page timeout control is in
I'm getting the same error (navigate timeout). Please fix this bug. @Qianlitp
This critical bug has not been resolved. Crawlergo fails to perform its most important function.
Has this issue been fixed?
Version
go version go1.18 darwin/arm64
Google Chrome 103.0.5060.53
macOS 12.4 (21F79) [m1]
arm64
Commits tested:
321828272c66c95a05fd262365a501e8f7b5d031
dbf70647a44bbfbdaeec98791f90c2497d781708
latest

Problem description
When crawling a target site with the simple filter mode, I found that once a page contains over a thousand links, the following happens: the log prints Crawling ******* but the browser does not open a new tab. This does not happen when the page has relatively few links.
Command executed
Added extra log output
crawlergo/pkg/task_main.go
Lines 230 to 232 in dbf7064
I added log output after line 231 to help with debugging.
Log screenshots below
Explanation of the circled areas: in the first screenshot, more than 1000 URLs were collected; the second shows that all subsequent crawl tasks time out and the browser tab never opens a new page.
Steps to reproduce
Behavior observed when testing on the commit versions listed above: after roughly a few minutes, the Crawling **** lines are followed by large numbers of navigate timeout log entries.
Expected behavior
When a page contains a large number of links, a Crawling *** line should be accompanied by the tab actually opening a new page.