Inside visit:
String name = page.select("h1").text();
String content = page.select("h2").html();
System.out.println("名称"+ name);
System.out.println("内容"+ content);
Console output:
名称姝h���瑁��ㄨ���ㄧО����瑁��虹�-DXDK110
内容姝h���瑁��ㄨ���ㄧО����瑁��虹�-DXDK110浜у��绠�浠
You can call page.charset("utf-8") to set the page's encoding, and then run the code above.
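For context, here is a minimal sketch of why choosing the charset matters. It uses only the JDK, not WebCollector, and the class and method names (`MojibakeDemo`, `decode`) are made up for illustration. It assumes the garbled console output comes from UTF-8 bytes being decoded with a GBK-family charset, which matches how the output looks but has not been verified against the page in question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of WebCollector.
class MojibakeDemo {

    // Decode raw bytes with the charset named by the caller.
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // "\u540d\u79f0" is the string "名称", written with escapes so the
        // source file's own encoding cannot affect the result.
        byte[] raw = "\u540d\u79f0".getBytes(StandardCharsets.UTF_8);

        // Wrong charset: the UTF-8 bytes are misread as GBK and come out garbled.
        System.out.println(decode(raw, "GBK"));

        // Right charset: the original text is recovered.
        System.out.println(decode(raw, "UTF-8"));
    }
}
```

The bytes are identical in both calls. Only the charset used to interpret them changes, which is why setting the encoding on the page before calling select can be enough.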
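@hujunxianligong Could cn.edu.hfut.dmic.webcollector.util.CharsetDetector#guessEncoding be changed so that when it guesses gb2312, it returns GB18030 instead? GB18030 is compatible with both GBK and GB2312. Take this page, for example: http://www.stats.gov.cn/tjsj/tjbz/tjyqhdmhcxhfdm/2018/44/4419.html. The page is clearly gb2312, yet cn.edu.hfut.dmic.webcollector.model.Page#html() returns garbled text, while a browser displays it correctly. Calling page.charset("GB18030") also fixes the garbling, but I would rather not set it on every page.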
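The change proposed here could look roughly like the following sketch. It is not WebCollector's code: `CharsetFallback`, `widen`, and `decode` are hypothetical names, and the guessed charset is simply passed in as a string rather than produced by CharsetDetector#guessEncoding. The demo assumes one plausible cause of the garbling, namely a page labeled gb2312 that contains characters outside the strict GB2312 set. Whether that is true of the page linked above has not been checked.

```java
import java.nio.charset.Charset;

// Hypothetical helper; not part of WebCollector.
class CharsetFallback {

    // Map a guessed "gb2312" to its superset GB18030; leave other guesses alone.
    static String widen(String guessed) {
        if (guessed != null && guessed.trim().equalsIgnoreCase("gb2312")) {
            return "GB18030";
        }
        return guessed;
    }

    // Decode with the widened charset. The guess must not be null here,
    // because Charset.forName rejects null.
    static String decode(byte[] raw, String guessed) {
        return new String(raw, Charset.forName(widen(guessed)));
    }

    public static void main(String[] args) {
        // "\u570b" is the traditional character "國", which GBK and GB18030
        // can encode but strict GB2312 cannot.
        String original = "\u570b";
        byte[] raw = original.getBytes(Charset.forName("GBK"));

        // Strict GB2312 decoding loses the character.
        String strict = new String(raw, Charset.forName("GB2312"));
        System.out.println("strict GB2312 ok: " + strict.equals(original));

        // Widening the same guess to GB18030 recovers it.
        System.out.println("widened ok: " + decode(raw, "gb2312").equals(original));
    }
}
```

Because GB18030 is a superset of GBK and GB2312, widening should not change how valid GB2312 text decodes, which is the reason the remapping is attractive as a default.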