How can ansj segment English words that are run together? #802
Comments
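You can let ansj segment the text first; the run-on string comes out as a term with part of speech en. Then build a custom dictionary, extend SmartGetWord, override getAllWords and getFrontWords, and handle the checkNumberOrEnglish check in the parent class SmartGetWord. After that, write a custom Recognition. Inside the Recognition implementation, you can get the result with the following code:

    // Requires java.util.Arrays. MyGetWord is the custom SmartGetWord subclass;
    // myforest holds the custom dictionary.
    MyGetWord getWord = new MyGetWord(myforest, "iwantto".toCharArray());
    String word;
    while ((word = getWord.getFrontWords()) != null) {
        // the matched word
        System.out.println(word);
        // part of speech, weight, ...
        String[] param = getWord.getParam();
        System.out.println(Arrays.toString(param));
    }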
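The dictionary-driven lookup described in the comment above can be illustrated without ansj at all. The following is a minimal sketch of forward maximum matching over a run-on English string. It is not ansj's or SmartGetWord's actual implementation. The class name `RunOnSplitter`, the `segment` method, the `custom` tag, and the `unknown` fallback tag are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, independent of ansj: splits a run-on string with a
// word -> part-of-speech dictionary using forward maximum matching.
final class RunOnSplitter {

    static List<String> segment(String text, Map<String, String> dict) {
        // The longest dictionary entry bounds how far ahead we need to look.
        int maxLen = 1;
        for (String w : dict.keySet()) {
            maxLen = Math.max(maxLen, w.length());
        }

        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int end = Math.min(text.length(), pos + maxLen);
            String match = null;
            // Try the longest candidate first, then shrink it one character at a time.
            for (int e = end; e > pos; e--) {
                String candidate = text.substring(pos, e);
                if (dict.containsKey(candidate)) {
                    match = candidate;
                    break;
                }
            }
            if (match == null) {
                // Assumption: characters not covered by the dictionary are
                // emitted one by one with a placeholder tag.
                out.add(text.charAt(pos) + "/unknown");
                pos++;
            } else {
                out.add(match + "/" + dict.get(match));
                pos += match.length();
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> dict = new LinkedHashMap<>();
        dict.put("i", "custom");
        dict.put("want", "custom");
        dict.put("to", "custom");
        // prints [i/custom, want/custom, to/custom]
        System.out.println(segment("iwantto", dict));
    }
}
```

Greedy longest-first matching is simple but can pick the wrong split when dictionary entries overlap, which is one reason to return all candidate words and let a later step choose among them.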
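How can ansj segment English text? Take this term as an example: iwantto
I would like it split into: i/custom-POS want/custom-POS to/custom-POS
How should this be configured? Does it require code changes?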