Solr 4.3: useSmart=false mode splits words into single characters #135

Open
GoogleCodeExporter opened this issue Apr 7, 2016 · 1 comment

@GoogleCodeExporter

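Solr version: 4.3
IK version: 2012hf1
I integrated IK into Solr through IKTokenizerFactory. With useSmart=false, words are split into single characters; with useSmart=true they are not. For example: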
-------------------------------------------------------------
   In the Analysis screen on the right side of the Solr admin UI, with Field Value set to 123, the analysis result is:
HTMLSCF  text       123
IKT      text       123         1        2        3
         raw_bytes  [31 32 33]  [31]     [32]     [33]
         start      0           0        1        2
         end        3           1        2        3
         type       ARABIC      CN_WORD  CN_WORD  CN_WORD
--------------------------------------------------------------
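   The index also contains the three single characters 1, 2 and 3. It is strange that useSmart=false produces this result, especially since 1, 2 and 3 are typed as CN_WORD. From a brief look at the IK source, a token only gets the CN_WORD type when CJKSegment matches it as a dictionary word. How can this be fixed?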
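
The observation above (a CN_WORD token implies a dictionary match) can be illustrated with a toy model. This is only a sketch under stated assumptions, not IK Analyzer's actual algorithm: the function `segment` and its greedy overlap handling are hypothetical stand-ins, and it only handles a run of digits.

```python
# Toy model only: NOT IK Analyzer's real code. All names are hypothetical.

def segment(text, dictionary, use_smart):
    """Return (term, start, end, type) tuples for a short input string."""
    candidates = []
    # Assumption: a run of digits is always reported once as an ARABIC token.
    if text.isdigit():
        candidates.append((text, 0, len(text), "ARABIC"))
    # Assumption: every substring found in the dictionary becomes a CN_WORD.
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            if text[start:end] in dictionary:
                candidates.append((text[start:end], start, end, "CN_WORD"))
    # Order by start offset, longest candidate first.
    candidates.sort(key=lambda t: (t[1], -(t[2] - t[1])))
    if not use_smart:
        # Fine-grained mode: keep every candidate, overlapping or not.
        return candidates
    # Smart mode (simplified): keep the longest non-overlapping candidates.
    result, position = [], 0
    for token in candidates:
        if token[1] >= position:
            result.append(token)
            position = token[2]
    return result


if __name__ == "__main__":
    for token in segment("123", {"1", "2", "3"}, use_smart=False):
        print(token)
```

In this model, the single-character CN_WORD tokens only appear when "1", "2" and "3" are themselves dictionary entries and fine-grained mode is on; with an empty dictionary, or in smart mode, only the ARABIC token remains.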



Original issue reported on code.google.com by [email protected] on 13 May 2014 at 7:57

@GoogleCodeExporter
Author

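This turned out to be a problem on my side. I took over maintenance of a predecessor's work and only now discovered that there is an extension dictionary in the class folder.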
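
For anyone checking for the same cause, a minimal sketch of scanning a dictionary for single-character entries follows. It assumes the usual one-word-per-line UTF-8 dictionary format; the helper name `single_char_entries` and the file name `ext.dic` in the usage note are hypothetical, not taken from this setup.

```python
# Hypothetical helper: lists single-character entries in a word-per-line dictionary.

def single_char_entries(lines):
    """Return (line_number, word) pairs for entries that are one character long."""
    found = []
    for number, line in enumerate(lines, start=1):
        # Drop a possible UTF-8 byte order mark, then surrounding whitespace.
        word = line.lstrip("\ufeff").strip()
        if len(word) == 1:
            found.append((number, word))
    return found
```

It could be used as `single_char_entries(open("ext.dic", encoding="utf-8"))`, with the path adjusted to wherever the extension dictionary actually lives. Any digits it reports are candidates for the unexpected CN_WORD tokens.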

Original comment by [email protected] on 14 May 2014 at 1:34
