Make FSTCompiler.compile() to only return the FSTMetadata #12831
I'll put the CHANGES.txt for #12624 together with this
-   * Save the FST to DataOutput. If you use an {@link org.apache.lucene.store.IndexOutput} to build
-   * the FST, then you should not and do not need to call this method, as the FST is already saved.
-   * Doing so will throw an {@link UnsupportedOperationException}.
+   * Save the FST to DataOutput.
It's no longer possible for a user to create an FST that uses `NullFSTReader`, so this Javadoc no longer holds. If users are able to create an FST, it will always be readable.
@@ -111,7 +111,7 @@ public NormalizeCharMap build() {
       for (Map.Entry<String, String> ent : pendingPairs.entrySet()) {
         fstCompiler.add(Util.toUTF16(ent.getKey(), scratch), new CharsRef(ent.getValue()));
       }
-      map = fstCompiler.compile();
+      map = FST.fromFSTReader(fstCompiler.compile(), fstCompiler.getFSTReader());
This `fromFSTReader` is there to avoid the boilerplate null check that each consumer must now do. Open to method name suggestions.
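The null check that `fromFSTReader` absorbs can be illustrated with a small stand-alone sketch. The types below (`Metadata`, `Reader`, `Fst`, `fromReader`) are hypothetical stand-ins written for this illustration, not the Lucene classes; the only behavior taken from this PR is that null metadata yields a null result.

```java
import java.util.Objects;

public class NullSafeFactorySketch {
  /** Hypothetical stand-in for FST.FSTMetadata; null means nothing was compiled. */
  public static final class Metadata {
    public final long numBytes;

    public Metadata(long numBytes) {
      this.numBytes = numBytes;
    }
  }

  /** Hypothetical stand-in for FSTReader. */
  public interface Reader {
    byte readByte(long pos);
  }

  /** Hypothetical stand-in for FST: metadata plus a way to read the bytes. */
  public static final class Fst {
    public final Metadata metadata;
    public final Reader reader;

    private Fst(Metadata metadata, Reader reader) {
      this.metadata = metadata;
      this.reader = reader;
    }

    /** Returns null when there is no metadata, so callers need no null check of their own. */
    public static Fst fromReader(Metadata metadata, Reader reader) {
      if (metadata == null) {
        return null;
      }
      return new Fst(metadata, Objects.requireNonNull(reader, "reader"));
    }
  }

  public static void main(String[] args) {
    Reader reader = pos -> (byte) 0;
    // No metadata (for example, nothing was added to the compiler): the result is null.
    System.out.println(Fst.fromReader(null, reader) == null);
    // Metadata present: a usable object comes back.
    System.out.println(Fst.fromReader(new Metadata(3), reader) != null);
  }
}
```

Without such a helper, every call site would need its own "metadata is null" branch before constructing the object, which is the boilerplate the comment refers to.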
@@ -175,7 +176,7 @@ private FSTCompiler(
     fst =
         new FST<>(
             new FST.FSTMetadata<>(inputType, outputs, null, -1, VERSION_CURRENT, 0),
-            toFSTReader(dataOutput));
+            NULL_FST_READER);
As we can't pass an actual null as the FSTReader, `NullFSTReader` will be used instead.
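This is the null-object pattern: a placeholder instance fills the slot where null is not allowed. A minimal sketch follows, assuming a one-method `Reader` interface and an `isReadable` helper that are both hypothetical and do not mirror the real `FSTReader` API.

```java
public class NullReaderSketch {
  /** Hypothetical stand-in for FSTReader. */
  public interface Reader {
    byte readByte(long pos);
  }

  /** Null object: safe to store in a field, but any attempt to read fails loudly. */
  public static final Reader NULL_READER =
      pos -> {
        throw new UnsupportedOperationException("no readable bytes behind this reader");
      };

  /** True only when real bytes can be read through the given reader. */
  public static boolean isReadable(Reader reader) {
    return reader != NULL_READER;
  }

  public static void main(String[] args) {
    Reader real = pos -> (byte) 42;
    System.out.println(isReadable(real));
    System.out.println(isReadable(NULL_READER));
    try {
      NULL_READER.readByte(0);
    } catch (UnsupportedOperationException e) {
      System.out.println("read rejected: " + e.getMessage());
    }
  }
}
```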
One of the points I'm unsure about in this PR is that there are now 2 (obscure) ways to construct the FST: one using the DataInput+FSTStore and one using the FSTReader returned by the on-heap DataOutput. If users don't read the Javadoc, they might be confused about how to get the FST. Maybe there could be another way that feels more natural.
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!
Thank you stale bot! @dungba88 -- what is the status of this change? I think it makes sense to have two FST compile+consume paths -- one on heap, that you can (efficiently) consume (read) right away without writing FST to stable storage, another that writes and then reads from stable storage.
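The two compile+consume paths can be pictured with a JDK-only toy. This is not Lucene code: `Metadata`, `compile`, `buildOnHeap`, and `buildOnDisk` are hypothetical names, and a UTF-8 string stands in for the FST bytes. The point it shows is that the compile step returns only metadata in both cases, while the way the bytes are read back differs.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class TwoPathsSketch {
  /** Hypothetical stand-in for the metadata returned by compile(): just a byte count here. */
  public static final class Metadata {
    public final int numBytes;

    Metadata(int numBytes) {
      this.numBytes = numBytes;
    }
  }

  /** Toy compiler: streams the payload to any output and returns only metadata. */
  public static Metadata compile(String payload, OutputStream out) throws IOException {
    byte[] bytes = payload.getBytes(StandardCharsets.UTF_8);
    out.write(bytes);
    return new Metadata(bytes.length);
  }

  /** Path 1: on heap. The output buffer doubles as the source for reading; no disk involved. */
  public static String buildOnHeap(String payload) {
    try {
      ByteArrayOutputStream heap = new ByteArrayOutputStream();
      Metadata meta = compile(payload, heap);
      return new String(heap.toByteArray(), 0, meta.numBytes, StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Path 2: stable storage. Write everything first, then reopen the file to read it back. */
  public static String buildOnDisk(String payload) {
    try {
      Path file = Files.createTempFile("fst-sketch", ".bin");
      try {
        Metadata meta;
        try (OutputStream out = Files.newOutputStream(file)) {
          meta = compile(payload, out);
        }
        byte[] stored = Files.readAllBytes(file);
        return new String(stored, 0, meta.numBytes, StandardCharsets.UTF_8);
      } finally {
        Files.deleteIfExists(file);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(buildOnHeap("cat,dog"));
    System.out.println(buildOnDisk("cat,dog"));
  }
}
```

In the on-heap path the caller never leaves memory, which is why it can be consumed right away; the on-disk path pays for a write and a read but does not need to hold the bytes while compiling.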
Thanks bot and @mikemccand, apparently I forgot about this PR! I'll make another revision. The rebase after conflict seems to break the build.
This looks great @dungba88, and, incredibly, no additional conflicts despite getting stale.
I'll merge soon. Thanks!
This is technically an API break, but
Thank you for merging, @mikemccand!
* Make FSTCompiler.compile() to only return the FSTMetadata
* tidy code
Thank you!
Description
Make FSTCompiler.compile() to only return the FSTMetadata. Depending on whether the DataOutput used implements FSTReader or not, the returned FST might be unreadable. Thus having this method only return the FSTMetadata means we will never return an unusable FST.
Now to create the FST there will be 2 ways:
Note that this PR depends on 2 others to be merged first:
`FST.fromFSTReader` also handles the case where `fstMetadata` is null, in which case it returns null. It is essentially syntactic sugar to make the migration easier. So we will change this code:
to this:
instead of: