Consistent error handling #32
When an error is detected at a higher level in the pipeline (e.g., tokenize), it seems natural that the lower-level annotators (e.g., pos) annotate nothing and just ignore that sentence (or the whole document, if it contains sentences with errors). Or the output keeps all
One problem with this approach is that, e.g.,
<sentence id="s0">
<tokens annotators="ssplit tokenize pos" errors="e0">
...
</tokens>
<errors>
<error id="e0" by="pos">...</error>
</errors>
</sentence>
Another merit of this approach is that we can refer to the same error message from different elements, e.g.,
This is the final design, now accepted in 038c850.
<sentence id="s0">
<tokens .../>
<error annotator="knp">...</error>
</sentence>
We do not record an error id, or links between the elements on which the error occurs and
Basically, each annotator is agnostic about annotating
In the current implementation, only
This is a concrete example, which occurs when
<root>
<document id="d0">
<sentences>
<sentence id="s0">
*
<tokens annotators="juman" normalized="false">
<token id="s0_tok0" form="*" characterOffsetBegin="0" characterOffsetEnd="1" yomi="*" lemma="*" pos="未定義語" posId="15" pos1="その他" pos1Id="1" cType="*" cTypeId="0" cForm="*" cFormId="0" misc="NIL"/>
</tokens>
<error annotator="knp">jigg.pipeline.ProcessError: ;; Invalid input &lt;* * * 未定義語 15 その他 1 * 0 * 0 NIL &gt; ! # S-ID:2 KNP:4.12-CF1.1 DATE:2016/03/16 SCORE:0.00000 ERROR:Cannot make mrph EOS</error>
</sentence>
</sentences>
</document>
</root>
The KNP error message is recorded in the text of
TODO: check whether error handling works correctly for CoreNLP.
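For anyone consuming this output, here is a rough sketch of how the `<error annotator="...">` elements of the accepted design could be collected per sentence. It is plain Python with the standard library, not part of jigg; `collect_errors` is a hypothetical helper name, the sample is a trimmed copy of the KNP example with the error text shortened, and sentence `s1` is made up to show an error-free sentence.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the KNP example; the error text is shortened and
# sentence s1 is a made-up sentence without errors (assumption for illustration).
SAMPLE = """<root>
  <document id="d0">
    <sentences>
      <sentence id="s0">
        *
        <tokens annotators="juman" normalized="false">
          <token id="s0_tok0" form="*"/>
        </tokens>
        <error annotator="knp">jigg.pipeline.ProcessError: ...</error>
      </sentence>
      <sentence id="s1">
        ok
        <tokens annotators="juman" normalized="false"/>
      </sentence>
    </sentences>
  </document>
</root>"""


def collect_errors(xml_text):
    """Return (sentence id, annotator, message) for each <error> directly under a <sentence>."""
    root = ET.fromstring(xml_text)
    found = []
    for sentence in root.iter("sentence"):
        for error in sentence.findall("error"):
            message = (error.text or "").strip()
            found.append((sentence.get("id"), error.get("annotator"), message))
    return found


errors = collect_errors(SAMPLE)
for sentence_id, annotator, message in errors:
    print(f"{sentence_id}: {annotator} failed: {message}")
```

Since the accepted design records no error ids or links, the only association available is the enclosing sentence, which is what the sketch relies on.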
Here is a proposal for how to keep track of errors in the output XML when some errors are detected.
Example:
That is, an error message is wrapped in an <error> element, which records the annotator that caused the error. This design may handle the situation where multiple annotators annotate the same XML element and only one of them fails:
The errors attribute on each element may be redundant, but it seems useful for checking errors. I'm not sure.
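To make the id-based variant from this proposal concrete (the one with an `errors` attribute pointing at `<error id="..." by="...">` entries, not the design that was finally accepted), a small Python sketch of resolving those references might look like the following. `resolve_errors` is a hypothetical name, the error text is a placeholder, and the assumption that several ids would be space-separated (like the `annotators` attribute) is mine.

```python
import xml.etree.ElementTree as ET

# Sample in the shape of the id-based proposal; the error text is a placeholder.
SAMPLE = """<sentence id="s0">
  <tokens annotators="ssplit tokenize pos" errors="e0">
  </tokens>
  <errors>
    <error id="e0" by="pos">message placeholder</error>
  </errors>
</sentence>"""


def resolve_errors(sentence):
    """Map the tag of each element with an errors attribute to its (annotator, message) pairs."""
    by_id = {e.get("id"): e for e in sentence.iter("error")}
    resolved = {}
    for elem in sentence.iter():
        refs = elem.get("errors")
        if refs is None:
            continue
        # Assumption: multiple ids are space-separated, like the annotators attribute.
        resolved[elem.tag] = [(by_id[r].get("by"), by_id[r].text) for r in refs.split()]
    return resolved


sentence = ET.fromstring(SAMPLE)
resolved = resolve_errors(sentence)
print(resolved)
```

This also shows the cost of the id-based variant: every consumer has to do this lookup, and a dangling id raises a `KeyError` here.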