Convert token to text #5
Comments
For example, the annotator marks the long answer using byte offsets, token offsets, and an index into the list of long answer candidates:
You might want to try something like this:

    import jsonlines

    INPUT_FILE = "nq-train-sample.jsonl"
    START_TOKEN = 3521
    END_TOKEN = 3525
    QAS_ID = 4549465242785278785
    REMOVE_HTML = True


    def get_span_from_token_offsets(f, start_token, end_token, qas_id,
                                    remove_html):
        # Scan the examples until the one with the requested id is found.
        for obj in f:
            if obj["example_id"] != qas_id:
                continue
            # The slice treats end_token as exclusive.
            if remove_html:
                # Drop tokens flagged as HTML markup.
                answer_span = [
                    item["token"]
                    for item in obj["document_tokens"][start_token:end_token]
                    if not item["html_token"]
                ]
            else:
                answer_span = [
                    item["token"]
                    for item in obj["document_tokens"][start_token:end_token]
                ]
            return " ".join(answer_span)
        # No example with this id was found.
        return None


    with jsonlines.open(INPUT_FILE) as f:
        result = get_span_from_token_offsets(f, START_TOKEN, END_TOKEN, QAS_ID,
                                             REMOVE_HTML)

    print(result)
You can read your prediction file to get the various start_tokens, end_tokens, and example_ids, then call the function once per prediction to build a list of prediction spans (and write them to a file or wherever you need them). Hope this helps!
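If you have many predictions, re-opening and re-scanning the data file for each one gets slow. Here is a rough sketch of a single-pass variant. It assumes you have already pulled your predictions into a list of (example_id, start_token, end_token) tuples; how you get them out of your prediction file is up to you and is not shown. The name spans_for_predictions and the tiny in-memory example are made up for illustration; only the "example_id", "document_tokens", "token", and "html_token" fields come from the snippet above.

```python
def spans_for_predictions(examples, predictions, remove_html=True):
    """Return {example_id: [answer_text, ...]} for the given predictions.

    examples: iterable of dicts with "example_id" and "document_tokens".
    predictions: list of (example_id, start_token, end_token) tuples
                 (a hypothetical format, adapt it to your prediction file).
    """
    # Group the requested token ranges by example id.
    wanted = {}
    for example_id, start, end in predictions:
        wanted.setdefault(example_id, []).append((start, end))

    results = {}
    # One pass over the examples, however many predictions there are.
    for obj in examples:
        for start, end in wanted.get(obj["example_id"], []):
            tokens = [
                item["token"]
                for item in obj["document_tokens"][start:end]
                if not (remove_html and item["html_token"])
            ]
            results.setdefault(obj["example_id"], []).append(" ".join(tokens))
    return results


# Tiny made-up example standing in for lines read from the .jsonl file.
examples = [
    {
        "example_id": 1,
        "document_tokens": [
            {"token": "<P>", "html_token": True},
            {"token": "Hello", "html_token": False},
            {"token": "world", "html_token": False},
            {"token": "</P>", "html_token": True},
        ],
    },
]
predictions = [(1, 0, 4)]

print(spans_for_predictions(examples, predictions))
```

With the real data you would pass the jsonlines reader as examples instead of the in-memory list.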
Is there a way to convert the output (currently in the form of tokens) of the model to text for easy interpretation and testing?