Convert token to text #5

Open
vishalkatiyar007 opened this issue Feb 25, 2019 · 2 comments

Comments

@vishalkatiyar007

Is there a way to convert the output (currently in the form of tokens) of the model to text for easy interpretation and testing?

vishalkatiyar007 changed the title from "Model Testing" to "Convert token to text" on Feb 26, 2019
@vishalkatiyar007 (Author)

For example, the annotator marks the long answer using byte offsets, token offsets, and an index into the list of long answer candidates:
"long_answer": { "start_byte": 32, "end_byte": 106, "start_token": 5, "end_token": 22, "candidate_index": 0 }.
How do I map these byte and token offsets back to the text containing the answer?
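For the byte offsets, a minimal sketch, assuming the full NQ format where start_byte/end_byte index into the UTF-8 encoding of the example's document_html field (the example dict below is toy data for illustration, not real NQ data):

```python
def get_span_from_byte_offsets(example, start_byte, end_byte):
    """Slice the answer span out of the raw document HTML.

    Assumes byte offsets index into the UTF-8 encoding of the
    example's "document_html" field, so we encode before slicing.
    """
    html_bytes = example["document_html"].encode("utf-8")
    return html_bytes[start_byte:end_byte].decode("utf-8")


# Toy example to illustrate the slicing:
example = {"document_html": "<p>The answer is March 18 , 2018 .</p>"}
print(get_span_from_byte_offsets(example, 17, 32))  # March 18 , 2018
```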


filbertphang commented Mar 4, 2019

You might want to try something like this:

import jsonlines

INPUT_FILE = "nq-train-sample.jsonl"
START_TOKEN = 3521
END_TOKEN = 3525
QAS_ID = 4549465242785278785
REMOVE_HTML = True


def get_span_from_token_offsets(f, start_token, end_token, qas_id,
                                remove_html):
    """Return the answer span text for the example with id `qas_id`."""
    for obj in f:
        if obj["example_id"] != qas_id:
            continue

        # Take the tokens in [start_token, end_token), optionally
        # dropping HTML markup tokens such as <P> or <Table>.
        answer_span = [
            item["token"]
            for item in obj["document_tokens"][start_token:end_token]
            if not (remove_html and item["html_token"])
        ]

        return " ".join(answer_span)


with jsonlines.open(INPUT_FILE) as f:
    result = get_span_from_token_offsets(f, START_TOKEN, END_TOKEN, QAS_ID,
                                         REMOVE_HTML)

print(result)
Output: March 18 , 2018

You can read your prediction file to get the various start_token, end_token, and example_id values, then call the function for each one to build a list of prediction spans (and write them to a file, or whatever you need).
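A sketch of that reading step, assuming a JSON-lines prediction file whose lines carry example_id and long_answer token offsets; the field names here are illustrative, so adjust them to your prediction file's actual schema:

```python
import json


def read_prediction_offsets(path):
    """Yield (example_id, start_token, end_token) for each prediction line.

    Field names are hypothetical; change them to match your file.
    """
    with open(path) as f:
        for line in f:
            pred = json.loads(line)
            yield (pred["example_id"],
                   pred["long_answer"]["start_token"],
                   pred["long_answer"]["end_token"])
```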

hope this helps!
