check CR in Czech and try to enhance it #36

Open
michnov opened this issue Nov 11, 2015 · 2 comments

michnov commented Nov 11, 2015

cs-en/batch2q/runs/$ANY/treexfiles/0011.streex##15

Check why the first Czech dropped 3rd-person pronoun (created by mistake) is labelled as non-anaphoric.
Check whether the attribute 'aux_gram/gender' is used correctly in CR.

@michnov michnov self-assigned this Nov 11, 2015

michnov commented Nov 13, 2015

aux_gram/gender: set only for dropped subjects
multi_gram/gender: set for all semantic nouns

Coref::CS::SetMultiGender

  • converts aux_gram to multi_gram and sets multi_gram for the remaining semantic nouns
  • must precede coreference resolution
    fixed in commits 1d03e9f .. 4b17d19
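
To make the intended behaviour concrete, here is a minimal Python sketch of what a block like SetMultiGender might do. It is only an illustration: the real block is part of Treex (Perl), nodes are modelled here as plain dicts, the `is_semantic_noun` key is hypothetical, and falling back to `gram/gender` for nodes without `aux_gram/gender` is an assumption.

```python
from typing import Optional


def set_multi_gender(node: dict) -> Optional[str]:
    """Fill node['multi_gram/gender'] for a semantic noun.

    Assumptions (not confirmed by the issue):
      * 'is_semantic_noun' is a hypothetical flag on the node,
      * nodes without 'aux_gram/gender' fall back to 'gram/gender'.
    """
    if not node.get("is_semantic_noun"):
        return None
    # aux_gram/gender exists only on dropped subjects, so prefer it when present.
    value = node.get("aux_gram/gender") or node.get("gram/gender")
    if value is not None:
        node["multi_gram/gender"] = value
    return value


def prepare_for_coreference(nodes: list) -> list:
    """Run the gender step on all nodes; must be called before the resolver."""
    for node in nodes:
        set_multi_gender(node)
    return nodes
```

Calling `prepare_for_coreference` before any resolver step mirrors the ordering constraint noted above.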
TODO

There is a bug in generating coreference features for the anaphor: the c_anaph_gen_* features are inconsistent with c_anaph_gen.

  • fixing this should require retraining the model
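
One way to rule out this kind of mismatch is to derive every per-value feature from the same source value. The Python sketch below assumes that c_anaph_gen_* are binary indicator features, one per gender value, which the issue does not spell out. The gender inventory and the function names are illustrative, and the real feature extractor is written in Perl.

```python
# Illustrative gender inventory; the actual feature set may differ.
GENDERS = ("anim", "inan", "fem", "neut")


def anaphor_gender_features(gender):
    """Build c_anaph_gen and its per-value indicators from one value.

    `gender` may hold several values delimited by '|', or be None.
    """
    values = set(gender.split("|")) if gender else set()
    feats = {"c_anaph_gen": gender or "undef"}
    for g in GENDERS:
        feats["c_anaph_gen_" + g] = int(g in values)
    return feats


def is_consistent(feats):
    """Check that every indicator agrees with the c_anaph_gen value."""
    values = set(feats["c_anaph_gen"].split("|"))
    return all(feats["c_anaph_gen_" + g] == int(g in values) for g in GENDERS)
```

A check like `is_consistent` could be run over extracted training instances before retraining, to confirm that the fix took effect.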


michnov commented Jan 5, 2016

Changes were made in the gram_gender_multivalue branch. So far (commit 1e8413e), the following has been done:

  • gram/gender and gram/number support multiple values, delimited by the | sign
    • the wild attributes aux_gram/gender and multi_gram_gender/gender were replaced by gram_gender
    • gram/number is multi-valued as well
    • A2T::DisambiguateGrammatemes is applied to gram_gender and gram_number
    • TODO: delete Block::Coref::SetMultiGender
  • the wild attributes are no longer used in the coref features in Tool::Coreference::CS::PronCorefFeatures
    • fixed the mismatch between c_{anaph|cand}_gen_* and c_{anaph|cand}_gen
  • model retrained with the new features
    • it performs worse in terms of BLEU on qtleap/cs_en/news: 13.11 vs. 13.06
    • TODO: the performance should be improved
  • antecedent candidates are checked to ensure they are not already coreferential with the anaphor
    • this prevents cycles
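
Two items from the list above, |-delimited multi-values and the cycle check, can be sketched in Python as follows. This is a hedged illustration rather than the Treex implementation (which is in Perl). The function names, the use of node ids, and the `antecedent_of` mapping are all assumptions, as is treating two multi-valued grammatemes as agreeing when they share at least one value.

```python
def split_values(value):
    """Split a '|'-delimited grammateme value into a set of values."""
    return set(value.split("|")) if value else set()


def values_agree(a, b):
    """Assumed agreement test: the two value sets share at least one value."""
    return bool(split_values(a) & split_values(b))


def would_create_cycle(anaphor, candidate, antecedent_of):
    """Return True if linking anaphor -> candidate would close a cycle.

    `antecedent_of` maps a node id to the id of its antecedent (hypothetical
    representation). The candidate's chain is followed; if it reaches the
    anaphor, the candidate is already coreferential with it.
    """
    seen = set()
    node = candidate
    while node is not None and node not in seen:
        if node == anaphor:
            return True
        seen.add(node)
        node = antecedent_of.get(node)
    return False


def admissible_candidates(anaphor, candidates, antecedent_of):
    """Keep only candidates that would not introduce a cycle."""
    return [c for c in candidates
            if not would_create_cycle(anaphor, c, antecedent_of)]
```

For example, with an existing link from "b" to "a", the node "b" is not an admissible antecedent for the anaphor "a", because linking them would create the cycle a → b → a.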
