Fix WordNet 3.0 gloss inconsistencies #160
Comments
G'day,
We have tried to fix these in the new English WordNet, and there is a good
Python interface:
https://github.com/globalwordnet/english-wordnet
https://pypi.org/project/wn/
I think it makes more sense to move to this than to try to backport fixes to
3.0.
…On Sat, Sep 11, 2021 at 12:08 AM John Merkel ***@***.***> wrote:
@fcbond <https://github.com/fcbond>, @stevenbird
<https://github.com/stevenbird> There are several consistency issues with
the gloss portions of WordNet 3.0 making parsing difficult
<nltk/nltk#2527 (comment)>. Would
it be possible for us to manually fix these issues without breaking word
associations as seen with the problems currently facing the update to
WordNet 3.1 <#18>?
--
Francis Bond <http://www3.ntu.edu.sg/home/fcbond/>
Division of Linguistics and Multilingual Studies
Nanyang Technological University
Replying specifically to this: It is incredibly difficult to alter WNDB data without breaking things, as the synset IDs are byte-offsets in the file, so any modified gloss has to have the same number of bytes as before. Secondly, we're not allowed to change the Princeton WordNet data and still call it as such (it would have to be called the "NLTK Wordnet of English" or something).
That issue was closed 2 years ago, which suggests to me that there are no plans to add WordNet 3.1 to the NLTK. There was an attempt at adding next-generation wordnet support to, or alongside, the NLTK (see https://github.com/nltk/wordnet), and it included WordNet 3.1 data as an option. Development stalled, however, so I took over the effort (and package name on PyPI) with an entirely new module, which Francis has linked above.
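To make the byte-offset constraint above concrete, here is a minimal sketch. It uses a toy, made-up record format rather than real WNDB data (the records and the `build_offsets` helper are hypothetical), and only illustrates why an edit that changes a gloss's byte length invalidates the IDs of every later record.

```python
# Toy stand-in for a WNDB data file. In WNDB, a synset's ID is the byte
# offset at which its record starts, so index entries point at offsets.
# The record layout below is invented for illustration only.

def build_offsets(records):
    """Return the starting byte offset of each newline-terminated record."""
    offsets = []
    position = 0
    for record in records:
        offsets.append(position)
        position += len(record.encode("utf-8")) + 1  # +1 for the newline
    return offsets


records = [
    "dog | a domesticatad canid",  # deliberate typo in the gloss
    "cat | a small feline",
    "cow | a bovine animal",
]
before = build_offsets(records)

# Fix 1: same number of bytes ("domesticatad" -> "domesticated").
same_length = list(records)
same_length[0] = "dog | a domesticated canid"

# Fix 2: a longer gloss, which pushes every later record further along.
longer = list(records)
longer[0] = "dog | a domesticated canid; often kept as a pet"

print(before)
print(build_offsets(same_length))  # identical to `before`
print(build_offsets(longer))       # later offsets have shifted
```

With the same-length fix, every offset is preserved, so existing references stay valid. With the longer fix, the offsets of the second and third records change, which in a real WNDB file would break anything that refers to those synsets by offset.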
@goodmami, thanks for the update. This sounds like a more sustainable option. How easily could a user of the NLTK wordnet package port their code to use your package? Does it include the similarity metrics?
Hi,
in general I think it is quite easy to port the code. The documentation
has some notes on migration from the current interface:
https://wn.readthedocs.io/en/latest/guides/nltk-migration.html
It does have the similarity metrics.
https://wn.readthedocs.io/en/latest/api/wn.similarity.html
@goodmami did a lot of work :-).
…On Tue, Sep 14, 2021 at 10:05 AM Steven Bird ***@***.***> wrote:
@goodmami <https://github.com/goodmami>, thanks for the update. This
sounds like a more sustainable option. How easily could a user of the NLTK
wordnet package port their code to use your package? Does it include the
similarity metrics?
Thanks, @fcbond! @stevenbird, Wn has the similarity metrics, information content (it even reads the …
Back to the current issue: in the modern WN-LMF format for wordnets, Definition and Example elements are structurally separate, having been split from WNDB's combined "gloss" line in the format-conversion process. That process, however, may not account for the inconsistencies noted by @genericallyterrible, who did a nice and thorough analysis in nltk/nltk#2527. So as not to let that effort go to waste, it might be good to compare it with the WNDB-to-LMF converter. The relevant code is here.
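As a rough illustration of the gloss-splitting step described above, here is a minimal sketch. It is not the converter's actual logic: it assumes the common gloss shape of a definition followed by semicolon-separated, double-quoted examples, and the function names and sample glosses are made up for the example.

```python
import re


def split_gloss(gloss):
    """Split a WNDB-style gloss into (definition, examples).

    Assumes the definition comes first and each example is wrapped in
    double quotes. Glosses that break this convention are mis-split.
    """
    examples = re.findall(r'"([^"]*)"', gloss)
    # Everything before the first double quote is treated as the definition.
    definition = gloss.split('"', 1)[0].strip().rstrip(";").strip()
    return definition, examples


def has_unbalanced_quotes(gloss):
    """Flag glosses with an odd number of double quotes for manual review."""
    return gloss.count('"') % 2 == 1


well_formed = 'a member of the genus Canis; "the dog barked all night"'
malformed = 'take upon oneself; "he dared the impossible'  # missing close quote

print(split_gloss(well_formed))
print(split_gloss(malformed))           # the example is silently lost
print(has_unbalanced_quotes(malformed))
```

The malformed case shows why inconsistencies matter: a naive splitter drops the example without any error, so a check like `has_unbalanced_quotes` could be used to list glosses worth comparing against the analysis in nltk/nltk#2527.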
@fcbond, @stevenbird There are several consistency issues with the gloss portions of WordNet 3.0 making parsing difficult. Would it be possible for us to manually fix these issues without breaking word associations as seen with the problems currently facing the update to WordNet 3.1?