What's the difference of NN, NNS, NP "query" in the csv files of English and German? #4

Open
UWong-cmyk opened this issue Feb 27, 2024 · 3 comments
Labels
documentation Improvements or additions to documentation

Comments

@UWong-cmyk

UWong-cmyk commented Feb 27, 2024

Hello! I've read your paper on using syntactic patterns in German and English to find related qualia structures, and I've found the corresponding CSV files of syntactic patterns. However, I can't work out how the NPs in the English qualia patterns differ from those in the German ones. In the English CSV files, the NP in patterns such as [NN, NNS],NP appears to be required.

image

In the German files, by contrast, the NP in the similar pattern seems optional.

image

image

Why is the NP in English necessary? Thank you for your attention.

@biertz biertz added the documentation Improvements or additions to documentation label Feb 27, 2024
@lorikdumani
Member

Hi, well spotted. NPs in the patterns are indeed necessary for English, but much less so for German.
For clarity, take a look at the constituency trees. On closer inspection, you can see that the POS tagger for German rarely yields matches containing NPs, while the POS tagger for English almost exclusively produces matches featuring NPs.
Hence, our patterns reflect how we observed the POS taggers to behave.

However, the pattern [NP,NOUN] does not mean that NP is optional; it means that one of the listed tags is chosen. The match in this example must therefore start with either NP or NOUN. I suppose what you mean is that NP could have been left out of the German patterns, since the difference in the results would probably have been negligible.
Optional POS tags are enclosed in round brackets (such as (DET) or (ADJ) in the same pattern).
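
To make the two bracket types concrete, here is a minimal Python sketch of the semantics described above. It is not the project's actual matcher: the whitespace-separated slot syntax, the function names `compile_pattern` and `matches`, and the example patterns are all assumptions made for illustration. Square brackets are treated as "exactly one of these tags", round brackets as "this tag may be absent".

```python
import re


def compile_pattern(pattern: str) -> re.Pattern:
    """Translate a slot pattern into a regex over a space-terminated tag string.

    Hypothetical syntax, for illustration only:
      [A,B]  one of the listed tags is required
      (A)    the tag is optional
      A      the tag is required
    """
    # Split into slots, keeping bracketed groups together even if they contain spaces.
    slots = re.findall(r"\[[^\]]*\]|\([^)]*\)|\S+", pattern)
    parts = []
    for slot in slots:
        if slot.startswith("["):
            # Square brackets: a choice between alternatives, not optionality.
            tags = [t.strip() for t in slot[1:-1].split(",")]
            parts.append("(?:" + "|".join(re.escape(t) for t in tags) + ") ")
        elif slot.startswith("("):
            # Round brackets: the tag may appear once or be left out.
            parts.append("(?:" + re.escape(slot[1:-1].strip()) + " )?")
        else:
            parts.append(re.escape(slot) + " ")
    return re.compile("".join(parts))


def matches(pattern: str, tags: list[str]) -> bool:
    """Return True if the whole tag sequence fits the pattern."""
    text = "".join(tag + " " for tag in tags)
    return compile_pattern(pattern).fullmatch(text) is not None


# Illustrative pattern only, not taken from the CSV files.
german_like = "(DET) (ADJ) [NP,NOUN] VERB"
print(matches(german_like, ["NOUN", "VERB"]))              # True
print(matches(german_like, ["DET", "ADJ", "NP", "VERB"]))  # True
print(matches(german_like, ["DET", "VERB"]))               # False
```

Under these assumed semantics, dropping NP from the square brackets only removes one alternative, whereas an NP written outside the brackets is a required slot of its own.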

@UWong-cmyk
Author

Thank you for your clear explanation! So in the German patterns, either NP or NOUN can be selected.

However, what still puzzles me is the English CSV file: does it mean that the NP must follow [NN, NNS], rather than the pattern being [NN, NNS, NP]? It seems that an NN or NNS can sometimes also be represented as an NP one level above it in the constituency tree. If an NN or NNS is followed by an NP, is that a grammatical error?

Sorry to disturb you again.

@lorikdumani
Member

Don't worry, of course you are not disturbing anyone.

After reviewing the code, we found a bug that is responsible for this behavior. You are absolutely right that an NN or NNS followed by an NP actually leads to a grammatical error. The bug is tricky, though, because it also makes the NP optional (as you noted), i.e. as if it were in the square brackets. Nevertheless, the NP is actually supposed to be there.
Long story short: we will fix the bugs as soon as possible.
Many thanks for your great help in finding them!
