French regression between Stanza 1.8.1 and 1.8.2 #1404

Open
blegaut opened this issue Jul 18, 2024 · 16 comments
@blegaut

blegaut commented Jul 18, 2024

Describe the bug
Take the following sentence: Assurez-vous d'être à l'heure !

The word vous gets the wrong dependency relation with Stanza 1.8.2, but the correct one with Stanza 1.8.1.
Stanza 1.8.1:

          {
            "id": 2,
            "text": "-vous",
            "lemma": "vous",
            "upos": "PRON",
            "feats": "Emph=No|Number=Plur|Person=2|PronType=Prs",
            "head": 1,
            "deprel": "obj",
            "start_char": 7,
            "end_char": 12,
            "ner": "O",
            "multi_ner": [
              "O"
            ]
          },

Stanza 1.8.2:

          {
            "id": 2,
            "text": "-vous",
            "lemma": "vous",
            "upos": "PRON",
            "feats": "Emph=No|Number=Plur|Person=2|PronType=Prs",
            "head": 1,
            "deprel": "nsubj",
            "start_char": 7,
            "end_char": 12,
            "ner": "O",
            "multi_ner": [
              "O"
            ]
          },

To Reproduce
Steps to reproduce the behavior: see above

Expected behavior
I would expect the same analysis regardless of the version.

Environment (please complete the following information):

  • OS: MacOS
  • Python version: python 3.12
  • Stanza version: [e.g., 1.0.0]


blegaut added the bug label Jul 18, 2024
@AngledLuffa
Collaborator

So, I'm not surprised there are FR changes over time. We created a "combined" FR model to be the default out of four mostly compatible treebanks.

There is exactly one line with Assurez-vous in it (zero with assurez-vous) and the dependency is actually neither obj nor nsubj:

# text = Assurez-vous de boire suffisamment (au moins un à deux verres) avant et après le traitement par Aclasta, selon les instructions de votre médecin : ceci afin d'éviter une déshydratation.
1       Assurez assurer VERB    _       Mood=Imp|Number=Plur|Person=2|Tense=Pres|VerbForm=Fin   0       root    _       SpaceAfter=No
2       -vous   vous    PRON    _       Number=Plur|Person=2|PronType=Prs|Reflex=Yes    1       expl:pv _       _

Does this dependency look reasonable to you? At any rate, I can rebuild the FR models with the latest versions of the datasets, and perhaps it will improve performance somewhat.
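A minimal sketch of how such a treebank search could be done, assuming the data is available as standard tab-separated CoNLL-U text. The helper names `iter_sentences` and `deprels_for_pair` are hypothetical, and the sample is only the two-token excerpt quoted above, not the full treebank.

```python
from collections import Counter


def iter_sentences(conllu_text):
    """Yield each sentence as a list of 10-column token rows."""
    rows = []
    for line in conllu_text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if rows:
                yield rows
                rows = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip multiword-token ranges (e.g. "10-11") and empty nodes (e.g. "3.1").
        if "-" in cols[0] or "." in cols[0]:
            continue
        rows.append(cols)
    if rows:
        yield rows


def deprels_for_pair(conllu_text, first, second):
    """Count the DEPREL of `second` whenever it directly follows `first`."""
    counts = Counter()
    for rows in iter_sentences(conllu_text):
        for prev, cur in zip(rows, rows[1:]):
            # Column 1 is FORM, column 7 is DEPREL (0-based); the match is case-sensitive.
            if prev[1] == first and cur[1] == second:
                counts[cur[7]] += 1
    return counts


# Two-token excerpt of the treebank sentence quoted above.
SAMPLE = "\n".join([
    "\t".join(["1", "Assurez", "assurer", "VERB", "_",
               "Mood=Imp|Number=Plur|Person=2|Tense=Pres|VerbForm=Fin",
               "0", "root", "_", "SpaceAfter=No"]),
    "\t".join(["2", "-vous", "vous", "PRON", "_",
               "Number=Plur|Person=2|PronType=Prs|Reflex=Yes",
               "1", "expl:pv", "_", "_"]),
])

print(deprels_for_pair(SAMPLE, "Assurez", "-vous"))
```

Run over a whole training file, the same counter would show at a glance how often each relation is attested for a given word pair.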

@blegaut
Author

blegaut commented Jul 18, 2024

Thanks for your quick reply.

Yes, expl:pv is definitely the best option here. I hope that it works when you rebuild the FR models. Please let me know how and when I can test it.

Thanks,

Bernard

@AngledLuffa
Collaborator

Mmm, unfortunately, the models continue to call it nsubj after rebuilding with the latest versions of the git data. That's also true for the version using a transformer. One option here is to throw together a couple sentences which cover the dependency and add that to the training data. I don't know any French, so I don't think I should be the one to do it, but if you have suggested dependencies for a couple sentences, that would likely be enough.

(We could also start with parses for a couple sentences with that pair of words and correct the errors that show up.)

@blegaut
Author

blegaut commented Jul 22, 2024

Hello, I am happy to contribute by providing a couple of corrected sentences. What would be the expected format and the proper repository?

I also noticed some other regressions after the rebuild with the latest versions of the git data. Is there any way to access the previous versions?

Thanks

@AngledLuffa
Collaborator

Is there any way to access the previous versions ?

Well... yes, that's technically possible. They should be in the HuggingFace history for the FR models, although the idea behind making the newer models is that other things will work better with the updated data.

https://huggingface.co/stanfordnlp/stanza-fr
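As a rough sketch of what fetching from the repository history could look like: Hugging Face serves individual files at a given commit, tag, or branch through its `resolve` URL scheme. The helper name `stanza_model_url` is hypothetical, and the revision and file path below are placeholders, since the thread does not name either.

```python
def stanza_model_url(lang, revision, filename):
    """Build a Hugging Face "resolve" URL for one file at a given revision."""
    return (
        f"https://huggingface.co/stanfordnlp/stanza-{lang}"
        f"/resolve/{revision}/{filename}"
    )


# Both values are placeholders, not confirmed names from the repository:
# look up the actual commit or tag and the model path in the repo history.
url = stanza_model_url("fr", "REVISION", "path/to/model.pt")
print(url)
```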

If you can come up with some example regression sentences, perhaps the best format would just be text sentences (cut down so they demonstrate the error but aren't 50 words long), I'll run them through our best models, and you can let me know where you spot the errors

@blegaut
Author

blegaut commented Jul 22, 2024

Here are some example regression sentences:

  • Nous vous recommandons vivement d'investir dans un système aux normes.

the root should be the verb recommandons rather than the subject Nous

  • Élaborez un plan de gestion de crise.

the root should be the verb Élaborez rather than plan

  • Il semble que vous ne soyez pas informé.

almost every dependency relation is wrong

  • Mettez en place des politiques de recouvrement plus strictes!

the root should be the verb Mettez rather than place

  • Nos experts peuvent vous conseiller.

experts should be the subject

Thanks,

Bernard
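A minimal sketch of how the root expectations above could be turned into a repeatable check. It assumes each parse is a list of word dicts with `text` and `head` keys, the same shape as the JSON in the original report. The names `root_word`, `check_roots` and `stanza_parser` are hypothetical, and `stanza_parser` needs Stanza and the French models installed.

```python
def root_word(words):
    """Return the text of the word whose head is 0 (the sentence root)."""
    for word in words:
        # Multiword-token entries carry no "head" key, so .get() skips them.
        if word.get("head") == 0:
            return word["text"]
    return None


# Expected roots taken from the regression report above.
EXPECTED_ROOTS = {
    "Nous vous recommandons vivement d'investir dans un système aux normes.": "recommandons",
    "Élaborez un plan de gestion de crise.": "Élaborez",
    "Mettez en place des politiques de recouvrement plus strictes!": "Mettez",
}


def check_roots(parse, expected=EXPECTED_ROOTS):
    """`parse` maps a sentence to its list of word dicts; return the mismatches."""
    failures = {}
    for text, want in expected.items():
        got = root_word(parse(text))
        if got != want:
            failures[text] = got
    return failures


def stanza_parser(package="default"):
    """Build a parse function backed by Stanza (assumes the models are available)."""
    import stanza  # imported lazily so the helpers above work without it

    nlp = stanza.Pipeline("fr", package=package)
    return lambda text: nlp(text).sentences[0].to_dict()
```

Calling `check_roots(stanza_parser())` under each installed Stanza version would then list the sentences whose root differs from the expectation.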

@AngledLuffa
Collaborator

If I put some of these into the "accurate" models with a Transformer, it already produces some of the parses you recommend. I can post some here:

# recommandons is the verb

# text = Nous vous recommandons vivement d'investir dans un système aux normes.
# sent_id = 0
1       Nous    nous    PRON    _       Emph=No|Number=Plur|Person=1|PronType=Prs       3       nsubj   _       start_char=0|end_char=4|ner=O
2       vous    vous    PRON    _       Emph=No|Number=Plur|Person=2|PronType=Prs       3       iobj    _       start_char=5|end_char=9|ner=O
3       recommandons    recommander     VERB    _       Mood=Ind|Number=Plur|Person=1|Tense=Pres|VerbForm=Fin   0       root    _       start_char=10|end_char=22|ner=O
4       vivement        vivement        ADV     _       _       3       advmod  _       start_char=23|end_char=31|ner=O
5       d'      de      ADP     _       _       6       mark    _       start_char=32|end_char=34|ner=O|SpaceAfter=No
6       investir        investir        VERB    _       VerbForm=Inf    3       xcomp   _       start_char=34|end_char=42|ner=O
7       dans    dans    ADP     _       _       9       case    _       start_char=43|end_char=47|ner=O
8       un      un      DET     _       Definite=Ind|Gender=Masc|Number=Sing|PronType=Art       9       det     _       start_char=48|end_char=50|ner=O
9       système système NOUN    _       Gender=Masc|Number=Sing 6       obl:arg _       start_char=51|end_char=58|ner=O
10-11   aux     _       _       _       _       _       _       _       start_char=59|end_char=62|ner=O
10      à       à       ADP     _       _       12      case    _       _
11      les     le      DET     _       Definite=Def|Number=Plur|PronType=Art   12      det     _       _
12      normes  norme   NOUN    _       Gender=Fem|Number=Plur  9       nmod    _       start_char=63|end_char=69|ner=O|SpaceAfter=No
13      .       .       PUNCT   _       _       3       punct   _       start_char=69|end_char=70|ner=O|SpaceAfter=No

# Élaborez is the verb

# text = Élaborez un plan de gestion de crise.
# sent_id = 0
1       Élaborez        élaborer        VERB    _       Mood=Imp|Number=Plur|Person=2|Tense=Pres|VerbForm=Fin   0       root    _       start_char=0|end_char=8|ner=O
2       un      un      DET     _       Definite=Ind|Gender=Masc|Number=Sing|PronType=Art       3       det     _       start_char=9|end_char=11|ner=O
3       plan    plan    NOUN    _       Gender=Masc|Number=Sing 1       obj     _       start_char=12|end_char=16|ner=O
4       de      de      ADP     _       _       5       case    _       start_char=17|end_char=19|ner=O
5       gestion gestion NOUN    _       Gender=Fem|Number=Sing  3       nmod    _       start_char=20|end_char=27|ner=O
6       de      de      ADP     _       _       7       case    _       start_char=28|end_char=30|ner=O
7       crise   crise   NOUN    _       Gender=Fem|Number=Sing  5       nmod    _       start_char=31|end_char=36|ner=O|SpaceAfter=No
8       .       .       PUNCT   _       _       1       punct   _       start_char=36|end_char=37|ner=O|SpaceAfter=No

# would you check this?

# text = Il semble que vous ne soyez pas informé.
# sent_id = 0
1       Il      lui     PRON    _       Emph=No|Gender=Masc|Number=Sing|Person=3|PronType=Prs   2       expl:subj       _       start_char=0|end_char=2|ner=O
2       semble  sembler VERB    _       Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin   0       root    _       start_char=3|end_char=9|ner=O
3       que     que     SCONJ   _       _       8       mark    _       start_char=10|end_char=13|ner=O
4       vous    vous    PRON    _       Emph=No|Number=Plur|Person=2|PronType=Prs       8       nsubj:pass      _       start_char=14|end_char=18|ner=O
5       ne      ne      ADV     _       Polarity=Neg    8       advmod  _       start_char=19|end_char=21|ner=O
6       soyez   être    AUX     _       Mood=Ind|Number=Plur|Person=2|Tense=Pres|VerbForm=Fin   8       aux:pass        _       start_char=22|end_char=27|ner=O
7       pas     pas     ADV     _       Polarity=Neg    8       advmod  _       start_char=28|end_char=31|ner=O
8       informé informer        VERB    _       Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Pass     2       csubj   _       start_char=32|end_char=39|ner=O|SpaceAfter=No
9       .       .       PUNCT   _       _       2       punct   _       start_char=39|end_char=40|ner=O|SpaceAfter=No

# Mettez is the verb

# text = Mettez en place des politiques de recouvrement plus strictes!
# sent_id = 0
1       Mettez  mettre  VERB    _       Mood=Imp|Number=Plur|Person=2|Tense=Pres|VerbForm=Fin   0       root    _       start_char=0|end_char=6|ner=S-LOC
2       en      en      ADP     _       _       3       case    _       start_char=7|end_char=9|ner=O
3       place   place   NOUN    _       Gender=Fem|Number=Sing  1       obl:mod _       start_char=10|end_char=15|ner=O
4-5     des     _       _       _       _       _       _       _       start_char=16|end_char=19|ner=O
4       de      de      ADP     _       _       6       case    _       _
5       les     le      DET     _       Definite=Def|Number=Plur|PronType=Art   6       det     _       _
6       politiques      politique       NOUN    _       Gender=Fem|Number=Plur  1       obl:arg _       start_char=20|end_char=30|ner=O
7       de      de      ADP     _       _       8       case    _       start_char=31|end_char=33|ner=O
8       recouvrement    recouvrement    NOUN    _       Gender=Masc|Number=Sing 6       nmod    _       start_char=34|end_char=46|ner=O
9       plus    plus    ADV     _       _       10      advmod  _       start_char=47|end_char=51|ner=O
10      strictes        strict  ADJ     _       Gender=Fem|Number=Plur  6       amod    _       start_char=52|end_char=60|ner=O|SpaceAfter=No
11      !       !       PUNCT   _       _       1       punct   _       start_char=60|end_char=61|ner=O|SpaceAfter=No

# experts is the subject

# text = Nos experts peuvent vous conseiller.
# sent_id = 0
1       Nos     son     DET     _       Number=Plur|Number[psor]=Plur|Person[psor]=1|Poss=Yes|PronType=Prs      2       det     _       start_char=0|end_char=3|ner=S-LOC
2       experts expert  NOUN    _       Gender=Masc|Number=Plur 3       nsubj   _       start_char=4|end_char=11|ner=O
3       peuvent pouvoir VERB    _       Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin   0       root    _       start_char=12|end_char=19|ner=O
4       vous    vous    PRON    _       Emph=No|Number=Plur|Person=2|PronType=Prs       5       obj     _       start_char=20|end_char=24|ner=O
5       conseiller      conseiller      VERB    _       VerbForm=Inf    3       xcomp   _       start_char=25|end_char=35|ner=O|SpaceAfter=No
6       .       .       PUNCT   _       _       3       punct   _       start_char=35|end_char=36|ner=O|SpaceAfter=No

@blegaut
Author

blegaut commented Jul 22, 2024

Everything looks good ! Thank you

@AngledLuffa
Collaborator

This is what it came up with for ...

# text = Assurez-vous d'être à l'heure !
# sent_id = 0
1       Assurez assurer VERB    _       Mood=Imp|Number=Plur|Person=2|Tense=Pres|VerbForm=Fin   0       root    _       start_char=0|end_char=7|ner=O|SpaceAfter=No
2       -vous   vous    PRON    _       Emph=No|Number=Plur|Person=2|PronType=Prs       1       nsubj   _       start_char=7|end_char=12|ner=O
3       d'      de      ADP     _       _       4       mark    _       start_char=13|end_char=15|ner=O|SpaceAfter=No
4       être    être    AUX     _       VerbForm=Inf    1       ccomp   _       start_char=15|end_char=19|ner=O
5       à       à       ADP     _       _       7       case    _       start_char=20|end_char=21|ner=O
6       l'      le      DET     _       Definite=Def|Number=Sing|PronType=Art   7       det     _       start_char=22|end_char=24|ner=O|SpaceAfter=No
7       heure   heure   NOUN    _       Gender=Fem|Number=Sing  4       obl:arg _       start_char=24|end_char=29|ner=O
8       !       !       PUNCT   _       _       1       punct   _       start_char=30|end_char=31|ner=O|SpaceAfter=No

but you were saying the expl:pv dep is better?

Can you suggest one or two other sentences with Assurez-vous or assurez-vous in them?

@blegaut
Author

blegaut commented Jul 23, 2024

yes, sure. Here are a few sentences:

  • Puisque vous êtes équipé d'un logiciel de facturation, assurez-vous d'utiliser le système de relance afin de résorber les retards de paiement que vous déplorez.
  • Assurez-vous de bien suivre la réglementation qui encadre votre secteur d'activité
  • Assurez-vous de couvrir les risques potentiels, y compris les incendies, les catastrophes naturelles et le vol.

@AngledLuffa
Collaborator

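# text = Puisque vous êtes équipé d'un logiciel de facturation, assurez-vous d'utiliser le système de relance afin de résorber les retards de paiement que vous déplorez.
# sent_id = 0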
1       Puisque puisque SCONJ   _       _       4       mark    _       start_char=0|end_char=7|ner=O
2       vous    vous    PRON    _       Number=Plur|Person=2|PronType=Prs       4       nsubj:pass      _       start_char=8|end_char=12|ner=O
3       êtes    être    AUX     _       Mood=Ind|Number=Plur|Person=2|Tense=Pres|VerbForm=Fin   4       aux:pass        _       start_char=13|end_char=17|ner=O
4       équipé  équiper VERB    _       Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part|Voice=Pass     11      advcl   _       start_char=18|end_char=24|ner=O
5       d'      de      ADP     _       _       7       case    _       start_char=25|end_char=27|ner=O|SpaceAfter=No
6       un      un      DET     _       Definite=Ind|Gender=Masc|Number=Sing|PronType=Art       7       det     _       start_char=27|end_char=29|ner=O
7       logiciel        logiciel        NOUN    _       Gender=Masc|Number=Sing 4       obl:arg _       start_char=30|end_char=38|ner=O
8       de      de      ADP     _       _       9       case    _       start_char=39|end_char=41|ner=O
9       facturation     facturation     NOUN    _       Gender=Fem|Number=Sing  7       nmod    _       start_char=42|end_char=53|ner=O|SpaceAfter=No
10      ,       ,       PUNCT   _       _       4       punct   _       start_char=53|end_char=54|ner=O
11      assurez assurer VERB    _       Mood=Imp|Number=Plur|Person=2|Tense=Pres|VerbForm=Fin   0       root    _       start_char=55|end_char=62|ner=O|SpaceAfter=No
12      -vous   vous    PRON    _       Emph=No|Number=Plur|Person=2|PronType=Prs       11      nsubj   _       start_char=62|end_char=67|ner=O
13      d'      de      ADP     _       _       14      mark    _       start_char=68|end_char=70|ner=O|SpaceAfter=No
14      utiliser        utiliser        VERB    _       VerbForm=Inf    11      ccomp   _       start_char=70|end_char=78|ner=O
15      le      le      DET     _       Definite=Def|Gender=Masc|Number=Sing|PronType=Art       16      det     _       start_char=79|end_char=81|ner=O
16      système système NOUN    _       Gender=Masc|Number=Sing 14      obj     _       start_char=82|end_char=89|ner=O
17      de      de      ADP     _       _       18      case    _       start_char=90|end_char=92|ner=O
18      relance relance NOUN    _       Gender=Fem|Number=Sing  16      nmod    _       start_char=93|end_char=100|ner=O
19      afin    afin    ADV     _       _       14      advmod  _       start_char=101|end_char=105|ner=O
20      de      de      ADP     _       _       21      mark    _       start_char=106|end_char=108|ner=O
21      résorber        résorber        VERB    _       VerbForm=Inf    19      ccomp   _       start_char=109|end_char=117|ner=O
22      les     le      DET     _       Definite=Def|Number=Plur|PronType=Art   23      det     _       start_char=118|end_char=121|ner=O
23      retards retard  NOUN    _       Gender=Masc|Number=Plur 21      obj     _       start_char=122|end_char=129|ner=O
24      de      de      ADP     _       _       25      case    _       start_char=130|end_char=132|ner=O
25      paiement        paiement        NOUN    _       Gender=Masc|Number=Sing 23      nmod    _       start_char=133|end_char=141|ner=O
26      que     que     PRON    _       PronType=Rel    28      obj     _       start_char=142|end_char=145|ner=O
27      vous    vous    PRON    _       Emph=No|Number=Plur|Person=2|PronType=Prs       28      nsubj   _       start_char=146|end_char=150|ner=O
28      déplorez        déplorer        VERB    _       Mood=Ind|Number=Plur|Person=2|Tense=Pres|VerbForm=Fin   23      acl:relcl       _       start_char=151|end_char=159|ner=O|SpaceAfter=No
29      .       .       PUNCT   _       _       11      punct   _       start_char=159|end_char=160|ner=O|SpaceAfter=No


# text = Assurez-vous de bien suivre la réglementation qui encadre votre secteur d'activité
# sent_id = 0
1       Assurez assurer VERB    _       Mood=Imp|Number=Plur|Person=2|Tense=Pres|VerbForm=Fin   0       root    _       start_char=0|end_char=7|ner=O|SpaceAfter=No
2       -vous   vous    PRON    _       Number=Plur|Person=2|PronType=Prs       1       nsubj   _       start_char=7|end_char=12|ner=O
3       de      de      ADP     _       _       5       mark    _       start_char=13|end_char=15|ner=O
4       bien    bien    ADV     _       _       5       advmod  _       start_char=16|end_char=20|ner=O
5       suivre  suivre  VERB    _       VerbForm=Inf    1       xcomp   _       start_char=21|end_char=27|ner=O
6       la      le      DET     _       Definite=Def|Gender=Fem|Number=Sing|PronType=Art        7       det     _       start_char=28|end_char=30|ner=O
7       réglementation  réglementation  NOUN    _       Gender=Fem|Number=Sing  5       obj     _       start_char=31|end_char=45|ner=O
8       qui     qui     PRON    _       PronType=Rel    9       nsubj   _       start_char=46|end_char=49|ner=O
9       encadre encadrer        VERB    _       Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin   7       acl:relcl       _       start_char=50|end_char=57|ner=O
10      votre   son     DET     _       Number=Sing|Poss=Yes    11      det     _       start_char=58|end_char=63|ner=O
11      secteur secteur NOUN    _       Gender=Masc|Number=Sing 9       obj     _       start_char=64|end_char=71|ner=O
12      d'      de      ADP     _       _       13      case    _       start_char=72|end_char=74|ner=O|SpaceAfter=No
13      activité        activité        NOUN    _       Gender=Fem|Number=Sing  11      nmod    _       start_char=74|end_char=82|ner=O|SpaceAfter=No



# text = Assurez-vous de couvrir les risques potentiels, y compris les incendies, les catastrophes naturelles et le vol.
# sent_id = 0
1       Assurez assurer VERB    _       Mood=Imp|Number=Plur|Person=2|Tense=Pres|VerbForm=Fin   0       root    _       start_char=0|end_char=7|ner=O|SpaceAfter=No
2       -vous   vous    PRON    _       Emph=No|Number=Plur|Person=2|PronType=Prs       1       nsubj   _       start_char=7|end_char=12|ner=O
3       de      de      ADP     _       _       4       mark    _       start_char=13|end_char=15|ner=O
4       couvrir couvrir VERB    _       VerbForm=Inf    1       ccomp   _       start_char=16|end_char=23|ner=O
5       les     le      DET     _       Definite=Def|Number=Plur|PronType=Art   6       det     _       start_char=24|end_char=27|ner=O
6       risques risque  NOUN    _       Gender=Masc|Number=Plur 4       obj     _       start_char=28|end_char=35|ner=O
7       potentiels      potentiel       ADJ     _       Gender=Masc|Number=Plur 6       amod    _       start_char=36|end_char=46|ner=O|SpaceAfter=No
8       ,       ,       PUNCT   _       _       12      punct   _       start_char=46|end_char=47|ner=O
9       y       y       PRON    _       Emph=No|ExtPos=ADP|Person=3|PronType=Prs        12      case    _       start_char=48|end_char=49|ner=O
10      compris comprendre      VERB    _       Gender=Masc|Tense=Past|VerbForm=Part|Voice=Pass 9       fixed   _       start_char=50|end_char=57|ner=O
11      les     le      DET     _       Definite=Def|Number=Plur|PronType=Art   12      det     _       start_char=58|end_char=61|ner=O
12      incendies       incendie        NOUN    _       Gender=Masc|Number=Plur 6       nmod    _       start_char=62|end_char=71|ner=O|SpaceAfter=No
13      ,       ,       PUNCT   _       _       15      punct   _       start_char=71|end_char=72|ner=O
14      les     le      DET     _       Definite=Def|Number=Plur|PronType=Art   15      det     _       start_char=73|end_char=76|ner=O
15      catastrophes    catastrophe     NOUN    _       Gender=Fem|Number=Plur  12      conj    _       start_char=77|end_char=89|ner=O
16      naturelles      naturel ADJ     _       Gender=Fem|Number=Plur  15      amod    _       start_char=90|end_char=100|ner=O
17      et      et      CCONJ   _       _       19      cc      _       start_char=101|end_char=103|ner=O
18      le      le      DET     _       Definite=Def|Gender=Masc|Number=Sing|PronType=Art       19      det     _       start_char=104|end_char=106|ner=O
19      vol     vol     NOUN    _       Gender=Masc|Number=Sing 12      conj    _       start_char=107|end_char=110|ner=O|SpaceAfter=No
20      .       .       PUNCT   _       _       1       punct   _       start_char=110|end_char=111|ner=O|SpaceAfter=No

Each -vous is labeled nsubj instead of expl:pv. Also, any thoughts on the previous one aside from the nsubj -> expl:pv change?

@blegaut
Author

blegaut commented Jul 23, 2024

I would say that the change from nsubj to expl:pv is required for all occurrences of -vous. I can't see any other changes needed in these sentences. Thanks
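A minimal sketch of applying that correction mechanically to tab-separated CoNLL-U output, for example to prepare hand-corrected training sentences. This is an illustration only, not the procedure used in this thread. The function name `relabel_reflexive_vous` is hypothetical, and it assumes the verb precedes the clitic, as it does in every example here.

```python
def relabel_reflexive_vous(conllu_text):
    """Relabel "-vous" as expl:pv when it is an nsubj attached to a form of "assurer"."""
    out = []
    lemma_by_id = {}
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        if not line.strip():
            lemma_by_id = {}  # sentence boundary: forget the previous sentence
        elif not line.startswith("#") and len(cols) == 10:
            # Columns (0-based): 0=ID, 1=FORM, 2=LEMMA, 6=HEAD, 7=DEPREL.
            lemma_by_id[cols[0]] = cols[2]
            if (
                cols[1].lower() == "-vous"
                and cols[7] == "nsubj"
                and lemma_by_id.get(cols[6]) == "assurer"
            ):
                cols[7] = "expl:pv"
                line = "\t".join(cols)
        out.append(line)
    return "\n".join(out)


# Small demo with most columns left as "_" for brevity.
DEMO = "\n".join([
    "\t".join(["1", "Assurez", "assurer", "VERB", "_", "_", "0", "root", "_", "SpaceAfter=No"]),
    "\t".join(["2", "-vous", "vous", "PRON", "_", "_", "1", "nsubj", "_", "_"]),
])
print(relabel_reflexive_vous(DEMO))
```

The rule is deliberately narrow (one lemma, one relation) so that genuine subjects such as the vous in "que vous déplorez" are left alone. The treebank example quoted earlier also carries Reflex=Yes on the pronoun, which this sketch does not add.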

@AngledLuffa
Collaborator

Alright, I put a candidate fake training file here:

stanfordnlp/handparsed-treebank@0fac6a8

Any thoughts on these?

Also sent them to a former colleague who's worked on French datasets before.

@AngledLuffa
Collaborator

If you find any other regressions, please don't hesitate to send them our way. I can rerun the depparse training with these sentences and see if it helps.

@AngledLuffa
Collaborator

Well... just training on those sentences isn't helping either model get the expl:pv relation in Assurez-vous. Maybe a couple more sentences would help, maybe not (there is a cutoff of 7 where it starts finetuning words, so it may indeed help to add a couple more). At any rate, I suggest using the default_accurate package, since you seemed pretty satisfied with the other parses above.
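A minimal sketch of selecting that package, assuming a Stanza release that ships `default_accurate` for French. The helper names are hypothetical; `stanza.Pipeline` and its `package` argument are the library's own.

```python
def french_pipeline_kwargs(accurate=True):
    """Keyword arguments for stanza.Pipeline; "default" is the regular package."""
    return {"lang": "fr", "package": "default_accurate" if accurate else "default"}


def build_french_pipeline(accurate=True):
    """Create the pipeline (needs Stanza installed and the models available)."""
    import stanza  # imported lazily so the helper above works without it

    return stanza.Pipeline(**french_pipeline_kwargs(accurate))


def print_dependencies(nlp, text):
    """Print id, form, head and relation for each word of the first sentence."""
    doc = nlp(text)
    for word in doc.sentences[0].words:
        print(word.id, word.text, word.head, word.deprel)
```

For example, `print_dependencies(build_french_pipeline(), "Assurez-vous d'être à l'heure !")` would show which relation the transformer-based models assign to -vous.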

@AngledLuffa
Collaborator

Alright, I realized I had mistrained the models with the new dependencies. The new models seem to get expl:pv for a couple of the examples I tried for assurez-vous. I posted those as the new defaults. I'll send those sentences to a former colleague to see if she has any suggestions on the dependencies, just to make sure.
