Replaced call to _prepare_decoder_attention_mask() with _prepare_4d_causal_attention_mask() #1982
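
For readers unfamiliar with what a 4D causal attention mask is, here is a minimal pure-Python sketch of the idea. It is not the library's implementation: the function name `causal_mask_4d`, its parameters, and the use of `-inf` for blocked positions are all assumptions made for illustration. It combines a per-token padding mask with a causal (lower-triangular) constraint and returns a nested list shaped `[batch][1][query_len][key_len]`.

```python
def causal_mask_4d(padding_mask, past_len=0):
    """Build an additive attention mask of shape [batch][1][query_len][key_len].

    Hypothetical helper for illustration only.
    padding_mask: list of rows, one per batch item, each of length key_len,
                  with 1 for a real token and 0 for padding.
    past_len:     number of cached key positions that precede the queries.
    Returned values are 0.0 where attention is allowed and -inf where blocked.
    """
    neg_inf = float("-inf")
    batch = []
    for row in padding_mask:
        key_len = len(row)
        query_len = key_len - past_len
        query_rows = []
        for q in range(query_len):
            q_pos = past_len + q  # absolute position of this query token
            query_rows.append([
                # allowed only if the key is not in the future and is not padding
                0.0 if (k <= q_pos and row[k] == 1) else neg_inf
                for k in range(key_len)
            ])
        batch.append([query_rows])  # singleton axis broadcasts over heads
    return batch


if __name__ == "__main__":
    # Two sequences of length 3; the second is left-padded by one token.
    for item in causal_mask_4d([[1, 1, 1], [0, 1, 1]]):
        for query_row in item[0]:
            print(query_row)
        print()
```

The singleton second axis is there so that one mask can be broadcast across all attention heads, which is the usual reason for preparing the mask in four dimensions.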
| Job | Run time |
|---|---|
| | 3m 11s |
| | 3m 11s |