
Inconsistent translation (non-glossary) Dutch to English (British and American) #17

Open
boeryepes opened this issue Jul 21, 2022 · 3 comments


@boeryepes

Hi DeepL team, I'm really happy with your API solution and it's a great boost to our business.

Recently we translated a customer document twice, over a span of three to four months, and on both occasions discovered wide variation in how the same Dutch words were translated.

The words in question are somewhat industry-specific, so we did not expect a good translation out of the box. To our surprise, however, the same word was translated in 17 different ways (ditto for the plural). The word mostly occurred within full sentences, so we assume the DeepL engine had plenty of context. The total document size is approx. 70 pages.

Original Dutch word: Dienstindeler
Preferred translation: shift planner
DeepL generated translations: Service Informer, Service User, service marker, service provider, duty manager, duty scheduler, service administrator, Service Inspector, service director, service scheduler, service evaluator, service integrator, service area, service end-user, service end user, service member, service participant

We ended up using our own post-translation glossary fixer to go from 17 variations back to the preferred translation.
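The fixer itself is not shown in this thread. As a rough illustration of the idea only, here is a minimal sketch assuming a simple approach: one case-insensitive, whole-word regex built from all known variants, which rewrites each match to the preferred term and keeps a trailing plural "s". The names `VARIANTS` and `build_fixer` are hypothetical and not part of any DeepL library. It is a blunt instrument: "service provider" or "service area" would also be rewritten where they legitimately appear, so it only suits text in which every occurrence stems from the source term. Irregular plurals are not handled.

```python
import re
from typing import Callable

# Hypothetical mapping for illustration: preferred term -> variants seen in the output.
# The variants are the renderings of "Dienstindeler" reported in this issue.
VARIANTS = {
    "shift planner": [
        "Service Informer", "Service User", "service marker", "service provider",
        "duty manager", "duty scheduler", "service administrator", "Service Inspector",
        "service director", "service scheduler", "service evaluator", "service integrator",
        "service area", "service end-user", "service end user", "service member",
        "service participant",
    ],
}


def build_fixer(variants: dict[str, list[str]]) -> Callable[[str], str]:
    """Return a function that rewrites every known variant to its preferred term."""
    lookup = {alt.lower(): preferred
              for preferred, alts in variants.items() for alt in alts}
    # Try longer variants first so a longer phrase is never cut short by a shorter one.
    alternation = "|".join(re.escape(v) for v in sorted(lookup, key=len, reverse=True))
    # Whole words only, case-insensitive, with an optional plural "s" captured separately.
    pattern = re.compile(rf"\b({alternation})(s?)\b", re.IGNORECASE)

    def replace(match: re.Match) -> str:
        preferred = lookup[match.group(1).lower()]
        # Capitalize only at the start of the text or right after a sentence end.
        before = match.string[: match.start()].rstrip()
        if not before or before[-1] in ".!?":
            preferred = preferred[0].upper() + preferred[1:]
        return preferred + match.group(2)  # re-attach the plural "s", if any

    return lambda text: pattern.sub(replace, text)


fix = build_fixer(VARIANTS)
print(fix("Duty managers approve leave. Ask the Service Inspector or a service end-user."))
# prints: Shift planners approve leave. Ask the shift planner or a shift planner.
```

Building a single alternation rather than looping over variants one by one means each span of text is rewritten at most once, so a replacement can never be re-matched by a later variant.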

I don't think this is related to #16, but who knows...

Keep up the good work!

@boeryepes
Author

Note: we observed the same inconsistency with regular words, e.g. Bijlage (Dutch) is translated as both Appendix and Annex.
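Inconsistencies like this can be spotted before a manual review by counting how often each candidate rendering appears in the translated text. The sketch below is a hypothetical helper (`count_renderings` is not part of the DeepL API). It counts whole-word, case-insensitive matches of the singular forms only, so plurals such as "annexes" or "appendices" are not counted.

```python
import re
from collections import Counter


def count_renderings(text: str, candidates: list[str]) -> Counter:
    """Count whole-word, case-insensitive occurrences of each candidate rendering."""
    counts = Counter()
    for term in candidates:
        pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
        counts[term] = len(pattern.findall(text))
    return counts


translated = "See Appendix A for the rates. Annex B lists the shifts; Appendix C the holidays."
counts = count_renderings(translated, ["Appendix", "Annex"])
# More than one rendering in use for the same source word signals an inconsistency.
used = [term for term, n in counts.items() if n > 0]
if len(used) > 1:
    print(f"Inconsistent rendering: {dict(counts)}")
# prints: Inconsistent rendering: {'Appendix': 2, 'Annex': 1}
```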

@daniel-jones-deepl
Member

Hi @boeryepes, thanks for creating this issue and for the follow-up feedback, and sorry that my reply is so delayed.

Was this document a PDF, and was it a scanned document? OCR can sometimes affect how sentences are split, and with it the context available for translation.

We are working on releasing Dutch -> English glossaries; when they are available this situation might be improved, but it is great that you have a workaround anyway.

@boeryepes
Author

boeryepes commented Aug 13, 2022

Thanks for the reply @daniel-jones-deepl. This document was not a PDF.

FYI, I am aware of the issues with OCR, which is why I tend to convert PDFs to DOCX beforehand to avoid the OCR limitations.

Will the Dutch/English glossaries resolve the problem I reported in issue #16 in June, or will they suffer from the same problem? This is important because glossaries are understood to be "fixed" translations that should not suffer from the typical machine-learning context issues, i.e. a different context leading to a different result.

As long as issue #16 persists, DeepL is of too limited use for translating technical texts. I currently have one technical term in a document that DeepL translates in 20 different ways. Pretty extreme!
