Ability to translate blocks of subtitles instead of translating line by line #19

Open
mrnossiom opened this issue Oct 19, 2021 · 6 comments

Comments

@mrnossiom

Hi,

I ran into a problem when trying to translate subtitles.
The program translates each line separately from the others, which makes the translation incorrect.
Instead, it would be great to translate block by block to give Google Translate more context.

I can try to think of a better solution to enhance this function:

const originalLines: string[] = [];
const lineIndexes: number[] = [];
for (const selection of selections) {
    for (let lineIndex = selection.start.line; lineIndex <= selection.end.line; ++lineIndex) {
        const line = textEditor.document.lineAt(lineIndex);
        if (!line.isEmptyOrWhitespace && !TimeLine.parse(line.text) && !isSequenceLine(textEditor.document, lineIndex)) {
            originalLines.push(line.text);
            lineIndexes.push(lineIndex);
        }
    }
}
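
As a rough illustration of the block-by-block idea, the text lines of each cue could be grouped before translation instead of being pushed one by one. This is only a sketch under assumptions: it works on a plain `string[]` rather than the editor document, and `SubtitleBlock`, `TIME_LINE` and `collectBlocks` are hypothetical names that do not exist in the extension. The time-line pattern assumes SRT-style timestamps.

```typescript
// Hypothetical sketch: none of these names come from the extension itself.
interface SubtitleBlock {
    lineIndexes: number[]; // positions of the cue's text lines in the document
    text: string;          // the cue's text lines joined into one string
}

// Assumed SRT-style time line, e.g. "00:00:01,000 --> 00:00:04,000".
const TIME_LINE = /^\s*\d{1,2}:\d{2}:\d{2}[,.]\d{1,3}\s*-->\s*\d{1,2}:\d{2}:\d{2}[,.]\d{1,3}/;

function collectBlocks(documentLines: string[]): SubtitleBlock[] {
    const blocks: SubtitleBlock[] = [];
    let current: SubtitleBlock | undefined;
    for (let i = 0; i < documentLines.length; ++i) {
        const line = documentLines[i];
        const isTimeLine = TIME_LINE.test(line);
        // A purely numeric line directly followed by a time line is treated as a sequence number.
        const isSequenceLine = /^\s*\d+\s*$/.test(line)
            && i + 1 < documentLines.length
            && TIME_LINE.test(documentLines[i + 1]);
        if (line.trim() === '' || isTimeLine || isSequenceLine) {
            current = undefined; // any non-text line closes the current block
            continue;
        }
        if (!current) {
            current = { lineIndexes: [], text: '' };
            blocks.push(current);
        }
        current.lineIndexes.push(i);
        current.text = current.text ? `${current.text} ${line.trim()}` : line.trim();
    }
    return blocks;
}
```

Each `text` could then be sent as one unit. The open question is how to spread the translated text back over the original `lineIndexes`, since the translated sentence will not necessarily break at the same places.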

Thanks for your answer.

@snow212-cn

It's urgent to improve this ability. I need it badly.

@klausbadelt

@mrnossiom @snow212-cn I agree this could radically improve translation quality. Could you volunteer a PR?

@klausbadelt

After review, though, the code already seems to do what you're asking for. All text lines are collected, then translated in blocks of up to 8000 URL-encoded characters. (@pepri correct me if I misunderstand the code).

async function translateLines(language: string, originalLines: string[]): Promise<string[]> {
    const result: string[] = [];
    const lines: string[] = [];
    let length = 0;
    for (const originalLine of originalLines) {
        const originalLineLength = encodeURIComponent(originalLine).length;
        if (length + originalLineLength + 1 > 8000) {
            const translatedText = await translateText(language, lines.join('\n'));
            Array.prototype.push.apply(result, translatedText.split('\n'));
            lines.length = 0;
            length = 0;
        }
        lines.push(originalLine);
        length += originalLineLength + 1;
    }
    const translatedText = await translateText(language, lines.join('\n'));
    Array.prototype.push.apply(result, translatedText.split('\n'));
    return result.map(x => x.replace(/(<)(\/?)\s*([bi])\s*(>)/gi, '$1$2$3$4'));
}

I think this request could be closed.

@pepri
Owner

pepri commented Jan 14, 2022

I batch the lines when sending them for translation, but the translation service still translates each line independently. To improve this, I would need to join the lines so they are translated together, and then split the result again, because I want to keep the original line structure. I have already tested this with the # character as a separator, which works for this purpose (a literal # in the text has to be escaped so it is not lost in translation). I might implement it when I feel like it.
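
For illustration, the join-and-split idea could look roughly like the sketch below. The escaping scheme (doubling a literal `#`) and the names `joinBlock`, `splitBlock` and `translateBlock` are assumptions made for this example, not what the extension does, and it is not guaranteed that a translation service returns the `#` separators untouched. For that reason the sketch falls back to line-by-line translation when the number of lines does not match.

```typescript
// Hypothetical sketch of joining lines with a '#' separator and splitting them again.

// Join lines into one string; a literal '#' is escaped by doubling it (assumed scheme).
function joinBlock(lines: string[]): string {
    return lines.map(line => line.replace(/#/g, '##')).join(' # ');
}

// Split a (translated) string back into lines: '##' is a literal '#', a single '#' is a separator.
// Leading and trailing whitespace of each line is trimmed as a side effect.
function splitBlock(text: string): string[] {
    const lines: string[] = [];
    let current = '';
    for (let i = 0; i < text.length; ++i) {
        if (text[i] !== '#') {
            current += text[i];
        } else if (text[i + 1] === '#') {
            current += '#'; // escaped '#'
            ++i;
        } else {
            lines.push(current.trim()); // separator closes the current line
            current = '';
        }
    }
    lines.push(current.trim());
    return lines;
}

// 'translate' is an injected function standing in for the real translation call.
async function translateBlock(
    lines: string[],
    translate: (text: string) => Promise<string>,
): Promise<string[]> {
    const translated = splitBlock(await translate(joinBlock(lines)));
    if (translated.length !== lines.length) {
        // The service dropped or added separators: fall back to one request per line.
        return Promise.all(lines.map(line => translate(line)));
    }
    return translated;
}
```

With this scheme, `joinBlock(['Hello #1', 'How are you?'])` yields `'Hello ##1 # How are you?'`, and `splitBlock` restores the two original lines.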

@mrnossiom
Author

mrnossiom commented Jan 14, 2022

Hello @pepri and @klausbadelt,

So I did a little research and found that the free version we are using isn't documented at all. I think it is an old endpoint kept for backward-compatibility reasons. I also found that the Google Cloud APIs for batch translation require an account with billing set up.

Furthermore, I have been using another translator, DeepL, for a while. They have a recent API with a free plan of 500k characters a month. The API supports batch translation: you send multiple sentences at once, which adds context to the translation. The only drawback is that it needs an API key, and therefore an account.

  • DeepL does the same thing as Google, and could be even better in certain cases.
  • Has a better API and is well documented.
  • Doesn't require a complex account setup. A quick tutorial in the readme should do the trick.

Maybe we could implement the functionality with the DeepL API but keep the Google API, with its wonky translations, as an alternative. Tell me what you think.
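
To make the proposal a bit more concrete, here is a minimal sketch of how a batch request to DeepL might be built and how its response might be read. The endpoint, the `DeepL-Auth-Key` header and the JSON shape are written from my understanding of DeepL's public v2 API and should be checked against their documentation before use. `buildDeepLRequest` and `parseDeepLResponse` are hypothetical helper names, and whether separate `text` entries actually share context with each other is also something to verify.

```typescript
// Hypothetical sketch; endpoint and payload shape are assumptions to verify against DeepL's docs.
interface DeepLRequest {
    url: string;
    method: 'POST';
    headers: Record<string, string>;
    body: string;
}

// Build one request carrying several subtitle lines at once.
function buildDeepLRequest(apiKey: string, targetLang: string, lines: string[]): DeepLRequest {
    return {
        url: 'https://api-free.deepl.com/v2/translate', // free-plan host (assumed)
        method: 'POST',
        headers: {
            'Authorization': `DeepL-Auth-Key ${apiKey}`,
            'Content-Type': 'application/json',
        },
        body: JSON.stringify({ text: lines, target_lang: targetLang }),
    };
}

// Extract the translated lines and check that none were lost.
function parseDeepLResponse(json: unknown, expectedCount: number): string[] {
    const translations = (json as { translations?: { text: string }[] } | null)?.translations;
    if (!Array.isArray(translations) || translations.length !== expectedCount) {
        throw new Error('Unexpected response: translation count does not match the request');
    }
    return translations.map(translation => translation.text);
}

// Possible usage with fetch (not executed here):
// const request = buildDeepLRequest(apiKey, 'FR', lines);
// const response = await fetch(request.url, request);
// const translated = parseDeepLResponse(await response.json(), lines.length);
```

Keeping the request builder and the response parser free of network calls makes them easy to test, and the Google-based path could stay in place as the fallback when no API key is configured.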

Links

@mrnossiom
Author

Hey @pepri,
Could you please answer my question above?
I know it's more of a side project but I would really like to improve it.
Thanks.

4 participants