when detecting html, allow for xhtml #303

Open: wants to merge 1 commit into base: master

jeremybmerrill

Some HTML emails have additional attributes (such as XML namespaces) in the <html> tag, so the check "<html>" in body_content evaluates to false, e.g. <html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head>

msg_parser failed to parse this email correctly: it output an .eml file marked as content-type text/plain even though the body contained HTML tags.

This one-byte proposed change causes msg_parser to behave correctly, outputting an HTML-based .eml file that's parsed appropriately by other libraries.
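
To illustrate why the exact-match check misses such emails, here is a minimal sketch. It assumes the detection is a plain substring test on the body, as described above; the function names and the shortened sample tag are hypothetical and are not msg_parser's actual code.

```python
# Hypothetical sample: an opening tag carrying XML namespace attributes,
# shortened from the example in this description.
XHTML_OPEN_TAG = (
    '<html xmlns:v="urn:schemas-microsoft-com:vml" '
    'xmlns="http://www.w3.org/TR/REC-html40"><head>'
)


def is_html_strict(body_content: str) -> bool:
    """Exact-match check: only matches a bare <html> tag with no attributes."""
    return "<html>" in body_content


def is_html_lenient(body_content: str) -> bool:
    """Prefix check: also matches <html ...> tags that carry attributes."""
    return "<html" in body_content


if __name__ == "__main__":
    print(is_html_strict(XHTML_OPEN_TAG))   # False: the tag has attributes
    print(is_html_lenient(XHTML_OPEN_TAG))  # True
```

Dropping the closing ">" from the search string is the smallest change that accepts both forms. A prefix test is deliberately loose, though: it would also match any other text that begins with "<html".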

Unfortunately, I cannot include the file that caused this error because it is confidential, and I was unable to find or generate a non-confidential file that exhibits the same properties. If you have any tips on how to produce one using only Mac Outlook (or maybe Outlook Web Access), I'd be happy to try.

because some html emails have additional attrs (like an xml namespace) and so look like
<html xmlns=whatever