e-mail messages with application/rtf body are imported as attachments, not message body #3897
Comments
This (previously WIP, now abandoned) mentions the same problems: alephdata/ingest-file#20

Where is the MIME detection done? I think it could work to fix the output of readpst, by adding transport headers or otherwise, and to fix the RTF parts of the e-mails, too. As I already walk over all e-mails to fix the RTF parts, adding the headers required for MIME detection (message/rfc822 instead of text/html) could work, too.
Messages can pretty easily be "tricked" into being message/rfc822, by simply adding |
In order to fix messages that have an RTF-only message body, I'm manually starting a Python script:
It's a hack, but it works and it really helps the search process. This could be run right after readpst, but I really don't think it is production quality. Anyway, maybe it helps someone make a proper fix.
OK, here's more analysis and an awful corner case. I'm documenting it here because I don't think there's a better place.

I unpacked a pst file with the regular

Then the awful part of the finding is that my attachments begin with
That shouldn't happen, the readpst man page says that for |
Looking briefly, the From quoting problem is in
While importing an e-mail archive in the (IMHO cursed) .PST format, I came across a mailbox in which all messages have `application/rtf` as the body type. Yep, that's right: `Content-Disposition: attachment`, but still this is the actual e-mail body.

Now in Aleph, these messages show up as empty, with an `rtf-body.rtf` document as attachment.

I tried to work around it by unpacking the mail archive manually with readpst, then fixing the messages with a small Python script (essentially replacing the `rtf` part with an `html` part). I used Python's `email.parser` and simply checked whether the first `content_type` was `application/rtf`; if so, I piped that part through `unrtf` and repacked the message. Filthy, but it worked for the mailbox itself.

This workaround does not help in Aleph, because the MIME detection wizardry afterwards recognizes `text/html` as the MIME type instead of message/rfc822, and the actual attachments of the message are no longer recognized.

The latter may count as a separate bug: a message that starts with the following should IMHO not be detected as `text/html`?
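For anyone who wants to try the workaround described in this issue, here is a rough sketch of what such a script could look like. It is not the script used above, just a minimal reconstruction of the steps mentioned (parse the message, check whether the first part is `application/rtf`, convert it with `unrtf`, repack). The function names `rtf_to_html` and `fix_rtf_body` are made up for illustration, and it assumes `unrtf` is on the PATH and reads RTF from stdin when no file is given. It does not address the later MIME detection problem.

```python
import subprocess
from email import message_from_bytes, policy


def rtf_to_html(rtf: bytes) -> str:
    """Convert RTF to HTML with the external `unrtf` tool.

    Assumption: `unrtf` is installed and reads from stdin when no file is given.
    """
    result = subprocess.run(
        ["unrtf", "--html"], input=rtf, capture_output=True, check=True
    )
    return result.stdout.decode("utf-8", errors="replace")


def fix_rtf_body(raw: bytes, convert=rtf_to_html) -> bytes:
    """Return the message with an RTF body part replaced by an HTML part.

    Messages whose first leaf part is not application/rtf are returned unchanged.
    """
    msg = message_from_bytes(raw, policy=policy.default)

    # The first non-multipart part is treated as the body; in the affected
    # mailbox this is the "rtf-body.rtf" part, even though it is marked as
    # an attachment.
    first = next(part for part in msg.walk() if not part.is_multipart())
    if first.get_content_type() != "application/rtf":
        return raw

    html = convert(first.get_payload(decode=True))

    # set_content() drops the old Content-* headers (including
    # Content-Disposition: attachment) and installs a text/html payload.
    first.set_content(html, subtype="html")
    return msg.as_bytes()
```

The converter is passed in as a parameter so the repacking logic can be tried without `unrtf` installed, e.g. `fix_rtf_body(raw, convert=lambda rtf: "<p>placeholder</p>")`. Running it over the files produced by readpst would be a simple loop that reads each message as bytes and writes the result back.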