remove duplicate in content type from bogus raw emails #1

Open
wants to merge 1 commit into
base: master
9 changes: 9 additions & 0 deletions email/constructor_test.go
@@ -545,3 +545,12 @@ func bytesOrPanic(b []byte, err error) []byte {
 	}
 	return b
 }
+
+func Test_RemoveMimeDuplicate(t *testing.T) {
+	contentType := "image/png; x-unix-mode=0644; name=image001.png; name=image001.png"
+
+	contentType = removeDuplicate(contentType)
+	if contentType != "image/png; x-unix-mode=0644; name=image001.png;" {
+		t.Fatal("Remove duplicate return an invalid value", contentType)
+	}
+}
30 changes: 29 additions & 1 deletion email/parser.go
@@ -11,6 +11,7 @@ import (
 	"fmt"
 	"io"
 	"io/ioutil"
+	"log"
 	"mime"
 	"mime/multipart"
 	"mime/quotedprintable"
@@ -26,6 +27,7 @@ import (
 func ParseMessage(r io.Reader) (*Message, error) {
 	msg, err := mail.ReadMessage(&leftTrimReader{r: bufioReader(r)})
 	if err != nil {
+		log.Println("ReadMessage")
 		return nil, err
 	}
 	// decode any Q-encoded values
@@ -58,7 +60,16 @@ func parseMessageWithHeader(headers Header, bodyReader io.Reader) (*Message, err
 	if contentType := headers.Get("Content-Type"); len(contentType) > 0 {
 		mediaType, mediaTypeParams, err = mime.ParseMediaType(contentType)
 		if err != nil {
-			return nil, err
+			// handle duplicate property bogus content type
+			// and try to remove the duplicate to prevent parser error
+			if strings.Index(err.Error(), "duplicate") > -1 {
+				contentType = removeDuplicate(contentType)
+				mediaType, mediaTypeParams, err = mime.ParseMediaType(contentType)
+			}
+
+			if err != nil {
+				return nil, err
+			}
 		}
 	} // Lack of contentType is not a problem
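
The fallback in this hunk (parse, and retry once on a cleaned header only when the error mentions a duplicate) can be sketched as a standalone helper. This is a minimal illustration, not code from the PR: `parseMediaTypeLenient` and its `cleanup` parameter are hypothetical names, and matching on the error text assumes the `mime` package does not export a dedicated error value for duplicate parameters.

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// parseMediaTypeLenient is a hypothetical helper (not part of the PR).
// It parses contentType and, only when the failure mentions a duplicate
// parameter, retries once on the output of cleanup.
func parseMediaTypeLenient(contentType string, cleanup func(string) string) (string, map[string]string, error) {
	mediaType, params, err := mime.ParseMediaType(contentType)
	if err != nil && strings.Contains(err.Error(), "duplicate") {
		// Matching on the message text is fragile; it is used here on the
		// assumption that there is no exported error to compare against.
		return mime.ParseMediaType(cleanup(contentType))
	}
	return mediaType, params, err
}

func main() {
	// Stand-in cleanup for the demo: keep only the media type itself.
	dropParams := func(s string) string {
		return strings.TrimSpace(strings.SplitN(s, ";", 2)[0])
	}
	mediaType, params, err := parseMediaTypeLenient("image/png; name=a.png; name=b.png", dropParams)
	fmt.Println(mediaType, params, err)
}
```

Passing the cleanup step in as a function keeps the retry logic testable on its own; any other parse error is returned unchanged, as in the hunk above.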

@@ -93,6 +104,23 @@ func parseMessageWithHeader(headers Header, bodyReader io.Reader) (*Message, err
 	}, nil
 }

+// removeDuplicate removes duplicate from bogus content type preventing parsing of email
+func removeDuplicate(s string) string {
+	m := make(map[string]bool)
+	parts := strings.Split(s, ";")
Owner:

I need to look up the Content-Type rules again, but isn't the very first entry a special case?
Go map iteration order is randomized, so the image/png entry might end up at the back of the list.

Author:

That's a good point. I'm reading this in RFC 2046:

    In general, the top-level media type is used to declare the general
    type of data, while the subtype specifies a specific format for that
    type of data.

I'm not sure whether "in general" means "required", though. I could save the first attribute, remove the duplicates from the rest, and then return the first attribute plus the cleaned-up ones?

Owner:

That sounds good, I think

Owner:

Or just keep them all in order with a slice beside the map...

+	for _, p := range parts {
+		if _, ok := m[p]; !ok {
+			m[p] = true
+		}
+	}
+
+	var cleaned string
+	for k := range m {
+		cleaned += k + ";"
Owner:

Do we need a space after the ;?
Also, you can probably just return strings.Join(m, "; ") as that is both cleaner and faster

Owner:

Never mind, I got ahead of myself: it is a map, not a slice/array. I guess you could use a bytes.Buffer, though it won't really matter for a small number of entries.

+	}
+	return cleaned
+}
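
The review suggestion to keep the entries in order with a slice beside the map, and then join them with `strings.Join`, could look roughly like the sketch below. It is an illustration under stated assumptions rather than the PR's final code: `dedupeParams` is a hypothetical name, trimming each entry and joining with `"; "` (no trailing semicolon) is a choice made here, and splitting on `;` would mishandle a quoted parameter value that itself contains a semicolon.

```go
package main

import (
	"fmt"
	"strings"
)

// dedupeParams is a hypothetical, order-preserving variant of removeDuplicate.
// The map only records which entries were already seen; the slice keeps the
// original order, so the media type always stays first.
func dedupeParams(contentType string) string {
	parts := strings.Split(contentType, ";")
	seen := make(map[string]bool, len(parts))
	kept := make([]string, 0, len(parts))
	for _, p := range parts {
		p = strings.TrimSpace(p)
		if p == "" || seen[p] {
			continue // skip empty entries and exact repeats
		}
		seen[p] = true
		kept = append(kept, p)
	}
	return strings.Join(kept, "; ")
}

func main() {
	fmt.Println(dedupeParams("image/png; x-unix-mode=0644; name=image001.png; name=image001.png"))
	// prints "image/png; x-unix-mode=0644; name=image001.png"
}
```

Only exact repeats are dropped, so a header that repeats a parameter name with two different values would still fail to parse after this cleanup.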

// readParts parses out the parts of a multipart body, including the preamble and epilogue.
func readParts(bodyReader io.Reader, boundary string) ([]*Message, error) {
