Handle <unclosed </tags [rt.cpan.org #47748] #1

Open
oalders opened this issue Aug 24, 2020 · 0 comments

oalders commented Aug 24, 2020

Migrated from rt.cpan.org#47748 (status was 'new')

Requestors:

From [email protected] on 2009-07-09 17:02:41:

The other day, I received a spam e-mail with a text/html body part like
this:

==============================================================
blah blah<br><br
<a href=http://domain/path.html target=_blank>Go!</a><br><p>blah
==============================================================

My spam filter failed to parse the href URL from the message body due to
the unclosed "<br" tag.  Closing it causes HTML::Parser to correctly
parse the URL.
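
As a stopgap until the parser itself copes with such input, the message body can be repaired before parsing. The following is only a rough sketch in Python, not anything HTML::Parser provides: the names `UNCLOSED_TAG`, `close_unclosed_tags`, `HrefCollector` and `extract_hrefs` are made up for illustration, and Python's standard `html.parser` merely stands in for the real parser. It assumes an unclosed tag is a bare tag name followed by optional whitespace and then another "<", so a tag that carries attributes before the stray "<" is not repaired.

```python
import re
from html.parser import HTMLParser

# Assumption: an unclosed tag is a tag name (optionally preceded by "/")
# followed only by optional whitespace and then the "<" of the next tag.
UNCLOSED_TAG = re.compile(r"<(/?[A-Za-z][A-Za-z0-9]*)(\s*)(?=<)")


def close_unclosed_tags(html):
    """Insert the missing ">" after tags such as "<br" or "</b"."""
    return UNCLOSED_TAG.sub(r"<\1>\2", html)


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(v for k, v in attrs if k == "href" and v)


def extract_hrefs(html):
    """Repair unclosed tags first, then parse and return all link targets."""
    collector = HrefCollector()
    collector.feed(close_unclosed_tags(html))
    collector.close()
    return collector.hrefs
```

Applied to the snippet above, `close_unclosed_tags` turns `<br` followed by a newline and `<a href=...>` into `<br>` followed by the same newline and link, leaving the rest untouched. The same regular expression should port to Perl more or less unchanged as a pre-processing step in front of HTML::Parser.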

I noticed that http://search.cpan.org/dist/HTML-Parser/Parser.pm#BUGS says:

«Unclosed start or end tags, e.g. "<tt<b>...</b</tt>" are not recognized.»

I don't understand what the implication of this is, however.  Is it a
conscious decision not to support unclosed tags, or has there just been
no use case for a fix?

I tested how various browsers handle the HTML code from the spam message
above:

At least the following do render the link despite the preceding broken
"<br" tag:  Firefox 3, Konqueror from KDE 3.5.9, Safari 3 & 4, Mail.app

At least the following do NOT render the link:  IE 6, Opera 9.63

I'd appreciate it if an option could be added to HTML::Parser to
recognize unclosed tags.
