
When generating html, ronn-ng may convert &lt;ws-name&gt; to <ws-name> </ws-name> #44

Open
spacewander opened this issue Apr 10, 2020 · 3 comments
Labels
bug Something isn't working


@spacewander
Contributor

When I try to convert a Markdown file to HTML, the following text

    git bulk [-g] ([-a]|[-w &lt;ws-name&gt;]) &lt;git command&gt; <br/>
    git bulk --addworkspace &lt;ws-name&gt; &lt;ws-root-directory&gt; (--from &lt;URL or file&gt;) <br/>
    git bulk --removeworkspace &lt;ws-name&gt; <br/>

is incorrectly converted to:

    <p>git bulk [-g] ([-a]|[-w <ws-name>]) <git command> </git></ws-name><br>
    git bulk --addworkspace <ws-name> <ws-root-directory> (--from <url or file>) </url></ws-root-directory></ws-name><br>
    git bulk --removeworkspace &lt;ws-name&gt; <br>

The markdown file I use is from https://github.com/tj/git-extras/blob/master/man/git-bulk.md.
The command I run is:

    ronn -r \
        --manual "Git Extras" \
        --pipe \
        man/git-bulk.md > man/git-bulk.html
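A quick way to check generated HTML for this symptom is to look for closing tags that are unlikely to be legitimate in a ronn page, such as `</ws-name>` or `</git>`. Below is a minimal sketch using only the Ruby standard library; the helper name `spurious_closing_tags` and the allow-list of tags are illustrative assumptions, not part of ronn-ng.

```ruby
# Illustrative allow-list (an assumption): tags a ronn HTML page might
# reasonably contain. Any other tag that gets *closed* is suspicious.
ALLOWED_TAGS = %w[
  html head body div span p br pre code em strong b i a
  ul ol li dl dt dd h1 h2 h3 table tr td th
].freeze

# Hypothetical helper: returns the (lowercased, de-duplicated) names of
# closing tags in `html` that are not on the allow-list. Escaped
# placeholders such as &lt;ws-name&gt; that were wrongly turned into
# elements show up here.
def spurious_closing_tags(html, allowed = ALLOWED_TAGS)
  html.scan(%r{</([A-Za-z][\w-]*)\s*>})
      .flatten
      .map(&:downcase)
      .uniq - allowed
end

broken = "<p>git bulk [-g] ([-a]|[-w <ws-name>]) <git command> </git></ws-name><br>"
fixed  = "<p>git bulk [-g] ([-a]|[-w &lt;ws-name&gt;]) &lt;git command&gt; <br></p>"

p spurious_closing_tags(broken) # => ["git", "ws-name"]
p spurious_closing_tags(fixed)  # => []
```

Pointed at the real output, e.g. `spurious_closing_tags(File.read("man/git-bulk.html"))`, a non-empty result flags the problem. The allow-list is deliberately crude; a stricter check could compare against the tags actually present in the Markdown source.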
@apjanke
Owner

apjanke commented Apr 11, 2020

Looks like the &lt;/&gt; entities are being converted to angle brackets and processed as HTML tags, when they shouldn't be. Definitely a ronn bug. Will look into it.
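To make the failure mode concrete: the source already contains the escaped form `&lt;ws-name&gt;`. Decoding it once is harmless as long as whatever writes it back into HTML escapes it again; if the decoded string is emitted verbatim, an HTML parser reads `<ws-name>` as a start tag, which matches the stray `</ws-name>` in the reported output. A minimal sketch of the correct round trip using Ruby's standard `cgi` library (an illustration only, not Ronn's code):

```ruby
require "cgi"

# The text as it appears in the Markdown source: already entity-escaped.
source = "git bulk --removeworkspace &lt;ws-name&gt;"

# Decoding yields literal angle brackets. Writing this string into an
# HTML page unchanged is what turns <ws-name> into a bogus element.
decoded = CGI.unescapeHTML(source)
puts decoded # git bulk --removeworkspace <ws-name>

# Re-escaping before output restores the original entities.
safe = CGI.escapeHTML(decoded)
puts safe         # git bulk --removeworkspace &lt;ws-name&gt;
puts safe == source # true
```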

@apjanke
Owner

apjanke commented Sep 12, 2020

Having a hard time pinning down where and why this is happening, but it's sometime after the Kramdown processing, so it may be in Ronn's use of Nokogiri, possibly related to sparklemotion/nokogiri#551.
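One way to narrow down "sometime after the Kramdown processing" is to push the same input through each stage in turn and report the first stage whose output no longer contains the escaped marker. The sketch below is generic Ruby; the helper `first_stage_losing`, the stage names, and the lambdas are placeholders (assumptions) that in practice would wrap the real Kramdown and Nokogiri steps.

```ruby
# Hypothetical bisecting helper: `stages` is a list of [name, callable]
# pairs, each taking and returning an HTML string. Returns the name of the
# first stage after which `marker` is missing, or nil if it survives.
def first_stage_losing(marker, input, stages)
  current = input
  stages.each do |name, stage|
    current = stage.call(current)
    return name unless current.include?(marker)
  end
  nil
end

marker = "&lt;ws-name&gt;"

# Placeholder stages standing in for the real pipeline.
stages = [
  ["markdown to html", ->(s) { s }],                                      # keeps entities
  ["html post-process", ->(s) { s.gsub("&lt;", "<").gsub("&gt;", ">") }], # decodes them
  ["final output", ->(s) { s }]
]

p first_stage_losing(marker, "-w &lt;ws-name&gt;", stages) # => "html post-process"
```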

@apjanke
Owner

apjanke commented Sep 12, 2020

Yep, it's Nokogiri doing it. If I add in some debug code:

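    def process_html!
      wrapped_html = "<html>\n  <body>\n#{input_html}\n  </body>\n</html>"
      # Debug: print the HTML string exactly as it is handed to Nokogiri
      puts "Wrapped HTML: #{wrapped_html}\n\n\n"
      @html = Nokogiri::HTML.parse(wrapped_html)
      # Debug: print the parsed document tree for comparison
      puts "Parsed Nokogiri data: #{@html.inspect}\n"
      # ... rest of the existing method unchanged ...
    end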

I get this:

$ bundle exec ronn entity_encoding_test.ronn
     roff: ./entity_encoding_test.1
Wrapped HTML: <html>
  <body>

<p>Your output &lt;i&gt;might&lt;/i&gt; look like this:</p>

<pre><code>* Chris
*
* &amp;lt;b&amp;gt;GitHub&amp;lt;/b&amp;gt;
* &lt;b&gt;GitHub&lt;/b&gt;
</code></pre>

<p>Here&#39;s some special entities:</p>

<ul>
  <li>&amp;bull;       &bull;</li>
  <li>&amp;nbsp;       &nbsp;</li>
  <li>&amp;copy;       &copy;</li>
  <li>&amp;rdquo;      &rdquo;</li>
  <li>&amp;mdash;      &mdash;</li>
  <li>&amp;reg;        &reg;</li>
  <li>&amp;sect;       &sect;</li>
  <li>&amp;ge;         &ge;</li>
  <li>&amp;le;         &le;</li>
  <li>&amp;ne;         &ne;</li>
  <li>&amp;equiv;      &equiv;</li>
</ul>

<p>Here&#39;s a line that uses non-breaking spaces to force the
last&nbsp;few&nbsp;words&nbsp;to&nbsp;wrap&nbsp;together.</p>

<p>And stuff like this:</p>

<p>git bulk [-g] ([-a]|[-w &lt;ws-name&gt;]) &lt;git command&gt; <br />
git bulk --addworkspace &lt;ws-name&gt; &lt;ws-root-directory&gt; (--from &lt;URL or file&gt;) <br />
git bulk --removeworkspace &lt;ws-name&gt; <br /></p>

<p>Should have the <code>&amp;lt;</code>/<code>&amp;gt;</code> entities stay as <code>&amp;lt;</code>/<code>&amp;gt;</code> in HTML, but be
turned into literal brackets in the ROFF.</p>


  </body>
</html>


Parsed Nokogiri data: #<Nokogiri::HTML::Document:0x1040 name="document" children=[#<Nokogiri::XML::DTD:0x9ec name="html">, #<Nokogiri::XML::Element:0x102c name="html" children=[#<Nokogiri::XML::Text:0xa00 "\n  ">, #<Nokogiri::XML::Element:0x1004 name="body" children=[#<Nokogiri::XML::Text:0xa14 "\n\n">, #<Nokogiri::XML::Element:0xa3c name="p" children=[#<Nokogiri::XML::Text:0xa28 "Your output <i>might</i> look like this:">]>, #<Nokogiri::XML::Text:0xa50 "\n\n">, #<Nokogiri::XML::Element:0xa8c name="pre" children=[#<Nokogiri::XML::Element:0xa78 name="code" children=[#<Nokogiri::XML::Text:0xa64 "* Chris\n*\n* &lt;b&gt;GitHub&lt;/b&gt;\n* <b>GitHub</b>\n">]>]>, #<Nokogiri::XML::Text:0xaa0 "\n\n">, #<Nokogiri::XML::Element:0xac8 name="p" children=[#<Nokogiri::XML::Text:0xab4 "Here's some special entities:">]>, #<Nokogiri::XML::Text:0xadc "\n\n">, #<Nokogiri::XML::Element:0xd98 name="ul" children=[#<Nokogiri::XML::Text:0xaf0 "\n  ">, #<Nokogiri::XML::Element:0xb18 name="li" children=[#<Nokogiri::XML::Text:0xb04 "&bull;       •">]>, #<Nokogiri::XML::Text:0xb2c "\n  ">, #<Nokogiri::XML::Element:0xb54 name="li" children=[#<Nokogiri::XML::Text:0xb40 "&nbsp;        ">]>, #<Nokogiri::XML::Text:0xb68 "\n  ">, #<Nokogiri::XML::Element:0xb90 name="li" children=[#<Nokogiri::XML::Text:0xb7c "&copy;       ©">]>, #<Nokogiri::XML::Text:0xba4 "\n  ">, #<Nokogiri::XML::Element:0xbcc name="li" children=[#<Nokogiri::XML::Text:0xbb8 "&rdquo;      ”">]>, #<Nokogiri::XML::Text:0xbe0 "\n  ">, #<Nokogiri::XML::Element:0xc08 name="li" children=[#<Nokogiri::XML::Text:0xbf4 "&mdash;      —">]>, #<Nokogiri::XML::Text:0xc1c "\n  ">, #<Nokogiri::XML::Element:0xc44 name="li" children=[#<Nokogiri::XML::Text:0xc30 "&reg;        ®">]>, #<Nokogiri::XML::Text:0xc58 "\n  ">, #<Nokogiri::XML::Element:0xc80 name="li" children=[#<Nokogiri::XML::Text:0xc6c "&sect;       §">]>, #<Nokogiri::XML::Text:0xc94 "\n  ">, #<Nokogiri::XML::Element:0xcbc name="li" children=[#<Nokogiri::XML::Text:0xca8 "&ge;         
≥">]>, #<Nokogiri::XML::Text:0xcd0 "\n  ">, #<Nokogiri::XML::Element:0xcf8 name="li" children=[#<Nokogiri::XML::Text:0xce4 "&le;         ≤">]>, #<Nokogiri::XML::Text:0xd0c "\n  ">, #<Nokogiri::XML::Element:0xd34 name="li" children=[#<Nokogiri::XML::Text:0xd20 "&ne;         ≠">]>, #<Nokogiri::XML::Text:0xd48 "\n  ">, #<Nokogiri::XML::Element:0xd70 name="li" children=[#<Nokogiri::XML::Text:0xd5c "&equiv;      ≡">]>, #<Nokogiri::XML::Text:0xd84 "\n">]>, #<Nokogiri::XML::Text:0xdac "\n\n">, #<Nokogiri::XML::Element:0xdd4 name="p" children=[#<Nokogiri::XML::Text:0xdc0 "Here's a line that uses non-breaking spaces to force the\nlast few words to wrap together.">]>, #<Nokogiri::XML::Text:0xde8 "\n\n">, #<Nokogiri::XML::Element:0xe10 name="p" children=[#<Nokogiri::XML::Text:0xdfc "And stuff like this:">]>, #<Nokogiri::XML::Text:0xe24 "\n\n">, #<Nokogiri::XML::Element:0xeb0 name="p" children=[#<Nokogiri::XML::Text:0xe38 "git bulk [-g] ([-a]|[-w <ws-name>]) <git command> ">, #<Nokogiri::XML::Element:0xe4c name="br">, #<Nokogiri::XML::Text:0xe60 "\ngit bulk --addworkspace <ws-name> <ws-root-directory> (--from <URL or file>) ">, #<Nokogiri::XML::Element:0xe74 name="br">, #<Nokogiri::XML::Text:0xe88 "\ngit bulk --removeworkspace <ws-name> ">, #<Nokogiri::XML::Element:0xe9c name="br">]>, #<Nokogiri::XML::Text:0xec4 "\n\n">, #<Nokogiri::XML::Element:0xfdc name="p" children=[#<Nokogiri::XML::Text:0xed8 "Should have the ">, #<Nokogiri::XML::Element:0xf00 name="code" children=[#<Nokogiri::XML::Text:0xeec "&lt;">]>, #<Nokogiri::XML::Text:0xf14 "/">, #<Nokogiri::XML::Element:0xf3c name="code" children=[#<Nokogiri::XML::Text:0xf28 "&gt;">]>, #<Nokogiri::XML::Text:0xf50 " entities stay as ">, #<Nokogiri::XML::Element:0xf78 name="code" children=[#<Nokogiri::XML::Text:0xf64 "&lt;">]>, #<Nokogiri::XML::Text:0xf8c "/">, #<Nokogiri::XML::Element:0xfb4 name="code" children=[#<Nokogiri::XML::Text:0xfa0 "&gt;">]>, #<Nokogiri::XML::Text:0xfc8 " in HTML, but be\nturned into literal brackets in the 
ROFF.">]>, #<Nokogiri::XML::Text:0xff0 "\n\n\n  ">]>, #<Nokogiri::XML::Text:0x1018 "\n">]>]>

You can see that the &lt;/&gt; entities are still entities in the input HTML I'm passing in, but get transformed into literal angle brackets in the parsed Nokogiri objects. Hmmmm.
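When reading this dump, note that DOM-style parsers generally store a text node's decoded characters and re-escape them only when the node is serialized back to markup. The sketch below shows that behavior with Ruby's bundled REXML parser, used here only because it ships with Ruby; it is not what Ronn uses. If Nokogiri behaves the same way, literal brackets in the parsed objects would be expected, and the escaping would be lost wherever that text is later written back out without being escaped.

```ruby
require "rexml/document"

# Parse a small, well-formed snippet that contains escaped angle brackets.
doc  = REXML::Document.new("<p>-w &lt;ws-name&gt;</p>")
para = doc.root

# The text node's value holds decoded characters...
puts para.text # -w <ws-name>

# ...while serializing the element through the library escapes them again,
# so `<` comes back out as `&lt;` rather than starting a tag.
puts para.to_s
```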

Projects
Status: High priority
Development

No branches or pull requests

2 participants