Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Utility.trim collapse whitespace for adjacent text nodes #73 #113

Merged
merged 3 commits into from
May 24, 2018

Conversation

EdgeCaseBerg
Copy link
Contributor

This fixes #73. I've got a few unit tests that are passing that verify the text is trimmed.

I have a couple questions regarding the new combineAdjacentTextNodes method though:

  1. I have it as private right now since it's internal to trimming, but should it be a general utility method? If so, is the method name ok?
  2. In the foldLeft within that function, I'm accumulating with an immutable Seq. If we're going over a lot of nodes that seems like a lost of wasted time copying over arrays, would it be alright to use a mutable Seq here for performance, or is there some other buffer I should use?

@EdgeCaseBerg
Copy link
Contributor Author

@biswanaths ?

@biswanaths
Copy link
Contributor

@EdgeCaseBerg Thanks for the PR. Let me go through the request.

Good Day.

@biswanaths
Copy link
Contributor

biswanaths commented Oct 11, 2016

@Test
  def issue73HasANodeInTextNodes : Unit = {
    val x = <div>{Text("   My name ")}{Text("  is  ")}<b> not </b>{Text("  Harry   ")}</div>
    assertEquals(<div>My name is<b>not</b>Harry</div>, Utility.trim(x)) 
  } 

What I did over here is inserted a node inside text nodes. Do we have to look into cases like this ?

@EdgeCaseBerg
Copy link
Contributor Author

EdgeCaseBerg commented Oct 11, 2016

That seems very much related to #77 (whitespace trimming = aggressive). If I look at the original ticket for this issue I see the comment

I think the semantics that were intended for trim are "collapse the whitespace". That means you can't traverse the tree node by node making decisions about one at a time. You have to examine sequences of text nodes.

I was only thinking about adjacent text nodes with this because it seems like something that can be solved. When there's another tag in their midst, with just another text node inside of it, I almost want to say that it too should be treated like a text node (so in this case, keep both sides of WS). However, I think that's also letting the trimmer try to read the semantics of the XML, which isn't something a general utility should do in my opinion.

In your example <b> is like a bold tag, and so we think to ourselves: Ah, of course we should preserve the space here, but is it the job of the trimmer to be reading into the semantics of what the tags mean just because they happen to have space in them?

I think we could do a look ahead, by examining the first child of the next non-text node, but then what about in cases like:

<div>{Text("   My name ")}{Text("  is  ")}<b> not </b><i> and I mean it, </i>{Text("  Harry   ")}</div>

or

<div>{Text("   My name ")}{Text("  is  ")}<b> not <i> and I mean it, </i></b>{Text("  Harry   ")}</div>

If we were to treat Text elements and elements that have a text child the same, it might be able to handle these cases and it might even work. I could probably write an extractor to do this so we could do

case x @ isTextOrNodeWithText => 

And we'd be able to handle it. But I still think that's trying to read meaning during trimming, which will result in cases like this being messed up:

<div>        Hi there Harry!        </div><div>       How do you do Sally?       </div>

Sorry about the stream of consciousness, just thinking out loud. So in regards to

Do we have to look into cases like this ?

I think my answer is that we shouldn't because that's trying to guess at the intent of the XML itself and not just trying to collapse text nodes and their whitespace together.

@SethTisue
Copy link
Member

needs a rebase now. other than that, where do we stand here...? is there any consensus on whether changes are needed before merging?

@EdgeCaseBerg
Copy link
Contributor Author

EdgeCaseBerg commented Feb 7, 2017

@SethTisue
Just saw this. I think I'll have time to rebase this later tonight sorry, unexpected busyness, I'll do my best tonight or tomorrow for sure!

@EdgeCaseBerg
Copy link
Contributor Author

ran git rebase master after pulling the latest master, then pushed up to the branch on my fork

@EdgeCaseBerg
Copy link
Contributor Author

EdgeCaseBerg commented Nov 6, 2017

I don't know why github's interface had to run the merge conflict commit twice (3b86f51 and b34f5ae) but that's fixed again though I haven't had the chance to compile things locally since I just noticed the merge conflict randomly today. So whenever a maintainer wants to merge this they can. aw man, I broke the compilation with the github UI change 😢 When I'm not at work later today I'll fix that

@EdgeCaseBerg
Copy link
Contributor Author

K, rebased again to resolve that merge conflict that popped up. @SethTisue / @ashawley do you know if this might make it into a release soon?

@ashawley
Copy link
Member

ashawley commented Nov 7, 2017

Hey @EdgeCaseBerg! Thanks for rebasing and getting the test suite to work. I'm leaning towards including your patch with version 1.1.1, which is the milestone version after the next planned release. This is for no particular reason than wanting to study the consequences further. I'll be in touch soon after the release is made.

@ashawley ashawley added this to the 1.1.1 milestone Nov 13, 2017
@SethTisue
Copy link
Member

@EdgeCaseBerg rebase against current master...?

@EdgeCaseBerg
Copy link
Contributor Author

@SethTisue sorry about the wait, github's "bell" notifications never catch my eye and my email is too full of work emails in my filtered github folder to catch everything. I'm rebasing now and will push things up when I'm done sorting things out

@EdgeCaseBerg
Copy link
Contributor Author

@SethTisue / @ashawley I've rebased again and all the tests pass once more.

.travis.yml Outdated
@@ -22,6 +22,7 @@ addons:
jdk:
- openjdk6
- oraclejdk8
- oraclejdk9

notifications:
email: [email protected]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Ethan,

Something went awry with your rebase: 55 commits, 100 files changed.

I don't have your branch locally, but according to the most recent Travis build last fall the commit id of your branch's head was 6578a8e back then. Presumably, you could try doing the rebase again with:

git checkout -B 'utility-collapse-ws-#73' 6578a8e
git rebase master

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll give that a try. I'm on a new computer so maybe something went wrong from that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've checked out to the commit you mentioned and rebased, then force pushed the branch on my repository to get that up there. I'm not seeing any of the tests I added failed though locally I do see

[error] Failed tests:
[error]         scala.xml.CompilerErrors
[error]         scala.xml.PrintEmptyElementsTest
[error]         scala.xml.XMLTestJVM
[error]         scala.xml.XMLTest

Though I don't really know anything about those tests or why they would be failing on my machine... I'll wait for travis to confirm though

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks much better. Thanks for rebasing.

I'll wait for travis to confirm though

Travis seems happy.

acc.dropRight(1) :+ Text(l + r)
}
case _ => acc :+ n
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Ethan,

This pattern match is kind of hairy.

Couldn't you drop the lastOption business, use foldRight, and have it just be the following?

      case (Text(l), Text(r) +: tt) => Text(l + r) +: tt
      case (t, tt) => t +: tt

Will it have the same result? Does that improve comprehensibility, as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, foldRight I always use foldLeft so it never comes to mind. That would work,

import scala.xml._
val node = <A><One>1</One>{Text("Hello ")}{Text("World")}{Text(" I like pie.")}<Two>2</Two><Three><Nope>no</Nope></Three></A>

node.child.foldRight(Seq.empty[Node]) {
	case (Text(left), Text(right) +: accMinusLast) => Text(left + right) +: accMinusLast
	case (n, acc) => n +: acc 
}

// Seq[scala.xml.Node] = List(<One>1</One>, Hello World I like pie., <Two>2</Two>, <Three><Nope>no</Nope></Three>)

Looks like it would. I wasn't aware you can pattern match the last element in a list but I think that code is more elegant so I can update the PR with that and the tests will inform us if we're getting the same result (answer is probably yes)

@ashawley
Copy link
Member

Sorry it took so long to get back to this for a review. Unfortunately, this library has a long life cycle.

When time passes after I write code, I know I have to get reacquainted and relearn it, so appreciate your effort to make your brain travel back in time.

Remove lastOption and Option matching in favor of a more elegant
foldRight solution proposed by ashawley.
@EdgeCaseBerg
Copy link
Contributor Author

so appreciate your effort to make your brain travel back in time

Time travel successful. I updated the helper method you put a comment on and ran the tests locally (they passed), now just waiting for travis to confirm.

}
children.foldRight(Seq.empty[Node]) {
case (Text(left), Text(right) +: accMinusLast) => Text(left + right) +: accMinusLast
case (n, acc) => n +: acc
}
Copy link
Member

@ashawley ashawley May 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't aware you can pattern match the last element in a list

It is pattern matching the front of the list with foldRight ("fold right starts from the left" is how I remember it). Is it important to start from the end? I presume it's not. You're just trying to merge adjacent Texts I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops sorry, I think I wrote that when I was tired this morning and reading +: as :+ and thinking out loud about

scala> val (last :+ list) = Seq(1,2,3,4)
last: Seq[Int] = List(1, 2, 3)
list: Int = 4

Correct that the order doesn't matter so long as the accumulated list is properly prepended/appended according to the direction it was traversed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok. Yeah, it's a lot for a person to keep in their head... between the colons, plus signs and foldings.

Copy link
Member

@ashawley ashawley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it's best to keep this new internal-only method private as you did.

I'm not especially worried about performance, since it is just an additional iterative pass over the sequence of nodes. Also, this method isn't used generally by the library, so it's even less of a priority.

Thanks for fixing this!

@ashawley ashawley merged commit 390bd11 into scala:master May 24, 2018
@ashawley ashawley mentioned this pull request May 28, 2018
@hosamaly hosamaly mentioned this pull request Jun 25, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

scala.xml.Utility.trim() doesn't properly handle adjacent Text nodes
4 participants