Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adjust indentation when migrating edits or insertions. #41

Open
sageserpent-open opened this issue May 18, 2024 · 16 comments
Open

Adjust indentation when migrating edits or insertions. #41

sageserpent-open opened this issue May 18, 2024 · 16 comments
Labels
enhancement New feature or request

Comments

@sageserpent-open
Copy link
Owner

This is to support Python, F#, Scala 3 etc where indentation is important.

We want migrated code to compile correctly in its new home, so it should indent consistently with the surrounding code.

This may come for free when migrating edits - need a test case to check this.

Insertions need to have their relative indentation wrt their anchors determined, so that they can be aligned with the anchors' destinations.

@sageserpent-open sageserpent-open added the enhancement New feature or request label May 18, 2024
@sageserpent-open
Copy link
Owner Author

This should follow #9.

@sageserpent-open
Copy link
Owner Author

sageserpent-open commented Jul 26, 2024

Something simple...

The base file:

    First line.
    Second line.
        Third line.
        Fourth line.
        Fifth line.
    Sixth line.
    Seventh line.
    Eighth line.

Let's apply a global indentation change on the left:

        First line.
        Second line.
            Third line.
            Fourth line.
            Fifth line.
        Sixth line.
        Seventh line.
        Eighth line.

Let's muck around with the indentation on the right:

    First line.
    Second line.
        Third line.
            Fourth line.
    Fifth line.
    Sixth line.
    Seventh line.
    Eighth line.

So no code motion, and all changes are just on indentation.

I want to see this as a merge:

        First line.
        Second line.
            Third line.
                Fourth line.
        Fifth line.
        Sixth line.
        Seventh line.
        Eighth line.

So the global indentation change on the left applies a a global correction, but the line-by-line indentation changes are also taken into account.

@sageserpent-open
Copy link
Owner Author

sageserpent-open commented Jul 26, 2024

What should happen if the right makes significant changes that also include line-by-line indentation changes?

Say we have on the right:

    First line.
    Second line.
        Geaenderte dritte Zeile.
            Geaenderte vierte Zeile.
    Fifth line.
    Sixth line.
    Seventh line.
    Eighth line.

Obviously, I want to see this as a merge:

        First line.
        Second line.
            Geaenderte dritte Zeile.
                Geaenderte vierte Zeile.
        Fifth line.
        Sixth line.
        Seventh line.
        Eighth line.

Again, the global indentation change on the left applies a global correction.

@sageserpent-open
Copy link
Owner Author

Suppose we add code motion into the mix, first with just whitespace changes...

On the left, let's move a block in addition to a global indentation change:

        First line.
        Second line.
        Sixth line.
        Seventh line.
            Third line.
            Fourth line.
            Fifth line.
        Eighth line.

My intuition leads me to this desired merge:

        First line.
        Second line.
        Sixth line.
        Seventh line.
            Third line.
                Fourth line.
        Fifth line.
        Eighth line.

The line-by-line indentation changes are applied at the move destination.

@sageserpent-open
Copy link
Owner Author

sageserpent-open commented Jul 26, 2024

Finally, let's mix code motion with significant changes that also include line-by-line indentation changes...

The base file:

    First line.
    Second line.
        Third line.
        Fourth line.
        Fifth line.
    Sixth line.
    Seventh line.
    Eighth line.

On the left:

        First line.
        Second line.
        Sixth line.
        Seventh line.
            Third line.
            Fourth line.
            Fifth line.
        Eighth line.

On the right:

    First line.
    Second line.
        Geaenderte dritte Zeile.
            Geaenderte vierte Zeile.
    Fifth line.
    Sixth line.
    Seventh line.
    Eighth line.

I think I want to see this:

        First line.
        Second line.
        Sixth line.
        Seventh line.
            Geaenderte dritte Zeile.
                Geaenderte vierte Zeile.
        Fifth line.
        Eighth line.

@sageserpent-open
Copy link
Owner Author

sageserpent-open commented Jul 26, 2024

Let's go with these four examples and write tests for them. As long as they pass, I don't care what else happens regarding indentation for now, well as long as the changes other than indentation are correctly merged (including whitespace).

Note the subtletly there - the idea is to merge indentation changes, but whitespace changes within the main body of a line of text are merged as usual. If a line is indented differently on the left and right and also has whitespace-only changes within the main body of the line on either side, then once the indentation is sorted out, the existing rules for selecting whitespace changes from either side will come into play.

@sageserpent-open
Copy link
Owner Author

sageserpent-open commented Jul 26, 2024

TODO: what about insertions? Let's defer these until the test examples above pass...

@sageserpent-open
Copy link
Owner Author

TODO: suppose a migrated edit or insertion has ragged boundaries that are somewhere within non-indentation text at either the source or destination of a move?

That is expected to include situations where the section has one or more linebreaks within it, but the boundaries are nonetheless ragged.

@sageserpent-open
Copy link
Owner Author

sageserpent-open commented Aug 5, 2024

An outline of how this might be achieved...

  • Adjacent sections in the base, left and right sources provide indentation deltas that describe how the indentation of a succeeding section differs from its predecessor in a given file.
  • The first section in a file has a standalone indentation delta without a successor.
  • A succeeding section starting with a ragged edge uses what follows its first linebreak to contribute to its indentation delta.
  • A preceding section ending in a ragged edge uses what follows its last linebreak to contribute to its indentation delta.
  • If there is no such contribution from the predecessor, then the indentation delta just uses the succeeding section.
  • If there is no such contribution from the successor, then there is no indentation delta at all.
  • The indentation delta represents the pairing of the two lines from the predecessor and successor, finding a common whitespace prefix after the starting linebreak (or beginning of file), with (possibly empty) trailing indentation whitespace from the predecessor that needs to be deleted and (possibly empty) trailing indentation whitespace from the successor that needs to be inserted. Where there is no predecessor, the indentation whitespace for deletion is empty.
  • So a straight increase in indentation would be modelled as a delta with just an insertion, likewise a straight decrease in indentation would be a deletion only. Conversion of tabs to spaces and vice-versa would result in a combination of deletion and insertion.
  • The idea is to walk through adjacent sections for each file being merged on each side, generating indentation deltas for each pair of sections that are associated with the successor sections. The merge will look at the indentation deltas for the base, left and right sections in a match and apply a simple merge rule of "if only one of the left or right changes the delta, then that is the merged delta" - if both the left and right change the delta and they agree, that's also OK.
  • Having merged the deltas, these can be applied to the merged sections, walking down the merged files. This then aligns each succeeding section with its predecessor, using the merged delta to change the indentation level appropriately.
  • Moved sections / migrated edits / migrated insertions use their associated merged deltas wrt the predecessor section at the destination to pick up the destination indentation context.
  • It may be the case that merging cannot pick a delta due to conflicts.
  • It may be the case that applying a merged delta won't work cleanly, because the whitespace being deleted isn't actually present. If what is being deleted has the actual margin as a suffix, then we go with that. Otherwise we leave the margin as is.

@sageserpent-open
Copy link
Owner Author

NOTE: have to watch out for when a decrease in indentation of the leading line would cause further lines in the section to exceed the available margin.

Would it be better to compute the maximum common indentation of a section and base the indentation deltas off that, rather than between the successor's initial line and the predecessor's final line?

@sageserpent-open
Copy link
Owner Author

sageserpent-open commented Aug 6, 2024

Let's try the first example, doing a thought-experiment using the maximum common indentation delta...

The base file:

    First line.                // Section A, M.C.I: 4 - delta of 4 spaces.
    Second line.          
        Third line.           // Section B, M.C.I: 8 - delta of 4 spaces.
        Fourth line.
        Fifth line.
    Sixth line.               // Section C, M.C.I: 4 - delta of -4 spaces.
    Seventh line.
    Eighth line.

Let's apply a global indentation change on the left:

        First line.                // Section A, M.C.I: 8 - delta of 8 spaces.
        Second line.
            Third line.           // Section B, M.C.I: 12 - delta of 4 spaces.
            Fourth line.
            Fifth line.
        Sixth line.               // Section C, M.C.I: 8 - delta of -4 spaces.
        Seventh line.
        Eighth line.

Let's muck around with the indentation on the right:

    First line.                // Section A, M.C.I: 4 - delta of 4 spaces.
    Second line.
        Third line.           // Section B, M.C.I: 4 - delta of 0 spaces.
            Fourth line.
    Fifth line.
    Sixth line.               // Section C, M.C.I: 4 - delta of 0 spaces.
    Seventh line.
    Eighth line.

Merging the deltas:

Section A, delta of 8 spaces (from the left).
Section B, delta of 0 spaces (from the right).
Section C, delta of 0 spaces (from the right).

So that would yield...

        First line.
        Second line.
            Third line.
                Fourth line.
        Fifth line.
        Sixth line.
        Seventh line.
        Eighth line.

Looks good so far.

@sageserpent-open
Copy link
Owner Author

sageserpent-open commented Aug 6, 2024

Let's try the second example, doing a thought-experiment using the maximum common indentation delta...

Say we have on the right:

    First line.                                     // Section A, M.C.I: 4 - delta of 4 spaces.
    Second line.
        Geaenderte dritte Zeile.        // Section B, M.C.I: 4 - delta of 0 spaces.
            Geaenderte vierte Zeile.
    Fifth line.
    Sixth line.                                    // Section C, M.C.I: 4 - delta of 0 spaces.
    Seventh line.
    Eighth line.

Merging the deltas:

Section A, delta of 8 spaces (from the left).
Section B, delta of 0 spaces (from the right).
Section C, delta of 0 spaces (from the right).

So that would yield...

        First line.
        Second line.
            Geaenderte dritte Zeile.
                Geaenderte vierte Zeile.
        Fifth line.
        Sixth line.
        Seventh line.
        Eighth line.

Again, looks good.

@sageserpent-open
Copy link
Owner Author

sageserpent-open commented Aug 6, 2024

The third example, using the maximum common indentation delta...

The base file:

    First line.                      // Section A, M.C.I: 4 - delta of 4 spaces.
    Second line.
        Third line.                // Section B, M.C.I: 8 - delta of 4 spaces.
        Fourth line.
        Fifth line.
    Sixth line.                    // Section C, M.C.I: 4 - delta of -4 spaces.
    Seventh line.
    Eighth line.                  // Section D, M.C.I: 4 - delta of 0 spaces.

On the left, let's move a block in addition to a global indentation change:

        First line.                      // Section A, M.C.I: 8 - delta of 8 spaces.
        Second line.
        Sixth line.                     // Section C, M.C.I: 8 - delta of 0 spaces.
        Seventh line.
            Third line.                 // Section B, M.C.I: 12 - delta of 4 spaces.
            Fourth line.
            Fifth line.
        Eighth line.                   // Section D, M.C.I: 8 - delta of -4 spaces.

Let's muck around with the indentation on the right:

    First line.                      // Section A, M.C.I: 4 - delta of 4 spaces.
    Second line.
        Third line.                // Section B, M.C.I: 4 - delta of 0 spaces.
            Fourth line.
    Fifth line.
    Sixth line.                    // Section C, M.C.I: 4 - delta of 0 spaces.
    Seventh line.
    Eighth line.                 // Section D, M.C.I: 4 - delta of 0 spaces.

Merging the deltas:

Section A, delta of 8 spaces (from the left).
Section B, delta of 0 spaces (from the right).
Section C, delta of 0 spaces (from the move destination on the left).
Section D, delta of -4 spaces (from the left)

So that would yield...

        First line.
        Second line.
        Sixth line.
        Seventh line.
            Third line.
                Fourth line.
        Fifth line.
    Eighth line.

This is slightly different from the third example as presented, but makes sense in its own right.

Perhaps section D's delta should be calculated wrt the nearest predecessor section that didn't move - namely section C? So that would give 0 on the base, 0 on the left and 0 and right.
Likewise, section B's delta could be calculated wrt the nearest predecessor section that didn't move - namely section A? So that would give 4 on the base, 4 on the left and 0 and the right.

Merging the deltas:

Section A, delta of 8 spaces (from the left).
Section B, delta of 0 spaces (from the right).
Section C, delta of 0 spaces (from the move destination on the left).
Section D, delta of 0 spaces (preserved)

So that would yield...

        First line.
        Second line.
        Sixth line.
        Seventh line.
            Third line.
                Fourth line.
        Fifth line.
        Eighth line.

Which is precisely the original example.

@sageserpent-open
Copy link
Owner Author

It looks like merging the delta of the maximum common indentation for each section could work well, but we have to apply this after discovering the moves, skipping over sections that have moved out or in, including migrated ones. Move destinations simply apply their original delta regardless of what is going on in the source of the move or its edit.

No idea what to do about insertions yet, though...

@sageserpent-open
Copy link
Owner Author

Should insertions use the delta with their preceding (or failing that, succeeding) anchor?

@sageserpent-open
Copy link
Owner Author

Added CodeMotionAnalysisExtensionTest.indentation in 5dbb4fa - this fails. Work is currently on hold for this ticket; this is just a aide-memoire.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant