Vincent Cantin edited this page Nov 16, 2013 · 1 revision

# Java Language

Data to extract from the content

To describe a change in terms of semantic differences, we need to be able to recognize those differences, which means extracting some semantic data from the content. This section lists the kinds of data that are needed.

1. Addition/removal/rename of program elements

We need to parse those elements and identify them, so that we can tell whether they were removed, added, or renamed.

Using the name?

For elements which have a name (class, method, field, local variable, etc.), the name can be used as an identifier to detect addition or removal. However, if we rely only on the name, we will fail to identify elements which were renamed: a rename would look like one removal plus one addition. As a result, the name should be at most a hint when identifying an element.

It can be used as a relatively safe identifier only if the content of the element did not change.
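The name-based comparison can be sketched as a simple set difference. This is only an illustration: the class `NameDiff` and its methods are hypothetical, and the sets of names are assumed to come from a parser that is not shown here.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: compares the element names found before and after a change.
final class NameDiff {
    // Names present after the change but not before: candidates for "added".
    static Set<String> added(Set<String> before, Set<String> after) {
        Set<String> result = new TreeSet<>(after);
        result.removeAll(before);
        return result;
    }

    // Names present before the change but not after: candidates for "removed".
    static Set<String> removed(Set<String> before, Set<String> after) {
        Set<String> result = new TreeSet<>(before);
        result.removeAll(after);
        return result;
    }
}
```

With `getName` renamed to `getFullName`, this reports `getName` as removed and `getFullName` as added, which is exactly the misreading that the other measurements are meant to correct.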

Content similarity measurement

Another solution would be to look at what those elements are made of: "If it has the size of a duck, the color of a duck, and it sounds like a duck, then it might be a duck".

For each type of program element, we need to define a measurement which indicates the similarities of the content of two elements.
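As one possible measurement, a minimal sketch could compare the sets of tokens found in two element bodies using the Jaccard index. The choice of the Jaccard index, the crude tokenizer, and the class name `ContentSimilarity` are all assumptions made for illustration; a real measurement would probably differ per type of program element.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class ContentSimilarity {
    // Splits source text into a set of word-like tokens (a deliberately crude tokenizer).
    static Set<String> tokens(String source) {
        Set<String> result = new HashSet<>(Arrays.asList(source.split("\\W+")));
        result.remove(""); // split() can yield an empty token
        return result;
    }

    // Jaccard index: |A ∩ B| / |A ∪ B|, in [0, 1]; defined here as 1.0 when both sets are empty.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 1.0;
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    static double similarity(String bodyA, String bodyB) {
        return jaccard(tokens(bodyA), tokens(bodyB));
    }
}
```

A score of 1.0 means the two bodies use exactly the same set of tokens; it ignores token order and repetition, which is one reason this can only be a starting point.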

Usage similarity measurement

Another solution would be to look at how those elements are used in the program: "If it catches fish like a duck, talks to other ducks like a duck, if the farmer feeds it like a duck, and it sleeps in the place for ducks, then it might be a duck".

For each type of program element, we need to define a measurement which indicates how similarly two elements are referenced and/or used in the program.
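A usage measurement could, for instance, index which elements refer to which, then compare the sets of referrers of two elements. The sketch below assumes references are available as `{referrer, referenced}` pairs and scores the overlap relative to the larger set; both the data shape and the formula are hypothetical choices, not something prescribed above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class UsageSimilarity {
    // Builds an index from each referenced element to the set of elements that refer to it.
    // Each reference is a two-element array {referrer, referenced} (an assumed input shape).
    static Map<String, Set<String>> referrersByElement(String[][] references) {
        Map<String, Set<String>> index = new HashMap<>();
        for (String[] ref : references) {
            index.computeIfAbsent(ref[1], k -> new HashSet<>()).add(ref[0]);
        }
        return index;
    }

    // Fraction of shared referrers relative to the larger set; 0.0 if either element is unused.
    static double similarity(Set<String> referrersA, Set<String> referrersB) {
        if (referrersA.isEmpty() || referrersB.isEmpty()) return 0.0;
        Set<String> shared = new HashSet<>(referrersA);
        shared.retainAll(referrersB);
        return (double) shared.size() / Math.max(referrersA.size(), referrersB.size());
    }
}
```

If `Duck.eat` was called from `Farm.feed` and `Pond.swim` before the change, and `Duck.consume` is called from those same two places afterwards, the high score suggests that `consume` is the renamed `eat`.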

3 measurements for 1 decision

All 3 measurements should be combined in order to help match the program elements before and after a change. Experimentation can help decide the best way to combine them.
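One simple way to combine the measurements is a weighted sum compared against a threshold. The sketch below is only a starting point for such experiments: the weights, the threshold parameter, and the class name `MatchScore` are placeholders, not recommended values.

```java
final class MatchScore {
    // Hypothetical weights; finding good values is left to experimentation.
    static final double NAME_WEIGHT = 0.2;
    static final double CONTENT_WEIGHT = 0.5;
    static final double USAGE_WEIGHT = 0.3;

    // Each input is expected in [0, 1]; since the weights sum to 1, so is the result.
    static double combine(double name, double content, double usage) {
        return NAME_WEIGHT * name + CONTENT_WEIGHT * content + USAGE_WEIGHT * usage;
    }

    // Decides whether two elements are treated as the same element across the change.
    static boolean sameElement(double name, double content, double usage, double threshold) {
        return combine(name, content, usage) >= threshold;
    }
}
```

With these placeholder weights, an element whose name changed but whose content and usage are identical still scores about 0.8, so a threshold of 0.7 would match it as a rename.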

2. Movement of program elements

Once we have a robust way to identify the program elements (see the section above), detecting whether one was moved to another place is trivial.
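To illustrate why this becomes trivial, the sketch below assumes each element already has a stable identifier and that we know the identifier of its enclosing element before and after the change. The class `MoveDetector` and the map-based representation are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class MoveDetector {
    // Both maps go from a stable element identifier to the identifier of its enclosing element.
    // Returns, in sorted order, the identifiers whose enclosing element differs after the change.
    static List<String> moved(Map<String, String> parentBefore, Map<String, String> parentAfter) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> entry : new TreeMap<>(parentBefore).entrySet()) {
            String after = parentAfter.get(entry.getKey());
            // An element missing after the change was removed, not moved.
            if (after != null && !after.equals(entry.getValue())) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

This only detects a change of enclosing element; a change of position inside the same parent would need the ordering of siblings as well.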

3. Modification of program elements

The modification of a program element can also be seen as a modification of each program element that encloses it. While each of those descriptions would be correct, the user would probably prefer the description of a modification on the smallest of those elements, or optionally on the smallest one which has a name.

If an element was both renamed and modified, the user would probably prefer to have the rename described first, then the modification. In other words, this type of semantic difference always comes last when a change is described as a list of semantic differences.
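Picking the smallest named element which encloses a modification can be sketched as a walk down a tree of nested elements. The `ProgramElement` class below, with its character offsets and optional name, is a hypothetical model of the parsed program, not an existing API.

```java
import java.util.ArrayList;
import java.util.List;

final class ProgramElement {
    final String name;      // null for anonymous elements such as blocks
    final int start, end;   // character offsets, end exclusive
    final List<ProgramElement> children = new ArrayList<>();

    ProgramElement(String name, int start, int end) {
        this.name = name;
        this.start = start;
        this.end = end;
    }

    ProgramElement add(ProgramElement child) {
        children.add(child);
        return this;
    }

    boolean contains(int offset) {
        return start <= offset && offset < end;
    }

    // Returns the innermost named element containing the offset, or null if none does.
    static ProgramElement smallestNamed(ProgramElement root, int offset) {
        if (!root.contains(offset)) return null;
        for (ProgramElement child : root.children) {
            ProgramElement found = smallestNamed(child, offset);
            if (found != null) return found;
        }
        return root.name != null ? root : null;
    }
}
```

For a modification inside an anonymous block of method `bar` in class `Foo`, this returns `bar` rather than the block or the class.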

4. Reformulation of code blocks

Here, reformulation means a change which expresses the same execution paths in a different manner.

For example, before the change:

if (object == null) {
  return null;
}

return object.toString();

After the change:

if (object != null) {
  return object.toString();
}

return null;

5. Reformulation of logical expressions

For example, before the change: !(a || b)

After the change: (!a && !b)
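For small expressions, one way to recognize such a reformulation is to compare the two expressions on every possible assignment of their variables. The sketch below does this by brute force; the class `BooleanEquivalence` is hypothetical, and it assumes the sub-expressions have no side effects, since otherwise short-circuit evaluation would also have to be taken into account.

```java
import java.util.function.Predicate;

final class BooleanEquivalence {
    // Checks that two expressions over `variables` boolean inputs agree on every assignment.
    // The cost is exponential in the number of variables, so this only suits small expressions.
    static boolean equivalent(int variables, Predicate<boolean[]> left, Predicate<boolean[]> right) {
        for (int mask = 0; mask < (1 << variables); mask++) {
            boolean[] values = new boolean[variables];
            for (int i = 0; i < variables; i++) {
                values[i] = (mask & (1 << i)) != 0;
            }
            if (left.test(values) != right.test(values)) return false;
        }
        return true;
    }
}
```

For the example above, `BooleanEquivalence.equivalent(2, v -> !(v[0] || v[1]), v -> !v[0] && !v[1])` returns `true`, which is De Morgan's law.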

6. Modification of the spacing

Here, spacing means space characters, tabs and new lines which do not change the semantics of the program.

This is a special case of modification because each space added, deleted, or moved does not deserve a unique identifier: nobody needs to know whether the same space character was moved from another place, or whether it was deleted from one place and then added in another.
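A spacing-only change can be recognized by comparing the token sequences of the two versions while ignoring the whitespace between tokens. The sketch below uses a crude regular-expression tokenizer as a stand-in for a real Java lexer; the class `SpacingDiff` is hypothetical, and the tokenizer ignores comments and character literals and splits operators into single characters, so it is only an approximation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SpacingDiff {
    // Crude tokenizer: a string literal, a run of word characters, or any single non-space character.
    // String literals are kept whole so that spaces inside them still count as content.
    private static final Pattern TOKEN =
            Pattern.compile("\"(?:\\\\.|[^\"\\\\])*\"|\\w+|\\S");

    static List<String> tokens(String source) {
        List<String> result = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(source);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    // True when the two texts differ, but only in the whitespace between tokens.
    static boolean onlySpacingChanged(String before, String after) {
        return !before.equals(after) && tokens(before).equals(tokens(after));
    }
}
```

Comparing whole token lists matches the point made above: no individual space is tracked, only whether the non-spacing content is unchanged.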