Match merging against obfuscation #1202

Merged
merged 59 commits into from
Aug 15, 2023

Conversation

Contributor

@uuqjz uuqjz commented Jul 20, 2023

This PR proposes Match Merging, a new defence mechanism against obfuscations like insertions, alterations and swapping.

How does it work:

The greedy string tiling algorithm at the core of JPlag calculates matches between submissions.
To avoid inflating the similarity with very short matches, the MinimumTokenMatch (MTM) defines the minimum length a token match must have to be considered in the similarity calculation. It is 9 for Java and 12 for CPP.
Tools like Mossad or JPlag-GEN exploit this to obfuscate plagiarism: they apply changes that keep the semantics of a submission identical but break up large matches into multiple smaller ones.
When this is done often enough, most matches fall below the MTM and are therefore ignored, resulting in a very low similarity score between the original and the obfuscated plagiarism.
My approach aims to revert these changes by merging neighbouring matches. It is based on two parameters, mergeBuffer and seperatingThreshold.
mergeBuffer defines how far below the MTM the length of a match may be.
seperatingThreshold defines how many tokens may lie between two neighbouring matches.
My approach considers both previously ignored matches and regular ones and merges them based on these parameters.
Merged-over tokens are removed from the submission.
Currently, both parameters default to 0, which disables my logic.
The default values will be changed in a later PR.
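
The description above can be illustrated with a minimal sketch. It is not the code from this PR: the `Match` record, the `merge` method and the class name are hypothetical, and the sketch only computes merged matches instead of removing the merged-over tokens from the submissions. It assumes that two matches may be merged when the gap between them is at most seperatingThreshold tokens in both submissions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class MatchMergingSketch {
    /** Hypothetical match: start index in the left and right token list plus its length in tokens. */
    record Match(int startLeft, int startRight, int length) {
        int endLeft() { return startLeft + length; }
        int endRight() { return startRight + length; }
    }

    /** Assumed simplification: merges consecutive matches whose gap is small enough on both sides. */
    static List<Match> merge(List<Match> found, int minimumTokenMatch, int mergeBuffer, int seperatingThreshold) {
        // Consider regular matches plus those at most mergeBuffer tokens shorter than the MTM.
        List<Match> candidates = new ArrayList<>();
        for (Match match : found) {
            if (match.length() >= minimumTokenMatch - mergeBuffer) {
                candidates.add(match);
            }
        }
        candidates.sort(Comparator.comparingInt(Match::startLeft));

        List<Match> result = new ArrayList<>();
        int index = 0;
        while (index < candidates.size()) {
            Match upper = candidates.get(index++);
            int endLeft = upper.endLeft();
            int endRight = upper.endRight();
            int length = upper.length(); // counts matched tokens only, gaps are excluded
            while (index < candidates.size()) {
                Match lower = candidates.get(index);
                int gapLeft = lower.startLeft() - endLeft;
                int gapRight = lower.startRight() - endRight;
                boolean mergeable = gapLeft >= 0 && gapRight >= 0
                        && gapLeft <= seperatingThreshold && gapRight <= seperatingThreshold;
                if (!mergeable) {
                    break;
                }
                length += lower.length();
                endLeft = lower.endLeft();
                endRight = lower.endRight();
                index++;
            }
            // Only (merged) matches that reach the MTM count towards the similarity.
            if (length >= minimumTokenMatch) {
                result.add(new Match(upper.startLeft(), upper.startRight(), length));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Match> found = List.of(new Match(0, 0, 5), new Match(6, 7, 5), new Match(30, 40, 4));
        System.out.println(merge(found, 9, 4, 2)); // two short matches merged into one
        System.out.println(merge(found, 9, 0, 0)); // both parameters 0: nothing is merged
    }
}
```

With an MTM of 9, a mergeBuffer of 4 and a seperatingThreshold of 2, the two matches of length 5 are merged into one match of length 10, while the match of length 4 is too short to be considered at all. With both parameters at 0, no short match is considered, so the result is empty.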

Which classes are affected and why:

  • MatchMerging contains the core logic for my approach
  • JPlag now calls MatchMerging in its pipeline
  • MergingParameters stores mergeBuffer and seperatingThreshold and follows ClusteringOptions
  • JPlagOptions, CLI and CliOptions now include MergingParameters also following ClusteringOptions
  • JPlagComparison has a new field ignoredMatches, which stores previously ignored Matches
  • GreedyStringTiling now lowers its MinimumMatchLength according to mergeBuffer and stores the previously ignored matches in ignoredMatches
  • Submission has a new copy function which is necessary as my approach removes tokens that are merged-over from the submission
  • MergingTest contains 7 test cases for my approach based on two samples
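
As a rough illustration of the MergingParameters and ignoredMatches items in the list above, here is a hedged sketch. All names and signatures are assumptions for illustration, not the classes from this PR: a parameter record that defaults to 0 for both values, and a helper that sorts found matches into regular and previously ignored ones using the lowered minimum length.

```java
import java.util.ArrayList;
import java.util.List;

class IgnoredMatchesSketch {
    /** Hypothetical parameter record; both values default to 0, which disables merging. */
    record MergingParameters(int mergeBuffer, int seperatingThreshold) {
        MergingParameters {
            if (mergeBuffer < 0 || seperatingThreshold < 0) {
                throw new IllegalArgumentException("Merging parameters must not be negative");
            }
        }

        MergingParameters() {
            this(0, 0);
        }

        MergingParameters withMergeBuffer(int newMergeBuffer) {
            return new MergingParameters(newMergeBuffer, seperatingThreshold);
        }
    }

    record Match(int startLeft, int startRight, int length) {}

    /** Hypothetical result holder: regular matches and the ones that were previously ignored. */
    record TilingResult(List<Match> matches, List<Match> ignoredMatches) {}

    static TilingResult classify(List<Match> found, int minimumTokenMatch, MergingParameters parameters) {
        // The minimum match length is lowered by mergeBuffer, but never below one token.
        int loweredMinimum = Math.max(1, minimumTokenMatch - parameters.mergeBuffer());
        List<Match> matches = new ArrayList<>();
        List<Match> ignoredMatches = new ArrayList<>();
        for (Match match : found) {
            if (match.length() >= minimumTokenMatch) {
                matches.add(match);
            } else if (match.length() >= loweredMinimum) {
                ignoredMatches.add(match);
            }
        }
        return new TilingResult(matches, ignoredMatches);
    }

    public static void main(String[] args) {
        List<Match> found = List.of(new Match(0, 0, 12), new Match(20, 20, 6), new Match(30, 30, 3));
        System.out.println(classify(found, 9, new MergingParameters().withMergeBuffer(4)));
        System.out.println(classify(found, 9, new MergingParameters()));
    }
}
```

With the default parameters the list of ignored matches stays empty, which is consistent with the statement that a value of 0 disables the new logic.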

cli/src/main/java/de/jplag/cli/CLI.java (outdated, resolved)
Member

@tsaglam tsaglam left a comment

This is a very exhaustive review, be aware. One underlying issue I see is the anti-pattern of relying too much on fields instead of passing via parameters. All your private methods are void, thus, do not return a result. Instead, they read and write into the shared state, mainly the global matches list. I think we should refactor this so the global matches are passed as a parameter instead, and it is no longer a field. Then, the return values of the private methods should be used. computeNeighbors should return the neighbors. removeToken should return the list of altered matches (thus solving a problem described in one of the comments below). Similar for the others.
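
A minimal sketch of the suggested style, assuming hypothetical `Match` and `Neighbor` records rather than the actual classes of this PR: the matches are passed in as a parameter, the method returns its result, and no shared field is read or written. The neighbor criterion used here (adjacent in both the left and the right ordering) is an assumption for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class NeighborSketch {
    record Match(int startLeft, int startRight, int length) {}

    record Neighbor(Match upper, Match lower) {}

    /** Returns the neighbors instead of writing them into a field; the input list is not modified. */
    static List<Neighbor> computeNeighbors(List<Match> matches) {
        // Sort copies, so the caller's list keeps its order.
        List<Match> byLeft = new ArrayList<>(matches);
        byLeft.sort(Comparator.comparingInt(Match::startLeft));
        List<Match> byRight = new ArrayList<>(matches);
        byRight.sort(Comparator.comparingInt(Match::startRight));

        List<Neighbor> neighbors = new ArrayList<>();
        for (int i = 0; i + 1 < byLeft.size(); i++) {
            Match upper = byLeft.get(i);
            Match lower = byLeft.get(i + 1);
            // Assumed criterion: the pair must also be directly adjacent in the right submission.
            if (byRight.indexOf(upper) + 1 == byRight.indexOf(lower)) {
                neighbors.add(new Neighbor(upper, lower));
            }
        }
        return neighbors;
    }

    public static void main(String[] args) {
        List<Match> matches = List.of(new Match(20, 20, 5), new Match(0, 0, 5), new Match(10, 10, 5));
        System.out.println(computeNeighbors(matches));
    }
}
```

Because the method depends only on its argument, it can be tested in isolation and called with an immutable list.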

cli/src/main/java/de/jplag/cli/CliOptions.java (outdated, resolved)
cli/src/main/java/de/jplag/cli/CliOptions.java (outdated, resolved)
cli/src/main/java/de/jplag/cli/CliOptions.java (outdated, resolved)
core/src/main/java/de/jplag/JPlag.java (outdated, resolved)
core/src/main/java/de/jplag/Submission.java (outdated, resolved)
core/src/test/java/de/jplag/merging/MergingTest.java (outdated, resolved)
core/src/main/java/de/jplag/merging/MatchMerging.java (outdated, resolved)
core/src/main/java/de/jplag/merging/MatchMerging.java (outdated, resolved)
core/src/main/java/de/jplag/merging/MatchMerging.java (outdated, resolved)
core/src/main/java/de/jplag/merging/MatchMerging.java (outdated, resolved)
@uuqjz
Contributor Author

uuqjz commented Aug 4, 2023

@tsaglam can you take a look at the code smell?

My idea would be to make removeBuffer a void function, but that's contrary to one of your requested changes.

@tsaglam
Member

tsaglam commented Aug 7, 2023

can you take a look at the code smell?

This is caused by the habit in the code of re-using a method's input collection as its output collection. This should be avoided because it can lead to side effects that were not considered. For example, someone changing the returned list can unintentionally modify the input list.
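
To make the aliasing problem concrete, here is a small sketch with hypothetical method names (not taken from the PR). The first method hands its input list back as the result; the second builds a fresh list and leaves the input untouched.

```java
import java.util.ArrayList;
import java.util.List;

class CollectionReuseSketch {
    /** Problematic: mutates the input and returns the very same list object. */
    static List<Integer> dropShortInPlace(List<Integer> lengths, int minimum) {
        lengths.removeIf(length -> length < minimum);
        return lengths;
    }

    /** Safer: the result is a new list, so later changes to it cannot affect the input. */
    static List<Integer> dropShort(List<Integer> lengths, int minimum) {
        List<Integer> kept = new ArrayList<>();
        for (int length : lengths) {
            if (length >= minimum) {
                kept.add(length);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Integer> input = new ArrayList<>(List.of(3, 9, 12));
        List<Integer> kept = dropShort(input, 9);
        kept.add(100); // does not touch input
        System.out.println(input + " " + kept);

        List<Integer> aliased = dropShortInPlace(input, 9);
        aliased.add(100); // silently changes input as well
        System.out.println(input == aliased);
    }
}
```

In the second half of `main`, the caller's list and the returned list are one and the same object, which is exactly the unintended side effect described above.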

@uuqjz
Contributor Author

uuqjz commented Aug 7, 2023

For the sake of unambiguity, I'd like to apply the following naming conventions to MatchMerging:

  • When talking about two submissions call them left and right
  • When talking about two neighboring matches call them upper and lower

This addresses the ambiguity of first and second.

Please give me feedback on this and I'll make the changes.

@uuqjz
Contributor Author

uuqjz commented Aug 10, 2023

As discussed with Timur:
--merge-buffer and --seperating-threshold will be renamed and have their default values changed in another PR.

@uuqjz uuqjz requested a review from tsaglam August 10, 2023 15:50
Member

@tsaglam tsaglam left a comment

Looks great, just 4 minor things, and then we can merge! 👍

core/src/main/java/de/jplag/JPlag.java (outdated, resolved)
core/src/main/java/de/jplag/merging/MergingParameters.java (outdated, resolved)
core/src/main/java/de/jplag/merging/Neighbor.java (outdated, resolved)
@tsaglam
Member

tsaglam commented Aug 14, 2023

@Kr0nox, alright from your side?

@uuqjz
Contributor Author

uuqjz commented Aug 14, 2023

@tsaglam
I made the last changes.
Let's merge!

Member

@tsaglam tsaglam left a comment

Sorry, I reviewed it early as I missed that the file end behavior was not in there. Now everything is in, right? One minor thing regarding duplication.

core/src/main/java/de/jplag/merging/MatchMerging.java (outdated, resolved)
@sonarcloud

sonarcloud bot commented Aug 15, 2023

[JPlag Plagiarism Detector] Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 0 (rating A)

Coverage: 95.1%
Duplication: 0.0%

@uuqjz
Contributor Author

uuqjz commented Aug 15, 2023

I don't have merging rights, so you have to click the button

@tsaglam tsaglam merged commit 2d85029 into jplag:develop Aug 15, 2023
16 checks passed
@tsaglam
Member

tsaglam commented Aug 15, 2023

I don't have merging rights, so you have to click the button

Yes, that is intentional. 👍

Labels
enhancement: Issue/PR that involves features, improvements and other changes
major: Major issue/feature/contribution/change
3 participants