Added RYG distributions
niemasd committed Apr 13, 2024 (commit e7e4694, parent 6ecd1a6)
teach_online/academic_integrity.md: 64 additions, 8 deletions

In a class of *n* = 100 students,
we would have 100/2 = 50 cheating pairs out of 100(99)/2 = 4,950 total pairs of students
(just over 1% of all pairs of students).
Thus, we can use the distribution of all pairwise MESS calculations as an approximation of the null distribution
(null hypothesis = "MESS score resulted from no collaboration"),
and we can try to identify collaboration by looking at outliers of this distribution
(e.g., by performing one-sided tests of statistical significance
with multiple hypothesis testing correction).
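Here is a minimal Python sketch of such a test. It is not the MESS repository's implementation.

The code does three things:

* It fits an [Exponential distribution](https://en.wikipedia.org/wiki/Exponential_distribution) to all pairwise MESS scores by maximum likelihood (rate = 1/mean).
* It computes each pair's one-sided, upper-tail p-value.
* It applies a Bonferroni correction across the *n*(*n*–1)/2 tests.

Pooling all pairs is reasonable because, in the thought experiment above, only (*n*/2) / (*n*(*n*–1)/2) = 1/(*n*–1) of all pairs are cheating pairs.

Several pieces are assumptions for illustration:

* the Exponential null;
* the function name `mess_outlier_pvalues` and its dictionary input;
* the choice of Bonferroni rather than, e.g., Benjamini–Hochberg.

The toy scores are randomly generated, not real exam data.

```python
import math
import random


def mess_outlier_pvalues(scores):
    """Bonferroni-adjusted one-sided p-values for each pair's MESS score.

    `scores` maps a (student_a, student_b) pair to its MESS score
    (hypothetical input format). Assumption: under the null hypothesis,
    MESS scores are non-negative and roughly Exponentially distributed.
    """
    m = len(scores)  # one hypothesis test per pair of students
    mean = sum(scores.values()) / m
    if mean == 0:
        # No spread at all, so no pair can be an outlier
        return {pair: 1.0 for pair in scores}
    adjusted = {}
    for pair, x in scores.items():
        # The maximum-likelihood Exponential fit has rate 1/mean,
        # so the upper tail P(X >= x) under the null is exp(-x / mean)
        p = math.exp(-x / mean)
        # Bonferroni correction: multiply by the number of tests, capped at 1
        adjusted[pair] = min(1.0, p * m)
    return adjusted


# Toy usage with randomly generated scores (not real exam data)
random.seed(0)
students = [f"student{i}" for i in range(100)]
toy_scores = {
    (a, b): random.expovariate(1.0)
    for i, a in enumerate(students)
    for b in students[i + 1:]
}
toy_scores[("student0", "student1")] = 25.0  # planted outlier pair
flagged = [pair for pair, p in mess_outlier_pvalues(toy_scores).items() if p < 0.05]
print(flagged)
```

Bonferroni is conservative when there are thousands of pairs. The Exponential fit should also be checked against the empirical distribution of scores before these p-values are trusted.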

```{figure} ../images/mess_distribution.png
---
* Thus, this method is *specific* (i.e., high MESS typically implies collaboration), but not *sensitive* (i.e., it can miss true cases of cheating)

MESS gives us a way of looking at the *uniqueness* of shared incorrect responses,
but we can also gain interesting insights from the *number* of shared incorrect responses
in the context of all incorrect responses the two students submitted.
Specifically,
expanding on {cite:t}`moshiri_scalable_2022`,
*while* performing all MESS calculations,
we can also count the following for every pair of students
(the color names are otherwise arbitrary and were chosen so that "scarier color" = "more suspicious"):

* Red Count = The number of questions both students missed with the *exact same* wrong answer
    * If students collaborate, we expect a disproportionately large number of identical wrong answers between them
* Yellow Count = The number of questions both students missed, but with *different* wrong answers
    * If students collaborate, we might expect them to put the same wrong answer,
      so Yellow questions could be evidence against collaboration
    * However, if students collaborate and are torn between two potential answers,
      one might guess one answer and the other might guess another,
      so Yellow questions could also be evidence supporting collaboration
    * Thus, overall, Yellow questions are roughly neutral evidence
* Green Count = The number of questions only one of the two students missed
    * In other words, the number of questions one student got right and the other student got wrong
    * If students collaborate, we expect them to miss the same questions,
      so a high Green Count could be evidence against collaboration
* Black Count = Red Count + Yellow Count + Green Count
    * In other words, this is the total number of questions *at least* one student missed
    * Why is this helpful? We'll discuss it a bit later in this section
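To make these definitions concrete, here is a minimal Python sketch that computes the four counts for one pair of students. It is not code from the MESS repository.

It assumes a hypothetical input format: each student's exam is a list of response strings aligned with an answer key `key`. Like the tools described at the end of this section, it compares responses with plain string equality.

```python
def ryg_counts(key, resp_a, resp_b):
    """Return the Red, Yellow, Green, and Black Counts for one pair of students.

    `key[i]` is the correct answer to question i; `resp_a[i]` and `resp_b[i]`
    are the two students' responses (hypothetical input format).
    """
    if not (len(key) == len(resp_a) == len(resp_b)):
        raise ValueError("key and both response lists must have the same length")
    red = yellow = green = 0
    for correct, a, b in zip(key, resp_a, resp_b):
        a_wrong = a != correct  # plain string equality, no normalization
        b_wrong = b != correct
        if a_wrong and b_wrong:
            if a == b:
                red += 1  # both missed it with the exact same wrong answer
            else:
                yellow += 1  # both missed it, but with different wrong answers
        elif a_wrong or b_wrong:
            green += 1  # exactly one of the two students missed it
    # Black = Red + Yellow + Green = questions at least one student missed
    return {"red": red, "yellow": yellow, "green": green,
            "black": red + yellow + green}


key = ["A", "B", "C", "D", "E"]
alice = ["A", "C", "C", "A", "B"]
bob = ["A", "C", "D", "B", "E"]
print(ryg_counts(key, alice, bob))  # {'red': 1, 'yellow': 1, 'green': 2, 'black': 4}
```

In a full analysis, these counts would be accumulated for every pair in the same pass as the MESS calculations.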

Recall from the earlier thought experiment that
we can safely assume the *vast majority* of pairwise comparisons are *not* cheating pairs.
As a result, we can look at the distributions of the Red, Yellow, and Green Counts
across all pairs of students in the class as approximations of their null distributions
(null hypothesis = "Red, Yellow, and Green Counts resulted from no collaboration"),
and we can try to identify collaboration by looking at outliers of these distributions.
However, the range of possible Red, Yellow, and Green Counts for a given pair of students is bounded by their Black Count
(Black = Red + Yellow + Green),
so whether a pair's counts are outliers must be judged relative to other pairs with a similar Black Count.
We can therefore do the following:

* Plot the 2D distributions of Red, Yellow, and Green Counts (vertical axis) vs. Black Count (horizontal axis)
    * In other words, each pair of students defines 3 points: (Black, Red), (Black, Yellow), and (Black, Green)
* Plot a given pair's (Black, Red), (Black, Yellow), and (Black, Green) points
* Check whether the pair's Red, Yellow, and Green Counts deviate significantly from what is expected at that Black Count
    * Estimate expected values based on the null distributions at that Black Count
    * Perform a statistical test to check for significance
      (e.g., [Fisher's exact test](https://en.wikipedia.org/wiki/Fisher%27s_exact_test)
      or a [χ² test](https://en.wikipedia.org/wiki/Chi-squared_test) with 2 degrees of freedom)
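The last two steps can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the MESS repository's implementation.

* **Expected counts:** a pair's expected Red, Yellow, and Green Counts are its Black Count times the pooled Red/Yellow/Green proportions among all pairs with the same Black Count.
* **Test:** the pair is scored with a χ² goodness-of-fit test with 2 degrees of freedom (3 categories constrained to sum to the Black Count).
* **p-value:** with 2 degrees of freedom, the χ² upper tail is exactly exp(−x/2), so no statistics library is needed.

The function names are hypothetical, as is the per-pair dictionary format with `red`, `yellow`, `green`, and `black` keys. The toy counts are made up for illustration.

```python
import math
from collections import defaultdict


def null_proportions(all_pairs):
    """Pooled Red/Yellow/Green proportions among all pairs at each Black Count."""
    totals = defaultdict(lambda: [0, 0, 0])
    for c in all_pairs:
        if c["black"] > 0:  # pairs with no missed questions carry no information
            t = totals[c["black"]]
            t[0] += c["red"]
            t[1] += c["yellow"]
            t[2] += c["green"]
    return {black: [x / sum(t) for x in t] for black, t in totals.items()}


def chi2_test(pair, props):
    """Chi-squared statistic and p-value (2 df) for one pair vs. the null at its Black Count.

    Assumes `pair` was included when `props` was computed.
    """
    black = pair["black"]
    if black == 0:
        return 0.0, 1.0
    observed = [pair["red"], pair["yellow"], pair["green"]]
    stat = 0.0
    for obs, p in zip(observed, props[black]):
        expected = black * p
        if expected > 0:  # skip categories never observed at this Black Count
            stat += (obs - expected) ** 2 / expected
    # For 2 degrees of freedom, P(chi2 >= stat) = exp(-stat / 2)
    return stat, math.exp(-stat / 2)


# Toy data (not real exam data): 50 unremarkable pairs plus an outlier like the one
# highlighted in the figure (Red 9, Yellow 0, Green 0 at Black 9)
typical = {"red": 1, "yellow": 2, "green": 6, "black": 9}
suspicious = {"red": 9, "yellow": 0, "green": 0, "black": 9}
props = null_proportions([typical] * 50 + [suspicious])
print(chi2_test(typical, props))     # large p-value
print(chi2_test(suspicious, props))  # tiny p-value
```

With small expected counts the χ² approximation can be rough, in which case an exact test such as Fisher's may be preferable. Since one test is run per pair, these p-values would also need multiple hypothesis testing correction, just as with MESS.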

```{figure} ../images/ryg_distributions.png
---
height: 500px
name: ryg_distributions
---
Distributions of all pairwise Red, Yellow, and Green Counts vs. Black Counts in a 500-person Advanced Data Structures course (log-scale).
2D [Kernel Density Estimates (KDEs)](https://en.wikipedia.org/wiki/Kernel_density_estimation) are shown as colored contours,
and best-fit lines are shown for each distribution.
A single pair of students with suspiciously outlying Red (9), Yellow (0), and Green (0) Counts for their given Black Count (9)
is shown as a black vertical line with colored dots.
```

We wrote a suite of Python programs to perform all pairwise
Red Count, Yellow Count, Green Count, and MESS calculations,
calculate a best-fit [Exponential distribution](https://en.wikipedia.org/wiki/Exponential_distribution),
plot the distributions,
and perform other downstream analyses;
the suite is available as an open source project on [GitHub](https://github.com/niemasd/MESS).
The tools in this repository support exams with multiple-choice, short-answer, math, Parsons, and other types of problems:
they simply use exact string equality to determine whether two responses are the same.

```{glossary}
Detection