generate_intercepts() made compatible with simdata() from mirt.
tzoltak committed Jul 21, 2021
1 parent f17ad82 commit 5dfb6d4
Showing 3 changed files with 26 additions and 10 deletions.
4 changes: 3 additions & 1 deletion NEWS.md
@@ -1,13 +1,15 @@
-# rstyles 0.4.0 (20.072021)
+# rstyles 0.4.0 (21.07.2021)

## New features

- `make_test()` assigns names to the created items by default and provides an additional `names` argument if users want to provide names themselves.
- `generate_test_responses()` uses the items' names (if there are any) to name the columns of the returned matrix.
- `generate_test_responses()` converts the matrix it returns to a numeric one (if this is possible without loss of information); it also provides an additional argument, `tryConvertToNumeric`, that can be used to restore its former behavior (i.e. returning a character matrix).
+- `generate_intercepts_sml()`, and consequently `generate_intercepts()` when called with the `FUNt` argument, returns an intercepts matrix with an additional first column of zeros, to make it compatible with the format used by function `simdata()` from the *mirt* package (`generate_test_responses()` was, and still is, able to handle intercepts provided either with or without such an additional column of zeros).

## Documentation

+- An additional section in README.md describing how function `simdata()` from package *mirt* may be used to speed up generation of GPCM responses.
- Some improvements in documentation.

# rstyles 0.3.0 (5.05.2021)
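The NEWS entry on `tryConvertToNumeric` describes converting the returned matrix to numeric only when no information is lost. A minimal base-R sketch of such a check follows. `try_convert_to_numeric()` is a hypothetical helper written for illustration, not the function *rstyles* uses internally, and reading "without loss of information" as "no new missing values and every value survives a character round trip" is an assumption.

```r
# Hypothetical helper (not part of rstyles): convert a character matrix to a
# numeric one only if this loses no information; otherwise return it unchanged.
# Assumption: "no loss" means no new NAs and every value round-trips exactly.
try_convert_to_numeric <- function(x) {
  num <- suppressWarnings(as.numeric(x))  # drops dimensions; unparsable strings become NA
  isMissing <- is.na(x)
  # A string that could not be parsed shows up as a new NA
  if (any(is.na(num) & !isMissing)) return(x)
  # Conservative round-trip check, e.g. "01" -> 1 -> "1" would lose the leading zero
  if (!all(as.character(num[!isMissing]) == x[!isMissing])) return(x)
  dim(num) <- dim(x)
  dimnames(num) <- dimnames(x)
  num
}

resp <- matrix(c("1", "5", "3", "2"), nrow = 2,
               dimnames = list(NULL, c("i1", "i2")))
converted <- try_convert_to_numeric(resp)                 # numeric, column names kept
stillChar <- try_convert_to_numeric(matrix(c("1", "a")))  # "a" cannot be parsed
```

The round-trip test is deliberately strict: a value such as `"100000"` would come back as `"1e+05"` and keep the matrix as character, which errs on the side of not changing the data.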
10 changes: 6 additions & 4 deletions R/generate_items_parameters.R
@@ -308,7 +308,7 @@ generate_slopes <- function(nItems, scoringMatrix, ..., FUN = identity,
#' length.out = 4))
#' @export
generate_intercepts <- function(nItems, scoringMatrix, FUNd, argsd = NULL,
-FUNt = NULL, argst = NULL) {
+FUNt = NULL, argst = NULL) {
stopifnot("Argument `nItems` must be a positive integer." =
is.numeric(nItems),
"Argument `nItems` must be a positive integer." =
@@ -425,7 +425,9 @@ generate_intercepts_sml <- function(nItems, scoringMatrix, FUNd, argsd,
sort(intercepts[[i]], decreasing = TRUE) - mean(intercepts[[i]])
intercepts[[i]] <- intercepts[[i]] + difficulties[i]
}
-return(t(matrix(unlist(intercepts), ncol = nItems,
-dimnames =
-list(paste0("d", 1L:(nrow(scoringMatrix) - 1L)), NULL))))
+intercepts <- t(matrix(unlist(intercepts), ncol = nItems,
+dimnames =
+list(paste0("d", 1L:(nrow(scoringMatrix) - 1L)),
+NULL)))
+return(cbind(d0 = rep(0, nrow(intercepts)), intercepts))
}
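The change above stacks the per-item intercept vectors into an items-by-thresholds matrix and then prepends a `d0` column of zeros, which according to the NEWS entry is the layout `simdata()` from *mirt* works with. The sketch below replays those two steps on made-up values for three items; the values and `nCat` are illustrative assumptions (in the package the number of thresholds comes from `nrow(scoringMatrix) - 1L`).

```r
nItems <- 3L
nCat <- 5L  # assumed number of response categories (rows of a scoring matrix)
# Made-up, decreasing intercepts for each item, stored as a list as in the loop above
intercepts <- list(c(1.5, 0.5, -0.5, -1.5),
                   c(2.0, 1.0, -1.0, -2.0),
                   c(1.0, 0.2, -0.2, -1.0))
# Items become rows and thresholds become columns d1, ..., d4
intercepts <- t(matrix(unlist(intercepts), ncol = nItems,
                       dimnames = list(paste0("d", 1L:(nCat - 1L)), NULL)))
# Prepend the column of zeros, as the new return() statement does
intercepts <- cbind(d0 = rep(0, nrow(intercepts)), intercepts)
print(intercepts)
# Dropping the first column gives back the layout returned before this commit
oldLayout <- intercepts[, -1, drop = FALSE]
```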
22 changes: 17 additions & 5 deletions README.md
@@ -51,6 +51,7 @@ There are four steps one needs to follow to simulate responses to a test:
4. Generate responses using information from the three previous steps.

- Now one may generate responses using function `generate_test_responses()`, provided with a matrix of values of the latent traits and a list of objects describing items included in the test.
+- To generate responses according to GPCM you may also use function `simdata()` from package *mirt* (see below).

## Examples

@@ -59,7 +60,7 @@ There are four steps one needs to follow to simulate responses to a test:
- Below, perhaps the most widely known IRTree model is used: the Middle-Acquiescence-Extreme (MAE) model for 5-point Likert scale items (Böckenholt, 2012, 2017).
- The test consists of 20 items.
- Items' *slopes* are generated from a log-normal distribution with expected value and standard deviation on the log scale being 0 and 0.2 respectively.
-- Items' *intercepts* (*thresholds*) are generated from a normal distribution wit expected value of 0 and standard deviation of 1.5.
+- Items' *intercepts* (*thresholds*) are generated from a normal distribution with expected value of 0 and standard deviation of 1.5.
- Latent traits are assumed to be standard normal and independent of each other (this is not a very plausible assumption).
- There are 1000 *respondents* (responses that are generated).
- Function `mirt()` from package *mirt* is used to estimate 2PL IRT model on the generated data, using so-called *pseudo-items* approach (function `expand_responses()` enables reshaping data to the *pseudo-items* form).
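The distributional assumptions listed above can be sketched with base R random-number generators alone. This only illustrates the stated distributions; it is not the README's own code, which relies on *rstyles* functions such as `generate_slopes()` and `generate_intercepts()`. Drawing one slope and one intercept per item and per latent trait, with three traits for the three MAE nodes, is an assumption made here for simplicity.

```r
set.seed(20210721)  # arbitrary seed
nItems <- 20L
nObs <- 1000L
nTraits <- 3L  # assumption: one latent trait per MAE node (M, A, E)
# Slopes: log-normal with meanlog = 0 and sdlog = 0.2 (parameters on the log scale)
slopes <- matrix(rlnorm(nItems * nTraits, meanlog = 0, sdlog = 0.2),
                 nrow = nItems)
# Intercepts: normal with expected value 0 and standard deviation 1.5
intercepts <- matrix(rnorm(nItems * nTraits, mean = 0, sd = 1.5),
                     nrow = nItems)
# Latent traits: independent standard normal variables, one row per respondent
theta <- matrix(rnorm(nObs * nTraits), nrow = nObs,
                dimnames = list(NULL, c("m", "a", "e")))
```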
@@ -98,7 +99,7 @@ mSqt <- mirt(respWide,
'2PL')
```

-### Partially-compensatory GPCM including *middle*, *extreme* and *acquiescence* response styles
+### Partially-compensatory random-thresholds GPCM including *middle*, *extreme* and *acquiescence* response styles

- Below, a model is defined in which, apart from the *trait the test is supposed to measure* (named "i"), there are three additional latent traits describing response styles that affect responses *simultaneously*. These traits may be interpreted as describing *middle* ("m"), *extreme* ("e") and *acquiescence* ("a") response styles.
- The test consists of 20 items, half of which are *reversed* (i.e. *negatively* associated with the trait called "i").
@@ -140,8 +141,6 @@ colnames(theta) <- colnames(vcovTraits)
# generating responses
resp <- generate_test_responses(theta, items)
-resp <- apply(resp, 1:2, as.numeric)
-colnames(resp) <- paste0("i", 1:ncol(resp))
# scaling
mSml <- suppressMessages(mirt(resp,
@@ -164,7 +163,7 @@ Also, it is possible to specify distinct *scoring matrices* for the *reversed* a

### Log-normal distribution parameters

-Log-normal distribution is parameterized on the log scale (i.e. by parameters of the *underlying* normal distribution) but while generating parameters one is always interested in the parameters on the *exponential* scale, i.e. the scale of the sampled values. To deal with this problem package *rstyles* provides a set o functions:
+Log-normal distribution is parameterized on the log scale (i.e. by parameters of the *underlying* normal distribution) but while generating parameters one is always interested in the parameters on the *exponential* scale, i.e. the scale of the sampled values. To deal with this problem package *rstyles* provides a set of functions:

- `lnorm_mean()` and `lnorm_sd()` compute, respectively, the expected value and the standard deviation of a log-normal distribution with given *meanlog* and *sdlog* parameters (compare `?rlnorm`);
- `find_pars_lnorm()` returns the values of the *meanlog* and *sdlog* parameters one should use to get a log-normal distribution with the expected value and standard deviation specified as arguments to this function.
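The conversion these functions perform rests on the standard closed-form moments of the log-normal distribution. The sketch below implements those formulas directly; `lnorm_moments()` and `lnorm_pars()` are hypothetical names used only here, not the *rstyles* functions, whose implementation may differ.

```r
# Hypothetical helpers (not the rstyles functions), using the standard formulas
# E(X) = exp(meanlog + sdlog^2 / 2) and
# SD(X) = sqrt((exp(sdlog^2) - 1) * exp(2 * meanlog + sdlog^2)).
lnorm_moments <- function(meanlog, sdlog) {
  c(mean = exp(meanlog + sdlog^2 / 2),
    sd = sqrt((exp(sdlog^2) - 1) * exp(2 * meanlog + sdlog^2)))
}

# Inverse: sdlog^2 = log(1 + sd^2 / mean^2), meanlog = log(mean) - sdlog^2 / 2
lnorm_pars <- function(mean, sd) {
  sdlog2 <- log(1 + sd^2 / mean^2)
  c(meanlog = log(mean) - sdlog2 / 2, sdlog = sqrt(sdlog2))
}

# Slopes in the MAE example use meanlog = 0 and sdlog = 0.2
lnorm_moments(0, 0.2)  # mean about 1.020, sd about 0.206

# Parameters giving slopes with expected value 1 and standard deviation 0.2
p <- lnorm_pars(mean = 1, sd = 0.2)
slopes <- rlnorm(20, meanlog = p[["meanlog"]], sdlog = p[["sdlog"]])
```

Because `lnorm_pars()` is the exact inverse of `lnorm_moments()`, feeding its output back into `lnorm_moments()` recovers the requested mean and standard deviation up to floating-point rounding.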
@@ -175,6 +174,19 @@ If one wants to generate responses from a mixture of different *populations* (gr

One may also generate responses to different *sub-tests* (collections of items) independently and then bind them, for example using `cbind()`, but in such a case the same matrix of generated latent-trait values should (typically) be used while generating responses to each *sub-test*.
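The point about reusing one matrix of latent-trait values can be illustrated with a toy generator. Below, `simulate_2pl()` is a hypothetical stand-in for a real response generator: it draws dichotomous 2PL responses, not the polytomous responses *rstyles* produces. What matters is that both *sub-tests* are generated from the same `theta` before being combined with `cbind()`.

```r
set.seed(1)  # arbitrary seed
nObs <- 500L
theta <- matrix(rnorm(nObs), ncol = 1)  # one shared latent trait

# Hypothetical stand-in generator: dichotomous 2PL items with
# P(X = 1) = plogis(a * (theta - b))
simulate_2pl <- function(theta, a, b) {
  # outer() gives a_j * theta_i; sweep() subtracts a_j * b_j from column j
  p <- plogis(sweep(outer(theta[, 1], a), 2, a * b))
  matrix(rbinom(length(p), size = 1L, prob = p), nrow = nrow(p))
}

# Two sub-tests generated independently, but from the same theta
subtestA <- simulate_2pl(theta, a = c(1.0, 1.2, 0.8), b = c(-1, 0, 1))
subtestB <- simulate_2pl(theta, a = c(1.5, 0.9), b = c(0.5, -0.5))
resp <- cbind(subtestA, subtestB)
```

Drawing a fresh `theta` for `subtestB` would instead simulate two unrelated groups of respondents, so the combined rows would no longer describe the same people.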

+### Speeding up generation of GPCM responses using `simdata()` function from the *mirt* package
+
+Function `simdata()` from the *mirt* package will be much faster than `generate_test_responses()` at generating GPCM responses, especially with a large number of items or observations (or both). Luckily (since version 0.4.0 of *rstyles*), matrices of parameters generated by `generate_slopes()` and `generate_intercepts()` are fully compatible with the `simdata()` function.
+
+For example, in the listing included in the *Partially-compensatory random-thresholds GPCM including "middle", "extreme" and "acquiescence" response styles* section above, you can substitute the call to `generate_test_responses()` with the following call to `simdata()`:
+
+```{r}
+respSimdata <- simdata(slopes, intercepts, N = nrow(theta), itemtype = "gpcm",
+Theta = theta, gpcm_mats = rep(list(sM), nrow(slopes)))
+```
+
+However, remember that `simdata()` always returns responses as numbers starting from 0 (the first category), irrespective of the scoring matrix you provide to it.
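Because `simdata()` codes categories as 0, 1, 2 and so on, its output may need recoding before being compared with results of `generate_test_responses()`. A minimal base-R sketch, with made-up responses standing in for `simdata()` output; `categories` is a hypothetical vector of labels for five ordered categories (taking such labels from the scoring matrix would be a further assumption not checked here).

```r
# Made-up 0-based responses standing in for simdata() output (2 respondents, 3 items)
respSimdata <- matrix(c(0L, 4L, 2L, 1L, 3L, 0L), nrow = 2)
# Simple shift when categories should be coded 1, 2, ..., 5
resp1 <- respSimdata + 1L
# General recode through a lookup vector of category labels (hypothetical labels)
categories <- c("SD", "D", "N", "A", "SA")
respLabels <- matrix(categories[respSimdata + 1L], nrow = nrow(respSimdata),
                     dimnames = dimnames(respSimdata))
```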

# To do

- functions to compute non-GPCM (2PLM) models at nodes of *sequentially* responded items;