Add trtAssignment as a "distribution" to defData to ease flow? #69

kgoldfeld · 2020-10-12T21:35:28Z

Currently, the treatment assignment process using trtAssign breaks the flow of creating a data set. Usually there is an outcome variable that is a function of the treatment assignment, so we need to add a column to the table after the treatment assignment is made.

# Data definitions (requires two definitions for the same data set)

d1 <- defData(varname = "g", formula = ".2;.4;.4", dist = "categorical")
d1 <- defData(d1, varname = "x1", formula = "0;10", dist = "uniform")
d1 <- defData(d1, varname = "x2", formula = "2+.5*2", variance = 3, dist = "normal")

d2 <- defDataAdd(varname = "y", formula = "2 + x2 * 3 + rx*2", variance = 5, dist = "normal")

# Data generation (requires three function calls)

dd <- genData(500, d1)
dd <- trtAssign(dd, strata = "g", grpName = "rx")
dd <- addColumns(d2, dd)

What if we added a "trtAssign" distribution to the data def table so that the treatment assignment can be part of a single data generation process? It would look like this:

# Only one data definition

d <- defData(varname = "g", formula = ".2;.4;.4", dist = "categorical")
d <- defData(d, varname = "x1", formula = "0;10", dist = "uniform")
d <- defData(d, varname = "x2", formula = "2+.5*2", variance = 3, dist = "normal")
d <- defData(d, varname = "rx", formula = "1;1", variance = "g", dist = "trtAssign")
d <- defData(d, varname = "y", formula = "2 + x2 * 3 + rx*2", variance = 5, dist = "normal")

# Only one function call to generate the data

dd <- genData(500, d)

The formula for trtAssign represents the treatment assignment ratio. It defaults to "1;1" but could be of any length, so "1;1;1;2" would give four groups. The variance parameter represents the stratification. Multiple levels of stratification would be represented as "a;b;c", where a, b, and c are variable names (these really need to be categorical variables or factors). The functionality is exactly as it is in the function trtAssign.
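To make the ratio and stratification semantics concrete, here is a minimal base R sketch of balanced assignment within strata. The helper `assign_by_ratio` is hypothetical and is not simstudy code; it only illustrates how a ratio string such as "1;1;1;2" could be turned into group labels within each stratum. For simplicity it labels groups 1..k, which may differ from how trtAssign labels a two-arm assignment.

```r
# Hypothetical helper (not part of simstudy): balanced assignment within strata.
# `ratio` is a string such as "1;1" or "1;1;1;2", mirroring the proposed formula.
assign_by_ratio <- function(strata, ratio = "1;1") {
  r <- as.numeric(strsplit(ratio, ";", fixed = TRUE)[[1]])
  grp <- integer(length(strata))
  for (idx in split(seq_along(strata), strata)) {
    # Repeat the ratio pattern to cover the stratum, then shuffle it.
    pattern <- rep(rep(seq_along(r), times = r), length.out = length(idx))
    grp[idx] <- pattern[sample.int(length(idx))]
  }
  grp
}

set.seed(1)
# Three strata with probabilities .2, .4, .4, as in the categorical "g" above.
g <- sample(1:3, 500, replace = TRUE, prob = c(.2, .4, .4))
rx <- assign_by_ratio(g, "1;1;1;2")
table(g, rx)  # within each stratum, group 4 is roughly twice the size of the others
```

Assigning within each stratum separately is what keeps the ratio approximately intact in every level of g, rather than only in the overall sample.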

assignUser · 2020-10-13T00:55:07Z

I see what you mean, makes sense. I will likely have time to look it over at the end of the week.

kgoldfeld · 2020-10-13T01:16:58Z

Should be able to use trtAssign code, or some of it. I guess we would keep trtAssign as well.

assignUser · 2020-10-30T14:35:14Z

This just sparked an idea... we could possibly rework the data definitions / "dist" column to contain not only distributions but more general "modifications" that are applied to the data: dists, trtAssign, addMissing, user-defined functions (#71), ...
That way the complete data workflow would be contained in the definition and clearly readable, which, as I understand it, is a high priority for you.

This would definitely be quite some work but could be of value, and as we are considering breaking changes in several different places, it might be a good time to implement such sweeping changes (maybe as simstudy 1.0.0?)
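A rough base R sketch of the "modifications" idea might look like the following. Everything here is hypothetical (the `defs` list, the `kind` field, and `generate` are illustrative names, not simstudy API); it only shows definition rows being applied in order so that later rows can depend on earlier columns.

```r
# Hypothetical sketch, not simstudy code: each definition row names a "kind"
# of modification and a function that adds one column to the data.
defs <- list(
  list(varname = "x2", kind = "dist",
       fn = function(d) rnorm(nrow(d), mean = 3, sd = sqrt(3))),
  list(varname = "rx", kind = "trtAssign",
       fn = function(d) sample(rep(0:1, length.out = nrow(d)))),
  list(varname = "y", kind = "dist",
       fn = function(d) rnorm(nrow(d), 2 + d$x2 * 3 + d$rx * 2, sqrt(5)))
)

# Apply the rows in order, so later rows can use earlier columns.
generate <- function(n, defs) {
  d <- data.frame(id = seq_len(n))
  for (def in defs) d[[def$varname]] <- def$fn(d)
  d
}

set.seed(2)
dd <- generate(500, defs)
head(dd)
```

Because each row is just a function of the data so far, a treatment assignment step and an outcome that depends on it can sit in the same definition.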

kgoldfeld · 2020-10-30T14:55:49Z

I see the appeal of that, though I do have to say I like the current flow of keeping the missing data process different from the underlying (true) data generation process. They are two different processes, so I think I would like to keep them separate. As you know, though, I am very keen on being able to define the randomized treatment assignment in the data definition - that to me is a key part of the underlying data generation process. And the truncation obviously.

Maybe excluding the missing data from this will simplify things, so that it is not as big a lift once the new dataDef arguments are in place.

assignUser · 2021-07-08T20:27:48Z

#71 and #75 both seem relevant!

kgoldfeld added the "feature" label (feature request or enhancement) Oct 12, 2020

This was referenced Oct 30, 2021

Release simstudy 0.3.0 #109 (Closed)

make trtAssign/Obesere available as distributions #114 (Merged)

kgoldfeld closed this as completed in #114 Nov 3, 2021
