The Union module combines the given tables into one table while keeping entry ids unique. Union is a local module: like DataTransform, it can be run on the Host or Guest side, and running it requires no interaction with other parties.
Union currently supports joining by entry id only. When the tables contain data instances, their headers, id (idx) column names, and label column names (if a label exists) must match.
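The matching requirement can be illustrated with a small pre-check on table schemas. The sketch below is a hypothetical helper, not part of FATE: `TableMeta` and `check_union_compatible` are names introduced here for illustration, and comparing headers as ordered lists is an assumption about how strictly "match" is enforced.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TableMeta:
    """Hypothetical schema summary of one Union input (not a FATE class)."""
    name: str
    header: List[str]          # feature column names, in order
    id_name: str               # name of the id (idx) column
    label_name: Optional[str]  # None when the table has no label column


def check_union_compatible(tables: List[TableMeta]) -> None:
    """Raise ValueError if any table's schema differs from the first one."""
    if not tables:
        raise ValueError("Union needs at least one input table")
    first = tables[0]
    for t in tables[1:]:
        # Assumption: headers must match exactly, including column order.
        if t.header != first.header:
            raise ValueError(f"{t.name}: header {t.header} != {first.header}")
        if t.id_name != first.id_name:
            raise ValueError(f"{t.name}: id column {t.id_name!r} != {first.id_name!r}")
        # Covers both "label in only one table" and "different label names".
        if t.label_name != first.label_name:
            raise ValueError(f"{t.name}: label column {t.label_name!r} != {first.label_name!r}")


if __name__ == "__main__":
    a = TableMeta("data_transform_0", ["x0", "x1"], "id", "y")
    b = TableMeta("data_transform_1", ["x0", "x1"], "id", "y")
    check_union_compatible([a, b])  # passes: schemas match
    c = TableMeta("data_transform_2", ["x0", "x1"], "id", None)
    try:
        check_union_compatible([a, c])
    except ValueError as err:
        print(err)  # reports the label column mismatch
```

Failing fast on the first mismatch keeps the check simple; a real job would surface this kind of error before any rows are combined.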
When an id appears more than once across the tables being joined, users can keep all duplicated instances by setting the parameter keep_duplicate to True. Otherwise, only the entry from the id's first appearance is kept in the final combined table. Note that the order in which tables are fed into the Union module depends on the job configuration, as shown below:
with FATE-Pipeline:
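from pipeline.component import Union
from pipeline.interface import Data

# `pipeline` and data_transform_0/1/2 are assumed to be defined earlier in the job script
union_0 = Union(name="union_0")
pipeline.add_component(union_0, data=Data(data=[data_transform_0.output.data,
                                                data_transform_1.output.data,
                                                data_transform_2.output.data]))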
with DSL v2:
{
    "union_0": {
        "module": "Union",
        "input": {
            "data": {
                "data": ["data_transform_0.data", "data_transform_1.data", "data_transform_2.data"]
            }
        },
        "output": {
            "data": ["data"]
        }
    }
}
In both configurations, upstream tables enter the Union module in this order: data_transform_0.data, data_transform_1.data, data_transform_2.data.
If id 42 exists in both data_transform_0.data and data_transform_1.data, and:
- `keep_duplicate` is set to False: the entry from data_transform_0.data is kept in the final result, with its id unchanged.
- `keep_duplicate` is set to True: the entries from data_transform_0.data and data_transform_1.data are both kept; the id from data_transform_0.data becomes 42_data_transform_0, and the id from data_transform_1.data becomes 42_data_transform_1.
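The duplicate-handling rules above can be sketched on plain Python dicts. This is a minimal illustration under stated assumptions, not FATE's implementation: `union_tables` is a hypothetical name, each input is modeled as a `(table_name, {id: row})` pair processed in upstream order, and ids that appear in only one table are assumed to keep their original value even when `keep_duplicate` is True.

```python
from typing import Dict, List, Tuple

Row = Tuple[float, ...]  # hypothetical stand-in for one data instance


def union_tables(tables: List[Tuple[str, Dict[str, Row]]],
                 keep_duplicate: bool = False) -> Dict[str, Row]:
    """Combine (table_name, {id: row}) pairs in the given order.

    Assumption: mimics the documented behavior only; not FATE's code.
    """
    # First pass: count how many input tables contain each id.
    counts: Dict[str, int] = {}
    for _, table in tables:
        for key in table:
            counts[key] = counts.get(key, 0) + 1

    # Second pass: build the combined table in upstream order.
    result: Dict[str, Row] = {}
    for name, table in tables:
        for key, row in table.items():
            if counts[key] == 1:
                result[key] = row              # unique id: keep as-is
            elif keep_duplicate:
                result[f"{key}_{name}"] = row  # duplicated id: suffix with table name
            elif key not in result:
                result[key] = row              # first appearance wins
    return result


if __name__ == "__main__":
    t0 = ("data_transform_0", {"42": (1.0,), "7": (2.0,)})
    t1 = ("data_transform_1", {"42": (3.0,), "8": (4.0,)})
    print(union_tables([t0, t1]))
    # {'42': (1.0,), '7': (2.0,), '8': (4.0,)}
    print(union_tables([t0, t1], keep_duplicate=True))
    # {'42_data_transform_0': (1.0,), '7': (2.0,), '42_data_transform_1': (3.0,), '8': (4.0,)}
```

Counting ids in a first pass is what lets the first occurrence of id 42 be renamed too: without knowing that 42 reappears later, the entry from data_transform_0 would already have been stored under its bare id.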