Design Decisions
We can use the convention employed by most TF models and assume that tensors passed through the SPN graph have at least two dimensions:
- D0: instances in a minibatch
- D1-DN: instance data
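For instance, values of all variables for a whole minibatch could be fed through a single tensor whose first dimension is the batch size (a minimal sketch in TF 1.x style; the names are illustrative and not part of the library):

```python
import tensorflow as tf

num_vars = 4  # number of variables per instance (illustrative)

# D0: minibatch size (None, so it can vary between runs)
# D1: instance data - the values of the num_vars variables
ivs_feed = tf.placeholder(tf.int32, shape=[None, num_vars], name="ivs_feed")
```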
It is often the case that we want to specify node-specific aspects of learning/inference, e.g.:
- specify that some nodes should be learned using soft GD and some using hard GD
- specify that weights of only some nodes should be learned
- etc.
We can use an idea similar to the one TF employs for specifying which variables should be included during training (the trainable flag set at variable creation). Nodes in the SPN graph should have flags that specify such options. These flags will be read and used by the algorithm that generates the TF graph from the SPN graph.
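A rough sketch of such per-node flags, assuming a hypothetical node interface (the names are not the actual API):

```python
class Node:
    """Hypothetical SPN node base class carrying per-node learning options."""

    def __init__(self, *inputs, trainable=True, hard_gd=False):
        self.inputs = inputs
        self.trainable = trainable  # should this node's weights be learned at all?
        self.hard_gd = hard_gd      # hard (MPE-path) vs. soft gradient descent

def trainable_nodes(root):
    """Example of how the TF-graph-generating algorithm could read the flags."""
    stack, seen = [root], set()
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        stack.extend(node.inputs)
        if node.trainable:
            yield node
```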
Should we generate the TF graph when the SPN graph is built, or defer that process to methods invoked later on the previously created SPN graph?
A TF graph cannot be modified once built, while being able to adjust the SPN graph or build it from the top down might be a nice feature. Moreover, many algorithms can be applied to an SPN graph, and the structure of the model should be independent of the algorithms used to learn/infer in the model. At the same time, adding all possible algorithms in the form of a TF graph would unnecessarily populate the graph.
An SPN node should not generate any TF graph elements that depend on other SPN graph nodes, but it might add elements that must be present for every algorithm operating on the node. We can create variables/placeholders for storing parameters at the time of SPN node creation, but should not create operations that depend on the presence of other SPN nodes. We assume that the SPN graph is created first, and then certain methods are invoked on that graph to generate the TF graph for particular computations.
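A minimal sketch of this split between node creation and deferred TF graph generation (TF 1.x style; the generate_value method name is an assumption for illustration):

```python
import tensorflow as tf

class Sum:
    """Hypothetical sum node: parameters created eagerly, dependent ops deferred."""

    def __init__(self, *inputs):
        self.inputs = inputs
        # Created at node-creation time: depends only on this node.
        self.weights = tf.Variable(tf.ones([len(inputs)]), name="weights")

    def generate_value(self):
        # Deferred: depends on the TF ops of the input nodes.
        child_values = tf.stack([inp.generate_value() for inp in self.inputs], axis=1)
        w = self.weights / tf.reduce_sum(self.weights)
        return tf.reduce_sum(child_values * w, axis=1)
```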
If we assume that the SPN graph is created independently from the TF graph, how can we include custom operations which are not defined as SPN nodes provided by the library?
Option 1: We can make it easy to create custom SPN nodes by implementing a node interface. The implementation would contain the custom operations defined in terms of TF ops.
Pros:
- Simple, easy to understand, solution
- Encapsulates custom operations
Cons:
- Additional code is needed to wrap the TF ops into a node
Option 2: We can allow using tensors directly as inputs to the SPN nodes.
Pros:
- Slightly less code
Cons:
- Nodes cannot be used as input to the custom ops, since the TF ops of nodes are not yet generated at the SPN graph creation time
We use Option 1.
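With this choice, a custom operation could be wrapped as shown below (the node interface mirrors the hypothetical one sketched earlier and is not the library's actual API):

```python
import tensorflow as tf

class CustomNode:
    """Hypothetical custom node wrapping arbitrary TF ops behind the node interface."""

    def __init__(self, *inputs):
        self.inputs = inputs

    def generate_value(self):
        # The custom computation is expressed in TF ops here; input nodes
        # generate their own TF ops on demand, so ordering is handled for us.
        child_values = [inp.generate_value() for inp in self.inputs]
        return tf.exp(tf.add_n(child_values))  # arbitrary example computation
```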
It is often the case that an SPN consists of layers of identical operations (multiple IVs, a layer of sums or products). How should we represent such layers?
Option 1: Divide the layers into separate SPN nodes in the SPN graph.
Pros:
- TensorBoard will visualize each node in the graph independently.
Cons:
- This will lead to a large graph and many TF operations.
- This might impact TF performance (although this is not confirmed).
- In the case of IVs, it will be cumbersome to feed that many separate placeholders.
Option 2: Provide separate SPN node classes and SPN layer classes.
This leads to the same problems as Option 1.
Option 3: Make SPN node classes generate multiple outputs corresponding to multiple operations of the same type.
Pros:
- SPN nodes can make optimizations while processing multiple operations of the same type at the same time (e.g. use matrix multiplication to perform multiple sums)
- Input data can be modeled using a single placeholder containing values of multiple variables
Cons:
- The structure of the SPN cannot be fully visualized in TensorBoard
We use Option 3.
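As an illustration of the optimization mentioned above, a sums layer modeled as a single node could compute all of its sums with one matrix multiplication (a sketch in TF 1.x style, not the actual libspn implementation):

```python
import tensorflow as tf

num_inputs, num_sums = 10, 5

# Values of all inputs for the whole minibatch: [batch, num_inputs]
child_values = tf.placeholder(tf.float32, shape=[None, num_inputs])

# One weight column per sum; a single matmul computes all sums at once.
weights = tf.Variable(tf.random_uniform([num_inputs, num_sums]))
normalized = weights / tf.reduce_sum(weights, axis=0, keepdims=True)
sums_values = tf.matmul(child_values, normalized)  # [batch, num_sums]
```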
As noted above, we do not want to generate TF operations at the time when SPN nodes are created. How should we generate these operations so that we can:
- compute multiple mini-batches in parallel on different devices
- perform different types of inferences on the same minibatch on different devices
- perform gradient computations on multiple GPU devices in parallel, while performing weight updates on the CPU
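One common TF 1.x pattern that satisfies these requirements is to replicate the computation per device under tf.device scopes while keeping the variables (and their updates) on the CPU; a generic sketch, not necessarily the mechanism adopted here:

```python
import tensorflow as tf

with tf.device("/cpu:0"):
    # Shared weights live on the CPU, where the updates are applied.
    weights = tf.Variable(tf.ones([10, 5]), name="weights")

# One minibatch placeholder per GPU.
batches = [tf.placeholder(tf.float32, shape=[None, 10]) for _ in range(2)]

tower_grads = []
for i, batch in enumerate(batches):
    with tf.device("/gpu:%d" % i):
        # Each GPU computes values and gradients for its own minibatch.
        values = tf.matmul(batch, weights)
        loss = tf.reduce_mean(tf.square(values))  # arbitrary example objective
        tower_grads.append(tf.gradients(loss, [weights])[0])

with tf.device("/cpu:0"):
    # Average per-GPU gradients and update the CPU-resident weights.
    update = weights.assign_sub(0.01 * tf.add_n(tower_grads) / len(tower_grads))
```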