Design Decisions
We can use the convention employed by most TF models and assume that tensors passed through the SPN graph have at least two dimensions:
- D0: instances in a minibatch
- D1-DN: instance data
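For example, a minimal sketch of this convention in TF1-style graph mode (the placeholder name `ivs_feed` and the number of variables are illustrative, not part of the library):

```python
import tensorflow as tf

tf.compat.v1.disable_eager_execution()  # TF1-style graph mode

num_vars = 5  # illustrative number of variables per instance
# D0 (None) indexes instances in the minibatch; D1 holds the instance data.
ivs_feed = tf.compat.v1.placeholder(tf.int32, shape=[None, num_vars],
                                    name="ivs_feed")
```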
It is often the case that we want to specify node-specific aspects of learning/inference, e.g.:
- specify that some nodes should be learned using soft GD and some using hard GD
- specify that weights of only some nodes should be learned
- etc.
We can use an idea similar to the one TF employs when specifying which variables should be included during training (the `trainable` flag specified at variable creation time). Nodes in the SPN graph should have flags that specify such options. These flags will be read and used by the algorithm generating the TF graph from the SPN graph.
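A minimal sketch of such per-node flags (the class and flag names are illustrative, not the library API):

```python
class SumNode:
    """Hypothetical SPN sum node carrying per-node learning flags."""

    def __init__(self, *children, trainable=True, hard_gd=False):
        self.children = list(children)
        # The flags are only stored here; they are read later by the
        # algorithm that generates the TF graph from the SPN graph.
        self.trainable = trainable  # include this node's weights in learning
        self.hard_gd = hard_gd      # learn this node with hard GD rather than soft GD
```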
Should we generate the TF graph at the time the SPN graph is built, or defer that process to methods invoked later on the already-built SPN graph?
The TF graph cannot be modified once created, while being able to adjust the SPN graph or build it from the top down might be a nice feature. Moreover, many algorithms can be applied to an SPN graph, and the structure of the model should be independent of the algorithms used to learn/infer in the model. At the same time, adding all possible algorithms in the form of a TF graph would unnecessarily populate the graph.
The SPN graph should not generate any TF graph elements that depend on other SPN graph nodes, but it might add elements that must be present for every algorithm operating on the node. We can create variables/placeholders for storing parameters at the time of SPN node creation, but should not create operations that depend on the presence of other SPN nodes. We assume that the SPN graph is created first, and then certain methods are invoked on that graph to generate the TF graph for particular computations.
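A sketch of this split, with illustrative names (not the library API): the weights variable is created together with the node, while the value op is only created when an algorithm later traverses the SPN graph:

```python
import tensorflow as tf

class Sum:
    """Hypothetical sum node: parameters are created eagerly, ops lazily."""

    def __init__(self, *children):
        self.children = list(children)
        # Created at SPN node creation time; depends on no other SPN node.
        self.weights = tf.Variable(
            tf.fill([len(children)], 1.0 / len(children)), name="weights")

    def generate_value(self, child_values):
        # Invoked later by an algorithm generating the TF graph; only here do
        # ops that depend on other SPN nodes get created.
        return tf.reduce_sum(
            tf.stack(child_values, axis=-1) * self.weights, axis=-1)
```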
If we assume that the SPN graph is created independently from the TF graph, how can we include custom operations which are not defined as SPN nodes provided by the library?
Option 1: We can make it easy to create custom SPN nodes by implementing a node interface. The implementation would contain the custom operations defined in terms of TF ops.
Pros:
- Simple, easy to understand, solution
- Encapsulates custom operations
Cons:
- Additional code is needed to wrap the TF ops into a node
Option 2: We can allow using tensors directly as inputs to the SPN nodes.
Pros:
- Slightly less code
Cons:
- Nodes cannot be used as input to the custom ops, since the TF ops of nodes are not yet generated at the SPN graph creation time
We use Option 1.
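For instance, a custom operation could be wrapped as a node by implementing a minimal node interface; everything below (the base class and the custom `Max` node) is a hypothetical sketch, not the library API:

```python
import tensorflow as tf

class Node:
    """Hypothetical minimal node interface."""

    def __init__(self, *children):
        self.children = list(children)

    def generate_value(self, child_values):
        raise NotImplementedError

class Max(Node):
    """Custom operation defined in terms of TF ops, wrapped as a node so it
    can be connected to other SPN nodes like any library-provided node."""

    def generate_value(self, child_values):
        return tf.reduce_max(tf.stack(child_values, axis=-1), axis=-1)
```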
It is often the case that an SPN consists of layers of identical operations (multiple IVs, a layer of sums or products). How should we represent such layers?
Option 1: Divide the layers into separate SPN nodes in the SPN graph.
Pros:
- TensorBoard will visualize each node in the graph independently.
Cons:
- This will lead to a large graph and many TF operations.
- This might impact the TF performance (although this is not confirmed).
- In the case of IVs, it will make it cumbersome to feed that many separate placeholders.
Option 2: Provide separate SPN node classes and SPN layer classes.
This leads to the same problems as Option 1.
Option 3: Make SPN node classes generate multiple outputs corresponding to multiple operations of the same type.
Pros:
- SPN nodes can make optimizations while processing multiple operations of the same type at the same time (e.g. use matrix multiplication to perform multiple sums, as sketched below)
- Input data can be modeled using a single placeholder containing values of multiple variables
Cons:
- The structure of the SPN cannot be fully visualized in TensorBoard
We use Option 3.
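As an illustration of the optimization mentioned in Option 3, a single node implementing several sums can compute all of them with one matrix multiplication (TF1-style graph mode; sizes and names are illustrative):

```python
import tensorflow as tf

tf.compat.v1.disable_eager_execution()  # TF1-style graph mode

num_inputs, num_sums = 8, 4
# One placeholder holds the input values for the whole minibatch.
values = tf.compat.v1.placeholder(tf.float32, shape=[None, num_inputs])
# One weight matrix holds the weights of all sums implemented by the node.
weights = tf.Variable(tf.fill([num_inputs, num_sums], 1.0 / num_inputs))
# A single matmul computes all the sums at once: shape [batch, num_sums].
sums = tf.matmul(values, weights)
```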
As noted above, we do not want to generate TF operations at the time the SPN nodes are created. How should we generate these operations so that we can:
- compute multiple mini-batches in parallel on different devices
- perform different types of inferences on the same minibatch on different devices
- perform gradient computations on multiple GPU devices in parallel while performing weight updates on the CPU
TF can automatically place operations that support GPU computations on a GPU, but it does not automatically employ multiple GPU devices. Therefore, the device on which an op will run has to be specified manually, at TF op creation time. In order to use the same mechanism with SPN computations, we need to know when exactly TF ops are created by SPN algorithms and be able to place them in contexts specifying devices.
Variables and placeholders that are part of the SPN node definition can be generated at SPN node creation time. Therefore, in order to specify the device on which the variables reside, we can simply create SPN nodes inside a TF device context.
Inference operations should be generated every time a method is executed on an SPN graph. That method can be a part of the SPN node class or belong to a separate class implementing a specific algorithm. This will make it clear when particular operations are created and make it possible to place the call to such a method inside a device context.
Additionally, it will make it possible to generate two sets of TF operations performing the same type of computation (e.g. inference on a minibatch) employing the same weights. These computations can then be used to process multiple minibatches on different devices with the same weights.
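A sketch of the intended usage in plain TF (the sum-node value is written out by hand here; the node/tensor names are illustrative): variables land on the device active when the node is created, while the value ops land on the device active when they are generated, so the same weights can serve two minibatches on two devices:

```python
import tensorflow as tf

tf.compat.v1.disable_eager_execution()  # TF1-style graph mode

# SPN-node-creation time: variables/placeholders go on the device active here.
with tf.device("/cpu:0"):
    weights = tf.Variable([0.25, 0.25, 0.25, 0.25], name="Sum1_weights")
    batch_a = tf.compat.v1.placeholder(tf.float32, [None, 4], name="batch_a")
    batch_b = tf.compat.v1.placeholder(tf.float32, [None, 4], name="batch_b")

# TF-graph-generation time: the same type of computation is generated twice,
# on different devices, reusing the same weights.
with tf.device("/gpu:0"):
    value_a = tf.reduce_sum(batch_a * weights, axis=-1)
with tf.device("/gpu:1"):
    value_b = tf.reduce_sum(batch_b * weights, axis=-1)
```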
More complex algorithms can be split into sub-algorithms, the same way `GradientDescentOptimizer` splits gradient calculation and application. This will make it possible to place parts of the computational graph of the algorithm on different devices if necessary.
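For reference, the split in `GradientDescentOptimizer` looks like this (standard TF1 API; the toy loss is only a stand-in for an SPN learning objective):

```python
import tensorflow as tf

tf.compat.v1.disable_eager_execution()  # TF1-style graph mode

w = tf.Variable([0.5, 0.5])
loss = tf.reduce_sum(tf.square(w))   # stand-in for an SPN learning objective

opt = tf.compat.v1.train.GradientDescentOptimizer(learning_rate=0.1)

# Gradient computation can be placed on a GPU device...
with tf.device("/gpu:0"):
    grads_and_vars = opt.compute_gradients(loss)

# ...while the weight update is placed on the CPU.
with tf.device("/cpu:0"):
    train_op = opt.apply_gradients(grads_and_vars)
```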
Nodes should use name scopes to group operations inside a node. However, we might want to also group multiple nodes into a name scope.
We can place nodes in name scopes when the nodes are created. This way, variables will be created within the node's scope. We can then re-create that same scope within the scope of an inference/learning computation by looking up the scopes of the variables.
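A sketch of re-entering a node's scope (TF1-style graph mode; node and tensor names are illustrative):

```python
import tensorflow as tf

tf.compat.v1.disable_eager_execution()  # TF1-style graph mode

inputs = tf.compat.v1.placeholder(tf.float32, [None, 2], name="inputs")

# Node-creation time: the node's variables are created inside its name scope.
with tf.name_scope("Sum1") as scope:
    weights = tf.Variable([0.5, 0.5], name="weights")

# TF-graph-generation time: re-entering the recorded scope (here via the
# captured scope string) groups the generated ops with the node's variables.
with tf.name_scope(scope):
    value = tf.reduce_sum(inputs * weights, axis=-1)
```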
It is sometimes convenient to maintain multiple TF graphs. How can we specify the graph in which the SPN nodes should be created?
All operations and variables that are used together must be placed in a single TF graph.
When the SPN graph is created, we can place nodes in the context of a specific graph. When SPN nodes are added as children of other SPN nodes, it should be verified that they exist in the same TF graph. Then, when adding inference/learning operations, we automatically add them to the TF graph in which the SPN graph exists.
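A minimal sketch in plain TF (the placeholder stands in for whatever a node creates): everything created inside a graph's context, including what SPN nodes create, ends up in that single TF graph:

```python
import tensorflow as tf

spn_tf_graph = tf.Graph()

# Nodes created inside this context place their variables/placeholders (and,
# later, the inference/learning ops generated for them) in this one graph.
with spn_tf_graph.as_default():
    ivs_feed = tf.compat.v1.placeholder(tf.int32, [None, 5], name="ivs_feed")
    assert ivs_feed.graph is spn_tf_graph
```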
Should the weights used by an operation be stored in a separate SPN node or inside the node implementing the operation?
Option 1: There is a special node in the SPN graph containing the weights, which is connected to the node performing the operation.
Pros:
- Better encapsulation, weights-related computations are stored in a separate class
- Sharing weights is less constrained and can be done by connecting a single weights node to multiple operation nodes
Cons:
- For operations using weights, two nodes must always be created: one for the operation and one for the weights.
Option 2: Weights are a part of the node implementing the operation that uses them.
Pros:
- Weights are tightly coupled with an operation and conceptually can be seen as a part of it.
- Fewer nodes to create.
Cons:
- Weight sharing is less flexible and requires that operations sharing weights are implemented in a single node.
- Weight-related computations are mixed with operation-related computations in the same class
Not sure.
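For illustration, the construction-time difference between the two options might look as follows; all classes and parameters are hypothetical, not the library API:

```python
class Weights:
    """Option 1: weights live in a separate SPN node."""

    def __init__(self, num_weights):
        self.num_weights = num_weights

class Sum:
    """Hypothetical sum node that can be handed an external weights node."""

    def __init__(self, *children, weights=None):
        self.children = list(children)
        self.weights = weights  # with Option 2, the node would create these itself

# Option 1: sharing is just connecting one weights node to several sums.
w = Weights(num_weights=2)
s1 = Sum("x1", "x2", weights=w)
s2 = Sum("x3", "x4", weights=w)
```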
Often, particularly in the case of discriminative models, the latent variables that are summed out by the sums of an SPN need to be made explicit. It should be possible to both set and infer their values.
Option 1: A separate IVs node, identical to the one used for IVs corresponding to other variables, is used and connected to the sum node. If the sum node has such a node connected, it uses the values of those variables during inference (if evidence is provided), and the IVs node provides methods for generating TF ops that infer the latent variable values if no evidence or only partial evidence is provided (e.g. MPE inference).
Pros:
- Better encapsulation; IVs-related functionality is kept in IVs nodes
- IV sharing is less constrained and can be done by connecting a single IV node to multiple sums
Cons:
- None?
Option 2: The IVs for the hidden variables are a part of the sum node, and the interface of the sum node is extended.
Pros:
- None?
Cons:
- More difficult sharing of IVs between sum nodes in the same SPN layer. Sharing would have to happen within a sum node implementing multiple sums. Such behavior can still be achieved even if we decide to place the IVs outside the sum node.
We decide to use Option 1; if a node implementing multiple sums using the same IVs is created, it can still be connected to an external IVs node.
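A sketch of the chosen option from the user's perspective; the classes and their parameters are hypothetical:

```python
class IVs:
    """Hypothetical indicator-variable node, also usable for latent variables."""

    def __init__(self, num_vars, num_vals):
        self.num_vars, self.num_vals = num_vars, num_vals

class Sum:
    """Hypothetical sum node with an optional explicit latent-variable node."""

    def __init__(self, *children, ivs=None):
        self.children = list(children)
        # If connected, the IVs values are used as evidence during inference;
        # otherwise the latent variable can itself be inferred (e.g. by MPE).
        self.ivs = ivs

x = IVs(num_vars=2, num_vals=2)        # IVs for the observed variables
latent = IVs(num_vars=1, num_vals=3)   # explicit latent variable of the sum
root = Sum(x, ivs=latent)
```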