Code to reproduce:
import random
import numpy as np
import torch
from torch import optim
from torch.utils.data import DataLoader
from torchvision import transforms, datasets
from cirkit.symbolic.circuit import Circuit
from cirkit.symbolic.layers import CategoricalLayer
from cirkit.templates.region_graph import RandomBinaryTree
from cirkit.templates.circuit_templates._factories import name_to_parameter_factory, name_to_initializer
from cirkit.pipeline import compile

NUM_INPUT_UNITS = 64
NUM_SUM_UNITS = 64
PIXEL_RANGE = 255

# Load the MNIST data set and data loaders
transform = transforms.Compose([
    transforms.ToTensor(),
    # Set pixel values in the [0-255] range
    transforms.Lambda(lambda x: (PIXEL_RANGE * x).long())
])

def define_circuit_from_rg(rg):
    # Here is where overparametrisation comes in
    input_factory = lambda x, y, z: CategoricalLayer(
        scope=x,
        num_categories=PIXEL_RANGE + 1,
        num_channels=1,  # These are grayscale images
        num_output_units=NUM_INPUT_UNITS  # Overparametrisation
    )
    # =========== With the init below, the model trains fine ==========
    # sum_weight_init = name_to_initializer('normal')
    # sum_weight_params = name_to_parameter_factory('softmax', initializer=sum_weight_init)
    # =========== but if no init, as below, we get a nan loss =========
    sum_weight_params = None  # This line leads to nan loss
    circuit = Circuit.from_region_graph(
        rg,
        input_factory=input_factory,
        sum_weight_factory=sum_weight_params,
        num_sum_units=NUM_SUM_UNITS,
        sum_product='cp'
    )
    return circuit

def train_circuit(cc):
    # Set some seeds
    random.seed(42)
    np.random.seed(42)
    torch.manual_seed(42)
    # torch.cuda.manual_seed(42)
    device = torch.device('cuda')  # Set the torch device to use
    circuit = compile(cc)          # Compile the circuit
    circuit = circuit.to(device)   # Move the circuit to the chosen device
    num_epochs = 5
    step_idx = 0
    running_loss = 0.0
    # Initialize a torch optimizer of your choice, e.g., Adam,
    # by passing the parameters of the circuit
    optimizer = optim.Adam(circuit.parameters(), lr=0.01)
    for epoch_idx in range(num_epochs):
        for i, (batch, _) in enumerate(train_dataloader):
            # The circuit expects an input of shape (batch_dim, num_channels, num_variables),
            # so we unsqueeze a dimension for the channel
            BS = batch.shape[0]
            batch = batch.view(BS, 1, -1).to(device)
            # Compute the log-likelihoods of the batch by evaluating the circuit
            log_likelihoods = circuit(batch)
            # We take the negated average log-likelihood as the loss
            loss = -torch.mean(log_likelihoods)
            loss.backward()
            # Update the parameters of the circuit, as with any other PyTorch model
            optimizer.step()
            optimizer.zero_grad()
            running_loss += loss.detach() * len(batch)
            step_idx += 1
            if step_idx % 100 == 0:
                print(f"Step {step_idx}: Average NLL: {running_loss / (100 * len(batch)):.3f}")
                running_loss = 0.0

data_train = datasets.MNIST('datasets', train=True, download=True, transform=transform)
train_dataloader = DataLoader(data_train, shuffle=True, batch_size=256)
# We can also specify depth and number of repetitions; depth=None means maximum possible
rnd = RandomBinaryTree(28 * 28, depth=None, num_repetitions=1)
circuit = define_circuit_from_rg(rnd)
train_circuit(circuit)
In the above code, when the sum weight parameterisation is not specified, training produces a nan loss.
This may be confusing for somebody not familiar with the internals of the library: is there a way to avoid this?
sum_weight_params = None  # This line leads to nan loss
circuit = Circuit.from_region_graph(
    rg,
    input_factory=input_factory,
    sum_weight_factory=sum_weight_params,
    num_sum_units=NUM_SUM_UNITS,
    sum_product='cp'
)
python example.py
Step 100: Average NLL: nan
Step 200: Average NLL: nan
Step 300: Average NLL: nan
This is due to the sum weights being initialised from a Normal distribution by default: in "common" circuits they are expected to be positive, and negative values generate nan in the log-sum-exp.
However, we also have many projects using negative weights (with the sum-product or complex-lse-sum semiring), so it makes sense to use a Normal init.
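To make the failure mode concrete, here is a small illustrative sketch in plain PyTorch (not cirkit's actual layer code): in the log semiring a sum unit evaluates something like log(sum_i w_i * exp(x_i)), and once an effective weight w_i can be negative, the argument of the log can itself become negative, which yields nan.

import torch

child_log_values = torch.tensor([-3.0, -2.0])
w_positive = torch.tensor([0.2, 0.8])   # e.g. softmax-parameterised weights
w_normal = torch.tensor([0.2, -0.9])    # e.g. weights sampled from a Normal init

print(torch.log((w_positive * child_log_values.exp()).sum()))  # finite (about -2.13)
print(torch.log((w_normal * child_log_values.exp()).sum()))    # nan: the weighted sum is negative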
> This may be confusing for somebody not familiar with the internals of the library: is there a way to avoid this?

Considering this, I would agree to change the default init for sum weights.
But in any case, we should properly document the default init for layers and tell users when they should NOT rely on the default.
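Until the default changes, the workaround already hinted at in the commented-out lines of the reproduction script is to pass the sum weight factory explicitly, so that the weights go through a softmax and stay positive. A minimal sketch of define_circuit_from_rg with that change, reusing the imports and constants from the script above:

def define_circuit_from_rg(rg):
    input_factory = lambda x, y, z: CategoricalLayer(
        scope=x, num_categories=PIXEL_RANGE + 1,
        num_channels=1, num_output_units=NUM_INPUT_UNITS,
    )
    # Softmax parameterisation over normally initialised logits keeps the
    # effective sum weights positive, so the log-sum-exp stays finite
    sum_weight_init = name_to_initializer('normal')
    sum_weight_params = name_to_parameter_factory('softmax', initializer=sum_weight_init)
    return Circuit.from_region_graph(
        rg,
        input_factory=input_factory,
        sum_weight_factory=sum_weight_params,
        num_sum_units=NUM_SUM_UNITS,
        sum_product='cp'
    )

With this factory in place, the same training loop runs without nan losses.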