This project explores Bayesian Networks, focusing on the K2 algorithm and its implementation using the R library BNstruct.
Project for Advanced Statistics for Physics Analysis, University of Padua, A.Y. 2023/24
Group members: Paolo Lapo Cerni, Lorenzo Vigorelli, Arman Singh Bains
In this repository, you can find the summary of our research on these topics, organized into three main files:
functions.R
defines the pipelines for the K2 algorithm and the helper functions to study and manipulate the Bayesian NetworksAnalysis.ipynb
is the notebook containing some comparisons of the performance of the algorithm and some usage of the BNstruct libraryBN_Presentation.pdf
contains the slides of the oral presentation we gave to present our work.
Bayesian Networks are probabilistic graphical models that represent a set of variables and their conditional dependencies via a directed acyclic graph (DAG). They are widely used for reasoning under uncertainty, where nodes represent variables, and edges depict the probabilistic dependencies between them.
The K2 algorithm is a popular search-based method for learning the structure of Bayesian Networks from data. It incrementally constructs the network by adding parent nodes to each node, maximizing the likelihood of the observed data under the network structure.
The scoring function has the following shape:
where
-
$\pi_i$ is the set of parents of the node$x_i$ . -
$q_i$ is the number of all possible instantiations of the parents of$x_i$ in the data. -
$r_i$ is the number of all the possible values of the attribute$x_i$ . -
$\alpha_{ijk}$ is the number of cases in the data in which$x_i$ is equal to its$k$ -th value, and the parents in$\pi_i$ are equal to their$j$ -th instantiation. -
$N_{ij}=\sum_k \alpha_{ijk}$ is the number of cases in the data in which the parents in$\pi_i$ are equal to their$j$ -th instantiation
The algorithm is efficient and works well with prior knowledge but requires a predefined node ordering.
BNstruct is an R library designed for learning and inference in Bayesian Networks, particularly when dealing with incomplete and discrete data. It provides various tools to construct, manipulate, and analyze Bayesian Networks, making it an essential resource for researchers and practitioners working in the field. Have a look at the official documentation.
[1] M. Scutari and J. B. Denis, Bayesian Networks, CRC Press, 2022, Taylor and Francis Group
[2] G. F. Cooper and E. Herskovits, A Bayesian Method for the Induction of Probabilistic Networks from Data, Machine Learning 9, (1992) 309
[3] C. Ruiz, Illustration of the K2 Algorithm for learning Bayes Net Structures, http://web.cs.wpi.edu/~cs539/s11/Projects/k2_algorithm.pdf"
[4] A. Franzin et al.,