Bayesian Networks

These are exam preparation notes, subpar in quality and certainly not of divine quality.

See the index with all articles in this series

Bayes Rule

\[P(A|B) = \frac{P(B|A)P(A)}{P(B)}\]

Inference

Bayesian Interference is about inferring probabilities from prior probabilities.

Given a set of prior events, a bayesian network estimates the probability for another event.

Justification of using heuristics:

In real-world scenarios you never know the true probability of events. To somehow try to estimate \(P\) it makes sense to look at historic events and infer its statistical probability into the future.

To get these probabilities, a set of of so-called atomic events. Atomic events a value for each random variable and are mutually exclusive.

The problem with bayesian networks is that a fully connected network has an upper bound of connections of \(2^N\). This makes it impractical to precalculate all inferred probabilties.

This is why we need further prior knowledge. The additional prior knowledge in this case is the causality between events. If the causality between events is known or can be ruled out, this affects the interconnections and drastically reduces the number of edges.

From unconditional to conditional

\[P(x|\underline e) = \frac{P(x,\underline e)}{P(\underline e)} = \alpha P(x, \underline e) =\alpha \sum_\underline y P(x,\underline e, \underline y)\]

\(\alpha\) can be found using:

\[\frac{1}{\alpha}=P(\underline e)=\sum_{x,\underline y} P(x,\underline e, \underline y)\]

Summing up over x and y is posisble because they both sum to one, leaving only \(P(\underline e)\)

Conditional independence

X and Y are conditionally independent given Z if

\[P(X,Y|Z) = P(X|Z)P(Y|Z)\]

alternative notation

\[X ⫫ Y | Z\]

Markov Blanket

If the Markov Blanket of a node A is given, A is conditionally independent.

Members of the Markov blanket are:

parents,
children,
parents of the children

A bad picture from wikipedia

Topological Sorting

Sort a graph in an order, so that no node comes prior to a node pointing to it.

Message Passing

in Bipartite Trees

In bipartite trees there are three phases of message passing

request
collect
distribute

Unaffected nodes are marginialized out. If at a junction, multiply with both edges.

Note: The reason the bipartite tree is undirected is because bayesian probabilities are reversible. Also the nodes \(f_i\) are already aware of the direction.

Junction Trees

A junction tree (aka clique tree, join tree) is a decomposed graph that is now a tree with special properties. This makes solving certain problems, especially interference easier.

Not all acyclic graphs map to trees. Junction trees only exist iff graph is decomposable.

A separator separates two sets of nodes A and B if every path from A to B has to path through the separator.

A proper decomposition A,B,C if C is separator to A and B, C is complete.

A set of nodes is complete if all nodes are fully interconnected.

A clique is a maximally complete subgraph.

[A moral graph is the] counterpart of a directed acyclic graph is formed by adding edges between all pairs of nodes that have a common child, and then making all edges in the graph undirected.

Source

Composing the tree

construct DAG
convert to moral graph
construct a chordal graph
identify cliques
construct bipartite graph
create junction tree
Do message passing