The sum-product algorithm is not limited with inference of probabilistic graphical model. It is a general method for marginalization of high-dimensional functions[1].

It is required that the high-dimensional function can be factorized as a product of local functions with lower dimensionality. An intuitive representation of a factorization is factor graph.

Both Bayesian networks and Markov Random Fields can be represented by factor graphs (ref. Equation (8.5) and Equation (8.39) in [2], and details in Section 8.4.3).

Factor graphs are bipartite graphs, where edges connect variable nodes and function nodes. That is to say, neighbors of a variable node are all function nodes; neighbors of a function node are all variable nodes.

If there is no loop in the factor graph, sum-product on this graph is to pass messages in both ways along each edge.

If there are loops in the factor graph, sum-product should be done in an iterative fashion.

It is a good start point to understand sum-product from the specific problem of inference in a chain model (which results in the forward/backward algorithm). Then extend the chain by branches to make it a tree (e.g, the latent topics Z in supervised LDA affected by the Dirichlet prior, document rates and content words). Then extend the tree to have loops.

As sum-product is expressed by message-passing , it is suitable to be developed using parallel computing frameworks like MPI and BSP.

References:

Factor Graphs and the Sum-Product Algorithm, Frank R. Kschischang, Senior Member, Brendan J. Frey, and Hans-Andrea Loeliger, IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 47, NO. 2, FEBRUARY 2001

Pattern Recognition and Machine Learning, Christopher Bishop, 2004, Section 8.4.