Some Questions about LDA with Partial Answers

  • Is the disambiguation ability of LDA related to mvLDA?
    Yes. It makes the weighting of the various vocabularies reasonable. For details, please refer to this post on this blog.
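To make "disambiguation ability" concrete, here is a minimal sketch of the standard collapsed Gibbs sampling full conditional for plain LDA (not mvLDA). The function name and its count arguments are hypothetical, and the counts are assumed to exclude the token being resampled. The document-topic factor (n_dk + α) is what lets the same word receive different topic posteriors in different documents.

```python
def lda_full_conditional(n_dk, n_kw, n_k, alpha, beta, vocab_size):
    """p(z_i = k | z_-i, w) for one token of word w in document d (hypothetical helper).

    n_dk: per-topic token counts in document d
    n_kw: per-topic counts of word w across the corpus
    n_k:  per-topic total token counts
    All counts are assumed to exclude token i itself.
    """
    weights = [
        # document-topic factor times topic-word factor
        (n_dk[k] + alpha) * (n_kw[k] + beta) / (n_k[k] + vocab_size * beta)
        for k in range(len(n_dk))
    ]
    total = sum(weights)
    return [x / total for x in weights]


# An ambiguous word with identical counts under both topics:
# the surrounding document decides which topic it is assigned to.
p_finance = lda_full_conditional([20, 1], [10, 10], [100, 100], 0.1, 0.01, 1000)
p_river = lda_full_conditional([1, 20], [10, 10], [100, 100], 0.1, 0.01, 1000)
# p_finance[0] is about 0.948; p_river[1] is about 0.948
```

Because the word factor is identical for both topics here, the posterior is driven entirely by the document context.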
  • The feature weighting problem in mvLDA, sLDA, TOT and DisLDA.
    1. TOT does not seem to have the feature weighting problem, because the time stamp is sampled from a Beta distribution and is thus constrained to the range [0,1].
    2. DisLDA does not seem to have the problem either, as y_d is a class indicator.
    3. sLDA has the problem, as stated in our paper draft.
    4. mvLDA also has the problem, which is related to the disambiguation ability of the full conditional posterior.
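To see where the time stamp enters TOT's sampler, here is a minimal sketch of a TOT-style Gibbs full conditional, assuming counts that exclude the current token, a per-topic Beta(a, b) over time stamps, and a document time stamp t_d strictly inside (0, 1). The helper names are hypothetical. The time stamp contributes one Beta-density factor per token, next to the word factor.

```python
import math


def beta_pdf(t, a, b):
    """Beta(a, b) density at t in (0, 1), computed in log space for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(t) + (b - 1) * math.log(1 - t))


def tot_full_conditional(n_dk, n_kw, n_k, alpha, beta, vocab_size, t_d, psi):
    """p(z_i = k | rest) for one token in a TOT-style model (hypothetical helper).

    t_d: the document's time stamp, assumed to lie strictly inside (0, 1)
    psi: list of per-topic (a, b) Beta parameters
    Counts are assumed to exclude token i itself.
    """
    weights = []
    for k in range(len(n_dk)):
        word_term = (n_dk[k] + alpha) * (n_kw[k] + beta) / (n_k[k] + vocab_size * beta)
        time_term = beta_pdf(t_d, *psi[k])  # one Beta factor per token
        weights.append(word_term * time_term)
    total = sum(weights)
    return [w / total for w in weights]


# Two topics with identical word statistics; one peaks early in time, one late.
p = tot_full_conditional([5, 5], [3, 3], [50, 50], 0.1, 0.01, 1000,
                         t_d=0.1, psi=[(2.0, 8.0), (8.0, 2.0)])
```

With equal word counts, an early time stamp pushes nearly all of the posterior mass onto the early-peaking topic.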

  • How could LDA be used to identify domain-specific words?
    Initial results look reasonable. Progress, related data, the trained model and action items are tracked in this post.
  • What does LDA become if the training corpus contains only one document?
    LDA then degenerates into a mixture of multinomial distributions. But how do word co-occurrences affect the training process, and how do they manifest in the learned model, in this degenerate case?
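As a starting point for the co-occurrence question, here is a minimal sketch of the degenerate model's log-likelihood, assuming point estimates of θ and φ rather than integrating θ out under its Dirichlet prior. The function name is hypothetical. It makes explicit that, with a single document, the data enter only through the word counts, and that different (θ, φ) pairs producing the same mixture p(w) score identically.

```python
import math


def single_doc_loglik(counts, theta, phi):
    """Log-likelihood of a one-document corpus given point estimates (hypothetical helper).

    counts: {word_id: count} for the single document
    theta:  K topic proportions shared by every token
    phi:    K x V topic-word probabilities
    Every token is drawn from the same mixture p(w) = sum_k theta_k * phi[k][w].
    """
    total = 0.0
    for w, n in counts.items():
        p_w = sum(theta[k] * phi[k][w] for k in range(len(theta)))
        total += n * math.log(p_w)
    return total


counts = {0: 3, 1: 1}
# Two different parameterizations with the same marginal p(w) = [0.5, 0.5].
ll_a = single_doc_loglik(counts, [0.5, 0.5], [[0.8, 0.2], [0.2, 0.8]])
ll_b = single_doc_loglik(counts, [1.0, 0.0], [[0.5, 0.5], [0.9, 0.1]])
```

Under these assumptions the likelihood cannot distinguish ll_a from ll_b, so any topic structure that is learned has to come from the priors or the optimizer, not from the single document.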
  • Could sum-product (belief propagation) be used for inference in mixture models, pLSA and LDA?
    Yes. I believe it is applicable to all three models. A brief note on inference in mixture models is in this post. I still need to write down how to do inference in pLSA and LDA with factor graphs.
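For a finite mixture with fixed parameters, the factor graph of one observation is a tree: a latent variable z connected to a prior factor π and a likelihood factor p(x | z). Sum-product is therefore exact, and the belief at z is just the normalized product of the two incoming messages. A minimal sketch, with hypothetical helper names:

```python
def factor_to_variable(factor_values):
    """A leaf factor sends its own table as the message to its only variable."""
    return list(factor_values)


def sum_product_mixture(prior, likelihoods):
    """Exact p(z | x) and evidence p(x) for one observation of a finite mixture.

    prior:       mixing weights pi_k
    likelihoods: p(x | z = k) evaluated at the observed x
    """
    m_prior = factor_to_variable(prior)
    m_lik = factor_to_variable(likelihoods)
    # Belief at z = product of all incoming messages.
    unnormalized = [a * b for a, b in zip(m_prior, m_lik)]
    # Summing z out gives the evidence p(x).
    evidence = sum(unnormalized)
    return [u / evidence for u in unnormalized], evidence


# Two categorical components over a 2-symbol alphabet; we observe symbol 0.
phi = [[0.9, 0.1], [0.2, 0.8]]
posterior, evidence = sum_product_mixture([0.3, 0.7], [phi[0][0], phi[1][0]])
```

In pLSA and LDA each token has the same local structure, but the factors are shared across tokens through θ and φ, which is where the graph stops being a simple tree.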
  • Could the sum-product algorithm (belief propagation) and expectation propagation be used with GBR to scale up the LDA training process?
    No, they cannot scale up the LDA training process. They can be used with GBR to do inference, but in practice documents are i.i.d. and can be sharded across many computers, so MapReduce already gives us all the benefits GBR would bring to inference. A more serious problem is that after inference (the E-step) we need an M-step, which definitely requires AllReduce support. This answers the question posed in this previous post.
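The argument above can be illustrated with a single-process simulation of one EM iteration for pLSA (used here as a simpler stand-in for LDA). It assumes documents are sharded as (docs, thetas) pairs. The E-step and the per-document θ updates stay local to each shard (the "map"), while the global topic-word statistics must be summed across all shards before the M-step; `functools.reduce` stands in for AllReduce. All helper names are hypothetical.

```python
from functools import reduce


def normalize(row):
    s = sum(row)
    return [x / s for x in row]


def e_step_shard(docs, thetas, phi):
    """Map step: pLSA E-step on one shard.

    docs:   list of {word_id: count}
    thetas: per-document topic mixtures (lists of length K)
    phi:    K x V topic-word probabilities
    Returns the shard's expected topic-word counts and its updated thetas.
    """
    K, V = len(phi), len(phi[0])
    stats = [[0.0] * V for _ in range(K)]
    new_thetas = []
    for counts, theta in zip(docs, thetas):
        doc_topic = [0.0] * K
        for w, n in counts.items():
            # Responsibility of topic k for word w: proportional to theta_k * phi_kw.
            weights = [theta[k] * phi[k][w] for k in range(K)]
            z = sum(weights)
            for k in range(K):
                expected = n * weights[k] / z
                stats[k][w] += expected
                doc_topic[k] += expected
        # theta_d depends only on this document, so its update stays on the shard.
        new_thetas.append(normalize(doc_topic))
    return stats, new_thetas


def add_stats(a, b):
    """Element-wise sum of two K x V count tables (the reduction operator)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def em_iteration(shards, phi):
    """One EM iteration; shards is a list of (docs, thetas) pairs."""
    results = [e_step_shard(docs, thetas, phi) for docs, thetas in shards]  # map
    total = reduce(add_stats, [stats for stats, _ in results])  # stands in for AllReduce
    new_phi = [normalize(row) for row in total]  # global M-step
    new_shards = [(docs, th) for (docs, _), (_, th) in zip(shards, results)]
    return new_shards, new_phi
```

Sharding changes nothing about the result: two shards and one shard yield the same φ, but only because every shard's statistics are summed before φ is updated.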
  • How should we evaluate the “weighted-sum inference algorithm” for LDA?
    It is in fact the initialization step of Gibbs sampling / sum-product / expectation propagation.