Over the past few months, I have been learning non-parametric Bayesian
methods for topic modeling. Below are some documents I found helpful
along the way, together with what I want to do with what I have
learned. Any comments are highly appreciated.

0. Measure Theory and Sigma-algebra

http://en.wikipedia.org/wiki/Measure_theory

http://en.wikipedia.org/wiki/Sigma-algebra

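Measure theory is the basis of the Dirichlet process and many other
stochastic processes. A sigma-algebra is the collection of sets on
which a measure is defined, so it is the foundation of measure theory.
Together they are the key to generalizing a finite number of latent
factors to infinitely many.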
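To see where the sigma-algebra enters, it may help to write down the usual definition of a Dirichlet process. The symbols below (a space \Theta with its sigma-algebra, a base measure H, a concentration parameter \alpha) are my own notation for this sketch, not taken from the pages linked above. A random measure G is distributed as DP(\alpha, H) if, for every finite partition of \Theta into measurable sets (that is, sets in the sigma-algebra), the masses G assigns to those sets are jointly Dirichlet:

```latex
% Sketch of the standard definition; notation is assumed for illustration.
% (A_1, ..., A_r) is any finite partition of \Theta into measurable sets.
\[
  G \sim \mathrm{DP}(\alpha, H)
  \quad\Longleftrightarrow\quad
  \bigl(G(A_1), \dots, G(A_r)\bigr)
  \sim \mathrm{Dir}\bigl(\alpha H(A_1), \dots, \alpha H(A_r)\bigr)
  \quad \text{for every such partition.}
\]
```

Because the partition can be made as fine as we like, this is how a finite Dirichlet distribution over r components extends to a distribution over measures on the whole space.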

1. Basics of the Dirichlet Process

http://velblod.videolectures.net/2007/pascal/mlss07_tuebingen/teh_yee_whye/teh_yee_whye_dp_article.pdf

This introduction is a set of course notes written by Yee Whye Teh,
one of the authors of the hierarchical Dirichlet process (HDP).

2. Dirichlet Process Mixture Models

http://gs2040.sp.cs.cmu.edu/neal.pdf

This paper presents the Dirichlet process mixture model, which is a
mixture model with a (potentially) infinite number of components. It
explains how to generalize traditional mixture models by using a
Dirichlet process as the prior over components. This generalization
makes it possible to infer the number of components from the data
using Gibbs sampling in a tractable amount of time.
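To get a feel for why the number of components does not need to be fixed in advance, here is a minimal sketch of the Chinese restaurant process, the clustering prior induced by a Dirichlet process. This is only the prior, not the Gibbs sampler from the paper, and the function name `sample_crp` and its parameters are my own choices for illustration.

```python
import random


def sample_crp(n_customers, alpha, seed=0):
    """Draw cluster assignments from a Chinese restaurant process prior.

    alpha is the concentration parameter (must be > 0); larger values
    tend to produce more clusters.
    """
    rng = random.Random(seed)
    assignments = []  # assignments[i] is the cluster index of item i
    table_sizes = []  # table_sizes[k] is the number of items in cluster k
    for _ in range(n_customers):
        # An existing cluster k is chosen with weight table_sizes[k];
        # a brand-new cluster is chosen with weight alpha.
        weights = table_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_sizes):
            table_sizes.append(1)  # open a new cluster
        else:
            table_sizes[k] += 1
        assignments.append(k)
    return assignments, table_sizes


assignments, table_sizes = sample_crp(n_customers=100, alpha=1.0)
print(len(table_sizes), "clusters; sizes:", table_sizes)
```

The number of clusters is an outcome of the draw rather than an input. In a Dirichlet process mixture model, a Gibbs sampler would combine these prior weights with the likelihood of each data point under each component.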

3. Hierarchical Dirichlet Process

http://www.cs.berkeley.edu/~jordan/papers/hdp.pdf

Just as LDA models each document as a mixture of a finite number of
topics, the hierarchical Dirichlet process (HDP) models each document
as a mixture of a (potentially) infinite number of topics, where each
document's mixture is a Dirichlet process mixture model and all
documents draw their topics from a shared set.
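The generative process can be sketched as follows. I am writing the notation from memory of the HDP paper (\gamma and \alpha_0 are concentration parameters, H is the base measure over topics, j indexes documents and i indexes words), so please check it against the paper before relying on it.

```latex
% Sketch of the HDP generative model; notation assumed to follow the HDP paper.
\begin{align*}
  G_0 \mid \gamma, H &\sim \mathrm{DP}(\gamma, H)
    && \text{global distribution over topics} \\
  G_j \mid \alpha_0, G_0 &\sim \mathrm{DP}(\alpha_0, G_0)
    && \text{topic distribution of document } j \\
  \theta_{ji} \mid G_j &\sim G_j
    && \text{topic of word } i \text{ in document } j \\
  x_{ji} \mid \theta_{ji} &\sim F(\theta_{ji})
    && \text{observed word}
\end{align*}
```

Since G_0 is itself a draw from a Dirichlet process, it is discrete, so every G_j places its mass on the same countable set of topics. That is what lets documents share topics.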