I am interested in labelling arbitrary documents with predefined or learned labels. I would prefer to use LDA for this problem, because we have a highly scalable implementation of LDA and because LDA can explain each document in terms of topics. Before I start programming, though, I am surveying the literature. I have read the following papers. Could anyone offer any hints?
- DiscLDA: Discriminative learning for dimensionality reduction and classification. NIPS 2008.
- D. Blei and M. Jordan. Modeling Annotated Data. SIGIR 2003.
- Chemudugunta, C., Holloway, A., Smyth, P., & Steyvers, M. Modeling Documents by Combining Semantic Concepts with Unsupervised Statistical Learning. In: 7th International Semantic Web Conference, 2008.
- Learning to Classify Short and Sparse Text & Web with Hidden Topics from Large-scale Data Collections. WWW 2008.
- Qiaozhu Mei, Xuehua Shen, ChengXiang Zhai. Automatic Labeling of Multinomial Topic Models, KDD 2007.
- Qiaozhu Mei, Dong Xin, Hong Cheng, Jiawei Han, ChengXiang Zhai. Semantic Annotation of Frequent Patterns, ACM TKDD, 1(3), 2007.
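
As a point of reference for the question, the simplest LDA-based labelling baseline can be sketched roughly as follows: fit LDA, read off each document's topic proportions, and use the dominant topic as a learned label (for predefined labels, the same proportions would instead be fed to a classifier as features). This is only an illustrative sketch, not the scalable implementation mentioned above and not the method of any of the listed papers. It assumes a plain collapsed Gibbs sampler, toy documents, arbitrary hyperparameters, and hypothetical names (`lda_gibbs`, `dominant_topic`).

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative assumption only).

    docs is a list of token lists. Returns theta, where theta[d][k] is the
    estimated proportion of topic k in document d.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]             # n(d, k)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # n(k, w)
    topic_total = [0] * num_topics                           # n(k)
    assignments = []                                         # topic of each token

    # Random initialisation of the topic assignments.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for w in doc:
            k = rng.randrange(num_topics)
            doc_assignments.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(doc_assignments)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi = word_id[w]
                k = assignments[d][i]
                # Remove the current token from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wi] -= 1
                topic_total[k] -= 1
                # Unnormalised conditional p(z = t | all other assignments).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wi] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wi] += 1
                topic_total[k] += 1

    # Smoothed per-document topic proportions; each row sums to 1.
    return [
        [(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha)
         for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]


def dominant_topic(theta_row):
    """Index of the most probable topic, used here as a learned label."""
    return max(range(len(theta_row)), key=lambda t: theta_row[t])


if __name__ == "__main__":
    toy_docs = [
        "topic model inference topic sampling".split(),
        "football match goal football team".split(),
        "sampling inference model".split(),
    ]
    theta = lda_gibbs(toy_docs, num_topics=2)
    for doc, row in zip(toy_docs, theta):
        print(dominant_topic(row), [round(p, 2) for p in row], " ".join(doc))
```

The dominant-topic rule only yields anonymous topic indices. Turning them into human-readable or predefined labels is the part the papers above address in different ways.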