In the following paper published at ISMIR 2008:
- Oh Oh Oh Whoah! Towards Automatic Topic Detection In Song Lyrics, Florian Kleedorfer et al.
the authors use NMF (Non-negative Matrix Factorization) to detect semantic topics in song lyrics. In Section 3.4, they write:
“We decide to use NMF for automatic topic detection as it is a clustering technique that results in additive representation of items (e.g., song X is represented as 10% topic A, 30% topic B and 60% topic C), a property that distinguishes it from most other clustering techniques.”
However, several of these “other techniques”, including pLSA, LDA and Mix-Noisy-OR models, share the “distinguishing property” stated by the authors. In addition, the equivalence between NMF and pLSA has been well studied in the following papers:
- Relation between PLSA and NMF and implications, SIGIR 2006.
- On the equivalence between Non-negative Matrix Factorization and Probabilistic Latent Semantic Indexing. Computational Statistics and Data Analysis 52 (2008) 3913–3927.
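To make the "additive representation" concrete, here is a minimal sketch, not the paper's implementation. It assumes NMF is fitted under the KL-divergence objective (the variant the papers above relate to pLSA) with the standard multiplicative updates. The helper names `matmul`, `kl_nmf` and `topic_proportions` and the toy count matrix are hypothetical, chosen only for illustration. Rescaling each row of W by the topic masses of H yields per-document topic percentages, the same kind of mixture weights that pLSA produces.

```python
import random

def matmul(A, B):
    """Dense product of two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def kl_nmf(V, k, n_iter=200, seed=0, eps=1e-12):
    """Factor V (docs x terms) as W @ H using KL multiplicative updates."""
    rng = random.Random(seed)
    n_docs, n_terms = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n_docs)]
    H = [[rng.random() + 0.1 for _ in range(n_terms)] for _ in range(k)]

    def ratio(W, H):
        # Element-wise V / (W @ H), the quantity both updates depend on.
        WH = matmul(W, H)
        return [[v / (wh + eps) for v, wh in zip(v_row, wh_row)]
                for v_row, wh_row in zip(V, WH)]

    for _ in range(n_iter):
        R = ratio(W, H)
        num = matmul(R, [list(c) for c in zip(*H)])    # R @ H.T, docs x k
        h_sums = [sum(row) for row in H]
        W = [[w * n / (s + eps) for w, n, s in zip(w_row, n_row, h_sums)]
             for w_row, n_row in zip(W, num)]
        R = ratio(W, H)
        num = matmul([list(c) for c in zip(*W)], R)    # W.T @ R, k x terms
        w_sums = [sum(col) for col in zip(*W)]
        H = [[h * n / (s + eps) for h, n in zip(h_row, n_row)]
             for h_row, n_row, s in zip(H, num, w_sums)]
    return W, H

def topic_proportions(W, H):
    """Rescale W by each topic's total mass in H, then normalize every row."""
    h_sums = [sum(row) for row in H]
    out = []
    for w_row in W:
        scaled = [w * s for w, s in zip(w_row, h_sums)]
        total = sum(scaled)
        out.append([x / total for x in scaled])
    return out

# Toy term counts (assumed data): 4 "songs" x 4 terms, two obvious themes.
V = [[4.0, 3.0, 0.0, 0.0],
     [3.0, 4.0, 1.0, 0.0],
     [0.0, 0.0, 5.0, 4.0],
     [0.0, 1.0, 4.0, 5.0]]
W, H = kl_nmf(V, k=2)
theta = topic_proportions(W, H)   # each row: topic shares of one song
```

Each row of `theta` is non-negative and sums to 1, so it reads exactly like "song X is 10% topic A, 30% topic B, ...". Nothing in that normalization is specific to NMF.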
The authors also claim that LSA cannot process large sparse matrices. However, LSA simply applies SVD to the term-document matrix (TDM), and many SVD algorithms can decompose large sparse matrices.
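The reason sparse SVD scales is that iterative solvers only need matrix-vector products, which touch the stored non-zeros and never form a dense matrix. The sketch below illustrates this with plain power iteration for the leading singular triplet. That is a deliberately simple stand-in; practical LSA would use a Lanczos-type solver such as `scipy.sparse.linalg.svds`. The dictionary storage format, the function names and the toy matrix are assumptions made for illustration.

```python
import math
import random

def matvec(entries, x, n_rows):
    """y = A x, visiting only the stored non-zero entries of A."""
    y = [0.0] * n_rows
    for (i, j), a in entries.items():
        y[i] += a * x[j]
    return y

def rmatvec(entries, y, n_cols):
    """x = A^T y, again visiting only the non-zero entries."""
    x = [0.0] * n_cols
    for (i, j), a in entries.items():
        x[j] += a * y[i]
    return x

def norm(v):
    return math.sqrt(sum(t * t for t in v))

def top_singular_triplet(entries, n_rows, n_cols, n_iter=200, seed=0):
    """Power iteration on A^T A; returns (u, sigma, v) for the largest sigma."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(n_cols)]
    nv = norm(v)
    v = [t / nv for t in v]
    for _ in range(n_iter):
        u = matvec(entries, v, n_rows)     # A v
        v = rmatvec(entries, u, n_cols)    # A^T (A v)
        nv = norm(v)
        v = [t / nv for t in v]
    u = matvec(entries, v, n_rows)
    sigma = norm(u)
    u = [t / sigma for t in u]
    return u, sigma, v

# A 2000 x 1000 matrix with only three non-zeros, stored as {(row, col): value}.
# A dense array would hold two million cells; here we keep three.
A = {(0, 0): 3.0, (5, 7): 2.0, (1999, 999): 1.0}
u, sigma, v = top_singular_triplet(A, n_rows=2000, n_cols=1000)
```

The cost per iteration is proportional to the number of non-zeros plus the vector lengths, which is why a large but sparse TDM is not an obstacle in itself.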