A Small Computing Cluster on Board

In my previous post, I mentioned a newly developed GPU-based parallel Gibbs sampling algorithm for LDA inference. Of course, as you know, there are many other GPU-based parallel algorithms that handle interesting applications efficiently using NVIDIA’s CUDA programming framework.
Moreover, Googling “CUDA MapReduce” turns up MapReduce implementations based on CUDA and GPUs, developed by researchers at UC Berkeley, the University of Texas, the Hong Kong University of Science and Technology, and elsewhere.
As for the supporting hardware, I recently noticed NVIDIA’s Tesla processor board, which carries a 240-core Tesla 10 GPU and 4GB of on-board memory. This card can be installed in workstations such as the Dell Precision T7500. At the time of writing, such a system costs about RMB 43,000.
Last but not least, there are significant differences between GPU clusters and more traditional computer clusters. A few are listed here:
  1. There is no mature load-balancing mechanism on GPU clusters. GPU-based parallel computing is currently where CPU-based parallel computing was in its early days: there is no automatic balancing across the processors used by a task, and no scheduling or balancing across tasks. This prevents multiple projects from sharing a GPU cluster.
  2. A GPU cluster is based on a shared-memory architecture, so it suits only the class of compute-intensive but data-sparse tasks. I do not see more than a few real-world problems that fit this class.
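To make the first point a bit more concrete, here is a toy simulation in Python. It is not GPU code, and the function names and task costs are made up purely for illustration. It compares a fixed up-front split of work, which is roughly what you end up writing by hand when nothing balances the load for you, with a simple greedy work queue that hands each task to whichever worker becomes free first.

```python
import heapq


def static_makespan(task_costs, num_workers):
    """Fixed split: worker i gets one contiguous block of tasks up front."""
    chunk = -(-len(task_costs) // num_workers)  # ceiling division
    loads = [
        sum(task_costs[i * chunk:(i + 1) * chunk])
        for i in range(num_workers)
    ]
    # The job finishes only when the most heavily loaded worker is done.
    return max(loads)


def dynamic_makespan(task_costs, num_workers):
    """Greedy work queue: each task goes to the worker that frees up first."""
    finish_times = [0] * num_workers
    heapq.heapify(finish_times)
    for cost in task_costs:
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + cost)
    return max(finish_times)


# Hypothetical workload: a few expensive tasks followed by cheap ones.
costs = [8, 7, 6, 5, 1, 1, 1, 1]
print("static :", static_makespan(costs, 2))
print("dynamic:", dynamic_makespan(costs, 2))
```

With this made-up workload the fixed split leaves one worker with nearly all of the work, while the queue spreads it evenly. Real schedulers are far more involved, but this gap is the kind of thing a mature load-balancing layer is meant to close.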
But anyway, it is not wise to compare GPU-based parallel computing directly with multi-core CPU-based solutions, because the latter fit naturally into multi-computer parallel computing and so achieve much higher scalability.