Chinese Whispers in Go, Racket and Other Languages

In my previous post Why Go? Use Racket!, I pointed out that CSP facilities provided with the Go programming language also exist with Racket, a Scheme dialect.  However, it seems that Racket programs run much slower than their Go counterparts.

Weeks earlier, Rob Pike gave a talk Go Concurrency Patterns in Google IO 2012, where he presented an interesting demo program named Chinese Whispers. This program creates a sequence of 10,001 channels interleaved with 10,000 threads (whispers).  Each thread reads from the channel to its left and writes the processed input into the channel to its right.

Here follows the Go program:

package main
import "fmt"

func whisper(left, right chan int) {
	left <- 1 + <- right
func main() {
	const n = 100000
	leftmost := make(chan int)
	right := leftmost
	left := leftmost
	for i := 0; i < n; i++ {
		right = make(chan int)
		go whisper(left, right)
		left = right
	go func(c chan int) { c <- 1 }(right)
	fmt.Println(<- leftmost)

And this is what I wrote using Racket:

(let ((leftmost (make-channel)))
  (define (setup-whispers left i n)
    (if (>= i n)
        (let ((right (make-channel)))
          (thread (lambda ()
                    (channel-put left (+ 1 (channel-get right)))))
          (setup-whispers right (+ 1 i) n))))
  (let ((rightmost (setup-whispers leftmost 0 100000)))
    (thread (lambda () (channel-put rightmost 1)))
    (channel-get leftmost)))

Both programs output the same result 100,001. However, the running time differs significantly:

wangyi@WangYiIMac$time go run daisy-chain.go
real	0m1.045s
user	0m0.367s
sys	0m0.231s

wangyi@WangYiIMac$raco exe daisy-chain.rkt && time ./daisy-chain
real	0m42.905s
user	0m29.579s
sys	0m11.075s

Building and running in Go costs about 1 second, but that in Racket costs about 1 minute. Racket, how did you do this??

Thanks to monad, who provided a Haskell version of the program, which runs as fast as the Go version.

import Control.Monad
import Control.Concurrent

chans = sequence $ replicate 100001 newEmptyMVar

whisper chans = do
  forM_ (zip chans (tail chans)) $ \(t,f) -> forkIO $ do
    a <- readMVar f
    putMVar t (a + 1)
  forkIO $ putMVar (last chans) 1
  takeMVar $ head chans

main = chans >>=  whisper >>= print

Thanks to ligong, who provided the Common Lisp version of this program:

(defpackage :whisper (:use :cl :chanl))
(in-package :whisper)

(defparameter N 10000)

(defun whisper(left right)
  "create a whsiper thread"
  (pcall #'(lambda () (send left (1+ (recv right))))))

(defun main()
  "Simulate passing message from the right most whisper to
the left most"
    (multiple-value-bind (left-most right-most)
	   with left-most = (make-instance 'channel)
	   repeat N
	   for left = left-most then right
	   for right = (make-instance 'channel)
	   do (whisper left right)
	   finally (return (values left-most right)))
      (send right-most 1)
      (print (recv left-most))))

Thanks to jadudm, who provides the occam-pi version:

#INCLUDE "useful.module"

VAL INT N IS 100000:

PROC whisper (CHAN INT left?, right!)
  INT v:
    left ? v 
    right ! (v + 1)

PROC main (CHAN BYTE kyb?, scr!, err!)
    [N+1]CHAN INT whispers:
      -- The first whisperer
      whispers[0] ! 1

      -- N whisperers
      PAR i = 0 FOR N
        whisper (whispers[i]?, whispers[i+1]!)
      -- The last whisperer
      INT end:
        whispers[N] ? end
        -- Print the end result., 0, scr!)
        out.string("*n", 0, scr!)
About these ads

21 Responses to Chinese Whispers in Go, Racket and Other Languages

  1. monad says:

    Maybe you can try Haskell. Haskell also has a lightweight thread model. MVar is like a (one value) channel. and forkIO is like go(in golang). It gets similar performence as the go version.

    import Control.Monad
    import Control.Concurrent

    chans = sequence $ replicate 100001 newEmptyMVar

    whisper chans = do
    forM_ (zip chans (tail chans)) $ \(t,f) -> forkIO $ do
    a >= whisper >>= print

  2. [...] As noted in this post, Racket programs build and run much slower than Go programs. Like this:LikeBe the first to like [...]

  3. ligong says:

    Interesting post.
    I searched common lisp library and found several packages support CSP. All of them seem be based on Bordeaux Threads. I wrote a common lisp version to get a feel about the performance of its thread library
    My environment is linux+sbcl. The execution time is unstable. Most of time, it is less than 40 seconds. The fastest one is only 2s.

  4. Something funny does appear to be happening on the Racket end of things. Someone’s investigating it now. See the beginning of

    • cxwangyi says:

      Good to know. Thanks. Looks like a problem with GC? Looking forward to the fix.

      • Danny Yoo says:

        Yes, looks like things will be a lot better. See Matthew Flatt’s analysis and results in the thread here:

      • cxwangyi says:

        Thanks for the update, and I am now looking forward to v5.3.0.18.

        One more thing: I just compared v5.2.1 with v5.3.0.17, and found the latter runs times slower.

        $time racket-5.2.1/bin/racket /tmp/a.rkt
        real 0m35.430s
        user 0m26.235s
        sys 0m5.334s

        $time ~/racket- /tmp/a.rkt
        real 1m54.070s
        user 0m47.368s
        sys 0m27.836s

      • A nightly build is now available with the GC repairs.

        Meanwhile, I don’t see the difference on my machine that you report for v5.2.1 vs. v5.3.0.17 (i.e., just before the repair) using 64-bit Mac OS X. The numbers that you report look similar to the difference between 32-bit and 64-bit times on my machine, though. Can you say more about your platform, especially 32-bit vs. 64-bit?

      • cxwangyi says:

        It is so embarrassing that you are right, I compared v5.2.1(i386) and v5.3(x64) in my previous comment. I tried the new version ( and put some new numbers into the table below. The new version runs much faster that old versions. Thanks!

  5. Matt says:

    And, just for fun, the occam-pi implementation:

  6. mar says:

    I see strange behaviour with the Racket version ( built from git yesterday). The first time I run it, it takes for ever to run and starts swapping memory. I had to kill it. The second time I run, it runs in about 13 seconds. This is on a 64-bit Debian GNU/Linux machine.

    • A 64-bit run on my machine peaks at 2.6-GB memory use. My guess is that your first run swapped out enough so that memory was readily available for the second run.

      Reducing the initial stack for a thread as in takes the peak memory down to 1.2 GB on my machine. That’s still about 12 KB per thread — which is light-ish, but not as lightweight as some other systems.

      • cxwangyi says:

        Thanks Matthew, I think you are right. I drew the following table, which confirms what you said — shrinking the stack size reduces the sys time used by 64-bit versions significantly. (Measurements were conducted on my old MacBook with Intel Core 2 Duo CPU.)

        Racket Performance
                    5.2.1     5.3      
        real  0m14.382s 0m38.344s 0m39.180s
        user  0m8.992s  0m27.909s 0m29.362s
        sys   0m3.427s  0m5.594s  0m5.568s 
        real  1m4.875s  2m7.687s  1m46.973s
        user  0m14.781s 0m48.462s 0m47.170s
        sys   0m19.032s 0m31.715s 0m26.070s
        size = 
        real  0m10.181s 0m27.370s 0m28.854s
        user  0m7.147s  0m21.768s 0m23.224s
        sys   0m1.858s  0m3.143s  0m3.187s 
        real  0m14.317s 0m45.537s 0m43.995s
        user  0m9.582s  0m33.208s 0m32.149s
        sys   0m3.686s  0m9.507s  0m9.317s 
  7. crib tent says:

    Therefore parents are advised to purchase only bedding that is
    totally suitable for they bed they choose. The same is the case with
    excess protrusions in the chassis of the system, which must also have
    a high-quality mattress. Check also for features like water- or flame-resistance.

  8. The reason for this is very obvious, though some people
    still have no idea why is this so. The lavish beauty of a pink engagement ring
    will astound even the most discriminating admirer. If you’ve chosen to wear pearls for your wedding jewelry, shopping for them now becomes a task.

  9. Lily says:

    Produced and introduced last 1983 when Casio engineers taking on the challenge of producing the world’s toughest watch ever. Additionally, you will find 6 transmission stations worldwide which means you are guaranteed the actual time, every second, wherever you are inside world. It features Auto EL Backlight with After – Glow and clear digital display so you can easily make out the print at any light settings.

  10. Appreciation to my father who informed me regarding this weblog, this website is in fact remarkable.|

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


Get every new post delivered to your Inbox.

Join 28 other followers

%d bloggers like this: