cxwangyi 12:11 pm on July 29, 2012

Chinese Whispers in Go, Racket and Other Languages

In my previous post Why Go? Use Racket!, I pointed out that CSP facilities provided with the Go programming language also exist with Racket, a Scheme dialect. However, it seems that Racket programs run much slower than their Go counterparts.

Weeks earlier, Rob Pike gave a talk Go Concurrency Patterns in Google IO 2012, where he presented an interesting demo program named Chinese Whispers. This program creates a sequence of 10,001 channels interleaved with 10,000 threads (whispers). Each thread reads from the channel to its left and writes the processed input into the channel to its right.

Here follows the Go program:

package main
import "fmt"

func whisper(left, right chan int) {
	left <- 1 + <- right
}
func main() {
	const n = 100000
	leftmost := make(chan int)
	right := leftmost
	left := leftmost
	for i := 0; i < n; i++ {
		right = make(chan int)
		go whisper(left, right)
		left = right
	}
	go func(c chan int) { c <- 1 }(right)
	fmt.Println(<- leftmost)
}

And this is what I wrote using Racket:

(let ((leftmost (make-channel)))
  (define (setup-whispers left i n)
    (if (>= i n)
        left
        (let ((right (make-channel)))
          (thread (lambda ()
                    (channel-put left (+ 1 (channel-get right)))))
          (setup-whispers right (+ 1 i) n))))
  (let ((rightmost (setup-whispers leftmost 0 100000)))
    (thread (lambda () (channel-put rightmost 1)))
    (channel-get leftmost)))

Both programs output the same result 100,001. However, the running time differs significantly:

wangyi@WangYiIMac$time go run daisy-chain.go
100001
real	0m1.045s
user	0m0.367s
sys	0m0.231s

wangyi@WangYiIMac$raco exe daisy-chain.rkt && time ./daisy-chain
100001
real	0m42.905s
user	0m29.579s
sys	0m11.075s

Building and running in Go costs about 1 second, but that in Racket costs about 1 minute. Racket, how did you do this??

Thanks to monad, who provided a Haskell version of the program, which runs as fast as the Go version.

import Control.Monad
import Control.Concurrent

chans = sequence $ replicate 100001 newEmptyMVar

whisper chans = do
  forM_ (zip chans (tail chans)) $ \(t,f) -> forkIO $ do
    a <- readMVar f
    putMVar t (a + 1)
  forkIO $ putMVar (last chans) 1
  takeMVar $ head chans

main = chans >>=  whisper >>= print

Thanks to ligong, who provided the Common Lisp version of this program:

(defpackage :whisper (:use :cl :chanl))
(in-package :whisper)

(defparameter N 10000)

(defun whisper(left right)
  "create a whsiper thread"
  (pcall #'(lambda () (send left (1+ (recv right))))))

(defun main()
  "Simulate passing message from the right most whisper to
the left most"
    (multiple-value-bind (left-most right-most)
	(loop
	   with left-most = (make-instance 'channel)
	   repeat N
	   for left = left-most then right
	   for right = (make-instance 'channel)
	   do (whisper left right)
	   finally (return (values left-most right)))
      (send right-most 1)
      (print (recv left-most))))

Thanks to jadudm, who provides the occam-pi version:

#INCLUDE "useful.module"

VAL INT N IS 100000:

PROC whisper (CHAN INT left?, right!)
  INT v:
  SEQ
    left ? v 
    right ! (v + 1)
:

PROC main (CHAN BYTE kyb?, scr!, err!)
  SEQ
    [N+1]CHAN INT whispers:
    PAR
      -- The first whisperer
      whispers[0] ! 1

      -- N whisperers
      PAR i = 0 FOR N
        whisper (whispers[i]?, whispers[i+1]!)
        
      -- The last whisperer
      INT end:
      SEQ
        whispers[N] ? end
        -- Print the end result. 
        out.int(end, 0, scr!)
        out.string("*n", 0, scr!)
:

monad 4:50 am on July 30, 2012

Maybe you can try Haskell. Haskell also has a lightweight thread model. MVar is like a (one value) channel. and forkIO is like go(in golang). It gets similar performence as the go version.

import Control.Monad
import Control.Concurrent

chans = sequence $ replicate 100001 newEmptyMVar

whisper chans = do
forM_ (zip chans (tail chans)) $ \(t,f) -> forkIO $ do
a >= whisper >>= print
- monad 4:53 am on July 30, 2012
  
  previous post is malformed.
  - monad 4:54 am on July 30, 2012
    
    last try: http://gist.github.com/3204798
  - cxwangyi 8:41 am on July 30, 2012
    
    Thanks monad! The version on Github looks pretty. I will add your code into my post.
Pingback: Why Go? Use Racket! « Yi Wang's Tech Notes
ligong 8:02 am on July 31, 2012

Interesting post.
I searched common lisp library and found several packages support CSP. All of them seem be based on Bordeaux Threads. I wrote a common lisp version to get a feel about the performance of its thread library
https://github.com/ligong/lgmisc/blob/master/whisper.lisp
My environment is linux+sbcl. The execution time is unstable. Most of time, it is less than 40 seconds. The fastest one is only 2s.
- cxwangyi 8:36 am on July 31, 2012
  
  Cool! Copy-and-pasting your code into the post.
Danny Yoo (@dyoo1979) 6:40 pm on August 1, 2012

Something funny does appear to be happening on the Racket end of things. Someone’s investigating it now. See the beginning of http://racket-lang.org/irc-logs/20120801.txt
- cxwangyi 8:08 am on August 2, 2012
  
  Good to know. Thanks. Looks like a problem with GC? Looking forward to the fix.
  - Danny Yoo 3:04 am on August 13, 2012
    
    Yes, looks like things will be a lot better. See Matthew Flatt’s analysis and results in the thread here: http://lists.racket-lang.org/users/archive/2012-August/053458.html
  - cxwangyi 7:53 am on August 15, 2012
    
    Thanks for the update, and I am now looking forward to v5.3.0.18.
    
    One more thing: I just compared v5.2.1 with v5.3.0.17, and found the latter runs times slower.
    
    $time racket-5.2.1/bin/racket /tmp/a.rkt
    real 0m35.430s
    user 0m26.235s
    sys 0m5.334s
    
    $time ~/racket-5.3.0.17/bin/racket /tmp/a.rkt
    real 1m54.070s
    user 0m47.368s
    sys 0m27.836s
  - unclosedparenthesis 2:23 pm on August 19, 2012
    
    A nightly build is now available with the GC repairs.
    
    Meanwhile, I don’t see the difference on my machine that you report for v5.2.1 vs. v5.3.0.17 (i.e., just before the repair) using 64-bit Mac OS X. The numbers that you report look similar to the difference between 32-bit and 64-bit times on my machine, though. Can you say more about your platform, especially 32-bit vs. 64-bit?
  - cxwangyi 7:54 am on August 23, 2012
    
    It is so embarrassing that you are right, I compared v5.2.1(i386) and v5.3(x64) in my previous comment. I tried the new version (5.3.0.20) and put some new numbers into the table below. The new version runs much faster that old versions. Thanks!

And, just for fun, the occam-pi implementation:

	— As described in this blog post:
	— https://cxwangyi.wordpress.com/2012/07/29/chinese-whispers-in-racket-and-go/
	—
	— This version was run and tested on the Transterpreter Virtual Machine, a
	— bytecode interpreter for occam-pi programs. Compiled natively, it will
	— be faster; with modification, it could be run on an Arduino (and be slower).
	—
	— For more info, see:
	— http://concurrency.cc/

	#INCLUDE "useful.module"

	VAL INT N IS 100000:

	PROC whisper (CHAN INT left?, right!)
	INT v:
	SEQ
	left ? v
	right ! (v + 1)
	:

	PROC main (CHAN BYTE kyb?, scr!, err!)
	INT s.time, e.time:
	TIMER t:
	SEQ
	t ? s.time
	[N+1]CHAN INT whispers:
	PAR
	— The first whisperer
	whispers[0] ! 1

	— N whisperers
	PAR i = 0 FOR N
	whisper (whispers[i]?, whispers[i+1]!)

	— The last whisperer
	INT end:
	SEQ
	whispers[N] ? end
	— Stop timing after comms are done.
	— Don't measure the time taken to print.
	t ? e.time
	— Print the end result.
	out.int(end, 0, scr!)
	out.string("*n", 0, scr!)

	— Show the time it took to run N whisperers
	— in microseconds. Timing resolution is different
	— if executed on the Arduino.
	out.int((e.time – s.time) / 1000, 0, scr!)
	out.string(" us*n", 0, scr!)
	:

view raw

whispers.occ

hosted with ❤ by GitHub

I see strange behaviour with the Racket version (5.3.0.20 built from git yesterday). The first time I run it, it takes for ever to run and starts swapping memory. I had to kill it. The second time I run, it runs in about 13 seconds. This is on a 64-bit Debian GNU/Linux machine.

A 64-bit run on my machine peaks at 2.6-GB memory use. My guess is that your first run swapped out enough so that memory was readily available for the second run.

Reducing the initial stack for a thread as in http://lists.racket-lang.org/users/archive/2012-August/053458.html takes the peak memory down to 1.2 GB on my machine. That’s still about 12 KB per thread — which is light-ish, but not as lightweight as some other systems.

Thanks Matthew, I think you are right. I drew the following table, which confirms what you said — shrinking the stack size reduces the sys time used by 64-bit versions significantly. (Measurements were conducted on my old MacBook with Intel Core 2 Duo CPU.)

Racket Performance
			5.3.0.20	5.2.1	5.3
default stack size	32bit	real	0m14.382s	0m38.344s	0m39.180s
		user	0m8.992s	0m27.909s	0m29.362s
		sys	0m3.427s	0m5.594s	0m5.568s
	64bit	real	1m4.875s	2m7.687s	1m46.973s
		user	0m14.781s	0m48.462s	0m47.170s
		sys	0m19.032s	0m31.715s	0m26.070s
initial thread stack size = 100	32bit	real	0m10.181s	0m27.370s	0m28.854s
		user	0m7.147s	0m21.768s	0m23.224s
		sys	0m1.858s	0m3.143s	0m3.187s
	64bit	real	0m14.317s	0m45.537s	0m43.995s
		user	0m9.582s	0m33.208s	0m32.149s
		sys	0m3.686s	0m9.507s	0m9.317s

crib tent 10:16 pm on June 26, 2013

Therefore parents are advised to purchase only bedding that is
totally suitable for they bed they choose. The same is the case with
excess protrusions in the chassis of the system, which must also have
a high-quality mattress. Check also for features like water- or flame-resistance.
http://linkvis.com/profile/Randall60 2:49 am on June 27, 2013

The reason for this is very obvious, though some people
still have no idea why is this so. The lavish beauty of a pink engagement ring
will astound even the most discriminating admirer. If you’ve chosen to wear pearls for your wedding jewelry, shopping for them now becomes a task.
Lily 4:17 am on July 21, 2013

Produced and introduced last 1983 when Casio engineers taking on the challenge of producing the world’s toughest watch ever. Additionally, you will find 6 transmission stations worldwide which means you are guaranteed the actual time, every second, wherever you are inside world. It features Auto EL Backlight with After – Glow and clear digital display so you can easily make out the print at any light settings.
linux hosting 8:27 pm on August 4, 2013

Appreciation to my father who informed me regarding this weblog, this website is in fact remarkable.|
Mistivia 7:18 am on February 17, 2024

Thanks to Racket on ChezScheme, the running time now in 2024 doesn’t differ significantly anymore.

$ time racket test.rkt
100001

real 0m2.472s
user 0m1.987s
sys 0m0.479s

Share this:

Related