In my previous post Why Go? Use Racket!, I pointed out that CSP facilities provided with the Go programming language also exist with Racket, a Scheme dialect. However, it seems that Racket programs run much slower than their Go counterparts.
Weeks earlier, Rob Pike gave a talk Go Concurrency Patterns in Google IO 2012, where he presented an interesting demo program named Chinese Whispers. This program creates a sequence of 10,001 channels interleaved with 10,000 threads (whispers). Each thread reads from the channel to its left and writes the processed input into the channel to its right.
Here follows the Go program:
package main
import "fmt"
func whisper(left, right chan int) {
left <- 1 + <- right
}
func main() {
const n = 100000
leftmost := make(chan int)
right := leftmost
left := leftmost
for i := 0; i < n; i++ {
right = make(chan int)
go whisper(left, right)
left = right
}
go func(c chan int) { c <- 1 }(right)
fmt.Println(<- leftmost)
}
And this is what I wrote using Racket:
(let ((leftmost (make-channel)))
(define (setup-whispers left i n)
(if (>= i n)
left
(let ((right (make-channel)))
(thread (lambda ()
(channel-put left (+ 1 (channel-get right)))))
(setup-whispers right (+ 1 i) n))))
(let ((rightmost (setup-whispers leftmost 0 100000)))
(thread (lambda () (channel-put rightmost 1)))
(channel-get leftmost)))
Both programs output the same result 100,001. However, the running time differs significantly:
wangyi@WangYiIMac$time go run daisy-chain.go 100001 real 0m1.045s user 0m0.367s sys 0m0.231s wangyi@WangYiIMac$raco exe daisy-chain.rkt && time ./daisy-chain 100001 real 0m42.905s user 0m29.579s sys 0m11.075s
Building and running in Go costs about 1 second, but that in Racket costs about 1 minute. Racket, how did you do this??
Thanks to monad, who provided a Haskell version of the program, which runs as fast as the Go version.
import Control.Monad
import Control.Concurrent
chans = sequence $ replicate 100001 newEmptyMVar
whisper chans = do
forM_ (zip chans (tail chans)) $ \(t,f) -> forkIO $ do
a <- readMVar f
putMVar t (a + 1)
forkIO $ putMVar (last chans) 1
takeMVar $ head chans
main = chans >>= whisper >>= print
Thanks to ligong, who provided the Common Lisp version of this program:
(defpackage :whisper (:use :cl :chanl))
(in-package :whisper)
(defparameter N 10000)
(defun whisper(left right)
"create a whsiper thread"
(pcall #'(lambda () (send left (1+ (recv right))))))
(defun main()
"Simulate passing message from the right most whisper to
the left most"
(multiple-value-bind (left-most right-most)
(loop
with left-most = (make-instance 'channel)
repeat N
for left = left-most then right
for right = (make-instance 'channel)
do (whisper left right)
finally (return (values left-most right)))
(send right-most 1)
(print (recv left-most))))
Thanks to jadudm, who provides the occam-pi version:
#INCLUDE "useful.module"
VAL INT N IS 100000:
PROC whisper (CHAN INT left?, right!)
INT v:
SEQ
left ? v
right ! (v + 1)
:
PROC main (CHAN BYTE kyb?, scr!, err!)
SEQ
[N+1]CHAN INT whispers:
PAR
-- The first whisperer
whispers[0] ! 1
-- N whisperers
PAR i = 0 FOR N
whisper (whispers[i]?, whispers[i+1]!)
-- The last whisperer
INT end:
SEQ
whispers[N] ? end
-- Print the end result.
out.int(end, 0, scr!)
out.string("*n", 0, scr!)
:
Maybe you can try Haskell. Haskell also has a lightweight thread model. MVar is like a (one value) channel. and forkIO is like go(in golang). It gets similar performence as the go version.
import Control.Monad
import Control.Concurrent
chans = sequence $ replicate 100001 newEmptyMVar
whisper chans = do
forM_ (zip chans (tail chans)) $ \(t,f) -> forkIO $ do
a >= whisper >>= print
previous post is malformed.
last try: http://gist.github.com/3204798
Thanks monad! The version on Github looks pretty. I will add your code into my post.
[...] As noted in this post, Racket programs build and run much slower than Go programs. Like this:LikeBe the first to like [...]
Interesting post.
I searched common lisp library and found several packages support CSP. All of them seem be based on Bordeaux Threads. I wrote a common lisp version to get a feel about the performance of its thread library
https://github.com/ligong/lgmisc/blob/master/whisper.lisp
My environment is linux+sbcl. The execution time is unstable. Most of time, it is less than 40 seconds. The fastest one is only 2s.
Cool! Copy-and-pasting your code into the post.
Something funny does appear to be happening on the Racket end of things. Someone’s investigating it now. See the beginning of http://racket-lang.org/irc-logs/20120801.txt
Good to know. Thanks. Looks like a problem with GC? Looking forward to the fix.
Yes, looks like things will be a lot better. See Matthew Flatt’s analysis and results in the thread here: http://lists.racket-lang.org/users/archive/2012-August/053458.html
Thanks for the update, and I am now looking forward to v5.3.0.18.
One more thing: I just compared v5.2.1 with v5.3.0.17, and found the latter runs times slower.
$time racket-5.2.1/bin/racket /tmp/a.rkt
real 0m35.430s
user 0m26.235s
sys 0m5.334s
$time ~/racket-5.3.0.17/bin/racket /tmp/a.rkt
real 1m54.070s
user 0m47.368s
sys 0m27.836s
A nightly build is now available with the GC repairs.
Meanwhile, I don’t see the difference on my machine that you report for v5.2.1 vs. v5.3.0.17 (i.e., just before the repair) using 64-bit Mac OS X. The numbers that you report look similar to the difference between 32-bit and 64-bit times on my machine, though. Can you say more about your platform, especially 32-bit vs. 64-bit?
It is so embarrassing that you are right, I compared v5.2.1(i386) and v5.3(x64) in my previous comment. I tried the new version (5.3.0.20) and put some new numbers into the table below. The new version runs much faster that old versions. Thanks!
And, just for fun, the occam-pi implementation:
https://gist.github.com/3315898
I see strange behaviour with the Racket version (5.3.0.20 built from git yesterday). The first time I run it, it takes for ever to run and starts swapping memory. I had to kill it. The second time I run, it runs in about 13 seconds. This is on a 64-bit Debian GNU/Linux machine.
A 64-bit run on my machine peaks at 2.6-GB memory use. My guess is that your first run swapped out enough so that memory was readily available for the second run.
Reducing the initial stack for a thread as in http://lists.racket-lang.org/users/archive/2012-August/053458.html takes the peak memory down to 1.2 GB on my machine. That’s still about 12 KB per thread — which is light-ish, but not as lightweight as some other systems.
Thanks Matthew, I think you are right. I drew the following table, which confirms what you said — shrinking the stack size reduces the sys time used by 64-bit versions significantly. (Measurements were conducted on my old MacBook with Intel Core 2 Duo CPU.)
stack
size
thread
stack
size =
100