Scala vs Go – Could people help compare/contrast these on relative merits/demerits?

August 20, 2014

Answer by Nick Snyder:

I have written Go at Google (and in my own time) and Scala at LinkedIn. Both are modern languages with first class concurrency features.

What follows is my subjective comparison of Scala and Go…

Go is an opinionated, minimal language that compiles to machine code.

Scala is a sophisticated, academic, functional, object-oriented, everything-but-the-kitchen-sink, free-for-all language that has a lot of features and runs on the JVM.

Given my experience with both, I would choose Go over Scala every time for one reason: simplicity.

Before I dive into my answer, I want to make a few general observations:

  • All else equal, less code is easier to understand than more code. All else is rarely equal.
  • Code is read much more often than it is written.
  • Code frequently lives longer than we want it to.
  • The person who tests or maintains a piece of code is frequently not the original author.
  • At scale, the skill level of developers reading/writing/maintaining/testing code is going to be a normal distribution around the mean of "not expert."

Writing code is an act of communication, not just between the author and the compiler (or runtime), but also between the author and a future reader of unknown skill level.

Language complexity

Java 8 language spec is a 780 page PDF (lol, seriously?).
http://docs.oracle.com/javase/sp…

Scala language spec is a 191 page PDF.
http://www.scala-lang.org/docu/f…

The Go language spec is a webpage that prints as a 51 page PDF.
http://golang.org/ref/spec

Defining a language is not the same as learning how to use a language, but it is a proxy for how much there is to learn (or how much there is to confuse you when reading someone else's code).

Documentation

I found Go easier to learn, both because it is (objectively) a simpler language and (subjectively) because I think the documentation is better.

Tour:
http://docs.scala-lang.org/tutor…
vs
http://tour.golang.org/#1
http://golang.org/doc/effective_…

FAQ:
http://docs.scala-lang.org/tutor…
vs
http://golang.org/doc/faq

Standard library:
http://www.scala-lang.org/api/cu…
vs
http://golang.org/pkg/

Expressiveness

Given the language specs, it should be no surprise that Scala has more features than Go.

Does that mean Go is any less expressive than Scala?
No, you just might need to write a few extra lines of code to do what you want (but I don't think that is a bad thing).

One of Go's features is that it doesn't have an excess of features, and frankly, I think that feature is undervalued.
http://golang.org/doc/faq#Why_do…

I have yet to encounter a situation in Go where I wished it had a feature that it doesn't.

The Sim City effect

Don't bother Googling this, I just made it up. You know Sim City, right?

If two people start a new city on the same map and play for a few hours, they will produce completely different looking cities.

Why is that?
Because Sim City is a sandbox and there are so many options that it is highly improbable that any two people will make the same series of decisions.

Scala is a sandbox too.
As the other answer alluded to, in addition to the plethora of features that Scala ships with (functional programming features, OO programming features, etc.), Scala exposes capabilities that allow developers to add new features.

This is all well and good until you need to share your sandbox with others and they have no idea what was going on in your head.

Speaking of consistency…

Go is the only language that I know of that has circumvented the whole code style debate by just providing a tool to format your code in the canonical format.
http://golang.org/cmd/gofmt/

Code density

Scala code can be very dense and hard to grok at the early stages of learning.

I would go so far as to say that it is idiomatic in Scala to be as dense as possible, or at least it seems that experienced Scala developers have a propensity for terseness.

Example

Consider fetching a user id from a cookie. How much language knowledge do you need to answer the following questions given the implementation?

  • What happens if the cookie is not present?
  • What happens if the cookie value is not a well formatted number?
  • What happens if the cookie value is a negative number?

Scala

import play.api.mvc.RequestHeader

def getUserId()(implicit request: RequestHeader) = {
  request.cookies.get("uid").map(_.value.toLong).filter(_ > 0)
}

Go

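import (
  "fmt"
  "net/http"
  "strconv"
)

func getUserId(r *http.Request) (int64, error) {
  // r.Cookie returns http.ErrNoCookie when the cookie is not present.
  c, err := r.Cookie("uid")
  if err != nil {
    return 0, err
  }
  // ParseInt fails if the value is not a well formatted base-10 number.
  i, err := strconv.ParseInt(c.Value, 10, 64)
  if err != nil {
    return 0, err
  }
  // Zero and negative ids are rejected explicitly.
  if i <= 0 {
    return 0, fmt.Errorf("invalid user id")
  }
  return i, nil
}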

In this particular case, the Scala code is clearly shorter and perhaps more eloquent, but the point that I am trying to illustrate in general is that the Go code is explicit and the Scala code requires context to understand.

Be $#%!ing explicit

In my experience, explicit code has a lot of benefits.

  • Explicit code is easier for novices and for non-authors to grok.
  • Explicit code makes the error cases obvious.
  • Explicit code makes the test cases obvious.
  • Explicit code is easier to debug (try setting a breakpoint in the Scala code above).

I have found Go code to be much more explicit than Scala.

Performance

As a developer, the only performance that I care about is my development cycle because I have yet to encounter a runtime that isn't fast enough for what I want to accomplish. Developer time is more valuable than a computer's time anyway.

In my experience, Go is extremely fast to compile, and Scala is slow. YMMV.

To be fair

I learned Go before I learned Scala so I admit that I probably have a bias toward Go. My first reaction to Go was that its syntax was ugly (like C++) but learning Go felt like this:

I learned Scala out of necessity and have gotten used to it; I even like parts of it now, but learning Scala felt like this:



Convert Chinese LaTeX Source to HTML and PDF

June 27, 2014

I am going to write a book in Chinese. I hope that I can publish its chapters on my blog, so I can get feedback before it is printed. This requires that I can convert my manuscript into HTML format (for publishing in my blog) and PDF format (for printing).

I tried to write in Emacs Org mode, Wiki and Markdown. However, none of them support equations well. So I decided to use LaTeX.

I tried several tools to convert LaTeX source into HTML, including htlatex and pandoc. htlatex does not support Chinese well, and pandoc supports only a small part of LaTeX syntax. Finally, I decided to use Hevea, which works well for me.

I use XeLaTeX to convert LaTeX to PDF. Compared with PDFLaTeX, XeLaTeX works better with UTF-8 and TrueType Chinese fonts.

However, Hevea and XeLaTeX have different requirements for the preamble of the LaTeX source, so I created a separate template for each. These templates use LaTeX’s \input directive to include a LaTeX source file containing the real text.

An example project is at https://github.com/wangkuiyi/hevea-xelatex.


Configure an HDFS for Development/Testing

May 10, 2014

I am using a Go implementation of the WebHDFS interface: https://github.com/vladimirvivien/gowfs. In order to test it, I need to set up an HDFS on my development computer (Mac OS X 10.8, Hadoop-2.2.0). The author, Vladimir Vivien, points out two steps needed to enable WebHDFS:

  1. Enable dfs.webhdfs.enabled property in hdfs-site.xml
  2. Ensure hadoop.http.staticuser.user property is set in your core-site.xml.

However, those are not enough. If you see error messages like the following reported by the append operation:

Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try.

You need to add the following properties to hdfs-site.xml:

  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>

  <property>
    <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
    <value>false</value>
  </property>
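
Once the properties are in place, a quick way to check that WebHDFS answers is to request a directory listing over its REST interface. The sketch below only builds the URL; the host and port (localhost:50070, the default NameNode HTTP port in Hadoop 2.x), the path /tmp, and the user name hadoop are assumptions that you should replace with your own values.

```go
package main

import (
	"fmt"
	"net/url"
)

// webHDFSURL builds a WebHDFS REST URL of the form
// http://<host:port>/webhdfs/v1<path>?op=<OP>&user.name=<user>.
// All four arguments are placeholders chosen by the caller.
func webHDFSURL(hostPort, path, op, user string) string {
	q := url.Values{}
	q.Set("op", op)
	q.Set("user.name", user)
	u := url.URL{
		Scheme:   "http",
		Host:     hostPort,
		Path:     "/webhdfs/v1" + path,
		RawQuery: q.Encode(),
	}
	return u.String()
}

func main() {
	// Assumed values for a single-node development setup.
	fmt.Println(webHDFSURL("localhost:50070", "/tmp", "LISTSTATUS", "hadoop"))
}
```

Fetching the printed URL with curl or a browser should return a JSON listing if WebHDFS is enabled.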

Editing CSV Files Using Emacs

May 9, 2014

CSV (comma-separated values) files are everywhere, though sometimes fields are separated by tabs or spaces instead of commas. I see many people edit CSV files using spreadsheet software like Microsoft Excel. I edit CSV files using Emacs, so I do not have to leave my programming environment.

Emacs can recognize CSV files once you have installed csv-mode: http://www.emacswiki.org/emacs-fr/download/csv-mode.el. Just download it, save it anywhere, and add the following line to your ~/.emacs file:

(load-file "/path/to/csv-mode.el")

Before you can edit your CSV file, make sure the values are separated by commas. If they are not, the following simple command line can help:

cat your_file | sed 's/\t/, /g' > your_file.csv

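This command converts every tab in your_file into a comma followed by a space. The resulting file, your_file.csv, can now be recognized by csv-mode. Note that \t is understood as a tab by GNU sed; other sed implementations may need a literal tab character instead.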

After opening your_file.csv using Emacs, you might want to use M-x toggle-truncate-lines to disable the wrapping of long lines. Then, you can use M-x csv-align-fields to align fields. This makes the file look like it does in Microsoft Excel.

Screenshot


How to Sample a Dirichlet-Multinomial Distribution

May 8, 2014

Consider the problem of sampling from a multinomial distribution Mult(\vec{x}|\vec{p}, n), where \vec{p} is sampled from a Dirichlet prior distribution Dir(\vec{p}|\vec{\alpha}).

A conceptually straightforward solution is to sample \vec{p} from Dir(\vec{p}|\vec\alpha), and then to generate n samples from the discrete distribution defined by \vec{p}. As described by Wikipedia, sampling \vec{p}=\{p_1,\ldots,p_K\} can be done by drawing samples \{y_1,\ldots,y_K\} from K Gamma distributions, y_k \sim \Gamma(\alpha_k, 1) \text{,  } k\in[1,K], and then normalizing the y_k to get \vec{p}: p_k = y_k/(\sum_j y_j). According to Wikipedia, if \alpha_k is a positive integer, we have \sum_{i=1}^{\alpha_k} - \log U_i \sim \Gamma(\alpha_k, 1), where U_i is a sample drawn from the uniform distribution over (0, 1]. However, if the \alpha_k are not positive integers, sampling from the Gamma distribution becomes a complex procedure.

Even if we implement the algorithm that draws samples from Gamma and then Dirichlet, it would not be numerically robust. When U_i is close to 0, \log U_i tends to -Inf. Another danger is that if all K draws give y_k=0, then p_k=y_k/(\sum_j y_j) leads to either a divide-by-zero or a NaN value for p_k.

Fortunately, we can make use of the conjugacy between Dirichlet and multinomial. This conjugacy, as explained in the textbook Pattern Recognition and Machine Learning, means that \alpha_k acts as a prior count of observations of the multinomial outcome k. This leads to the following simple sampling method, which can be generalized further to sample from Dirichlet processes:

  1. \vec{p} = \vec\alpha, \vec{x} = \vec{0}, i = 0
  2. k \sim Discrete(\vec{p})
  3. p_k = p_k+1, x_k=x_k+1, i=i+1
  4. while i < n, goto 2.
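
The reason these steps are valid can be written as one predictive probability. In the sketch below, z_{i+1} denotes the (i+1)-th draw and \vec{x}^{(i)} the counts after i draws; both symbols are my own notation, introduced only for this derivation. The posterior of \vec{p} after i draws is again a Dirichlet, and its mean gives the probability of the next outcome:

```latex
P(z_{i+1} = k \mid z_1, \ldots, z_i)
  = \int p_k \, Dir(\vec{p} \mid \vec{\alpha} + \vec{x}^{(i)}) \, d\vec{p}
  = \frac{\alpha_k + x_k^{(i)}}{\sum_{j=1}^{K} \alpha_j + i}
```

This is exactly the discrete distribution obtained by normalizing the running vector \vec{p} in step 2, so \vec{p} itself never has to be sampled.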

Full Go code is as follows:

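import (
	"fmt"
	"math/rand"
)

// sampleDirichletMultinomial draws n outcomes and returns how many times
// each of the len(alpha) categories was drawn.
func sampleDirichletMultinomial(alpha []float64, n int, rng *rand.Rand) []int {
	// dist holds the unnormalized weights: alpha plus the counts so far.
	dist := make([]float64, len(alpha))
	copy(dist, alpha)
	hist := make([]int, len(alpha))
	for i := 0; i < n; i++ {
		k := sampleDiscrete(dist, rng)
		dist[k] += 1.0
		hist[k]++
	}
	return hist
}

// sampleDiscrete draws an index with probability proportional to dist[i].
// dist need not be normalized, but it must be non-empty and non-negative.
func sampleDiscrete(dist []float64, rng *rand.Rand) int {
	if len(dist) <= 0 {
		panic("sample from empty distribution")
	}
	sum := 0.0
	for _, v := range dist {
		if v < 0 {
			panic(fmt.Sprintf("bad dist: %v", dist))
		}
		sum += v
	}
	// Pick a point in [0, sum) and find the bucket that contains it.
	u := rng.Float64() * sum
	sum = 0
	for i, v := range dist {
		sum += v
		if u < sum {
			return i
		}
	}
	panic("sampleDiscrete ran out of possibilities")
}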

Install GDB from Source on Mac OS X

April 21, 2014

You can follow this tutorial to build GDB from source code:

  https://github.com/sirnewton01/godbg
 
But we need to apply a patch before running ./configure and make as described in the link above:
 
  cd gdb-7.7
  patch bfd/mach-o.c < ~/Download/patch   # the file to be patched, bfd/mach-o.c, must be named explicitly
  ./configure --prefix=/Users/yiwang/usr --disable-dynamic --enable-static --enable-expact --enable-python
  make -j8
  make install

 


Extract Text from PDF Files

March 23, 2014

I got this solution from Stackoverflow.

A more comfortable way to do text extraction: use pdftotext (available for Windows as well as Linux/Unix or Mac OS X). This utility is based either on Poppler or on XPDF. This is a command you could try:

 pdftotext \
   -f 13 \
   -l 17 \
   -layout \
   -opw supersecret \
   -upw secret \
   -eol unix \
   -nopgbrk \
   /path/to/your/pdf \
   - | less

This will display the page range 13 (first page) to 17 (last page), preserving the layout of a PDF file protected by two passwords (the user and owner passwords secret and supersecret), using the Unix EOL convention and without inserting page breaks between PDF pages, piped through less…
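
If you want to call the same command from a program instead of the shell, the following Go sketch builds the same argument list and runs pdftotext through os/exec. It assumes pdftotext is on your PATH; the function name pdftotextArgs, the page range, and the passwords are placeholders copied from the example above.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
)

// pdftotextArgs builds the argument list used in the command above.
// All values passed in are placeholders chosen by the caller.
func pdftotextArgs(pdf string, first, last int, ownerPw, userPw string) []string {
	return []string{
		"-f", strconv.Itoa(first),
		"-l", strconv.Itoa(last),
		"-layout",
		"-opw", ownerPw,
		"-upw", userPw,
		"-eol", "unix",
		"-nopgbrk",
		pdf,
		"-", // write the extracted text to stdout
	}
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: extract /path/to/your/pdf")
		return
	}
	args := pdftotextArgs(os.Args[1], 13, 17, "supersecret", "secret")
	cmd := exec.Command("pdftotext", args...)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "pdftotext failed:", err)
	}
}
```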

