# Parsing Go Programs with Type Inference

I wrote an example program to demonstrate how a Go program can parse itself into an AST (abstract syntax tree) and infer types of nodes (expressions) in this tree.
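
The following is a minimal sketch of the idea rather than the example program itself. It assumes the standard `go/parser` and `go/types` packages, and it parses a small embedded source string instead of its own file so that no imports have to be resolved. The names `src` and `exprTypes` are made up for illustration.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
	"sort"
)

// src is a tiny, import-free program used as the input (illustrative only).
const src = `package demo

func add(a int, b float64) float64 {
	return float64(a) + b
}
`

// exprTypes parses source, type-checks it, and returns a map from the
// printed form of each expression to the type inferred for it.
func exprTypes(source string) (map[string]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "demo.go", source, 0)
	if err != nil {
		return nil, err
	}
	// Ask the type checker to record a type for every expression it visits.
	info := &types.Info{Types: make(map[ast.Expr]types.TypeAndValue)}
	conf := types.Config{} // no Importer needed: the input has no imports
	if _, err := conf.Check("demo", fset, []*ast.File{file}, info); err != nil {
		return nil, err
	}
	out := make(map[string]string)
	for expr, tv := range info.Types {
		out[types.ExprString(expr)] = tv.Type.String()
	}
	return out, nil
}

func main() {
	m, err := exprTypes(src)
	if err != nil {
		panic(err)
	}
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%-16s %s\n", k, m[k])
	}
}
```

To make a program inspect itself, the embedded string could be replaced by the program's own source file, in which case an importer (for example from `go/importer`) would presumably be needed to resolve its imports.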

# Build WSTP Programs on OSX 10.10 and Mathematica 10

I tried to write a C/C++ program that talks to Mathematica via WSTP. However, when I built the program using the Makefile provided in the directory /Applications/Mathematica.app/SystemFiles/Links/WSTP/DeveloperKit/MacOSX-x86-64/WSTPExamples, clang++ complained during linking:

    Undefined symbols for architecture x86_64:
      "std::__basic_file::is_open() const", referenced from:
          WSTP::MLLog::logSelectorToFileWithName(WSTP::mllogselector, char const*) in libWSTPi4.a(mllog.cpp.o)
      ...

This is because the libWSTPi4.a file bundled with Mathematica 10 was built with libstdc++ on Mac OS X, whereas clang++ on OS X 10.10 links against libc++ by default. Therefore, a solution is to add the link flag -stdlib=libstdc++ explicitly in the Makefile. For example:

    factor : factor.o
    	${CXX} ${EXTRA_CFLAGS} -I${INCDIR} factor.o -L${LIBDIR} -lWSTPi4 -lm -lpthread -stdlib=libstdc++ -lstdc++ -framework Foundation -o $@

An alternative solution is to use the libraries in /Applications/Mathematica.app/SystemFiles/Links/WSTP/DeveloperKit/MacOSX-x86-64/CompilerAdditions/AlternativeLibraries.

# Scala vs Go – Could people help compare/contrast these on relative merits/demerits?

Answer by Nick Snyder:

I have written Go at Google (and in my own time) and Scala at LinkedIn. Both are modern languages with first class concurrency features. What follows is my subjective comparison of Scala and Go…

Go is an opinionated, minimal language that compiles to machine code. Scala is a sophisticated, academic, functional, object-oriented, everything-but-the-kitchen-sink, free-for-all language that has a lot of features and runs on the JVM.

Given my experience with both, I would choose Go over Scala every time for one reason: simplicity.

Before I dive into my answer, I want to make a few general observations:

• All else equal, less code is easier to understand than more code. All else is rarely equal.
• Code is read much more often than it is written.
• Code frequently lives longer than we want it to.
• The person who tests or maintains a piece of code is frequently not the original author.
• At scale, the skill level of developers reading/writing/maintaining/testing code is going to be a normal distribution around the mean of "not expert."

Writing code is an act of communication, not just between the author and the compiler (or runtime), but also between the author and a future reader of unknown skill level.

Language complexity

Java 8 language spec is a 780 page PDF (lol, seriously?). http://docs.oracle.com/javase/sp…

Scala language spec is a 191 page PDF. http://www.scala-lang.org/docu/f…

The Go language spec is a webpage that prints as a 51 page PDF. http://golang.org/ref/spec

Defining a language is not the same as learning how to use a language, but it is a proxy for how much there is to learn (or how much there is to confuse you when reading someone else's code).

Documentation

I found Go easier to learn, both because it is (objectively) a simpler language and (subjectively) because I think the documentation is better.

Standard library: http://www.scala-lang.org/api/cu… vs http://golang.org/pkg/

Expressiveness

Given the language specs, it should be no surprise that Scala has more features than Go. Does that mean Go is any less expressive than Scala? No, you just might need to write a few extra lines of code to do what you want (but I don't think that is a bad thing).

One of Go's features is that it doesn't have an excess of features, and frankly, I think that feature is undervalued. http://golang.org/doc/faq#Why_do…

I have yet to encounter a situation in Go where I wished it had a feature that it doesn't.

The Sim City effect

Don't bother Googling this, I just made it up.

You know Sim City, right? If two people start a new city on the same map and play for a few hours, they will produce completely different looking cities. Why is that? Because Sim City is a sandbox and there are so many options that it is highly improbable that any two people will make the same series of decisions.

Scala is a sandbox too. As the other answer alluded to, in addition to the plethora of features that Scala ships with (functional programming features, OO programming features, etc.), Scala exposes capabilities that allow developers to add new features. This is all well and good until you need to share your sandbox with others and they have no idea what was going on in your head.

Speaking of consistency…

Go is the only language that I know of that has circumvented the whole code style debate by just providing a tool to format your code in the canonical format. http://golang.org/cmd/gofmt/

Code density

Scala code can be very dense and hard to grok at the early stages of learning. I would go so far as to say that it is idiomatic in Scala to be as dense as possible, or at least it seems that experienced Scala developers have a propensity for terseness.

Example

Consider fetching a user id from a cookie. How much language knowledge do you need to answer the following questions given the implementation?

• What happens if the cookie is not present?
• What happens if the cookie value is not a well formatted number?
• What happens if the cookie value is a negative number?

Scala

    import play.api.mvc.RequestHeader

    def getUserId()(implicit request: RequestHeader) = {
      request.cookies.get("uid").map(_.value.toLong).filter(_ > 0)
    }

Go

    import (
    	"fmt"
    	"net/http"
    	"strconv"
    )

    func getUserId(r *http.Request) (int64, error) {
    	c, err := r.Cookie("uid")
    	if err != nil {
    		return 0, err
    	}
    	i, err := strconv.ParseInt(c.Value, 10, 64)
    	if err != nil {
    		return 0, err
    	}
    	if i <= 0 {
    		return 0, fmt.Errorf("invalid user id")
    	}
    	return i, nil
    }

In this particular case, the Scala code is clearly shorter and perhaps more eloquent, but the point that I am trying to illustrate in general is that the Go code is explicit and the Scala code requires context to understand.

Be $#%!ing explicit

In my experience, explicit code has a lot of benefits.

• Explicit code is easier for novices and for non-authors to grok.
• Explicit code makes the error cases obvious.
• Explicit code makes the test cases obvious.
• Explicit code is easier to debug (try setting a breakpoint in the Scala code above).

I have found Go code to be much more explicit than Scala.

Performance

As a developer, the only performance that I care about is my development cycle because I have yet to encounter a runtime that isn't fast enough for what I want to accomplish. Developer time is more valuable than a computer's time anyway.

In my experience, Go is extremely fast to compile, and Scala is slow. YMMV.

To be fair

I learned Go before I learned Scala so I admit that I probably have a bias toward Go. My first reaction to Go was that its syntax was ugly (like C++) but learning Go felt like this:

I learned Scala out of necessity and have gotten used to it; I even like parts of it now, but learning Scala felt like this:


# Convert Chinese LaTeX Source to HTML and PDF

I am going to write a book in Chinese. I hope to publish its chapters on my blog so that I can get feedback before it is printed. This requires converting my manuscript into HTML (for publishing on my blog) and PDF (for printing).

I tried to write in Emacs Org mode, Wiki and Markdown. However, none of them supports equations well. So I decided to use LaTeX.

I tried several tools to convert LaTeX source into HTML, including htlatex and pandoc. htlatex does not support Chinese well, and pandoc supports only a small subset of LaTeX syntax. Finally, I decided to use Hevea, which works well for me.

I use XeLaTeX to convert LaTeX to PDF. Compared with PDFLaTeX, XeLaTeX works better with UTF-8 and TrueType Chinese fonts.

However, Hevea and XeLaTeX have different requirements for the preamble of the LaTeX source. So I created a separate template for each of them. These templates use LaTeX’s \input directive to include a LaTeX source file containing the real text.

An example project is at https://github.com/wangkuiyi/hevea-xelatex.

# Configure an HDFS for Development/Testing

I am using a Go implementation of the WebHDFS interface: https://github.com/vladimirvivien/gowfs. In order to test it, I need to set up an HDFS on my development computer (Mac OS X 10.8, Hadoop-2.2.0). The author, Vladimir Vivien, points out two properties needed to enable WebHDFS:

1. Enable dfs.webhdfs.enabled property in hdfs-site.xml

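However, those are not enough. If the append operation reports an error like the following:

    Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try.

then you also need to add the following properties to hdfs-site.xml. On a single-node development setup there is no spare datanode to swap into the write pipeline, so the replication factor is set to 1 and datanode replacement on failure is disabled: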

    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>

    <property>
      <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
      <value>false</value>
    </property>

# Editing CSV Files Using Emacs

CSV (comma-separated values) files are everywhere, though sometimes fields are separated by tabs or spaces instead of commas. I see many people edit CSV files using spreadsheet software like Microsoft Excel. I edit CSV files using Emacs, so I do not have to leave my programming environment.

Before you can edit your CSV file, make sure the values are separated by commas. If they are not, the following simple command can help:

    sed 's/\t/, /g' your_file > your_file.csv

This command converts every tab in your_file into a comma and a space. The resulting file, your_file.csv, can now be recognized by csv-mode.

After opening your_file.csv in Emacs, you might want to use M-x toggle-truncate-lines to disable the wrapping of long lines. Then, you can use M-x csv-align-fields to align the fields. This makes the file look like it does in Microsoft Excel.

# How to Sample a Dirichlet-Multinomial Distribution

Consider the problem of sampling from a multinomial distribution $Mult(\vec{x}|\vec{p}, n)$, where $\vec{p}$ is sampled from a Dirichlet prior distribution $Dir(\vec{p}|\vec{\alpha})$.

A conceptually straightforward solution is to sample $\vec{p}$ from $Dir(\vec{p}|\vec\alpha)$, and then to generate $n$ samples from the discrete distribution defined by $\vec{p}$. As described by Wikipedia, sampling $\vec{p}=\{p_1,\ldots,p_K\}$ can be done by drawing samples $\{y_1,\ldots,y_K\}$ from $K$ Gamma distributions: $y_k \sim \Gamma(\alpha_k, 1) \text{, } k\in[1,K]$, and then obtaining $\vec{p}$ by normalizing $y_k$: $p_k = y_k/(\sum_k y_k)$. According to Wikipedia, if $\alpha_k$ is a positive integer, we have $\sum_{i=1}^{\alpha_k} - \log U_i \sim \Gamma(\alpha_k, 1)$, where $U_i$ is a sample drawn from the uniform distribution over $(0, 1]$. However, if the $\alpha_k$'s are not positive integers, sampling from Gamma becomes a complex procedure.

Even if we implement the algorithm that draws samples from Gamma and then Dirichlet, it would not be numerically robust. When $U_i$ is close to 0, $-\log U_i$ grows without bound, and it evaluates to Inf if the random number generator returns exactly 0. Another danger is that if all $K$ draws give $y_k=0$, then $p_k=y_k/(\sum_k y_k)$ either triggers a divide-by-zero error or makes $p_k$ NaN.

Fortunately, we can make use of the conjugacy between Dirichlet and multinomial. This conjugacy, as explained in the textbook Pattern Recognition and Machine Learning, lets us interpret $\alpha_k$ as the prior number of observations of the multinomial outcome $k$. This leads to the following simple sampling method, which can be generalized further to sample from Dirichlet processes:

1. $\vec{p} = \vec\alpha$, $\vec{x} = \vec{0}$, $i = 0$
2. $k \sim Discrete(\vec{p})$, i.e., draw $k$ with probability proportional to $p_k$
3. $p_k = p_k+1$, $x_k=x_k+1$, $i=i+1$
4. while $i < n$, goto 2.
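
A short justification of these steps, using notation that is not in the text above: let $z_1,\ldots,z_n$ denote the individual draws and $n_k^{(i)}$ the number of the first $i$ draws equal to $k$. Integrating out $\vec{p}$ under the Dirichlet prior gives the predictive probability

$$P(z_{i+1}=k \mid z_1,\ldots,z_i) = \frac{\alpha_k + n_k^{(i)}}{\sum_{j=1}^{K} \alpha_j + i}.$$

After $i$ iterations the working vector satisfies $p_k = \alpha_k + n_k^{(i)}$ and its entries sum to $\sum_j \alpha_j + i$, so drawing $k$ with probability proportional to $p_k$ in step 2 samples exactly from this predictive distribution. Chaining the $n$ draws therefore yields counts $\vec{x}$ distributed as the Dirichlet-multinomial, without ever sampling $\vec{p}$ explicitly.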

Full Go code is as follows:

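    // sampleDirichletMultinomial draws n outcomes from the Dirichlet-multinomial
    // distribution with parameter alpha and returns the count of each outcome.
    func sampleDirichletMultinomial(alpha []float64, n int, rng *rand.Rand) []int {
    	dist := make([]float64, len(alpha)) // unnormalized weights, starts as alpha
    	copy(dist, alpha)
    	hist := make([]int, len(alpha))
    	for i := 0; i < n; i++ {
    		k := sampleDiscrete(dist, rng)
    		dist[k] += 1.0 // each observation of k makes k more likely next time
    		hist[k]++
    	}
    	return hist
    }

    func sampleDiscrete(dist []float64, rng *rand.Rand) int {
    	if len(dist) <= 0 {
    		panic("sample from empty distribution")
    	}
    	sum := 0.0
    	for _, v := range dist {
    		if v < 0 {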