Learning Go: RPC between Go and C/C++/Ruby/etc

I would like to develop an online system in Go, but I have to rely
on some ready-to-use code in C/C++ (and maybe other languages), for
example, CJK word segmenters written in C/C++ or Java.

To invoke existing code from Go programs, we can build the existing
code into shared libraries and wrap them using SWIG, as discussed in
this post. An alternative to SWIG is to build the existing code into
an RPC server.

A well-known RPC mechanism is Apache Thrift, which officially
supports C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#,
Cocoa, JavaScript, Node.js, Smalltalk, and OCaml. Go is not in the
list, but we can resort to thrift4go.

Other than Thrift, MsgPack-RPC is another solution. A major difference
between them is that Thrift data and services must follow a predefined
schema (similar to Google Protocol Buffers), whereas MsgPack-RPC is
schema-free (the same idea behind JSON and XML).
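
To make the schema versus schema-free distinction concrete, here is a minimal Go sketch. It uses JSON from the standard library as a stand-in for MsgPack, since the two share the schema-free idea; the struct plays the role of a predefined schema. The names SegmentRequest, MaxWords, decodeWithSchema and decodeSchemaFree are made up for this illustration and do not come from Thrift or MsgPack-RPC.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SegmentRequest is a hypothetical message type. It stands in for a
// predefined schema: field names and types are fixed at compile time.
type SegmentRequest struct {
	Text     string `json:"text"`
	MaxWords int    `json:"max_words"`
}

// decodeWithSchema decodes into the fixed struct. Unknown fields are
// ignored, and a field with the wrong type makes decoding fail.
func decodeWithSchema(data []byte) (SegmentRequest, error) {
	var req SegmentRequest
	err := json.Unmarshal(data, &req)
	return req, err
}

// decodeSchemaFree accepts any JSON object; the caller has to inspect
// keys and value types at run time.
func decodeSchemaFree(data []byte) (map[string]interface{}, error) {
	var m map[string]interface{}
	err := json.Unmarshal(data, &m)
	return m, err
}

func main() {
	data := []byte(`{"text": "hello", "max_words": 3, "lang": "en"}`)

	typed, err := decodeWithSchema(data)
	fmt.Println(typed.Text, typed.MaxWords, err)

	loose, err := decodeSchemaFree(data)
	fmt.Println(len(loose), loose["lang"], err)
}
```

The schema-free side keeps the extra "lang" field without any change to the receiver, at the price of checking types by hand.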

Both Thrift and MsgPack-RPC can be used to build multithreaded
servers, at least when you program in C++. Thrift has several C++
server base classes, each of which defines a serving strategy. Among
them, TNonblockingServer supports thread per request and is very
likely what you want. The other serving strategies either process all
requests using a single thread and the polling trick, or dedicate a
thread to each connection (thread per connection). The single-thread
strategy does not make full use of your hardware, and the
thread-per-connection one is vulnerable to DoS attacks [1].

MsgPack-RPC simply makes the best choice for you: it always processes
requests in parallel, even if they all arrive over a single
connection [2].

In order to use MsgPack-RPC as a bridge between Go and C++ programs,
we need the following packages:

  1. MsgPack for C++, which serializes and parses data, can be
    installed from pre-built packages. On Mac OS X, type the command:

    brew install msgpack

  2. mpio, a dependency of MsgPack-RPC for C++, needs to be built
    from source code:

    git clone https://github.com/frsyuki/mpio.git
    cd mpio
    ./bootstrap
    ./configure --enable-static --prefix=<where-to-install-mpio>
    make && make install

  3. MsgPack-RPC for C++ needs to be built from source code:

    git clone https://github.com/bketelsen/msgpack-pack.git
    cd msgpack-rpc/cpp
    ./bootstrap
    ./configure --enable-static --prefix=<where-to-install> \
                --with-mpio=<where-to-install-mpio>
    make && make install

  4. MsgPack for Go needs to be built from source code:

    git clone https://github.com/bketelsen/msgpack.git
    cd msgpack/go
    make && make install

  5. MsgPack-RPC for Go needs to be built from source code:

    git clone https://github.com/bketelsen/msgpack-pack.git
    cd msgpack-rpc/go/rpc
    make && make install


Below are a CJK word segmenter RPC server written in C++ and a
simple RPC client written in Go. Without loss of generality, the
C++ code uses a proprietary word segmenter. (I modified the API
slightly to avoid legal issues.)

#include <iconv.h>
#include <msgpack/rpc/server.h>
#include <cstring>  // memset, strlen
#include <string>
#include "wordseg/simple_segmenter.h"

namespace rpc {
  using namespace msgpack;
  using namespace msgpack::rpc;
}  // namespace rpc

using std::string;

class WordSegmenter : public rpc::dispatcher {
public:
  typedef rpc::request request;


  void dispatch(request req) {
    try {
      std::string method;
      req.method().convert(&method);

      if (method == "segment") {
        msgpack::type::tuple<std::string> params;
        req.params().convert(&params);
        segment(req, params.get<0>());
      } else {
        req.error(msgpack::rpc::NO_METHOD_ERROR);
      }

    } catch (msgpack::type_error& e) {
      req.error(msgpack::rpc::ARGUMENT_ERROR);
      return;

    } catch (std::exception& e) {
      req.error(std::string(e.what()));
      return;
    }
  }

  bool ConvertUTF8ToGBK(const std::string& input,
                        char* output_buffer,
                        size_t buffer_size) const {
    if (output_buffer == NULL || buffer_size == 0) {
      return false;
    }

    iconv_t iconv_desc = iconv_open("gbk", "utf-8");
    if (iconv_desc == (iconv_t)(-1)) {  // iconv_open returns -1 on failure.
      return false;
    }

    // Zero-fill the buffer so that the output is always 0-terminated.
    // iconv() advances output_buffer, so it is not safe to write
    //  output_buffer[buffer_size - out_bytes_left] = '\0';
    // afterwards.
    memset(output_buffer, 0, buffer_size);

    char* input_buffer = const_cast<char*>(input.c_str());  // iconv API flaw.
    size_t in_bytes_left = input.size();
    size_t out_bytes_left = buffer_size - 1;  // Keep room for the terminator.
    const bool ok = iconv(iconv_desc, &input_buffer, &in_bytes_left,
                          &output_buffer, &out_bytes_left) != (size_t)(-1);

    iconv_close(iconv_desc);  // Release the descriptor on failure as well.
    return ok;
  }


  bool ConvertGBKToUTF8(const char* input,
                        char* output_buffer,
                        size_t buffer_size) const {
    if (input == NULL || output_buffer == NULL || buffer_size == 0) {
      return false;
    }

    iconv_t iconv_desc = iconv_open("utf-8", "gbk");
    if (iconv_desc == (iconv_t)(-1)) {  // iconv_open returns -1 on failure.
      return false;
    }

    // Zero-fill the buffer so that the output is always 0-terminated.
    // iconv() advances output_buffer, so it is not safe to write
    //  output_buffer[buffer_size - out_bytes_left] = '\0';
    // afterwards.
    memset(output_buffer, 0, buffer_size);

    char* input_buffer = const_cast<char*>(input);  // iconv API flaw.
    size_t in_bytes_left = strlen(input);
    size_t out_bytes_left = buffer_size - 1;  // Keep room for the terminator.
    const bool ok = iconv(iconv_desc, &input_buffer, &in_bytes_left,
                          &output_buffer, &out_bytes_left) != (size_t)(-1);

    iconv_close(iconv_desc);  // Release the descriptor on failure as well.
    return ok;
  }


  // Despite its name, this function takes UTF-8 input: the text is
  // converted to GBK for the segmenter, and each resulting word is
  // converted back to UTF-8.
  bool TokenizeGBKIntoUTF8(const string& input, std::string* output) {
    const int BUFFER_SIZE = 8*1024;  // 8KB in cap.
    char gbk_buffer[BUFFER_SIZE];
    if (!ConvertUTF8ToGBK(input, gbk_buffer, BUFFER_SIZE)) {
      return false;
    }

    HANDLE segmenter = CreateSegHandle();
    if (segmenter == NULL) {
      return false;
    }
    if (!Segment(segmenter, gbk_buffer)) {
      CloseSegHandle(segmenter);  // Do not leak the handle on failure.
      return false;
    }

    char word_buffer[BUFFER_SIZE];
    for (int i = 0; i < GetResultCnt(segmenter); ++i) {
      if (!ConvertGBKToUTF8(GetWordAt(segmenter, i),
                            word_buffer, BUFFER_SIZE)) {
        CloseSegHandle(segmenter);
        return false;
      }
      *output += string(word_buffer) + "\n";  // One word per line.
    }

    CloseSegHandle(segmenter);
    return true;
  }


  void segment(request req, const std::string& msg) {
    std::string output;
    if (!TokenizeGBKIntoUTF8(msg, &output)) {
      req.error(msgpack::rpc::ARGUMENT_ERROR);
      return;
    }
    req.result(output);
  }
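};

// Entry point. The port matches the default used by the Go client below;
// the number of worker threads (4) is an arbitrary choice.
int main() {
  rpc::server svr;
  WordSegmenter segmenter;
  svr.serve(&segmenter);
  svr.listen("0.0.0.0", 18811);
  svr.run(4);  // Run 4 threads.
  return 0;
}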
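
It may help to see what the dispatcher above actually receives. According to the MessagePack-RPC specification, a request is a four-element array [type, msgid, method, params] with type 0. The following Go sketch encodes such a request by hand for a call with one string parameter, which is the shape that dispatch() unpacks into a one-element tuple. The helper names appendString and encodeRequest are my own, and the sketch covers only short strings and small message IDs; a real client would use a MsgPack library instead.

```go
package main

import "fmt"

// appendString appends s in MessagePack string (raw) format. Only the
// fixed-length form and the 16-bit-length form are handled here.
func appendString(buf []byte, s string) []byte {
	n := len(s)
	switch {
	case n < 32:
		buf = append(buf, 0xa0|byte(n)) // length in the low 5 bits
	case n < 1<<16:
		buf = append(buf, 0xda, byte(n>>8), byte(n)) // big-endian 16-bit length
	default:
		panic("string too long for this sketch")
	}
	return append(buf, s...)
}

// encodeRequest builds [0, msgid, method, [param]]. In this sketch msgid
// must fit in a MessagePack positive fixint (0..127).
func encodeRequest(msgid byte, method, param string) []byte {
	if msgid > 0x7f {
		panic("msgid too large for this sketch")
	}
	buf := []byte{
		0x94,  // array with 4 elements
		0x00,  // type 0 = request
		msgid, // message ID, echoed back in the response
	}
	buf = appendString(buf, method)
	buf = append(buf, 0x91) // params: array with 1 element
	return appendString(buf, param)
}

func main() {
	// Prints: 94 00 01 a7 73 65 67 6d 65 6e 74 91 a2 68 69
	fmt.Printf("% x\n", encodeRequest(1, "segment", "hi"))
}
```

The response travels back as [1, msgid, error, result], which is how the client pairs each reply with its request.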

A simple client program in Go is as follows:

package main

import (
	"flag"
	"fmt"
	"msgpack/rpc"
	"net"
)

var server = flag.String("server", "127.0.0.1:18811", "the server name and port")

func main() {
	flag.Parse() // Without this call the -server flag is never read.

	conn, err := net.Dial("tcp", *server)
	if err != nil {
		fmt.Println("cannot connect to", *server, ":", err)
		return
	}
	defer conn.Close()

	client := rpc.NewSession(conn, true)

	msg := "用于存储词条及词条相关属性"
	retval, xerr := client.Send("segment", msg)
	if xerr != nil {
		panic(fmt.Sprint("segment failed: ", xerr))
	}
	fmt.Println(retval.String())
}
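
The C++ server returns the segmented words as a single string in which every word is followed by a newline. A client that wants the words as a slice could split the reply as in the sketch below; splitWords is a hypothetical helper, not part of the MsgPack-RPC package.

```go
package main

import (
	"fmt"
	"strings"
)

// splitWords turns a reply in which every word is followed by "\n" into
// a slice of words. Empty lines, including the one after the last
// newline, are dropped.
func splitWords(reply string) []string {
	return strings.FieldsFunc(reply, func(r rune) bool { return r == '\n' })
}

func main() {
	words := splitWords("用于\n存储\n词条\n")
	fmt.Printf("%d words: %q\n", len(words), words)
}
```

Using strings.FieldsFunc rather than strings.Split avoids the empty trailing element that the final newline would otherwise produce.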