Set Up HDFS and Its Go Binding on Mac OS X

I am planning a distributed machine learning system that combines the Go programming language with the HDFS distributed filesystem, so I tried setting up HDFS and its Go binding on my iMac. Luckily, the work turned out to be straightforward.

Install Hadoop

I downloaded the stable release, version 1.0.3, and unpacked the tarball.

Install Java

On Mac OS X, run /Applications/Utilities/Java Preferences to install Sun Java SE 1.6.0 and set it as the default Java implementation.

Set Up Hadoop

The single-node setup strictly follows docs/single_node_setup.html, including the pseudo-distributed operation.

Build libhdfs

To build libhdfs on Mac OS X, follow the instructions at https://github.com/forward/node-hdfs.

Test libhdfs from C

Then we can write a sample C/C++ program:

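#include <stdio.h>
#include "hdfs.h"

int main() {
  // Host and port must match the NameNode address in core-site.xml.
  hdfsFS fs = hdfsConnect("localhost", 9000);
  if (fs == NULL) {
    fprintf(stderr, "Failed to connect to HDFS\n");
    return 1;
  }

  // List the entries under the relative path "./" on HDFS.
  int numEntries = 0;
  hdfsFileInfo* files = hdfsListDirectory(fs, "./", &numEntries);
  if (files != NULL) {
    for (int i = 0; i < numEntries; ++i) {
      printf("%s\n", files[i].mName);
    }
    hdfsFreeFileInfo(files, numEntries);
  }
  return hdfsDisconnect(fs);
}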

Note that localhost:9000 is the address we wrote in core-site.xml when configuring the Hadoop single node.
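As an aside, the relation between the core-site.xml value and the two arguments of hdfsConnect can be sketched in Go. This is only an illustration under my own assumptions: the single-node guide sets fs.default.name to hdfs://localhost:9000, and the helper name splitNameNodeURI is made up for this post, not part of libhdfs or any Go binding.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// splitNameNodeURI (hypothetical helper) splits a URI such as
// "hdfs://localhost:9000" into the host and port that hdfsConnect expects.
func splitNameNodeURI(uri string) (string, int, error) {
	u, err := url.Parse(uri)
	if err != nil {
		return "", 0, err
	}
	if u.Scheme != "hdfs" {
		return "", 0, fmt.Errorf("unexpected scheme %q in %q", u.Scheme, uri)
	}
	port, err := strconv.Atoi(u.Port())
	if err != nil {
		return "", 0, fmt.Errorf("missing or invalid port in %q: %v", uri, err)
	}
	return u.Hostname(), port, nil
}

func main() {
	// Assumed value of fs.default.name from the single-node setup.
	host, port, err := splitNameNodeURI("hdfs://localhost:9000")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("hdfsConnect(%q, %d)\n", host, port)
}
```

If your core-site.xml uses a different host or port, the arguments in the C sample have to change accordingly.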

To build the sample program:

     g++ learn-hdfs.cc -o learn-hdfs -I/Users/wangyi/hadoop-1.0.3/src/c++/libhdfs -I/System/Library/Frameworks/JavaVM.framework/Versions/A/Headers  -L/Users/wangyi/hadoop-1.0.3/src/c++/libhdfs/.libs -framework JavaVM -lhdfs

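Note the two include paths (one for libhdfs, one for the JavaVM framework headers) and the -framework JavaVM option, which links the program against the JavaVM framework on Mac OS X.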

Set Up CLASSPATH

To run the sample program, we need to set up CLASSPATH, as instructed in docs/libhdfs.html.

Build libhdfs Go Binding

I use https://github.com/zyxar/hdfs.go, but it needs some modifications:

  1. The project (and directory) name is hdfs.go, but the package name is hdfs. I renamed the directory to hdfs.
  2. To build on Mac OS X, we need to change the #cgo directives in hdfs.go:
         // #cgo darwin CFLAGS: -I/Users/wangyi/hadoop-1.0.3/src/c++/libhdfs -I/System/Library/Frameworks/JavaVM.framework/Versions/A/Headers
         // #cgo darwin LDFLAGS: -L/usr/lib/java -L/Users/wangyi/hadoop-1.0.3/src/c++/install/lib -lhdfs -framework JavaVM
    

Note that the library directory should contain not only libhdfs.a, but also the dynamic library.

Test libhdfs Go Binding

Run go test to test the binding. The tests exercise the append function of HDFS, which is disabled by default. To enable it, modify conf/hdfs-site.xml and add the following:

<property>
  <name>dfs.support.append</name>
  <value>true</value>
</property>

Troubleshooting

“Unable to load realm info from SCDynamicStore.” This is due to a JRE bug and does not harm Hadoop. As discussed in https://issues.apache.org/jira/browse/HADOOP-7489, the Hadoop team won’t fix it.

“Unable to load native-hadoop library for your platform… using builtin-java classes where applicable.” As stated in http://hadoop.apache.org/docs/r1.0.3/native_libraries.html#Supported+Platforms, the native hadoop library is supported on *nix platforms only. The library does not work with Cygwin or the Mac OS X platform.
