Running Hadoop on Mac OS X (Single Node)

I installed Hadoop, built its C++ components, and built and ran Pipes programs on my iMac running Snow Leopard.

Installation and Configuration

Basically, I followed Michael G. Noll’s guide, Running Hadoop On Ubuntu Linux (Single-Node Cluster), with two differences.

On Mac OS X, we need to select Sun’s JVM; this can be done in System Preferences. Then, in both .bash_profile and $HADOOP_HOME/conf/hadoop-env.sh, set the JAVA_HOME environment variable:
export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/Home
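
A quick sanity check that the variable points at a working JVM (the exact version string will vary):

$JAVA_HOME/bin/java -version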

I did not create a special account for running Hadoop. (I should have, for security reasons, but I am lazy, and my iMac is only for personal development, not real computing…) So I need to chmod a+rwx /tmp/hadoop-yiwang, where yiwang is my account name, and also what ${user.name} expands to in core-site.xml.
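
For reference, this is the relevant property from Noll’s guide; it is the hadoop.tmp.dir setting in core-site.xml that makes Hadoop create the /tmp/hadoop-${user.name} directory:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/hadoop-${user.name}</value>
  <description>A base for other temporary directories.</description>
</property>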

After finishing the installation and configuration, we should be able to start all Hadoop services, build and run Hadoop Java programs, and monitor their activities.
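
With Hadoop 0.20.2 this amounts to something like the following; jps should list a NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker, and the NameNode and JobTracker web interfaces are at http://localhost:50070/ and http://localhost:50030/ respectively:

$HADOOP_HOME/bin/start-all.sh
jps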

Building C++ Components

Since I do not work in Java, I write Hadoop programs using Pipes. The following steps build the Pipes C++ libraries on Mac OS X:

  1. Install XCode and open a terminal window
  2. cd $HADOOP_HOME/src/c++/utils
  3. ./configure
  4. make install
  5. cd $HADOOP_HOME/src/c++/pipes
  6. ./configure
  7. make install

Note that you must build utils before pipes.

Building and Running Pipes Programs

The following command shows how to link a Pipes program against these libraries:

g++ -o wordcount wordcount.cc \
-I${HADOOP_HOME}/src/c++/install/include \
-L${HADOOP_HOME}/src/c++/install/lib \
-lhadooputils -lhadooppipes -lpthread
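
For completeness, here is a minimal sketch of such a wordcount.cc, closely following the word-count example on the Hadoop wiki (class names are my own):

#include <string>
#include <vector>
#include "hadoop/Pipes.hh"
#include "hadoop/TemplateFactory.hh"
#include "hadoop/StringUtils.hh"

// Emits <word, "1"> for every space-separated token in the input line.
class WordCountMap : public HadoopPipes::Mapper {
 public:
  WordCountMap(HadoopPipes::TaskContext& context) {}
  void map(HadoopPipes::MapContext& context) {
    std::vector<std::string> words =
        HadoopUtils::splitString(context.getInputValue(), " ");
    for (size_t i = 0; i < words.size(); ++i) {
      context.emit(words[i], "1");
    }
  }
};

// Sums the counts emitted for each word.
class WordCountReduce : public HadoopPipes::Reducer {
 public:
  WordCountReduce(HadoopPipes::TaskContext& context) {}
  void reduce(HadoopPipes::ReduceContext& context) {
    int sum = 0;
    while (context.nextValue()) {
      sum += HadoopUtils::toInt(context.getInputValue());
    }
    context.emit(context.getInputKey(), HadoopUtils::toString(sum));
  }
};

int main(int argc, char* argv[]) {
  return HadoopPipes::runTask(
      HadoopPipes::TemplateFactory<WordCountMap, WordCountReduce>());
}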

To run the program, we need a job configuration file, as shown on the Apache Hadoop wiki page.
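
For the word-count job, the configuration is roughly like the wiki’s example; the executable value is wherever you uploaded the binary in HDFS:

<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.pipes.executable</name>
    <value>path/to/wordcount/in/hdfs</value>
  </property>
  <property>
    <name>hadoop.pipes.java.recordreader</name>
    <value>true</value>
  </property>
  <property>
    <name>hadoop.pipes.java.recordwriter</name>
    <value>true</value>
  </property>
</configuration>

The job is then submitted along the lines of:

$HADOOP_HOME/bin/hadoop pipes -conf wordcount.xml -input input -output output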

Building libHDFS

There are some bugs in the libHDFS that ships with Apache Hadoop 0.20.2, but they are easy to fix:

  1. cd hadoop-0.20.2/src/c++/libhdfs
  2. ./configure
  3. Remove #include "error.h" from hdfsJniHelper.c
  4. Remove -Dsize_t=unsigned int from the Makefile
  5. make
  6. cp hdfs.h ../install/include/hadoop
  7. cp libhdfs.so ../install/lib
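
Once the library is in place, a short C++ program can exercise it. This is only a sketch against the standard libhdfs C API, using a hypothetical test path; running it requires the Hadoop jars on CLASSPATH (libhdfs starts a JVM through JNI) and DYLD_LIBRARY_PATH set as described next:

#include <fcntl.h>   // O_WRONLY, O_CREAT
#include <cstdio>
#include <cstring>
#include "hadoop/hdfs.h"

int main() {
  // Connect to the default file system named in core-site.xml.
  hdfsFS fs = hdfsConnect("default", 0);
  if (fs == NULL) {
    fprintf(stderr, "hdfsConnect failed\n");
    return 1;
  }
  // "/tmp/libhdfs-test.txt" is a hypothetical test path.
  hdfsFile out = hdfsOpenFile(fs, "/tmp/libhdfs-test.txt",
                              O_WRONLY | O_CREAT, 0, 0, 0);
  if (out == NULL) {
    fprintf(stderr, "hdfsOpenFile failed\n");
    return 1;
  }
  const char* msg = "hello from libhdfs\n";
  hdfsWrite(fs, out, (void*)msg, strlen(msg));
  hdfsCloseFile(fs, out);
  hdfsDisconnect(fs);
  return 0;
}

It compiles against the same install tree as the Pipes programs, e.g.:

g++ -o hdfstest hdfstest.cc \
-I${HADOOP_HOME}/src/c++/install/include \
-L${HADOOP_HOME}/src/c++/install/lib \
-lhdfs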

Since Mac OS X uses dyld to manage shared libraries, you need to specify the directory holding libhdfs.so using the environment variable DYLD_LIBRARY_PATH (LD_LIBRARY_PATH does not work):

export DYLD_LIBRARY_PATH=$HADOOP_HOME/src/c++/install/lib:$DYLD_LIBRARY_PATH

You might want to add the above line to your shell configuration file (e.g., ~/.bash_profile).