Install and Configure Hadoop on Mac OS X

Download and Install

  1. Download Hadoop (version 0.20.1 at the time of writing) and unpack it into, say, ~wyi/hadoop-0.20.1.
  2. Install JDK 1.6 for Mac OS X.
  3. Edit your ~/.bash_profile to add the following lines:
    export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/Home
    export HADOOP_HOME=~wyi/hadoop-0.20.1
    export PATH=$HADOOP_HOME/bin:$PATH
  4. Edit ~wyi/hadoop-0.20.1/conf/hadoop-env.sh to define the JAVA_HOME variable as:
    export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/Home
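  5. Open a new terminal (or run source ~/.bash_profile) and run the command hadoop. If the installation works, it prints a usage message.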
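
If the hadoop command does not start, the cause is often an unset or wrong environment variable. The following is a minimal sketch of a sanity check for the settings above. The function name check_hadoop_env is hypothetical (it is not part of Hadoop), and the check assumes that a usable JDK has an executable bin/java under JAVA_HOME.

```shell
# check_hadoop_env: hypothetical helper, not shipped with Hadoop.
# Reports whether JAVA_HOME points at a JDK and whether hadoop is on PATH.
check_hadoop_env() {
    rc=0
    # Assumption: a usable JDK has an executable at $JAVA_HOME/bin/java.
    if [ -z "${JAVA_HOME:-}" ] || [ ! -x "$JAVA_HOME/bin/java" ]; then
        echo "JAVA_HOME is unset or has no bin/java: '${JAVA_HOME:-}'"
        rc=1
    fi
    # hadoop is found only if $HADOOP_HOME/bin was added to PATH.
    if ! command -v hadoop >/dev/null 2>&1; then
        echo "hadoop is not on PATH; check HADOOP_HOME and PATH"
        rc=1
    fi
    [ "$rc" -eq 0 ] && echo "environment looks OK"
    return "$rc"
}
```

Paste the function into a terminal after reloading ~/.bash_profile and run check_hadoop_env. It returns 0 when both checks pass and 1 otherwise.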

Run An Example Program

By default, Hadoop is configured to run in non-distributed mode, as a single Java process. This is useful for debugging. The following example copies the XML files from the unpacked conf directory to use as input, then finds and displays every match of the given regular expression. Output is written to the given output directory.

    cd ~wyi/hadoop-0.20.1
    mkdir input
    cp conf/*.xml input
    bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
    cat output/*

Note that before you re-run this example, you need to delete the output directory. Otherwise, Hadoop will fail with an error saying that the output directory already exists.
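
The cleanup before a re-run can be sketched as a small shell helper. The name reset_output_dir is hypothetical, not a Hadoop command. The sketch also assumes the non-distributed mode described above, where the output directory lives on the local file system.

```shell
# reset_output_dir: hypothetical helper, not a Hadoop command.
# Deletes a local output directory left over from a previous run.
# Assumes non-distributed mode, where output is on the local file system.
reset_output_dir() {
    dir=${1:-output}            # defaults to the "output" directory used above
    if [ -d "$dir" ]; then
        rm -rf -- "$dir"
        echo "removed $dir"
    fi
}
```

From the Hadoop directory, run reset_output_dir output and then repeat the bin/hadoop jar command from the example. The function does nothing if the directory is already gone.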