Integrate Google Open Source Packages

November 21, 2010

Given the solution for making protobuf-2.3.0 use glog-0.3.1 (described in my previous post), we can integrate the following Google open source projects:

  1. gflags
  2. gtest
  3. gmock
  4. glog
  5. protobuf

My friend Shuo pioneered such an integration earlier this year, on Ubuntu and with earlier versions of the above packages. His solution can be accessed here. This post updates Shuo’s solution:

  1. Shuo’s fix to make glog use libunwind correctly has been accepted into glog-0.3.1, so it no longer needs to be considered here.
  2. The dependency graph is updated. The new graph implies a unique order in which to build the above packages.
  3. The dependency edge between protobuf and glog does not exist until we apply the change described in this previous post (to make protobuf-2.3.0 use glog-0.3.1).

My experiments were conducted on Snow Leopard, and with the following packages: gflags-1.4, gtest-1.5.0, gmock-1.5.0, glog-0.3.1, protobuf-2.3.0.


Make Google ProtoBuf Use Google Glog

November 21, 2010

When I changed my code to use glog instead of the logging module I wrote for MRML, I found that protobuf does not use glog but a simple bundled logging implementation. So most log messages generated by my program are written into log files, but those from protobuf are not. An example is the complaint about non-UTF-8 protobuf strings.

So I changed the following source files of protobuf-2.3.0 to make it use glog: configure.ac, Makefile.am, common.h and common.cc. My diff is attached below for your reference.

--- /Users/wangyi/svnclient/protobuf-2.3.0/configure.ac	2010-01-09 03:20:15.000000000 +0800
+++ configure.ac	2010-11-20 21:52:55.000000000 +0800
@@ -138,6 +138,42 @@
 ACX_PTHREAD
 AC_CXX_STL_HASH
 
+AC_ARG_WITH(glog, AS_HELP_STRING([--with-glog=GLOG_DIR], [path to the glog installation]),
+  GLOG_CFLAGS="-I${with_glog}/include"
+  GLOG_LIBS="-L${with_glog}/lib -lglog"
+  CFLAGS="$CFLAGS $GLOG_CFLAGS"
+  CXXFLAGS="$CXXFLAGS $GLOG_CFLAGS"
+  LIBS="$LIBS $GLOG_LIBS"
+)
+AC_CHECK_LIB(glog, main, ac_cv_have_libglog=1, ac_cv_have_libglog=0)
+if test x"$ac_cv_have_libglog" = x"1"; then
+  AC_DEFINE(HAVE_LIB_GLOG, 1, [define if you have google glog library])
+  if test x"$GLOG_LIBS" = x""; then
+    GLOG_LIBS="-lglog"
+  fi
+else
+  GLOG_CFLAGS=
+  GLOG_LIBS=
+fi
+
+AC_ARG_WITH(gflags, AS_HELP_STRING([--with-gflags=GFLAGS_DIR], [path to the gflags installation]),
+  GFLAGS_CFLAGS="-I${with_gflags}/include"
+  GFLAGS_LIBS="-L${with_gflags}/lib -lgflags"
+  CFLAGS="$CFLAGS $GFLAGS_CFLAGS"
+  CXXFLAGS="$CXXFLAGS $GFLAGS_CFLAGS"
+  LIBS="$LIBS $GFLAGS_LIBS"
+)
+AC_CHECK_LIB(gflags, main, ac_cv_have_libgflags=1, ac_cv_have_libgflags=0)
+if test x"$ac_cv_have_libgflags" = x"1"; then
+  AC_DEFINE(HAVE_LIB_GFLAGS, 1, [define if you have google gflags library])
+  if test x"$GFLAGS_LIBS" = x""; then
+    GFLAGS_LIBS="-lgflags"
+  fi
+else
+  GFLAGS_CFLAGS=
+  GFLAGS_LIBS=
+fi
+
 # HACK:  Make gtest's configure script pick up our copy of CFLAGS and CXXFLAGS,
 #   since the flags added by ACX_CHECK_SUNCC must be used when compiling gtest
 #   too.
--- /Users/wangyi/svnclient/protobuf-2.3.0/src/Makefile.am	2010-01-09 03:19:11.000000000 +0800
+++ src/Makefile.am	2010-11-20 23:16:56.000000000 +0800
@@ -76,8 +76,9 @@
 
 lib_LTLIBRARIES = libprotobuf-lite.la libprotobuf.la libprotoc.la
 
-libprotobuf_lite_la_LIBADD = $(PTHREAD_LIBS)
+libprotobuf_lite_la_LIBADD = $(PTHREAD_LIBS) $(GLOG_LIBS) $(GFLAGS_LIBS)
 libprotobuf_lite_la_LDFLAGS = -version-info 6:0:0 -export-dynamic -no-undefined
+libprotobuf_lite_la_CXXFLAGS = $(GLOG_CFLAGS) $(GFLAGS_CFLAGS)
 libprotobuf_lite_la_SOURCES =                                  \
   google/protobuf/stubs/common.cc                              \
   google/protobuf/stubs/once.cc                                \
@@ -95,8 +96,9 @@
   google/protobuf/io/zero_copy_stream.cc                       \
   google/protobuf/io/zero_copy_stream_impl_lite.cc
 
-libprotobuf_la_LIBADD = $(PTHREAD_LIBS)
+libprotobuf_la_LIBADD = $(PTHREAD_LIBS) $(GLOG_LIBS) $(GFLAGS_LIBS)
 libprotobuf_la_LDFLAGS = -version-info 6:0:0 -export-dynamic -no-undefined
+libprotobuf_la_CXXFLAGS = $(GLOG_CFLAGS) $(GFLAGS_CFLAGS)
 libprotobuf_la_SOURCES =                                       \
   $(libprotobuf_lite_la_SOURCES)                               \
   google/protobuf/stubs/strutil.cc                             \
@@ -181,7 +183,8 @@
   google/protobuf/compiler/python/python_generator.cc
 
 bin_PROGRAMS = protoc
-protoc_LDADD = $(PTHREAD_LIBS) libprotobuf.la libprotoc.la
+protoc_LDADD = $(PTHREAD_LIBS) $(GLOG_LIBS) $(GFLAGS_LIBS) libprotobuf.la libprotoc.la
+protoc_CXXFLAGS = $(GLOG_CFLAGS) $(GFLAGS_CFLAGS)
 protoc_SOURCES = google/protobuf/compiler/main.cc
 
 # Tests ==============================================================
--- /Users/wangyi/svnclient/protobuf-2.3.0/src/google/protobuf/stubs/common.h	2010-01-09 03:19:05.000000000 +0800
+++ src/google/protobuf/stubs/common.h	2010-11-20 23:07:02.000000000 +0800
@@ -48,6 +48,8 @@
 #include <stdint.h>
 #endif
 
+#include "glog/logging.h"
+
 namespace std {}
 
 namespace google {
@@ -588,64 +590,6 @@
 // ===================================================================
 // emulates google3/base/logging.h
 
-enum LogLevel {
-  LOGLEVEL_INFO,     // Informational.  This is never actually used by
-                     // libprotobuf.
-  LOGLEVEL_WARNING,  // Warns about issues that, although not technically a
-                     // problem now, could cause problems in the future.  For
-                     // example, a // warning will be printed when parsing a
-                     // message that is near the message size limit.
-  LOGLEVEL_ERROR,    // An error occurred which should never happen during
-                     // normal use.
-  LOGLEVEL_FATAL,    // An error occurred from which the library cannot
-                     // recover.  This usually indicates a programming error
-                     // in the code which calls the library, especially when
-                     // compiled in debug mode.
-
-#ifdef NDEBUG
-  LOGLEVEL_DFATAL = LOGLEVEL_ERROR
-#else
-  LOGLEVEL_DFATAL = LOGLEVEL_FATAL
-#endif
-};
-
-namespace internal {
-
-class LogFinisher;
-
-class LIBPROTOBUF_EXPORT LogMessage {
- public:
-  LogMessage(LogLevel level, const char* filename, int line);
-  ~LogMessage();
-
-  LogMessage& operator<<(const string& value);
-  LogMessage& operator<<(const char* value);
-  LogMessage& operator<<(char value);
-  LogMessage& operator<<(int value);
-  LogMessage& operator<<(uint value);
-  LogMessage& operator<<(long value);
-  LogMessage& operator<<(unsigned long value);
-  LogMessage& operator<<(double value);
-
- private:
-  friend class LogFinisher;
-  void Finish();
-
-  LogLevel level_;
-  const char* filename_;
-  int line_;
-  string message_;
-};
-
-// Used to make the entire "LOG(BLAH) << etc." expression have a void return
-// type and print a newline after each message.
-class LIBPROTOBUF_EXPORT LogFinisher {
- public:
-  void operator=(LogMessage& other);
-};
-
-}  // namespace internal
-
 // Undef everything in case we're being mixed with some other Google library
 // which already defined them itself.  Presumably all Google libraries will
 // support the same syntax for these so it should not be a big deal if they
@@ -670,78 +614,25 @@
 #undef GOOGLE_DCHECK_GT
 #undef GOOGLE_DCHECK_GE
 
-#define GOOGLE_LOG(LEVEL)                                                 \
-  ::google::protobuf::internal::LogFinisher() =                           \
-    ::google::protobuf::internal::LogMessage(                             \
-      ::google::protobuf::LOGLEVEL_##LEVEL, __FILE__, __LINE__)
-#define GOOGLE_LOG_IF(LEVEL, CONDITION) \
-  !(CONDITION) ? (void)0 : GOOGLE_LOG(LEVEL)
-
-#define GOOGLE_CHECK(EXPRESSION) \
-  GOOGLE_LOG_IF(FATAL, !(EXPRESSION)) << "CHECK failed: " #EXPRESSION ": "
-#define GOOGLE_CHECK_EQ(A, B) GOOGLE_CHECK((A) == (B))
-#define GOOGLE_CHECK_NE(A, B) GOOGLE_CHECK((A) != (B))
-#define GOOGLE_CHECK_LT(A, B) GOOGLE_CHECK((A) <  (B))
-#define GOOGLE_CHECK_LE(A, B) GOOGLE_CHECK((A) <= (B))
-#define GOOGLE_CHECK_GT(A, B) GOOGLE_CHECK((A) >  (B))
-#define GOOGLE_CHECK_GE(A, B) GOOGLE_CHECK((A) >= (B))
-
-#ifdef NDEBUG
-
-#define GOOGLE_DLOG GOOGLE_LOG_IF(INFO, false)
-
-#define GOOGLE_DCHECK(EXPRESSION) while(false) GOOGLE_CHECK(EXPRESSION)
-#define GOOGLE_DCHECK_EQ(A, B) GOOGLE_DCHECK((A) == (B))
-#define GOOGLE_DCHECK_NE(A, B) GOOGLE_DCHECK((A) != (B))
-#define GOOGLE_DCHECK_LT(A, B) GOOGLE_DCHECK((A) <  (B))
-#define GOOGLE_DCHECK_LE(A, B) GOOGLE_DCHECK((A) <= (B))
-#define GOOGLE_DCHECK_GT(A, B) GOOGLE_DCHECK((A) >  (B))
-#define GOOGLE_DCHECK_GE(A, B) GOOGLE_DCHECK((A) >= (B))
-
-#else  // NDEBUG
-
-#define GOOGLE_DLOG GOOGLE_LOG
-
-#define GOOGLE_DCHECK    GOOGLE_CHECK
-#define GOOGLE_DCHECK_EQ GOOGLE_CHECK_EQ
-#define GOOGLE_DCHECK_NE GOOGLE_CHECK_NE
-#define GOOGLE_DCHECK_LT GOOGLE_CHECK_LT
-#define GOOGLE_DCHECK_LE GOOGLE_CHECK_LE
-#define GOOGLE_DCHECK_GT GOOGLE_CHECK_GT
-#define GOOGLE_DCHECK_GE GOOGLE_CHECK_GE
-
-#endif  // !NDEBUG
-
-typedef void LogHandler(LogLevel level, const char* filename, int line,
-                        const string& message);
-
-// The protobuf library sometimes writes warning and error messages to
-// stderr.  These messages are primarily useful for developers, but may
-// also help end users figure out a problem.  If you would prefer that
-// these messages be sent somewhere other than stderr, call SetLogHandler()
-// to set your own handler.  This returns the old handler.  Set the handler
-// to NULL to ignore log messages (but see also LogSilencer, below).
-//
-// Obviously, SetLogHandler is not thread-safe.  You should only call it
-// at initialization time, and probably not from library code.  If you
-// simply want to suppress log messages temporarily (e.g. because you
-// have some code that tends to trigger them frequently and you know
-// the warnings are not important to you), use the LogSilencer class
-// below.
-LIBPROTOBUF_EXPORT LogHandler* SetLogHandler(LogHandler* new_func);
-
-// Create a LogSilencer if you want to temporarily suppress all log
-// messages.  As long as any LogSilencer objects exist, non-fatal
-// log messages will be discarded (the current LogHandler will *not*
-// be called).  Constructing a LogSilencer is thread-safe.  You may
-// accidentally suppress log messages occurring in another thread, but
-// since messages are generally for debugging purposes only, this isn't
-// a big deal.  If you want to intercept log messages, use SetLogHandler().
-class LIBPROTOBUF_EXPORT LogSilencer {
- public:
-  LogSilencer();
-  ~LogSilencer();
-};
+#define GOOGLE_LOG LOG
+#define GOOGLE_LOG_IF LOG_IF
+
+#define GOOGLE_CHECK CHECK
+#define GOOGLE_CHECK_EQ CHECK_EQ
+#define GOOGLE_CHECK_NE CHECK_NE
+#define GOOGLE_CHECK_LT CHECK_LT
+#define GOOGLE_CHECK_LE CHECK_LE
+#define GOOGLE_CHECK_GT CHECK_GT
+#define GOOGLE_CHECK_GE CHECK_GE
+
+#define GOOGLE_DLOG DLOG
+#define GOOGLE_DCHECK DCHECK
+#define GOOGLE_DCHECK_EQ DCHECK_EQ
+#define GOOGLE_DCHECK_NE DCHECK_NE
+#define GOOGLE_DCHECK_LT DCHECK_LT
+#define GOOGLE_DCHECK_LE DCHECK_LE
+#define GOOGLE_DCHECK_GT DCHECK_GT
+#define GOOGLE_DCHECK_GE DCHECK_GE
 
 // ===================================================================
 // emulates google3/base/callback.h
--- /Users/wangyi/svnclient/protobuf-2.3.0/src/google/protobuf/stubs/common.cc	2010-01-09 03:19:05.000000000 +0800
+++ src/google/protobuf/stubs/common.cc	2010-11-20 22:49:49.000000000 +0800
@@ -99,132 +99,6 @@
 
 }  // namespace internal
 
-// ===================================================================
-// emulates google3/base/logging.cc
-
-namespace internal {
-
-void DefaultLogHandler(LogLevel level, const char* filename, int line,
-                       const string& message) {
-  static const char* level_names[] = { "INFO", "WARNING", "ERROR", "FATAL" };
-
-  // We use fprintf() instead of cerr because we want this to work at static
-  // initialization time.
-  fprintf(stderr, "libprotobuf %s %s:%d] %s\n",
-          level_names[level], filename, line, message.c_str());
-  fflush(stderr);  // Needed on MSVC.
-}
-
-void NullLogHandler(LogLevel level, const char* filename, int line,
-                    const string& message) {
-  // Nothing.
-}
-
-static LogHandler* log_handler_ = &DefaultLogHandler;
-static int log_silencer_count_ = 0;
-
-static Mutex* log_silencer_count_mutex_ = NULL;
-GOOGLE_PROTOBUF_DECLARE_ONCE(log_silencer_count_init_);
-
-void DeleteLogSilencerCount() {
-  delete log_silencer_count_mutex_;
-  log_silencer_count_mutex_ = NULL;
-}
-void InitLogSilencerCount() {
-  log_silencer_count_mutex_ = new Mutex;
-  OnShutdown(&DeleteLogSilencerCount);
-}
-void InitLogSilencerCountOnce() {
-  GoogleOnceInit(&log_silencer_count_init_, &InitLogSilencerCount);
-}
-
-LogMessage& LogMessage::operator<<(const string& value) {
-  message_ += value;
-  return *this;
-}
-
-LogMessage& LogMessage::operator<<(const char* value) {
-  message_ += value;
-  return *this;
-}
-
-// Since this is just for logging, we don't care if the current locale changes
-// the results -- in fact, we probably prefer that.  So we use snprintf()
-// instead of Simple*toa().
-#undef DECLARE_STREAM_OPERATOR
-#define DECLARE_STREAM_OPERATOR(TYPE, FORMAT)                       \
-  LogMessage& LogMessage::operator<<(TYPE value) {                  \
-    /* 128 bytes should be big enough for any of the primitive */   \
-    /* values which we print with this, but well use snprintf() */  \
-    /* anyway to be extra safe. */                                  \
-    char buffer[128];                                               \
-    snprintf(buffer, sizeof(buffer), FORMAT, value);                \
-    /* Guard against broken MSVC snprintf(). */                     \
-    buffer[sizeof(buffer)-1] = '\0';                                \
-    message_ += buffer;                                             \
-    return *this;                                                   \
-  }
-
-DECLARE_STREAM_OPERATOR(char         , "%c" )
-DECLARE_STREAM_OPERATOR(int          , "%d" )
-DECLARE_STREAM_OPERATOR(uint         , "%u" )
-DECLARE_STREAM_OPERATOR(long         , "%ld")
-DECLARE_STREAM_OPERATOR(unsigned long, "%lu")
-DECLARE_STREAM_OPERATOR(double       , "%g" )
-#undef DECLARE_STREAM_OPERATOR
-
-LogMessage::LogMessage(LogLevel level, const char* filename, int line)
-  : level_(level), filename_(filename), line_(line) {}
-LogMessage::~LogMessage() {}
-
-void LogMessage::Finish() {
-  bool suppress = false;
-
-  if (level_ != LOGLEVEL_FATAL) {
-    InitLogSilencerCountOnce();
-    MutexLock lock(log_silencer_count_mutex_);
-    suppress = internal::log_silencer_count_ > 0;
-  }
-
-  if (!suppress) {
-    internal::log_handler_(level_, filename_, line_, message_);
-  }
-
-  if (level_ == LOGLEVEL_FATAL) {
-    abort();
-  }
-}
-
-void LogFinisher::operator=(LogMessage& other) {
-  other.Finish();
-}
-
-}  // namespace internal
-
-LogHandler* SetLogHandler(LogHandler* new_func) {
-  LogHandler* old = internal::log_handler_;
-  if (old == &internal::NullLogHandler) {
-    old = NULL;
-  }
-  if (new_func == NULL) {
-    internal::log_handler_ = &internal::NullLogHandler;
-  } else {
-    internal::log_handler_ = new_func;
-  }
-  return old;
-}
-
-LogSilencer::LogSilencer() {
-  internal::InitLogSilencerCountOnce();
-  MutexLock lock(internal::log_silencer_count_mutex_);
-  ++internal::log_silencer_count_;
-};
-
-LogSilencer::~LogSilencer() {
-  internal::InitLogSilencerCountOnce();
-  MutexLock lock(internal::log_silencer_count_mutex_);
-  --internal::log_silencer_count_;
-};
 
 // ===================================================================
 // emulates google3/base/callback.cc

Change gflags to Make It Work With boost::program_options

November 17, 2010

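I changed gflags-1.4/src/gflags.cc so that it does not treat “undefined” flags as errors, which would cause the program to fail. The patch is as follows (the diff compares my modified file against the original, so the “<” lines are my version and the “>” lines are the original code):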

1049,1058c1049,1050
<       // NOTE(yiwang): here the original code invokes
<       //
<       //   undefined_names_[key] = "";    // value isn't actually used
<       //   error_flags_[key] = error_message;
<       //
<       // however, I commented these two lines out, because SplitArgumentLocked
<       // returns NULL only if it does not recognize the flag name, and we do
<       // not want to treat unrecognizable flag names as errors, when we want to
<       // make gflags work together with other flag parsing toolkit such as
<       // boost::program_options.
---
>       undefined_names_[key] = "";    // value isn't actually used
>       error_flags_[key] = error_message;

A test program is as follows:

#include <sys/utsname.h>

#include <cstdlib>                      // getenv
#include <iostream>
#include <string>
#include <vector>

#include <boost/program_options/option.hpp>
#include <boost/program_options/options_description.hpp>
#include <boost/program_options/variables_map.hpp>
#include <boost/program_options/parsers.hpp>

#include "glog/logging.h"
#include "gflags/gflags.h"
#include "mpi.h"

#include "../strutil/stringprintf.hh"

using std::string;
using std::vector;

DEFINE_string(lda_vocab_file, "",
              "This is a gflags flag; it exists only for this test.");

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);

  std::cout << "Parameters after MPI_Init:\n";
  for (int i = 0; i < argc; ++i) {
    std::cout << "option " << i << " : " << argv[i] << "\n";
  }


  google::ParseCommandLineFlags(&argc, &argv, false);

  std::cout << "Parameters after ParseCommandLineFlags:\n";
  for (int i = 0; i < argc; ++i) {
    std::cout << "option " << i << " : " << argv[i] << "\n";
  }

  // Initialize glog and set log destination file.
  google::InitGoogleLogging(argv[0]);
  struct utsname buf;
  if (0 != uname(&buf)) {
    *buf.nodename = '\0'; // ensure null termination on failure
  }
  google::SetLogDestination(google::INFO,
                            StringPrintf(
                                "/tmp/ohmygod.%s.%s.INFO.",
                                buf.nodename, getenv("USER")).c_str());
  google::SetLogDestination(google::ERROR,
                            StringPrintf(
                                "/tmp/ohmygod.%s.%s.ERROR.",
                                buf.nodename, getenv("USER")).c_str());

  // Parse command line flags left after google::ParseCommandLineFlags.
  namespace po = boost::program_options;

  po::options_description desc("Supported options");
  desc.add_options()
      ("mrml_help",
       "Produce help message.")
      ("mrml_map_only",
       po::value<bool>()->default_value(false),
       "Specify a map-only mapreduce task, and must not specify reduce class.")
      ;

  vector<string> rest_options;
  po::variables_map vm;
  try {
    po::parsed_options parsed = po::command_line_parser(argc, argv).
                                options(desc).allow_unregistered().run();
    po::store(parsed, vm);
    po::notify(vm);
    rest_options =
        po::collect_unrecognized(parsed.options, po::include_positional);
  } catch (const po::error& e) {
    LOG(ERROR) << "Error in parsing command line options: " << e.what();
    MPI_Finalize();
    return 1;  // nonzero exit status signals failure ("return false" means 0)
  }

  // Print help message if requested.
  if (vm.count("mrml_help")) {
    std::cout << desc << "\n";
  }

  std::cout << "Set lda_vocab_file = " << FLAGS_lda_vocab_file << "\n";

  LOG(INFO) << "Hello world!";
  MPI_Finalize();
  return 0;
}

C and C++ String Comparison

November 14, 2010

Thanks to Rick and Charlie, who located and fixed a head-scratching bug in my code. The bug arose because, when comparing two wrappers of std::string, I compared them char by char. The correct way is to compare unsigned char by unsigned char. Since my comparison generated different results from std::string::compare, my code behaved in a weird way.

Here is how std::string::compare and memcmp work in their GNU implementations:

std::string and std::wstring are instantiations of the class template basic_string (defined in <header-prefix>/c++/<version>/bits/basic_string.h, where <header-prefix> is usually /usr/include). basic_string::compare invokes char_traits<char>::compare (for std::string) or char_traits<wchar_t>::compare (for std::wstring). The former invokes memcmp and the latter invokes wmemcmp. So you see that the C++ implementation depends on the C implementation.

I downloaded the source tarball of glibc-2.12.1. In glibc-2.12.1/string/memcmp.c, there is the implementation of memcmp, which compares two strings byte-by-byte, and byte is defined in the same file as unsigned char. In glibc-2.12.1/wcsmbs/wmemcmp.c, there is wmemcmp, which compares two strings wint_t by wint_t, and wint_t is defined in include/wctype.h as unsigned int (unsigned again).

I also checked glibc-2.12.1/string/strcmp.c, where strcmp also compares unsigned char by unsigned char.

So both C and C++ compare strings by treating each character as an unsigned value.


Disappointing Search Feature in WordPress

November 14, 2010

I tried to search for a post titled “Graphical SVN Diff” in my blog using the keywords “graphical svn diff”. However, WordPress found nothing. I changed the query to “svn”: nothing; then to “diff”: nothing. Then I Googled “graphical svn diff site:cxwangyi.wordpress.com”, and the first result was what I wanted.

