Is Word Segmentation Parallelizable?

I know little about NLP, I read nothing about word segmentation algorithm; however, given my knowledge on hidden markov model and high-order markov model, I though word segmentation is, given

  1. a sequence of characters S,
  2. a set of words (each word is a sequence of characters), and
  3. a language model (first-order transitions between words) P(w_j|w_i)

to infer n (n\geq{}0) segments in S, and get \{w_1,\ldots,w_n\}.

If the optimal segmentation is the one maximizing O=P(w_1,\ldots,w_n), then it can be solved using dynamic programming strategy.  Use the following notations:

  • M: the maximal length of a word,
  • L(S): the length of string S,
  • sos: the special word denoting the start of a sentence,
  • eos: the special word denoting the end of a sentence.

the dynamic programming solution is as the following recursive process:

  1. Start: \max_{1\leq l \leq M} P(S_{0,l-1} | sos) O(S_{l,L(S)} | S_{0,l-1})
  2. Recursion: O(S|w) = \max_{1\leq l \leq M} P(S_{0,l-1}|w) O(S_{l,L(S)} | S_{0,l-1})
  3. Stop: if l>L(S), then S_{l,L(S)}\equiv eos.

As a tail-recursion, this recursive process can be expressed as an equivalent iteration form: starting by checking the M possible words near the end of the sentence, for each of them, check possible word segmentation at the end of rest sentence.

Alternatively, above procedure can be expressed in an equivalent form: segment the first word at the end of the sentence. This alternative expression results in an iteration starting word segmentation from the beginning of the sentence.

If each processor has the whole language model, it would be trivial to parallelize above procedure. For example, in the Start step, we need to compute M branches before taking \max. These M computations can be distributed on M processors.

But if the language model is huge and is distributed on multiple processors, it seems difficult (?) to divide-and-conquer the segmentation over these processors with an efficient sharding of both computational structural (iteration steps) and storage structural (parts of language model).

The final question is: usually the sentence S is not long, it might not be worth to distribute it over processors for distributed segmentation.