SML# Document Version 4.0.0
11 SML# feature: Multithread programming

11.2 Fine-grain multithread programming with MassiveThreads

SML# provides MassiveThreads-based multithread support by default. MassiveThreads is a user-level lightweight thread library written in C, provided by the University of Tokyo. SML#'s direct C interface and unobtrusive concurrent garbage collection enable SML# programs to call MassiveThreads directly. MassiveThreads allows us to create a very large number of user threads, say 1,000,000 threads, that run concurrently on multicore processors.

By default, the SML# runtime is restricted to a single CPU core so that single-threaded programs run efficiently. To enable MassiveThreads on multicore processors, set at least one MYTH_* environment variable to configure MassiveThreads. A typical one is MYTH_NUM_WORKERS, which specifies the number of worker threads, i.e., the number of CPU cores the program uses. For example, run the following command to start an interactive session:

$ MYTH_NUM_WORKERS=0 smlsharp

In Linux, setting MYTH_NUM_WORKERS to 0 means using all available CPU cores.
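
The same environment variable works for compiled programs. For instance, to run a compiled SML# executable (here named fib, a placeholder for your own program) on four CPU cores:

$ MYTH_NUM_WORKERS=4 ./fib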

SML# provides the Myth structure, a direct binding of the MassiveThreads library in SML#. The Myth.Thread structure contains basic functions for thread management. Its primary functions are the following (a minimal usage sketch follows the list):

  • User thread creation.

    Myth.Thread.create : (unit -> int) -> Myth.thread

    create f creates a new user thread that computes f (). The created user thread is scheduled onto an appropriate CPU core by the MassiveThreads task scheduler. The scheduling policy is non-preemptive; a thread occupies a CPU core until either it calls a thread control function of MassiveThreads (such as Myth.Thread.yield) or it terminates.

  • User thread join.

    Myth.Thread.join : Myth.thread -> int

    join t waits for the completion of thread t and returns the result of t's computation. Every user thread created by create must be joined at some point in the future. Note that the Myth.Thread structure is just a direct binding of C functions; as in C, the resources of created threads must be freed explicitly.

  • User thread scheduling.

    Myth.Thread.yield : unit -> unit

    yield () yields the CPU core to other threads.
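
The following minimal sketch (not taken from the MassiveThreads documentation) shows these three functions together: it creates a user thread that computes 6 * 7, yields the current thread once, and then joins the result.

val t = Myth.Thread.create (fn () => 6 * 7)
val () = Myth.Thread.yield ()       (* give other threads a chance to run *)
val answer = Myth.Thread.join t     (* answer is 42 *)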

As an introduction to MassiveThreads programming, let us write a task parallel program. Roughly speaking, you can write a task parallel program in the following steps:

  1.

    Write a recursive function that performs divide-and-conquer.

  2.

    Surround each recursive call with a pair of create and join so that each recursive call is evaluated in a different thread.

  3.

    To avoid creating very short-lived threads, set a threshold (cut-off) below which thread creation stops and recursive calls are made in the same thread. The threshold must be chosen so that the sequential wall-clock time of a user thread is sufficiently longer than the overhead of thread creation. In practice, 3–4 microseconds per user thread is good enough.

For example, let us write a program that computes fib 40 recursively. The following is a typical definition of the recursive fib function:

fun fib 0 = 0
  | fib 1 = 1
  | fib n = fib (n - 1) + fib (n - 2)
val result = fib 40

To compute fib (n - 1) and fib (n - 2) in parallel, surround one of them with create and join:

fun fib 0 = 0
  | fib 1 = 1
  | fib n =
    let
      val t2 = Myth.Thread.create (fn () => fib (n - 2))
    in
      fib (n - 1) + Myth.Thread.join t2
    end
val result = fib 40

This is not the final version, however; unfortunately, if n is very small, the computation cost of fib n is much smaller than the overhead of thread creation. To avoid this, we introduce a threshold so that the computation proceeds sequentially when n is smaller than 10.

val cutOff = 10
fun fib 0 = 0
  | fib 1 = 1
  | fib n =
    if n < cutOff
    then fib (n - 1) + fib (n - 2)
    else
      let
        val t2 = Myth.Thread.create (fn () => fib (n - 2))
      in
        fib (n - 1) + Myth.Thread.join t2
      end
val result = fib 40

Now we are done. When run, this program eventually creates 3,524,577 user threads in total.
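
The figure 3,524,577 can be checked by a simple counting argument: each call to fib n with n greater than or equal to cutOff creates exactly one user thread (for fib (n - 2)), so the total number of created threads follows the same recurrence as the call tree above the cut-off. The following sketch (a hypothetical helper, not part of the program above) reproduces the number:

val cutOff = 10
fun countThreads n =
  if n < cutOff
  then 0                                            (* below the cut-off: no threads are created *)
  else 1 + countThreads (n - 1) + countThreads (n - 2)
val total = countThreads 40                         (* total = 3524577 *)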