11.2 Fine-grain multithread programming with MassiveThreads
SML# provides MassiveThreads-based multithread support by default. MassiveThreads is a lightweight user-level thread library written in C, developed at the University of Tokyo. SML#'s direct C interface and unobtrusive concurrent garbage collection allow SML# programs to call MassiveThreads directly. MassiveThreads makes it possible to create millions of user threads, say 1,000,000 threads, that run concurrently on multicore processors.
By default, the SML# runtime is restricted to a single CPU core for the sake of single-thread program performance. To enable MassiveThreads on multicore processors, specify at least one MYTH_* environment variable as a configuration of MassiveThreads. A typical one is MYTH_NUM_WORKERS, which specifies the number of worker threads, i.e., the number of CPU cores the program uses. For example, run the following command to start an interactive session:
$ MYTH_NUM_WORKERS=0 smlsharp
On Linux, setting MYTH_NUM_WORKERS to 0 means using all available CPU cores.
SML# provides the Myth structure, a direct binding of the MassiveThreads library in SML#. The Myth.Thread structure contains basic functions for thread management. Its primary functions are the following:
• User thread creation.
Myth.Thread.create : (unit -> int) -> Myth.thread
Myth.Thread.create f creates a new user thread that computes f (). The created user thread is scheduled onto an appropriate CPU core by the MassiveThreads task scheduler. The scheduling policy is non-preemptive: a thread occupies a CPU core until either it calls a thread control function of MassiveThreads (such as Myth.Thread.yield) or it terminates.
• User thread join.
Myth.Thread.join : Myth.thread -> int
Myth.Thread.join t waits for the completion of the thread t and returns the result of t's computation. Any user thread created by create must be joined at some point in the future. Note that the Myth.Thread structure is just a direct binding of C functions; as in C, the resources of created threads must be freed explicitly.
• User thread scheduling.
Myth.Thread.yield : unit -> unit
yields the CPU core to other threads.
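As a minimal sketch of these three functions used together (the computed value 6 * 7 is an arbitrary illustration, not part of the library), a thread can be created, yielded to, and joined as follows:

```sml
(* Sketch: create a user thread, optionally yield, then join it.
 * The computation (6 * 7) is an arbitrary example of a unit -> int function. *)
val t = Myth.Thread.create (fn () => 6 * 7)
val () = Myth.Thread.yield ()  (* give the CPU core to other threads, including t *)
val r = Myth.Thread.join t     (* wait for t and obtain its result *)
```

Because the scheduler is non-preemptive, a long-running thread that never calls yield (and does not terminate) can starve other threads sharing its core.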
As an introduction to MassiveThreads programming, let us write a task parallel program. Roughly speaking, a task parallel program can be written by the following steps:
1. Write a recursive function that performs divide-and-conquer.
2. Surround each recursive call with a pair of create and join so that each recursive call is evaluated in a different thread.
3. To avoid creating very short-lived threads, set a threshold (cut-off) below which thread creation stops and recursive calls are performed in the same thread. The threshold must be chosen so that the sequential wall-clock time of a user thread is sufficiently longer than the overhead of thread creation. In practice, 3–4 microseconds per user thread is good enough.
For example, let us write a program that computes fib 40 recursively. The following is a typical definition of a recursive fib function:
fun fib 0 = 0
  | fib 1 = 1
  | fib n = fib (n - 1) + fib (n - 2)
val result = fib 40
To compute fib (n - 1) and fib (n - 2) in parallel, surround one of them with create and join:
fun fib 0 = 0
  | fib 1 = 1
  | fib n =
    let
      val t2 = Myth.Thread.create (fn () => fib (n - 2))
    in
      fib (n - 1) + Myth.Thread.join t2
    end
val result = fib 40
We are not done yet; unfortunately, if n is very small, the computation cost of fib n is much smaller than the overhead of thread creation. To avoid this, we introduce a threshold so that the function computes sequentially when n is smaller than 10.
val cutOff = 10
fun fib 0 = 0
  | fib 1 = 1
  | fib n =
    if n < cutOff
    then fib (n - 1) + fib (n - 2)
    else
      let
        val t2 = Myth.Thread.create (fn () => fib (n - 2))
      in
        fib (n - 1) + Myth.Thread.join t2
      end
val result = fib 40
Now we are done! Running this program eventually creates 3,524,577 user threads in total.
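When the program is compiled to a standalone executable (hypothetically named fib here), the number of worker threads can likewise be chosen at launch time via MYTH_NUM_WORKERS, just as for the interactive session shown earlier:

```shell
# Run on all available CPU cores (Linux; 0 means all cores)
MYTH_NUM_WORKERS=0 ./fib

# Or restrict the program to exactly four worker threads
MYTH_NUM_WORKERS=4 ./fib
```

Varying the worker count this way is a convenient method for measuring how well the program scales with the number of cores.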