snow Simplified

Many computationally demanding statistical procedures, such as bootstrapping and Markov chain Monte Carlo, can be sped up significantly by using several connected computers in parallel.

The package snow (an acronym for Simple Network Of Workstations) provides a high-level interface for using a workstation cluster for parallel computations in R. snow relies on the master/slave model of communication, in which one device or process (known as the master) controls one or more other devices or processes (known as slaves).

In Los Angeles, officials pointed out that terms such as "master" and "slave" are unacceptable and offensive, and equipment manufacturers were asked not to use them. Some manufacturers have changed the wording to primary / secondary (Source: CNN).

snow Simplified is an adaptation of the article 'Simple parallel statistical computing in R' by Anthony Rossini, Luke Tierney and Na Li.
It was prepared by Sigal Blay (sblay@sfu.ca).
This work has been made possible by the Statistical Genetics Working Group at the Department of Statistics and Actuarial Science, SFU.


snow implements an interface to three low-level mechanisms for creating a virtual connection between processes:

  • Socket
  • PVM (Parallel Virtual Machine)
  • MPI (Message Passing Interface)

Starting and Stopping Clusters

The way to initialize slave R processes depends on your system configuration. If MPI is installed and there are two computation nodes, use:

> cl <- makeCluster(2, type = "MPI")

Shut down the cluster and clean up any remaining connections between machines:

> stopCluster(cl)
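If neither PVM nor MPI is installed, a socket cluster needs no additional software. A minimal sketch, assuming the slave machines are reachable from the master (the host names here are placeholders):

> # two slave R processes on the local machine
> cl <- makeCluster(2, type = "SOCK")
> # or name the slave machines explicitly
> cl <- makeCluster(c("node1", "node2"), type = "SOCK")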

clusterCall(cl, fun, ...)

cl: the computer cluster created with makeCluster.
fun: the function to be applied.
clusterCall calls a specified function with identical arguments on each node in the cluster.
The arguments to clusterCall are evaluated on the master, and their values are transmitted to the slave nodes, which execute the function call.

> myfunc <- function(x=2){x+1}
> myfunc_argument <- 5
> clusterCall(cl, myfunc, myfunc_argument) 
is the same as
> clusterCall(cl, function(x=2){x+1}, 5) 
and the result:
[[1]]
[1] 6

[[2]]
[1] 6 

An alternative (since the arguments to clusterCall are evaluated on the master, here myfunc(arg1,arg2,...) is computed on the master and each node simply evaluates the resulting value):

> clusterCall(cl, eval, myfunc(arg1,arg2,...))

clusterEvalQ(cl, expr)

cl: the computer cluster created with makeCluster.
expr: an expression, typically a function call.
clusterEvalQ evaluates a literal expression on each cluster node.
'expr' is not evaluated on the master; the unevaluated expression is transmitted to the slave nodes, where it is evaluated.

Finding the host name for each cluster node:

>  clusterEvalQ(cl, Sys.getenv("HOST"))
[[1]]
     HOST
"stat-db"

[[2]]
     HOST
"stat-db1" 
Loading the boot package on all cluster nodes:
> clusterEvalQ(cl, library(boot)) 
clusterEvalQ is the cluster version of the R function 'evalq'.

clusterApply(cl, seq, fun, ...)

clusterApply takes a cluster, a sequence of arguments (a vector or a list), and a function, and calls the function with the first element of the sequence on the first node, with the second element on the second node, and so on. The sequence of arguments must have at most as many elements as there are nodes in the cluster.

> clusterApply(cl, 1:2, sum, 3)
[[1]]
[1] 4

[[2]]
[1] 5 

clusterApplyLB(cl, seq, fun, ...)

clusterApplyLB is a load-balancing version of clusterApply. When the length of seq is greater than the number of cluster nodes, it hands jobs to slave nodes as they finish their previous job, keeping the work load balanced. It does not work with socket clusters (type = "SOCK").

> length(cl)
[1] 2

>  clusterApplyLB(cl, 1:3, sum, 3)
[[1]]
[1] 4

[[2]]
[1] 5

[[3]]
[1] 6 

Using clusterApplyLB can result in better cluster utilization than using clusterApply. However, the increased communication can reduce performance. Furthermore, the node that executes a particular job is nondeterministic, which can complicate ensuring reproducibility in simulations.
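One way to recover reproducibility is to give the slave nodes properly initialized parallel random number streams. Depending on the snow version, the function clusterSetupRNG serves this purpose (it relies on the rsprng or rlecuyer package being installed on the nodes); a minimal sketch:

> # give each slave node its own independent random number stream
> clusterSetupRNG(cl)
> clusterCall(cl, runif, 3)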

clusterExport(cl, list)

clusterExport assigns the values of the variables named in 'list', taken from the master's global environment, to variables of the same names in the global environment of each slave node.

> myvar <- 666
> clusterExport(cl, "myvar") 
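The slave nodes can then use the exported variable, for example:

> clusterEvalQ(cl, myvar * 2)
[[1]]
[1] 1332

[[2]]
[1] 1332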

clusterSplit(cl, seq)

clusterSplit splits 'seq' into one consecutive piece for each cluster node and returns the result as a list with length equal to the number of cluster nodes. The pieces are chosen to be close to equal in length.

> clusterSplit(cl, 1:6)
[[1]]
[1] 1 2 3

[[2]]
[1] 4 5 6 
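clusterSplit combines naturally with clusterApply: split a long sequence into one chunk per node, then process the chunks in parallel. For example:

> chunks <- clusterSplit(cl, 1:6)
> clusterApply(cl, chunks, sum)
[[1]]
[1] 6

[[2]]
[1] 15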

parApply(cl, X, MARGIN, fun, ...)

X: the array to be used.
MARGIN: a vector giving the subscripts which the function will be applied over. '1' indicates rows, '2' indicates columns, 'c(1,2)' indicates rows and columns.
fun: the function to be applied.
parApply is the parallel version of the R function 'apply'.

> A <- matrix(1:10, 5, 2)
> A
     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10

> parApply(cl, A, 1, sum)
[1]  7  9 11 13 15 

parRapply(cl, X, fun, ...)

Row parallel 'apply' function for matrix X.
May be slightly more efficient than parApply.

parCapply(cl, X, fun, ...)

Column parallel 'apply' function for matrix X.
May be slightly more efficient than parApply.
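For the matrix A defined above, a quick sketch:

> parRapply(cl, A, sum)    # row sums, as with parApply(cl, A, 1, sum)
[1]  7  9 11 13 15

> parCapply(cl, A, sum)    # column sums
[1] 15 40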

parLapply(cl, X, fun, ...)

parLapply is the parallel version of the R function 'lapply'.
'lapply' returns a list of the same length as 'X', each element of which is the result of applying 'fun' to the corresponding element of 'X'.

Create a list of two sequences of numbers:

> x <- list(alpha = 1:10, beta = exp(-3:3))
> x
$alpha
 [1]  1  2  3  4  5  6  7  8  9 10

$beta
[1]  0.04978707  0.13533528  0.36787944  1.00000000
[5]  2.71828183  7.38905610 20.08553692

Calculate quantiles for each sequence - parLapply returns a list:

> parLapply(cl, x, quantile)
$alpha
   0%   25%   50%   75%  100% 
 1.00  3.25  5.50  7.75 10.00 

$beta
         0%         25%         50%         75%        100% 
 0.04978707  0.25160736  1.00000000  5.05366896 20.08553692 

parSapply(cl, X, fun, ..., simplify=TRUE, USE.NAMES=TRUE)

parSapply is the parallel version of the R function 'sapply'.
'sapply' is a "user-friendly" version of 'lapply'.
It returns a vector or matrix with 'dimnames' if appropriate.

Calculate quantiles for each sequence - parSapply returns a matrix:

> parSapply(cl, x, quantile)
     alpha        beta
0%    1.00  0.04978707
25%   3.25  0.25160736
50%   5.50  1.00000000
75%   7.75  5.05366896
100% 10.00 20.08553692 

parMM(cl, A, B)

parMM performs parallel matrix multiplication.
A, B: the matrices to be multiplied.

How much faster is using snow?

To compare performance, use the command system.time().
Here is an example:

Create a 1000 x 1000 matrix of normally distributed random numbers:

> A <- matrix(rnorm(1000000), 1000)

Check the CPU time for serial matrix multiplication:

> system.time(A %*% A)
[1] 5.33 0.02 5.36 0.00 0.00

The elapsed time was 5.36 seconds.

... And for parallel matrix multiplication:

> system.time(parMM(cl, A, A))
[1] 2.49 2.09 9.16 0.00 0.00

The elapsed time was 9.16 seconds.

Even in dedicated clusters with high-speed networking, communication is orders of magnitude slower than computation.
To reduce communication costs, eliminate unnecessary data transfer.
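For example, when the same large object is needed in many calls, transmit it once with clusterExport instead of passing it as an argument to every call. A sketch of the idea (not a benchmark):

> bigdata <- rnorm(1000000)
> # transfer the data to the slaves once ...
> clusterExport(cl, "bigdata")
> # ... so each call transmits only a small index, not the data
> parSapply(cl, 1:4, function(i) mean(bigdata[i:(i + 99)]))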

References

Rossini, A., Tierney, L., and Li, N. (2003). Simple parallel statistical computing in R. UW Biostatistics Working Paper Series, Paper 193, University of Washington.

Tierney, L., Rossini, A., Li, N., and Sevcikova, H. (2004). The snow Package: Simple Network of Workstations. Version 0.2-1.



Terms Dictionary

Workstation
A workstation is a computer intended for individual use that is faster and more capable than a personal computer, e.g. it has a faster microprocessor, a large amount of random access memory (RAM), and special features such as high-speed graphics adapters (Source: whatis.com).
Cluster
In a computer system, a cluster is a group of servers and other resources that act like a single system and enable high availability and, in some cases, load balancing and parallel processing (Source: whatis.com).
Parallel Computing
Parallel Computing is the simultaneous use of more than one computer to execute a program. The objective is scalability: the capability to handle larger workloads while reducing execution time (Source: bitpipe).