snow Simplified
~~~~~~~~~~~~~~~
Many computationally demanding statistical procedures, such
as Bootstrapping and Markov Chain Monte Carlo, can be speeded
up significantly by using several connected computers in
parallel.
The package snow (an acronym for Simple Network Of
Workstations) provides a high-level interface for using a
workstation* cluster** for parallel computations*** in R.
snow relies on the Master / Slave model of communication in
which one device or process (known as the master) controls
one or more other devices or processes (known as slaves).
In Los Angeles, officials pointed out that such terms as
"master" and "slave" are unacceptable and offensive, and
equipment manufacturers were asked not to use these terms.
Some manufacturers changed the wording to primary / secondary
(Source: CNN).
snow Simplified is an adaptation of an article by Anthony
Rossini, Luke Tierney and Na Li, 'Simple parallel statistical
computing in R'.
It has been made by Sigal Blay ( http://www.sfu.ca:/~sblay ).
This work has been made possible by the Statistical Genetics
Working Group at the Department of Statistics and Actuarial
Science, SFU.
-----------------------------------------------------------------
snow implements an interface to three different low level
mechanisms for creating a virtual connection between processes:
Socket
PVM (Parallel Virtual Machine)
MPI (Message Passing Interface)
--- Starting and Stopping clusters ---
The way to Initialize slave R processes depends on your system
configuration. If MPI is installed and there are two computation
nodes, use:
> cl <- makeCluster(2, type = "MPI")
Shut down the cluster and clean up any remaining connections
between machines:
> stopCluster(cl)
--- clusterCall(cl, fun, ...) ---
cl: the computer cluster created with makeCluster.
fun: the function to be applied.
clusterCall calls a specified function with identical arguments
on each node in the cluster.
The second argument is a function object.
The arguments to clusterCall are evaluated on the master,
their values transmitted to the slave nodes which execute the
function call.
> myfunc <- function(x=2){x+1}
> myfunc_argument <- 5
> clusterCall(cl, myfunc, myfunc_argument)
is the same as
> clusterCall(cl, function(x=2){x+1}, 5)
and the result:
[[1]]
[1] 6
[[2]]
[1] 6
--- clusterEvalQ(cl, expr) ---
cl: the computer cluster created with makeCluster.
expr: an expression, typically a function call.
clusterEvalQ evaluates a literal expression on each cluster node.
'expr' is treated on the master as a character string.
The expression is evaluated on the slave nodes.
Finding the host name for each cluster node:
> clusterEvalQ(cl, Sys.getenv("HOST"))
[[1]]
HOST
"stat-db"
[[2]]
HOST
"stat-db1"
Loading the boot package on all cluster nodes:
> clusterEvalQ(cl, library(boot))
clusterEvalQ is the cluster version of the R function 'evalq'.
--- clusterApply(cl, seq, fun, ...) ---
clusterApply takes a cluster, a sequence of arguments (can be a
vector or a list), and a function, and calls the function with
the first element of the list on the first node, with the second
element of the list on the second node, and so on. The list of
arguments must have at most as many elements as there are nodes
in the cluster.
> clusterApply(cl, 1:2, sum, 3)
[[1]]
[1] 4
[[2]]
[1] 5
--- clusterApplyLB(cl, seq, fun, ...) ---
clusterApplyLB is a load balancing version of clusterApply.
It hands in a balanced work load to slave nodes when
the length of seq is greater than the number of cluster nodes.
Doesn`t work if cluster Type=Socket.
> length(cl)
[1] 2
> clusterApplyLB(cl, 1:3, sum, 3)
[[1]]
[1] 4
[[2]]
[1] 5
[[3]]
[1] 6
Using clusterApplyLB can result in better cluster utilization
than using clusterApply. However, increasing communication can
reduce performance. Furthermore, the node that executes a
particular job is nondeterministic, which can complicate
ensuring reproducibility in simulations.
--- clusterExport(cl, list) ---
Assigns the global values on the master of the variables named in
list to variables of the same names in the global environments of
each node.
> myvar <- 666
> clusterExport(cl, "myvar")
--- clusterSplit(cl, seq) ---
clusterSplit splits seq into one consecutive piece for each
cluster and returns the result as a list with length equal to the
number of cluster nodes. The pieces are chosen to be close to
equal in length.
> clusterSplit(cl, 1:6)
[[1]]
[1] 1 2 3
[[2]]
[1] 4 5 6
--- parApply(cl, X, MARGIN, fun, ...) ---
X: the array to be used.
MARGIN: a vector giving the subscripts which the function will be
applied over. '1' indicates rows, '2' indicates columns,
'c(1,2)' indicates rows and columns.
fun: the function to be applied.
parApply is the parallel version of `the R function apply.
A<-matrix(1:10, 5, 2)
> A
[,1] [,2]
[1,] 1 6
[2,] 2 7
[3,] 3 8
[4,] 4 9
[5,] 5 10
> parApply(cl, A, 1, sum)
[1] 7 9 11 13 15
--- parRapply(cl, X, fun, ...) ---
row parallel 'apply' function for matrix X.
May be slightly more efficient than parApply.
--- parCapply(cl, X, fun, ...) ---
Column parallel 'apply' function for matrix X.
May be slightly more efficient than parApply.
--- parLapply(cl, X, fun, ...) ---
parLapply is the parallel version of the R function 'lapply'.
'lapply' returns a list of the same length as 'X'. Each element
of which is the result of applying 'fun' to the corresponding
element of 'X'.
Create a list of two sequences of numbers:
> x <- list(alpha = 1:10, beta = exp(-3:3))
> x
$alpha
[1] 1 2 3 4 5 6 7 8 9 10
$beta
[1] 0.04978707 0.13533528 0.36787944 1.00000000 2.71828183
[7] 7.38905610 20.08553692
Calculate quantiles for each sequence - parLapply returns a list:
> parLapply(cl, x, quantile)
$alpha
0% 25% 50% 75% 100%
1.00 3.25 5.50 7.75 10.00
$beta
0% 25% 50% 75% 100%
0.04978707 0.25160736 1.00000000 5.05366896 20.08553692
--- parSapply(cl, X, fun, ..., simplify=TRUE, USE.NAMES=TRUE) ---
parSapply is the parallel version of `the R function 'sapply'.
'sapply' is a "user-friendly" version of 'lapply'.
It returns a vector or matrix with 'dimnames' if appropriate.
Calculate quantiles for each sequence - parSapply returns a matrix:
> parSapply(cl, x, quantile)
alpha beta
0% 1.00 0.04978707
25% 3.25 0.25160736
50% 5.50 1.00000000
75% 7.75 5.05366896
100% 10.00 20.08553692
--- parMM(cl, A, B) ---
parMM performs parallel matrix multiplication.
A, B are the matices to be multiplied.
--- How much faster is using snow? ---
To compare performace, use the command system.time().
Here is an example:
Create a matrix of random normaly distributed numbers:
> A <- matrix(rnorm(1000000), 1000)
Check CPU time for unparallel matrix multiplication:
> system.time(A %*% A)
[1] 5.33 0.02 5.36 0.00 0.00
elapesed time was 5.36 seconds
... And for parallel matrix multiplication:
> system.time(parMM(cl, A, A))
[1] 2.49 2.09 9.16 0.00 0.00
elapsed time was 9.16 seconds.
Even in dedicated clusters with high speed networking,
communication is orders of magnitude slower than computation.
In order to reduce communication costs, eliminate unnecessary data
transfer.
--- Random Number Generation ---
The default random number generators are likely to be correlated:
> clusterCall(cl, runif, 3)
[[1]]
[1] 0.4960831 0.5095497 0.2971236
[[2]]
[1] 0.4960831 0.5095497 0.2971236
A way of addressing this is random seeding:
clusterApply(cl, runif(cl, max=100000), set.seed)
References:
Rossini, A., Tierney, L., and Li, N. (2003). Simple parallel
statistical computing in R. UW Biostatistics working paper series,
Paper 193, University of Washington.
http://www.bepress.com/uwbiostat/paper193.
Tierney, L., Rossini, A., Li, N., and Sevcikova, H. (2004).
The snow Package: Simple Network of Workstations. Version 0.2-1.
http://cran.r-project.org/doc/packages/snow.pdf
--------------------------------------------------------------------
Workstation - A workstation is a computer intended for individual
use that is faster and more capable than a personal computer.
i.e. have a faster microprocessor, a large amount of random access
memory (RAM), and special features such as high-speed graphics
adapters.
Source: whatis.com
http://whatis.techtarget.com/
Cluster - In a computer system, a cluster is a group of servers
and other resources that act like a single system and enable high
availability and, in some cases, load balancing and parallel
processing.
Source: whatis.com
http://whatis.techtarget.com/
Parallel Computing / Parallel Processing -
The simultaneous use of more than one computer to execute a program.
Objectives: scalability - the capability to hadle larger workloads
reducing execution time
Source: bitpipe
http://www.bitpipe.com/