Execute a queue of system commands in parallel.

executeMultiProcess(
  commandQueue,
  finishHandler,
  timeoutHandler = function(...) TRUE,
  errorHandler = defMultiProcErrorHandler,
  prepareHandler = NULL,
  cacheName = NULL,
  setHash = NULL,
  procTimeout = NULL,
  printOutput = FALSE,
  printError = FALSE,
  logSubDir = NULL,
  showProgress = TRUE,
  waitTimeout = 50,
  batchSize = 1,
  delayBetweenProc = 0,
  method = NULL
)

Arguments

commandQueue

A list with commands. Each command should contain command (scalar string) and args (character vector). Additional user-defined fields are allowed and are useful for attaching command information that can be used in the finish, timeout and error handlers.
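As a minimal sketch of the structure described above, a queue could be built as follows. The echo command and the sampleName and outFile fields are illustrative assumptions, not part of the documented interface.

```r
# Hypothetical queue: each entry holds the required 'command' and 'args' fields,
# plus user-defined fields ('sampleName', 'outFile') chosen for illustration.
samples <- c("sampleA", "sampleB", "sampleC")
commandQueue <- lapply(samples, function(s) {
    list(
        command = "echo",               # scalar string
        args = c("processing", s),      # character vector
        sampleName = s,                 # user-defined field
        outFile = paste0(s, ".txt")     # user-defined field
    )
})

# Sanity check of the structure described above
validQueue <- all(vapply(commandQueue, function(cmd) {
    is.character(cmd$command) && length(cmd$command) == 1 && is.character(cmd$args)
}, logical(1)))
```

The user-defined fields travel with each queue entry, so a handler can read them from its cmd argument.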

finishHandler

A function that is called when a command has finished. This function is typically used to process any results generated by the command. The function is called right after spawning a new process, hence processing results can occur while the next command is running in the background. The function signature should be function(cmd) where cmd is the queue data (from commandQueue) of the command that has finished.

timeoutHandler

A function that is called whenever a timeout for a command occurs. Should return TRUE if execution of the command should be retried. The function signature should be function(cmd, retries) where cmd is the queue data for that command and retries is the number of times the command has been retried.

errorHandler

Similar to timeoutHandler, but called whenever a command has failed. The signature should be function(cmd, exitStatus, retries). The exitStatus argument is the exit code of the command (may be NA in rare cases where this is unknown). Other arguments are as for timeoutHandler. The return value should be as for timeoutHandler, or a character string with an error message that will be thrown with stop.
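The two handler signatures above might be implemented along these lines. This is only a sketch: the retry limits are arbitrary choices, and sampleName is a hypothetical user-defined field of the queue entry.

```r
# Retry a timed-out command at most twice (the limit of 2 is arbitrary).
myTimeoutHandler <- function(cmd, retries) retries < 2

# Retry once on failure; after that, return a message that is thrown with stop().
# 'sampleName' is a hypothetical user-defined field of the queue entry.
myErrorHandler <- function(cmd, exitStatus, retries) {
    if (retries < 1)
        return(TRUE)
    status <- if (is.na(exitStatus)) "unknown" else as.character(exitStatus)
    sprintf("Command '%s' failed for %s (exit status: %s)",
            cmd$command, cmd$sampleName, status)
}
```

Returning a message rather than TRUE or FALSE is useful when a failure should abort the whole run with a descriptive error.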

prepareHandler

A function that is called prior to execution of the command. The function signature should be function(cmd) where cmd is the queue data (from commandQueue) of the command to be started. The return value must be (an updated) cmd.
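A prepare handler could, for instance, generate an input file just before the command starts. The sketch below assumes hypothetical user-defined fields inputData and inFile; only the function(cmd) signature and the requirement to return cmd come from the description above.

```r
# Hypothetical prepareHandler: writes an input file right before the command is
# started and appends its path to the arguments. 'inputData' and 'inFile' are
# illustrative user-defined fields.
myPrepareHandler <- function(cmd) {
    cmd$inFile <- tempfile(fileext = ".txt")
    writeLines(cmd$inputData, cmd$inFile)
    cmd$args <- c(cmd$args, cmd$inFile)
    cmd  # the (updated) cmd must be returned
}
```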

cacheName, setHash

Used for caching results. Set to NULL to disable caching.

procTimeout

The maximum time a process may consume before a timeout occurs (in seconds). Set to NULL to disable timeouts. Ignored if patRoon.MP.method="future".

printOutput, printError

Set to TRUE to print stdout/stderr output to the console. Ignored if patRoon.MP.method="future".

logSubDir

The sub-directory used for log files. The final log file path is constructed from patRoon.MP.logPath, logSubDir and logFile set in the commandQueue.

showProgress

Set to TRUE to display a progress bar. Ignored if patRoon.MP.method="future".

waitTimeout

Number of milliseconds to wait before checking if a new process should be spawned. Ignored if patRoon.MP.method="future".

batchSize

Number of commands that should be executed in sequence per process. See details. Ignored if patRoon.MP.method="future".

delayBetweenProc

Minimum number of milliseconds to wait before spawning a new process. Might be needed to work around errors. Ignored if patRoon.MP.method="future".

method

Overrides patRoon.MP.method if not NULL.

Details

This function executes a given queue of system commands in parallel to speed up computation. Commands are executed in the background using the processx package. Processes are created up to a configurable maximum number, so that multiple commands are executed in parallel.

Multiple commands may be executed in sequence from a single parent process (as part of a batch script on Windows, or combined with the shell AND operator otherwise). Note that multiple processes are still spawned in this scenario. Each of these processes manages a chunk of the command queue (with the size defined by the batchSize argument). This approach is typically suitable for fast-running commands: the overhead of spawning a new process for each command from R would in this case be significant enough to lose most of the speedup otherwise gained with parallel execution. Note that the actual batch size may be adjusted to ensure that a maximum number of processes are running simultaneously.
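The batching idea can be illustrated with a small base R sketch. It is not the package's internal code: toBatches is a hypothetical helper, only the non-Windows case with the shell AND operator is shown, and shell quoting of arguments is omitted.

```r
# Illustration only: split a queue into chunks of 'batchSize' and join each chunk
# into one shell command line with the AND operator (non-Windows case).
# 'toBatches' is a hypothetical helper; proper shell quoting is omitted.
toBatches <- function(commandQueue, batchSize) {
    groups <- ceiling(seq_along(commandQueue) / batchSize)
    lapply(split(commandQueue, groups), function(chunk) {
        cmdLines <- vapply(chunk, function(cmd) {
            paste(c(cmd$command, cmd$args), collapse = " ")
        }, character(1))
        paste(cmdLines, collapse = " && ")
    })
}

# Five fast commands, two per batch: three parent processes instead of five
queue <- lapply(1:5, function(i) list(command = "echo", args = as.character(i)))
batches <- toBatches(queue, batchSize = 2)
```

With the AND operator, a failing command stops the remaining commands in its batch, which is one reason batching suits short, reliable commands best.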

Other functionalities of this function include timeout and error handling.