Workflows for distributed computing

Mathieu Leclaire
Jonathan Passerat-Palmbach
Romain Reuillon

Naturally parallel methods

  • Data reconstruction
  • Parameter estimation
  • Sensitivity analysis
  • Optimisation
  • Replication
  • ...
Execution of the same program with different parameters and/or datasets.
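
As a minimal sketch of what "naturally parallel" means, the Python snippet below runs one function over several parameter values, first sequentially and then through a thread pool. `run_model` and its formula are hypothetical stand-ins for a real program, and the local thread pool only stands in for a real distributed environment.

```python
from concurrent.futures import ThreadPoolExecutor


def run_model(param: int) -> int:
    """Hypothetical stand-in for one run of the user's program."""
    return param * param + 1


params = list(range(11))  # eleven independent parameter values

# Sequential baseline: one run after another.
sequential = [run_model(p) for p in params]

# The runs do not depend on each other, so they can be dispatched concurrently.
# executor.map returns the results in the order of the inputs.
with ThreadPoolExecutor(max_workers=4) as executor:
    parallel = list(executor.map(run_model, params))
```

Both lists hold the same results; only the way the runs are scheduled differs.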

But it's slow...


Enter distributed computing / HPC!


HPC user experience


Prototype, scale up for free

OpenMOLE is built around 3 orthogonal concepts

... and an expressive workflow formalism for distributed computing.

1 - Model?



Stuff you can run, taking inputs and producing outputs

Zero deployment approach

  • User code is automatically deployed at runtime
  • No prior knowledge of remote environment needed
  • No installation required on any machine

Transparent access

  • No preliminary step
  • Access the environment as the user would
  • With the user's credentials

Any application

Packaging (non-JVM) applications with CARE*


*https://github.com/proot-me/PRoot

Packaging (non-JVM) applications with CARE

Applications have dependencies:
  • Shared libraries
  • Packages (Python, R, ...)
  • Low level system calls
  • Environment variables
  • ...

CARE captures these dependencies and transfers them along with the application, from one Linux machine to another

Packaging (non-JVM) applications with CARE

Distributed execution of (almost) any program on (pretty much) any computing environment
  1. Package it with CARE
  2. Write your OpenMOLE workflow
  3. Click the run button
  4. Write your paper

3 - Execution environments?


Today

Tomorrow

And now: examples!

Native code

care -o hello.tgz.bin python hello.py 42 test.txt

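// Workflow variables: the integer argument and the file produced by the script
val arg = Val[Int]
val output = Val[File]

// Run the CARE archive; ${arg} is replaced by the value of arg for each run
val pythonTask =
  CARETask(
    workDirectory / "hello.tgz.bin",
    "python hello.py ${arg} output.txt"
  ) set (
    inputs += arg,
    outputFiles += ("output.txt", output),
    outputs += arg
  )

// Explore the values 0 to 10 of arg
val exploration = ExplorationTask(arg in (0 to 10))

// Copy each output file to the work directory, named after its arg value
val copy = CopyFileHook(output, workDirectory / "hello${arg}.txt")

// Launch one pythonTask (with its hook) per value of arg
exploration -< (pythonTask hook copy)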
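
The slides do not show `hello.py` itself. A minimal sketch of a script compatible with the command line above, assuming it takes an integer and an output path and writes one line to that file, might look like this. The function names and the message written are hypothetical.

```python
import sys


def write_greeting(arg: int, output_path: str) -> None:
    """Write a one-line message containing the integer argument (hypothetical content)."""
    with open(output_path, "w") as f:
        f.write(f"Hello from run {arg}\n")


def main(argv) -> None:
    # Expected call: python hello.py <integer> <output file>
    if len(argv) != 3:
        print("usage: hello.py <integer> <output file>", file=sys.stderr)
        return
    write_greeting(int(argv[1]), argv[2])


if __name__ == "__main__":
    main(sys.argv)
```

Under these assumptions, the workflow's command `python hello.py ${arg} output.txt` produces one `output.txt` per explored value of `arg`.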
      

Terminology



XNAT to OpenMOLE

From data set discovery to large scale batch processing

  • Discover datasets with the XNAT WebUI
  • Retrieve data and experiment at small scale with the REST API / Python clients
  • Organise a processing pipeline with OpenMOLE

CARETask in detail



Useful Links

Documentation www.openmole.org
Development version next.openmole.org
Source code github.com/openmole
Market place github.com/openmole-market

Thanks!

romain.reuillon@iscpif.fr
mathieu.leclaire@iscpif.fr
j.passerat-palmbach@imperial.ac.uk