Workflows for distributed computing

Mathieu Leclaire
Jonathan Passerat-Palmbach
Romain Reuillon

Naturally parallel methods

  • Data reconstruction
  • Parameter estimation
  • Sensitivity analysis
  • Optimisation
  • Replication
  • ...
Execution of the same program with different parameters and/or datasets.
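
A minimal sketch of this pattern, not OpenMOLE code: the same function is applied to several independent parameter sets at once. The `model` function, the parameter values and the `run_all` helper are hypothetical, and a local thread pool stands in for remote computing nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def model(x, y):
    """Hypothetical stand-in for the user's program."""
    return x * x + y


# Every parameter set is independent of the others,
# so the runs can be executed in any order, on any machine.
parameter_sets = [(x, y) for x in range(4) for y in (0.0, 0.5)]


def run_all(params, workers=4):
    # Assumption: threads play the role of remote nodes in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of the inputs.
        return list(pool.map(lambda p: model(*p), params))


results = run_all(parameter_sets)
```

Because no run depends on another, total time is bounded by the slowest run once enough workers are available.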

But it's slow...


Enter distributed computing / HPC!


HPC user experience


Prototype

Scale up for free

OpenMOLE is built around 3 orthogonal concepts

... and an expressive workflow formalism for distributed computing.

1 - Model?



Stuff you can run, taking inputs and producing outputs

Zero deployment approach

  • User code is automatically deployed at runtime
  • No prior knowledge of remote environment needed
  • No installation required on any machine

Transparent access

  • No preliminary step
  • Access as the user would do it
  • With the user's credentials

Any application

Packaging (non-JVM) applications with CARE*


*https://github.com/proot-me/PRoot

Packaging (non-JVM) applications with CARE

Applications have dependencies:
  • Shared libraries
  • Packages (Python, R, ...)
  • Low level system calls
  • Environment variables
  • ...

Capture these dependencies and transfer them along with the application, from one Linux machine to another

Packaging (non-JVM) applications with CARE

Distributed execution of (almost) any program on (pretty much) any computing environment
  1. Package it with CARE
  2. Write your OpenMOLE workflow
  3. Click the run button
  4. Write your paper

3 - Execution environments?


Today

Tomorrow

And now: examples!

Useful Links

Documentation www.openmole.org
Development version next.openmole.org
Source code github.com/openmole
Market place github.com/openmole-market

Thanks!

romain.reuillon@iscpif.fr
mathieu.leclaire@iscpif.fr
j.passerat-palmbach@imperial.ac.uk
2 - Methods?

Map/reduce
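
A small sketch of the map/reduce idea, assuming a stochastic model that is replicated with different seeds (map) and then summarised by a median (reduce). `stochastic_model` and `map_reduce` are hypothetical names used for illustration, not part of OpenMOLE.

```python
import random
import statistics


def stochastic_model(seed):
    """Hypothetical model: mean of 100 uniform draws for a given seed."""
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(100)) / 100


def map_reduce(seeds):
    # Map: one independent model run per seed (the parallelisable part).
    outputs = [stochastic_model(s) for s in seeds]
    # Reduce: aggregate all outputs into a single summary value.
    return statistics.median(outputs)


summary = map_reduce(range(20))
```

Only the map step needs to be distributed; the reduce step runs once, after all outputs have been collected.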

Grid Search

Random sampling

Latin Hypercube
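
The three sampling designs named above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not OpenMOLE's implementation: the function names are hypothetical, `bounds` is a list of `(low, high)` pairs with one pair per parameter, and `steps` must be at least 2.

```python
import itertools
import random


def grid_search(bounds, steps):
    """Full factorial grid: `steps` evenly spaced values per dimension."""
    axes = [
        [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
        for lo, hi in bounds
    ]
    return list(itertools.product(*axes))


def random_sampling(bounds, n, rng):
    """n points drawn uniformly and independently in the bounding box."""
    return [tuple(rng.uniform(lo, hi) for lo, hi in bounds) for _ in range(n)]


def latin_hypercube(bounds, n, rng):
    """n points such that each of the n strata of every dimension holds one point."""
    columns = []
    for lo, hi in bounds:
        # One random point inside each of the n equal-width strata.
        points = [lo + (hi - lo) * (i + rng.random()) / n for i in range(n)]
        # Shuffle so that strata are paired at random across dimensions.
        rng.shuffle(points)
        columns.append(points)
    return list(zip(*columns))


rng = random.Random(42)
bounds = [(0.0, 1.0), (0.0, 10.0)]
grid = grid_search(bounds, steps=3)
uniform = random_sampling(bounds, 5, rng)
lhs = latin_hypercube(bounds, 5, rng)
```

The grid grows exponentially with the number of parameters, whereas the random and Latin hypercube designs let the number of points be chosen independently of the dimension.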

Parallel data processing

...

Master/slave

Example: genetic algorithm
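
A minimal sketch of a genetic algorithm, assuming a toy one-dimensional objective. The `fitness` function, the parameter defaults and the selection scheme are hypothetical choices for illustration, not the algorithm OpenMOLE ships. The evaluation of the population is the step a master process would hand out to workers; here it runs locally.

```python
import random


def fitness(x):
    """Hypothetical objective to minimise: squared distance to 3."""
    return (x - 3.0) ** 2


def evolve(generations=30, size=20, sigma=0.5, seed=0):
    rng = random.Random(seed)
    population = [rng.uniform(-10, 10) for _ in range(size)]
    history = []  # best fitness seen at each generation
    for _ in range(generations):
        # Evaluation: independent per individual, hence easy to distribute.
        scored = sorted(population, key=fitness)
        history.append(fitness(scored[0]))
        # Selection: keep the best half unchanged (elitism).
        parents = scored[: size // 2]
        # Variation: each child is a Gaussian mutation of a random parent.
        children = [
            rng.choice(parents) + rng.gauss(0, sigma)
            for _ in range(size - len(parents))
        ]
        population = parents + children
    best = min(population, key=fitness)
    return best, history


best, history = evolve()
```

Because the best half always survives, the best fitness never gets worse from one generation to the next.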

The terminology