Workflows for distributed computing

Mathieu Leclaire
Jonathan Passerat-Palmbach
Romain Reuillon

You use naturally parallel methods every day on your laptop:
  • Data reconstruction
  • Parameter estimation
  • Sensitivity analysis
  • Optimisation
  • Replication
  • ...
Execution of the same program with different parameters and/or datasets.
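Here is a minimal Python sketch of "the same program with different parameters and/or datasets": a parameter sweep with replications. The `model` function and its `growth_rate`/`seed` parameters are hypothetical stand-ins, not part of OpenMOLE. The point is only that no run depends on another, which is what makes these methods naturally parallel.

```python
import itertools
import random
from concurrent.futures import ThreadPoolExecutor


def model(growth_rate: float, seed: int) -> float:
    """Hypothetical stochastic model: a noisy logistic growth run."""
    rng = random.Random(seed)  # each run owns its RNG, so runs are independent
    population = 0.1
    for _ in range(50):
        population += growth_rate * population * (1 - population)
        # Add noise and keep the state in [0, 1]
        population = min(max(population + rng.gauss(0.0, 0.01), 0.0), 1.0)
    return population


def sweep(growth_rates, replications):
    """Run every (parameter, seed) combination; no run reads another's result."""
    jobs = list(itertools.product(growth_rates, range(replications)))
    # Threads keep the sketch portable; CPU-bound runs would rather go to
    # separate processes or remote machines.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda job: model(*job), jobs))
    return dict(zip(jobs, results))


if __name__ == "__main__":
    outputs = sweep([0.1, 0.5, 1.0], replications=5)
    print(len(outputs))  # 3 rates x 5 seeds = 15 independent runs
```

Because the runs are independent, the local `pool.map` could be swapped for any other executor without touching `model`.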

But it's slow...


Enter distributed computing / HPC!


How can we make it simple?
Prototype Small, Scale for Free
OpenMOLE is built around 3 orthogonal concepts: models, methods and execution environments

... and an expressive workflow formalism for distributed computing.

1 - Model?



Anything you can launch that takes inputs and produces outputs
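A hypothetical Python sketch of this definition treats a model as a black box with declared inputs and outputs. The names `Task` and `run` are illustrative only and are not OpenMOLE's API. Declaring the interface explicitly is what lets a workflow engine check a model and chain it with others without knowing what runs inside.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Context = Dict[str, float]


@dataclass(frozen=True)
class Task:
    """Hypothetical black-box model: declared inputs in, declared outputs out."""
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]
    body: Callable[..., Context]

    def run(self, context: Context) -> Context:
        missing = [name for name in self.inputs if name not in context]
        if missing:
            raise KeyError(f"missing inputs: {missing}")
        # Only the declared inputs are passed to the model
        result = self.body(**{name: context[name] for name in self.inputs})
        if set(result) != set(self.outputs):
            raise ValueError(f"expected outputs {self.outputs}, got {sorted(result)}")
        # Outputs are merged into the context so that a downstream task can use them
        return {**context, **result}


square = Task(inputs=("x",), outputs=("y",), body=lambda x: {"y": x * x})
print(square.run({"x": 3.0}))  # {'x': 3.0, 'y': 9.0}
```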

Zero deployment approach

  • User code is automatically deployed at runtime
  • No prior knowledge of the remote environment needed
  • No installation required on any machine
Works with almost any language / platform running on Linux

Packaging (non-JVM) applications with CARE*


*https://github.com/proot-me/PRoot


Applications have dependencies:
  • Shared libraries
  • Packages (Python, R, ...)
  • Low level system calls
  • Environment variables
  • ...

Capture these dependencies and transfer them along with the application, from one Linux machine to another


Distributed execution of (almost) any program on (pretty much) any computing environment in 3 simple steps:
  • Package it with CARE ⇒ run it on Linux
  • Write your OpenMOLE workflow
  • Click the run button

3 - Execution environments?


Today

Tomorrow

And now: examples!

Useful Links

Documentation www.openmole.org
Development version next.openmole.org
Source code github.com/openmole
Marketplace github.com/openmole-market

Thanks!

romain.reuillon@iscpif.fr
mathieu.leclaire@iscpif.fr
j.passerat-palmbach@imperial.ac.uk
2 - Methods?

Map/reduce

Grid Search

Random sampling

Latin Hypercube

Parallel data processing

...
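Most of the exploration methods listed above share a map/reduce shape: a sampling generates parameter sets, the model is evaluated on each one independently (map), and the outputs are aggregated (reduce). The Python sketch below uses hypothetical function names and is not OpenMOLE's API. It shows a full-factorial grid, uniform random sampling and a basic Latin hypercube design (one point per stratum and per factor, with an independent random permutation for each factor), followed by a sequential map/reduce.

```python
import itertools
import random
from typing import Callable, Dict, List, Tuple

Bounds = Dict[str, Tuple[float, float]]
Sample = Dict[str, float]


def grid_search(bounds: Bounds, steps: int) -> List[Sample]:
    """Full factorial design with `steps` evenly spaced values per factor (steps >= 2)."""
    axes = [[lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
            for lo, hi in bounds.values()]
    names = list(bounds)
    return [dict(zip(names, point)) for point in itertools.product(*axes)]


def random_sampling(bounds: Bounds, n: int, rng: random.Random) -> List[Sample]:
    """n points drawn uniformly and independently in the parameter space."""
    return [{name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
            for _ in range(n)]


def latin_hypercube(bounds: Bounds, n: int, rng: random.Random) -> List[Sample]:
    """Each factor range is cut into n equal strata, each stratum used exactly once."""
    columns = {}
    for name, (lo, hi) in bounds.items():
        strata = list(range(n))
        rng.shuffle(strata)  # independent permutation for every factor
        # k + rng.random() lies in [k, k + 1), so the value stays in stratum k
        columns[name] = [lo + (k + rng.random()) * (hi - lo) / n for k in strata]
    return [{name: columns[name][i] for name in bounds} for i in range(n)]


def map_reduce(model: Callable[..., float], samples: List[Sample],
               reduce: Callable[[List[float]], float]) -> float:
    """Map: evaluate each sample independently. Reduce: aggregate all outputs."""
    outputs = [model(**sample) for sample in samples]  # each call could run on its own node
    return reduce(outputs)


def objective(x: float, y: float) -> float:
    """Hypothetical model to explore."""
    return (x - 0.3) ** 2 + y ** 2


if __name__ == "__main__":
    rng = random.Random(42)
    bounds = {"x": (0.0, 1.0), "y": (-1.0, 1.0)}
    for design in (grid_search(bounds, 5), random_sampling(bounds, 25, rng),
                   latin_hypercube(bounds, 25, rng)):
        print(len(design), map_reduce(objective, design, min))
```

Only the sampling changes between methods; the map step is where distribution pays off, since every evaluation is independent.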

Master/slave

Example: genetic algorithm
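In a master/slave scheme the master holds the state of the algorithm and the slaves only evaluate. The sketch below applies this to a steady-state genetic algorithm, assuming a hypothetical sphere objective, Gaussian mutation and binary tournament selection. It is a generic illustration in Python, not OpenMOLE's implementation. Whenever any slave returns, the master integrates the result and immediately dispatches a new candidate, so fast slaves are never held back by slow ones.

```python
import random
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from typing import List, Tuple

Genome = List[float]
Individual = Tuple[Genome, float]  # (genome, fitness); lower fitness is better


def fitness(genome: Genome) -> float:
    """Hypothetical objective to minimise (sphere function), run by the slaves."""
    return sum(g * g for g in genome)


def evaluate(genome: Genome) -> Individual:
    return genome, fitness(genome)


def master_worker_ga(dim: int = 3, mu: int = 10, workers: int = 4,
                     evaluations: int = 200, seed: int = 0) -> List[Individual]:
    rng = random.Random(seed)  # used only by the master thread
    population: List[Individual] = []
    for _ in range(mu):
        genome = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        population.append((genome, fitness(genome)))

    with ThreadPoolExecutor(max_workers=workers) as pool:

        def submit() -> Future:
            # Binary tournament selection, then Gaussian mutation, both on the master
            parent = min(rng.sample(population, 2), key=lambda ind: ind[1])
            child = [g + rng.gauss(0.0, 0.1) for g in parent[0]]
            return pool.submit(evaluate, child)  # only evaluation goes to a slave

        pending = {submit() for _ in range(workers)}
        submitted = workers
        while pending:
            # Wake up as soon as any slave finishes, without waiting for the others
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                child = future.result()
                worst = max(range(mu), key=lambda i: population[i][1])
                if child[1] < population[worst][1]:  # steady-state replacement
                    population[worst] = child
                if submitted < evaluations:
                    pending.add(submit())
                    submitted += 1

    return sorted(population, key=lambda ind: ind[1])


if __name__ == "__main__":
    best_genome, best_fitness = master_worker_ga()[0]
    print(best_fitness)
```

Since only the worst individual is ever replaced, and only by a better one, the best fitness in the population can never get worse.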

The terminology