Workflows for distributed computing
Mathieu Leclaire
Jonathan Passerat-Palmbach
Romain Reuillon
Naturally parallel methods
- Data reconstruction
- Parameter estimation
- Sensitivity analysis
- Optimisation
- Replication
- ...
Execution of the same program with different parameters and/or datasets.
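As a minimal sketch of what "the same program with different parameters" looks like, assume a hypothetical `model` function standing in for the real program (here a logistic map, chosen only for illustration). Each run is independent of the others, which is what makes the method naturally parallel, yet a plain loop still executes them one after another:

```python
def model(growth_rate: float, steps: int = 50) -> float:
    """Hypothetical stand-in for a simulation: iterate the logistic map."""
    x = 0.5
    for _ in range(steps):
        x = growth_rate * x * (1.0 - x)
    return x


# Same program, different parameters: no run depends on another one.
parameters = [2.5 + 0.1 * i for i in range(10)]

# Sequential execution: total time is the sum of all individual run times.
results = [model(r) for r in parameters]

for r, x in zip(parameters, results):
    print(f"growth_rate={r:.1f} -> final state {x:.4f}")
```

With a real simulation taking minutes or hours per run, this loop is the part worth distributing.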
But it's slow...

Enter distributed computing / HPC!

HPC user experience

Prototype → Scale-up for Free
OpenMOLE is built around 3 orthogonal concepts

... and an expressive workflow formalism for distributed computing.
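The idea of orthogonal concepts can be sketched in plain Python. This is not the OpenMOLE DSL: every name below (`model`, `grid`, `local_environment`, `threaded_environment`, `explore`) is hypothetical, and a thread pool merely stands in for a remote execution environment. The point is that the model, the method, and the environment can each be swapped without touching the other two:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


# 1 - Model: something you can run, taking inputs and producing outputs.
def model(x: float) -> float:
    return (x - 3.0) ** 2


# 2 - Method: decides which inputs to explore (here, a regular grid).
def grid(start: float, stop: float, n: int) -> List[float]:
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]


# 3 - Environment: decides where the runs happen.
def local_environment(f: Callable[[float], float], inputs: List[float]) -> List[float]:
    return [f(x) for x in inputs]


def threaded_environment(f: Callable[[float], float], inputs: List[float]) -> List[float]:
    # Stand-in for a cluster or a grid: same contract, different place to run.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(f, inputs))


def explore(f, inputs, environment):
    """Glue: the workflow does not care which environment it is given."""
    return environment(f, inputs)


inputs = grid(0.0, 6.0, 7)
prototype = explore(model, inputs, local_environment)     # prototype locally
scaled_up = explore(model, inputs, threaded_environment)  # scale up: one argument changes
print(prototype == scaled_up)
```

Switching from the prototype to the scaled-up run changes a single argument, which is the spirit of "Prototype → Scale-up for Free".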
1 - Model?
Stuff you can run, taking inputs and producing outputs
Zero deployment approach
- User code is automatically deployed at runtime
- No prior knowledge of remote environment needed
- No installation required on any machine
Transparent access
- No preliminary step
- Access resources the way the user would
- With the user's credentials
Any application
Packaging (non-JVM) applications with CARE
Applications have dependencies:
- Shared libraries
- Packages (Python, R, ...)
- Low level system calls
- Environment variables
- ...
Capture these dependencies and transfer them along with the application, from one Linux machine to another
Packaging (non-JVM) applications with CARE
Distributed execution of (almost) any program on (pretty much) any computing environment
- Package it with CARE
- Write your OpenMOLE workflow
- Click the run button
- Write your paper
3 - Execution environments?

Today
Tomorrow
And now: examples!
2 - Methods?

Map/reduce
Grid Search
Random sampling
Latin Hypercube
Parallel data processing
...
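The sampling methods listed above all fit the same map/reduce shape: generate points, evaluate the model on each point independently (map), then aggregate the results (reduce). The sketch below is a plain-Python illustration under stated assumptions, not OpenMOLE code: `model` is a hypothetical two-parameter objective, the samplers cover the unit square only, and a thread pool stands in for distributed workers.

```python
import itertools
import random
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def model(point):
    """Hypothetical objective: squared distance to the point (0.3, 0.7)."""
    x, y = point
    return (x - 0.3) ** 2 + (y - 0.7) ** 2


def grid_search(n):
    """Full factorial design: n evenly spaced values per dimension."""
    axis = [i / (n - 1) for i in range(n)]
    return list(itertools.product(axis, axis))


def random_sampling(n, rng):
    """n points drawn uniformly and independently in the unit square."""
    return [(rng.random(), rng.random()) for _ in range(n)]


def latin_hypercube(n, dims, rng):
    """n points such that each of the n strata of every dimension is hit once."""
    columns = []
    for _ in range(dims):
        # One value in each of the n equal-width strata of [0, 1)...
        column = [(i + rng.random()) / n for i in range(n)]
        # ...shuffled so that strata are paired at random across dimensions.
        rng.shuffle(column)
        columns.append(column)
    return list(zip(*columns))


def map_reduce(samples):
    # Map: evaluations are independent, so they can run in parallel.
    with ThreadPoolExecutor(max_workers=4) as pool:
        scores = list(pool.map(model, samples))
    # Reduce: keep the (point, score) pair with the lowest score.
    return reduce(lambda a, b: a if a[1] <= b[1] else b, zip(samples, scores))


rng = random.Random(42)
designs = {
    "grid": grid_search(5),
    "random": random_sampling(25, rng),
    "lhs": latin_hypercube(25, 2, rng),
}
for name, samples in designs.items():
    best_point, best_score = map_reduce(samples)
    print(f"{name}: best point {best_point}, score {best_score:.4f}")
```

All three designs use 25 evaluations here; only the way the points are spread over the parameter space differs.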
Master/slave
Example: genetic algorithm
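A minimal sketch of the pattern applied to a genetic algorithm, assuming a simple generational scheme with elitist selection (one of many possible variants, and not necessarily the one OpenMOLE implements). The master holds the population and breeds offspring; the workers only evaluate fitness. The `fitness` function, the operators, and all parameter values are hypothetical, and a thread pool stands in for remote workers.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def fitness(genome):
    """Hypothetical objective to minimise: the sphere function."""
    return sum(g * g for g in genome)


def mutate(genome, rng, sigma=0.1):
    """Gaussian perturbation of every gene."""
    return [g + rng.gauss(0.0, sigma) for g in genome]


def crossover(a, b, rng):
    """Uniform crossover: each gene comes from either parent."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def evolve(generations=30, population_size=20, dims=3, seed=0):
    rng = random.Random(seed)  # only the master draws random numbers
    population = [[rng.uniform(-5.0, 5.0) for _ in range(dims)]
                  for _ in range(population_size)]
    with ThreadPoolExecutor(max_workers=4) as workers:
        # Workers: evaluate the initial population in parallel.
        scored = list(zip(workers.map(fitness, population), population))
        initial_best = min(score for score, _ in scored)
        for _ in range(generations):
            # Master: breed offspring from the current population.
            parents = [genome for _, genome in scored]
            offspring = [
                mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
                for _ in range(population_size)
            ]
            # Workers: fitness evaluation is the expensive, distributable part.
            scored += list(zip(workers.map(fitness, offspring), offspring))
            # Master: elitist selection keeps the best individuals seen so far.
            scored = sorted(scored, key=lambda pair: pair[0])[:population_size]
    return initial_best, scored[0]


initial_best, (best_score, best_genome) = evolve()
print(f"best fitness went from {initial_best:.4f} to {best_score:.6f}")
```

Because selection is elitist, the best fitness can never get worse from one generation to the next; how fast it improves depends on the hypothetical operators and parameters chosen here.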
The terminology