Workflows for distributed computing
Mathieu Leclaire
Jonathan Passerat-Palmbach
Romain Reuillon
You're using naturally parallel methods daily on your laptops:
- Data reconstruction
- Parameter estimation
- Sensitivity analysis
- Optimisation
- Replication
- ...
Executions of the same program with different parameters and/or datasets.
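Such an embarrassingly parallel sweep can be sketched in plain Scala (a minimal sketch: `model` is a placeholder for any real simulation, and the `Future`s stand in for independent runs):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// The "model": one program, run many times with different parameters.
def model(param: Double): Double = param * param // stand-in for a real simulation

// Naturally parallel: each parameter value is an independent run.
val parameters = Seq(1.0, 2.0, 3.0, 4.0)
val runs = parameters.map(p => Future(model(p)))
val results = Await.result(Future.sequence(runs), 10.seconds)
```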
But it's slow...

Enter distributed computing / HPC!

How to make it simple?
Prototype Small, Scale for Free
OpenMOLE is built around 3 orthogonal concepts

... and an expressive workflow formalism for distributed computing.
1 - Model?
Anything you can launch that takes inputs and produces outputs
Zero deployment approach
- User code is automatically deployed at runtime
- No prior knowledge of remote environment needed
- No installation required on any machine
Works with almost any language / platform running on Linux
Packaging (non-JVM) applications with CARE
Applications have dependencies:
- Shared libraries
- Packages (Python, R, ...)
- Low level system calls
- Environment variables
- ...
CARE captures these dependencies and transfers them, along with the application, from one Linux machine to another
Packaging (non-JVM) applications with CARE
Distributed execution of (almost) any program on (pretty much) any computing environment in 3 simple steps:
- Package it with CARE ⇒ execute it on any Linux
- Write your OpenMOLE workflow
- Click the run button
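These three steps might translate into an OpenMOLE workflow along these lines (a hedged sketch, not runnable outside OpenMOLE: the archive name, command, variable names, and virtual organisation are placeholders, and the exact `CARETask` / environment API should be checked against the OpenMOLE documentation):

```
// Declare the variables exchanged between tasks
val seed = Val[Int]
val output = Val[File]

// Wrap the CARE archive produced in step 1 (names are placeholders)
val simulation =
  CARETask("mymodel.tgz.bin", "./mymodel ${seed}") set (
    inputs += seed,
    outputs += (seed, output),
    outputFiles += ("result.csv", output)
  )

// Delegate the runs to a remote environment, e.g. the EGI grid
val env = EGIEnvironment("vo.complex-systems.eu")

// Explore the seed values and run everything remotely
DirectSampling(
  evaluation = simulation on env,
  sampling = seed in (1 to 100)
)
```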
3 - Execution environments?

Today / Tomorrow (diagrams of supported execution environments)
And now: examples!
2 - Methods?

Map/reduce
Grid Search
Random sampling
Latin Hypercube
Parallel data processing
...
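Two of these samplings can be contrasted in a few lines of plain Scala (a sketch, not OpenMOLE syntax: the objective function and ranges are made up for illustration):

```scala
import scala.util.Random

// Grid search: the cross product of regularly spaced values per dimension.
val xs = (0 to 10).map(_ / 10.0)
val ys = (0 to 10).map(_ / 10.0)
val grid = for (x <- xs; y <- ys) yield (x, y) // 11 * 11 = 121 points

// Map/reduce over the grid: evaluate the model at every point, then aggregate.
val best = grid.map { case (x, y) =>
  math.pow(x - 0.3, 2) + math.pow(y - 0.7, 2)
}.min

// Latin hypercube: one sample per stratum in each dimension, strata shuffled
// independently so every row and column of the design is hit exactly once.
def latinHypercube(n: Int, dims: Int, rng: Random): Seq[Seq[Double]] = {
  val columns = Seq.fill(dims) {
    rng.shuffle((0 until n).toList).map(i => (i + rng.nextDouble()) / n)
  }
  (0 until n).map(i => columns.map(_(i)))
}

val lhs = latinHypercube(100, 2, new Random(42))
```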
Master/slave
Example: genetic algorithm
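The master/slave pattern behind such a genetic algorithm can be sketched in plain Scala (assumptions: a toy one-dimensional fitness, `Future`s standing in for remote slave workers, and a deliberately simple select-and-mutate loop on the master):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.Random

val rng = new Random(1)
def fitness(x: Double): Double = -(x - 3.0) * (x - 3.0) // maximised at x = 3

// Master holds the population; fitness evaluation is farmed out to "slaves"
// (here simply Futures standing in for remote workers).
def evaluate(pop: Seq[Double]): Seq[(Double, Double)] =
  Await.result(Future.sequence(pop.map(x => Future(x -> fitness(x)))), 10.seconds)

var population = Seq.fill(20)(rng.nextDouble() * 10)
for (_ <- 1 to 30) {
  val scored = evaluate(population)
  // Selection: keep the best half; mutation: Gaussian jitter around parents.
  val parents = scored.sortBy(-_._2).take(10).map(_._1)
  population = parents ++ parents.map(p => p + rng.nextGaussian() * 0.5)
}
val best = population.maxBy(fitness)
```

Keeping the unmutated parents in the population (elitism) means the best individual never regresses between generations.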
The terminology