Workflows for model exploration

Mathieu Leclaire
Jonathan Passerat-Palmbach
Romain Reuillon

Context

  • Complex-system community
  • Various scientific fields
  • No standard practices, languages, or platforms
  • Domain-specialist PhDs (geographers, biologists, sociologists...) with no technical support
They run naturally parallel methods daily on their laptops:
  • Data reconstruction
  • Parameter estimation
  • Sensitivity analysis
  • Optimisation
  • Replication
  • ...
Execution of the same program with different parameters and/or datasets.
How to bring distributed computing to these researchers?
Prototype Small, Scale for Free
OpenMOLE articulates 3 orthogonal concepts

... and an expressive workflow formalism for distributed computing.

1 - Model?



Any program you can launch that takes inputs and produces outputs

Zero deployment approach

  • User code is automatically deployed at runtime
  • No prior knowledge of remote environment needed
  • No installation required on any machine
Works with almost any language / platform running on Linux

Packaging (non-JVM) applications with CARE

Applications have dependencies:
  • Shared libraries
  • Packages (Python, R, ...)
  • Low level system calls
  • Environment variables
  • ...

CARE captures these dependencies and transfers them along with the application from one Linux machine to another
Distributed execution of (almost) any program on (pretty much) any computing environment in 3 simple steps:
  • Package it with CARE ⇒ execute it on Linux
  • Write your OpenMOLE workflow
  • Click the run button
2 - Method?

Map/reduce

Grid Search

Random sampling

Latin Hypercube

Parallel data processing

...
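
The sampling methods listed above can be sketched in plain Scala, without OpenMOLE. This is a minimal illustration under stated assumptions, not the implementation OpenMOLE uses: the names SamplingSketch, grid, uniform and lhs are hypothetical, and all samples live in the unit hypercube.

```scala
import scala.util.Random

// Illustrative only: hypothetical helper names, not OpenMOLE API.
object SamplingSketch {
  // Grid search: Cartesian product of the candidate values of each dimension.
  def grid(axes: Seq[Seq[Double]]): Seq[Seq[Double]] =
    axes.foldLeft(Seq(Seq.empty[Double])) { (acc, axis) =>
      for (point <- acc; v <- axis) yield point :+ v
    }

  // Random sampling: n points drawn uniformly in [0, 1)^dim.
  def uniform(n: Int, dim: Int, rng: Random): Seq[Seq[Double]] =
    Seq.fill(n)(Seq.fill(dim)(rng.nextDouble()))

  // Latin hypercube: in every dimension, each of the n strata
  // [k/n, (k+1)/n) receives exactly one point.
  def lhs(n: Int, dim: Int, rng: Random): Seq[Seq[Double]] = {
    val columns = Seq.fill(dim) {
      rng.shuffle((0 until n).toList).map(k => (k + rng.nextDouble()) / n)
    }
    columns.transpose
  }
}
```

Each returned point is one independent model run, which is why these designs map directly onto a map/reduce workflow.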

Master/slave

Example: genetic algorithm
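
As a rough illustration of the pattern, here is a sequential sketch of a steady-state genetic algorithm in plain Scala. It is not OpenMOLE's implementation: SteadyStateSketch, the toy objective in evaluate and the mutation step size are all assumptions made for the example. The master keeps the population; the call to evaluate is the part a worker would run remotely.

```scala
import scala.util.Random

// Illustrative only: hypothetical names and a toy single-objective problem.
object SteadyStateSketch {
  type Genome = Vector[Double]

  // Worker side: the expensive model evaluation (toy objective, to minimise).
  def evaluate(g: Genome): Double = g.map(x => (x - 0.5) * (x - 0.5)).sum

  // Gaussian mutation, clamped to the [0, 1] bounds of each gene.
  def mutate(g: Genome, rng: Random): Genome =
    g.map(x => math.min(1.0, math.max(0.0, x + 0.1 * rng.nextGaussian())))

  // Master side: integrate one evaluated child at a time, keep the mu best.
  def evolve(mu: Int, dim: Int, steps: Int, rng: Random): Vector[(Genome, Double)] = {
    val initial =
      Vector.fill(mu)(Vector.fill(dim)(rng.nextDouble())).map(g => (g, evaluate(g)))
    (1 to steps).foldLeft(initial) { (population, _) =>
      val parent = population(rng.nextInt(population.size))._1
      val child = mutate(parent, rng)
      // In a distributed run, this evaluation is what a worker would execute.
      (population :+ ((child, evaluate(child)))).sortBy(_._2).take(mu)
    }
  }
}
```

Because the master only needs one finished evaluation to make progress, it never waits for a whole generation, which suits environments where job durations vary widely.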

3 - Execution environment?

Today

Multi-thread
Delegation through SSH
PBS (over SSH)
SLURM (over SSH)
Condor (over SSH)
SGE (over SSH)
OAR (over SSH)
DIRAC
Ad hoc desktop grid

Tomorrow

Commercial cloud providers
Academic cloud

Volunteer computing?
Combination of environments?
Next Docker-based computing platform?

Transparent access

Access the environment as the user would

Uses the user's credentials

No preliminary step

Automatic data transfers
(+ Replica management)

File and folder transfers are handled transparently by OpenMOLE

And now: examples!

The terminology



A workflow


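// Prototypes: typed variables flowing through the workflow
val i   = Val[Double]
val res = Val[Double]

// Sample i over 0.0, 1.0, ..., 10.0
val exploration = ExplorationTask ( i in (0.0 to 10.0 by 1.0) )

// The model: a Scala snippet that reads i and produces res
val model =
  ScalaTask ("val res = i * 2") set (
    inputs  += i,
    outputs += (i, res)
  )

// Run on 5 local threads
val env = LocalEnvironment(5)

// Run the model once per sampled value of i
val ex = exploration -< (model on env) start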
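
To make explicit what this workflow computes, here is a plain Scala sketch of the same fan-out without OpenMOLE. It is only an analogy: ExplorationSketch, model and run are hypothetical names, and Futures on the default execution context stand in for the execution environment.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Illustrative only: hypothetical names, no OpenMOLE involved.
object ExplorationSketch {
  // Same computation as the ScalaTask: res = i * 2.
  def model(i: Double): Double = i * 2

  // The sampled values 0.0, 1.0, ..., 10.0.
  val samples: Seq[Double] = (0 to 10).map(_.toDouble)

  // One independent job per sampled value, gathered in sampling order.
  def run(): Seq[(Double, Double)] = {
    val jobs = samples.map(i => Future((i, model(i))))
    Await.result(Future.sequence(jobs), 10.seconds)
  }
}
```

Each job depends only on its own value of i, so the jobs can run anywhere, in any order.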
        

Same workflow on the grid!


val i   = Val[Double]
val res = Val[Double]

val exploration = ExplorationTask ( i in (0.0 to 10.0 by 1.0) )

val model =
  ScalaTask ("val res = i * 2") set (
    inputs  += i,
    outputs += (i, res)
  )

// Only the environment changes: jobs go to the EGI grid through the biomed VO
val env = EGIEnvironment("biomed")

val ex = exploration -< (model on env) start
        

That's what we call usable HPC :)

Native code

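# On a Linux machine: package the application and its dependencies
care -o hello.tgz.bin python hello.py 42 test.txt

// OpenMOLE workflow running the CARE archive
val arg = Val[Int]
val output = Val[File]

val pythonTask =
  CARETask(
    workDirectory / "hello.tgz.bin",
    "python hello.py ${arg} output.txt") set (
    inputs += arg,
    outputFiles += ("output.txt", output),
    outputs += arg
  )

val exploration = ExplorationTask(arg in (0 to 10))

// Copy each output file back, named after the value of arg
val copy = CopyFileHook(output, workDirectory / "hello${arg}.txt")

exploration -< (pythonTask hook copy)

// Multi-objective genetic algorithm (NSGA2) over the model parameters
val algorithm =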
  NSGA2(
    mu = 200,
    genome =
      Seq(
        rMax in (2.0, 50000.0),
        distanceDecay in (0.0, 4.0),
        pCreation in (0.0, 0.01),
        pDiffusion in (0.0, 0.01),
        innovationImpact in (0.0, 2.0),
        innovationLife in (1.0, 4001.0)
      ),
    objectives = Seq(ksValue, deltaPop, deltaTime)
  )
                
val evolution =
  SteadyStateEvolution(
    algorithm = algorithm,
    evaluation = evaluateModel,
    termination = 15 minutes
  )

val island = 
  IslandEvolution(
    evolution,
    parallelism = 1000,
    termination = 200000
  )

val savePopulation = 
  SavePopulationHook(
    island,
    workDirectory / "populations"
  )

val grid = EGIEnvironment("vo.complex-systems.eu")

(island on grid hook savePopulation)
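
The algorithm above optimises three objectives at once (ksValue, deltaPop, deltaTime), so candidates are compared by Pareto dominance rather than by a single score. The sketch below shows that comparison in plain Scala. It assumes every objective is minimised, and ParetoSketch, dominates and front are hypothetical names, not OpenMOLE API.

```scala
// Illustrative only: hypothetical names, all objectives assumed to be minimised.
object ParetoSketch {
  // a dominates b when it is no worse on every objective
  // and strictly better on at least one.
  def dominates(a: Seq[Double], b: Seq[Double]): Boolean =
    a.zip(b).forall { case (x, y) => x <= y } &&
      a.zip(b).exists { case (x, y) => x < y }

  // The non-dominated points: the first front in an NSGA-II style ranking.
  def front(points: Seq[Seq[Double]]): Seq[Seq[Double]] =
    points.filter(p => !points.exists(q => dominates(q, p)))
}
```

The result is a set of trade-offs, not one best individual: no point on the front can improve one objective without degrading another.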

Advanced methods



Profiles



Flat Profile



PSE



Useful Links

Documentation: www.openmole.org
Development version: next.openmole.org
Source code: github.com/openmole
Market place: github.com/openmole-market

Thanks!

romain.reuillon@iscpif.fr
mathieu.leclaire@iscpif.fr
j.passerat-palmbach@imperial.ac.uk