Workflows for distributed computing

Mathieu Leclaire
Jonathan Passerat-Palmbach
Romain Reuillon

Context

  • Complex-system community
  • Various scientific fields
  • No standard practices, languages or platforms
  • PhD-level domain scientists (geographers, biologists, sociologists...) with no technical support
They use naturally parallel methods daily on their laptops:
  • Data reconstruction
  • Parameter estimation
  • Sensitivity analysis
  • Optimisation
  • Replication
  • ...
Execution of the same program with different parameters and/or datasets.
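
Since every run is independent of the others, such experiments are naturally parallel. The minimal Scala sketch below illustrates the pattern on a single machine. The `model` function is a hypothetical stand-in for the user's program, and the thread-pool approach is only an illustration, not how the platform presented in this talk is implemented.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object NaturallyParallel {
  // Hypothetical stand-in for the user's program: one input, one output.
  def model(x: Double): Double = x * x

  // Evaluate the model on every parameter value using `threads` workers.
  def sweep(params: Seq[Double], threads: Int): Seq[(Double, Double)] = {
    val pool = Executors.newFixedThreadPool(threads)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      // Runs do not depend on each other, so they can all be started at once.
      val runs = params.map(p => Future(p -> model(p)))
      // Future.sequence keeps the results in the same order as `params`.
      Await.result(Future.sequence(runs), 1.minute)
    } finally pool.shutdown()
  }

  def main(args: Array[String]): Unit =
    sweep(Seq(1.0, 2.0, 3.0), 2).foreach(println)
}
```
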
How can we bring distributed computing to these researchers?
Prototype Small, Scale for Free

1 - Model?



Anything that you can launch, taking inputs and producing outputs
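
A model, in this sense, is a black box mapping inputs to outputs. A minimal sketch, assuming named numeric inputs and outputs. The `Model` trait and the toy `Doubling` model are hypothetical names; the toy model mirrors the `res = i * 2` task used in the workflow examples.

```scala
object ModelSketch {
  // A model: anything that maps named inputs to named outputs.
  trait Model {
    def inputs: Set[String]
    def run(in: Map[String, Double]): Map[String, Double]
  }

  // Toy model: computes res = i * 2 and passes i through as an output.
  object Doubling extends Model {
    val inputs: Set[String] = Set("i")
    def run(in: Map[String, Double]): Map[String, Double] = {
      require(inputs.subsetOf(in.keySet), s"missing inputs: ${inputs -- in.keySet}")
      Map("i" -> in("i"), "res" -> in("i") * 2)
    }
  }

  def main(args: Array[String]): Unit =
    println(Doubling.run(Map("i" -> 3.0)))
}
```
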

Zero deployment approach

  • User code is automatically deployed at runtime
  • No prior knowledge of remote environment needed
  • No installation required on any machine
Works with almost any language / platform running on Linux

2 - Methods

Map/reduce
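
In the map/reduce pattern, the same model is mapped over many independent inputs (the part that can be distributed), and the outputs are then reduced to a single result. A sketch with plain Scala collections, assuming a hypothetical one-parameter model and "keep the best output" as the reduction:

```scala
object MapReduceSketch {
  // Hypothetical model: a smooth response with a single peak at x = 3.
  def model(x: Double): Double = 10 - math.pow(x - 3, 2)

  // Map: evaluate the model on every input independently.
  // Reduce: keep the (input, output) pair with the highest output.
  def best(inputs: Seq[Double]): (Double, Double) = {
    require(inputs.nonEmpty, "need at least one input")
    val mapped = inputs.map(x => (x, model(x)))        // map step: runs share nothing
    mapped.reduce((a, b) => if (b._2 > a._2) b else a) // reduce step: combine outputs
  }

  def main(args: Array[String]): Unit =
    println(best((0 to 6).map(_.toDouble)))
}
```

Here the map step runs sequentially; because the runs share nothing, it is the step an execution environment could spread over many machines.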

Grid Search
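
A grid search runs the model on every combination of discretised parameter values. A minimal sketch with hypothetical helper names `axis` and `grid`; values are computed from an index rather than by repeated addition, so rounding errors do not accumulate.

```scala
object GridSearchSketch {
  // Discretise [min, max] into `steps` evenly spaced values, endpoints included.
  def axis(min: Double, max: Double, steps: Int): Seq[Double] = {
    require(steps >= 2, "an axis needs at least two values")
    (0 until steps).map(k => min + k * (max - min) / (steps - 1))
  }

  // Cartesian product of the axes: each combination is one run of the model.
  def grid(axes: Seq[Seq[Double]]): Seq[Seq[Double]] =
    axes.foldLeft(Seq(Seq.empty[Double])) { (acc, values) =>
      for (prefix <- acc; v <- values) yield prefix :+ v
    }

  def main(args: Array[String]): Unit =
    grid(Seq(axis(0.0, 1.0, 3), axis(10.0, 20.0, 2))).foreach(println)
}
```

The number of runs is the product of the axis sizes: 11 values for each of 4 parameters already gives 14,641 runs, which is why such designs quickly call for distributed execution.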

Random sampling
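
Random sampling draws parameter vectors uniformly within the bounds of each parameter, so the number of runs is chosen freely rather than imposed by a grid. A minimal sketch, assuming independent uniform parameters and a seeded generator for reproducibility:

```scala
import scala.util.Random

object RandomSamplingSketch {
  // Draw n points uniformly in the box given by one (lower, upper) pair per parameter.
  def sample(bounds: Seq[(Double, Double)], n: Int, rng: Random): Seq[Seq[Double]] =
    Seq.fill(n)(bounds.map { case (lo, hi) => lo + rng.nextDouble() * (hi - lo) })

  def main(args: Array[String]): Unit =
    sample(Seq((0.0, 1.0), (10.0, 20.0)), 5, new Random(42)).foreach(println)
}
```
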

Latin Hypercube
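
A Latin hypercube sample of size n splits each parameter range into n equal strata and places exactly one point in each stratum of every dimension. For the same number of runs, this covers each parameter's range more evenly than plain random sampling. A minimal sketch on the unit hypercube [0, 1)^d; each coordinate would then be rescaled to the real parameter bounds.

```scala
import scala.util.Random

object LatinHypercubeSketch {
  // n points in [0, 1)^dims: in every dimension, each of the n
  // equal-width strata contains exactly one point.
  def sample(n: Int, dims: Int, rng: Random): Seq[Seq[Double]] = {
    // One independent random permutation of the strata per dimension,
    // with a uniform jitter inside each stratum.
    val columns: Seq[Seq[Double]] = Seq.fill(dims) {
      rng.shuffle((0 until n).toVector).map(s => (s + rng.nextDouble()) / n)
    }
    // Transpose the columns (one per dimension) into points (one per run).
    (0 until n).map(i => columns.map(_(i)))
  }

  def main(args: Array[String]): Unit =
    sample(5, 2, new Random(1)).foreach(println)
}
```
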

Parallel data processing

...

Home-made algorithms: Profiles
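
A profile shows, for each value of one parameter, the best fit the model can reach when the other parameters are free to vary. This helps reveal whether that parameter is well constrained by the data. The sketch below computes such a profile by brute force over a grid, using a hypothetical two-parameter error function. It illustrates the concept only and is not the algorithm used in the platform presented here.

```scala
object ProfileSketch {
  // Hypothetical calibration error of a two-parameter model, minimal at (2, 3).
  def error(x: Double, y: Double): Double = math.pow(x - 2, 2) + math.pow(y - 3, 2)

  // Profile of x: for each fixed x, the lowest error reachable by tuning y.
  def profile(xs: Seq[Double], ys: Seq[Double]): Seq[(Double, Double)] = {
    require(ys.nonEmpty, "need at least one value for the free parameter")
    xs.map(x => x -> ys.map(y => error(x, y)).min)
  }

  def main(args: Array[String]): Unit =
    profile(Seq(0.0, 1.0, 2.0, 3.0), Seq(0.0, 1.0, 2.0, 3.0, 4.0)).foreach(println)
}
```
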

Home-made algorithms: Inverse problems
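
An inverse problem asks which parameter values make the model reproduce observed data. A minimal sketch using plain random search, assuming a hypothetical one-parameter model and a sum-of-squares distance. Practical calibration methods search more cleverly, and this is not the algorithm presented in the talk.

```scala
import scala.util.Random

object InverseProblemSketch {
  // Hypothetical model: a linear process with unknown rate a, observed at t = 1..5.
  def model(a: Double): Seq[Double] = (1 to 5).map(t => a * t)

  // Sum of squared differences between simulated and observed data.
  def distance(sim: Seq[Double], obs: Seq[Double]): Double =
    sim.zip(obs).map { case (s, o) => (s - o) * (s - o) }.sum

  // Random search: keep the candidate whose simulation is closest to the data.
  def calibrate(obs: Seq[Double], lo: Double, hi: Double, n: Int, rng: Random): Double =
    Seq.fill(n)(lo + rng.nextDouble() * (hi - lo)).minBy(a => distance(model(a), obs))

  def main(args: Array[String]): Unit =
    println(calibrate(model(1.5), 0.0, 5.0, 10000, new Random(7)))
}
```
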

Home-made algorithms: Output diversity
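
Output diversity turns the question around. Instead of fitting one target, the goal is to discover the range of behaviours the model can produce. The naive sketch below samples inputs at random and keeps one input per cell of a grid laid over the output space. The model and cell size are hypothetical, and this random sampler only illustrates the objective; it is not the algorithm from these slides.

```scala
import scala.util.Random

object OutputDiversitySketch {
  // Hypothetical model with two outputs.
  def model(x: Double, y: Double): (Double, Double) = (x + y, x * y)

  // Keep the first input that lands in each cell of the output grid;
  // the number of cells reached measures how many distinct behaviours were found.
  def explore(n: Int, cellSize: Double, rng: Random): Map[(Long, Long), (Double, Double)] = {
    val inputs = Seq.fill(n)((rng.nextDouble(), rng.nextDouble()))
    inputs.foldLeft(Map.empty[(Long, Long), (Double, Double)]) { case (cells, (x, y)) =>
      val (o1, o2) = model(x, y)
      val cell = (math.floor(o1 / cellSize).toLong, math.floor(o2 / cellSize).toLong)
      if (cells.contains(cell)) cells else cells + (cell -> ((x, y)))
    }
  }

  def main(args: Array[String]): Unit =
    println(s"${explore(1000, 0.25, new Random(3)).size} distinct output cells")
}
```
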

3 - Execution environment

Today

Multi-thread
Delegation through SSH
PBS (on ssh)
SLURM (on ssh)
Condor (on ssh)
SGE (on ssh)
OAR (on ssh)
EGI Grid (through DIRAC)
Ad hoc desktop grid
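
Supporting all these targets relies on writing the experiment independently of where it runs. A minimal sketch of that separation, with hypothetical names: an `Environment` executes a batch of independent jobs, and two local implementations (sequential, and a fixed thread pool) can be swapped without touching the experiment. Remote environments such as SSH, batch schedulers or the grid are not modelled here.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object EnvironmentSketch {
  // Hypothetical abstraction: an environment executes a batch of independent jobs.
  trait Environment {
    def execute[A](jobs: Seq[() => A]): Seq[A]
  }

  // Runs the jobs one after the other in the calling thread.
  object Sequential extends Environment {
    def execute[A](jobs: Seq[() => A]): Seq[A] = jobs.map(job => job())
  }

  // Runs the jobs concurrently on n local threads, keeping results in order.
  final class Threads(n: Int) extends Environment {
    require(n > 0, "need at least one thread")
    def execute[A](jobs: Seq[() => A]): Seq[A] = {
      val pool = Executors.newFixedThreadPool(n)
      implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
      try Await.result(Future.sequence(jobs.map(job => Future(job()))), Duration.Inf)
      finally pool.shutdown()
    }
  }

  // The experiment is written once; only the environment argument changes.
  def experiment(env: Environment): Seq[Double] =
    env.execute((0 to 10).map(i => () => i * 2.0))

  def main(args: Array[String]): Unit =
    println(experiment(new Threads(5)))
}
```
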

Grid Computing

Tomorrow

Commercial cloud providers
Academic cloud

Volunteer computing?
Combination of environments?
Next Docker-based computing platform?

The terminology



A workflow


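// Typed variables flowing between tasks
val i   = Val[Double]
val res = Val[Double]

// Explore i over 0.0, 1.0, ..., 10.0
val exploration = ExplorationTask(i in (0.0 to 10.0 by 1.0))

// The model: a Scala snippet computing res from i
val model =
  ScalaTask("val res = i * 2") set (
    inputs  += i,
    outputs += (i, res)
  )

// Run up to 5 tasks at a time on the local machine
val env = LocalEnvironment(5)

// -< fans out one model run per explored value; start launches the workflow
val ex = exploration -< (model on env) start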

Same workflow on the grid!


val i   = Val[Double]
val res = Val[Double]

val exploration = ExplorationTask(i in (0.0 to 10.0 by 1.0))

val model =
  ScalaTask("val res = i * 2") set (
    inputs  += i,
    outputs += (i, res)
  )

// The only change: delegate the runs to the EGI grid, within the biomed virtual organisation
val env = EGIEnvironment("biomed")

val ex = exploration -< (model on env) start

Native code

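# On the user's machine: record one sample run with CARE into a self-contained archive
care -o hello.tgz.bin python hello.py 42 test.txt

// In OpenMOLE: re-execute the packaged script remotely
val arg = Val[Int]
val output = Val[File]

val pythonTask =
  CARETask(
      workDirectory / "hello.tgz.bin",
      "python hello.py ${arg} output.txt") set (  // ${arg} is replaced by the value of arg
    inputs += arg,
    outputFiles += ("output.txt", output),        // collect output.txt into the variable output
    outputs += arg
  )

// Run the script for arg = 0, 1, ..., 10
val exploration = ExplorationTask(arg in (0 to 10))

// Copy each result file back to the work directory (hello0.txt ... hello10.txt)
val copy = CopyFileHook(output, workDirectory / "hello${arg}.txt")

exploration -< (pythonTask hook copy)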

Useful Links

Documentation www.openmole.org
Development version next.openmole.org
Source code github.com/openmole
Market place github.com/openmole-market

2 February, 5 pm

Thanks!

romain.reuillon@iscpif.fr
mathieu.leclaire@iscpif.fr
j.passerat-palmbach@imperial.ac.uk