Mathieu Leclaire
Romain Reuillon
Jonathan Passerat-Palmbach


Model construction steps
Many scientists use naturally parallel methods daily:
  • Data reconstruction
  • Parameter estimation
  • Sensitivity analysis
  • Optimisation
  • Replication
  • ...
Execution of the same program with different parameters and/or datasets.

These methods are time-consuming, yet they are generally executed sequentially.
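Every run that such a method launches can be listed before any of them executes. That is what makes these methods naturally parallel. The sketch below is plain Scala, not OpenMOLE code, and its parameter values, dataset names and seeds are made up for illustration. It builds such an experiment plan as a Cartesian product:

```scala
// One independent run = one combination of parameter value, dataset and replication seed
final case class Run(parameter: Double, dataset: String, seed: Long)

object ExperimentPlan {
  val parameters: Seq[Double] = Seq(0.1, 0.5, 1.0)
  val datasets: Seq[String]   = Seq("dataset-a.csv", "dataset-b.csv") // hypothetical file names
  val seeds: Seq[Long]        = 1L to 5L                              // replications

  // Cartesian product: no run needs the result of another run
  val runs: Seq[Run] =
    for {
      p <- parameters
      d <- datasets
      s <- seeds
    } yield Run(p, d, s)
}
```

Executed sequentially, the 30 runs take 30 times as long as a single one. Because no run reads another's output, they could all be dispatched at the same time.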

OpenMOLE fills the gap. It provides an easy way to describe and distribute naturally parallel processes.

Zero deployment approach

  • The code is the user's code, not a web service
  • User code is automatically deployed at runtime
  • Shipped to remote environments
  • No prior knowledge of remote environment needed
  • No installation required on any machine

Portable code ... or not

Code running on the JVM (Java, Scala, NetLogo)

And others: C, C++, Python, R, Fortran, Octave, Scilab, Haskell, OCaml, etc.

Port (almost) any program to the grid in 3 simple steps


  • Archive it with CARE* (any program that runs on Linux can be archived)
  • Write your OpenMOLE workflow
  • Click the run button


* http://reproducible.io

Packaging an application with CARE

Applications have dependencies:
  • Shared libraries
  • Packages (Python, R, ...)
  • Low level system calls
  • Environment variables
  • ...

CARE captures these dependencies and ships them along with the application, from one Linux machine to another

Naturally parallel formalism to design experiments: a workflow



Automatic data transfers and replica management

File and folder transfers are handled transparently by OpenMOLE

What OpenMOLE does


Data parallelism

What OpenMOLE does not do



Parallelisation by message passing / task parallelism
Example: Spark, MPI, ...

Powered by EGI

(and others)

A workflow


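// Variables flowing through the workflow
val i   = Val[Double]
val res = Val[Double]

// Explore i from 0.0 to 10.0 with a step of 1.0 (11 values)
val exploration = ExplorationTask ( i in (0.0 to 10.0 by 1.0) )

// The model: a Scala snippet reading i and producing res
val model =
  ScalaTask ("val res = i * 2") set (
    inputs  += i,
    outputs += (i, res)
  )

// Run up to 5 model executions in parallel on the local machine
val env = LocalEnvironment(5)

// -< fans out: one model execution per explored value, each on env
val ex = exploration -< (model on env) start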
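Without the DSL, the dataflow of this workflow can be imitated in plain Scala. This sketch only illustrates the idea and is not how OpenMOLE is implemented. A context maps variable names to values. The exploration yields one context per value of `i`, and the model turns each context into one holding `i` and `res`:

```scala
// Hypothetical model of the dataflow: a context maps variable names to values
object DataflowSketch {
  type Context = Map[String, Double]

  // Exploration: one context per value of i (0.0, 1.0, ..., 10.0)
  val exploration: Seq[Context] = (0 to 10).map(k => Map("i" -> k.toDouble))

  // Model: reads input "i", outputs "i" and "res" (mirrors inputs += i, outputs += (i, res))
  def model(in: Context): Context = {
    val i   = in("i")
    val res = i * 2
    Map("i" -> i, "res" -> res)
  }

  // Each context is processed independently of the others
  def run(): Seq[Context] = exploration.map(model)
}
```

Each context is processed independently, so an environment such as `LocalEnvironment(5)` can run up to five of them at the same time.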
        

The same workflow on the Grid!


val i   = Val[Double]
val res = Val[Double]

val exploration = ExplorationTask ( i in (0.0 to 10.0 by 1.0) )

val model =
  ScalaTask ("val res = i * 2") set (
    inputs  += i,
    outputs += (i, res)
  )

// Only change: jobs now go to the EGI grid through the biomed virtual organisation
val env = EGIEnvironment("biomed")

val ex = exploration -< (model on env) start
        

Switching to the Grid environment only changes one line!
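The switch takes one line because the workflow never refers to how its jobs are executed. The plain-Scala sketch below shows that separation. The `Environment` trait and both implementations are hypothetical illustrations, not OpenMOLE's API. The same workflow runs unchanged on a sequential environment and on a pool of local threads, and a grid environment would simply be one more implementation of the trait.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

// Hypothetical interface: an environment knows how to execute independent jobs
trait Environment {
  def execute[A](jobs: Seq[() => A]): Seq[A]
}

// Runs the jobs one after the other in the calling thread
object SequentialEnvironment extends Environment {
  def execute[A](jobs: Seq[() => A]): Seq[A] = jobs.map(job => job())
}

// Runs the jobs on a fixed pool of local threads (results keep the job order)
final class ThreadPoolEnvironment(threads: Int) extends Environment {
  def execute[A](jobs: Seq[() => A]): Seq[A] = {
    val pool = Executors.newFixedThreadPool(threads)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try Await.result(Future.sequence(jobs.map(job => Future(job()))), Duration.Inf)
    finally pool.shutdown()
  }
}

object Workflow {
  // The workflow is written once; only the environment argument changes
  def run(env: Environment): Seq[(Double, Double)] = {
    val jobs = (0 to 10).map(_.toDouble).map(i => () => (i, i * 2))
    env.execute(jobs)
  }
}
```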

Web Application

Console interface


Towards a P2P web platform


Towards graphical workflows


Market place

OpenMOLE through 4 examples

OpenMOLE is dedicated neither to a particular scientific field nor to a particular language

  • Chromosome structuring: Neuroscience, C++
  • The SimTRAP project: Social Sciences, NetLogo
  • The SimPOP project: Geography, Scala
  • The BioEmergences project: Biology, C

Chromosome structuring


C++ model
2 days per simulation for 1,600 simulations
= 3,200 days ≈ 8.8 years of CPU time

The SimTRAP project


NetLogo model
5 min per simulation for 100,000 simulations
≈ 1 year of CPU time

The SimPOP project


Scala model
2 s per simulation for 500,000,000 simulations
≈ 30 years of CPU time

The BioEmergences project


C model, accessed through a web portal
In daily production: 10,000 CPU hours per day
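Each CPU-time total in these examples is a plain multiplication, and the slides round the results. The small Scala sketch below reproduces them, assuming 365-day years. It also converts the BioEmergences rate into the number of cores that would have to stay busy around the clock.

```scala
object CpuTime {
  val secondsPerDay: Double  = 24.0 * 3600
  val secondsPerYear: Double = 365.0 * secondsPerDay // assumption: 365-day years

  // Total CPU time in years = time per simulation × number of simulations
  def cpuYears(secondsPerRun: Double, runs: Double): Double =
    secondsPerRun * runs / secondsPerYear

  val chromosome: Double = cpuYears(2 * secondsPerDay, 1600) // ≈ 8.77
  val simTrap: Double    = cpuYears(5 * 60, 100000)          // ≈ 0.95
  val simPop: Double     = cpuYears(2, 500000000)            // ≈ 31.7

  // 10,000 CPU hours per day ≈ this many cores busy 24 hours a day
  val bioEmergencesCores: Double = 10000.0 / 24              // ≈ 417
}
```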

Useful Links

Documentation www.openmole.org
Mailing list list.openmole.org
Development version next.openmole.org
Source code github.com/openmole
Market place github.com/openmole-market

Extra notions

Hooks

Tasks are mute: they do not display or save anything themselves
Hooks extract content from the dataflow (e.g. to display it or write it to a file)

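// >- aggregates all model results into `average`, a task not defined on this slide
// ToStringHook() displays the variables produced by `average`
exploration -< (model on env) >- (average hook ToStringHook())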
        
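The plain-Scala sketch below shows the idea behind hooks. It is not OpenMOLE's implementation, and all names in it are hypothetical. The task is a pure function with no side effects, an aggregation step stands in for `average`, and the hook is a callback that only observes the context produced by that step.

```scala
// Hypothetical sketch of the hook idea, not OpenMOLE's implementation
object HookSketch {
  type Context = Map[String, Double]
  type Hook    = Context => Unit

  // A "mute" task: maps a context to a context, never prints or writes files
  def model(in: Context): Context = Map("i" -> in("i"), "res" -> in("i") * 2)

  // Aggregation step standing in for `average`: mean of "res" over all results
  def average(results: Seq[Context]): Context =
    Map("average" -> results.map(_("res")).sum / results.size)

  // Run the dataflow; the hook observes the aggregated context but cannot change it
  def run(inputs: Seq[Context], hook: Hook): Context = {
    val out = average(inputs.map(model))
    hook(out)
    out
  }
}
```

Passing `ctx => println(ctx)` as the hook plays roughly the role of `ToStringHook()`, while the task itself still prints nothing.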

Grouping

Short jobs are inefficient on distributed infrastructures: submission and queuing overhead dominates their run time
Grouping packs several task executions into the same batch job

// by 10: each batch job submitted to env runs 10 model executions
exploration -< (model on env by 10) >- (average hook ToStringHook())
        
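The hypothetical sketch below shows why grouping pays off. `grouped` packs executions into jobs. A deliberately crude cost model then computes the share of time lost to overhead; it assumes a fixed overhead per submitted job and ignores parallelism. Take 1,000 runs of 2 s each with an assumed 60 s overhead per job. That share is about 97% with one run per job, 75% with 10, and about 23% with 100.

```scala
// Hypothetical sketch of grouping, not OpenMOLE's implementation
object GroupingSketch {
  // Split the executions into jobs of at most `size` executions each
  def group[A](executions: Seq[A], size: Int): Seq[Seq[A]] =
    executions.grouped(size).toSeq

  // Crude cost model (assumption): every job pays the same fixed overhead
  def overheadFraction(executions: Int, runSeconds: Double,
                       overheadSeconds: Double, perJob: Int): Double = {
    val jobs     = (executions + perJob - 1) / perJob // ceiling division
    val overhead = jobs * overheadSeconds
    overhead / (overhead + executions * runSeconds)
  }
}
```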

Thanks!

romain.reuillon@iscpif.fr
mathieu.leclaire@iscpif.fr
j.passerat-palmbach@imperial.ac.uk

OpenMOLE advanced

January 14th!





Sensitivity analysis: Profiles



Inverse problems: genetic algorithms



Pattern Space Exploration