Docker without Docker: How to run containers on HPC environments without any prerequisites!

HPC Summer Course - 20 Sep 2017

Who am I?

Jonathan Passerat-Palmbach <j.passerat-palmbach@imperial.ac.uk>

Computational research in 3 steps

Idea

Experiment

How do I run it on HPC?

Ask the HPC team!

Here comes cluster computing

(from HPC team slides)

Python code

import socket
print("Hello, from {}".format(socket.gethostname()))

Submission script

#!/bin/sh
#PBS -l walltime=0:10:00
#PBS -l select=1:ncpus=1:mem=600mb
#PBS -N class

python /work/jpassera/hellocx1.py
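
The submission script above is plain text, so it can also be generated from a few parameters. The following is a minimal sketch, not part of the HPC team's material: the helper name `render_pbs_script` is hypothetical, and the defaults simply mirror the directives shown on the slide.

```python
def render_pbs_script(command, job_name="class", walltime="0:10:00",
                      ncpus=1, mem="600mb"):
    """Return the text of a PBS submission script for a single-node job."""
    lines = [
        "#!/bin/sh",
        "#PBS -l walltime={}".format(walltime),
        "#PBS -l select=1:ncpus={}:mem={}".format(ncpus, mem),
        "#PBS -N {}".format(job_name),
        "",
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Reproduces the script from the slide; redirect to hellocx1.pbs to use it.
    print(render_pbs_script("python /work/jpassera/hellocx1.py"), end="")
```

Writing the returned text to `hellocx1.pbs` gives a file that can be handed to `qsub` as on the next slide.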

Easy!!

My laptop

python hellocx1.py
>>> 5 / 2 == 2.5
True

CX1

qsub hellocx1.pbs
>>> 5 / 2 == 2.5
False
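
One plausible cause of the mismatch above, stated here as an assumption since the slide does not name it, is that `python` resolves to Python 3 on the laptop and to Python 2 on the cluster, where `/` between two integers is floor division. The sketch below reports the interpreter in use and shows a division that behaves the same on both; `division_report` is a hypothetical helper name.

```python
import operator
import sys


def division_report():
    """Describe the running interpreter and how it evaluates 5 / 2."""
    version = "{}.{}".format(sys.version_info[0], sys.version_info[1])
    plain = 5 / 2                      # floor division on Python 2, true division on Python 3
    portable = operator.truediv(5, 2)  # true division on both major versions
    return "Python {}: 5 / 2 = {}, truediv(5, 2) = {}".format(version, plain, portable)


if __name__ == "__main__":
    print(division_report())
```

Printing this line at the start of a job makes a silent interpreter change visible in the job's output file.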

HPC user experience

Classic problems when porting applications to HPC

  • Software tool missing
  • Dependency missing (high-level package)
  • Library missing (low-level shared library)
  • Version incompatibility
  • Silent errors (numerically diverging results)
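
Several of the problems listed above can be detected before the real computation starts. This is a minimal sketch that assumes Python 3 is available on the target machine; `check_environment` and the tool and module names in the usage lines are hypothetical examples. It only covers missing tools and missing top-level Python packages, not shared libraries or numerical divergence.

```python
import importlib.util
import shutil
import sys


def check_environment(tools=(), modules=()):
    """Report the interpreter version plus any missing tools or modules.

    `tools` are executable names looked up on PATH; `modules` are
    top-level Python package names.
    """
    return {
        "python": "{}.{}.{}".format(*sys.version_info[:3]),
        "missing_tools": [t for t in tools if shutil.which(t) is None],
        "missing_modules": [m for m in modules
                            if importlib.util.find_spec(m) is None],
    }


if __name__ == "__main__":
    # Hypothetical requirements of an application.
    print(check_environment(tools=["docker", "qsub"], modules=["numpy"]))
```

Running the same check on the laptop and inside a test job gives two reports that can be compared side by side.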

How do I make sure my app will run on HPC?

Bring all my dependencies along with my app!

Packaging applications with Docker

What is Docker again?

Demo Time

Execution in Docker

Docker on HPC

View from Imperial’s CX1:

-bash-4.2$ docker ps
-bash: docker: command not found

Let’s install docker then?

-bash-4.2$ sudo apt install docker

We trust you have received the usual lecture from the local System
Administrator. It usually boils down to these three things:

    #1) Respect the privacy of others.
    #2) Think before you type.
    #3) With great power comes great responsibility.

[sudo] password for jpassera: 

Spoiler alert: not happening

Do we really need Docker?

This is a Docker image

How does it work?

Merge layers

Build launcher (see more at proot-me)
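
The two steps above can be sketched in Python. This is an illustration under stated assumptions, not the tool shown in the demo: it assumes the image layers are already available as tar archives ordered oldest first (for example unpacked from the output of `docker save`), it applies the whiteout convention of Docker/OCI layers (entries prefixed with `.wh.`), and it only builds a PRoot command line using PRoot's `-r` (root filesystem) option rather than running it. The names `merge_layers` and `proot_command` are hypothetical.

```python
import os
import shutil
import tarfile

WHITEOUT_PREFIX = ".wh."
OPAQUE_MARKER = ".wh..wh..opq"


def _remove(path):
    """Delete a file, symlink or directory tree if it exists."""
    if os.path.islink(path) or os.path.isfile(path):
        os.remove(path)
    elif os.path.isdir(path):
        shutil.rmtree(path)


def merge_layers(layer_tars, rootfs):
    """Flatten image layers (oldest first) into a single directory tree.

    Sketch only: archives are trusted, so no path-traversal checks are made.
    """
    os.makedirs(rootfs, exist_ok=True)
    for layer in layer_tars:
        with tarfile.open(layer) as tar:
            regular = []
            # First pass: apply deletions recorded by this layer.
            for member in tar.getmembers():
                parent, name = os.path.split(member.name)
                target_dir = os.path.join(rootfs, parent)
                if name == OPAQUE_MARKER:
                    # Opaque directory: hide everything from lower layers.
                    if os.path.isdir(target_dir):
                        for entry in os.listdir(target_dir):
                            _remove(os.path.join(target_dir, entry))
                elif name.startswith(WHITEOUT_PREFIX):
                    hidden = name[len(WHITEOUT_PREFIX):]
                    _remove(os.path.join(target_dir, hidden))
                else:
                    regular.append(member)
            # Second pass: add or overwrite this layer's own entries.
            for member in regular:
                tar.extract(member, rootfs)
    return rootfs


def proot_command(rootfs, argv):
    """Build (but do not run) a PRoot invocation using rootfs as '/'."""
    return ["proot", "-r", rootfs] + list(argv)
```

The list returned by `proot_command(rootfs, ["python", "hello.py"])` can be passed to `subprocess.run` on a machine where a `proot` binary is available; no root privileges or Docker daemon are involved at any point.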

Demo Time

Docker on HPC (cx1)

Welcome to OpenMOLE

Naturally parallel methods

  • Data reconstruction
  • Parameter estimation
  • Sensitivity analysis
  • Optimisation
  • Replication

Execution of the same program with different parameters and/or datasets.
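
On a single machine, this pattern amounts to mapping one function over a list of parameters. The sketch below uses Python's standard `concurrent.futures`; `model` is a hypothetical stand-in for a real program, and a thread pool is used only to keep the example short (CPU-bound Python code would normally use processes or separate jobs instead).

```python
from concurrent.futures import ThreadPoolExecutor


def model(x):
    """Hypothetical stand-in for the program under study."""
    return x * x


def sweep(fn, parameters, max_workers=4):
    """Run fn once per parameter; results keep the order of the inputs."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fn, parameters))


if __name__ == "__main__":
    print(sweep(model, range(5)))
```

Each call is independent of the others, which is what makes the method "naturally parallel": the runs can be spread over cores, nodes or sites without changing the model.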

But it’s slow…

Enter distributed computing / HPC!

Prototype

Scale-up for Free

OpenMOLE is built around 3 orthogonal concepts

… and an expressive workflow formalism for distributed computing.

Model?

Stuff you can run, taking inputs and producing outputs

Zero deployment approach

  • User code is automatically deployed at runtime
  • No prior knowledge of remote environment needed
  • No installation required on any machine

Transparent access

  • No preliminary step
  • Access as the user would do it
  • With the user's credentials

Any application

Packaging (non-JVM) applications with CARE

https://github.com/proot-me/PRoot

Packaging (non-JVM) applications with CARE

Applications have dependencies:

  • Shared libraries
  • Packages (Python, R, …)
  • Low-level system calls
  • Environment variables

Capture these dependencies and transfer them along with the application, from one Linux machine to another

Packaging (non-JVM) applications with CARE

Distributed execution of (almost) any program on (pretty much) any computing environment

  1. Package it with CARE
  2. Write your OpenMOLE workflow
  3. Click the run button
  4. Write your paper

Native code

care -o hello.tgz.bin python hello.py 42 test.txt

val arg = Val[Int]
val output = Val[File]

val pythonTask =
  CARETask(
      workDirectory / "hello.tgz.bin",
      "python hello.py ${arg} output.txt") set (
    inputs += arg,
    outputFiles += ("output.txt", output),
    outputs += arg
  )

val exploration = ExplorationTask(arg in (0 to 10))

val copy = CopyFileHook(output, workDirectory / "hello${arg}.txt")

exploration -< (pythonTask hook copy)
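
For readers unfamiliar with the workflow syntax, here is a plain-Python analogue of what the script above expresses: run `hello.py` once per value of `arg`, then copy each `output.txt` to `hello<arg>.txt`. It is a sequential, local sketch and not OpenMOLE's API; `explore` is a hypothetical name, and it assumes `hello.py` takes a number and an output file name as its two arguments, as in the `care` command above.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def explore(script, args, dest_dir):
    """Run `script <arg> output.txt` for each arg and collect the outputs."""
    script = Path(script).resolve()
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    collected = []
    for arg in args:
        # Each run gets its own scratch directory, like an isolated task.
        with tempfile.TemporaryDirectory() as scratch:
            subprocess.run(
                [sys.executable, str(script), str(arg), "output.txt"],
                cwd=scratch,
                check=True,
            )
            target = dest_dir / "hello{}.txt".format(arg)
            shutil.copy(str(Path(scratch) / "output.txt"), str(target))
            collected.append(target)
    return collected


if __name__ == "__main__":
    # Mirrors `arg in (0 to 10)`, which includes both bounds.
    explore("hello.py", range(0, 11), "results")
```

The workflow version differs in the part that matters for this talk: each run can be delegated to a remote execution environment, with the CARE archive carrying the application's dependencies.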
      

Execution environments?

Supported Today

Summary - OpenMOLE

  • Do you really need HPC?
  • Or rather HTC?

  • Scientific platform to explore models
  • (Hyper)Parameter tuning
  • Transparent use of HTC / distributed computing (GridScale)
  • Genetic-Algorithm based optimisation methods (MGO)

Thanks!

romain.reuillon@iscpif.fr paul.chapron@iscpif.fr
mathieu.leclaire@iscpif.fr guillaume.cherel@iscpif.fr
j.passerat-palmbach@imperial.ac.uk you?