Sandboxing data crunches, Chapter 2: clone processes

Between 1 and 2 seconds pass from when Renderer calls subprocess.Popen() to when Step sandboxes itself. Importing Pandas, Numpy and Pyarrow takes too long.
clone() duplicates a process; execve() wipes its memory and makes it run something else
  1. clone(): duplicate all the RAM of the current process. Now there are two processes: the Renderer process and its near-exact clone, the Step process. The Renderer process continues; in it, clone() returns the Step process ID (“pid”). At the same time, the Step process continues at the exact same place; but there, clone() returns 0. Aside from that single integer difference, the two processes are identical.
  2. The process that judges itself to be a child (because clone() returned 0) calls execve("/path/to/program", …). This erases all the unwanted — and sensitive! — Renderer data from memory and starts the Step program.
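The two steps above can be sketched in Python. This is a minimal illustration, not the project's code: it uses os.fork() as a stand-in for clone() (both return 0 in the child and the child's pid in the parent), and the names spawn() and wait_exit_code() are hypothetical.

```python
import os
import sys


def spawn(argv):
    """Duplicate this process, then replace the duplicate with another program."""
    pid = os.fork()  # stand-in for clone(): returns 0 in the child
    if pid == 0:
        # Child: execv() discards the memory inherited from the parent
        # and starts the program at argv[0].
        try:
            os.execv(argv[0], argv)
        finally:
            os._exit(127)  # only reached if execv() failed
    return pid  # parent: the child's process ID


def wait_exit_code(pid):
    """Block until the child exits, then return its exit code."""
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    child = spawn([sys.executable, "-c", "import sys; sys.exit(7)"])
    print(wait_exit_code(child))
```

The child never returns from spawn(): either execv() succeeds and the Python interpreter is replaced, or os._exit() ends the process immediately.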
Fast but unacceptable
This is essentially Python’s “multiprocessing.forkserver” design
  1. Renderer signals Spawner to invoke clone().
  2. Spawner clones itself to create a Step process, costing less than 1 millisecond. (Since Step is a clone of Spawner, its memory doesn’t contain secrets or user data.)
  3. The Step process stops behaving like the Spawner it cloned; it begins sandboxing instead. See — this Step process started instantly!
  4. Meanwhile, Spawner returns a subprocess handle to Renderer and then awaits Renderer’s next request. Spawner can do this all day….
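The request loop in the steps above might look roughly like this. It is a sketch under several assumptions: os.fork() stands in for clone(), a socketpair carries the requests, and the names spawner_loop(), start_spawner() and spawn_step() are hypothetical. As a simplification, this Spawner waits for each Step to exit and reports its exit code, rather than handing back a subprocess handle right away.

```python
import os
import socket
import struct

REPLY = struct.Struct("!Ii")  # (Step pid, Step exit code)


def spawner_loop(sock, step_main):
    # The slow imports would run here, once, before any request arrives.
    while sock.recv(1):  # one byte = one spawn request; b"" = Renderer hung up
        pid = os.fork()  # stand-in for clone()
        if pid == 0:
            # Step process: stop behaving like Spawner and do the work.
            sock.close()
            code = 1
            try:
                code = step_main()
            finally:
                os._exit(code)
        _, status = os.waitpid(pid, 0)  # simplification: reap synchronously
        sock.sendall(REPLY.pack(pid, os.WEXITSTATUS(status)))


def start_spawner(step_main):
    """Fork a Spawner process; return its pid and the Renderer's socket."""
    renderer_sock, spawner_sock = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        renderer_sock.close()
        try:
            spawner_loop(spawner_sock, step_main)
        finally:
            os._exit(0)
    spawner_sock.close()
    return pid, renderer_sock


def spawn_step(renderer_sock):
    """Ask Spawner for one Step; return (pid, exit code)."""
    renderer_sock.sendall(b"s")
    return REPLY.unpack(renderer_sock.recv(REPLY.size, socket.MSG_WAITALL))
```

Because Spawner is forked before Renderer loads anything sensitive, each Step it forks inherits only Spawner's memory, which is the property the list above relies on.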
clone(CLONE_PARENT) makes Step a child of Renderer
Calling clone() from Python

Adam Hooper


Journalist, ex software engineer