Too big to fail

01 Apr 2013

Image from Learn You Some Erlang for Great Good!

I’m proud to announce a new extension to Erlang. After twelve years of discussion, we’ve added the too_big_to_fail flag to processes.

Here is a usage example:

test() ->
    spawn(fun() ->
              process_flag(too_big_to_fail, true)
          end).

Semantics

Too big to fail processes behave like regular processes until they get too big and memory congestion occurs. If a memory allocation error is triggered when a too_big_to_fail process needs more memory, then a random smaller process is killed, and the system reattempts memory allocation for the too_big_to_fail process.

An interesting situation can occur if the too big to fail process has killed all other processes and still cannot get enough memory.

In this case the node running the process tries to steal memory from other nodes.
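
As a rough sketch of these semantics, here is a toy model in plain Erlang. It does not use the too_big_to_fail flag or touch the real allocator. The module name tbtf_sim, the function allocate/3, and the {Name, Size} representation are all assumptions made for illustration, and every ordinary process in the list is assumed to be smaller than the too_big_to_fail process.

```erlang
-module(tbtf_sim).
-export([allocate/3]).

%% Toy model, not the real runtime (all names here are hypothetical).
%% Need  - memory the too_big_to_fail process is asking for
%% Free  - memory currently available on the node
%% Procs - ordinary (smaller) processes as a list of {Name, Size}
%% Returns {ok, Killed, Survivors} or {error, out_of_memory, Killed}.
allocate(Need, Free, Procs) ->
    allocate(Need, Free, Procs, []).

allocate(Need, Free, Procs, Killed) when Need =< Free ->
    %% The allocation fits, so nobody else has to die.
    {ok, lists:reverse(Killed), Procs};
allocate(_Need, _Free, [], Killed) ->
    %% Nobody left to kill: this is the point where memory stealing starts.
    {error, out_of_memory, lists:reverse(Killed)};
allocate(Need, Free, Procs, Killed) ->
    %% Pick a random victim, reclaim its memory, and retry the allocation.
    {Name, Size} = Victim = lists:nth(rand:uniform(length(Procs)), Procs),
    allocate(Need, Free + Size, lists:delete(Victim, Procs), [Name | Killed]).
```

With this model, tbtf_sim:allocate(15, 8, [{a, 10}]) has to kill a before the request fits, while a request for 100 units kills a and still fails.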

Memory Stealing

If a too big to fail process runs out of memory after killing all the smaller processes on the current node, it can spawn clones of itself on other nodes, and these can allocate additional memory.

This is what happens:

  • A too_big_to_fail process runs out of memory on node one
  • A “helper” process is allocated on node two
  • The helper process allocates as much memory as it can
  • Node one reduces its memory footprint, using the additional memory made available by stealing from node two
  • This is repeated recursively: if node two runs out of memory, it tries to steal memory from node three, and so on

If all too_big_to_fail nodes run out of memory, then they request additional memory from the memory allocator of last resort.
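
The stealing chain can be sketched as another toy model. Nothing here talks to real nodes: the module name tbtf_steal, the function steal/2, and the {Node, Free} list are assumptions for illustration, and each helper is assumed to take only what is still missing rather than everything it can grab.

```erlang
-module(tbtf_steal).
-export([steal/2]).

%% Toy model of memory stealing (all names here are hypothetical).
%% Need  - memory still missing on node one
%% Nodes - the other nodes, in stealing order, as a list of {Node, Free}
%% Returns {ok, [{Node, Amount}]} or {last_resort, Shortfall}.
steal(Need, Nodes) ->
    steal(Need, Nodes, []).

steal(Need, _Nodes, Taken) when Need =< 0 ->
    %% Enough memory has been stolen.
    {ok, lists:reverse(Taken)};
steal(Need, [], _Taken) ->
    %% Every node is exhausted: ask the allocator of last resort.
    {last_resort, Need};
steal(Need, [{Node, Free} | Rest], Taken) ->
    %% The helper on this node takes what it can, up to the shortfall,
    %% and passes whatever is still missing on to the next node.
    Amount = min(Need, Free),
    steal(Need - Amount, Rest, [{Node, Amount} | Taken]).
```

For example, tbtf_steal:steal(6, [{two, 4}, {three, 8}]) takes 4 units from node two and the remaining 2 from node three.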

The Memory Allocator of Last Resort

If all the too_big_to_fail processes have failed and run out of memory, then they can make a request to the “Memory Allocator of Last Resort”. The memory allocator of last resort has the ability to print new memory. Exactly how it does this, and what the consequences are, is little understood. Fortunately, if new memory is consumed more slowly than it is printed, everything returns to normal. But if new memory is consumed faster than it can be printed, then the system will crash and a reboot is required.

Release Schedule

too_big_to_fail processes will be released in OTP-R18B. The semantics of the memory allocator of last resort are still being studied and will be the subject of a forthcoming EEP.

Tagged: banks   failure  

