Table of Contents

  1. Cloud Workflows
  2. The {m}brace Runtime
  3. The {m}brace Shell

Cloud Workflows

To better understand cloud programming with {m}brace, one first needs to think about asynchronous programming. Asynchrony in .NET and other frameworks has traditionally been associated with callback programming, a pattern in which an asynchronous operation is started together with a callback delegate to be invoked upon its completion. This pattern, however, is known to often result in complex and hard-to-read code.

A Prelude: F# Asynchronous Workflows

F# asynchronous workflows enable the definition of asynchronous operations without the need to use callbacks explicitly. With the help of F# computation expressions, an async workflow is declared as if it were sequential code, while in actuality it runs asynchronously, suspending and resuming computation as required.

open System

let download (url : string) = async {
    // 'use' disposes the client once the workflow completes
    use http = new System.Net.WebClient()
    let! html = http.AsyncDownloadString(Uri url)
    return html.Split('\n')
}

The above workflow asynchronously downloads the content of a web page and, once the download has completed, resumes to split it into lines. Async operations are composed with the special let! keyword, which can be thought of as syntactic sugar for the callback delegate of the right-hand-side operation.
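To make the "syntactic sugar" reading of let! concrete, the sketch below writes the same small workflow twice: once with let!, and once by calling the builder's Bind method directly with the continuation as a function. The names sugared and desugared are illustrative only, and the explicit form is an approximation (the compiler's actual translation also inserts calls such as Delay).

```fsharp
// Sugared form: let! binds the result of an asynchronous operation.
let sugared = async {
    let! x = async { return 20 }
    return x + 1
}

// Roughly equivalent explicit form: the rest of the workflow is passed
// to Bind as a continuation, which plays the role of the callback.
let desugared =
    async.Bind(async { return 20 }, fun x -> async.Return (x + 1))

printfn "%d %d" (Async.RunSynchronously sugared) (Async.RunSynchronously desugared)
```

Both workflows produce the same value when run; the computation expression only hides the continuation-passing.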

Asynchronous workflows can also be utilized in scenarios where parallelism is required. The Async.Parallel combinator combines a given enumeration of workflows into a single asynchronous workflow that executes the inputs in a fork/join pattern.

let workflow = async {
    let! results = 
        Async.Parallel [ download "http://www.m-brace.net" ; download "http://www.nessos.gr" ]

    return Array.concat results |> Array.length
}

This workflow will download the two pages asynchronously and resume computation when both have completed.

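Async workflows have deferred execution semantics: declaring one does not run it. A workflow needs to be started explicitly, for example:

// Async.Start only accepts Async<unit>, so the result is discarded here
Async.Start (Async.Ignore workflow)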
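Deferred execution can be observed with a small offline experiment. The sketch below uses a counter (runs and countRun are hypothetical names, not part of the original text) and Async.RunSynchronously, which blocks until the result is available; it is a minimal illustration rather than a recommended way of running workflows.

```fsharp
// A counter that records how many times the workflow body has executed.
let runs = ref 0

// Declaring the workflow does not execute its body.
let countRun = async {
    runs.Value <- runs.Value + 1
    return runs.Value
}

let before = runs.Value                       // nothing has run yet
let first = Async.RunSynchronously countRun   // executes the body once
let second = Async.RunSynchronously countRun  // executes the body again

printfn "%d %d %d" before first second
```

The same workflow value can be started any number of times, and each start re-executes its body.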

Workflows are managed by a scheduler that transparently allocates pending jobs to the .NET thread pool. A typical executing async expression will make jumps between multiple threads as it progresses.

F# asynchronous workflows have proven successful enough that similar constructs have been adopted by other languages, such as C# 5.0 and Python.

Cloud Workflows

The programming model of {m}brace closely follows the style of F# asynchronous workflows. We introduce cloud workflows (also known as the cloud monad) as a means of declaratively specifying distributed computation. Building on our previous declaration of download, we could define:

cloud {
    let jobs : ICloud<string []> [] = 
        Array.map (download >> Cloud.OfAsync) [| "http://www.m-brace.net" ; "http://www.nessos.gr" |]
    let! results = Cloud.Parallel jobs
    return Array.concat results |> Array.length
}

This is a direct translation of the async snippet presented above. The principal difference between cloud and asynchronous workflows is that jobs are allocated to worker machines in the data center, rather than to threads in the thread pool. Like async, cloud workflows have deferred execution semantics and have to be sent to an {m}brace runtime for evaluation.

Cloud workflows can be used to create user-defined higher-order functions. For example, we can define a distributed variant of the filter combinator:

[<Cloud>]
let filter (f : 'a -> bool) (xs : 'a []) =
    cloud {
        let jobs = Array.map (fun x -> cloud { return if f x then Some x else None }) xs
        let! results = Cloud.Parallel jobs
        return Array.choose id results
    }
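For readers without access to an {m}brace runtime, the shape of this combinator can be tried locally by swapping cloud for async and Cloud.Parallel for Async.Parallel. The sketch below is such a local analogue (filterAsync is a hypothetical name); it runs on the thread pool of a single machine and says nothing about how the runtime distributes jobs.

```fsharp
// Local analogue of the distributed filter: one async job per element,
// executed in a fork/join pattern with Async.Parallel.
let filterAsync (f : 'a -> bool) (xs : 'a []) =
    async {
        let jobs = Array.map (fun x -> async { return if f x then Some x else None }) xs
        let! results = Async.Parallel jobs
        // Async.Parallel preserves input order, so the output order matches xs.
        return Array.choose id results
    }

let evens =
    filterAsync (fun n -> n % 2 = 0) [| 1 .. 10 |]
    |> Async.RunSynchronously

printfn "%A" evens
```

Creating one job per element keeps the example short; for cheap predicates, grouping elements into larger chunks would likely reduce scheduling overhead.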

Defining MapReduce

Cloud workflows make it possible to define various flavors of MapReduce-like workflows at the library level. A simple variant can be declared as follows:

[<Cloud>]
let rec mapReduce (map: 'T -> ICloud<'R>) 
                  (reduce: 'R -> 'R -> ICloud<'R>) 
                  (identity: 'R) 
                  (input: 'T list) =
    cloud {
        match input with
        | [] -> return identity
        | [value] -> return! map value
        | _ ->
            // List.split is not an FSharp.Core function; it is assumed to be a helper that halves the list
            let left, right = List.split input

            let! l, r = 
                (mapReduce map reduce identity left)
                    <||>
                (mapReduce map reduce identity right)

            return! reduce l r
    }

The workflow splits the list into halves, processes each half recursively in parallel using the binary parallel operator <||>, and combines the two partial results with reduce.
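The divide-and-conquer structure can be exercised on a single machine with plain async workflows. The sketch below is a local analogue, not the {m}brace implementation: mapReduceAsync is a hypothetical name, Async.Parallel over two workflows stands in for <||>, and List.splitAt from FSharp.Core stands in for the list-halving helper.

```fsharp
// Local analogue of mapReduce built on F# async workflows.
let rec mapReduceAsync (map : 'T -> Async<'R>)
                       (reduce : 'R -> 'R -> Async<'R>)
                       (identity : 'R)
                       (input : 'T list) : Async<'R> =
    async {
        match input with
        | [] -> return identity
        | [value] -> return! map value
        | _ ->
            // Split the input into two halves.
            let left, right = List.splitAt (List.length input / 2) input
            // Process both halves in parallel, then combine the results.
            let! results =
                Async.Parallel [ mapReduceAsync map reduce identity left
                                 mapReduceAsync map reduce identity right ]
            return! reduce results.[0] results.[1]
    }

// Usage: total number of characters across a list of words.
let totalLength =
    mapReduceAsync
        (fun (s : string) -> async { return s.Length })
        (fun a b -> async { return a + b })
        0
        [ "cloud"; "workflows"; "are"; "composable" ]
    |> Async.RunSynchronously

printfn "%d" totalLength
```

Because the halves are combined in an unspecified grouping, reduce should be associative, and identity should be a neutral element for it.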

Cloud Refs

The cloud workflow API offers access to persistable and distributed data entities known as cloud refs. Cloud refs very much resemble references in the ML family of languages but are “monadic” in nature. In other words, their declaration entails a scheduling decision by the runtime. The following workflow stores the downloaded content of a web page and returns a cloud ref to it:

[<Cloud>]
let getRef () =
    cloud {
        let! lines = Cloud.OfAsync <| download "http://www.m-brace.net"
        let! ref = CloudRef.New lines
        return ref
    }

Cloud refs can subsequently be dereferenced either on the client or within a later cloud computation. They come in immutable (managed) and mutable (unmanaged) flavors.

Cloud refs make it possible to define distributed data structures in the cloud. For example,

type CloudTree<'T> = 
    | Leaf 
    | Node of 'T * ICloudRef<CloudTree<'T>> * ICloudRef<CloudTree<'T>>

defines a distributable binary tree. Additionally, {m}brace offers an assortment of cloud data primitives, each optimized for specific usage patterns. Cloud sequences offer on-demand fetching of stored collections of values, while Cloud Files enable direct access to binary blobs.

The {m}brace Runtime

The {m}brace runtime is the core of the framework. Among other things, the runtime:

  • Manages and monitors machines that participate in the {m}brace cluster.
  • Executes and schedules cloud workflows, allocating worker nodes as appropriate.
  • Allows for the concurrent execution of multiple cloud processes, which can be managed much as in a distributed OS.
  • Enables elastic deployments, adding or removing nodes from an active runtime on the fly.
  • Enforces fault tolerance: cloud computations continue regardless of machine failure.
  • Offers pluggable integration with a range of distributed storage providers.

Architectural Overview

{m}brace employs a purpose-built distributed actor framework as the foundation for the runtime. Machines participating in the runtime’s cluster are called nodes. Every instance of a runtime has a unique master node, which monitors the health of adjacent nodes and manages the cluster. The master node is assisted by alternative master nodes that replicate the master’s state and take over in case of failure. All other nodes are identified as slave nodes.

As mentioned above, each runtime is capable of executing multiple concurrent cloud processes. The runtime enforces a policy of isolation, in which work items related to different cloud computations are run in separate CLI instances in every node of the cluster. This enables a more robust runtime and more efficient memory management. The unit of cloud computation isolation is called a Process Domain.

Every running cloud computation utilizes a scheduler and a set of workers that operate within the context of a process domain. The scheduler unfolds a cloud workflow and allocates pending jobs to available worker nodes as required. The state of the scheduler is replicated, so that a computation can recover at any point in case of machine failure.

Storage Providers

The {m}brace runtime offers pluggable support for a range of distributed storage providers. Storage providers are essential for a runtime to function, since they are used for internal caching and make the definition of cloud refs possible. {m}brace comes with support for FileSystem, SQL and Azure storage providers, and user-defined custom implementations can also be supplied.

The {m}brace Shell

{m}brace comes with a comprehensive set of client tooling that includes a rich library of combinators for building cloud expressions, as well as an API for remote control and management of {m}brace runtimes. Using a remote runtime is as simple as doing the following:

let runtime = MBrace.Connect "mbrace://grothendieck:2675"

runtime.Run <@ cloud_computation @>

The MBrace.Connect method returns a handle to the runtime which can be used for various functions, while the Run method tells the runtime to execute a given cloud workflow. The API can be used to perform many other operations:

// print cluster information
do runtime.ShowInfo()

// print cloud process info
do runtime.ShowProcessInfo()

// reset the runtime
do runtime.Reboot()

{m}brace comes with the {m}brace Shell, a modified version of the open source F# interactive environment that is bundled with the F# compiler. The {m}brace Shell is capable of sending REPL-defined types and cloud computations directly to the runtime. This considerably shortens the edit-compile-debug cycle, making cloud programming an entirely interactive experience. It also enables static analysis specific to cloud workflows. Just like F# interactive, the {m}brace Shell integrates with Microsoft’s Visual Studio, so building libraries for your distributed algorithms, managing the runtime and submitting cloud computations can all be done from within the same environment.