DFM Class

class DFM(C, E)[source]

Distributed Free Monoid = A list of something.

All method calls are parallel, and must be called by every rank.

C

Reference to the Context object

E

List of local elements.

collect(root=0)[source]

Collect all the elements to the root rank.

This is equivalent to reduce(extend, [], distribute=False), but uses MPI_Gather.

Note

Even though non-root ranks receive None, they still have call this function. Check the dataset size before calling this, since this may cause a memory error.

Parameters:

root – The rank receiving the collected results. if root is None, then all ranks receive the result.

Returns:

List of elems if rank == root, None otherwise.

filter(f)[source]

Filter, removing some elements.

Parameters:

f – function of type = elem -> bool

Returns:

new DFM

flatMap(f)[source]

Map over elements and concatenate all results.

Parameters:

f – function of type = elem -> [new elem]

Returns:

new DFM

group(f, concat, N)[source]

Group elements into N partitions.

Each element, e, is assumed to represent a collection of things. The function, f, returns a dictionary mapping the new element number to an intermediate collection, e’.

For example, if e is a list of names, the function def f(e, out):

for name in e:

i = ord(name[0].upper())-ord(‘A’) if i not in out:

out[i] = []

out[i].append(name)

would break it up into groups by first-letter.

Group will then communicate these new values as needed so that each output element has a list of its component blocks.

The function concat will do the final join of all blocks composing each output element.

Parameters:
  • f – function of type = e, *{i: [e’]} -> () modifying the dictionary to append all elements belonging to each key Note: this requires 0 <= i < N

  • concat – function of type = [e’] -> new elem building the final output elements

  • N – the number of output elements

Returns:

DFM of new elems

head(n=10)[source]

Distribute the first n elements to all ranks (useful for interactive debugging)

Returns:

first n values, [elem]

len()[source]

Number of elements in DFM (returned to all ranks)

Returns:

total size

Return type:

int

map(f)[source]

Map over elements.

Parameters:

f – function of type = elem -> new elem

Returns:

new DFM

nodeMap(f)[source]

map over the MPI ranks.

The input function, f, is called once on every rank with arguments: * rank : int * E : list containing all local elements

Note

f must return a list

Parameters:

f – function of type = int, [elem] -> [new elems]

Returns:

DFM

reduce(f, x0, distribute=True)[source]

Reduce the dataset to a value.

Apply an associative, pairwise reduction to the dataset. The result is sent to all ranks if distribute=True otherwise, it is present only on the root rank (rank 0).

Yhe function f may operate in-place, storing its result on the left, for example:

def f(a,b):
    a[0] = b[0]
    return a

This is indicated in the function’s type with *elem.

Each rank calls x0 = f(x0, e) on all its elements, then does a fan-in reduction on x0. You can technically implement a different function for each phase by detecting whether e has the type of x0 or the type of an element. In fact, this is the only way to get around the restriction that type(x0) == type(e).

Note

x0 may be updated in-place. Even if you do this, x0 will contain an undefined value on return.

x0 must be initialized to a value representing the starting value for a single MPI rank.

Parameters:
  • f – a function of type = *elem, elem -> *elem It is permissable to modify the left argument in-place and return it.

  • x0 – the “zero” value of the first argument

  • distribute – Distribute the answer from rank 0 to all ranks?

Returns:

elem

repartition(llen, split, concat, N)[source]

Repartition into N “equally distributed” items.

Each element, e, is assumed to represent a collection of llen(e) items. The function split should split e up into blocks spanning the specified index ranges.

Repartition will then communicate these blocks as needed so that each output partition has a list of its component blocks.

The function concat will do the final join of all blocks composing each output element.

Parameters:
  • llen – function of type = e -> int returning the internal length of each element

  • split – function of type = elem, [(i0,i1)] -> [e’] creating intermediate blocks to communicate

  • concat – function of type = [e’] -> new elem building the final output elements

Returns:

DFM of new elems

scan(f)[source]

Perform a parallel prefix-scan on the dataset.

Parameters:

f – an associative, pairwise function of type = elem, elem -> elem

Returns:

DFM containing [e0, f(e0,e1), f(e0,f(e1,e2)), …]