DFM Class¶
-
class
DFM(C, E)[source]¶ Distributed Free Monoid = A list of something.
All method calls are parallel, and must be called by every rank.
-
C¶ Reference to the Context object
-
E¶ List of local elements.
-
collect(root=0)[source]¶ Collect all the elements to the root rank.
This is equivalent to reduce(extend, [], distribute=False), but uses MPI_Gather.
Note
Even though non-root ranks receive None, they still have call this function. Check the dataset size before calling this, since this may cause a memory error.
- Parameters
root – The rank receiving the collected results. if root is None, then all ranks receive the result.
- Returns
List of elems if rank == root, None otherwise.
-
filter(f)[source]¶ Filter, removing some elements.
- Parameters
f – function of type = elem -> bool
- Returns
new DFM
-
flatMap(f)[source]¶ Map over elements and concatenate all results.
- Parameters
f – function of type = elem -> [new elem]
- Returns
new DFM
-
group(f, concat, N)[source]¶ Group elements into N partitions.
Each element, e, is assumed to represent a collection of things. The function, f, returns a dictionary mapping the new element number to an intermediate collection, e’.
For example, if e is a list of names, the function def f(e, out):
- for name in e:
i = ord(name[0].upper())-ord(‘A’) if i not in out:
out[i] = []
out[i].append(name)
would break it up into groups by first-letter.
Group will then communicate these new values as needed so that each output element has a list of its component blocks.
The function
concatwill do the final join of all blocks composing each output element.- Parameters
f – function of type = e, *{i: [e’]} -> () modifying the dictionary to append all elements belonging to each key Note: this requires 0 <= i < N
concat – function of type = [e’] -> new elem building the final output elements
N – the number of output elements
- Returns
DFM of new elems
-
head(n=10)[source]¶ Distribute the first n elements to all ranks (useful for interactive debugging)
- Returns
first n values, [elem]
-
map(f)[source]¶ Map over elements.
- Parameters
f – function of type = elem -> new elem
- Returns
new DFM
-
nodeMap(f)[source]¶ map over the MPI ranks.
The input function, f, is called once on every rank with arguments: * rank : int * E : list containing all local elements
Note
f must return a list
- Parameters
f – function of type = int, [elem] -> [new elems]
- Returns
DFM
-
reduce(f, x0, distribute=True)[source]¶ Reduce the dataset to a value.
Apply an associative, pairwise reduction to the dataset. The result is sent to all ranks if distribute=True otherwise, it is present only on the root rank (rank 0).
Yhe function f may operate in-place, storing its result on the left, for example:
def f(a,b): a[0] = b[0] return a
This is indicated in the function’s type with *elem.
Each rank calls x0 = f(x0, e) on all its elements, then does a fan-in reduction on x0. You can technically implement a different function for each phase by detecting whether e has the type of x0 or the type of an element. In fact, this is the only way to get around the restriction that type(x0) == type(e).
Note
x0 may be updated in-place. Even if you do this, x0 will contain an undefined value on return.
x0 must be initialized to a value representing the starting value for a single MPI rank.
-
repartition(llen, split, concat, N)[source]¶ Repartition into N “equally distributed” items.
Each element, e, is assumed to represent a collection of llen(e) items. The function split should split e up into blocks spanning the specified index ranges.
Repartition will then communicate these blocks as needed so that each output partition has a list of its component blocks.
The function
concatwill do the final join of all blocks composing each output element.- Parameters
llen – function of type = e -> int returning the internal length of each element
split – function of type = elem, [(i0,i1)] -> [e’] creating intermediate blocks to communicate
concat – function of type = [e’] -> new elem building the final output elements
- Returns
DFM of new elems
-