﻿ wolfram-mathematica - Efficient alternative to Outer on sparse arrays in Mathematica? - oipapio - oipapio.com oipapio

# Efficient alternative to Outer on sparse arrays in Mathematica?

Suppose I have two very large lists {a1, a2, …} and {b1, b2, …} where all ai and bj are large sparse arrays. For the sake of memory efficiency I store each list as one comprehensive sparse array.

Now I would like to compute some function f on all possible pairs of ai and bj where each result f[ai, bj] is a sparse array again. All these sparse arrays have the same dimensions, by the way.

While

``````Flatten[Outer[f, {a1, a2, ...}, {b1, b2, ...}, 1], 1]
``````

returns the desired result (in principle) it appears to consume excessive amounts of memory. Not the least because the return value is a list of sparse arrays whereas one comprehensive sparse array turns out much more efficient in my cases of interest.

Is there an efficient alternative to the above use of `Outer`?

More specific example:

``````{SparseArray[{{1, 1, 1, 1} -> 1, {2, 2, 2, 2} -> 1}],
SparseArray[{{1, 1, 1, 2} -> 1, {2, 2, 2, 1} -> 1}],
SparseArray[{{1, 1, 2, 1} -> 1, {2, 2, 1, 2} -> 1}],
SparseArray[{{1, 1, 2, 2} -> -1, {2, 2, 1, 1} -> 1}],
SparseArray[{{1, 2, 1, 1} -> 1, {2, 1, 2, 2} -> 1}],
SparseArray[{{1, 2, 1, 2} -> 1, {2, 1, 2, 1} -> 1}],
SparseArray[{{1, 2, 2, 1} -> -1, {2, 1, 1, 2} -> 1}],
SparseArray[{{1, 2, 2, 2} -> 1, {2, 1, 1, 1} -> 1}]};
ByteCount[%]

list = SparseArray[%%]
ByteCount[%]

Flatten[Outer[Dot, list, list, 1], 1];
ByteCount[%]
list1x2 = SparseArray[%%]
ByteCount[%]

Flatten[Outer[Dot, list1x2, list, 1], 1];
ByteCount[%]
list1x3 = SparseArray[%%]
ByteCount[%]
``````

etc. Not only are the raw intermediate results of `Outer` (lists of sparse arrays) extremely inefficient, `Outer` seems to consume way too much memory during the computation itself, too.

1. 2019-11-13

I will propose a solution which is rather complex but allows one to only use about twice as much memory during the computation as is needed to store the final result as a `SparseArray`. The price to pay for this will be a much slower execution.

## The code

### Sparse array construction / deconstruction API

Here is the code. First, a slightly modified (to address higher-dimensional sparse arrays) sparse array construction - deconstruction API, taken from this answer:

``````ClearAll[spart, getIC, getJR, getSparseData, getDefaultElement,
makeSparseArray];
HoldPattern[spart[SparseArray[s___], p_]] := {s}[[p]];
getIC[s_SparseArray] := spart[s, 4][[2, 1]];
getJR[s_SparseArray] := spart[s, 4][[2, 2]];
getSparseData[s_SparseArray] := spart[s, 4][[3]];
getDefaultElement[s_SparseArray] := spart[s, 3];
makeSparseArray[dims_List, jc_List, ir_List, data_List, defElem_: 0] :=
SparseArray @@ {Automatic, dims, defElem, {1, {jc, ir}, data}};
``````

### Iterators

The following functions produce iterators. Iterators are a good way to encapsulate the iteration process.

``````ClearAll[makeTwoListIterator];
makeTwoListIterator[fname_Symbol, a_List, b_List] :=
With[{indices = Flatten[Outer[List, a, b, 1], 1]},
With[{len  = Length[indices]},
Module[{i = 0},
ClearAll[fname];
fname[] := With[{ind = ++i}, indices[[ind]] /; ind <= len];
fname[] := Null;
fname[n_] :=
With[{ind = i + 1}, i += n;
indices[[ind ;; Min[len, ind + n - 1]]] /; ind <= len];
fname[n_] := Null;
]]];
``````

Note that I could have implemented the above function more memory - efficiently and not use `Outer` in it, but for our purposes this won't be the major concern.

Here is a more specialized version, which produces interators for pairs of 2-dimensional indices.

``````ClearAll[make2DIndexInterator];
make2DIndexInterator[fname_Symbol, i : {iStart_, iEnd_}, j : {jStart_, jEnd_}] :=
makeTwoListIterator[fname, Range @@ i, Range @@ j];
make2DIndexInterator[fname_Symbol, ilen_Integer, jlen_Integer] :=
make2DIndexInterator[fname, {1, ilen}, {1, jlen}];
``````

Here is how this works:

``````In[14]:=
makeTwoListIterator[next,{a,b,c},{d,e}];
next[]
next[]
next[]

Out[15]= {a,d}
Out[16]= {a,e}
Out[17]= {b,d}
``````

We can also use this to get batch results:

``````In[18]:=
makeTwoListIterator[next,{a,b,c},{d,e}];
next[2]
next[2]

Out[19]= {{a,d},{a,e}}
Out[20]= {{b,d},{b,e}}
``````

, and we will be using this second form.

### SparseArray - building function

This function will build a `SparseArray` object iteratively, by getting chunks of data (also in `SparseArray` form) and gluing them together. It is basically code used in this answer, packaged into a function. It accepts the code piece used to produce the next chunk of data, wrapped in `Hold` (I could alternatively make it `HoldAll`)

``````Clear[accumulateSparseArray];
accumulateSparseArray[Hold[getDataChunkCode_]] :=
Module[{start, ic, jr, sparseData, dims, dataChunk},
start = getDataChunkCode;
ic = getIC[start];
jr = getJR[start];
sparseData = getSparseData[start];
dims = Dimensions[start];
While[True, dataChunk = getDataChunkCode;
If[dataChunk === {}, Break[]];
ic = Join[ic, Rest@getIC[dataChunk] + Last@ic];
jr = Join[jr, getJR[dataChunk]];
sparseData = Join[sparseData, getSparseData[dataChunk]];
dims[[1]] += First[Dimensions[dataChunk]];
];
makeSparseArray[dims, ic, jr, sparseData]];
``````

### Putting it all together

This function is the main one, putting it all together:

``````ClearAll[sparseArrayOuter];
sparseArrayOuter[f_, a_SparseArray, b_SparseArray, chunkSize_: 100] :=
Module[{next, wrapperF, getDataChunkCode},
make2DIndexInterator[next, Length@a, Length@b];
wrapperF[x_List, y_List] := SparseArray[f @@@ Transpose[{x, y}]];
getDataChunkCode :=
With[{inds = next[chunkSize]},
If[inds === Null, Return[{}]];
wrapperF[a[[#]] & /@ inds[[All, 1]], b[[#]] & /@ inds[[All, -1]]]
];
accumulateSparseArray[Hold[getDataChunkCode]]
];
``````

Here, we first produce the iterator which will give us on demand portions of index pair list, used to extract the elements (also `SparseArrays`). Note that we will generally extract more than one pair of elements from two large input `SparseArray`-s at a time, to speed up the code. How many pairs we process at once is governed by the optional `chunkSize` parameter, which defaults to `100`. We then construct the code to process these elements and put the result back into `SparseArray`, where we use an auxiliary function `wrapperF`. The use of iterators wasn't absolutely necessary (could use `Reap`-`Sow` instead, as with other answers), but allowed me to decouple the logic of iteration from the logic of generic accumulation of sparse arrays.

## Benchmarks

First we prepare large sparse arrays and test our functionality:

``````In[49]:=
arr = {SparseArray[{{1,1,1,1}->1,{2,2,2,2}->1}],SparseArray[{{1,1,1,2}->1,{2,2,2,1}->1}],
SparseArray[{{1,1,2,1}->1,{2,2,1,2}->1}],SparseArray[{{1,1,2,2}->-1,{2,2,1,1}->1}],
SparseArray[{{1,2,1,1}->1,{2,1,2,2}->1}],SparseArray[{{1,2,1,2}->1,{2,1,2,1}->1}]};

In[50]:= list=SparseArray[arr]
Out[50]= SparseArray[<12>,{6,2,2,2,2}]

In[51]:= larger = sparseArrayOuter[Dot,list,list]
Out[51]= SparseArray[<72>,{36,2,2,2,2,2,2}]

In[52]:= (large= sparseArrayOuter[Dot,larger,larger])//Timing
Out[52]= {0.047,SparseArray[<2592>,{1296,2,2,2,2,2,2,2,2,2,2}]}

In[53]:= SparseArray[Flatten[Outer[Dot,larger,larger,1],1]]==large
Out[53]= True

In[54]:= MaxMemoryUsed[]
Out[54]= 21347336
``````

Now we do the power tests

``````In[55]:= (huge= sparseArrayOuter[Dot,large,large,2000])//Timing
Out[55]= {114.344,SparseArray[<3359232>,{1679616,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2}]}

In[56]:= MaxMemoryUsed[]
Out[56]= 536941120

In[57]:= ByteCount[huge]
Out[57]= 262021120

In[58]:= (huge1 = Flatten[Outer[Dot,large,large,1],1]);//Timing
Out[58]= {8.687,Null}

In[59]:= MaxMemoryUsed[]
Out[59]= 2527281392
``````

For this particular example, the suggested method is 5 times more memory-efficient than the direct use of `Outer`, but about 15 times slower. I had to tweak the `chunksize` parameter (default is `100`, but for the above I used `2000`, to get the optimal speed / memory use combination). My method only used as a peak value twice as much memory as needed to store the final result. The degree of memory-savings as compared to `Outer`- based method will depend on the sparse arrays in question.

2. 2019-11-13

If `lst1` and `lst2` are your lists,

``````Reap[
Do[Sow[f[#1[[i]], #2[[j]]]],
{i, 1, Length@#1},
{j, 1, Length@#2}
] &[lst1, lst2];
] // Last // Last
``````

does the job and may be more memory-efficient. On the other hand, maybe not. Nasser is right, an explicit example would be useful.

EDIT: Using Nasser's randomly-generated arrays, and for `len=200`, `MaxMemoryUsed[]` indicates that this form needs 170MB while the `Outer` form in the question takes 435MB.

3. 2019-11-13

Using your example `list` data, I believe that you will find the ability to `Append` to a `SparseArray` quite helpful.

``````acc = SparseArray[{}, {1, 2, 2, 2, 2, 2, 2}]

Do[AppendTo[acc, i.j], {i, list}, {j, list}]

Rest[acc]
``````

I need `Rest` to drop the first zero-filled tensor in the result. The second argument of the seed `SparseArray` must be the dimensions of each of your elements with a prefixed `1`. You may need to explicitly specify a background for the seed `SparseArray` to optimize performance.