How to create a matrix from several other matrices? - python

I have a problem with joining several matrices. Say I have four matrices A,B,C,D. And I want to join them in a way to obtain a new one
M = A B
C D
How can I do this using python numpy?

Considering all have same dimension, here's an example:
>>> A = np.arange(4).reshape(2,2)
>>> B, C, D = A*2, (A+1), (A+2)
>>> M = np.array([[A,B],[C,D]])
>>> M
array([[[[0, 1],
[2, 3]],
[[0, 2],
[4, 6]]],
[[[1, 2],
[3, 4]],
[[2, 3],
[4, 5]]]])
Or if you want something like this:
>>> M = np.concatenate([np.concatenate([A,B],axis = 1),np.concatenate([C,D],axis = 1)],axis = 0)
>>> M
array([[0, 1, 0, 2],
[2, 3, 4, 6],
[1, 2, 2, 3],
[3, 4, 4, 5]])

Using NumPy's hstack and vstack functions, you can generate M from any A, B, C, D as long as each two number of rows and/or number of columns match. See the following example:
import numpy as np
A = np.random.rand(3, 2) # (3 x 2)
B = np.random.rand(3, 4) # (3 x 4)
C = np.random.rand(4, 3) # (4 x 3)
D = np.random.rand(4, 3) # (4 x 3)
print(A, '\n')
print(B, '\n')
print(C, '\n')
print(D, '\n')
M = np.vstack((np.hstack((A, B)), np.hstack((C, D))))
# (3 x 6) (4 x 6)
# (7 x 6)
print(M)
Output:
[[0.60220154 0.77067838]
[0.7623169 0.54727146]
[0.20570341 0.56939493]]
[[0.322524 0.35260186 0.6581785 0.55662823]
[0.32034862 0.68664386 0.96432518 0.03410233]
[0.72779584 0.6705618 0.66460412 0.104223 ]]
[[0.20194483 0.49971436 0.50618483]
[0.89040491 0.25118623 0.67831283]
[0.30631334 0.69515443 0.70941023]
[0.41324506 0.23127909 0.29241595]]
[[0.0015009 0.43205507 0.08500188]
[0.48506546 0.46448833 0.61393518]
[0.51163779 0.81914233 0.21293481]
[0.33713576 0.33953848 0.9909197 ]]
[[0.60220154 0.77067838 0.322524 0.35260186 0.6581785 0.55662823]
[0.7623169 0.54727146 0.32034862 0.68664386 0.96432518 0.03410233]
[0.20570341 0.56939493 0.72779584 0.6705618 0.66460412 0.104223 ]
[0.20194483 0.49971436 0.50618483 0.0015009 0.43205507 0.08500188]
[0.89040491 0.25118623 0.67831283 0.48506546 0.46448833 0.61393518]
[0.30631334 0.69515443 0.70941023 0.51163779 0.81914233 0.21293481]
[0.41324506 0.23127909 0.29241595 0.33713576 0.33953848 0.9909197 ]]
You can switch vstack and hstack, if the number of columns of A/C and B/D match, and the number of rows of these concatenations are equal.
Hope that helps!

Related

Generating an array of arrays in Python

I want to multiply each element of B to the whole array A to obtain P. The current and desired outputs are attached. The desired output is basically an array consisting of 2 arrays since there are two elements in B.
import numpy as np
A=np.array([[1, 2, 3],
[4, 5, 6],
[7 , 8, 9]])
t = np.linspace(0,1,2)
B = 0.02109*np.exp(-t)
P=B*A
print(P)
It currently produces an error:
ValueError: operands could not be broadcast together with shapes (2,) (3,3)
The desired output is
array(([[0.02109, 0.04218, 0.06327],
[0.08436, 0.10545, 0.12654],
[0.14763, 0.16872, 0.18981]]),
([[0.00775858, 0.01551716, 0.02327574],
[0.03103432, 0.0387929 , 0.04655148],
[0.05431006, 0.06206864, 0.06982722]]))
You can do this by:
B.reshape(-1, 1, 1) * A
or
B[:, None, None] * A
where -1 or : refer to B.shape[0] which was 2 and 1, 1 or None, None add two additional dimensions to B to get the desired result shape which was (2, 3, 3).
The easiest way i can think of is using list comprehension and then casting back to numpy.ndarray
np.asarray([A*i for i in B])
Answer :
array([[[0.02109 , 0.04218 , 0.06327 ],
[0.08436 , 0.10545 , 0.12654 ],
[0.14763 , 0.16872 , 0.18981 ]],
[[0.00775858, 0.01551715, 0.02327573],
[0.03103431, 0.03879289, 0.04655146],
[0.05431004, 0.06206862, 0.0698272 ]]])
There are many possible ways for this:
Here is an overview on their runtime for the given array (bare in mind these will change for bigger arrays):
reshape: 0.000174 sec
tensordot: 0.000550 sec
einsum: 0.000196 sec
manual loop: 0.000326 sec
See the implementation for each of these:
numpy reshape
Find documentation here:
Link
Gives a new shape to an array without changing its data.
Here we reshape the array B so we can later multiply it:
import numpy as np
A=np.array([[1, 2, 3],
[4, 5, 6],
[7 , 8, 9]])
t = np.linspace(0,1,2)
B = 0.02109*np.exp(-t)
P = B.reshape(-1, 1, 1) * A
print(P)
numpy tensordot
Find documentation here:
Link
Given two tensors, a and b, and an array_like object containing two
array_like objects, (a_axes, b_axes), sum the products of a’s and b’s
elements (components) over the axes specified by a_axes and b_axes.
The third argument can be a single non-negative integer_like scalar,
N; if it is such, then the last N dimensions of a and the first N
dimensions of b are summed over.
import numpy as np
A=np.array([[1, 2, 3],
[4, 5, 6],
[7 , 8, 9]])
t = np.linspace(0,1,2)
B = 0.02109*np.exp(-t)
P = np.tensordot(B, A, 0)
print(P)
numpy einsum (Einstein summation)
Find documentation here:
Link
import numpy as np
A=np.array([[1, 2, 3],
[4, 5, 6],
[7 , 8, 9]])
t = np.linspace(0,1,2)
B = 0.02109*np.exp(-t)
P = np.einsum('ij,k', A, B)
print(P)
Note: A has two dimensions, we assign ij for their indexes. B has one dimension, we assign k to its index
manual loop
Another simple approach would be a loop (is faster than tensordot for the given input). This approach could be made "numpy free" if you dont want to use numpy for some reason. Here is the version with numpy:
import numpy as np
A=np.array([[1, 2, 3],
[4, 5, 6],
[7 , 8, 9]])
t = np.linspace(0,1,2)
B = 0.02109*np.exp(-t)
products = []
for b in B:
products.append(b*A)
P = np.array(products)
print(P)
#or the same as one-liner: np.asarray([A * elem for elem in B])

Why does calling np.array() on this list comprehension produce a 3d array instead of 2d?

I have a script produces the first several iterations of a Markov matrix multiplying a given set of input values. With the matrix stored as A and the start values in the column u0, I use this list comprehension to store the output in an array:
out = np.array([ ( (A**n) * u0).T for n in range(10) ])
The output has shape (10,1,6), but I want the output in shape (10,6) instead. Obviously, I can fix this with .reshape(), but is there a way to avoid creating the extra dimension in the first place, perhaps by simplifying the list comprehension or the inputs?
Here's the full script and output:
import numpy as np
# Random 6x6 Markov matrix
n = 6
A = np.matrix([ (lambda x: x/x.sum())(np.random.rand(n)) for _ in range(n)]).T
print(A)
#[[0.27457312 0.20195133 0.14400801 0.00814027 0.06026188 0.23540134]
# [0.21526648 0.17900277 0.35145882 0.30817386 0.15703758 0.21069114]
# [0.02100412 0.05916883 0.18309142 0.02149681 0.22214047 0.15257011]
# [0.17032696 0.11144443 0.01364982 0.31337906 0.25752732 0.1037133 ]
# [0.03081507 0.2343255 0.2902935 0.02720764 0.00895182 0.21920371]
# [0.28801424 0.21410713 0.01749843 0.32160236 0.29408092 0.07842041]]
# Random start values
u0 = np.matrix(np.random.randint(51, size=n)).T
print(u0)
#[[31]
# [49]
# [44]
# [29]
# [10]
# [ 0]]
# Find the first 10 iterations of the Markov process
out = np.array([ ( (A**n) * u0).T for n in range(10) ])
print(out)
#[[[31. 49. 44. 29. 10.
# 0. ]]
#
# [[25.58242101 41.41600236 14.45123543 23.00477134 26.08867045
# 32.45689942]]
#
# [[26.86917065 36.02438292 16.87560159 26.46418685 22.66236879
# 34.10428921]]
#
# [[26.69224394 37.06346073 16.59208202 26.48817955 22.56696872
# 33.59706504]]
#
# [[26.68772374 36.99727159 16.49987315 26.5003184 22.61130862
# 33.7035045 ]]
#
# [[26.68766363 36.98517264 16.50532933 26.51717543 22.592951
# 33.71170797]]
#
# [[26.68695152 36.98895204 16.50314718 26.51729716 22.59379049
# 33.70986161]]
#
# [[26.68682195 36.98848867 16.50286371 26.51763013 22.59362679
# 33.71056876]]
#
# [[26.68681128 36.98850409 16.50286036 26.51768807 22.59359453
# 33.71054167]]
#
# [[26.68680313 36.98851046 16.50285038 26.51769497 22.59359219
# 33.71054886]]]
print(out.shape)
#(10, 1, 6)
out = out.reshape(10,n)
print(out)
#[[31. 49. 44. 29. 10. 0. ]
# [25.58242101 41.41600236 14.45123543 23.00477134 26.08867045 32.45689942]
# [26.86917065 36.02438292 16.87560159 26.46418685 22.66236879 34.10428921]
# [26.69224394 37.06346073 16.59208202 26.48817955 22.56696872 33.59706504]
# [26.68772374 36.99727159 16.49987315 26.5003184 22.61130862 33.7035045 ]
# [26.68766363 36.98517264 16.50532933 26.51717543 22.592951 33.71170797]
# [26.68695152 36.98895204 16.50314718 26.51729716 22.59379049 33.70986161]
# [26.68682195 36.98848867 16.50286371 26.51763013 22.59362679 33.71056876]
# [26.68681128 36.98850409 16.50286036 26.51768807 22.59359453 33.71054167]
# [26.68680313 36.98851046 16.50285038 26.51769497 22.59359219 33.71054886]]
I think your confusion lies with how arrays can be joined.
Start with a simple 1d array (in numpy 1d is a real thing, not just a 'row vector' or 'column vector'):
In [288]: arr = np.arange(6)
In [289]: arr
Out[289]: array([0, 1, 2, 3, 4, 5])
np.array joins element arrays along a new 1st dimension:
In [290]: np.array([arr,arr])
Out[290]:
array([[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5]])
np.stack with the default axis value does the same thing. Read its docs.
We can make a 2d array, a column vector:
In [291]: arr1 = arr[:,None]
In [292]: arr1
Out[292]:
array([[0],
[1],
[2],
[3],
[4],
[5]])
In [293]: arr1.shape
Out[293]: (6, 1)
Using np.array on its transpose the (1,6) arrays:
In [294]: np.array([arr1.T, arr1.T])
Out[294]:
array([[[0, 1, 2, 3, 4, 5]],
[[0, 1, 2, 3, 4, 5]]])
In [295]: _.shape
Out[295]: (2, 1, 6)
Note the middle size 1 dimension, that bothered you.
np.vstack joins the arrays along the existing 1st dimension. It does not add one:
In [296]: np.vstack([arr1.T, arr1.T])
Out[296]:
array([[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5]])
Or we could join the arrays horizontally, on the 2nd dimension:
In [297]: np.hstack([arr1, arr1])
Out[297]:
array([[0, 0],
[1, 1],
[2, 2],
[3, 3],
[4, 4],
[5, 5]])
That is (6,2) which can be transposed to (2,6):
In [298]: np.hstack([arr1, arr1]).T
Out[298]:
array([[0, 1, 2, 3, 4, 5],
[0, 1, 2, 3, 4, 5]])
If you use np.array() for input and # for matrix multiplication, it works as expected.
# Random 6x6 Markov matrix
n = 6
A = np.array([ (lambda x: x/x.sum())(np.random.rand(n)) for _ in range(n)]).T
# Random start values
u0 = np.random.randint(51, size=n).T
# Find the first 10 iterations of the Markov process
out = np.array([ ( np.linalg.matrix_power(A,n) # u0).T for n in range(10) ])
print(out)
#[[29. 24. 5. 12. 10. 32. ]
# [15.82875119 13.53436868 20.61648725 19.22478172 20.34082205 22.45478912]
# [21.82434718 10.06037119 14.29281935 20.75271393 18.76134538 26.30840297]
# [20.77484848 10.1379821 15.47488423 19.4965479 20.05618311 26.05955418]
# [21.02944236 10.09401438 15.24263478 19.48662616 19.95767996 26.18960236]
# [20.96887722 10.11647819 15.30729334 19.44261102 20.00089222 26.16384802]
# [20.98086362 10.11522779 15.29529799 19.44899285 19.99137187 26.16824587]
# [20.97795615 10.11606978 15.29817734 19.44798612 19.99293494 26.16687566]
# [20.97858032 10.11591954 15.29752865 19.44839852 19.99245389 26.16711909]
# [20.97844343 10.11594666 15.29766432 19.4483417 19.99254284 26.16706104]]
I made a few changes to the code, although I'm not 100% certain that the result is still the same (I am not familiar with Markov chains).
import numpy as np
n = 6
num_proc_iters = 10
rand_nums_arr = np.random.random_sample((n, n))
rand_nums_arr = np.transpose(rand_nums_arr / rand_nums_arr.sum(axis=1))
u0 = np.random.randint(51, size=n)
res_arr = np.concatenate([np.linalg.matrix_power(rand_nums_arr, curr) # u0 for curr in range(num_proc_iters)])
I would love to hear if anyone can think of any further improvements.

pandas dataframe exponential decay summation

I have a pandas dataframe,
[[1, 3],
[4, 4],
[2, 8]...
]
I want to create a column that has this:
1*(a)^(3) # = x
1*(a)^(3 + 4) + 4 * (a)^4 # = y
1*(a)^(3 + 4 + 8) + 4 * (a)^(4 + 8) + 2 * (a)^8 # = z
...
Where "a" is some value.
The stuff: 1, 4, 2, is from column one, the repeated 3, 4, 8 is column 2
Is this possible using some form of transform/apply?
Essentially getting:
[[1, 3, x],
[4, 4, y],
[2, 8, z]...
]
Where x, y, z is the respective sums from the new column (I want them next to each other)
There is a "groupby" that is being done on the dataframe, and this is what I want to do for a given group
If I'm understanding your question correctly, this should work:
df = pd.DataFrame([[1, 3], [4, 4], [2, 8]], columns=['a', 'b'])
a = 42
new_lst = []
for n in range(len(lst)):
z = 0
i = 0
while i <= n:
z += df['a'][i]*a**(sum(df['b'][i:n+1]))
i += 1
new_lst.append(z)
df['new'] = new_lst
Update:
Saw that you are using pandas and updated with dataframe methods. Not sure there's an easy way to do this with apply since you need a mix of values from different rows. I think this for loop is still the best route.

Encode array of integers into unique int

I have a fixed amount of int arrays of the form:
[a,b,c,d,e]
for example:
[2,2,1,1,2]
where a and b can be ints from 0 to 2, c and d can be 0 or 1, and e can be ints from 0 to 2.
Therefore there are: 3 * 3 * 2 * 2 * 3: 108 possible arrays of this form.
I would like to assign to each of those arrays a unique integer code from 0 to 107.
I am stuck, i thought of adding each numbers in the array, but two arrays such as:
[0,0,0,0,1] and [1,0,0,0,0]
would both add to 1.
Any suggestion?
Thank you.
You could use np.ravel_multi_index:
>>> np.ravel_multi_index([1, 2, 0, 1, 2], (3, 3, 2, 2, 3))
65
Validation:
>>> {np.ravel_multi_index(j, (3, 3, 2, 2, 3)) for j in itertools.product(*map(range, (3,3,2,2,3)))} == set(range(np.prod((3, 3, 2, 2, 3))))
True
Going back the other way:
>>> np.unravel_index(65, dims=(3, 3, 2, 2, 3))
(1, 2, 0, 1, 2)
Just another way, similar to Horner's method for polynomials:
>>> array = [1, 2, 0, 1, 2]
>>> ranges = (3, 3, 2, 2, 3)
>>> reduce(lambda i, (a, r): i * r + a, zip(array, ranges), 0)
65
Unrolled that's ((((0 * 3 + 1) * 3 + 2) * 2 + 0) * 2 + 1) * 3 + 2 = 65.
This is a little like converting digits from a varying-size number base to a standard integer. In base-10, you could have five digits, each from 0 to 9, and then you would convert them to a single integer via i = a*10000 + b*1000 + c*100 + d*10 + e*1.
Equivalently, for the decimal conversion, you could write i = np.dot([a, b, c, d, e], bases), where bases = [10*10*10*10, 10*10*10, 10*10, 10, 1].
You can do the same thing with your bases, except that your positions introduce multipliers of [3, 3, 2, 2, 3] instead of [10, 10, 10, 10, 10]. So you could set bases = [3*2*2*3, 2*2*3, 2*3, 3, 1] (=[36, 12, 6, 3, 1]) and then use i = np.dot([a, b, c, d, e], bases). Note that this will always give answers in the range of 0 to 107 if a, b, c, d, and e fall in the ranges you specified.
To convert i back into a list of digits, you could use something like this:
digits = []
remainder = i
for base in bases:
digit, remainder = divmod(remainder, base)
digits.append(digit)
On the other hand, to keep your life simple, you are probably better off using Paul Panzer's answer, which pretty much does the same thing. (I never thought of an n-digit number as the coordinates of a cell in an n-dimensional grid before, but it turns out they're mathematically equivalent. And np.ravel is an easy way to assign a serial number to each cell.)
This data is small enough that you may simply enumerate them:
>>> L = [[a,b,c,d,e] for a in range(3) for b in range(3) for c in range(2) for d in range(2) for e in range(3)]
>>> L[0]
[0, 0, 0, 0, 0]
>>> L[107]
[2, 2, 1, 1, 2]
If you need to go the other way (from the array to the integer) make a lookup dict for it so that you will get O(1) instead of O(n):
>>> lookup = {tuple(x): i for i, x in enumerate(L)}
>>> lookup[1,1,1,1,1]
58
getting dot-product of your vectors as following:
In [210]: a1
Out[210]: array([2, 2, 1, 1, 2])
In [211]: a2
Out[211]: array([1, 0, 1, 1, 0])
In [212]: a1.dot(np.power(10, np.arange(5,0,-1)))
Out[212]: 221120
In [213]: a2.dot(np.power(10, np.arange(5,0,-1)))
Out[213]: 101100
should produce 108 unique numbers - use their indices...
If the array lenght is not very huge, you can calculate out the weight first, then use simple math formula to get the ID.
The code will be like:
#Test Case
test1 = [2, 2, 1, 1, 2]
test2 = [0, 2, 1, 1, 2]
test3 = [0, 0, 0, 0, 2]
def getUniqueID(target):
#calculate out the weights first;
#When Index=0; Weight[0]=1;
#When Index>0; Weight[Index] = Weight[Index-1]*(The count of Possible Values for Previous Index);
weight = [1, 3, 9, 18, 36]
return target[0]*weight[0] + target[1]*weight[1] + target[2]*weight[2] + target[3]*weight[3] + target[4]*weight[4]
print 'Test Case 1:', getUniqueID(test1)
print 'Test Case 2:', getUniqueID(test2)
print 'Test Case 3:', getUniqueID(test3)
#Output
#Test Case 1: 107
#Test Case 2: 105
#Test Case 3: 72
#[Finished in 0.335s]

Way of easily finding the average of every nth element over a window of size k in a pandas.Series? (not the rolling mean)

The motivation here is to take a time series and get the average activity throughout a sub-period (day, week).
It is possible to reshape an array and take the mean over the y axis to achieve this, similar to this answer (but using axis=2):
Averaging over every n elements of a numpy array
but I'm looking for something which can handle arrays of length N%k != 0 and does not solve the issue by reshaping and padding with ones or zeros (e.g numpy.resize), i.e takes the average over the existing data only.
E.g Start with a sequence [2,2,3,2,2,3,2,2,3,6] of length N=10 which is not divisible by k=3. What I want is to take the average over columns of a reshaped array with mis-matched dimensions:
In: [[2,2,3],
[2,2,3],
[2,2,3],
[6]], k =3
Out: [3,2,3]
Instead of:
In: [[2,2,3],
[2,2,3],
[2,2,3],
[6,0,0]], k =3
Out: [3,1.5,2.25]
Thank you.
You can use a masked array to pad with special values that are ignored when finding the mean, instead of summing.
k = 3
# how long the array needs to be to be divisible by 3
padded_len = (len(in_arr) + (k - 1)) // k * k
# create a np.ma.MaskedArray with padded entries masked
padded = np.ma.empty(padded_len)
padded[:len(in_arr)] = in_arr
padded[len(in_arr):] = np.ma.masked
# now we can treat it an array divisible by k:
mean = padded.reshape((-1, k)).mean(axis=0)
# if you need to remove the masked-ness
assert not np.ma.is_masked(mean), "in_arr was too short to calculate all means"
mean = mean.data
You can easily do it by padding, reshaping and calculating by how many elements to divide each row:
>>> import numpy as np
>>> a = np.array([2,2,3,2,2,3,2,2,3,6])
>>> k = 3
Pad data
>>> b = np.pad(a, (0, k - a.size%k), mode='constant').reshape(-1, k)
>>> b
array([[2, 2, 3],
[2, 2, 3],
[2, 2, 3],
[6, 0, 0]])
Then create a mask:
>>> c = a.size // k # 3
>>> d = (np.arange(k) + c * k) < a.size # [True, False, False]
The first part of d will create an array that contains [9, 10, 11], and compare it to the size of a (10), generating the mentioned boolean mask.
And divide it:
>>> b.sum(0) / (c + 1.0 * d)
array([ 3., 2., 3.])
The above will divide the first column by 4 (c + 1 * True) and the rest by 3. This is vectorized numpy, thus, it scales very well to large arrays.
Everything can be written shorter, I just show all the steps to make it more clear.
Flatten the list In by unpacking and chaining. Create a new list that arranges the flattened list lst by columns, then use the map function to calculate the average of each column:
from itertools import chain
In = [[2, 2, 3], [2, 2, 3], [2, 2, 3], [6]]
lst = chain(*In)
k = 3
In_by_cols = [lst[i::k] for i in range(k)]
# [[2, 2, 2, 6], [2, 2, 2], [3, 3, 3]]
Out = map(lambda x: sum(x)/ float(len(x)), In_by_cols)
# [3.0, 2.0, 3.0]
Using float on the length of each sublist will provide a more accurate result on python 2.x as it won't do integer truncation.

Categories