Given a numpy 3-d array
[[[1], [4]], [[7], [10]]]
let's say the first row is 1 4 and the second row is 7 10. With a multiplier of 3, the first through third rows should become 1 1 1 4 4 4 and the fourth through sixth rows should become 7 7 7 10 10 10; that is:
[[[1], [1], [1], [4], [4], [4]], [[1], [1], [1], [4], [4], [4]], [[1], [1], [1], [4], [4], [4]], [[7], [7], [7], [10], [10], [10]], [[7], [7], [7], [10], [10], [10]], [[7], [7], [7], [10], [10], [10]]]
Is there a quick way to do this in NumPy? The actual array I'm using has 3 or 4 elements instead of 1 at the bottom level, so [1], [1], [1] could be [1, 8, 7], [1, 8, 7], [1, 8, 7] instead, but I simplified it here.
numpy.repeat sounds like what you want.
Here are some examples:
>>> a = numpy.array( [[[1,2,3],[4,5,6]], [[10,20,30],[40,50,60]]] )
>>> a
array([[[ 1, 2, 3],
[ 4, 5, 6]],
[[10, 20, 30],
[40, 50, 60]]])
>>>
>>> a.repeat( 3, axis=0 )
array([[[ 1, 2, 3],
[ 4, 5, 6]],
[[ 1, 2, 3],
[ 4, 5, 6]],
[[ 1, 2, 3],
[ 4, 5, 6]],
[[10, 20, 30],
[40, 50, 60]],
[[10, 20, 30],
[40, 50, 60]],
[[10, 20, 30],
[40, 50, 60]]])
>>>
>>> a.repeat( 3, axis=1 )
array([[[ 1, 2, 3],
[ 1, 2, 3],
[ 1, 2, 3],
[ 4, 5, 6],
[ 4, 5, 6],
[ 4, 5, 6]],
[[10, 20, 30],
[10, 20, 30],
[10, 20, 30],
[40, 50, 60],
[40, 50, 60],
[40, 50, 60]]])
>>>
>>> a.repeat( 3, axis=2 )
array([[[ 1, 1, 1, 2, 2, 2, 3, 3, 3],
[ 4, 4, 4, 5, 5, 5, 6, 6, 6]],
[[10, 10, 10, 20, 20, 20, 30, 30, 30],
[40, 40, 40, 50, 50, 50, 60, 60, 60]]])
Depending on the desired shape of your output, you may wish to chain multiple calls to .repeat() with different axis values. For the output in the question, that means repeating along both axis 0 and axis 1.
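For the array in the question, chaining two calls should give the desired (6, 6, 1) result: axis=0 duplicates each outer row, and axis=1 then duplicates each inner element. A minimal sketch; the second array c uses made-up values, only to show that the same calls work when the bottom level holds three elements:

```python
import numpy as np

a = np.array([[[1], [4]], [[7], [10]]])    # shape (2, 2, 1)

# Duplicate each outer row 3 times, then each inner element 3 times.
b = a.repeat(3, axis=0).repeat(3, axis=1)  # shape (6, 6, 1)
print(b.shape)
print(b[:, :, 0])                          # drop the size-1 last axis for display

# Hypothetical data with three values at the bottom level.
c = np.array([[[1, 8, 7], [4, 4, 4]], [[7, 7, 7], [10, 10, 10]]])
print(c.repeat(3, axis=0).repeat(3, axis=1).shape)
```

Because .repeat() only touches the axis it is given, the bottom-level vectors such as [1, 8, 7] are copied intact.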
I have an m x n matrix A and an n x r matrix B that I want to multiply in a specific way to get an m x r matrix. I want to multiply each element of the ith column of A, as a scalar, by the ith row of B, and then sum the resulting n matrices.
For example:
a = [[0, 1, 2],
     [3, 4, 5]]
b = [[0, 1, 2, 3],
[4, 5, 6, 7],
[8, 9, 10, 11]]
The product would be
a*b = [[0, 0, 0, 0], [0, 3, 6, 9]]
    + [[4, 5, 6, 7], [16, 20, 24, 28]]
    + [[16, 18, 20, 22], [40, 45, 50, 55]]
    = [[20, 23, 26, 29], [56, 68, 80, 92]]
I can't use any loops, so I'm pretty sure I have to use broadcasting, but I don't know how. Any help is appreciated.
Your input matrices are of shape (2, 3) and (3, 4) respectively and the result you want is of shape (2, 4).
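What you need is just the ordinary matrix (dot) product of your two matrices. Summing the scaled rows of B as you describe is exactly how matrix multiplication is defined, since (AB)[i][j] = sum over k of A[i][k] * B[k][j]: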
import numpy as np

a = np.array([[0, 1, 2],
              [3, 4, 5]])
b = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7],
              [8, 9, 10, 11]])
print(np.dot(a, b))  # equivalently: a @ b
# [[20 23 26 29]
#  [56 68 80 92]]
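Since the question mentions broadcasting, here is a sketch that spells out the asker's recipe literally and checks it against the matrix product. Inserting new axes with None lines column k of a up against row k of b; np.einsum is shown as an equivalent one-liner. The names terms, result and result_einsum are illustrative:

```python
import numpy as np

a = np.array([[0, 1, 2],
              [3, 4, 5]])
b = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7],
              [8, 9, 10, 11]])

# a[:, :, None] has shape (2, 3, 1) and b[None, :, :] has shape (1, 3, 4).
# Their product has shape (2, 3, 4); terms[:, k, :] is column k of a times
# row k of b, i.e. the k-th matrix in the question.
terms = a[:, :, None] * b[None, :, :]
result = terms.sum(axis=1)  # add the n = 3 matrices together

# einsum writes the same sum over k in one call.
result_einsum = np.einsum('ik,kj->ij', a, b)

print(result)
print(np.array_equal(result, a @ b), np.array_equal(result_einsum, a @ b))
```

The broadcast version materializes an m x n x r intermediate array, so for large inputs a @ b (or np.dot) is the cheaper choice.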
I have a bunch of matrices stored in one big DataFrame. Let's say this is my DataFrame:
import numpy as np
import pandas as pd
data = pd.DataFrame([[13, 1, 3, 4, 0, 0], [0, 2, 6, 2, 0, 0], [3, 1, 5, 2, 2, 0], [0, 0, 10, 11, 6, 0], [5, 5, 21, 25, 41, 0],
[11, 1, 3, 2, 0, 1], [3, 1, 7, 3, 1, 1], [1, 1, 6, 5, 3, 1], [1, 1, 6, 7, 6, 1], [6, 6, 21, 24, 42, 1],
[17, 1, 7, 0, 0, 2], [1, 1, 6, 1, 1, 2], [2, 4, 6, 2, 1, 2], [0, 2, 11, 7, 8, 2], [5, 6, 17, 16, 46, 2],
[11, 1, 10, 2, 1, 3], [2, 2, 7, 1, 1, 3], [0, 0, 14, 4, 1, 3], [0, 0, 7, 7, 5, 3], [5, 1, 20, 18, 48, 3],
[16, 3, 7, 1, 2, 4], [1, 2, 4, 1, 0, 4], [2, 4, 7, 5, 3, 4], [3, 0, 4, 4, 7, 4], [7, 2, 13, 12, 58, 4]],
columns=['1', '2', '3', '4', '5', 'iteration'])
print(data)
Each group of rows sharing the same data['iteration'] value is a matrix on its own, so, as you can see, there are 5 matrices here (iterations 0 to 4). I want to add them all together, as in ordinary matrix addition, to get one single matrix.
I have tried the following, but it doesn't work:
matrix = data[['1','2','3','4','5']]
print(np.sum([matrix[matrix_list['iteration']==i] for i in range(0,9)], axis=0))
How do I do this the right way?
You can use:
In [98]: d = data.set_index('iteration')
In [99]: np.sum([d.loc[i].values for i in d.index.drop_duplicates().values], axis=0)
Out[99]:
array([[ 68, 7, 30, 9, 3],
[ 7, 8, 30, 8, 3],
[ 8, 10, 38, 18, 10],
[ 4, 3, 38, 36, 32],
[ 28, 20, 92, 95, 235]])
Or alternatively, use groupby():
np.sum([g.iloc[:, :-1].values for _, g in data.groupby('iteration')], axis=0)
array([[ 68, 7, 30, 9, 3],
[ 7, 8, 30, 8, 3],
[ 8, 10, 38, 18, 10],
[ 4, 3, 38, 36, 32],
[ 28, 20, 92, 95, 235]])
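A loop-free variant is also possible. The sketch below rebuilds the question's data and shows two approaches. Both assume every iteration holds a matrix of the same shape, and the second also assumes the rows are already sorted by iteration, as they are here. The names values, row_in_matrix, total and stacked are illustrative:

```python
import pandas as pd

data = pd.DataFrame([[13, 1, 3, 4, 0, 0], [0, 2, 6, 2, 0, 0], [3, 1, 5, 2, 2, 0], [0, 0, 10, 11, 6, 0], [5, 5, 21, 25, 41, 0],
                     [11, 1, 3, 2, 0, 1], [3, 1, 7, 3, 1, 1], [1, 1, 6, 5, 3, 1], [1, 1, 6, 7, 6, 1], [6, 6, 21, 24, 42, 1],
                     [17, 1, 7, 0, 0, 2], [1, 1, 6, 1, 1, 2], [2, 4, 6, 2, 1, 2], [0, 2, 11, 7, 8, 2], [5, 6, 17, 16, 46, 2],
                     [11, 1, 10, 2, 1, 3], [2, 2, 7, 1, 1, 3], [0, 0, 14, 4, 1, 3], [0, 0, 7, 7, 5, 3], [5, 1, 20, 18, 48, 3],
                     [16, 3, 7, 1, 2, 4], [1, 2, 4, 1, 0, 4], [2, 4, 7, 5, 3, 4], [3, 0, 4, 4, 7, 4], [7, 2, 13, 12, 58, 4]],
                    columns=['1', '2', '3', '4', '5', 'iteration'])

values = data.drop(columns='iteration')

# Approach 1: a row's position inside its iteration (0, 1, 2, ...) tells
# which matrix row it is, so group on that position and add.
row_in_matrix = data.groupby('iteration').cumcount()
total = values.groupby(row_in_matrix).sum()
print(total.to_numpy())

# Approach 2: stack the matrices into a 3-D array of shape
# (n_iterations, rows_per_matrix, n_columns) and sum over the first axis.
n_iter = data['iteration'].nunique()
stacked = values.to_numpy().reshape(n_iter, -1, values.shape[1])
print(stacked.sum(axis=0))
```

The groupby version does not care how the rows are ordered across iterations; the reshape version is simpler but relies on the rows being contiguous and sorted.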
I have three pandas columns whose elements are lists. To combine these lists, I can explicitly write out the column names and + them together:
import pandas as pd
df = pd.DataFrame({'allmz':([[1,2,3],[2,4,5],[2,5,5],[2,3,5],[1,4,5]]),'allint':([[11,31,31],[21,41,51],[41,51,51],[11,31,51],[1,51,11]]), 'allx':([[6,7,3],[2,4,5],[2,5,5],[2,9,5],[3,4,5]])})
df['new'] = df['allmz'] + df['allint'] + df['allx']
print(df)
allint allmz allx new
0 [11, 31, 31] [1, 2, 3] [6, 7, 3] [1, 2, 3, 11, 31, 31, 6, 7, 3]
1 [21, 41, 51] [2, 4, 5] [2, 4, 5] [2, 4, 5, 21, 41, 51, 2, 4, 5]
2 [41, 51, 51] [2, 5, 5] [2, 5, 5] [2, 5, 5, 41, 51, 51, 2, 5, 5]
3 [11, 31, 51] [2, 3, 5] [2, 9, 5] [2, 3, 5, 11, 31, 51, 2, 9, 5]
4 [1, 51, 11] [1, 4, 5] [3, 4, 5] [1, 4, 5, 1, 51, 11, 3, 4, 5]
However, if I have too many columns to write out each name, is there a way to do this by looping (or without looping) over a list of column names, such as columns = ['allmz','allint','allx']?
Option 1
Slice the columns and call sum with axis=1, so the lists in each row are added (that is, concatenated) across the selected columns.
df['new'] = df[['allmz','allint','allx']].sum(axis=1)
df
allint allmz allx new
0 [11, 31, 31] [1, 2, 3] [6, 7, 3] [1, 2, 3, 11, 31, 31, 6, 7, 3]
1 [21, 41, 51] [2, 4, 5] [2, 4, 5] [2, 4, 5, 21, 41, 51, 2, 4, 5]
2 [41, 51, 51] [2, 5, 5] [2, 5, 5] [2, 5, 5, 41, 51, 51, 2, 5, 5]
3 [11, 31, 51] [2, 3, 5] [2, 9, 5] [2, 3, 5, 11, 31, 51, 2, 9, 5]
4 [1, 51, 11] [1, 4, 5] [3, 4, 5] [1, 4, 5, 1, 51, 11, 3, 4, 5]
Option 2
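Another option, with np.concatenate (this assumes every list has the same length, so the data forms a regular array that can be reshaped):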
v = df[['allmz','allint','allx']].values.tolist()
df['new'] = np.concatenate(v, axis=0).reshape(len(df), -1).tolist()
df
allint allmz allx new
0 [11, 31, 31] [1, 2, 3] [6, 7, 3] [1, 2, 3, 11, 31, 31, 6, 7, 3]
1 [21, 41, 51] [2, 4, 5] [2, 4, 5] [2, 4, 5, 21, 41, 51, 2, 4, 5]
2 [41, 51, 51] [2, 5, 5] [2, 5, 5] [2, 5, 5, 41, 51, 51, 2, 5, 5]
3 [11, 31, 51] [2, 3, 5] [2, 9, 5] [2, 3, 5, 11, 31, 51, 2, 9, 5]
4 [1, 51, 11] [1, 4, 5] [3, 4, 5] [1, 4, 5, 1, 51, 11, 3, 4, 5]
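You can use Python's built-in sum function. Start it from the first column rather than from [], so that + is always applied between two Series (element-wise list concatenation):
df['new'] = sum((df[col] for col in columns[1:]), df[columns[0]])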
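If you would rather not rely on how pandas adds object columns, a plain-Python sketch is to zip the chosen columns row by row and flatten each row's lists with itertools.chain. It rebuilds the question's DataFrame and its columns list so that it runs on its own:

```python
import itertools

import pandas as pd

df = pd.DataFrame({'allmz': [[1, 2, 3], [2, 4, 5], [2, 5, 5], [2, 3, 5], [1, 4, 5]],
                   'allint': [[11, 31, 31], [21, 41, 51], [41, 51, 51], [11, 31, 51], [1, 51, 11]],
                   'allx': [[6, 7, 3], [2, 4, 5], [2, 5, 5], [2, 9, 5], [3, 4, 5]]})
columns = ['allmz', 'allint', 'allx']

# zip yields one tuple of lists per row, in the order given by `columns`;
# chain.from_iterable flattens that tuple into a single list.
df['new'] = [list(itertools.chain.from_iterable(row))
             for row in zip(*(df[col] for col in columns))]
print(df['new'].tolist())
```

The concatenation order follows the columns list, so reordering that list reorders the result.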
If you have a large number of columns, an easy way to solve this problem is to select a contiguous range of columns by label:
col = df.loc[: , "allint":"allx"]
where "allint" is the first column and "allx" is the last column of the range (label slicing with .loc includes both endpoints and follows the DataFrame's column order).
df['new'] = col.sum(axis=1)
df
This concatenates the lists in the DataFrame's column order, giving the same kind of result as writing out each column name.
>>> a = array([[10, 50, 20, 30, 40],
... [50, 30, 40, 20, 10],
... [30, 20, 20, 10, 50]])
>>> some_np_expression(a)
array([[1, 3, 1, 3, 2],
[3, 2, 3, 2, 1],
[2, 1, 2, 1, 3]])
What is some_np_expression? I don't care how ties are settled, as long as the ranks within each column are distinct and sequential.
Double argsort is a standard (but inefficient!) way to do this:
In [120]: a
Out[120]:
array([[10, 50, 20, 30, 40],
[50, 30, 40, 20, 10],
[30, 20, 20, 10, 50]])
In [121]: a.argsort(axis=0).argsort(axis=0) + 1
Out[121]:
array([[1, 3, 1, 3, 2],
[3, 2, 3, 2, 1],
[2, 1, 2, 1, 3]])
With some more code, you can avoid sorting twice. Note that I'm using a different a in the following:
In [262]: a
Out[262]:
array([[30, 30, 10, 10],
[10, 20, 20, 30],
[20, 10, 30, 20]])
Call argsort once:
In [263]: s = a.argsort(axis=0)
Use s to construct the array of rankings:
In [264]: i = np.arange(a.shape[0]).reshape(-1, 1)
In [265]: j = np.arange(a.shape[1])
In [266]: ranked = np.empty_like(a, dtype=int)
In [267]: ranked[s, j] = i + 1
In [268]: ranked
Out[268]:
array([[3, 3, 1, 1],
[1, 2, 2, 3],
[2, 1, 3, 2]])
Here's the less efficient (but more concise) version:
In [269]: a.argsort(axis=0).argsort(axis=0) + 1
Out[269]:
array([[3, 3, 1, 1],
[1, 2, 2, 3],
[2, 1, 3, 2]])
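The single-argsort idea above can be wrapped in a helper that ranks along any axis. This is a sketch rather than a library function: the name rank_along_axis is hypothetical, it relies on np.put_along_axis (NumPy 1.15 or later), and it passes kind='stable' so that ties are broken by position:

```python
import numpy as np


def rank_along_axis(a, axis=0):
    """Return 1-based, distinct ranks of `a` along `axis` (hypothetical helper).

    Ties are broken by position because the argsort is stable.
    """
    a = np.asarray(a)
    order = np.argsort(a, axis=axis, kind='stable')  # sort only once
    # Ranks 1..n laid out along `axis`, shaped to broadcast against `order`.
    shape = [1] * a.ndim
    shape[axis] = a.shape[axis]
    positions = np.arange(1, a.shape[axis] + 1).reshape(shape)
    ranked = np.empty(a.shape, dtype=np.intp)
    # Scatter: the element that sorts into position p receives rank p.
    np.put_along_axis(ranked, order, positions, axis=axis)
    return ranked


a = np.array([[10, 50, 20, 30, 40],
              [50, 30, 40, 20, 10],
              [30, 20, 20, 10, 50]])
print(rank_along_axis(a, axis=0))
print(rank_along_axis(a, axis=1))
```

With axis=0 this should reproduce the ranks requested in the question; axis=1 ranks each row instead.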
SciPy also offers rankdata functions with an axis argument, which sets the axis along which the data are ranked; for example, scipy.stats.mstats.rankdata:
from scipy.stats.mstats import rankdata
import numpy as np
a = np.array([[10, 50, 20, 30, 40],
              [50, 30, 40, 20, 10],
              [30, 20, 20, 10, 50]])
ranked_vertical = rankdata(a, axis=0)
from scipy.stats.mstats import rankdata
import numpy as np
a = np.array([[10, 50, 20, 30, 40],
[50, 30, 40, 20, 10],
[30, 20, 20, 10, 50]])
rank = (rankdata(a, axis=0)-1).astype(int)
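The output will be as follows (0-based ranks). Note that rankdata gives tied values their average rank: the two 20s in the third column both get rank 1.5, which becomes 0 after subtracting 1 and truncating with astype(int), so those ranks are not distinct.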
array([[0, 2, 0, 2, 1],
[2, 1, 2, 1, 0],
[1, 0, 0, 0, 2]])
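If distinct ranks are required, scipy.stats.rankdata accepts a method argument, and method='ordinal' assigns tied values distinct ranks in the order they appear. A sketch, assuming a SciPy version whose rankdata also accepts axis (the keyword was added around SciPy 1.4):

```python
import numpy as np
from scipy.stats import rankdata

a = np.array([[10, 50, 20, 30, 40],
              [50, 30, 40, 20, 10],
              [30, 20, 20, 10, 50]])

# method='ordinal' breaks ties by order of appearance, so every rank is
# distinct; axis=0 ranks each column independently (1-based).
ranks = rankdata(a, method='ordinal', axis=0)
print(ranks)
```

Subtract 1 if 0-based ranks are wanted, as in the answer above.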
I tested the following Python code in the Spyder IDE, expecting it to fill the 2-D list q with the increasing numbers 0..31 from q[0][0] to q[3][7]. But q actually ends up as:
[[24, 25, 26, 27, 28, 29, 30, 31], [24, 25, 26, 27, 28, 29, 30, 31], [24, 25, 26, 27, 28, 29, 30, 31], [24, 25, 26, 27, 28, 29, 30, 31]].
The code:
q=[[0]*8]*4
for i in range(4):
    for j in range(8):
        q[i][j] = 8*i+j
print(q)
Any idea what's happening here? Debugging step by step shows that an update to any row is mirrored in all the other rows, which is quite different from my experience with other programming languages.
q=[somelist]*4
creates a list whose four items are all references to the same object, the list somelist. So, for example, q[0] and q[1] reference the same object.
Thus, in the nested for loop q[i] is referencing the same list regardless of the value of i.
To fix:
q = [[0]*8 for _ in range(4)]
The list comprehension evaluates [0]*8 4 distinct times, resulting in 4 distinct lists.
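A quick way to see the difference is Python's is operator, which tests object identity. This sketch contrasts the two constructions (q_shared is just an illustrative name) and then runs the question's loop on the fixed version:

```python
# `is` is True only when both names refer to the very same object.
q_shared = [[0] * 8] * 4
print(q_shared[0] is q_shared[1])  # True: every row is the same list

q = [[0] * 8 for _ in range(4)]
print(q[0] is q[1])                # False: four separate lists

for i in range(4):
    for j in range(8):
        q[i][j] = 8 * i + j
print(q)                           # rows 0..7, 8..15, 16..23, 24..31
```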
Here is a quick demonstration of this pitfall:
In [14]: q=[[0]*8]*4
You might think you are updating only the first element in the second row:
In [15]: q[1][0] = 100
But you really end up altering the first element in every row:
In [16]: q
Out[16]:
[[100, 0, 0, 0, 0, 0, 0, 0],
[100, 0, 0, 0, 0, 0, 0, 0],
[100, 0, 0, 0, 0, 0, 0, 0],
[100, 0, 0, 0, 0, 0, 0, 0]]
As explained above, the problem is caused by the * operation on lists, which creates more references to the same object. One fix is to build each row separately with append:
q=[]
for i in range(4):
    q.append([])
    for j in range(8):
        q[i].append(8*i+j)
print(q)
[[0, 1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13, 14, 15], [16, 17, 18, 19, 20, 21, 22, 23], [24, 25, 26, 27, 28, 29, 30, 31]]
When you do something like l = [x]*8, you are actually creating 8 references to the same list, not 8 copies.
To actually get 8 copies, you have to copy x once per element, for example with l = [list(x) for _ in range(8)]. (Note that in the session below, x itself has already been changed to [10, 2, 3] by the earlier assignment, so the copies start from that value.)
>>> x=[1,2,3]
>>> l=[x]*8
>>> l
[[1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3]]
>>> l[0][0]=10
>>> l
[[10, 2, 3], [10, 2, 3], [10, 2, 3], [10, 2, 3], [10, 2, 3], [10, 2, 3], [10, 2, 3], [10, 2, 3]]
>>> l = [list(x) for _ in range(8)]
>>> l
[[10, 2, 3], [10, 2, 3], [10, 2, 3], [10, 2, 3], [10, 2, 3], [10, 2, 3], [10, 2, 3], [10, 2, 3]]
>>> l[0][0] = 1
>>> l
[[1, 2, 3], [10, 2, 3], [10, 2, 3], [10, 2, 3], [10, 2, 3], [10, 2, 3], [10, 2, 3], [10, 2, 3]]