Simple python code about double loop - python

I tested the following Python code in the Spyder IDE, expecting the 2D array q to be filled with the increasing numbers 0..31 from q[0][0] to q[3][7]. Instead, q comes out as:
[[24, 25, 26, 27, 28, 29, 30, 31], [24, 25, 26, 27, 28, 29, 30, 31], [24, 25, 26, 27, 28, 29, 30, 31], [24, 25, 26, 27, 28, 29, 30, 31]].
The code:
q=[[0]*8]*4
for i in range(4):
    for j in range(8):
        q[i][j] = 8*i+j
print q
Any idea what's happening here? I stepped through it in the debugger, and every update to one row shows up in all the other rows as well, which is quite different from my experience with other programming languages.

q=[somelist]*4
creates a list with four identical items, the list somelist. So, for example, q[0] and q[1] reference the same object.
Thus, in the nested for loop q[i] is referencing the same list regardless of the value of i.
To fix:
q = [[0]*8 for _ in range(4)]
The list comprehension evaluates [0]*8 4 distinct times, resulting in 4 distinct lists.
Here is a quick demonstration of this pitfall:
In [14]: q=[[0]*8]*4
You might think you are updating only the first element in the second row:
In [15]: q[1][0] = 100
But you really end up altering the first element in every row:
In [16]: q
Out[16]:
[[100, 0, 0, 0, 0, 0, 0, 0],
[100, 0, 0, 0, 0, 0, 0, 0],
[100, 0, 0, 0, 0, 0, 0, 0],
[100, 0, 0, 0, 0, 0, 0, 0]]
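A minimal sketch of the same pitfall that checks object identity with `is`. The names shared and independent are illustrative and do not come from the question, and the parenthesized print is used so the snippet runs on both Python 2 and Python 3.

```python
# [[0] * 8] * 4 repeats a reference to ONE inner list four times.
shared = [[0] * 8] * 4
print(shared[0] is shared[1])            # True: every row is the same object

# The comprehension evaluates [0] * 8 once per row, giving four separate lists.
independent = [[0] * 8 for _ in range(4)]
print(independent[0] is independent[1])  # False: rows are distinct objects

# With independent rows the nested loop fills the grid as the question expected.
for i in range(4):
    for j in range(8):
        independent[i][j] = 8 * i + j

print(independent[0])
print(independent[3])
```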

As explained, the problem is caused by the * operation on lists, which creates more references to the same object. What you can do instead is build the rows with append:
q=[]
for i in range(4):
    q.append([])
    for j in range(8):
        q[i].append(8*i+j)
print q
[[0, 1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13, 14, 15], [16, 17, 18, 19, 20, 21, 22, 23], [24, 25, 26, 27, 28, 29, 30, 31]]

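When you do something like l = [x]*8 you are actually creating 8 references to the same list x, not 8 copies.
To get 8 distinct list objects, build each one separately, for example l = [[x] for i in xrange(8)]. Note that each [x] is a new one-element list that still refers to the same x, so assigning to l[0][0] rebinds only that one slot, as the session below shows: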
>>> x=[1,2,3]
>>> l=[x]*8
>>> l
[[1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3]]
>>> l[0][0]=10
>>> l
[[10, 2, 3], [10, 2, 3], [10, 2, 3], [10, 2, 3], [10, 2, 3], [10, 2, 3], [10, 2, 3], [10, 2, 3]]
>>> l = [ [x] for i in xrange(8)]
>>> l
[[[10, 2, 3]], [[10, 2, 3]], [[10, 2, 3]], [[10, 2, 3]], [[10, 2, 3]], [[10, 2, 3]], [[10, 2, 3]], [[10, 2, 3]]]
>>> l[0][0] = 1
>>> l
[[1], [[10, 2, 3]], [[10, 2, 3]], [[10, 2, 3]], [[10, 2, 3]], [[10, 2, 3]], [[10, 2, 3]], [[10, 2, 3]]]
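If the goal is eight independent copies of x itself rather than eight wrappers around it, one option (a sketch, not part of the answer above) is to copy x inside the comprehension. list(x) makes a shallow copy, which is enough for a flat list of numbers. The nested example and the names copies and deep_copies are assumptions added for illustration.

```python
import copy

x = [1, 2, 3]

# list(x) builds a new list on every iteration, so the copies are independent.
copies = [list(x) for _ in range(8)]
copies[0][0] = 10
print(x)          # [1, 2, 3]   the original is untouched
print(copies[0])  # [10, 2, 3]
print(copies[1])  # [1, 2, 3]

# For nested mutable data a shallow copy would still share the inner lists,
# so copy.deepcopy is used here instead.
nested = [[1, 2], [3, 4]]
deep_copies = [copy.deepcopy(nested) for _ in range(3)]
deep_copies[0][0][0] = 99
print(nested[0][0])  # 1
```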

Related

Select non-consecutive row and column indices from 2d numpy array

I have an array a
a = np.arange(5*5).reshape(5,5)
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
and want to select the last two columns from row one and two, and the first two columns of row three and four.
The result should look like this
array([[3, 4, 10, 11],
[8, 9, 15, 16]])
How can I do that in one go, without indexing twice and concatenating?
I tried using take
a.take([[0,1,2,3], [3,4,0,1]])
array([[0, 1, 2, 3],
[3, 4, 0, 1]])
ix_
a[np.ix_([0,1,2,3], [3,4,0,1])]
array([[ 3, 4, 0, 1],
[ 8, 9, 5, 6],
[13, 14, 10, 11],
[18, 19, 15, 16]])
and r_
a[np.r_[0:2, 2:4], np.r_[3:5, 0:2]]
array([ 3, 9, 10, 16])
and a combination of ix_ and r_
a[np.ix_([0,1,2,3], np.r_[3:4, 0:1])]
array([[ 3, 0],
[ 8, 5],
[13, 10],
[18, 15]])
Using integer advanced indexing, you can do something like this
index_rows = np.array([
[0, 0, 2, 2],
[1, 1, 3, 3],
])
index_cols = np.array([
[-2, -1, 0, 1],
[-2, -1, 0, 1],
])
a[index_rows, index_cols]
where you just select directly what elements you want.
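As a quick check, the index arrays above can be run against the question's array and compared with the two-slice version the asker wanted to avoid. This is only a sketch; the names picked and reference are illustrative.

```python
import numpy as np

a = np.arange(5 * 5).reshape(5, 5)

# One (row, column) pair per output element; both index arrays have the
# shape of the desired result, (2, 4).
index_rows = np.array([[0, 0, 2, 2],
                       [1, 1, 3, 3]])
index_cols = np.array([[-2, -1, 0, 1],
                       [-2, -1, 0, 1]])
picked = a[index_rows, index_cols]

# Reference built the long way: two slices joined side by side.
reference = np.concatenate([a[0:2, 3:5], a[2:4, 0:2]], axis=1)

print(picked)
print(np.array_equal(picked, reference))
```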

Merge columns in rows in an adapted DataFrame that fulfill a condition, while deleting the rows

Background Information
This question is closely related to my previous question. Unfortunately, the general example I made up there was not specific enough to be applied to my actual problem. That is why this question is more specific.
Example - Code Snippet
import pandas as pd
import numpy as np
inp = [{'ID_Code':1,'information 1':[10,22,44],'information 2':[1,0,1]},
{'ID_Code':2,'information 1':[400,323],'information 2':[1,1]},
{'ID_Code':2,'information 1':[243],'information 2':[0]},
{'ID_Code':2,'information 1':[333,555],'information 2':[0]},
{'ID_Code':3,'information 1':[12,27,43,54],'information 2':[1,0,1,1]},
{'ID_Code':3,'information 1':[31,42,13,14],'information 2':[1,0,0,0]},
{'ID_Code':3,'information 1':[14,24,34,14],'information 2':[1,0,1,1]},
{'ID_Code':4,'information 1':[15,25,33,44],'information 2':[0,0,0,1]},
{'ID_Code':5,'information 1':[12,12,13,14],'information 2':[1,1,1,0]},
{'ID_Code':5,'information 1':[12,12,13,24],'information 2':[1,0,1,1]},
{'ID_Code':5,'information 1':[21,22,23,14],'information 2':[1,1,1,1]},
{'ID_Code':6,'information 1':[10,12,23,4],'information 2':[1,0,1,0]},
{'ID_Code':7,'information 1':[112,212,143,124],'information 2':[0,0,0,0]},
{'ID_Code':7,'information 1':[211,321],'information 2':[1]},
{'ID_Code':7,'information 1':[431],'information 2':[1,0]},
{'ID_Code':8,'information 1':[1,2,3,4],'information 2':[1,0,0,1]}]
df = pd.DataFrame(inp)
df1=df.groupby("ID_Code")["information 1"].apply(list).to_frame()
df2=df.groupby("ID_Code")["information 2"].apply(list).to_frame()
df3=pd.concat([df1, df2],axis=1, sort=False)
The Output
ID_Code information 1 information 2
1 [[10, 22, 44]] [[1, 0, 1]]
2 [[400, 323], [243], [333, 555]] [[1, 1], [0], [0]]
3 [[12, 27, 43, 54], [31, 42, 13, 14], [14, 24, 34, 14]] [[1, 0, 1, 1], [1, 0, 0, 0], [1, 0, 1, 1]]
4 [[15, 25, 33, 44]] [[0, 0, 0, 1]]
5 [[12, 12, 13, 14], [12, 12, 13, 24], [21, 22, 23, 14]] [[1, 1, 1, 0], [1, 0, 1, 1], [1, 1, 1, 1]]
6 [[10, 12, 23, 4]] [[1, 0, 1, 0]]
7 [[112, 212, 143, 124], [211, 321], [431]] [[0, 0, 0, 0], [1], [1, 0]]
8 [[1, 2, 3, 4]] [[1, 0, 0, 1]]
Here ID_Code is no longer a column but the index, which is the detail I hadn't specified in my previous question.
The Task
Given the DataFrame df3, I want to remove ID_Code = 1 and store its information under ID_Code = 3, and remove ID_Code = 5 and ID_Code = 7 and store their information under ID_Code = 2, so that the DataFrame looks like this:
ID_Code information 1 information 2
2 [[400, 323], [243], [333, 555], [12, 12, 13, 14], [12, 12, 13, 24], [21, 22, 23, 14], [112, 212, 143, 124], [211, 321], [431]] [[1, 1], [0], [0], [1, 1, 1, 0], [1, 0, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0], [1], [1, 0]]
3 [[12, 27, 43, 54], [31, 42, 13, 14], [14, 24, 34, 14], [10, 22, 44]] [[1, 0, 1, 1], [1, 0, 0, 0], [1, 0, 1, 1], [1, 0, 1]]
4 [[15, 25, 33, 44]] [[0, 0, 0, 1]]
6 [[10, 12, 23, 4]] [[1, 0, 1, 0]]
8 [[1, 2, 3, 4]] [[1, 0, 0, 1]]
It would be a huge help, if someone could help me solve this.
It worked for me with the answer given by Datanovice from the previous question with some changes to the indexing.
As the question states, the problem is that 'ID_Code' is the index rather than a column. So my solution adds a column holding the unique ID_Codes. I found two possible approaches.
Solution 1
Use .unique() together with pd.DataFrame(), since .unique() returns a numpy.ndarray that has to be converted back into a DataFrame.
df4 = pd.DataFrame(df['ID_Code'].unique(),columns=['ID_Code'],index=df['ID_Code'].unique())
df5 = pd.concat([df4,df3],axis=1)
col = 'ID_Code'
cond = [df5[col].eq(1),
df5[col].isin([5,7])]
outputs = [3,2]
df5[col] = np.select(cond,outputs,default=df5[col])
df6 = df5.groupby(col).sum()
Solution 2
Use .reset_index() to move ID_Code out of the index into a separate column.
df3 = df3.reset_index()
col = 'ID_Code'
cond = [df3[col].eq(1),
df3[col].isin([5,7])]
outputs = [3,2]
df3[col] = np.select(cond,outputs,default=df3[col])
df4 = df3.groupby(col).sum()
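A different route, sketched here under the assumption that the original flat df is still available, is to remap the codes before grouping, so the index never has to be turned back into a column. The dictionary remap and the trimmed sample are illustrative and not from the question. Note that the merged rows then appear in their original row order, so for ID_Code 3 the list from ID_Code 1 comes first rather than last.

```python
import pandas as pd

# Trimmed subset of the question's inp, kept short for the sketch.
inp = [
    {"ID_Code": 1, "information 1": [10, 22, 44], "information 2": [1, 0, 1]},
    {"ID_Code": 2, "information 1": [400, 323], "information 2": [1, 1]},
    {"ID_Code": 3, "information 1": [12, 27, 43, 54], "information 2": [1, 0, 1, 1]},
    {"ID_Code": 4, "information 1": [15, 25, 33, 44], "information 2": [0, 0, 0, 1]},
    {"ID_Code": 5, "information 1": [12, 12, 13, 14], "information 2": [1, 1, 1, 0]},
    {"ID_Code": 7, "information 1": [211, 321], "information 2": [1]},
]
df = pd.DataFrame(inp)

# Map each code that should disappear onto the code that absorbs it.
remap = {1: 3, 5: 2, 7: 2}
df["ID_Code"] = df["ID_Code"].replace(remap)

# Same grouping as in the question, now on the remapped codes.
info1 = df.groupby("ID_Code")["information 1"].apply(list).to_frame()
info2 = df.groupby("ID_Code")["information 2"].apply(list).to_frame()
merged = pd.concat([info1, info2], axis=1, sort=False)
print(merged)
```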

Select different slices from each numpy row

I have a 3D tensor and I want to select a different slice along dim=2 for each entry of the first axis, something like a[[0, 1], :, [slice(2, 4), slice(1, 3)]].
a=np.arange(2*3*5).reshape(2, 3, 5)
array([[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]],
[[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29]]])
# then I want something like a[[0, 1], :, [slice(2, 4), slice(1, 3)]]
# that gives me np.stack([a[0, :, 2:4], a[1, :, 1:3]]) without a for loop
array([[[ 2, 3],
[ 7, 8],
[12, 13]],
[[16, 17],
[21, 22],
[26, 27]]])
and I've seen this and it is not what I want.
You can use advanced indexing as explained here. You will have to pass the row ids which are [0, 1] in your case and the column ids 2, 3 and 1, 2. Here 2,3 means [2:4] and 1, 2 means [1:3]
import numpy as np
a=np.arange(2*3*5).reshape(2, 3, 5)
rows = np.array([[0], [1]], dtype=np.intp)
cols = np.array([[2, 3], [1, 2]], dtype=np.intp)
aa = np.stack(a[rows, :, cols]).swapaxes(1, 2)
# array([[[ 2, 3],
# [ 7, 8],
# [12, 13]],
# [[16, 17],
# [21, 22],
# [26, 27]]])
Another equivalent way, which avoids swapaxes and still gives the result in the desired format, is
aa = np.stack(a[rows, :, cols], axis=2).T
A third way I figured out is by passing the list of indices. Here [0, 0] will correspond to [2,3] and [1, 1] will correspond to [1, 2]. The swapaxes is just to get your desired format of output
a[[[0,0], [1,1]], :, [[2,3], [1,2]]].swapaxes(1,2)
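When all slices have the same width, np.take_along_axis is another option that needs no axis swapping. This is a sketch rather than part of the answer above; starts and width are hypothetical names for the slice start offsets and the common slice length.

```python
import numpy as np

a = np.arange(2 * 3 * 5).reshape(2, 3, 5)

starts = np.array([2, 1])   # first column of each slice: 2:4 and 1:3
width = 2                   # every slice must have the same length

# Column indices per batch entry, shape (2, 2): [[2, 3], [1, 2]]
cols = starts[:, None] + np.arange(width)

# cols[:, None, :] has shape (2, 1, 2); its middle axis broadcasts over
# the 3 rows of each a[i], so the result has shape (2, 3, 2).
picked = np.take_along_axis(a, cols[:, None, :], axis=2)

# Loop-style reference from the question, used only for comparison.
reference = np.stack([a[0, :, 2:4], a[1, :, 1:3]])

print(picked)
print(np.array_equal(picked, reference))
```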
A solution...
import numpy as np
a = np.arange(2*3*5).reshape(2, 3, 5)
np.array([a[0,:,2:4], a[1,:,1:3]])

Multiplying arrays with broadcasting

I have an m x n matrix A and an n x r matrix B that I want to multiply in a specific way to get an m x r matrix. I want to multiply every element in the ith column of A as a scalar with the ith row of B, and then sum the n resulting matrices.
For example
a = [[0, 1, 2],
     [3, 4, 5]]
b = [[0, 1, 2, 3],
[4, 5, 6, 7],
[8, 9, 10, 11]]
The product would be
a*b = [[0, 0, 0, 0],    [[ 4,  5,  6,  7],    [[16, 18, 20, 22],    [[20, 23, 26, 29],
       [0, 3, 6, 9]]  +  [16, 20, 24, 28]]  +  [40, 45, 50, 55]]  =  [56, 68, 80, 92]]
I can't use any loops so I'm pretty sure I have to use broadcasting but I don't know how. Any help is appreciated
Your input matrices are of shape (2, 3) and (3, 4) respectively and the result you want is of shape (2, 4).
What you need is just the dot product of your two matrices, for example:
import numpy as np
a = np.array([[0, 1, 2],
              [3, 4, 5]])
b = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7],
              [8, 9, 10, 11]])
print(np.dot(a, b))
# array([[20, 23, 26, 29],
# [56, 68, 80, 92]])
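To see why the dot product matches the description in the question, the sum of "column i of a times row i of b" can also be written out with broadcasting. This is only an illustrative sketch; terms and product are names chosen here.

```python
import numpy as np

a = np.array([[0, 1, 2],
              [3, 4, 5]])
b = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7],
              [8, 9, 10, 11]])

# a.T[:, :, None] has shape (n, m, 1) and b[:, None, :] has shape (n, 1, r).
# Broadcasting gives terms of shape (n, m, r), where terms[k] is column k of a
# scaled against row k of b (an outer product).
terms = a.T[:, :, None] * b[:, None, :]

# Summing the n partial matrices gives the (m, r) result.
product = terms.sum(axis=0)

print(product)
print(np.array_equal(product, np.dot(a, b)))
```

In practice np.dot (or the @ operator) does the same work without materializing the intermediate (n, m, r) array.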

Rank within columns of 2d array

>>> a = array([[10, 50, 20, 30, 40],
... [50, 30, 40, 20, 10],
... [30, 20, 20, 10, 50]])
>>> some_np_expression(a)
array([[1, 3, 1, 3, 2],
[3, 2, 3, 2, 1],
[2, 1, 2, 1, 3]])
What is some_np_expression? Don't care about how ties are settled so long as the ranks are distinct and sequential.
Double argsort is a standard (but inefficient!) way to do this:
In [120]: a
Out[120]:
array([[10, 50, 20, 30, 40],
[50, 30, 40, 20, 10],
[30, 20, 20, 10, 50]])
In [121]: a.argsort(axis=0).argsort(axis=0) + 1
Out[121]:
array([[1, 3, 1, 3, 2],
[3, 2, 3, 2, 1],
[2, 1, 2, 1, 3]])
With some more code, you can avoid sorting twice. Note that I'm using a different a in the following:
In [262]: a
Out[262]:
array([[30, 30, 10, 10],
[10, 20, 20, 30],
[20, 10, 30, 20]])
Call argsort once:
In [263]: s = a.argsort(axis=0)
Use s to construct the array of rankings:
In [264]: i = np.arange(a.shape[0]).reshape(-1, 1)
In [265]: j = np.arange(a.shape[1])
In [266]: ranked = np.empty_like(a, dtype=int)
In [267]: ranked[s, j] = i + 1
In [268]: ranked
Out[268]:
array([[3, 3, 1, 1],
[1, 2, 2, 3],
[2, 1, 3, 2]])
Here's the less efficient (but more concise) version:
In [269]: a.argsort(axis=0).argsort(axis=0) + 1
Out[269]:
array([[3, 3, 1, 1],
[1, 2, 2, 3],
[2, 1, 3, 2]])
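The single-argsort steps above can be collected into one function. This is a sketch: the name rank_columns is made up here, and kind="stable" is an added assumption so that ties are broken by row order in a reproducible way.

```python
import numpy as np

def rank_columns(a):
    """Return 1-based ranks within each column, sorting only once."""
    # s[r, c] is the row index of the r-th smallest value in column c.
    s = a.argsort(axis=0, kind="stable")
    rows = np.arange(a.shape[0]).reshape(-1, 1)   # shape (n_rows, 1)
    cols = np.arange(a.shape[1])                  # shape (n_cols,)
    ranked = np.empty(a.shape, dtype=int)
    # Scatter: the element at sorted position r in column c gets rank r + 1.
    ranked[s, cols] = rows + 1
    return ranked

a = np.array([[30, 30, 10, 10],
              [10, 20, 20, 30],
              [20, 10, 30, 20]])
ranked = rank_columns(a)
print(ranked)

# Cross-check against the double-argsort version.
double = a.argsort(axis=0, kind="stable").argsort(axis=0, kind="stable") + 1
print(np.array_equal(ranked, double))
```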
SciPy now offers a function to rank data with an axis argument, so you can choose the axis along which the data is ranked.
from scipy.stats.mstats import rankdata
import numpy as np
a = np.array([[10, 50, 20, 30, 40],
              [50, 30, 40, 20, 10],
              [30, 20, 20, 10, 50]])
ranked_vertical = rankdata(a, axis=0)
from scipy.stats.mstats import rankdata
import numpy as np
a = np.array([[10, 50, 20, 30, 40],
[50, 30, 40, 20, 10],
[30, 20, 20, 10, 50]])
rank = (rankdata(a, axis=0)-1).astype(int)
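The output will be as follows. Note that rankdata assigns tied values their average rank, so the two 20s in the third column both become 0 after the cast to int: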
array([[0, 2, 0, 2, 1],
[2, 1, 2, 1, 0],
[1, 0, 0, 0, 2]])
