Translate reshape from MATLAB to Python

I'm using numpy and I don't know how to translate this MATLAB code to Python:
C = reshape(A(B.',:).', 6, []).';
I think the only part I got right is:
temp=A[B.transpose(),:]
but I don't know how to translate the rest of the expression.
example of matrix:
A =
1 2
1 3
1 4
1 5
1 6
2 3
2 4
2 5
2 6
B =
1 2 3
1 2 4
1 2 5
1 2 6
1 2 7
1 2 8
1 2 9
C =
1 2 1 3 1 4
1 2 1 3 1 5
1 2 1 3 1 6
1 2 1 3 2 3
1 2 1 3 2 4
1 2 1 3 2 5
1 2 1 3 2 6

This looks like an indexing plus reshaping operation. One thing to keep in mind is that numpy is zero-indexed, while MATLAB is one-indexed, so you need to index A with B - 1 and then reshape the result as desired. For example:
import numpy as np

A = np.array([[1, 2],
              [1, 3],
              [1, 4],
              [1, 5],
              [1, 6],
              [2, 3],
              [2, 4],
              [2, 5],
              [2, 6]])
B = np.array([[1, 2, 3],
              [1, 2, 4],
              [1, 2, 5],
              [1, 2, 6],
              [1, 2, 7],
              [1, 2, 8],
              [1, 2, 9]])
C = A[B - 1].reshape(B.shape[0], -1)
The result is:
>>> C
array([[1, 2, 1, 3, 1, 4],
       [1, 2, 1, 3, 1, 5],
       [1, 2, 1, 3, 1, 6],
       [1, 2, 1, 3, 2, 3],
       [1, 2, 1, 3, 2, 4],
       [1, 2, 1, 3, 2, 5],
       [1, 2, 1, 3, 2, 6]])
One potentially confusing piece: the -1 in the reshape method is a marker that indicates numpy should calculate the appropriate dimension to preserve the size of the array.
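A quick demonstration of that inference (the array here is just for illustration):
>>> np.arange(12).reshape(3, -1)   # numpy infers the second dimension as 4
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])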

Related

Using numpy to select rows based on a condition of one column

I have a file with several columns, say:
1 2 3 4 5 6
2 4 5 6 7 4
3 4 5 6 7 6
2 0 1 5 6 0
2 4 6 8 9 9
I would like to select the rows whose value in the second column is in the range [0, 2] and save them to a new file.
The new file should contain:
1 2 3 4 5 6
2 0 1 5 6 0
Kindly assist me. I prefer doing this with numpy in python.
For array a, you can use:
a[(a[:,1] <= 2) & (a[:,1] >= 0)]
Here, the condition filters the values in your second column.
For your example:
>>> a
array([[1, 2, 3, 4, 5, 6],
       [2, 4, 5, 6, 7, 4],
       [3, 4, 5, 6, 7, 6],
       [2, 0, 1, 5, 6, 0],
       [2, 4, 6, 8, 9, 9]])
>>> a[(a[:,1] <= 2) & (a[:,1] >= 0)]
array([[1, 2, 3, 4, 5, 6],
       [2, 0, 1, 5, 6, 0]])
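Since the data lives in a file, a minimal end-to-end sketch could look like the following (the filenames data.txt and filtered.txt are placeholders, not taken from the question):
import numpy as np

# Load the whitespace-delimited file into a 2-D array.
a = np.loadtxt("data.txt")

# Keep only the rows whose second column lies in [0, 2].
filtered = a[(a[:, 1] >= 0) & (a[:, 1] <= 2)]

# Write the selected rows to the new file, formatted as integers.
np.savetxt("filtered.txt", filtered, fmt="%d")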

Python dataframe repeat column data in each cell as a list

I am trying to repeat the whole data of a column in each cell of that column.
My code:
import pandas as pd

df3 = pd.DataFrame({
    'x': [1, 2, 3, 4, 5],
    'y': [10, 20, 30, 20, 10],
    'z': [5, 4, 3, 2, 1]
})
df3 =
x y z
0 1 10 5
1 2 20 4
2 3 30 3
3 4 20 2
4 5 10 1
df3['z'] = df['z'].agg(lambda x: list(x))
Present output:
KeyError: 'z'
Expected output:
df=
x y z
0 1 10 [5, 4, 3, 2, 1]
1 2 20 [5, 4, 3, 2, 1]
2 3 30 [5, 4, 3, 2, 1]
3 4 20 [5, 4, 3, 2, 1]
4 5 10 [5, 4, 3, 2, 1]
Another way is to build the list once with list(df3.z.values) and assign it to every row:
df3.assign(z=[list(df3.z.values)]*len(df3))
   x   y                z
0  1  10  [5, 4, 3, 2, 1]
1  2  20  [5, 4, 3, 2, 1]
2  3  30  [5, 4, 3, 2, 1]
3  4  20  [5, 4, 3, 2, 1]
4  5  10  [5, 4, 3, 2, 1]
Check with:
df3['new_z'] = [df3.z.tolist()] * len(df3)
Or, more safely:
df3['new_z'] = [df3.z.tolist() for x in df3.index]
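As for the KeyError in the original attempt: it is most likely raised because the lambda is applied to df['z'] rather than df3['z'] (df presumably being some other DataFrame without a 'z' column). Under that assumption the one-line fix is simply:
# assuming df3 was the intended DataFrame
df3['z'] = [df3['z'].tolist()] * len(df3)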

reshaping 2-d array using specific block shape [duplicate]

I have a problem reshaping a simple 2-D array into another one.
Let's assume the matrix:
[[4 1 2 1 2 4 1 2 4]
[2 3 0 3 0 2 3 0 2]
[5 5 1 5 1 5 5 1 5]
[6 6 6 6 6 6 6 6 6]]
What I want is to reshape it into a (12, 3) matrix by taking (4, 3) blocks, i.e. to get a matrix like:
[[4 1 2
2 3 0
5 5 1
6 6 6

1 2 4
3 0 2
5 1 5
6 6 6

1 2 4
3 0 2
5 1 5
6 6 6]]
I have highlighted the "edge" where the matrix is cut with an additional blank line.
I've tried numpy reshape (with every available order parameter), but I still get an array with "mixed" values.
You can always do this manually for custom reshapes:
import numpy as np
data = [[4, 1, 2, 1, 2, 4, 1, 2, 4],
        [2, 3, 0, 3, 0, 2, 3, 0, 2],
        [5, 5, 1, 5, 1, 5, 5, 1, 5],
        [6, 6, 6, 6, 6, 6, 6, 6, 6]]
X = np.array(data)
Z = np.r_[X[:, 0:3], X[:, 3:6], X[:, 6:9]]
print(Z)
yields
array([[4, 1, 2],
       [2, 3, 0],
       [5, 5, 1],
       [6, 6, 6],
       [1, 2, 4],
       [3, 0, 2],
       [5, 1, 5],
       [6, 6, 6],
       [1, 2, 4],
       [3, 0, 2],
       [5, 1, 5],
       [6, 6, 6]])
Note the np.r_ helper, which concatenates the arrays along the first axis (rows); used this way it is essentially a handy shorthand for np.concatenate.
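For wider arrays, where writing every slice by hand gets tedious, the same block stacking can be done with np.hsplit plus np.vstack, or with a reshape/transpose round trip. A sketch, reusing X from above and assuming the number of columns is an exact multiple of the block width (3 here):
block_width = 3

# Split X into (4, 3) blocks column-wise, then stack them vertically.
Z = np.vstack(np.hsplit(X, X.shape[1] // block_width))

# Equivalent pure-reshape version:
Z2 = (X.reshape(X.shape[0], -1, block_width)  # (4, 3, 3): row, block, column-in-block
        .transpose(1, 0, 2)                   # bring the block axis to the front
        .reshape(-1, block_width))            # (12, 3)
Both yield the same (12, 3) array as the manual slicing above.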

python pandas data-frame - duplicate rows according to a column value

I want to duplicate the rows of dataframe "this" according to 2 column values and save them as a new dataframe named "newThis":
import numpy as np
import pandas as pd

this = pd.DataFrame(columns=['a','b','c'], index=[1,2,3])
this.a = [1, 2, 0]
this.b = [5, 0, 4]
this.c = [2, 3, 2]

newThis = []
for i in range(len(this)):
    if int(this.iloc[i, 1]) != 0:
        that = np.array([this.iloc[i,:]] * int(this.iloc[i, 1]))
    elif int(this.iloc[i, 1]) == 0:
        that = np.array([this.iloc[i,:]])
    if int(this.iloc[i, 2]) != 0:
        those = np.array([this.iloc[i,:]] * int(this.iloc[i, 2]))
    elif int(this.iloc[i, 2]) == 0:
        those = np.array([this.iloc[i,:]])
    newThis.append(that)
    newThis.append(those)
I want one big array of concatenated rows, but instead I get this mess:
[array([[1, 5, 2],
        [1, 5, 2],
        [1, 5, 2],
        [1, 5, 2],
        [1, 5, 2]], dtype=int64), array([[1, 5, 2],
        [1, 5, 2]], dtype=int64), array([[2, 0, 3]], dtype=int64), array([[2, 0, 3],
        [2, 0, 3],
        [2, 0, 3]], dtype=int64), array([[0, 4, 2],
        [0, 4, 2],
        [0, 4, 2],
        [0, 4, 2]], dtype=int64), array([[0, 4, 2],
        [0, 4, 2]], dtype=int64)]
Thanks
IIUC:
Source DF:
In [213]: this
Out[213]:
a b c
1 1 5 2
2 2 0 3
3 0 4 2
Solution:
In [211]: newThis = pd.DataFrame(np.repeat(this.values,
                                           this['b'].replace(0,1).tolist(),
                                           axis=0),
                                 columns=this.columns)
In [212]: newThis
Out[212]:
a b c
0 1 5 2
1 1 5 2
2 1 5 2
3 1 5 2
4 1 5 2
5 2 0 3
6 0 4 2
7 0 4 2
8 0 4 2
9 0 4 2
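The question asks for duplication according to two column values; if rows should also be repeated by column c (as the loop in the question does for both b and c), one possible extension of the same np.repeat idea is to sum the per-row counts. A sketch (newThis2 is just an illustrative name):
# Repeat each row max(b, 1) + max(c, 1) times, mirroring the loop above,
# which appends one block of copies for column b and one for column c per row.
reps = this['b'].replace(0, 1) + this['c'].replace(0, 1)
newThis2 = pd.DataFrame(np.repeat(this.values, reps, axis=0),
                        columns=this.columns)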
It looks like you're confusing multiplying a list by an integer with multiplying an np.array by an integer.
Remember:
[np.int32(1)] * 2 == [np.int32(1), np.int32(1)]
But:
np.array([1]) * 2 == np.array([2])
You probably need to change this:
np.array([this.iloc[i,:]] * int(this.iloc[i, 1]))
to this:
np.array([this.iloc[i,:]]) * int(this.iloc[i, 1])
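Separately, if the goal is "one big array of concatenated rows", the list of per-row arrays collected in the loop can be stacked at the end; a small sketch (big is just an illustrative name):
# Stack the arrays accumulated in newThis into a single 2-D array.
big = np.concatenate(newThis, axis=0)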

pandas equivalent to R series of multiple repeated numbers

I want to create a simple vector of many repeated values. This is easy in R:
> numbers <- c(rep(1,5), rep(2,4), rep(3,3))
> numbers
[1] 1 1 1 1 1 2 2 2 2 3 3 3
However, if I try to do this in Python using pandas and numpy, I don't quite get the same thing:
numbers = pd.Series([np.repeat(1,5), np.repeat(2,4), np.repeat(3,3)])
numbers
0 [1, 1, 1, 1, 1]
1 [2, 2, 2, 2]
2 [3, 3, 3]
dtype: object
What's the R equivalent in Python?
Just adjust how you use np.repeat
np.repeat([1, 2, 3], [5, 4, 3])
array([1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3])
Or with pd.Series
pd.Series(np.repeat([1, 2, 3], [5, 4, 3]))
0 1
1 1
2 1
3 1
4 1
5 2
6 2
7 2
8 2
9 3
10 3
11 3
dtype: int64
That said, the purest form to replicate what you've done in R is to use np.concatenate in conjunction with np.repeat. It just isn't what I'd recommend doing.
np.concatenate([np.repeat(1,5), np.repeat(2,4), np.repeat(3,3)])
array([1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3])
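If you would rather stay within pandas, a sketch using Series.repeat (with the index reset so it runs 0 through 11) should give the same values:
import pandas as pd

numbers = pd.Series([1, 2, 3]).repeat([5, 4, 3]).reset_index(drop=True)
# numbers is now the Series 1 1 1 1 1 2 2 2 2 3 3 3 with dtype int64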
With the datar package you can now use the same syntax in Python:
>>> from datar.base import c, rep
>>>
>>> numbers = c(rep(1,5), rep(2,4), rep(3,3))
>>> print(numbers)
[1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3]
I am the author of the datar package. Feel free to submit issues if you have any questions.
