So I found this:
When converting MATLAB code it might be necessary to first reshape a
matrix to a linear sequence, perform some indexing operations and then
reshape back. As reshape (usually) produces views onto the same
storage, it should be possible to do this fairly efficiently.
Note that the scan order used by reshape in Numpy defaults to the 'C'
order, whereas MATLAB uses the Fortran order. If you are simply
converting to a linear sequence and back this doesn't matter. But if
you are converting reshapes from MATLAB code which relies on the scan
order, then this MATLAB code:
z = reshape(x,3,4);
should become
z = x.reshape(3,4,order='F').copy()
in Numpy.
I have a multidimensional 16*2 array called mafs, when I do in MATLAB:
mafs2 = reshape(mafs,[4,4,2])
I get something different than when in python I do:
mafs2 = reshape(mafs,(4,4,2))
or even
mafs2 = mafs.reshape((4,4,2),order='F').copy()
Any help on this? Thank you all.
Example:
MATLAB:
>> mafs = [(1:16)' (17:32)']
mafs =
1 17
2 18
3 19
4 20
5 21
6 22
7 23
8 24
9 25
10 26
11 27
12 28
13 29
14 30
15 31
16 32
>> reshape(mafs,[4 4 2])
ans(:,:,1) =
1 5 9 13
2 6 10 14
3 7 11 15
4 8 12 16
ans(:,:,2) =
17 21 25 29
18 22 26 30
19 23 27 31
20 24 28 32
Python:
>>> import numpy as np
>>> mafs = np.c_[np.arange(1,17), np.arange(17,33)]
>>> mafs.shape
(16, 2)
>>> mafs[:,0]
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16])
>>> mafs[:,1]
array([17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32])
>>> r = np.reshape(mafs, (4,4,2), order="F")
>>> r.shape
(4, 4, 2)
>>> r[:,:,0]
array([[ 1, 5, 9, 13],
[ 2, 6, 10, 14],
[ 3, 7, 11, 15],
[ 4, 8, 12, 16]])
>>> r[:,:,1]
array([[17, 21, 25, 29],
[18, 22, 26, 30],
[19, 23, 27, 31],
[20, 24, 28, 32]])
I was having a similar issue myself, as I am also trying to make the transition from MATLAB to Python. I was finally able to convert a numpy matrix, given in depth, row, col, format to a single sheet of column vectors (per image).
In MATLAB I would have done something like:
output = reshape(imStack,[row*col,depth])
In Python this seems to translate to:
import numpy as np
output=np.transpose(imStack)
output=output.reshape((row*col, depth), order='F')
Related
I want to extract some members from many large numpy arrays. A simple example is
A = np.arange(36).reshape(6,6)
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35]])
I want to extract the members in a shifted windows in each row with minimum stride 2 and maximum stride 4. For example, in the first row, I would like to have
[2, 3, 4] # A[i,i+2:i+4+1] where i == 0
In the second row, I want to have
[9, 10, 11] # A[i,i+2:i+4+1] where i == 1
In the third row, I want to have
[16, 17, 0]
[[2, 3, 4],
[9, 10, 11],
[16 17, 0],
[23, 0, 0]]
I want to know efficient ways to do this. Thanks.
you can extract values in an array by providing a list of indices for each dimension. For example, if you want the second diagonal, you can use arr[np.arange(0, len(arr)-1), np.arange(1, len(arr))]
For your example, I would do smth like what's in the code below. although, I did not account for different strides. If you want to account for strides, you can change the behaviour of the index list creation. If you struggle with adding the stride functionality, comment and I'll edit this answer.
import numpy as np
def slide(arr, window_len = 3, start = 0):
# pad array to get zeros out of bounds
arr_padded = np.zeros((arr.shape[0], arr.shape[1]+window_len-1))
arr_padded[:arr.shape[0], :arr.shape[1]] = arr
# compute the number of window moves
repeats = min(arr.shape[0], arr.shape[1]-start)
# create index lists
idx0 = np.arange(repeats).repeat(window_len)
idx1 = np.concatenate(
[np.arange(start+i, start+i+window_len)
for i in range(repeats)])
return arr_padded[idx0, idx1].reshape(-1, window_len)
A = np.arange(36).reshape(6,6)
print(f'A =\n{A}')
print(f'B =\n{slide(A, start = 2)}')
output:
A =
[[ 0 1 2 3 4 5]
[ 6 7 8 9 10 11]
[12 13 14 15 16 17]
[18 19 20 21 22 23]
[24 25 26 27 28 29]
[30 31 32 33 34 35]]
B =
[[ 2. 3. 4.]
[ 9. 10. 11.]
[16. 17. 0.]
[23. 0. 0.]]
I make bins out of my column using pandas' pd.qcut(). I would like to, then apply smoothing by corresponding bin's mean value.
I generate my bins with something like
pd.qcut(col, 3)
For example,
Given the column values [4, 8, 15, 21, 21, 24, 25, 28, 34]
and the generated bins
Bin1 [4, 15]: 4, 8, 15
Bin2 [21, 24]: 21, 21, 24
Bin3 [25, 34]: 25, 28, 34
I would like to replace the values with the following means
Mean of Bin1 (4, 8, 15) = 9
Mean of Bin2 (21, 21, 24) = 22
Mean of Bin3 (25, 28, 34) = 29
Therefore:
Bin1: 9, 9, 9
Bin2: 22, 22, 22
Bin3: 29, 29, 29
making the final dataset: [9, 9, 9, 22, 22, 22, 29, 29, 29]
How can one also add a column with closest bin boundaries?
Bin1: 4, 4, 15
Bin2: 21, 21, 24
Bin3: 25, 25, 34
making the final dataset: [4, 4, 15, 21, 21, 24, 25, 25, 34]
very similar to this question which is for R
It's exactly as you laid out. Using this technique to get nearest
df = pd.DataFrame({"col":[4, 8, 15, 21, 21, 24, 25, 28, 34]})
df2 = df.assign(bin=pd.qcut(df.col, 3),
colbmean=lambda dfa: dfa.groupby("bin").transform("mean"),
colbin=lambda dfa: dfa.apply(lambda r: min([r.bin.left,r.bin.right], key=lambda x: abs(x-r.col)), axis=1))
col
bin
colbmean
colbin
0
4
(3.999, 19.0]
9
3.999
1
8
(3.999, 19.0]
9
3.999
2
15
(3.999, 19.0]
9
19
3
21
(19.0, 24.333]
22
19
4
21
(19.0, 24.333]
22
19
5
24
(19.0, 24.333]
22
24.333
6
25
(24.333, 34.0]
29
24.333
7
28
(24.333, 34.0]
29
24.333
8
34
(24.333, 34.0]
29
34
You'll find below the solution I came up with to answer your problem.
There is still a limitation, pandas.qcut does not return closed intervals, for this matter the results are not exactly the one you described.
import pandas as pd
df = pd.DataFrame({'value': [4, 8, 15, 21, 21, 24, 25, 28, 34]})
df['bin'] = pd.qcut(df['value'], 3)
df = df.join(df.groupby('bin')['value'].mean(), on='bin', rsuffix='_average_in_bin')
df['bin_left'] = df['bin'].apply(lambda x: x.left)
df['bin_right'] = df['bin'].apply(lambda x: x.right)
df['nearest_boundary'] = df.apply(lambda x: x['bin_left'] if abs(x['value'] - x['bin_left']) < abs(x['value'] - x['bin_right']) else x['bin_right'], axis=1)
seq = [11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
for i in range(len(seq)):
print(seq[i],end ="\t")
How do I get my output table to look like this?
11 34 17 52 26 13
40 20 10 5 16 8
4 2 1
one of many ways is this, you make iterate over the seq list by a step of 6 and print the element between those margins
seq = [11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
for i in range(0, len(seq), 6):
print(*seq[i:i+6], sep=' ')
output
11 34 17 52 26 13
40 20 10 5 16 8
4 2 1
You probably want to make use of string formatting. Below, f"{seq[i]:<4d}" means "A string of length 4, left-aligned, containing the string representation of seq[i]". If you want to right-align, just remove <.
seq = [11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
for i in range(len(seq)):
print(f"{seq[i]:<4d}", end = "")
if not (i+1) % 6:
print("")
print("")
Output:
11 34 17 52 26 13
40 20 10 5 16 8
4 2 1
The simplest relevant technique is padding
for i in range(0, len(seq), 6):
print(" ".join[str(k).ljust(2, " ") for k in seq[i: i + 6]]
but string formatting as in Printing Lists as Tabular Data will make is a more sophisticated solution
I have a pandas dataframe like:
I need to style it using a list of lists like:
[[3, 7, 4, 5],
[6, 17, 5, 10, 13, 16],
[7, 22, 6, 17, 19, 12],
[12, 26, 24, 25, 23, 18, 20],
[21, 20, 18, 27, 25]]
If R1 values are in first list color blue, if R2 values are in second list color blue and so on.
In other words color numbers of each column if value is in the correspondent list.
I have tried:
def posclass(val):
color = 'black'
for i in range(5):
if (val in list[i]):
color = 'blue'
return 'color: %s' % color
df.style.applymap(posclass, subset=['R1','R2','R3','R4','R5'])
But this is not working properly applying each list to each column.
The desired result is a dataframe with colored numbers (those that matches in each column with each list).
Try something like this:
df = pd.DataFrame(np.arange(40).reshape(-1,4), columns=[f'R{i}' for i in range(1,5)])
Input df:
R1 R2 R3 R4
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
3 12 13 14 15
4 16 17 18 19
5 20 21 22 23
6 24 25 26 27
7 28 29 30 31
8 32 33 34 35
9 36 37 38 39
and
list_l = [[3, 7, 4, 5],
[6, 17, 5, 10, 13, 16],
[7, 22, 6, 17, 19, 12],
[12, 26, 24, 25, 23, 18, 20],
[21, 20, 18, 27, 25]]
Then:
def f(x):
colpos = df.columns.get_loc(x.name)
return ['color: blue' if n in list_l[colpos] else '' for n in x]
df.style.apply(f)
Output:
Any one know why I cannot use "|" to concatenate multiple numpy.flatiter object after converting it inot set? I try to look for all display number
from all row 11, all column 1 and section from (2,2) to (3,3) if I use np.concatenate I can get the right answer but after I use "|" I have empty set? or if there is a better way to write it?
import numpy as np
matrix = np.matrix(np.arange(36).reshape(6, 6))
rnum = matrix[1, :].flat
cnum = matrix[:, 1].flat
snum = matrix[2:4, 2:4].flat
print(matrix)
print(rnum)
print(set(rnum))
print(set(cnum))
print(set(snum))
print(set(np.concatenate((rnum, cnum, snum))))
print(set(rnum) | set(cnum) | set(snum))
#[[ 0 1 2 3 4 5]
# [ 6 7 8 9 10 11]
# [12 13 14 15 16 17]
# [18 19 20 21 22 23]
# [24 25 26 27 28 29]
# [30 31 32 33 34 35]]
#<numpy.flatiter object at 0x7faf52966c00>
#{6, 7, 8, 9, 10, 11}
#{1, 7, 13, 19, 25, 31}
#{20, 21, 14, 15}
#{1, 6, 7, 8, 9, 10, 11, 13, 14, 15, 19, 20, 21, 25, 31} => expect result
#set() => why?
The first call of set(rnum) in print(set(rnum)) consumes the iterator rnum. When you use set(rnum) again in set(rnum) | set(cnum) | set(snum), there are no more values left in the iterator rnum, so set(rnum) is the empty set.
Here's a more direct demonstration:
In [621]: matrix = np.matrix(np.arange(36).reshape(6, 6))
In [622]: rnum = matrix[1, :].flat
In [623]: set(rnum)
Out[623]: {6, 7, 8, 9, 10, 11}
In [624]: set(rnum)
Out[624]: set()
Instead of using rnum, you could create another iterator by repeating matrix[1, :].flat:
In [625]: set(matrix[1, :].flat)
Out[625]: {6, 7, 8, 9, 10, 11}
Alternatively, skip the use of numpy.matrix and iterators, and just index into a regular NumPy array:
In [639]: a = np.arange(36).reshape(6, 6)
In [640]: set(a[1,:])
Out[640]: {6, 7, 8, 9, 10, 11}
In [641]: set(a[:,1])
Out[641]: {1, 7, 13, 19, 25, 31}
In [642]: set(a[2:4, 2:4].ravel())
Out[642]: {20, 21, 14, 15}